Showing posts with label nvidia. Show all posts

Wednesday, May 20, 2020

Install cuda 10.2

1. Uninstall existing nvidia driver

sudo nvidia-installer --uninstall

or

To uninstall only CUDA and keep the nvidia driver:
sudo apt purge "libcublas*" "cuda-*" cuda
These packages may have been installed as dependencies, so you can purge them too:
sudo apt purge "nsight-*" nvidia-modprobe
After that you can, if you want, also remove the nvidia driver:
sudo apt purge "*nvidia*"
Of course, if you installed CUDA from nvidia's .run file, the apt commands above won't work; in that case the installer has probably left an uninstall script that you can run instead.

2. Close X server
# To stop:
sudo init 3
# To resume:
sudo init 5
3. Download & install
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
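
The runfile name appears to encode both the CUDA toolkit version and the version of the driver bundled with it, which is worth knowing before installing over an existing driver. The following is a minimal sketch that splits the name apart; the pattern cuda_<toolkit>_<driver>_linux.run is an assumption inferred from the file name above, and parse_runfile_name is a hypothetical helper, not part of any NVIDIA tool.

```python
import re

# Assumed naming pattern: cuda_<toolkit version>_<driver version>_linux.run
RUNFILE_PATTERN = re.compile(r"^cuda_(\d+(?:\.\d+)+)_(\d+(?:\.\d+)+)_linux\.run$")


def parse_runfile_name(filename):
    """Return (cuda_version, driver_version) read from a runfile name, or None."""
    match = RUNFILE_PATTERN.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_runfile_name("cuda_10.2.89_440.33.01_linux.run"))
# prints ('10.2.89', '440.33.01')
```

If the driver version in the name differs from the driver you intend to keep, that is a hint to decline the driver component when the installer asks.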

Thursday, October 12, 2017

Nvidia driver version mismatch (which causes tensorflow-gpu to stop working)

Problem:
When using tensorflow-gpu, the following error appears (solved on Ubuntu 16.04):

tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

1. The cause may be a mismatched nvidia driver version. Check the installed driver.

nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

List the installed nvidia driver packages:
$ dpkg --get-selections | grep nvidia
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install

dpkg -l | grep -i nvidia

ii  bbswitch-dkms                              0.8-3ubuntu1                                  amd64        Interface for toggling the power on NVIDIA Optimus video cards
ii  libcuda1-375                               375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA CUDA runtime library
ii  nvidia-375                                 375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA binary driver - version 375.82
ii  nvidia-opencl-icd-375                      375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                               0.8.2                                         amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                            384.90-0ubuntu0~gpu16.04.1                    amd64        Tool for configuring the NVIDIA graphics driver
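
The listings above show two driver branches (375 and 384) selected at the same time, which is the situation that leads to the mismatch. As a rough sketch, the check can be automated by parsing the text of dpkg --get-selections; installed_driver_versions is a hypothetical helper, and the assumption that driver packages are named nvidia-NNN comes from the output shown in this post.

```python
import re


def installed_driver_versions(selections_text):
    """Return driver branches (e.g. '375') whose nvidia-NNN package is marked 'install'."""
    versions = set()
    for line in selections_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        package, state = fields
        match = re.fullmatch(r"nvidia-(\d+)", package)
        if match and state == "install":
            versions.add(match.group(1))
    return sorted(versions, key=int)


# Sample copied from the dpkg --get-selections output in this post.
sample = """\
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
"""

found = installed_driver_versions(sample)
print(found)  # prints ['375', '384']
if len(found) > 1:
    print("More than one driver branch is selected; a mismatch is likely.")
```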

2. Uninstall the current driver and reinstall nvidia-375

$ sudo nvidia-uninstall

If nvidia-uninstall is not available, remove all nvidia driver packages and reinstall:
  1. Run sudo apt-get purge 'nvidia-*' (quote the pattern so the shell does not expand it).
  2. Run sudo add-apt-repository ppa:graphics-drivers/ppa and then sudo apt-get update.
  3. Run sudo apt-get install nvidia-375.
  4. Reboot, and the driver issue should be fixed.

You can check that the nvidia kernel module is loaded with the following command:
lsmod | grep nvidia
Done! If the module is listed, "nvidia-smi" should work again.
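
lsmod only shows that a module is loaded, not which version it is. A minimal sketch of a version comparison follows, assuming the loaded module reports itself in /proc/driver/nvidia/version on a line containing "Kernel Module" followed by the version number. The helper names are hypothetical and the sample line is illustrative, not captured from a real machine.

```python
import re


def kernel_module_version(proc_text):
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def package_upstream_version(dpkg_version):
    """Strip the packaging revision: '375.82-0ubuntu0~gpu16.04.1' -> '375.82'."""
    return dpkg_version.split("-", 1)[0]


def driver_matches(proc_text, dpkg_version):
    """True when the loaded kernel module and the installed package agree."""
    loaded = kernel_module_version(proc_text)
    return loaded is not None and loaded == package_upstream_version(dpkg_version)


# Illustrative line only; the real file carries extra fields such as a build date.
sample_proc = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.82"

print(driver_matches(sample_proc, "375.82-0ubuntu0~gpu16.04.1"))  # prints True
print(driver_matches(sample_proc, "384.90-0ubuntu0~gpu16.04.1"))  # prints False
```

On a real machine the first argument would be the contents of /proc/driver/nvidia/version and the second the version column from dpkg -l for the driver package.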



3. Consider turning off Ubuntu's automatic updates.

Other helpful commands.
To list the devices:
ubuntu-drivers devices


4. Important: prevent the driver from being auto-updated
Sometimes this step alone is enough!

$ sudo apt-mark hold nvidia-375
$ dpkg --get-selections | grep nvidia
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
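
To confirm that the hold took effect without reading the listing by eye, the same dpkg --get-selections text can be turned into a lookup table. This is only a sketch; selection_states and is_held are hypothetical helpers, and the sample is copied from the output above.

```python
def selection_states(selections_text):
    """Map package name -> selection state from dpkg --get-selections output."""
    states = {}
    for line in selections_text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            states[fields[0]] = fields[1]
    return states


def is_held(selections_text, package):
    """True when the package is marked 'hold', so apt will not upgrade it."""
    return selection_states(selections_text).get(package) == "hold"


# Sample copied from the output shown after running apt-mark hold.
sample = """\
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
"""

print(is_held(sample, "nvidia-375"))  # prints True
print(is_held(sample, "nvidia-384"))  # prints False
```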

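5. Check that tensorflow can see the GPU

# This snippet uses the TensorFlow 1.x API (tf.Session is not in the top-level namespace of 2.x).
import tensorflow as tf
# Creates a graph.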
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
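
When the check is run from a script, the two log lines quoted in this post (the cuInit failure and the "Found device" message) are enough to tell the broken state from the working one. The sketch below classifies captured log text on that basis; summarize_tf_log is a hypothetical helper and it only recognizes the message formats shown here, which may differ in other tensorflow versions.

```python
import re


def summarize_tf_log(log_text):
    """Return ('error', code), ('gpu', [device names]) or ('unknown', None)."""
    error = re.search(r"failed call to cuInit: (\w+)", log_text)
    if error:
        return "error", error.group(1)
    if "Found device" in log_text:
        # Device names appear on their own "name: ..." lines after "Found device".
        names = re.findall(r"^name: (.+?)\s*$", log_text, flags=re.MULTILINE)
        return "gpu", names
    return "unknown", None


# Both samples are copied from the log lines quoted in this post.
failing = (
    "tensorflow/stream_executor/cuda/cuda_driver.cc:406] "
    "failed call to cuInit: CUDA_ERROR_NO_DEVICE"
)
working = (
    "I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] "
    "Found device 0 with properties: \n"
    "name: GeForce GTX TITAN X\n"
)

print(summarize_tf_log(failing))  # prints ('error', 'CUDA_ERROR_NO_DEVICE')
print(summarize_tf_log(working))  # prints ('gpu', ['GeForce GTX TITAN X'])
```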