Thursday, October 12, 2017

NVIDIA driver version mismatch (which causes tensorflow-gpu to stop working)

Problem:
When using tensorflow-gpu, the following error appears (solved here on Ubuntu 16.04):

tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

1. This may be an NVIDIA driver version problem. Check the installed driver.

$ nvidia-smi

Failed to initialize NVML: Driver/library version mismatch
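
This NVML error usually means that the kernel module currently loaded and the user-space libraries on disk come from different driver releases. Here is a minimal Python sketch for reading the version of the loaded module. It assumes the driver exposes the usual /proc/driver/nvidia/version file; the function name loaded_driver_version is only illustrative.

```python
import re
from pathlib import Path

# Assumed standard location; the file only exists while the NVIDIA module is loaded.
PROC_VERSION = Path("/proc/driver/nvidia/version")


def loaded_driver_version(text):
    """Return the version found on the 'NVRM version' line, or None if absent."""
    for line in text.splitlines():
        if line.startswith("NVRM version"):
            # The first dotted number on this line is the driver version, e.g. 375.82.
            match = re.search(r"\d+\.\d+(?:\.\d+)?", line)
            return match.group(0) if match else None
    return None


if __name__ == "__main__":
    if PROC_VERSION.exists():
        print("Loaded kernel module:", loaded_driver_version(PROC_VERSION.read_text()))
    else:
        print("No NVIDIA kernel module is loaded")
```

If the printed version differs from the package version that dpkg reports for the driver, the module in memory is out of step with the installed libraries.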

List the installed NVIDIA driver packages:
$ dpkg --get-selections | grep nvidia
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
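
The listing above shows two driver series (375 and 384) selected at the same time, which is the likely source of the mismatch. As a rough sketch, the same check can be scripted by parsing the dpkg --get-selections text. The helper name installed_driver_series is hypothetical, and it only looks at packages named exactly nvidia-NNN.

```python
import re


def installed_driver_series(selections_text):
    """Return the sorted driver series whose nvidia-NNN package is installed or held."""
    series = set()
    for line in selections_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        package, state = fields
        match = re.fullmatch(r"nvidia-(\d+)", package)
        if match and state in ("install", "hold"):
            series.add(match.group(1))
    return sorted(series)


# Sample text copied from the listing in this post.
sample = """nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
"""
print(installed_driver_series(sample))  # → ['375', '384']
```

More than one entry in the result means several driver releases are installed side by side.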

List the package versions as well:
$ dpkg -l | grep -i nvidia

ii  bbswitch-dkms                              0.8-3ubuntu1                                  amd64        Interface for toggling the power on NVIDIA Optimus video cards
ii  libcuda1-375                               375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA CUDA runtime library
ii  nvidia-375                                 375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA binary driver - version 375.82
ii  nvidia-opencl-icd-375                      375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                               0.8.2                                         amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                            384.90-0ubuntu0~gpu16.04.1                    amd64        Tool for configuring the NVIDIA graphics driver

2. Uninstall the current driver and reinstall nvidia-375

$ sudo nvidia-uninstall

If nvidia-uninstall is not available, remove all NVIDIA driver packages and reinstall:
  1. Run sudo apt-get purge 'nvidia-*' (quote the pattern so the shell does not expand it against local file names).
  2. Run sudo add-apt-repository ppa:graphics-drivers/ppa and then sudo apt-get update.
  3. Run sudo apt-get install nvidia-375.
  4. Reboot; the driver mismatch should then be fixed.

You can check that the kernel module is loaded with the following command:
$ lsmod | grep nvidia
Done! nvidia-smi should now work.
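
For scripting the same check, a small Python sketch can read /proc/modules, which is the file lsmod formats. The function name nvidia_modules is illustrative, and unlike the grep it only matches module names that start with "nvidia".

```python
from pathlib import Path


def nvidia_modules(proc_modules_text):
    """Return the names of loaded kernel modules that start with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first column of /proc/modules is the module name.
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        loaded = nvidia_modules(modules_file.read_text())
        print(loaded if loaded else "no nvidia module loaded")
```

An empty result after the reboot suggests the driver module did not load, in which case nvidia-smi will still fail.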



3. Optionally, turn off Ubuntu's automatic updates.

Other helpful commands.
To list the detected devices and their available drivers:
$ ubuntu-drivers devices


4. Important: prevent the driver from being auto-updated
Sometimes this step alone is enough!

$ sudo apt-mark hold nvidia-375
$ dpkg --get-selections | grep nvidia
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
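
To confirm from a script that the hold took effect, the selection states can be parsed into a dictionary. This is only a sketch; package_states and is_held are hypothetical helper names, and the input is the text printed by dpkg --get-selections.

```python
def package_states(selections_text):
    """Map each package name to its dpkg selection state (install, hold, deinstall, ...)."""
    states = {}
    for line in selections_text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            states[fields[0]] = fields[1]
    return states


def is_held(selections_text, package):
    """Return True if the package is marked 'hold' in the given selections text."""
    return package_states(selections_text).get(package) == "hold"


# Sample text copied from the listing in this post.
sample = """nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
"""
print(is_held(sample, "nvidia-375"))  # → True
```

A held package is skipped by apt-get upgrade until it is released again with sudo apt-mark unhold.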

5. Check TensorFlow

import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the driver is working, the log reports the GPU:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
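
The snippet above uses the TensorFlow 1.x Session API. On TensorFlow 2.1 or later (an assumption, since this post predates that release), a shorter check is possible with tf.config.list_physical_devices. The wrapper name visible_gpus is illustrative, and it returns None when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf  # assumes TensorFlow 2.1 or later
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow sees no GPU; check the driver with nvidia-smi")
```

An empty list here points back to the driver steps earlier in this post.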
