Problem:
When using tensorflow-gpu, I get the following error:
tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
Environment where this was solved: Ubuntu 16.04.
1. This may be an NVIDIA driver version problem. Check the installed driver:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
Show the installed NVIDIA driver packages:
$ dpkg --get-selections | grep nvidia
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
$ dpkg -l | grep -i nvidia
ii bbswitch-dkms 0.8-3ubuntu1 amd64 Interface for toggling the power on NVIDIA Optimus video cards
ii libcuda1-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA CUDA runtime library
ii nvidia-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA binary driver - version 375.82
ii nvidia-opencl-icd-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 384.90-0ubuntu0~gpu16.04.1 amd64 Tool for configuring the NVIDIA graphics driver
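The dpkg -l output above mixes two driver branches (375.82 packages alongside nvidia-settings 384.90), which is consistent with a driver version problem. As a minimal sketch, assuming the `ii <name> <version> <arch> <description>` row layout shown above, a few lines of Python can flag such a mix; the helper names `installed_versions` and `driver_branches` are hypothetical.

```python
import re
from typing import Dict, Set

# One installed row of `dpkg -l`: "ii <name> <version> <arch> <description>".
ROW = re.compile(r"^ii\s+(\S+)\s+(\S+)\s+\S+\s")
# Driver-branch versions start with a three-digit branch number, e.g. "375.82-...".
BRANCH = re.compile(r"^(\d{3})\.\d+")


def installed_versions(dpkg_output: str) -> Dict[str, str]:
    """Map package name -> version for every installed ('ii') row."""
    versions = {}
    for line in dpkg_output.splitlines():
        match = ROW.match(line.strip())
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def driver_branches(versions: Dict[str, str]) -> Set[str]:
    """Distinct driver branches among NVIDIA/CUDA packages, e.g. {'375', '384'}."""
    branches = set()
    for name, version in versions.items():
        if "nvidia" not in name and "cuda" not in name:
            continue  # skip unrelated packages such as bbswitch-dkms
        match = BRANCH.match(version)
        if match:  # nvidia-prime (0.8.2) does not look like a driver branch
            branches.add(match.group(1))
    return branches


# Rows copied from the dpkg -l output above (descriptions shortened).
SAMPLE = """\
ii bbswitch-dkms 0.8-3ubuntu1 amd64 Interface for toggling the power
ii libcuda1-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA CUDA runtime library
ii nvidia-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA binary driver
ii nvidia-opencl-icd-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 384.90-0ubuntu0~gpu16.04.1 amd64 Tool for configuring
"""

branches = driver_branches(installed_versions(SAMPLE))
if len(branches) > 1:
    print("Mixed NVIDIA driver branches:", sorted(branches))
```

On a real machine, pass the text of `dpkg -l` instead of SAMPLE, for example `subprocess.run(["dpkg", "-l"], stdout=subprocess.PIPE, universal_newlines=True).stdout`.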
2. Uninstall the current driver and reinstall nvidia-375.
$ sudo nvidia-uninstall
If nvidia-uninstall is not available (it is provided by NVIDIA's .run installer), remove all NVIDIA driver packages with apt instead:
- Run sudo apt-get purge 'nvidia-*'
- Run sudo add-apt-repository ppa:graphics-drivers/ppa and then sudo apt-get update
- Run sudo apt-get install nvidia-375
- Reboot; the graphics issue should then be fixed.
You can check the installation status with:
$ nvidia-smi
Done! nvidia-smi should now print the driver and GPU table instead of the version-mismatch error.
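The apt steps above can be scripted. This is a minimal sketch, assuming Python 3.5+ (for subprocess.run) and an interactive terminal for the sudo and apt prompts; `reinstall_commands` and `run_all` are hypothetical helpers. It defaults to a dry run that only prints the commands, and the reboot is left as a manual step.

```python
import shlex
import subprocess
from typing import List


def reinstall_commands(branch: str = "375") -> List[List[str]]:
    """The apt-based reinstall steps from step 2, in order (reboot not included)."""
    return [
        # Passed as an argv list, so the shell never expands the glob;
        # apt-get itself interprets "nvidia-*" as a package pattern.
        ["sudo", "apt-get", "purge", "nvidia-*"],
        ["sudo", "add-apt-repository", "ppa:graphics-drivers/ppa"],
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", "nvidia-" + branch],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_all(reinstall_commands("375"), dry_run=True)
```

Keeping the command list separate from the runner makes it easy to review what will happen before switching dry_run to False.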
3. Optionally, turn off Ubuntu's automatic updates.
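On Ubuntu, automatic updates are usually controlled by APT::Periodic settings, typically in /etc/apt/apt.conf.d/20auto-upgrades (an assumed location; check your system). A minimal sketch of turning them off, using a hypothetical `disable_periodic` helper that sets every APT::Periodic value to "0". It only prints the new text; writing it back requires root.

```python
import re

# Typical location on Ubuntu; an assumption, verify on your system.
AUTO_UPGRADES = "/etc/apt/apt.conf.d/20auto-upgrades"

# Matches lines such as: APT::Periodic::Unattended-Upgrade "1";
PERIODIC = re.compile(r'(APT::Periodic::[\w-]+\s+)"\d+";')


def disable_periodic(conf_text: str) -> str:
    """Return conf_text with every APT::Periodic value set to "0"."""
    return PERIODIC.sub(r'\1"0";', conf_text)


if __name__ == "__main__":
    try:
        with open(AUTO_UPGRADES) as f:
            print(disable_periodic(f.read()), end="")
    except FileNotFoundError:
        print("not found:", AUTO_UPGRADES)
```

Review the printed text, then save it over the file as root, for example with sudo and a text editor.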
Other helpful commands:
To list the devices and the available drivers:
$ ubuntu-drivers devices
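The output of ubuntu-drivers devices lists candidate driver packages per device and marks one as recommended. A minimal sketch that extracts that package, assuming lines of the form `driver : <package> - <tags>` where the tags include `recommended` (the exact layout may vary between ubuntu-drivers versions); `recommended_driver` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed layout: "driver   : <package> - <tags...>", tags may include "recommended".
DRIVER_LINE = re.compile(r"^driver\s*:\s*(\S+)\s+-\s+(.*)$")


def recommended_driver(output: str) -> Optional[str]:
    """Return the package tagged 'recommended', or None if there is none."""
    for line in output.splitlines():
        match = DRIVER_LINE.match(line.strip())
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None
```

Pass it the captured text of ubuntu-drivers devices to see which branch Ubuntu suggests for the GPU, and compare that with the branch you installed.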
4. Important: prevent the driver from auto-updating.
Sometimes this step alone is enough!
$ sudo apt-mark hold nvidia-375
$ dpkg --get-selections | grep nvidia
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
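To confirm from a script that the hold is in place, a minimal sketch can parse the dpkg --get-selections output shown above (a package name followed by its selection state); `selection_states` and `is_held` are hypothetical helpers. The standard apt-mark showhold command also lists held packages directly.

```python
from typing import Dict


def selection_states(selections: str) -> Dict[str, str]:
    """Map package -> state ('install', 'hold', 'deinstall', ...)."""
    states = {}
    for line in selections.splitlines():
        fields = line.split()  # name and state are separated by whitespace (tabs)
        if len(fields) == 2:
            states[fields[0]] = fields[1]
    return states


def is_held(selections: str, package: str) -> bool:
    """True when `package` is marked 'hold', so apt will not upgrade it automatically."""
    return selection_states(selections).get(package) == "hold"


# The output shown above, after `sudo apt-mark hold nvidia-375`.
SAMPLE = (
    "nvidia-375\thold\n"
    "nvidia-384\tinstall\n"
    "nvidia-opencl-icd-384\tinstall\n"
    "nvidia-prime\tinstall\n"
    "nvidia-settings\tinstall\n"
)

print("nvidia-375 held:", is_held(SAMPLE, "nvidia-375"))
```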
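5. Check TensorFlow
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)
# Under TensorFlow 2.x, call tf.compat.v1.disable_eager_execution() first and
# use tf.compat.v1.Session / tf.compat.v1.ConfigProto instead.
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True,
# so TensorFlow logs which device (CPU or GPU) runs each op.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op; the result is [[22. 28.] [49. 64.]].
print(sess.run(c))
When the driver works, the log includes lines such as:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX TITAN X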
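TensorFlow writes these device messages to stderr, so a saved log can be checked automatically. A minimal sketch, assuming the TensorFlow 1.x log layout shown in this post (a "Found device N with properties:" line followed by a "name:" line, and the CUDA_ERROR_NO_DEVICE message on failure); `gpus_in_log` and `gpu_ok` are hypothetical helpers, and other TensorFlow versions format these lines differently.

```python
import re
from typing import List

NO_DEVICE = "CUDA_ERROR_NO_DEVICE"
FOUND = re.compile(r"Found device \d+ with properties:")
# Some 1.x releases append "major: 5 minor: 2 ..." to the name line; drop that part.
NAME = re.compile(r"^name:\s*(.+?)(?:\s+major:.*)?$")


def gpus_in_log(log: str) -> List[str]:
    """Return GPU names reported right after 'Found device N with properties:'."""
    names = []
    expect_name = False
    for line in log.splitlines():
        line = line.strip()
        if FOUND.search(line):
            expect_name = True
        elif expect_name:
            match = NAME.match(line)
            if match:
                names.append(match.group(1))
            expect_name = False
    return names


def gpu_ok(log: str) -> bool:
    """True when no cuInit NO_DEVICE error appears and at least one GPU was found."""
    return NO_DEVICE not in log and bool(gpus_in_log(log))
```

For example, run the check script with `python check_gpu.py 2> tf.log` and pass the contents of tf.log to `gpu_ok`.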