
Problem:
When using tensorflow-gpu, the following error appears (solved here on Ubuntu 16.04):

tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

1. This may be an NVIDIA driver version problem. Check the installed driver.

$ nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

List the installed NVIDIA driver packages:
$ dpkg --get-selections | grep nvidia
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install

$ dpkg -l | grep -i nvidia

ii bbswitch-dkms 0.8-3ubuntu1 amd64 Interface for toggling the power on NVIDIA Optimus video cards
ii libcuda1-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA CUDA runtime library
ii nvidia-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA binary driver - version 375.82
ii nvidia-opencl-icd-375 375.82-0ubuntu0~gpu16.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 384.90-0ubuntu0~gpu16.04.1 amd64 Tool for configuring the NVIDIA graphics driver
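
The conflict can often be spotted from the package list itself: two numbered driver packages (here nvidia-375 and nvidia-384) are both selected for installation. The following is a minimal sketch, assuming Python 3 and the `dpkg --get-selections` output format shown above. The function name `installed_driver_versions` is hypothetical.

```python
import re

def installed_driver_versions(selections_text):
    """Return sorted driver branch numbers found in `dpkg --get-selections` output.

    Only packages named exactly nvidia-<number> in state "install" or "hold" count.
    """
    versions = set()
    for line in selections_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        package, state = parts
        match = re.fullmatch(r"nvidia-(\d+)", package)
        if match and state in ("install", "hold"):
            versions.add(int(match.group(1)))
    return sorted(versions)

# Sample text copied from the listing in this post.
sample = """\
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
"""

versions = installed_driver_versions(sample)
if len(versions) > 1:
    print("Multiple driver branches installed:", versions)
```

More than one entry in the result suggests that an update pulled in a second driver branch next to the one the kernel module was loaded from.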

2. Uninstall the current driver and reinstall nvidia-375

$ sudo nvidia-uninstall

If there is no nvidia-uninstall, remove all NVIDIA driver packages instead:

Run sudo apt-get purge 'nvidia-*' (quoted so that the shell does not expand the pattern).
Run sudo add-apt-repository ppa:graphics-drivers/ppa and then sudo apt-get update.
Run sudo apt-get install nvidia-375.
Reboot, and the driver problem should be fixed.

You can check your installation status with the following command

lsmod | grep nvidia

Done! "nvidia-smi" should now work.
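
The lsmod check can also be scripted. This is a small sketch, assuming Python 3 and the usual three-column lsmod layout with a header line. The function name `loaded_nvidia_modules` is hypothetical, and the sample text is illustrative only.

```python
def loaded_nvidia_modules(lsmod_text):
    """Return names of loaded kernel modules that start with 'nvidia'."""
    modules = []
    for line in lsmod_text.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            modules.append(fields[0])
    return modules

# Illustrative text only; sizes and use counts are made up.
sample = """\
Module                  Size  Used by
nvidia_uvm            671744  0
nvidia              12144640  1 nvidia_uvm
snd_hda_intel          40960  3
"""

print(loaded_nvidia_modules(sample))
```

On a live system the text could come from `subprocess.check_output(["lsmod"]).decode()`. An empty list after the reboot would suggest that the driver module did not load.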

3. Consider turning off Ubuntu's automatic updates.

Another helpful command lists the devices and the drivers available for them:

ubuntu-drivers devices

4. Important: prevent the driver from auto-updating
Sometimes this step alone is enough!

$ sudo apt-mark hold nvidia-375
$ dpkg --get-selections | grep nvidia
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
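
To confirm from a script that the hold took effect, the selection state of a single package can be read back. A minimal sketch, assuming Python 3 and the two-column `dpkg --get-selections` format above. The function name `package_state` is hypothetical.

```python
def package_state(selections_text, package):
    """Return the dpkg selection state of `package`, or None if it is not listed."""
    for line in selections_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == package:
            return parts[1]
    return None

# Sample text copied from the listing in this post.
sample = """\
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install
"""

print(package_state(sample, "nvidia-375"))
```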

5. Check tensorflow

import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX TITAN X
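
A shorter check is to ask TensorFlow which devices it sees. The sketch below assumes Python 3 and a TensorFlow 1.x install that provides `tensorflow.python.client.device_lib`. The helper names and the stand-in device records are hypothetical. The TensorFlow import is deferred so that the filtering helper can be tried without a GPU build.

```python
from collections import namedtuple

def gpu_device_names(devices):
    """Return the names of all devices whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]

def local_gpu_names():
    """List GPUs visible to TensorFlow (requires TensorFlow 1.x to be installed)."""
    # Imported lazily so gpu_device_names() works without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())

# Stand-in records for demonstration; real entries come from device_lib.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
fake_devices = [FakeDevice("/cpu:0", "CPU"), FakeDevice("/gpu:0", "GPU")]
print(gpu_device_names(fake_devices))
```

Calling `local_gpu_names()` on the real machine and getting an empty list would mean TensorFlow fell back to the CPU.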

Problem when TensorFlow is installed directly with pip:
Compiling C++ code against TensorFlow fails with
third_party/eigen3/unsupported/eigen/cxx11/tensor: no such file or directory

Fixed by building TensorFlow from source.
Environment used
Ubuntu 16.04
gcc 5.4.0
Cuda 8.0
Cudnn 5
python 2.7
tensorflow 1.2.1

Official doc:
https://www.tensorflow.org/install/install_sources#PrepareLinux

Clone the TensorFlow repository

Start the process of building TensorFlow by cloning a TensorFlow repository.

To clone the latest TensorFlow repository, issue the following command:

$ git clone https://github.com/tensorflow/tensorflow

The preceding git clone command creates a subdirectory named tensorflow. After cloning, you may optionally build a specific branch (such as a release branch) by invoking the following commands:

$ cd tensorflow
$ git checkout Branch # where Branch is the desired branch

For example, to work with the r1.2 release instead of the master release, issue the following command:

$ git checkout r1.2

Prepare environment for Linux

Install Bazel

If Bazel is not installed on your system, install it now by following the Bazel installation directions.

Bug:
The latest Bazel has problems building this TensorFlow release; roll back to Bazel 0.5.2.
Download it from
https://github.com/bazelbuild/bazel/releases/download/0.5.2/bazel_0.5.2-linux-x86_64.deb

Install TensorFlow Python dependencies

To install these packages for Python 2.7, issue the following command:

$ sudo apt-get install python-numpy python-dev python-pip python-wheel

To install these packages for Python 3.n, issue the following command:

$ sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel


Optional: install TensorFlow for GPU prerequisites

In addition to the NVIDIA driver, the CUDA toolkit and cuDNN, you must also install libcupti-dev by invoking the following command:

$ sudo apt-get install libcupti-dev

Next

After preparing the environment, you must now configure the installation.

$ cd tensorflow  # cd to the top-level directory created
$ ./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 5
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
Do you wish to build TensorFlow with MPI support? [y/N] 
MPI support will not be enabled for TensorFlow
Configuration finished

To build a pip package for TensorFlow with GPU support, invoke the following command:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"

NOTE on gcc 5 or later: the binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. To make your build compatible with the older ABI, you need to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to your bazel build command.

The bazel build command builds a script named build_pip_package. Running this script as follows will build a .whl file within the /tmp/tensorflow_pkg directory:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install via pip in a virtual environment

$ virtualenv ~/tensorflow

$ source ~/tensorflow/bin/activate

(tensorflow) $ pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl
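
pip only accepts a wheel whose tags match the active interpreter, so it helps to read the tags in the generated file name. In the name above, cp27 means CPython 2.7 and cp27mu is its wide-unicode ABI. A minimal sketch, assuming Python 3 and the file name layout from PEP 427. The function name `parse_wheel_filename` is hypothetical.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

info = parse_wheel_filename("tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl")
print(info)
```

If pip reports that the wheel is not supported on this platform, comparing these tags with the interpreter inside the virtual environment is a reasonable first step.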

Update 2018.01.16

Problems encountered with

cuda 9.1
cudnn 7.0
tensorflow 1.5.0

when building with bazel.

Problem 1

While trying to compile the latest TensorFlow (cloned from 798fa36), the following error is raised:

ERROR: /home/ubuntu/tensorflow/tensorflow/contrib/seq2seq/BUILD:64:1: error while parsing .d file: /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/execroot/org_tensorflow/bazel-out/k8-py3-opt/bin/tensorflow/contrib/seq2seq/_objs/python/ops/_beam_search_ops_gpu/tensorflow/contrib/seq2seq/kernels/beam_search_ops_gpu.cu.pic.d (No such file or directory)
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14:0,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from ./tensorflow/contrib/seq2seq/kernels/beam_search_ops.h:19,
                 from tensorflow/contrib/seq2seq/kernels/beam_search_ops_gpu.cu.cc:20:
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/Core:59:34: fatal error: math_functions.hpp: No such file or directory

It turns out that in CUDA 9.1, math_functions.hpp is located at cuda/include/crt/math_functions.hpp rather than at cuda/include/math_functions.hpp (where CUDA 9.0 keeps it), which leads to this error. Creating a symlink fixes the problem and lets the compilation complete (sudo is usually needed because /usr/local/cuda is typically owned by root):
sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
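
The same workaround can be written so that it is safe to run more than once and does nothing where the header already exists. This is a sketch, assuming Python 3. The function name `ensure_math_functions_link` is hypothetical, and the demo runs in a throwaway directory rather than in /usr/local/cuda/include.

```python
import os
import tempfile

def ensure_math_functions_link(include_dir):
    """Create include_dir/math_functions.hpp -> crt/math_functions.hpp if missing.

    Returns True if a link was created, False if the header was already there.
    """
    target = os.path.join(include_dir, "crt", "math_functions.hpp")
    link = os.path.join(include_dir, "math_functions.hpp")
    if os.path.lexists(link):
        return False  # header or link already present
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    os.symlink(target, link)
    return True

# Demo in a temporary directory so that no root access is needed.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "crt"))
    open(os.path.join(demo_dir, "crt", "math_functions.hpp"), "w").close()
    created = ensure_math_functions_link(demo_dir)
    print("link created:", created)
```

Against the real toolkit the call would be `ensure_math_functions_link("/usr/local/cuda/include")`, run with sufficient privileges.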

Reference

https://stackoverflow.com/a/47807106/2666624

Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. If you compile your op library with gcc >=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older ABI. Furthermore, if you are using a TensorFlow package built from source, remember to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to the bazel command that compiles the Python package.

Problem 2

no such package '@nasm//': java.io.IOException: Error downloading [https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2, http://pkgs.fedoraproject.org/repo/pkgs/nasm/nasm-2.12.02.tar.bz2/d15843c3fb7db39af80571ee27ec6fad/nasm-2.12.02.tar.bz2]

Solution
https://github.com/tensorflow/tensorflow/issues/16862

The problem is that one of the two mirrors for nasm is dead, and the second one is for some reason problematic. A workaround is to add one more mirror:

      urls = [
          "https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",  
          "http://www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",
          "http://pkgs.fedoraproject.org/repo/pkgs/nasm/nasm-2.12.02.tar.bz2/d15843c3fb7db39af80571ee27ec6fad/nasm-2.12.02.tar.bz2",
      ]
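
The extra entry helps because, as far as I understand it, Bazel tries the entries of a urls list in order until one download succeeds. The sketch below only illustrates that fallback idea in Python 3 and is not Bazel's implementation. `fetch_first_available` and `fake_fetch` are hypothetical names, and the fake fetcher simply pretends that only www.nasm.us responds.

```python
def fetch_first_available(urls, fetch):
    """Try each URL in order and return (url, data) for the first success.

    `fetch` is any callable that returns the payload or raises OSError.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append("%s: %s" % (url, exc))
    raise OSError("all mirrors failed:\n" + "\n".join(errors))

def fake_fetch(url):
    # Stand-in for a real download: pretend only www.nasm.us is reachable.
    if url.startswith("http://www.nasm.us/"):
        return b"archive bytes"
    raise OSError("unreachable")

urls = [
    "https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",
    "http://www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",
    "http://pkgs.fedoraproject.org/repo/pkgs/nasm/nasm-2.12.02.tar.bz2/d15843c3fb7db39af80571ee27ec6fad/nasm-2.12.02.tar.bz2",
]

used_url, data = fetch_first_available(urls, fake_fetch)
print(used_url)
```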

The urls list is in tensorflow/workspace.bzl of the tensorflow/tensorflow repository (line 216 at commit aed54c8):

"https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",

Problem 3

'Numpy dangling symbolic links' when building from source
Solution

sudo pip install --no-cache-dir --upgrade --force-reinstall numpy

Update 2018.06.05

Problem for

cuda 9.0
cudnn 7.0
tensorflow 1.7.0
bazel 0.14

when building with bazel:

bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ustring_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasDsymm_v2@libcublas.so.9.0

........

Solution
1. Check $LD_LIBRARY_PATH in ~/.bashrc
2. Check CUDA_PATH
3. The solution is to not use LD_LIBRARY_PATH but ldconfig:

echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
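
Whether the CUDA libraries are now visible to the system's library lookup can be checked from Python. A minimal sketch, assuming Python 3 on Linux. `library_visible` is a hypothetical helper around the standard `ctypes.util.find_library`. Note that Python 3.6 and later may also fall back to LD_LIBRARY_PATH, so it is best run with that variable unset when testing the ldconfig setup.

```python
import ctypes.util

def library_visible(name):
    """Return True if the system library lookup can locate lib<name>."""
    return ctypes.util.find_library(name) is not None

for lib in ("cublas", "cudart"):
    print(lib, "found" if library_visible(lib) else "NOT found")
```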

Updated on 2018.12.21

Reinstall bazel

https://docs.bazel.build/versions/master/install-ubuntu.html

Remove the old Bazel cache and the per-user installation first:

rm -rf ~/.cache/bazel

rm -rf ~/.bazel ~/.bazelrc

Step 2: Download Bazel

Next, download the Bazel binary installer named `bazel-<version>-installer-linux-x86_64.sh` from the Bazel releases page on GitHub.

Step 3: Run the installer

Run the Bazel installer as follows:

chmod +x bazel-<version>-installer-linux-x86_64.sh
./bazel-<version>-installer-linux-x86_64.sh --user

The --user flag installs Bazel to the $HOME/bin directory on your system and sets the .bazelrc path to $HOME/.bazelrc. Use the --help command to see additional installation options.

Different TensorFlow versions need different Bazel versions

Linux

Version	Python version	Compiler	Build tools
tensorflow-1.12.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.15.0
tensorflow-1.11.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.15.0
tensorflow-1.10.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.15.0
tensorflow-1.9.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.11.0
tensorflow-1.8.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.10.0
tensorflow-1.7.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.10.0
tensorflow-1.6.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.9.0
tensorflow-1.5.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.8.0
tensorflow-1.4.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.5.4
tensorflow-1.3.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.5
tensorflow-1.2.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.5
tensorflow-1.1.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.2
tensorflow-1.0.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.2

Version	Python version	Compiler	Build tools	cuDNN	CUDA
tensorflow_gpu-1.12.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.15.0	7	9
tensorflow_gpu-1.11.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.15.0	7	9
tensorflow_gpu-1.10.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.15.0	7	9
tensorflow_gpu-1.9.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.11.0	7	9
tensorflow_gpu-1.8.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.10.0	7	9
tensorflow_gpu-1.7.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.9.0	7	9
tensorflow_gpu-1.6.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.9.0	7	9
tensorflow_gpu-1.5.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.8.0	7	9
tensorflow_gpu-1.4.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.5.4	6	8
tensorflow_gpu-1.3.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.5	6	8
tensorflow_gpu-1.2.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.5	5.1	8
tensorflow_gpu-1.1.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.2	5.1	8
tensorflow_gpu-1.0.0	2.7, 3.3-3.6	GCC 4.8	Bazel 0.4.2	5.1	8
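
For scripted builds, the GPU table can be kept as a small lookup. This is a sketch, assuming Python 3. The values are copied from the rows above, while the names `GPU_BUILD_MATRIX` and `gpu_build_requirements` are hypothetical.

```python
# (Bazel, cuDNN, CUDA) per tensorflow_gpu release, copied from the table above.
GPU_BUILD_MATRIX = {
    "1.12.0": ("0.15.0", "7", "9"),
    "1.11.0": ("0.15.0", "7", "9"),
    "1.10.0": ("0.15.0", "7", "9"),
    "1.9.0": ("0.11.0", "7", "9"),
    "1.8.0": ("0.10.0", "7", "9"),
    "1.7.0": ("0.9.0", "7", "9"),
    "1.6.0": ("0.9.0", "7", "9"),
    "1.5.0": ("0.8.0", "7", "9"),
    "1.4.0": ("0.5.4", "6", "8"),
    "1.3.0": ("0.4.5", "6", "8"),
    "1.2.0": ("0.4.5", "5.1", "8"),
    "1.1.0": ("0.4.2", "5.1", "8"),
    "1.0.0": ("0.4.2", "5.1", "8"),
}

def gpu_build_requirements(tf_version):
    """Return the Bazel, cuDNN and CUDA versions listed for a tensorflow_gpu release."""
    try:
        bazel, cudnn, cuda = GPU_BUILD_MATRIX[tf_version]
    except KeyError:
        raise ValueError("no table entry for tensorflow_gpu-%s" % tf_version)
    return {"bazel": bazel, "cudnn": cudnn, "cuda": cuda}

print(gpu_build_requirements("1.5.0"))
```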

Update 2020.01.19

Tested on (version matters!)

tensorflow 1.8

Cuda 10.0

cudnn 7.6

python 3.6

bazel 0.15.0
