Monday, December 18, 2017

ssh login timeout but scp works

Problem:
ssh login timeout
scp and sftp work

Analysis:

Since scp and sftp work, port 22 is reachable, so the timeout is not caused by a blocked port.

This was due to my router blocking TCP keepalive messages when I connected wirelessly (go figure).
Solution:
ssh my_server -o TCPKeepAlive=no 

From the documentation:
TCPKeepAlive
  Specifies whether the system should send TCP keepalive messages
  to the other side. If they are sent, death of the connection or
  crash of one of the machines will be properly noticed.  However,
  this means that connections will die if the route is down
  temporarily, and some people find it annoying.  On the other hand,
  if TCP keepalives are not sent, sessions may hang indefinitely on
  the server, leaving "ghost" users and consuming server resources.

  The default is "yes" (to send TCP keepalive messages), and the
  server will notice if the network goes down or the client host
  crashes.  This avoids infinitely hanging sessions.

  To disable TCP keepalive messages, the value should be set to
  "no".
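
To avoid typing the option on every login, the same setting can live in ~/.ssh/config. Below is a minimal sketch, assuming the host alias my_server from above. The helper name ssh_keepalive_stanza is made up for illustration; it only prints the stanza so it can be reviewed before being appended to the config file.

```shell
# Print an ssh_config stanza that disables TCP keepalives for one host.
# ssh_keepalive_stanza is a hypothetical helper name; TCPKeepAlive is the
# ssh_config keyword quoted from the documentation above.
ssh_keepalive_stanza() {
    printf 'Host %s\n    TCPKeepAlive no\n' "$1"
}
```

Usage: ssh_keepalive_stanza my_server >> ~/.ssh/config, after which a plain "ssh my_server" picks up the setting.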

Friday, November 17, 2017

Mount a hard disk in ubuntu + reboot (may have different filesystem type)

1. $ sudo fdisk -l
This lists the disks. Find the device name of the new disk, for example:
Disk /dev/sdc: 2 TiB, 2197949513728 bytes, 4292870144 sectors

2. $ sudo mkdir /media/DiskC
Better to use /media than /mnt: /mnt is conventionally reserved for temporary mounts, so a permanent mount point there may get in the way later.

3. $ sudo mount /dev/sdc /media/DiskC
This failed with the following error:
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

(for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program)
This hint only matters if you are mounting a network filesystem. For NFS, the /sbin/mount.nfs helper program is provided by nfs-common. You can install it with:
sudo apt install nfs-common
For CIFS, the helper program is provided by cifs-utils. You can install it with:
sudo apt install cifs-utils
4. Create an ext4 filesystem on the disk. Warning: this erases everything on /dev/sdc.
$ sudo mkfs.ext4 /dev/sdc

Go back to step 3.
Done!


Auto mount on start up.

[IMPORTANT] sudo cp /etc/fstab /etc/fstab.old - Create a backup of the fstab file just in case something unwanted happens.

Auto-mount at boot

We want the drive to auto-mount at boot.  This usually means editing /etc/fstab.

Firstly, it's always best to use the drive's UUID.  To find the drive's UUID do

ls -al /dev/disk/by-uuid/

Copy the resultant UUID (for your disk) and then open fstab for editing (note I'm using vim here but use whatever editor you prefer):

sudo vim /etc/fstab

You want to add an entry for the UUID and mount point.  Below is an example of an fstab file with an entry added for the mount above:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sdb1 during installation
UUID=63a46dce-b895-4c1f-9034-b1104694a956 /               ext4    errors=remount-ro 0       1
# swap was on /dev/sdb5 during installation
UUID=b9b9ee49-c69c-475b-894b-1279d44034ae none            swap    sw              0       0
# data drive
UUID=19fa40a3-fd17-412f-9063-a29ca0e75f93 /media/data   ext4    defaults        0       0

Note: the entry added is the last line.

Test fstab

We always want to test the fstab before rebooting (an incorrect fstab can render a disk unbootable).  To test do:

sudo findmnt --verify

Check the last line of the output for errors.  Warnings can help in improving your fstab.


Use lsblk -o NAME,FSTYPE,UUID to find out the UUIDs and filesystems of the partition you want to mount. For example:
$ lsblk -o NAME,FSTYPE,UUID
NAME   FSTYPE UUID
sda
├─sda2
├─sda5 swap   498d24e5-7755-422f-be45-1b78d50b44e8
└─sda1 ext4   d4873b63-0956-42a7-9dcf-bd64e495a9ff

NTFS

UUID=<uuid> <pathtomount> ntfs uid=<userid>,gid=<groupid>,umask=0022,sync,auto,rw 0 0
Examples for the <> variables:
  • <uuid>=3087106951D2FA7E
  • <pathtomount>=/home/data/
  • <userid>=1000
  • <groupid>=1000
Use id -u <username> to get the userid and id -g <username> to get the groupid.
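
Typing fstab lines by hand is error-prone, so here is a small sketch that assembles one from its parts. The helper name fstab_entry is hypothetical. It assumes dump and pass are both 0, as in the data-drive and NTFS entries above, and it only prints the line so you can check it before adding it to /etc/fstab.

```shell
# Build one /etc/fstab line from a UUID, mount point and filesystem type.
# fstab_entry is a hypothetical helper; the options default to "defaults",
# and dump/pass are fixed at "0 0" as in the data-drive entry above.
fstab_entry() {
    uuid=$1
    mount_point=$2
    fs_type=$3
    options=${4:-defaults}
    printf 'UUID=%s %s %s %s 0 0\n' "$uuid" "$mount_point" "$fs_type" "$options"
}
```

For the NTFS case the current user's ids can be filled in directly: fstab_entry 3087106951D2FA7E /home/data/ ntfs "uid=$(id -u),gid=$(id -g),umask=0022,sync,auto,rw". After adding the printed line, run sudo findmnt --verify again before rebooting.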

# Mount all disks
sudo mount -a

Wednesday, November 8, 2017

Parallel compression and decompression command

tar has an option for this (c creates an archive, x extracts one):
-I, --use-compress-program PROG
      filter through PROG (must accept -d)
You can use a multithreaded version of the compressor utility.
The most popular multithreaded compressors are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
The compressor must accept -d. If your replacement utility lacks this parameter, or you need to pass additional parameters, use pipes instead (add parameters if necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
Using lbzip2, with timing:

$ time tar cf tmp.tar.bz2 tmp --use-compress-program=lbzip2
$ time tar xf tmp.tar.bz2 --use-compress-program=lbzip2
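
Not every machine has pigz installed, so a script may want to fall back to plain gzip. Below is a sketch of that idea. Both function names are made up for illustration, and it assumes GNU tar (for -I) as used above.

```shell
# Pick a parallel gzip-compatible compressor if one is installed,
# falling back to plain gzip. Both accept -d, which tar -I requires.
pick_gzip_compressor() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

# Create a .tar.gz with whichever compressor was found.
# Usage: parallel_targz OUTPUT_FILE.tar.gz paths_to_archive...
parallel_targz() {
    out=$1
    shift
    tar -I "$(pick_gzip_compressor)" -cf "$out" "$@"
}
```

The archive is an ordinary .tar.gz either way, so it can be extracted with tar xf on a machine without pigz.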

Python locale error: unsupported locale setting

When running pip install, an "unsupported locale setting" error occurs. To fix it:
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
sudo dpkg-reconfigure locales
Then select en_US.UTF-8 to install.

This occurred when installing virtualenv.
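
Before reconfiguring, it can help to check whether the locale is already generated. Below is a sketch that assumes the list comes from "locale -a" on stdin. The helper name locale_in_list is hypothetical.

```shell
# Succeed if the requested locale appears in a `locale -a` style list on stdin.
# `locale -a` typically prints "en_US.utf8" while the variable is written
# "en_US.UTF-8", so both sides are lowercased and stripped of hyphens first.
# locale_in_list is a hypothetical helper name.
locale_in_list() {
    wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF "$wanted"
}
```

Usage: locale -a | locale_in_list en_US.UTF-8 && echo "already installed". If it is missing, run the dpkg-reconfigure step above.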

Thursday, October 19, 2017

Common commands for screen

Linux Screen allows you to:
  • Use multiple shell windows from a single SSH session.
  • Keep a shell active even through network disruptions.
  • Disconnect and re-connect to a shell session from multiple locations.
  • Run a long-running process without maintaining an active shell session.
1. Start screen
$ screen

2. Create a new window
Ctrl+a, c

3. Switch to the next window
Ctrl+a, n

4. List sessions
$ screen -ls

5. Detach from the session
Ctrl+a, d
or kill the current window
Ctrl+a, k

6. Reattach to a session
$ screen -r <screenid>
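
For the long-running process case, screen can also start a command directly in a detached, named session (-dmS), so there is nothing to detach from by hand. Here is a sketch. The wrapper name screen_start and the SCREEN_BIN override are assumptions of mine, the latter only there to preview the command line.

```shell
# Start a command in a detached, named screen session so it survives logout.
# screen_start is a hypothetical wrapper; SCREEN_BIN is an assumed override
# that lets the command line be previewed (for example SCREEN_BIN=echo).
screen_start() {
    name=$1
    shift
    "${SCREEN_BIN:-screen}" -dmS "$name" "$@"
}
```

Usage: screen_start job1 sleep 600, then screen -ls to see it and screen -r job1 to attach by name instead of by id.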

ssh forward port to solve denied server port


Sometimes a port on the server is blocked from outside. To access a web page served on the server, you can forward the remote port to a local one.

$ ssh -L16006:localhost:6006 user1@remote_server

With the service listening on port 6006 on the remote server, you can reach it from the client via local port 16006.
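
The order of the three fields in -L is easy to mix up, so here is a tiny sketch that builds the argument from named parts. The helper name forward_spec is made up for illustration.

```shell
# Build the argument for ssh -L: LOCAL_PORT:HOST:REMOTE_PORT.
# HOST is resolved on the remote side, so "localhost" means the server itself.
# Usage: forward_spec LOCAL_PORT REMOTE_PORT [HOST]
forward_spec() {
    printf '%s:%s:%s\n' "$1" "${3:-localhost}" "$2"
}
```

Usage: ssh -N -L "$(forward_spec 16006 6006)" user1@remote_server. The -N flag keeps the connection open for forwarding only, without starting a remote shell.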

Thursday, October 12, 2017

Nvidia driver version mismatch (which causes tensorflow-gpu not to work)

Problem:
When using tensorflow-gpu, I got the following error (solved on Ubuntu 16.04):

tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

1. This may be an nvidia driver version problem. Check the installed driver.

nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

Show installed nvidia driver
$ dpkg --get-selections | grep nvidia
nvidia-375 install
nvidia-384 install
nvidia-opencl-icd-375 deinstall
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install

dpkg -l | grep -i nvidia

ii  bbswitch-dkms                              0.8-3ubuntu1                                  amd64        Interface for toggling the power on NVIDIA Optimus video cards
ii  libcuda1-375                               375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA CUDA runtime library
ii  nvidia-375                                 375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA binary driver - version 375.82
ii  nvidia-opencl-icd-375                      375.82-0ubuntu0~gpu16.04.1                    amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                               0.8.2                                         amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                            384.90-0ubuntu0~gpu16.04.1                    amd64        Tool for configuring the NVIDIA graphics driver

2. Uninstall current driver and reinstall nvidia-375

$ sudo nvidia-uninstall

If there is no nvidia-uninstall, remove all nvidia drivers and reinstall:
  1. Run sudo apt-get purge 'nvidia-*' (quoted, so the shell does not expand the pattern).
  2. Run sudo add-apt-repository ppa:graphics-drivers/ppa and then sudo apt-get update.
  3. Run sudo apt-get install nvidia-375.
  4. Reboot and your graphics issue should be fixed.

You can check your installation status with the following command
lsmod | grep nvidia
Done! Then the "nvidia-smi" should work.



3. Consider turning off Ubuntu's automatic updates.

Another helpful command, to list the devices:
ubuntu-drivers devices


4. Important. Prevent driver auto-update
Sometimes, only this step is enough!!!

$ sudo apt-mark hold nvidia-375
$ dpkg --get-selections | grep nvidia
nvidia-375 hold
nvidia-384 install
nvidia-opencl-icd-384 install
nvidia-prime install
nvidia-settings install

5. Check tensorflow

import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X

Tuesday, October 10, 2017

Batch file process in linux bash


Batch process *.ext files

Template

$ find /path/ -name '*.ext' -exec sh -c 'command "$1"' _ {} \;

Example

Print file names without extension
$ find /path/ -name '*.ext' -exec sh -c 'echo "${1%.*}"' _ {} \;
Print the base name (basename); use dirname instead to print the directory
$ find /path/ -name '*.ext' -exec sh -c 'basename "$1"' _ {} \;

Batch convert *.ext to *.ext_bak

Template

$ find /path/ -name '*.ext' -exec sh -c 'convert "$1" "${1%.*}.ext_bak"' _ {} \;

Example

Convert egg files to bam files
$ find /path/ -name '*.egg' -exec sh -c 'egg2bam "$1" -o "${1%.*}.bam"' _ {} \;
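
The templates above start one sh per file. With many files it can be faster to hand find's results to a single sh in batches ({} + instead of {} \;) and loop inside it. Here is a sketch of that variant. The helper name batch_copy_ext is made up, and cp stands in for a real converter such as egg2bam.

```shell
# Copy every *.EXT file under DIR to a sibling with extension NEW_EXT.
# batch_copy_ext is a hypothetical helper; cp stands in for a real converter.
# The pattern is quoted so the shell does not expand it before find sees it.
# Usage: batch_copy_ext DIR EXT NEW_EXT
batch_copy_ext() {
    dir=$1
    ext=$2
    new_ext=$3
    find "$dir" -type f -name "*.$ext" -exec sh -c '
        new_ext=$1
        shift
        for f in "$@"; do
            cp -- "$f" "${f%.*}.$new_ext"
        done
    ' _ "$new_ext" {} +
}
```

Usage: batch_copy_ext /path/ ext ext_bak. File names with spaces are handled because each name arrives as its own argument.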

Monday, September 18, 2017

Install tensorflow by building source

Problem: with TensorFlow installed directly via pip, compiling TensorFlow C++ code with make fails with:
third_party/eigen3/unsupported/eigen/cxx11/tensor: no such file or directory

Fixed by building TensorFlow from source.
Environment used:
Ubuntu 16.04
gcc 5.4.0
Cuda 8.0
Cudnn 5
python 2.7
tensorflow 1.2.1

Official doc:
https://www.tensorflow.org/install/install_sources#PrepareLinux

Clone the TensorFlow repository

Start the process of building TensorFlow by cloning a TensorFlow repository.
To clone the latest TensorFlow repository, issue the following command:
$ git clone https://github.com/tensorflow/tensorflow
The preceding git clone command creates a subdirectory named tensorflow. After cloning, you may optionally build a specific branch (such as a release branch) by invoking the following commands:
$ cd tensorflow
$ git checkout Branch # where Branch is the desired branch
For example, to work with the r1.2 release instead of the master release, issue the following command:
$ git checkout r1.2

Prepare environment for Linux

Install Bazel

If bazel is not installed on your system, install it now by following the official Bazel installation directions.
Note:
The latest bazel failed to build this TensorFlow version, so I had to roll back to 0.5.2.
Download from
https://github.com/bazelbuild/bazel/releases/download/0.5.2/bazel_0.5.2-linux-x86_64.deb

Install TensorFlow Python dependencies

To install these packages for Python 2.7, issue the following command:
$ sudo apt-get install python-numpy python-dev python-pip python-wheel
To install these packages for Python 3.n, issue the following command:
$ sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel

Optional: install TensorFlow for GPU prerequisites

Finally, you must also install libcupti-dev by invoking the following command:

$ sudo apt-get install libcupti-dev

Next

After preparing the environment, you must now configure the installation.
$ cd tensorflow  # cd to the top-level directory created
$ ./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 5
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
Do you wish to build TensorFlow with MPI support? [y/N] 
MPI support will not be enabled for TensorFlow
Configuration finished
To build a pip package for TensorFlow with GPU support, invoke the following command:
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" 
NOTE on gcc 5 or later: the binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. To make your build compatible with the older ABI, you need to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to your bazel build command.


The bazel build command builds a script named build_pip_package. Running this script as follows will build a .whl file within the /tmp/tensorflow_pkg directory:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install via pip in a virtual environment
$ virtualenv ~/tensorflow
$ source ~/tensorflow/bin/activate
(tensorflow) $ pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl


Update 2018.01.16

Problems for

cuda 9.1
cudnn 7.0
tensorflow 1.5.0


When building using bazel ...


Describe the problem 1

While trying to compile the latest TensorFlow (cloned from 798fa36), the following error is raised:
ERROR: /home/ubuntu/tensorflow/tensorflow/contrib/seq2seq/BUILD:64:1: error while parsing .d file: /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/execroot/org_tensorflow/bazel-out/k8-py3-opt/bin/tensorflow/contrib/seq2seq/_objs/python/ops/_beam_search_ops_gpu/tensorflow/contrib/seq2seq/kernels/beam_search_ops_gpu.cu.pic.d (No such file or directory)
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14:0,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from ./tensorflow/contrib/seq2seq/kernels/beam_search_ops.h:19,
                 from tensorflow/contrib/seq2seq/kernels/beam_search_ops_gpu.cu.cc:20:
external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/Core:59:34: fatal error: math_functions.hpp: No such file or directory
It turns out that in CUDA 9.1, math_functions.hpp is located at cuda/include/crt/math_functions.hpp rather than cuda/include/math_functions.hpp (where CUDA 9.0 has it), which leads to this error.
Creating a symlink at the old location fixes the problem and lets the build complete:
$ sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp

Reference


Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi. Furthermore if you are using TensorFlow package created from source remember to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" as bazel command to compile the Python package.

Problem 2

no such package '@nasm//': java.io.IOException: Error downloading [https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2, http://pkgs.fedoraproject.org/repo/pkgs/nasm/nasm-2.12.02.tar.bz2/d15843c3fb7db39af80571ee27ec6fad/nasm-2.12.02.tar.bz2]

Solution
https://github.com/tensorflow/tensorflow/issues/16862
The problem is that one of the two mirrors for nasm is dead, and the second one is for some reason problematic. A workaround is to add one more mirror:
      urls = [
          "https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",  
          "http://www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",
          "http://pkgs.fedoraproject.org/repo/pkgs/nasm/nasm-2.12.02.tar.bz2/d15843c3fb7db39af80571ee27ec6fad/nasm-2.12.02.tar.bz2",
      ]
in
"https://mirror.bazel.build/www.nasm.us/pub/nasm/releasebuilds/2.12.02/nasm-2.12.02.tar.bz2",

Problem 3

'Numpy dangling symbolic links' when building from source
Solution

sudo pip install --no-cache-dir --upgrade --force-reinstall numpy

Update 2018.06.05

Problem for 
cuda 9.0
cudnn 7.0
tensorflow 1.7.0
bazel 0.14

when building using bazel

bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ustring_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasDsymm_v2@libcublas.so.9.0
........

Solution
1. check $LD_LIBRARY_PATH in ~/.bashrc 
2. check CUDA_PATH
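The solution is to use ldconfig instead of LD_LIBRARY_PATH. Note that "sudo echo ... > file" does not work, because the redirection is performed by the unprivileged shell; pipe through sudo tee instead:
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf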
sudo ldconfig

Updated on 2018.12.21


Reinstall bazel

https://docs.bazel.build/versions/master/install-ubuntu.html
rm ~/.cache/bazel -fr
rm -fr ~/.bazel ~/.bazelrc

Step 2: Download Bazel

Next, download the Bazel binary installer named bazel-<version>-installer-linux-x86_64.sh from the Bazel releases page on GitHub.

Step 3: Run the installer

Run the Bazel installer as follows:
chmod +x bazel-<version>-installer-linux-x86_64.sh
./bazel-<version>-installer-linux-x86_64.sh --user
The --user flag installs Bazel to the $HOME/bin directory on your system and sets the .bazelrc path to $HOME/.bazelrc. Use the --help command to see additional installation options.
Different TensorFlow versions need different versions of bazel

Linux

Version            Python version  Compiler  Build tools
tensorflow-1.12.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0
tensorflow-1.11.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0
tensorflow-1.10.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0
tensorflow-1.9.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.11.0
tensorflow-1.8.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.10.0
tensorflow-1.7.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.10.0
tensorflow-1.6.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.9.0
tensorflow-1.5.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.8.0
tensorflow-1.4.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.5.4
tensorflow-1.3.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.5
tensorflow-1.2.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.5
tensorflow-1.1.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.2
tensorflow-1.0.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.2

Version                Python version  Compiler  Build tools   cuDNN  CUDA
tensorflow_gpu-1.12.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0  7      9
tensorflow_gpu-1.11.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0  7      9
tensorflow_gpu-1.10.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0  7      9
tensorflow_gpu-1.9.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.11.0  7      9
tensorflow_gpu-1.8.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.10.0  7      9
tensorflow_gpu-1.7.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.9.0   7      9
tensorflow_gpu-1.6.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.9.0   7      9
tensorflow_gpu-1.5.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.8.0   7      9
tensorflow_gpu-1.4.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.5.4   6      8
tensorflow_gpu-1.3.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.5   6      8
tensorflow_gpu-1.2.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.5   5.1    8
tensorflow_gpu-1.1.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.2   5.1    8
tensorflow_gpu-1.0.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.4.2   5.1    8
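
Since picking the wrong bazel is the most common way for the build to fail, the table can be turned into a quick lookup. Below is a sketch that follows the CPU rows (the GPU rows differ only for 1.7.0, which lists Bazel 0.9.0). The helper name bazel_for_tf is hypothetical.

```shell
# Map a TensorFlow release to the Bazel version listed in the CPU table above.
# bazel_for_tf is a hypothetical helper; versions not in the table return 1.
bazel_for_tf() {
    case $1 in
        1.12.*|1.11.*|1.10.*) echo 0.15.0 ;;
        1.9.*)                echo 0.11.0 ;;
        1.8.*|1.7.*)          echo 0.10.0 ;;
        1.6.*)                echo 0.9.0 ;;
        1.5.*)                echo 0.8.0 ;;
        1.4.*)                echo 0.5.4 ;;
        1.3.*|1.2.*)          echo 0.4.5 ;;
        1.1.*|1.0.*)          echo 0.4.2 ;;
        *)                    return 1 ;;
    esac
}
```

Usage: bazel_for_tf 1.8.0 prints 0.10.0, which can then be compared against the output of bazel version before starting a long build.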

Update 2020.01.19
Tested on (version matters!)
tensorflow 1.8
Cuda 10.0
cudnn 7.6
python 3.6
bazel 0.15.0