Error:
"cpp_extension.py", line 1561, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
Solution:
I looked into this issue after hitting the same problem in one of my Docker containers.
If you build your code through a setup.py, set TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in the environment of the install command. The variable assignment goes before python, not after it:
TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" python setup.py install
(or add an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to your Dockerfile, for instance)
Additional information can be found here: https://pytorch.org/docs/stable/cpp_extension.html
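As an alternative to passing the variable on the command line, it can be given a fallback value from inside setup.py itself. This is only a minimal sketch: the helper name ensure_arch_list and the fallback "6.1+PTX" are assumptions made for illustration, so replace the value with your own GPU's compute capability. It relies on the build reading TORCH_CUDA_ARCH_LIST from the process environment when the extension is compiled.

```python
import os


def ensure_arch_list(default="6.1+PTX"):
    """Give TORCH_CUDA_ARCH_LIST a fallback value if it is unset or empty.

    Hypothetical helper, not part of PyTorch. The default is an assumed
    example value; use the compute capability of your own GPU.
    """
    if not os.environ.get("TORCH_CUDA_ARCH_LIST"):
        os.environ["TORCH_CUDA_ARCH_LIST"] = default
    return os.environ["TORCH_CUDA_ARCH_LIST"]


# Call this near the top of setup.py, before setup(...) builds the extension.
arch_list = ensure_arch_list()
print("Building for:", arch_list)
```

A value already exported in the shell or the Dockerfile takes precedence, so the fallback only applies when nothing else has set the variable.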
To pick the architecture list automatically from the installed CUDA version, a bash script such as the following can be used:
# Requires bash ([[ ... ]] is not POSIX sh).
CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi
If the GPU driver is loaded correctly, you can query the compute capability from the Python console:
>>> import torch
>>> torch.cuda.get_device_capability(0)
(6, 1)
This means TORCH_CUDA_ARCH_LIST="6.1". In most cases, however, CUDA is unavailable because the GPU has been specified incorrectly.
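The mapping from a capability tuple to a TORCH_CUDA_ARCH_LIST value can be sketched as a small helper. The function names below are hypothetical and not part of PyTorch; only torch.cuda.device_count() and torch.cuda.get_device_capability() are real calls. The helper raises a clear error when no device is visible, rather than failing on an empty list.

```python
def arch_list_from_capabilities(capabilities, ptx=True):
    """Turn capability tuples such as [(6, 1), (7, 5)] into '6.1;7.5+PTX'.

    Hypothetical helper for illustration; it only formats the string.
    """
    # Sort numerically as tuples so that (10, 0) comes after (8, 6).
    unique = sorted(set(capabilities))
    if not unique:
        raise RuntimeError(
            "No CUDA device visible; set TORCH_CUDA_ARCH_LIST by hand."
        )
    value = ";".join(f"{major}.{minor}" for major, minor in unique)
    return value + "+PTX" if ptx else value


def detect_arch_list():
    """Query every visible GPU through PyTorch (requires torch installed)."""
    import torch  # imported lazily so the formatter works without torch

    count = torch.cuda.device_count()
    caps = [torch.cuda.get_device_capability(i) for i in range(count)]
    return arch_list_from_capabilities(caps)


print(arch_list_from_capabilities([(6, 1)]))
```

On a machine where the GPU is visible, detect_arch_list() gives a value that can be exported before running the build.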