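A while ago, while experimenting with the video action recognition model /StanfordVL/rubiksnet, I found that it requires Python 3.7 or later. So I tried something newer and downloaded Python 3.9.6 to build and install from source: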
wget /ftp/python/3.9.6/Python-3.9.6.tgz
tar xf Python-3.9.6.tgz
cd Python-3.9.6
sudo apt-get install build-essential python3-dev python3-setuptools python3-pip libncursesw5-dev libgdbm-dev libc6-dev zlib1g-dev libsqlite3-dev tk-dev libssl-dev openssl libffi-dev
./configure --with-ssl --prefix=/usr/local/python3
sudo make
sudo make install
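The apt-get line installs development headers because several standard-library modules are C extensions. When their headers are missing, the build skips those modules and only prints a warning near the end of make. The sketch below is a minimal way to check the result. It is meant to be run with the new interpreter, e.g. /usr/local/python3/bin/python3.9. The module-to-package mapping is my assumption of the usual Ubuntu correspondence, not something stated above.

```python
import importlib.util

# C extension modules that are skipped when their -dev headers are missing.
# Mapping (module -> Ubuntu package from the apt-get line) is an assumed guide.
OPTIONAL_EXTENSIONS = {
    "_ssl": "libssl-dev",
    "_sqlite3": "libsqlite3-dev",
    "_ctypes": "libffi-dev",
    "zlib": "zlib1g-dev",
    "_curses": "libncursesw5-dev",
    "_gdbm": "libgdbm-dev",
    "_tkinter": "tk-dev",
}


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find, in input order."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_EXTENSIONS)
    for name in missing:
        print(f"{name} was not built: install {OPTIONAL_EXTENSIONS[name]}, "
              "then re-run ./configure and make")
    if not missing:
        print("all optional extension modules are present")
```

The underscore names (_ssl, _ctypes, ...) are checked on purpose. The pure-Python wrappers such as ssl.py exist even when the compiled part is missing, so checking them would always pass.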
Then I manually repointed the python3 symlink from the original python3.6 to python3.9:
cd /usr/bin
sudo rm python3
sudo ln -s /usr/local/python3/bin/python3.9 python3
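Next I installed CUDA 11.1.1. My server has an RTX 3090, which needs at least this CUDA version to work properly. I could not install the newest release, CUDA 11.4, either, because the latest PyTorch (1.9) only supports up to CUDA 11.1. With CUDA 11.4 installed, any code that uses CUDA would fail with RuntimeError: CUDA error: no kernel image is available for execution on the device. The install commands: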
wget https://developer./compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer./compute/cuda/11.1.1/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
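The failure log below lists cuda-11-4 and driver 470 packages, even though the repository just added was the 11.1.1 one. NVIDIA's Linux installation guide describes the cuda meta package as tracking the newest toolkit in the configured repositories, whereas cuda-11-1 stays on 11.1. So installing cuda-11-1 instead of cuda is one way to pin the version. Either way, it is worth confirming afterwards which toolkit nvcc actually reports. This is a minimal sketch, assuming nvcc is on PATH or at /usr/local/cuda/bin/nvcc; that is the usual location for NVIDIA's packages, but it is an assumption here.

```python
import os
import re
import shutil
import subprocess

# Assumed default location of nvcc for NVIDIA's .deb installs; adjust if different.
DEFAULT_NVCC = "/usr/local/cuda/bin/nvcc"


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 11.1' -> (11, 1)."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc (from PATH, else DEFAULT_NVCC) and return its (major, minor), or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None and os.path.exists(DEFAULT_NVCC):
        nvcc = DEFAULT_NVCC
    if nvcc is None:
        return None
    # stdout=PIPE + universal_newlines also works on the system Python 3.6
    result = subprocess.run([nvcc, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    expected = (11, 1)  # the version targeted in the text above
    found = installed_cuda_release()
    if found is None:
        print("nvcc not found; is /usr/local/cuda/bin on PATH?")
    elif found != expected:
        print(f"nvcc reports CUDA {found[0]}.{found[1]}, expected {expected[0]}.{expected[1]}")
    else:
        print("CUDA 11.1 toolkit is installed")
```

The subprocess call avoids capture_output, which only exists from Python 3.7 on. That way the script also runs under python3.6 once the symlink is restored.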
The installation failed with:
Traceback (most recent call last):
File "/usr/bin/quirks-handler", line 26, in <module>
import Quirks.quirkapplier
ModuleNotFoundError: No module named 'Quirks'
dpkg: error processing package nvidia-dkms-470 (--configure):
installed nvidia-dkms-470 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of cuda-drivers-470:
cuda-drivers-470 depends on nvidia-dkms-470 (>= 470.57.02); however:
Package nvidia-dkms-470 is not configured yet.
...
update-initramfs: Generating /boot/initrd.img-5.4.0-72-generic
W: Possible missing firmware /lib/firmware/rtl_nic/rtl8125a-3.fw for module r8169
W: Possible missing firmware /lib/firmware/rtl_nic/rtl8168fp-3.fw for module r8169
Errors were encountered while processing:
nvidia-dkms-470
cuda-drivers-470
nvidia-driver-470
cuda-drivers
cuda-runtime-11-4
cuda-11-4
cuda-demo-suite-11-4
cuda
E: Sub-process /usr/bin/dpkg returned an error code (1)
I tried reinstalling ubuntu-drivers-common on its own (Quirks is part of it):
sudo apt-get install --reinstall ubuntu-drivers-common
This updated /usr/bin/quirks-handler, but it still failed with the same "No module named 'Quirks'" error and printed this hint:
you can either revert the python3 link to the previous version, or change the python3 executable specified in /usr/bin/quirks-handler to the previous version executable(ex: python3.5).
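It looked as if Python 3.9.6 was too new for the CUDA installation. The driver's post-installation step runs /usr/bin/quirks-handler under /usr/bin/python3, and the Quirks module it imports is only available to the system Python 3.6. So I pointed the python3 symlink back at python3.6: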
cd /usr/bin
sudo rm python3
sudo ln -s python3.6 python3
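The "python3 executable specified in /usr/bin/quirks-handler" from the hint is the script's #! (shebang) line. Whatever /usr/bin/python3 resolves to is what runs the script. This minimal sketch reads a script's shebang and follows symlinks to the real binary, so you can confirm that system tools run under python3.6 again. The helper names are hypothetical, chosen for illustration.

```python
import os
import shutil
import sys


def shebang_interpreter(script_path):
    """Return the interpreter named on a script's '#!' line, or None if there is none."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # "#!/usr/bin/env python3" means "the first python3 found on PATH"
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return shutil.which(parts[1]) or parts[1]
    return parts[0]


def resolved_interpreter(script_path):
    """Follow symlinks, so /usr/bin/python3 is reported as the real binary (e.g. python3.6)."""
    interpreter = shebang_interpreter(script_path)
    return os.path.realpath(interpreter) if interpreter else None


if __name__ == "__main__":
    script = sys.argv[1] if len(sys.argv) > 1 else "/usr/bin/quirks-handler"
    if os.path.exists(script):
        print(script, "runs under", resolved_interpreter(script))
    else:
        print(script, "not found")
```

After restoring the link, running this against /usr/bin/quirks-handler should report the python3.6 binary rather than /usr/local/python3/bin/python3.9. The new interpreter stays reachable through its full path.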
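Reinstalling CUDA then succeeded. After installing CUDA, remember to reboot so the new GPU driver takes effect. Otherwise you may still get errors like the following: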
GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at /get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
>>> print(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/python3/lib/python3.9/site-packages/torch/tensor.py", line 193, in __repr__
return torch._tensor_str._str(self)
File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 383, in _str
return _str_intern(self)
File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 358, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 242, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 90, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device
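This error means that the installed PyTorch build, and the CUDA version it was built for, do not match the CUDA version now installed. The warning above shows that this build only contains kernels for sm_37, sm_50, sm_60 and sm_70, none of which covers the RTX 3090's sm_86. Alternatively, the versions do match, but the machine has not been rebooted since CUDA was installed.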
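The warning above compares the GPU's compute capability (8.6, i.e. sm_86, for the RTX 3090) with the architectures the PyTorch binary was compiled for. This is a minimal sketch of that comparison. It assumes a simplified form of CUDA's compatibility rules and is not PyTorch's own code. torch.version.cuda, torch.cuda.get_device_capability and torch.cuda.get_arch_list are existing PyTorch functions, though get_arch_list is only present in recent 1.x releases.

```python
import re

# "sm_86" -> ("sm", 8, 6); the last digit is the minor version, so "sm_100" -> 10.0
ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)")


def is_device_supported(capability, arch_list):
    """Simplified CUDA compatibility rule (assumption; ignores arch-specific variants):
    - a cubin for sm_XY runs on a device of capability X.Z when Z >= Y (same major X);
    - PTX for compute_XY can be JIT-compiled for any capability >= X.Y.
    """
    major, minor = capability
    for arch in arch_list:
        match = ARCH_RE.match(arch)
        if match is None:
            continue
        kind, a_major, a_minor = match.group(1), int(match.group(2)), int(match.group(3))
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("PyTorch", torch.__version__, "built with CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("torch.cuda.is_available() is False")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print(torch.cuda.get_device_name(0), capability, arch_list)
    if is_device_supported(capability, arch_list):
        print("this PyTorch build has kernels for the GPU")
    else:
        print("no kernel image for this GPU: install a PyTorch build whose arch list covers it")


if __name__ == "__main__":
    main()
```

With the arch list from the warning (sm_37 sm_50 sm_60 sm_70) and capability (8, 6), is_device_supported returns False. This matches the "no kernel image" error above.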