
Ubuntu 18.04: setting up the NVIDIA driver and tensorflow-gpu 1.15.0 (summary)

程序员文章站 2022-07-06 11:05:23

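Installing the graphics driver

1. Disable Secure Boot

This step matters: if Secure Boot is left enabled, the later steps fail with errors.

First enter the BIOS (F12 or F10, depending on your machine).
Set Secure Boot Option to Disabled.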
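I use a Thunderobot (雷神) laptop. After I changed this setting and rebooted, it reverted to Enabled, and other machines may behave the same way. In that case switch to custom mode: enable the Change to Customization item just below it, and Secure Boot then switches to Disabled automatically.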
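
Since the BIOS setting can silently revert, it may be worth confirming the state from inside Ubuntu after rebooting. The following is a minimal sketch, assuming the mokutil tool is installed (it is not part of the original steps) and that `mokutil --sb-state` reports "SecureBoot enabled" or "SecureBoot disabled"; the function names are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def parse_sb_state(output: str) -> Optional[bool]:
    """Interpret `mokutil --sb-state` output; None if it is not recognized."""
    text = output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def secure_boot_enabled() -> Optional[bool]:
    """Run mokutil if it is installed; return None when the state is unknown."""
    if shutil.which("mokutil") is None:
        return None  # assumption: mokutil may simply not be installed
    try:
        result = subprocess.run(
            ["mokutil", "--sb-state"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None
    return parse_sb_state(result.stdout + result.stderr)


if __name__ == "__main__":
    print("Secure Boot enabled:", secure_boot_enabled())
```

A result of False means the driver installation can proceed; None only means the state could not be determined this way.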

2. Disable nouveau

Edit the file blacklist.conf:

sudo vim /etc/modprobe.d/blacklist.conf

Append the following two lines at the end of the file:

blacklist nouveau
options nouveau modeset=0
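
If you repeat this setup on several machines, the manual edit can be scripted so that running it twice does not duplicate the lines. This is only a sketch under assumptions: `ensure_lines` and `NOUVEAU_LINES` are hypothetical names, and writing to /etc/modprobe.d/blacklist.conf would need root privileges.

```python
from pathlib import Path
from typing import Iterable, List

# The two lines from the step above.
NOUVEAU_LINES = ["blacklist nouveau", "options nouveau modeset=0"]


def ensure_lines(path: Path, wanted: Iterable[str]) -> List[str]:
    """Append each wanted line that is not already present; return the lines added."""
    content = path.read_text() if path.exists() else ""
    present = {line.strip() for line in content.splitlines()}
    missing = [line for line in wanted if line not in present]
    if missing:
        with path.open("a") as fh:
            # Start on a fresh line if the file does not end with a newline.
            if content and not content.endswith("\n"):
                fh.write("\n")
            fh.write("\n".join(missing) + "\n")
    return missing
```

On the real system the call would be `ensure_lines(Path("/etc/modprobe.d/blacklist.conf"), NOUVEAU_LINES)`, run as root; a second run returns an empty list and leaves the file unchanged.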

Rebuild the initramfs so the change takes effect:

sudo update-initramfs -u

Reboot.

Verify that nouveau is disabled:

lsmod | grep nouveau

If the command prints nothing, nouveau is disabled and you can go on to install the NVIDIA driver.
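
The same check can be done programmatically by reading /proc/modules, which is where lsmod gets its information. A minimal sketch (the function name `module_loaded` is made up for illustration); it compares the module name exactly, so a line that merely mentions nouveau as a dependent of another module does not count as a match:

```python
from pathlib import Path


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` is the first field of any line in /proc/modules."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        loaded = module_loaded("nouveau", proc_modules.read_text())
        print("nouveau loaded:", loaded)
```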

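3. Look up your graphics card model on the NVIDIA website and download the matching driver: http://www.nvidia.cn

Copy the downloaded run file to your home directory.

4. Switch to a text console in Ubuntu

For me this was Ctrl+Alt+F3; the key combination differs between machines.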

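First switch to the root user (this requires that a root password has been set; on a default Ubuntu install, sudo -i does the same job):

su root

Stop the graphical session, otherwise the installer fails. I was running lightdm; a stock Ubuntu 18.04 uses gdm3, in which case stop gdm3 instead.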

service lightdm stop 

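Then remove any previously installed driver (the pattern is quoted so that the shell passes it to apt-get unexpanded):

apt-get remove 'nvidia-*'

Make the driver run file executable:

chmod a+x [NVIDIA run file]

Install:

./[NVIDIA run file] -no-x-check -no-nouveau-check -no-opengl-files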

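-no-x-check: do not abort the installation if a running X server is detected
-no-nouveau-check: do not abort the installation if the nouveau driver is detected
-no-opengl-files: install only the driver itself, without the OpenGL files;
this avoids the login loop problem.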

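Choices during installation:

  1. Continue installation
  2. Install without signing

For everything else, just choose OK or Yes.

Load the NVIDIA kernel module:

modprobe nvidia

Check that the driver was installed successfully: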

nvidia-smi
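
If you want the same check in a script, nvidia-smi can emit machine-readable output through its --query-gpu and --format=csv options. The sketch below is only an illustration: the helper names are made up, and it returns an empty list whenever nvidia-smi is missing or fails.

```python
import shutil
import subprocess
from typing import Dict, List

# Ask nvidia-smi for one CSV line per GPU, without a header row.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn 'name, driver_version, memory.total' lines into dictionaries."""
    gpus = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == 3:
            gpus.append({
                "name": parts[0],
                "driver_version": parts[1],
                "memory_total": parts[2],
            })
    return gpus


def query_gpus() -> List[Dict[str, str]]:
    """Return the GPUs reported by nvidia-smi, or [] if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    except OSError:
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result after installation suggests the kernel module is not loaded or the driver did not install correctly.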


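Installing tensorflow-gpu 1.15.0 with conda

I chose this version because it is a transitional release: it is still a 1.x release, but it also ships the 2.0 API (through its compat.v2 module), so code can be moved toward 2.0.0 later.
Installing through conda also pulls in matching CUDA and cuDNN packages automatically.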

conda install tensorflow-gpu=1.15.0

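Fixing errors:
First, downloads may fail on a slow connection. In that case, configure conda to use the Tsinghua (TUNA) mirror
by running the following commands: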

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

Second, I ran into the following error:

Verifying transaction: failed

RemoveError: 'setuptools' is a dependency of conda and cannot be removed from
conda's operating environment.

At first I tried:

conda install -c anaconda setuptools

but the error persisted.

It looked as if conda itself needed updating:

conda update --force conda

That solved the problem.

Verifying the GPU

import tensorflow as tf
a = tf.test.is_built_with_cuda()  # whether this TensorFlow build was compiled with CUDA support
b = tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)                                  # whether TensorFlow can actually use a GPU
print(a)
print(b)

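The output is:
True
True
which means this TensorFlow build has CUDA support and a GPU is available.

To list the devices TensorFlow has set up, create a session with log_device_placement enabled: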

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The output looks like this:

2020-04-13 22:44:58.936998: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-04-13 22:44:58.968713: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-04-13 22:44:58.969389: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55aab2112f20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-13 22:44:58.969426: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-13 22:44:58.972287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-13 22:44:59.320078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-13 22:44:59.320520: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55aab1df0a10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-13 22:44:59.320539: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2020-04-13 22:44:59.320701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-13 22:44:59.320951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2020-04-13 22:44:59.357052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-13 22:44:59.361052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-04-13 22:44:59.400897: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-04-13 22:44:59.445225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-04-13 22:44:59.446472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-04-13 22:44:59.497395: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-04-13 22:44:59.528163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-13 22:44:59.528302: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-13 22:44:59.528658: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-13 22:44:59.528860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-13 22:44:59.528901: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-13 22:44:59.529559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-13 22:44:59.529571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-04-13 22:44:59.529576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-04-13 22:44:59.529651: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-13 22:44:59.529887: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-13 22:44:59.530106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3686 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2020-04-13 22:44:59.530773: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
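
The lines to look for in this log are the "Device mapping" entries: a physical GPU shows up as a device named /device:GPU:0, separate from the XLA_CPU and XLA_GPU devices. If you capture the log to a file, a small parser can check for it. This is a sketch with made-up helper names, based only on the line format shown above:

```python
import re
from typing import List

# Matches mapping lines such as
# "/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: ..."
DEVICE_RE = re.compile(r"^(/job:\S+/device:\w+:\d+) -> ", re.MULTILINE)


def mapped_devices(log_text: str) -> List[str]:
    """Return the distinct device names from the mapping lines, in order."""
    seen: List[str] = []
    for match in DEVICE_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def has_physical_gpu(log_text: str) -> bool:
    """True if a plain GPU device (not XLA_GPU) appears in the mapping."""
    return any(
        re.search(r"/device:GPU:\d+$", name) for name in mapped_devices(log_text)
    )
```

Because the mapping is printed twice in the log, `mapped_devices` deduplicates the names before `has_physical_gpu` inspects them.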