Deep Learning Framework Pitfalls Roundup (Continuously Updated)

程序员文章站 2022-07-02 19:20:38

Pitfall 1: TensorFlow convolution error

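Details: RTX 2070 + driver 410 + CUDA 10.0 + cuDNN 7.5.0.
Error message: UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
The error occurs on both TensorFlow 1.13+ and TensorFlow 2.0.
Suspected cause: the TensorFlow, cuDNN, and CUDA versions do not match. Downgrading is one workaround, for example to CUDA 9 with TensorFlow 1.9, but that rules out the newer releases. With TensorFlow 2.0 the error persists even after downgrading cuDNN to 7.4, and downgrading any further no longer meets the 2.0 version requirements.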
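To make the version comparison concrete, here is a minimal sketch that checks an environment against a small lookup table. The table `TESTED_CONFIGS` and the helper `check_environment` are hypothetical names introduced for illustration. The version pairs are written from memory of TensorFlow's published "tested build configurations" table and should be verified against the official documentation. A reported mismatch is only a hint, not proof of the cause.

```python
# TensorFlow release -> (CUDA, cuDNN) it was tested against.
# Assumption: values recalled from the official "tested build configurations"
# table; verify them before relying on this.
TESTED_CONFIGS = {
    "1.9": ("9", "7"),
    "1.12": ("9", "7"),
    "1.13": ("10.0", "7.4"),
    "2.0": ("10.0", "7.4"),
}


def _matches(installed, tested):
    """True if the installed version starts with the tested version components."""
    want = tested.split(".")
    return installed.split(".")[:len(want)] == want


def check_environment(tf_version, cuda, cudnn):
    """Return a list of human-readable mismatches (empty if none were found)."""
    key = ".".join(tf_version.split(".")[:2])  # "2.0.0" -> "2.0"
    if key not in TESTED_CONFIGS:
        return ["no tested configuration recorded for TensorFlow " + key]
    want_cuda, want_cudnn = TESTED_CONFIGS[key]
    problems = []
    if not _matches(cuda, want_cuda):
        problems.append("CUDA %s installed, tested with %s" % (cuda, want_cuda))
    if not _matches(cudnn, want_cudnn):
        problems.append("cuDNN %s installed, tested with %s" % (cudnn, want_cudnn))
    return problems


# The environment described above: TensorFlow 2.0 + CUDA 10.0 + cuDNN 7.5.0
for line in check_environment("2.0.0", "10.0", "7.5.0"):
    print(line)
```

For the environment described here, the sketch flags only cuDNN. Since the error reportedly persists after moving to cuDNN 7.4, a version mismatch alone may not explain it.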

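Solutions (each variant below turns on on-demand GPU memory allocation):
1) For TensorFlow 1.13+ with a plain session (no estimator)
Add the following before creating the session: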

import tensorflow as tf

session_conf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
session_conf.gpu_options.allow_growth = True  # allocate GPU memory on demand
session_conf.gpu_options.per_process_gpu_memory_fraction = 0.9  # cap this process at 90% of GPU memory
sess = tf.Session(config=session_conf)

2) For TensorFlow 1.13+ with an estimator
Add the following at the top of the file:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Hand the session config to the estimator through RunConfig
run_config = tf.estimator.RunConfig().replace(session_config=config)

Then pass run_config when building the model for training. Taking a TextCNN model on IMDB as an example:

imdb_classifier = tf.estimator.Estimator(model_fn=cnn_model_fn, model_dir="imdb_model_textcnn", config=run_config)

3) For TensorFlow 1.13+ with Keras

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

4) With eager execution

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)

5) TensorFlow 2.0

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)
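The snippet above configures only the first GPU. On a multi-GPU machine the same setting can be applied to every device. Here is a minimal sketch, assuming TensorFlow 2.x. The helper name `enable_memory_growth` and its `config_api` parameter are hypothetical, added so the function can be tried out with a stand-in object when no GPU is present.

```python
def enable_memory_growth(config_api=None):
    """Turn on memory growth for every visible GPU and return how many were found.

    config_api is a hypothetical injection point; by default it is
    tf.config.experimental from an installed TensorFlow 2.x.
    """
    if config_api is None:
        import tensorflow as tf  # imported lazily; requires TensorFlow 2.x
        config_api = tf.config.experimental
    gpus = config_api.list_physical_devices('GPU')
    for gpu in gpus:
        # Must be called before TensorFlow initializes the GPUs;
        # afterwards set_memory_growth raises RuntimeError.
        config_api.set_memory_growth(gpu, True)
    return len(gpus)
```

Call `enable_memory_growth()` once at program start, before building any model or running any op. Unlike the assert-based version, it returns 0 instead of failing when no GPU is visible.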
