Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
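I'm losing my mind. This damn thing is going to be the death of me.
Don't assume this stuff is easy. When it looks easy, that's because someone else carried the load for you; getting it working on the first try by yourself is pure luck.
I've been grinding on this for ages and still haven't solved it. Anaconda is garbage too.
The error output is as follows: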
Epoch 1/100
2019-11-12 22:56:58.736778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-12 22:56:59.069012: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-12 22:56:59.069677: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-12 22:56:59.069729: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
2019-11-12 22:56:59.069743: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-12 22:56:59.069761: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
Traceback (most recent call last):
File "main_ResNet.py", line 217, in <module>
shuffle=True)
File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
validation_freq=validation_freq)
File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
outs = fit_function(ins_batch)
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[Mean/_417]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.
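Right now nobody is around to help me with this, and the ops people don't know how either. I searched Baidu for the problem. The suggestion below comes from a CSDN blog post, so who knows whether it will work? Let's try it: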
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
rm -rf .nv/
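The same error showed up again, so evidently this did not help. I also closed and reopened the terminal. Questioning my life choices yet???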
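One thing worth ruling out is whether the exports actually reached the process that runs the training script. Below is a minimal sketch that inspects the environment from inside Python. The helper name `dir_in_search_path` is made up for illustration, and the paths are simply the ones from the commands above.

```python
import os


def dir_in_search_path(search_path, directory):
    """Return True if `directory` is one entry of a colon-separated path string."""
    target = os.path.normpath(directory)
    entries = [entry for entry in search_path.split(":") if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    # Read the variables as seen by this Python process, not by the shell.
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", current or "(empty)")
    print("contains /usr/local/cuda/lib64:",
          dir_in_search_path(current, "/usr/local/cuda/lib64"))
    print("CUDA_HOME =", os.environ.get("CUDA_HOME", "(unset)"))
```

If the directory is missing here, the export probably happened in a different shell session than the one that launched the script.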
Starting over from scratch, following the official guide: https://www.tensorflow.org/install/pip. These versions have to meet the requirements:
python3 --version
pip3 --version
virtualenv --version
Requires Python > 3.4 and pip >= 19.0. Here is how to install or update them:
sudo apt update
sudo apt install python3-dev python3-pip
sudo pip3 install -U virtualenv # system-wide install
If you don't have sudo privileges (I don't, and there's nothing I can do about it), pip can still be upgraded with:
python3 -m pip install --upgrade pip
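The version requirements quoted above can also be checked from Python instead of by eye. This is only a sketch: `parse_version`, `python_ok` and `pip_ok` are names invented here, and "Python > 3.4" is interpreted as "3.5 or newer", which is my reading of the requirement rather than something the guide spells out.

```python
import re
import sys


def parse_version(text):
    """Turn '19.3.1' into (19, 3, 1); stop at the first non-numeric component."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def python_ok(version_info=sys.version_info):
    """Python > 3.4, compared on (major, minor) only (assumed interpretation)."""
    return tuple(version_info[:2]) > (3, 4)


def pip_ok(pip_version):
    """pip >= 19.0; a bare '19' is padded to (19, 0) before comparing."""
    version = parse_version(pip_version)
    version = version + (0,) * (2 - len(version))
    return version >= (19, 0)


if __name__ == "__main__":
    print("Python OK:", python_ok())
    try:
        import pip
        print("pip", pip.__version__, "OK:", pip_ok(pip.__version__))
    except ImportError:
        print("pip is not importable from this interpreter")
```

Running it with the same interpreter that runs the training script matters, since Anaconda's python3 and the system python3 may have different pip versions.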
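Later I noticed that cudnn does not appear in pip list. After some searching, I read that CUDA 10.0 needs at least cuDNN 7.5.1. I installed it, and it made no difference.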
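As far as I know, cuDNN is usually installed as a system library (or a conda package) rather than through pip, so its absence from pip list may not mean much. A rough way to see which cuDNN 7 is actually installed is to read the version macros from its header. The sketch below assumes the header lives at /usr/local/cuda/include/cudnn.h, which is a guess based on the CUDA path used earlier; the function name is also made up.

```python
import os
import re


def cudnn_version_from_header(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None if absent."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


if __name__ == "__main__":
    # Assumed location; adjust to wherever cuDNN was unpacked on your machine.
    header_path = "/usr/local/cuda/include/cudnn.h"
    if os.path.exists(header_path):
        with open(header_path, "r", errors="replace") as handle:
            print("cuDNN version:", cudnn_version_from_header(handle.read()))
    else:
        print("No header found at", header_path)
```

Note that the header only says what is installed at that path. The library TensorFlow actually loads at runtime (libcudnn.so.7 in the logs) could come from somewhere else on the loader path.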
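[Being a hands-off boss won't do; a grunt can't afford that attitude.]
Going through the CSDN methods one by one. One of them says the code is the problem. I don't buy it; there is nothing wrong with the code. Something about setting GPU memory to be allocated on demand??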
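import tensorflow as tf
import keras.backend as K
# Let TensorFlow grow its GPU memory usage on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# Make Keras use this session.
K.set_session(sess)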
As it turned out, this did not help either.
Train on 15285 samples, validate on 3822 samples
Epoch 1/100
2019-11-13 11:47:34.709111: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-13 11:47:35.062741: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 11:47:35.064299: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 11:47:35.064397: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
2019-11-13 11:47:35.064428: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 11:47:35.064473: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
Traceback (most recent call last):
File "main_ResNet.py", line 223, in <module>
shuffle=True)
File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
validation_freq=validation_freq)
File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
outs = fit_function(ins_batch)
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[Mean/_417]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.
[Note: checking that tf.test.is_gpu_available() returns True proves nothing here; you have to test by actually running code.]
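"Actually running code" can be as small as a single convolution, since that is the op that fails in the traceback. The following is a sketch of such a smoke test, not the script from this post. It assumes a TensorFlow build that provides `tf.compat.v1` (1.14 does), and the function name `cudnn_smoke_test` is invented here.

```python
def cudnn_smoke_test():
    """Run one tiny conv2d; return the output shape, or None if TF is unavailable."""
    try:
        import numpy as np
        import tensorflow as tf
    except ImportError:
        return None

    tf1 = tf.compat.v1
    graph = tf.Graph()
    with graph.as_default():
        # A 1x8x8x3 input convolved with four 3x3 filters.
        x = tf1.placeholder(tf.float32, shape=[1, 8, 8, 3])
        w = tf.constant(np.ones((3, 3, 3, 4), dtype=np.float32))
        y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

    # If cuDNN cannot initialize on a GPU build, this run is expected to raise
    # the same "Failed to get convolution algorithm" error as the training job.
    with tf1.Session(graph=graph) as sess:
        out = sess.run(y, feed_dict={x: np.zeros((1, 8, 8, 3), dtype=np.float32)})
    return out.shape


if __name__ == "__main__":
    print("conv2d output shape:", cudnn_smoke_test())
```

Because it fails within seconds instead of after loading the whole dataset, a test like this makes it cheaper to try each candidate fix.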
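I tried other variants of "allocate on demand" as well. No use.
I've got no fight left in me. Time to file an issue.
A colleague suggested trying a different version. I tried TF 1.15, and that failed too: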
Train on 15285 samples, validate on 3822 samples
Epoch 1/100
2019-11-13 15:07:10.609111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-13 15:07:10.846521: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-13 15:07:10.847414: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 15:07:10.847488: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
2019-11-13 15:07:10.847506: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 15:07:10.847533: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
Traceback (most recent call last):
File "main_ResNet.py", line 229, in <module>
shuffle=True)
File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
validation_freq=validation_freq)
File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
outs = fit_function(ins_batch)
File "/./anaconda3/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
run_metadata=self.run_metadata)
File "/./anaconda3/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[Mean/_417]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.
I tried 1.12 as well, but it appears to need CUDA 9.0, whereas mine is 10.0:
Traceback (most recent call last):
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/./anaconda3/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/./anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main_ResNet.py", line 10, in <module>
import tensorflow as tf
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/./anaconda3/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/./anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
And 1.13.1 gives the same result as 1.14.
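The ImportError from the 1.12 attempt boils down to the dynamic loader not finding libcublas.so.9.0. Before installing yet another TensorFlow version, one could ask the loader directly which CUDA libraries it can see. A minimal sketch using the standard library's ctypes follows; the library names are taken from the logs above, and `can_load` is a made-up helper.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for name in ("libcublas.so.9.0", "libcublas.so.10.0", "libcudnn.so.7"):
        print(name, "->", "found" if can_load(name) else "not found")
```

On a machine with only CUDA 10.0 installed, one would expect the 9.0 entry to be reported as not found, matching the traceback.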
I'll continue in another post, because this isn't the only problem. There is also:
Possibly insufficient driver version: 410.48.0
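To judge whether that hint is real, the reported driver version can be compared against the minimum the CUDA toolkit needs. The sketch below assumes the minimum Linux driver for CUDA 10.0 is 410.48, which is what I recall from NVIDIA's release notes; please verify that constant yourself. All function names are invented for illustration.

```python
import re

# Assumption: minimum Linux driver for CUDA 10.0, recalled from NVIDIA's release notes.
ASSUMED_MIN_DRIVER_CUDA_10_0 = "410.48"


def extract_driver_version(log_line):
    """Pull the version string out of TensorFlow's driver hint, or return None."""
    match = re.search(r"Possibly insufficient driver version:\s*([\d.]+)", log_line)
    return match.group(1) if match else None


def parse_driver_version(text):
    """Turn '410.48.0' into (410, 48, 0)."""
    return tuple(int(number) for number in re.findall(r"\d+", text))


def driver_meets_minimum(reported, minimum):
    """Compare two dotted versions after padding them to equal length."""
    a = parse_driver_version(reported)
    b = parse_driver_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    line = "E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0"
    version = extract_driver_version(line)
    print(version, "meets assumed minimum:",
          driver_meets_minimum(version, ASSUMED_MIN_DRIVER_CUDA_10_0))
```

If the assumed minimum is right, 410.48.0 sits exactly on the boundary, so the message may be a generic hint printed alongside CUDNN_STATUS_NOT_INITIALIZED rather than proof that the driver is too old.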
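If you have related questions, you can join the QQ group to discuss them; there is no WeChat group.
QQ group: 868373192
Speech deep learning group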