could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
When training a deep learning model, this error occasionally shows up. Here is the original error output:
Epoch 1/16
2019-09-11 09:34:11.000335: E C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-11 09:34:11.000570: F C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\kernels\conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Process finished with exit code -1073740791 (0xC0000409)
Here is what is going on: as I understand it, the GPU ran out of memory during training. By default, TensorFlow 1.x reserves nearly all of the GPU's memory as soon as the session starts, and when that grab fails, the cuDNN handle cannot be created. Falling back to the CPU still works, but CPU and GPU training speeds are nowhere near comparable; in my case the gap was about tenfold. The fix is simply to add two lines of code after the imports so that the initial GPU memory footprint stays small.
As shown below; only the last two lines matter:
import cv2
# ... other imports ...
import matplotlib.pyplot as plt
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config=config)
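If the model is trained through Keras, as the Epoch 1/16 progress bars here suggest, and enabling growth this way does not seem to take effect, the session may also need to be registered explicitly with the Keras backend, since Keras otherwise opens its own default session. A minimal sketch, assuming the standalone keras package with the TensorFlow backend:

import tensorflow as tf
from keras import backend as K

# Register a growth-enabled session so Keras reuses it for fit()/predict().
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
K.set_session(tf.Session(config=config))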
With that in place, training runs:
Epoch 1/16
2019-09-11 09:41:21.671435: E C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\grappler\clusters\utils.cc:81] Failed to get device properties, error code: 30
2019-09-11 09:41:33.268699: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.282936: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.301694: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.342715: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.352678: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.359968: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.399060: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.410350: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.416374: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-11 09:41:33.614362: W C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
1/300 [..............................] - ETA: 1:15:45 - loss: 90.8376
2/300 [..............................] - ETA: 38:57 - loss: 89.4276
3/300 [..............................] - ETA: 26:41 - loss: 87.2493
4/300 [..............................] - ETA: 20:32 - loss: 84.7477
5/300 [..............................] - ETA: 16:51 - loss: 82.1824
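Note that on TensorFlow 2.x, ConfigProto and Session no longer exist; the equivalent switch is per-GPU memory growth. A minimal sketch, assuming TF 2.x (not tested against the setup above):

import tensorflow as tf

# Must run before any op touches the GPU.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)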