TensorFlow runtime error with cuBLAS

After I managed to install TensorFlow on the cluster, I immediately ran the MNIST demo to check that everything was working, but I ran into a problem. I don't know what this is all about, but the error seems to come from CUDA.

python3 -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:03:00.0
Total memory: 5.00GiB
Free memory: 4.92GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:03:00.0)
Initialized!
E tensorflow/stream_executor/cuda/cuda_blas.cc:461] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/home/gpuusr/local/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InternalError: Blas SGEMM launch failed : a.shape=(64, 3136), b.shape=(3136, 512), m=64, n=512, k=3136
 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, Variable_4/read)]]
 [[Node: add_5/_35 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_299_add_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gpuusr/local/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/gpuusr/local/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py", line 316, in <module>
    tf.app.run()
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py", line 294, in main
    feed_dict=feed_dict)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Blas SGEMM launch failed : a.shape=(64, 3136), b.shape=(3136, 512), m=64, n=512, k=3136
 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, Variable_4/read)]]
 [[Node: add_5/_35 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_299_add_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'MatMul', defined at:
  File "/home/gpuusr/local/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/gpuusr/local/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py", line 316, in <module>
    tf.app.run()
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py", line 221, in main
    logits = model(train_data_node, True)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py", line 213, in model
    hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1209, in matmul
    name=name)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1178, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/ops/op_def_library.py", line 704, in apply_op
    op_def=op_def)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/gpuusr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
    self._traceback = _extract_stack()

Segmentation fault (core dumped)
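
As a reading aid for the error above: the failing op is a single-precision matrix multiply, and the message prints its dimensions. The sketch below uses only those dimensions and the free-memory figure from the log to estimate how much device memory the operands need. The helper name `sgemm_bytes` is hypothetical, and the estimate assumes 4 bytes per DT_FLOAT element and counts only the two inputs and the result, not any cuBLAS workspace or memory TensorFlow has already reserved.

```python
import re

# Error line copied from the traceback above.
ERROR_LINE = ("Blas SGEMM launch failed : a.shape=(64, 3136), "
              "b.shape=(3136, 512), m=64, n=512, k=3136")

BYTES_PER_FLOAT32 = 4  # assumption: DT_FLOAT is a 32-bit float


def sgemm_bytes(error_line):
    """Bytes for a (m x k), b (k x n) and the m x n result (hypothetical helper)."""
    m, n, k = (int(re.search(r"\b%s=(\d+)" % name, error_line).group(1))
               for name in ("m", "n", "k"))
    elements = m * k + k * n + m * n
    return elements * BYTES_PER_FLOAT32


needed = sgemm_bytes(ERROR_LINE)
free = 4.92 * 2**30  # "Free memory: 4.92GiB" from the log

print("operands + result: %.2f MiB" % (needed / 2**20))
print("share of free memory: %.4f%%" % (100.0 * needed / free))
```

Under those assumptions the multiply itself needs only a few MiB, a tiny share of the reported free memory, which suggests the failure is probably not caused by this one operation being too large for the card.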
To build or run TensorFlow with GPU support, both the NVIDIA CUDA Toolkit (>= 7.0) and cuDNN (>= v2) need to be installed. TensorFlow GPU support requires an NVIDIA GPU card with Compute Capability >= 3.0. Did you follow the official installation instructions? tensorflow.org/versions/r0.9/get_started/os_setup.html
Absolutely, yes. My CUDA version is 7.5 and my cuDNN version is v4.
OK, and does your graphics card have a compute capability of 3.0 or higher?
My graphics card is an Nvidia Tesla K20m. I just checked Nvidia's website and found that its CUDA capability is 3.5 (is that the compute capability?).
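
The `major: 3 minor: 5` pair that TensorFlow prints under "Found device 0 with properties" in the question's log is the device's compute capability, so it can be read straight from the log. A minimal sketch of the requirement check discussed in these comments follows; the minimums are the ones quoted above, the reported versions are the ones given in this thread, and the names `REQUIRED`, `REPORTED` and `compute_capability` are hypothetical, introduced only for illustration.

```python
import re

# Minimums quoted in the comment above; versions as reported in this thread.
REQUIRED = {"cuda": (7, 0), "cudnn": (2,), "compute_capability": (3, 0)}
REPORTED = {"cuda": (7, 5), "cudnn": (4,)}

# Device line copied from the log in the question.
DEVICE_LINE = "major: 3 minor: 5 memoryClockRate (GHz) 0.7055"


def compute_capability(device_line):
    """Parse (major, minor) from the device log line (hypothetical helper)."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", device_line)
    if match is None:
        raise ValueError("no major/minor pair found in: %r" % device_line)
    return int(match.group(1)), int(match.group(2))


REPORTED["compute_capability"] = compute_capability(DEVICE_LINE)

# Tuples compare element by element, so (3, 5) >= (3, 0) behaves like a version check.
for name, minimum in sorted(REQUIRED.items()):
    ok = REPORTED[name] >= minimum
    print("%s: reported %s, minimum %s, ok=%s" % (name, REPORTED[name], minimum, ok))
```

By this check all three stated minimums are met, so the setup described in the thread satisfies the documented requirements and the cause of the cuBLAS failure would have to lie elsewhere.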
Did you ever find a solution? I'm running into this right now.

Original author: Pengqi Lu | 2016-07-11