How do I fix an encog "kernel launch failure" error when running: "./encog benchmark /gpu:1"


As part of encog install testing, I tried running ./encog benchmark /gpu:0, which worked fine. When I tried ./encog benchmark /gpu:1, I got:

encog-core/cuda_eval.cu(286) : getlastcudaerror() cuda error : kernel launch failure : (13) invalid device symbol. 

I am on Ubuntu 11.10. I got the source code from https://github.com/encog/encog-c, and "make arch=64 cuda=1" went without error.

Thanks for any help in solving this problem.

Here is the console listing for the benchmark that worked fine:

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ ./encog benchmark /gpu:0
* * encog c/c++ (64 bit, cuda) command line v1.0 * *
copyright 2012 heaton research, released under apache license
build date: may 4 2013 07:24:00
processor/core count: 32
basic data type: double (64 bits)
gpu: disabled
input count: 10
ideal count: 1
records: 10000
iterations: 100
performing benchmark...please wait
benchmark time(seconds): 3.2856
benchmark time includes training time.
encog finished. run time 00:00:03.2904

=============================================

Here is the benchmark run that had the problem:

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ ./encog benchmark /gpu:1
* * encog c/c++ (64 bit, cuda) command line v1.0 * *
copyright 2012 heaton research, released under apache license
build date: may 4 2013 07:24:00
processor/core count: 32
basic data type: double (64 bits)
gpu: enabled
input count: 10
ideal count: 1
records: 10000
iterations: 100
performing benchmark...please wait
encog-core/cuda_eval.cu(286) : getlastcudaerror() cuda error : kernel launch failure : (13) invalid device symbol.

==========================================

Here is what the GPU environment looks like:

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ ./encog cuda
* * encog c/c++ (64 bit, cuda) command line v1.0 * *
copyright 2012 heaton research, released under apache license
build date: may 4 2013 07:24:00
processor/core count: 32
basic data type: double (64 bits)
gpu: enabled
device 0: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 1: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 2: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 3: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 4: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 5: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 6: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
device 7: geforce gtx 690
cuda driver version / runtime version 5.0 / 5.0
cuda capability major/minor version number: 3.0
total amount of global memory: 2048 mbytes (2147287040 bytes)
( 8) multiprocessors x (192) cuda cores/mp: 1536 cuda cores
gpu clock speed: 1.02 ghz
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per block: 1024
maximum sizes of each dimension of block: 1024 x 1024 x 64
maximum sizes of each dimension of grid: 2147483647 x 65535 x 65535
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
performing cuda test.
vector addition cuda vector add test successful.
encog finished. run time 00:00:10.9206

===============================

Here is the output of "make":

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ make arch=64 cuda=1
mkdir -p ./obj-cmd
gcc -c -o obj-cmd/encog-cmd.o encog-cmd/encog-cmd.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-cmd
gcc -c -o obj-cmd/cuda_test.o encog-cmd/cuda_test.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-cmd
gcc -c -o obj-cmd/node_unix.o encog-cmd/node_unix.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-cmd
/usr/local/cuda/bin/nvcc -o obj-cmd/cuda_vecadd.cu.o -c encog-cmd/cuda_vecadd.cu -i./encog-core/ -m64
mkdir -p ./obj-lib
gcc -c -o obj-lib/activation.o encog-core/activation.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/errorcalc.o encog-core/errorcalc.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/network_io.o encog-core/network_io.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/util.o encog-core/util.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/util_str.o encog-core/util_str.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/data.o encog-core/data.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/errors.o encog-core/errors.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/network.o encog-core/network.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/pso.o encog-core/pso.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/util_file.o encog-core/util_file.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/vector.o encog-core/vector.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/encog.o encog-core/encog.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/nm.o encog-core/nm.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/object.o encog-core/object.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/rprop.o encog-core/rprop.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/hash.o encog-core/hash.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/train.o encog-core/train.c -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include
mkdir -p ./obj-lib
/usr/local/cuda/bin/nvcc -o obj-lib/encog_cuda.cu.o -c encog-core/encog_cuda.cu -i./encog-core/ -m64
mkdir -p ./obj-lib
/usr/local/cuda/bin/nvcc -o obj-lib/cuda_eval.cu.o -c encog-core/cuda_eval.cu -i./encog-core/ -m64
ptxas /tmp/tmpxft_00001b04_00000000-5_cuda_eval.ptx, line 141; warning : double not supported. demoting float
mkdir -p ./lib
ar rcs ./lib/encog.a ./obj-lib/activation.o ./obj-lib/errorcalc.o ./obj-lib/network_io.o ./obj-lib/util.o ./obj-lib/util_str.o ./obj-lib/data.o ./obj-lib/errors.o ./obj-lib/network.o ./obj-lib/pso.o ./obj-lib/util_file.o ./obj-lib/vector.o ./obj-lib/encog.o ./obj-lib/nm.o ./obj-lib/object.o ./obj-lib/rprop.o ./obj-lib/hash.o ./obj-lib/train.o ./obj-lib/encog_cuda.cu.o ./obj-lib/cuda_eval.cu.o
gcc -o encog obj-cmd/encog-cmd.o obj-cmd/cuda_test.o obj-cmd/node_unix.o obj-cmd/cuda_vecadd.cu.o lib/encog.a -i./encog-core/ -fopenmp -std=gnu99 -pedantic -o3 -wall -m64 -dencog_cuda=1 -i/usr/local/cuda/include -lm ./lib/encog.a -l/usr/local/cuda/lib64 -lcudart
rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$

I tried running it on a GeForce 580 and had no problem. You are on a different platform, the 6 series. I looked up the error in a few places on Google. It looks like it is perhaps an issue with the way local memory is used, which might not work on the 6 series. You might want to submit an issue here:

https://github.com/encog/encog-c/issues
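
If you do file an issue, one detail from the make output may be worth including: the nvcc commands run without an explicit -arch or -gencode option, ptxas warns that double is not supported, and the devices report compute capability 3.0. Whether any of this is related to the "invalid device symbol" error is not established here. The following is a minimal shell sketch that pulls those lines out of a saved build log. It assumes the log has one command per line, and build.log and the function name scan_build_log are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: scan a saved build log for nvcc commands that carry no explicit
# architecture option, and for ptxas warnings.
# Assumption: the log was captured with something like
#   make arch=64 cuda=1 2>&1 | tee build.log
# (build.log is a hypothetical file name).

scan_build_log() {
    log="$1"

    echo "nvcc commands without an explicit architecture option:"
    # Keep lines that mention nvcc, then drop those that already pass
    # -arch or -gencode; print "(none)" if nothing is left.
    grep 'nvcc' "$log" | grep -v -e '-arch' -e '-gencode' || echo "  (none)"

    echo "ptxas warnings:"
    grep -i 'ptxas.*warning' "$log" || echo "  (none)"
}

# Run the scan only when a log file is given on the command line.
if [ -n "${1:-}" ]; then
    scan_build_log "$1"
fi
```

Saved as a script, this would be run with the log file as its only argument. The output can be pasted into the issue report next to the "./encog cuda" listing so the maintainers can see how the .cu files were compiled.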

