I am trying to generate PTX code for 'nbody' sample program's kernel (nbody_kernel.cu) using clang/LLVM version 3.2. The nbody CUDA program is available in Nvidia's SDK.
I am referring to https://github.com/jholewinski/llvm-ptx-samples project. Following are my commands, clang++ -O4 -S -I/usr/local/cuda/include -emit-llvm -target nvptx64 nbody_kernel.cu -o nbody_kernel.ll opt -O3 -loop-unroll -unroll-allow-partial nbody_kernel.ll -o nbody_kernel.ll llc nbody_kernel.ll -o nbody_kernel.ptx After execution of the last command(llc) I get a UNREACHABLE executed! error with the following stack trace [DEVICE-C++] nbody.kernel.cpp unexpected address space UNREACHABLE executed at /home/pratnali/LLVM/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1317! 0 libLLVM-3.3svn.so 0x00007f3857bdf0cb llvm::sys::PrintStackTrace(_IO_FILE*) + 43 1 libLLVM-3.3svn.so 0x00007f3857bde74a 2 libpthread.so.0 0x00007f3856c3c460 3 libc.so.6 0x00007f3855a90b15 gsignal + 53 4 libc.so.6 0x00007f3855a91f96 abort + 390 5 libLLVM-3.3svn.so 0x00007f3857bc30f7 llvm::llvm_unreachable_internal(char const*, char const*, unsigned int) + 359 6 libLLVM-3.3svn.so 0x00007f385722967d 7 libLLVM-3.3svn.so 0x00007f385722b6d7 8 libLLVM-3.3svn.so 0x00007f3857341723 llvm::FPPassManager::doInitialization(llvm::Module&) + 99 9 libLLVM-3.3svn.so 0x00007f385734639d llvm::MPPassManager::runOnModule(llvm::Module&) + 205 10 libLLVM-3.3svn.so 0x00007f3857349b7c llvm::PassManagerImpl::run(llvm::Module&) + 268 11 llc 0x000000000040b534 12 llc 0x000000000040d131 main + 465 13 libc.so.6 0x00007f3855a7d4bd __libc_start_main + 253 14 llc 0x0000000000406e59 Stack dump: 0. Program arguments: llc nbody.kernel.ll -o nbody.kernel.ptx make: *** [nbody.kernel.ptx] Aborted I replaced the global indexes like for e.g threadIdx.x with __builtin_ptx_read_tid_x() and others. There are no problems in generating LLVM IR (i.e .ll). The error pops up while trying to generate PTX from the IR using llc. Any pointers on what might be going on here ? Will appreciate any help in going forward I have attached my program and observations in a README here. LLVM_PTX_nbody.tar.gz You can easily reproduce the problem using this. |
I noticed you're using cuda_runtime.h in the source file. Where are you getting this file? From the CUDA toolkit? Since the error is in the back-end, can you just post the .ll or .bc file you are trying to compile?
On Mon, Mar 18, 2013 at 12:42 AM, upit <[hidden email]> wrote: I am trying to generate PTX code for 'nbody' sample program's kernel Thanks, Justin Holewinski
_______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev |
Please find the .ll attached below . Yes, I am using the cuda_runtime.h from the toolkit.
nbody.kernel.ll - Uday |
The problem you are seeing is because clang is putting the global variables in address space 0, which in NVPTX means the generic address space. PTX does not allow this, so the back-end *should* be printing an error for you. Are you using trunk or 3.2? Generally, clang won't be compatible with the CUDA Toolkit headers. If you want to use the __constant__ modifier from CUDA in Clang, define it like so:On Mon, Mar 18, 2013 at 6:31 PM, upit <[hidden email]> wrote: Please find the .ll attached below . Yes, I am using the cuda_runtime.h from -- Thanks, Justin Holewinski
_______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev |
OK. That helps.
It does flash a warning though [DEVICE-C++] nbody.kernel.cpp nbody.kernel.cpp:29:9: warning: '__constant__' macro redefined #define __constant__ __attribute__((address_space(2))) ^ /opt/cuda/include/host_defines.h:183:9: note: previous definition is here #define __constant__ \ ^ 1 warning generated. Another question is What about extern __shared__ ? I can see that the error goes away if I replace "extern __shared__ float4 sharedPos[]" with "__shared__ float4* sharedPos;". Do I have to dynamically allocate the shared memory by specifying size in kernel Launch? If so, why doesn't the second use of the same statement in another function cause the error ? I am using 3.2. |
On Wed, Mar 20, 2013 at 11:29 AM, upit <[hidden email]> wrote: OK. That helps. I would just do away with the toolkit headers. I may try to put together some minimalistic headers for clang w/ nvptx at some point. Your best bet is to just define what you need yourself for now. __shared__ would be address space 3, so: #define __shared__ __attribute__((address_space(3))) Either using [] or * should work. Just be aware that you will need to specify a shared size when you launch the kernel. You can get the address space mapping from lib/Target/NVPTX/MCTargetDesc/NVPTXBaseInfo.h.
-- Thanks, Justin Holewinski
_______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev |
Thanks a lot Justin,
I will remove the toolkit header. Just one last question..(maybe ;) ) If I do away with toolkit headers it says unknown type name '__device__'. Does this function qualifier have an alternative ? or I can just do away with ? |
Not really. Clang does not have a way to annotate device vs. kernel functions in C/C++ mode. You're probably better off trying to use OpenCL or CUDA mode in clang.
In the clang unit tests, there is a cuda.h header that provides very basic support for these keywords: tests/SemaCUDA/cuda.h
If you compile as CUDA (use .cu extension, or "-x cuda") and use this header, you will have basic support. You can invoke clang with something like:
$ clang test1.cu -Xclang -fcuda-is-device -I ../src/clang/test/SemaCUDA -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S
... assuming your clang source directory is ../src/clang, you want 64-bit PTX, and your target SM is 2.0. Adjust accordingly. Clang also knows how to map OpenCL to PTX, so you would do something like:
$ clang test1.cl -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S On Wed, Mar 20, 2013 at 3:29 PM, upit <[hidden email]> wrote: Thanks a lot Justin, Thanks, Justin Holewinski
_______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev |
Well, I tried the command line given by you and I get the following error
clang++ nbody.kernel.cu -Xclang -fcuda-is-device -I/home/upitamba/llvm-3.2.src/tools/clang/test/SemaCUDA/ -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_10 -S fatal error: error in backend: Cannot select: 0x334a870: v4f32 = NVPTXISD::MoveParam 0x334a770 [ORD=1] [ID=22] 0x334a770: v4f32 = TargetExternalSymbol'.PARAM0' [ID=1] In function: computeBodyAccel Am I doing anything wrong here ? Attached my new nbody.kernel.cu.cu here |
That particular error is fixed in trunk, but with your code I'm now hitting a new issue. I'll get the fix in soon. On Fri, Mar 22, 2013 at 1:08 AM, upit <[hidden email]> wrote: Well, I tried the command line given by you and I get the following error Thanks, Justin Holewinski
_______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev |
This file should compile fine now on trunk. On Sat, Mar 23, 2013 at 8:43 PM, Justin Holewinski <[hidden email]> wrote:
Thanks, Justin Holewinski
_______________________________________________ LLVM Developers mailing list [hidden email] http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev |
Free forum by Nabble | Edit this page |