Generating PTX code from template kernel fails


Haidl, Michael

Hello!

 

I’m trying to generate IR from CUDA C++. This works fine until templates come into play. Using a template __device__ function within a __global__ function works well: the specific instantiations of the template function are generated. However, as soon as a template parameter is forwarded from the kernel launch code through the kernel to the device function, the transformation process breaks somehow and an empty .ll file is emitted.
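For comparison, here is a minimal sketch of the case described above as working: a non-template __global__ function calls the template __device__ function with a fixed type. In that case blubblub<int> is instantiated directly from device code, not through a template parameter of the kernel. The kernel name kernel_int is hypothetical. The #ifndef shim at the top is an assumption added only so the sketch also compiles as ordinary host C++, and it leaves any existing __device__/__global__ macros untouched.

```cpp
// Shim (assumption): expand the CUDA qualifiers to nothing only when they
// are not already macros, so the sketch also builds as plain host C++.
#ifndef __device__
#define __device__
#endif
#ifndef __global__
#define __global__
#endif

template <typename T>
__device__ int blubblub(T& a, float& b, double& c)
{
  a = a * b;  // with T = int, the float product is truncated back to int
  b = b - c;
  c = a * c;
  return a;
}

// Hypothetical non-template kernel: blubblub<int> is requested with an
// explicit type from a __global__ function, the case reported to work.
__global__ void kernel_int(int a, float b, double c)
{
  int result = blubblub<int>(a, b, c);
  (void)result;  // the value is unused, as in the original kernel
}
```

The only difference from the failing file is where the type comes from: here it is written out in the kernel body instead of arriving as the kernel's own template parameter T.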

 

The code used (test.cu):

 

#ifndef __CUDACC__

// Minimal stand-ins for the CUDA runtime declarations
#include <stddef.h>

#define __constant__ __attribute__((constant))
#define __device__ __attribute__((device))
#define __global__ __attribute__((global))
#define __host__ __attribute__((host))
#define __shared__ __attribute__((shared))
#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))

struct dim3 {
  unsigned x, y, z;
  __host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
};

typedef struct cudaStream *cudaStream_t;

int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
                      cudaStream_t stream = 0);

#endif

template <typename T>
__device__ int blubblub(T& a, float& b, double& c)
{
  a = a * b;
  b = b - c;
  c = a * c;
  return a;
}

template <typename T>
__global__ void kernel(T a, float b, double c)
{
  int result = blubblub<T>(a, b, c);
}

int main()
{
  // Host-side launch: the only place kernel<int> is instantiated
  kernel<int><<<dim3(1), dim3(1)>>>(5, 0.7f, 12.34);
  return 0;
}

 

The command line to compile:

 

clang++ -x cuda -S -emit-llvm -target nvptx64 -Xclang -fcuda-is-device -o test.dev.ll test.cu

 

clang version 3.5 (trunk 200831)
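One thing that might be worth trying while this is investigated is an explicit instantiation definition of the kernel in the same file. It has not been tested against trunk 200831, so treat it as an assumption rather than a confirmed fix. The idea is to give the device-side compilation a kernel<int> that does not depend on the launch inside the host function main(). The sketch uses plain `template void kernel<int>(...)`. Whether this Clang version also accepts `__global__` on the explicit instantiation is not confirmed here. The #ifndef shim is again only an assumption so the fragment builds as plain C++.

```cpp
// Shim (assumption): only takes effect when __device__/__global__ are not
// already macros, so it leaves the attribute-based definitions from the
// original test.cu alone and lets this sketch build as plain C++ too.
#ifndef __device__
#define __device__
#endif
#ifndef __global__
#define __global__
#endif

template <typename T>
__device__ int blubblub(T& a, float& b, double& c)
{
  a = a * b;
  b = b - c;
  c = a * c;
  return a;
}

template <typename T>
__global__ void kernel(T a, float b, double c)
{
  int result = blubblub<T>(a, b, c);
  (void)result;  // the value is unused, as in the original kernel
}

// Explicit instantiation definition of kernel<int>. This requests a
// definition of kernel<int> (and therefore of blubblub<int>) in this
// translation unit, independent of the launch in main().
template void kernel<int>(int, float, double);
```

The launch in main() would stay as it is; the explicit instantiation only has to appear after the template definitions. Whether it avoids the empty .ll file depends on where the underlying bug lies.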

 

 

Any help to fix this problem is highly appreciated.

 

Michael Haidl


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


Re: Generating PTX code from template kernel fails

Justin Holewinski
Looks like a Clang issue; adding cfe-dev.

I would file a bug report if you have not already.


On Thu, Feb 6, 2014 at 4:44 AM, Haidl, Michael <[hidden email]> wrote:





--

Thanks,

Justin Holewinski
