[llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev
Dear All,

I’m working on a project that extends C. I’m adding a new type of pointer
that is a fat pointer: besides the starting address of the object, it carries
some metadata about the pointed-to object. Currently I implement this pointer as
an llvm::StructType. In the llvm::Type generation function
llvm::Type *CodeGenTypes::ConvertType(QualType T),
in the case for clang::Type::Pointer, I create an llvm::StructType for this new
type of pointer instead of an llvm::PointerType. I also added some helper code
to llvm::StructType, and in multiple places I added code to trick the compiler
into believing that a struct is sometimes actually a pointer. So far it compiles
test programs fine at -O0, but I get lots of assertion failures at -O1 or -O2,
mainly because of type mismatches.
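For readers new to the idea, the struct-based representation described above can be sketched at the C level as an address travelling together with per-object metadata. This is a minimal sketch that assumes the metadata is a [base, bound) range; the post does not say what metadata the project actually stores, and `struct fat_ptr`, `fat_from_object`, `fat_add` and `fat_in_bounds` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fat pointer: the address an ordinary C pointer would hold,
   plus metadata about the pointed-to object (here, its [base, bound) range). */
struct fat_ptr {
    uintptr_t addr;  /* current address */
    uintptr_t base;  /* start of the pointed-to object */
    uintptr_t bound; /* one past the end of the object */
};

/* Make a fat pointer to the start of an object of `size` bytes. */
static struct fat_ptr fat_from_object(void *obj, size_t size) {
    struct fat_ptr p;
    p.addr = p.base = (uintptr_t)obj;
    p.bound = p.base + size;
    return p;
}

/* Pointer arithmetic changes only the address; the metadata travels with it.
   Unsigned wraparound keeps this well defined even when it leaves the object. */
static struct fat_ptr fat_add(struct fat_ptr p, ptrdiff_t off) {
    p.addr += (uintptr_t)off;
    return p;
}

/* Nonzero if the n bytes starting at p.addr lie inside the object. */
static int fat_in_bounds(struct fat_ptr p, size_t n) {
    return p.addr >= p.base && p.addr <= p.bound && n <= p.bound - p.addr;
}
```

The value is three machine words rather than one, which illustrates why code that expects a pointer-sized llvm::PointerType cannot simply accept it in place of a pointer.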

LLVM assumes that a PointerType is essentially an integer (32 or 64 bits depending
on the architecture), and since this is quite a fundamental assumption, I started
to question whether my way of implementing the fat pointer is feasible.
I thought about adding a new LLVM type that inherits from both llvm::PointerType
and llvm::StructType, but I’m not sure this is the correct path. It looks like
this demands substantial changes to the compiler, including adding code
for bitcode generation. Can you give me some advice on how to implement
a fat pointer in LLVM?

Thanks,
- Jie

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev
On Mon, Jan 6, 2020, 18:45 Jie Zhou via llvm-dev <[hidden email]> wrote:
Dear All,
<snip>
Can you give me some advice on how to implement
a fat pointer in llvm?

Rustc currently implements fat pointers in function arguments by passing a pair of arguments: the pointer to the object's data and the associated data which is either the length as a usize (C's uintptr_t) or a pointer to the vtable. Fat pointers in return values are passed as a two-member struct where the first member is the pointer to the object's data and the second is the length or vtable pointer.

See https://rust.godbolt.org/z/cjoRNG for the (definitely non-idiomatic) rust source code as well as the LLVM IR.
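In C terms, the convention Jacob describes looks roughly like the sketch below for a slice-like fat pointer (data pointer plus length): it is split into two scalar parameters when passed in, and returned as a two-member struct. The names `struct slice`, `sum_slice` and `tail` are hypothetical, and this only mirrors the idea; it is not the exact ABI rustc emits.

```c
#include <assert.h>
#include <stddef.h>

/* Two-member struct used when a fat pointer is returned. */
struct slice {
    const int *data; /* pointer to the object's data */
    size_t len;      /* associated metadata: element count */
};

/* As an argument, the fat pointer is passed as two separate scalars. */
static long sum_slice(const int *data, size_t len) {
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* As a return value, both halves come back together in one struct. */
static struct slice tail(const int *data, size_t len, size_t skip) {
    assert(skip <= len); /* caller must not skip past the end */
    struct slice s = { data + skip, len - skip };
    return s;
}
```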

Btw, you should definitely check out Rust if you haven't already at https://www.rust-lang.org/

Jacob Lifshay

Re: [llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev
Hi,

For CHERI, we use pointers in address space 200 to represent memory
capabilities (which are a kind of fat pointer).  These are able to pass
through the LLVM pipeline and we then lower them to special instructions
in the various targets that understand that pointers are a
hardware-enforced type.  It would be possible to add a late pass that
then expands these into a StructType that contains the address and
whatever metadata you want, and expands loads and stores to use the
address component.  Note that if you want to hoist checks out of loops,
you will want to do this expansion somewhere in the middle of your pass
pipeline.
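A rough C-level picture of what such a late expansion could produce: the fat pointer becomes a plain struct, and each load or store first checks the metadata and then goes through the address component only. The bounds-style metadata, the `fat_load_i32`/`fat_store_i32` helpers and the use of abort() as the trap are all assumptions for illustration; CHERI itself enforces this in hardware. Converting the integer address back to a pointer is implementation-defined in C but works on mainstream targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded fat pointer: the address component plus
   metadata, here the object's base address and size in bytes. */
struct fat_ptr {
    uintptr_t addr;
    uintptr_t base;
    uintptr_t size;
};

/* Stand-in for the inserted check; abort() plays the role of a trap. */
static void fat_check(struct fat_ptr p, size_t n) {
    if (p.addr < p.base || p.addr - p.base > p.size ||
        n > p.size - (p.addr - p.base))
        abort();
}

/* An expanded 32-bit load: check, then access through the address only. */
static int32_t fat_load_i32(struct fat_ptr p) {
    int32_t v;
    fat_check(p, sizeof v);
    memcpy(&v, (const void *)p.addr, sizeof v);
    return v;
}

/* An expanded 32-bit store, checked the same way. */
static void fat_store_i32(struct fat_ptr p, int32_t v) {
    fat_check(p, sizeof v);
    memcpy((void *)p.addr, &v, sizeof v);
}
```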

The tricky part here is in function parameters: you cannot easily change
the type of a function after it has been created.  Your best bet here is
to always pass fat pointers as the structure representation.

LLVM does not assume that pointers are integers - we have done a lot of
work to remove that assumption and the IR has always made the two types
distinct.  There are still a few rough areas, but these are bugs.  We
are able to compile nontrivial codebases (e.g. FreeBSD, WebKit) with
optimisations enabled for targets where pointers and integers are
distinct types at the hardware level.

David

Re: [llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev
Jie,

Do you actually want a fat pointer specifically, or do you just want an efficient way to associate metadata with C pointers?  Because (as I’m sure you know) fat pointers have serious compatibility problems with external libraries, and also may break C programs in other ways due to the lack of sound type information.

John Criswell (copied) has created an improved version of Baggy Bounds that gives an efficient and compatible solution at low memory overhead.  I suggest contacting him if you’re interested.
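The email does not describe John's improved variant, so the sketch below shows only the core bounds check of the original Baggy Bounds Checking scheme as it is usually described: allocations are padded to a power-of-two size 2^e and aligned to that size, so a derived pointer stays within the (padded) allocation exactly when it agrees with the original pointer on every address bit from e upward. The per-slot bounds table that stores e is omitted here, and `pad_exponent` and `baggy_in_bounds` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest e with 2^e >= n: the allocation is padded to 2^e bytes. */
static unsigned pad_exponent(uint64_t n) {
    assert(n > 0 && n <= ((uint64_t)1 << 62));
    unsigned e = 0;
    while (((uint64_t)1 << e) < n)
        e++;
    return e;
}

/* With the allocation aligned to 2^e, `derived` is inside the padded
   allocation that contains `orig` iff the two addresses agree on all
   bits from bit e upward. */
static int baggy_in_bounds(uintptr_t orig, uintptr_t derived, unsigned e) {
    return ((orig ^ derived) >> e) == 0;
}
```

Accesses that land in the padding are not reported; the scheme accepts that in exchange for a check that is a single XOR, shift and compare, while pointers keep their normal size, which is where its compatibility with external code comes from.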

—Vikram Adve

+ Donald B. Gillies Professor of Computer Science, University of Illinois at Urbana-Champaign
+ Admin: Kimberly Baker – [hidden email]
+ Skype: vikramadve || Zoom: https://zoom.us/j/2173900467
+ Home page: http://vikram.cs.illinois.edu
+ Center for Digital Agriculture: https://digitalag.illinois.edu

Re: [llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev

Hi Vikram,

I’m working on the Checked C project (https://www.microsoft.com/en-us/research/project/checked-c/)
to enhance it with temporal memory safety. Fundamentally we want a (super) efficient way
(ideally with less than 5% performance overhead) of associating metadata with C pointers,
and we chose fat pointers because we believe they are the most efficient option,
although at the cost of breaking backward compatibility. I’ve done a literature survey and
found that most solutions (e.g., CETS, DANGNULL, FreeSentry, DangSan, etc.) use disjoint data structures
to keep track of points-to relations; maintaining those data structures is where all
the overhead (both performance and memory) comes from, and none of the existing solutions
is fast enough. I worked on this project last summer at Microsoft with David Tarditi, and
our conclusion was that fat pointers are the way to go if speed and memory consumption are more
critical than compatibility.
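To make the temporal-safety case concrete, here is one way metadata carried inside the pointer could detect use-after-free. It is a hypothetical lock-and-key sketch, loosely in the spirit of CETS but with the key and lock index stored in the fat pointer itself; it is not the Checked C design, and the fixed-size lock table, `struct tptr`, `t_malloc`, `t_live` and `t_free` are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LOCKS 1024 /* illustrative fixed capacity */

/* Lock table: one entry per allocation, 0 meaning "freed". Entries are
   never reused, so checking a stale pointer never reads freed memory. */
static uint64_t locks[MAX_LOCKS];
static size_t next_lock = 0;
static uint64_t next_key = 1; /* keys start at 1; 0 never matches a live lock */

/* Hypothetical temporal fat pointer: address plus key and lock index. */
struct tptr {
    void *addr;
    uint64_t key;
    size_t lock;
};

/* Allocate an object and give it a fresh key; key 0 signals failure. */
static struct tptr t_malloc(size_t size) {
    struct tptr p = { NULL, 0, 0 };
    if (next_lock == MAX_LOCKS)
        return p;
    p.addr = malloc(size);
    if (p.addr != NULL) {
        p.lock = next_lock++;
        p.key = next_key++;
        locks[p.lock] = p.key;
    }
    return p;
}

/* The check that would precede each dereference. */
static int t_live(struct tptr p) {
    return p.key != 0 && p.lock < MAX_LOCKS && locks[p.lock] == p.key;
}

/* Invalidate the lock so every copy of p fails t_live, then free. */
static void t_free(struct tptr p) {
    assert(t_live(p)); /* catches double free in this sketch */
    locks[p.lock] = 0;
    free(p.addr);
}
```

The check reads only the pointer and one table entry, with no per-pointer side table to maintain, which is the kind of saving the fat-pointer approach is after.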

Actually, I’ve discussed this issue with John, and we have received a small grant from Microsoft Research
with the “fat pointer” design in the proposal (John is my advisor at Rochester :-)). John talked
about implementing the fat pointer using an LLVM vector or a 128-bit integer, but we would still have
the type mismatch problem because in lots of places the compiler expects an llvm::PointerType.
I’ll discuss this more with John.
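For comparison, the 128-bit-integer option John mentions can be sketched in C as packing the address and 64 bits of metadata into one scalar. This assumes a 64-bit target and the GCC/Clang `unsigned __int128` extension; `fat_pack`, `fat_addr` and `fat_meta` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes a 64-bit target and GCC/Clang's unsigned __int128 extension. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

typedef unsigned __int128 fat128;

/* Low 64 bits: address; high 64 bits: metadata. */
static fat128 fat_pack(void *addr, uint64_t meta) {
    return ((fat128)meta << 64) | (uint64_t)(uintptr_t)addr;
}

/* Recover the address by truncating to the low 64 bits. */
static void *fat_addr(fat128 p) { return (void *)(uintptr_t)(uint64_t)p; }

/* Recover the metadata from the high 64 bits. */
static uint64_t fat_meta(fat128 p) { return (uint64_t)(p >> 64); }
```

As Jie notes, this does not remove the mismatch: at the IR level the value is an i128 rather than a pointer, so every use that needs an llvm::PointerType still requires an explicit conversion.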

Thanks,
- Jie

Re: [llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev
> On Jan 7, 2020, at 06:51, David Chisnall via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> For CHERI, we use pointers in address space 200 to represent memory capabilities (which are a kind of fat pointer).  These are able to pass through the LLVM pipeline and we then lower them to special instructions in the various targets that understand that pointers are a hardware-enforced type.  It would be possible to add a late pass that then expands these into a StructType that contains the address and whatever metadata you want, and expands loads and stores to use the address component.  Note that if you want to hoist checks out of loops, you will want to do this expansion somewhere in the middle of your pass pipeline.
>
> The tricky part here is in function parameters: you cannot easily change the type of a function after it has been created.  Your best bet here is to always pass fat pointers as the structure representation.

Hi David,

Yes, passing pointers through function parameters and return values is tricky;
I got yelled at by the compiler when I tried to use llvm::Value::mutateType()
to change the prototype of a function. :-)

>
> LLVM does not assume that pointers are integers - we have done a lot of work to remove that assumption and the IR has always made the two types distinct.  There are still a few rough areas, but these are bugs.  We are able to compile nontrivial codebases (e.g. FreeBSD, WebKit) with optimisations enabled for targets where pointers and integers are distinct types at the hardware level.

You’re correct that LLVM does not assume that pointers are integers. I think
what I really meant in my previous email is that the current LLVM implementation
treats a PointerType as an integer, and this gives me trouble in tons of places,
such as the memory layout for a pointer. In your experience, do you think it’s feasible
to create a new LLVM Type that inherits from both llvm::PointerType and llvm::StructType
and modify the bitcode generator to support this new type?

Thanks,
- Jie

Re: [llvm-dev] Best way of implement a fat pointer for C

Jeremy Morse via llvm-dev
On 07/01/2020 16:00, Jie Zhou wrote:
>> LLVM does not assume that pointers are integers - we have done a lot of work to remove that assumption and the IR has always made the two types distinct.  There are still a few rough areas, but these are bugs.  We are able to compile nontrivial codebases (e.g. FreeBSD, WebKit) with optimisations enabled for targets where pointers and integers are distinct types at the hardware level.
> You’re correct that LLVM does not assume that pointers are integers. I think
> what I really tried to say in my previous email is that current LLVM implements
> PointerType as Integer and this gives me trouble in tons of places such as
> the memory layout for a pointer. In your experience, do you think it’s feasible
> to create a new llvm Type that inherits both llvm::PointerType and llvm::StructType
> and modify the bitcode generator to support this new type?

I don't think that this is something that would be maintainable long term.

For your use case, I would:

  - Pick an address space to use for fat pointers.
  - Define a DataLayout that gives pointers in that address space the correct size.
  - In Clang, use that address space for any pointers you wish.
  - Define intrinsics that get and set the various properties of the fat
pointer.
  - Define your calling convention to use a struct representation.
  - Use addrspacecast for converting where needed.
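The intrinsic and cast steps above can be modelled at the C level as follows. Everything here is a hypothetical stand-in: the functions play the role of the get/set intrinsics and of addrspacecast in each direction, and `FAT_META_UNKNOWN` is an assumed convention for pointers coming from ordinary address-space-0 code, which a real scheme would have to decide how to check.

```c
#include <assert.h>
#include <stdint.h>

/* Struct form of the fat pointer, as used across call boundaries. */
struct fat_ptr {
    void *addr;
    uint64_t meta;
};

/* Assumed metadata value for pointers that came from address space 0. */
#define FAT_META_UNKNOWN UINT64_MAX

/* Stand-ins for the get/set intrinsics. */
static void *fat_get_addr(struct fat_ptr p) { return p.addr; }
static uint64_t fat_get_meta(struct fat_ptr p) { return p.meta; }
static struct fat_ptr fat_set_meta(struct fat_ptr p, uint64_t m) {
    p.meta = m;
    return p;
}

/* Stand-in for addrspacecast from address space 0 into the fat space. */
static struct fat_ptr fat_from_thin(void *p) {
    struct fat_ptr f = { p, FAT_META_UNKNOWN };
    return f;
}

/* Stand-in for addrspacecast back to address space 0: metadata is dropped. */
static void *fat_to_thin(struct fat_ptr f) { return fat_get_addr(f); }
```

Once these are inlined, an expression such as `fat_get_meta(fat_set_meta(p, m))` folds to `m`, which is the kind of propagation the post-inlining pass would perform on the real intrinsics.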

At this point, you'll only use the struct representation across function
call boundaries.  After inlining, run a pass that propagates the values
from get and set intrinsics and eliminates the addrspacecasts, so that
after inlining you are just using pointers in your address space.

Then you run whatever mid-level optimisations you want (probably the
standard clang set).

Late in the pipeline, run a pass that transforms pointers in that address
space into pointers in AS0 (this is now a function-local transform) and
inserts the checks.

Finally, run any passes you need to hoist checks out of loops.
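The effect of hoisting can be seen in a small C sketch: instead of checking every access inside the loop, a single range check covering all the indices the loop will touch is done once before it. The `struct fat_ptr` here (address plus element count) and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fat pointer: address plus number of valid elements. */
struct fat_ptr {
    int *addr;
    size_t len;
};

/* Before hoisting: every access inside the loop is checked. */
static long sum_checked_each(struct fat_ptr p, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= p.len)
            abort();
        s += p.addr[i];
    }
    return s;
}

/* After hoisting: the loop touches indices [0, n), so one range check
   before the loop covers every access and the body runs unchecked. */
static long sum_checked_once(struct fat_ptr p, size_t n) {
    if (n > p.len)
        abort();
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += p.addr[i];
    return s;
}
```

One caveat: the hoisted version traps before any iteration runs rather than at the first bad access, which is only acceptable if the loop body has no side effects that must happen first, or if the safety scheme permits reporting the violation early.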

David