[llvm-dev] Position independent code writes absolute pointer


[llvm-dev] Position independent code writes absolute pointer

Gaier, Bjoern via llvm-dev

Hello everyone,

I have an issue with some code that I JIT and load as position-independent code (PIC). I suspect the issue cannot be solved, but I wanted to give it a try.

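    #include <stdio.h>

    int magicValue  = 123;
    int magicValue2 = 321;

    /* Statically initialized pointer: its value is resolved at load time. */
    volatile int *pValue = &magicValue;

    void printMagicValue(void)
    {
        printf("Planschi...\n");
        /* %p expects a void *, so the pointer arguments are cast. */
        printf("The magic value is %i 0x%p && 0x%p\n",
               magicValue, (void *)&magicValue, (void *)pValue);
    }

    void setMagicValue(int value)
    {
        magicValue = value;
    }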

This is the code that I load as PIC. For the JITTargetMachineBuilder (JTMB) I use the following settings:

    JTMB->setRelocationModel(llvm::Reloc::PIC_);
    JTMB->setCodeModel(llvm::CodeModel::Small);

The code is loaded into shared memory. Two processes execute it from there, calling "printMagicValue", "setMagicValue(120)" and "printMagicValue" again. Only the first process JITs the code; every other process accesses it through the shared memory.

The first process prints:

    Planschi...
    The magic value is 123 0x00000270BB090038 && 0x00000270BB090038
    Planschi...
    The magic value is 120 0x00000270BB090038 && 0x00000270BB090038

The second process prints:

    Planschi...
    The magic value is 120 0x00000237A5DE0038 && 0x00000270BB090038
    Planschi...
    The magic value is 120 0x00000237A5DE0038 && 0x00000270BB090038

The values are read correctly, hurray! My problem is that the pointer 'pValue' was written as an absolute address rather than in a position-independent form, so the second process prints the address from the first process. I had hoped that, since the code is PIC, pointers would also be stored in a position-independent way. I think I understand why this is not the case, but can I somehow change this behaviour without calculating the offsets myself? My overall goal is to share the entire code between two processes.
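
The observation above can be reproduced without a JIT. The following is only a sketch under assumptions: `struct image`, `load_and_remap` and `points_into_self` are hypothetical names, and the `memcpy` merely stands in for a second process seeing the same bytes at a different base address. It shows that a pointer initialized with an absolute address keeps referring to the first location.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout of the loaded data: one int and one pointer that
   was initialized with the address of that int. */
struct image {
    int  magicValue;
    int *pValue;
};

/* "Load" the image at `first`, resolving pValue to an absolute address,
   then copy the raw bytes to `second`, which stands in for the second
   process's view of the same memory at another base address. */
void load_and_remap(struct image *first, struct image *second)
{
    assert(first != NULL && second != NULL && first != second);
    first->magicValue = 123;
    first->pValue = &first->magicValue;    /* absolute address is baked in */
    memcpy(second, first, sizeof *first);  /* same bytes, different base */
}

/* Returns 1 if the pointer stored in `img` refers to img's own magicValue. */
int points_into_self(const struct image *img)
{
    return img->pValue == &img->magicValue;
}
```

Calling `load_and_remap(&a, &b)` on two distinct `struct image` objects leaves `b.pValue` pointing at `a.magicValue`, which mirrors the second process printing the first process's address.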

 

I hope my question is understandable, and I hope even more that there is a solution to this.

Thank you in advance for any help, and kind regards,
Björn

Registered as a GmbH in the commercial register of Bad Homburg v.d.H., HRB 9816, VAT ID no. DE 114 165 789. Managing directors: Dr. Hiroshi Nakamura, Dr. Robert Plank, Markus Bode, Heiko Lampert, Takashi Nagano, Takeshi Fukushima, Junichi Tajika
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] Position independent code writes absolute pointer

Gaier, Bjoern via llvm-dev

I wanted to add a thought to this:

Would it be possible to modify the code at the IR level so that it stores a PIC/offset address instead of an absolute address? I am not familiar with LLVM IR, so I don't know what is possible or how it would affect the code.

From: llvm-dev <[hidden email]> On Behalf Of Gaier, Bjoern via llvm-dev
Sent: 08 January 2020 16:29
To: [hidden email]
Subject: [llvm-dev] Position independent code writes absolute pointer

 

Re: [llvm-dev] Position independent code writes absolute pointer

Tim Northover via llvm-dev
Hi Gaier,

There's no way to do this automatically in LLVM at the moment. It
sounds kind of related to pointer compression techniques (also not
supported right now).

On Thu, 9 Jan 2020 at 08:14, Gaier, Bjoern via llvm-dev
<[hidden email]> wrote:
> Could it be possible to modify the code on the IR-Level to store PIC/offset address and not absolute address? I’m not familiar with the LLVM IR so I don’t know what is possible and how it affects the code at all.

It depends how much control you have over the code. You could
instrument code so that it converted all stores of pointers to be
relative to some fixed global (PC-relative doesn't work there because
it will be loaded at a different address, and "relative to the address
it's being stored to" would break memcpy). But that has some major
issues:

1. It's an ABI break, so you have to be able to recompile all code,
including any system libraries you make use of.
2. LLVM can only convert the pointers it knows about, so it would
still be broken by someone storing a pointer via an intptr_t cast and
probably other things I haven't thought of.
3. There probably isn't even a relocation for any statically
initialized pointers. You might be able to convert all of them to use
a dynamic module initializer instead though.
4. I'd expect debugging to go horribly wrong.
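
The remark about memcpy can be illustrated in plain C. This is a sketch, not LLVM functionality: `encode_rel`, `decode_rel` and `struct slots` are hypothetical names, and the integer arithmetic on pointers assumes a flat address space as on mainstream 64-bit targets. It compares an offset stored relative to its own slot with one stored relative to a fixed anchor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode pointer p as a signed byte distance from base. */
intptr_t encode_rel(const void *p, const void *base)
{
    assert(p != NULL && base != NULL);
    return (intptr_t)((uintptr_t)p - (uintptr_t)base);
}

/* Decode by adding the distance back to base. */
void *decode_rel(intptr_t off, const void *base)
{
    assert(base != NULL);
    return (void *)((uintptr_t)base + (uintptr_t)off);
}

struct slots {
    int      target;
    intptr_t a;   /* first place the encoded pointer is stored */
    intptr_t b;   /* second place, filled by a plain byte copy */
};

/* Offset relative to the slot itself: a byte copy to another slot
   changes the base, so decoding no longer yields &target. */
int self_relative_survives_copy(void)
{
    struct slots s = { 42, 0, 0 };
    s.a = encode_rel(&s.target, &s.a);
    memcpy(&s.b, &s.a, sizeof s.a);
    return decode_rel(s.b, &s.b) == (void *)&s.target;
}

/* Offset relative to one fixed anchor: the base is the same wherever
   the encoded value is copied, so decoding still yields &target. */
int anchor_relative_survives_copy(void)
{
    struct slots s = { 42, 0, 0 };
    const void *anchor = &s;
    s.a = encode_rel(&s.target, anchor);
    memcpy(&s.b, &s.a, sizeof s.a);
    return decode_rel(s.b, anchor) == (void *)&s.target;
}
```

The first function returns 0 and the second returns 1, which is why a fixed anchor is the more workable choice here.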

Cheers.

Tim.
Re: [llvm-dev] Position independent code writes absolute pointer

Gaier, Bjoern via llvm-dev
Hey Tim,

Thank you for the answer! I expected something like that sadly :<

However...
> It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major

This sounds interesting from a learning perspective, because I have never done something like that. Is this difficult to do? Also, why convert only the stores? Shouldn't I also convert the reads so that they are valid too?

Kind greetings
Björn

-----Original Message-----
From: Tim Northover <[hidden email]>
Sent: 09 January 2020 11:08
To: Gaier, Bjoern <[hidden email]>
Cc: [hidden email]
Subject: Re: [llvm-dev] Position independent code writes absolute pointer

Re: [llvm-dev] Position independent code writes absolute pointer

Tim Northover via llvm-dev
Hi Bjoern,

On Thu, 9 Jan 2020 at 10:34, Gaier, Bjoern <[hidden email]> wrote:
> > It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major
>
> This sounds interesting from a learning perspective, because I have never done something like that. Is this difficult to do? Also, why convert only the stores? Shouldn't I also convert the reads so that they are valid too?

Sorry, I meant to say you'd have to undo the transformation on the
loads (and atomicrmw, cmpxchg) too. I think getting something that
sometimes works would actually be quite easy. You'd want to make it a
ModulePass to handle the globals, then you'd iterate through each
function, turning a store like:

    store %type* %val, %type** %ptr

into:

    %val.int = ptrtoint %type* %val to i64
    %val.int.new = sub i64 %val.int, ptrtoint(i8* @__GLOBAL_ANCHOR to i64)
    %val.new = inttoptr i64 %val.int.new to %type*
    store %type* %val.new, %type** %ptr

The corresponding load side would add back @__GLOBAL_ANCHOR. At the
Module level you'd add some kind of tentative definition for
@__GLOBAL_ANCHOR so it can be merged if needed, and convert a definition
like:

    @var = global i8* @other_global

into

    @var = global i8* null
    define void @__MODULE_INIT() {
      ; Duplicate store code above to put a relative value for
      ; @other_global into @var
      ret void
    }
    %0 = type { i32, void ()*, i8* }
    @llvm.global_ctors = appending global [1 x %0]
        [%0 { i32 65535, void ()* @__MODULE_INIT, i8* null }]

Unfortunately I've also thought of a couple more nasty problems while
writing this out:
1. Things like target-specific vector intrinsics that do loads and
stores might obscure the fact that they're storing a pointer by
casting it to an i64 or something.
2. You'd have to make sure the stack for both programs was in the
shared region, or that no one ever used a pointer to a local variable.
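
Setting those problems aside, the store rewrite, the matching load, and the module initializer described above can be imitated in plain C. This is a rough sketch under assumptions: `struct region`, `store_ptr`, `load_ptr`, `module_init` and `remap` are hypothetical names, the `anchor` field plays the role of @__GLOBAL_ANCHOR, and the slot is a `uintptr_t` rather than a pointer type as in the IR. The `memcpy` in `remap` only stands in for a second mapping of the same memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared region; `anchor` plays the role of @__GLOBAL_ANCHOR. */
struct region {
    char      anchor;
    int       magicValue;
    uintptr_t pValue;      /* anchor-relative encoding of an int * */
};

/* Store side: subtract the anchor address before writing the pointer. */
void store_ptr(struct region *r, uintptr_t *slot, int *val)
{
    assert(r != NULL && slot != NULL);
    *slot = (uintptr_t)val - (uintptr_t)&r->anchor;
}

/* Load side: add the anchor address back after reading the slot. */
int *load_ptr(const struct region *r, const uintptr_t *slot)
{
    assert(r != NULL && slot != NULL);
    return (int *)((uintptr_t)&r->anchor + *slot);
}

/* Stand-in for @__MODULE_INIT: the static initializer of pValue is
   replaced by a store of the relative value at startup. */
void module_init(struct region *r)
{
    r->magicValue = 123;
    store_ptr(r, &r->pValue, &r->magicValue);
}

/* Stand-in for a second mapping: same bytes at a different address. */
void remap(struct region *dst, const struct region *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

After `module_init(&first)` and `remap(&second, &first)`, `load_ptr(&second, &second.pValue)` yields `&second.magicValue`, because each view adds its own anchor address to the same stored offset.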

Cheers.

Tim.
Re: [llvm-dev] Position independent code writes absolute pointer

Gaier, Bjoern via llvm-dev
Hello Tim,

Thank you very much for the code! It seems I have to learn more about LLVM assembly to understand everything in detail.

It is still a little sad that this behaviour cannot be achieved, but I understand more and more why it is not possible. Thank you very much!

Kind greetings
Björn

-----Original Message-----
From: Tim Northover <[hidden email]>
Sent: 09 January 2020 12:28
To: Gaier, Bjoern <[hidden email]>
Cc: [hidden email]
Subject: Re: [llvm-dev] Position independent code writes absolute pointer
