[MCJIT] Multiple GOT handling in RuntimeDyldELF

[MCJIT] Multiple GOT handling in RuntimeDyldELF

Keno Fischer
Hello everyone,

As part of my quest to add TLS relocation support to MCJIT, I've been taking a closer look at the GOT implementation in RuntimeDyldELF, and I believe that it is not valid as currently implemented. In particular, I am wondering about the multiple GOT handling support introduced in r192020. If I understand correctly, this can make code reuse a GOT entry in a different object file. This doesn't seem correct to me, as there is no guarantee that the loaded object files are allocated within 2GB of each other in memory. What was the intended use case of this feature? Additionally, it seems that currently every access through the GOT gets its own entry, when identical relocations could be combined into one entry. The GOTEntries array is also never cleared, causing memory and performance problems when loading multiple object files (this is a bug and easily fixed, but it makes me think this feature isn't particularly well tested). I'm planning to redesign the GOT mechanism, but I would like to understand the use case intended in r192020 first, to make sure I don't design myself into a corner.
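A minimal sketch of the two fixes described above (one slot per distinct symbol, and GOT state that is reset for each object file) might look like the following. `ObjectGOT`, `getOrCreateEntry` and `clear` are hypothetical names chosen for illustration, not RuntimeDyldELF's actual API. Keying slots by symbol name alone assumes the slot holds just the symbol's address; for a GOTPCREL access the relocation's addend goes into the PC-relative displacement, not into the slot.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-object GOT builder; names are illustrative and are not
// RuntimeDyldELF's API. Each distinct symbol gets exactly one slot.
class ObjectGOT {
public:
  static constexpr uint64_t EntrySize = 8; // one 64-bit address per slot

  // Returns the byte offset of Symbol's slot, creating the slot only the
  // first time the symbol is referenced through the GOT.
  uint64_t getOrCreateEntry(const std::string &Symbol) {
    assert(!Symbol.empty() && "GOT entries need a symbol");
    auto It = SlotOf.find(Symbol);
    if (It != SlotOf.end())
      return It->second * EntrySize;
    uint64_t Slot = Symbols.size();
    Symbols.push_back(Symbol);
    SlotOf.emplace(Symbol, Slot);
    return Slot * EntrySize;
  }

  uint64_t numEntries() const { return Symbols.size(); }
  uint64_t sizeInBytes() const { return numEntries() * EntrySize; }

  // Drops all state once this object is finished, so nothing accumulates
  // across object files.
  void clear() {
    Symbols.clear();
    SlotOf.clear();
  }

private:
  std::vector<std::string> Symbols;                 // slot index -> symbol
  std::unordered_map<std::string, uint64_t> SlotOf; // symbol -> slot index
};
```

Repeated references to the same symbol return the same offset, and calling `clear()` after each object keeps the table from growing without bound.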

Thanks,
Keno

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: [MCJIT] Multiple GOT handling in RuntimeDyldELF

Keno Fischer
FWIW, I verified that we do indeed crash with an assertion failure if two copies of

declare void @global_foo()

define internal void @foo() {
  call void @global_foo()
  ret void
}

are loaded too far apart in memory (with a definition of global_foo anywhere in the address space).


On Sun, Jan 18, 2015 at 2:38 PM, Keno Fischer <[hidden email]> wrote:
Hello everyone,

As part of my quest to add TLS relocation support to MCJIT, I've been taking a closer look at the GOT implementation in RuntimeDyldELF, and I believe that it is not valid as currently implemented. In particular, I am wondering about the multiple GOT handling support introduced in r192020. If I understand correctly, this can make code reuse a GOT entry in a different object file. This doesn't seem correct to me, as there is no guarantee that the loaded object files are allocated within 2GB of each other in memory. What was the intended use case of this feature? Additionally, it seems that currently every access through the GOT gets its own entry, when identical relocations could be combined into one entry. The GOTEntries array is also never cleared, causing memory and performance problems when loading multiple object files (this is a bug and easily fixed, but it makes me think this feature isn't particularly well tested). I'm planning to redesign the GOT mechanism, but I would like to understand the use case intended in r192020 first, to make sure I don't design myself into a corner.

Thanks,
Keno


Re: [MCJIT] Multiple GOT handling in RuntimeDyldELF

Kaylor, Andrew

Hi Keno,

I _think_ that the GOT support we currently have can be made to work if the memory manager provides the necessary help (more on that below), but I will readily admit that it is implemented in a fairly non-standard way that is likely to seem completely wrong on first inspection (and probably still seems at least slightly wrong on second inspection). It may also have inherent limitations that can’t be overcome without a redesign, but if so I don’t know what those limitations might be.

It may be helpful to refer to the comments in my original GOT implementation patch (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184265.html) when trying to decipher the intent of the existing code, as unfortunately I seem to have said quite a bit more there than I did in the actual code comments.

I’m pretty sure that the “multiple GOT” patch was intended to support the case where additional modules are loaded after finalizeLoad() has been called. It looks like we were at some point trying to use a single GOT for all modules, but once it had been “finalized” another GOT had to be created for subsequent loads. It’s been a while since I looked at this code, but I believe that we defer calculating the offsets for the GOT until a “finalize” is performed. This is because the memory for loaded sections may be remapped before that time to handle remote (or out-of-process) execution. It appears that we are also deferring allocation of the GOT section memory until this time.

With regard to the 2 GB+ offset problem, we’re dependent on the memory manager. Even with a single object being loaded there is no guarantee that the memory allocated for the GOT section will be within 2 GB of the memory allocated for other sections unless the memory manager does something to make it so. An interface was added sometime in the past year (I think) that optionally pre-calculates the amount of memory that will be needed for an object load so that the memory manager can allocate all of this memory as a single block. I’m not sure this interface properly accounts for the possibility of GOT sections, and I don’t know how it works with multiple modules.
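To make the single-block idea concrete, here is a sketch of how a loader might compute one allocation size covering every section of an object plus its GOT. `SectionRequest`, `alignUp` and `singleBlockSize` are hypothetical helpers, not the memory-manager interface mentioned above. The sketch assumes the block's base address is aligned to the largest section alignment and that each GOT slot is an 8-byte pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one section the loader must place.
struct SectionRequest {
  uint64_t Size;
  uint64_t Align; // must be a power of two
};

// Rounds Value up to the next multiple of the power-of-two Align.
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Bytes needed to lay out all of one object's sections back to back,
// followed by an 8-byte-aligned GOT of NumGOTEntries 64-bit slots.
uint64_t singleBlockSize(const std::vector<SectionRequest> &Sections,
                         uint64_t NumGOTEntries) {
  uint64_t Offset = 0;
  for (const SectionRequest &S : Sections)
    Offset = alignUp(Offset, S.Align) + S.Size;
  if (NumGOTEntries != 0)
    Offset = alignUp(Offset, 8) + NumGOTEntries * 8;
  return Offset;
}
```

If the resulting size stays below 2GB, PC-relative references between that object's sections, including references into its own GOT, should be within 32-bit reach regardless of where the block itself lands.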

The default memory manager attempts to use system address hints to allocate sections in the same region of the address space, but not all OSs support the flags we’d like to use, and the address requests are never guaranteed to be respected. FWIW, Address Sanitizer is very good at exposing issues of this sort.

I should also mention that there is some variation in how GOT-related issues are handled from architecture to architecture within RuntimeDyldELF. When I implemented the GOT support, I intended for it to be capable of supporting any architecture, but there was some support for GOT-related relocations for non-x86 platforms that pre-dated my GOT implementation, and I suspect those will continue to be used as long as they are working correctly. For instance, several architectures extend the allocated size of code sections and use the extra space at the end of the section to create stubs for PC-relative function calls.
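The stub technique can be sketched as follows: the section is allocated with extra room at the end, and each distinct call target gets one stub there, so the call's 32-bit displacement only has to reach the end of its own section while the stub performs a full 64-bit jump. `CodeSectionWithStubs` is a hypothetical name, and the stub shown (`jmp *0(%rip)` followed by an 8-byte absolute address, 14 bytes in total) is one common x86-64 form assumed for illustration, not necessarily the exact sequence any RuntimeDyldELF backend emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical code section whose allocation is padded with a stub area.
class CodeSectionWithStubs {
public:
  static constexpr size_t StubSize = 14; // FF 25 <disp32 = 0> <8-byte address>

  CodeSectionWithStubs(size_t CodeSize, size_t MaxStubCount)
      : Mem(CodeSize + MaxStubCount * StubSize, 0), StubBase(CodeSize),
        MaxStubs(MaxStubCount) {}

  // Returns the offset of a stub that jumps to Target, emitting one only
  // the first time Target is seen.
  size_t getOrCreateStub(uint64_t Target) {
    auto It = StubFor.find(Target);
    if (It != StubFor.end())
      return It->second;
    assert(StubFor.size() < MaxStubs && "stub area exhausted");
    size_t Off = StubBase + StubFor.size() * StubSize;
    uint8_t *P = Mem.data() + Off;
    P[0] = 0xFF; // jmp *disp32(%rip)
    P[1] = 0x25;
    std::memset(P + 2, 0, 4);       // disp32 = 0: the address follows directly
    std::memcpy(P + 6, &Target, 8); // absolute target in host byte order
                                    // (assumes a little-endian host, as x86-64)
    StubFor.emplace(Target, Off);
    return Off;
  }

  const std::vector<uint8_t> &bytes() const { return Mem; }

private:
  std::vector<uint8_t> Mem;           // code bytes followed by the stub area
  size_t StubBase;                    // offset where the stub area begins
  size_t MaxStubs;                    // capacity reserved at allocation time
  std::map<uint64_t, size_t> StubFor; // target address -> stub offset
};
```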

Let me know if there’s anything more I can do to help you get things working.

-Andy

From: Keno Fischer [mailto:[hidden email]]
Sent: Sunday, January 18, 2015 5:38 AM
To: LLVM Developers Mailing List; Lang Hames; Kaylor, Andrew; Thirumurthi, Ashok
Subject: [MCJIT] Multiple GOT handling in RuntimeDyldELF

 

Hello everyone,

 

As part of my quest to add TLS relocation support to MCJIT, I've been taking a closer look at the GOT implementation in RuntimeDyldELF, and I believe that it is not valid as currently implemented. In particular, I am wondering about the multiple GOT handling support introduced in r192020. If I understand correctly, this can make code reuse a GOT entry in a different object file. This doesn't seem correct to me, as there is no guarantee that the loaded object files are allocated within 2GB of each other in memory. What was the intended use case of this feature? Additionally, it seems that currently every access through the GOT gets its own entry, when identical relocations could be combined into one entry. The GOTEntries array is also never cleared, causing memory and performance problems when loading multiple object files (this is a bug and easily fixed, but it makes me think this feature isn't particularly well tested). I'm planning to redesign the GOT mechanism, but I would like to understand the use case intended in r192020 first, to make sure I don't design myself into a corner.

 

Thanks,

Keno


Re: [MCJIT] Multiple GOT handling in RuntimeDyldELF

Keno Fischer


On Mon, Jan 19, 2015 at 8:51 PM, Kaylor, Andrew <[hidden email]> wrote:

> Hi Keno,
>
> I _think_ that the GOT support we currently have can be made to work if the memory manager provides the necessary help (more on that below), but I will readily admit that it is implemented in a fairly non-standard way that is likely to seem completely wrong on first inspection (and probably still seems at least slightly wrong on second inspection). It may also have inherent limitations that can’t be overcome without a redesign, but if so I don’t know what those limitations might be.
>
> It may be helpful to refer to the comments in my original GOT implementation patch (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184265.html) when trying to decipher the intent of the existing code, as unfortunately I seem to have said quite a bit more there than I did in the actual code comments.
>
> I’m pretty sure that the “multiple GOT” patch was intended to support the case where additional modules are loaded after finalizeLoad() has been called. It looks like we were at some point trying to use a single GOT for all modules, but once it had been “finalized” another GOT had to be created for subsequent loads. It’s been a while since I looked at this code, but I believe that we defer calculating the offsets for the GOT until a “finalize” is performed. This is because the memory for loaded sections may be remapped before that time to handle remote (or out-of-process) execution. It appears that we are also deferring allocation of the GOT section memory until this time.


We call finalizeLoad for every object, I believe, so we essentially end up with one GOT per object file anyway. We defer filling in (though not allocating) the GOT until we call resolveRelocations.
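The split described here, with GOT space reserved during loading but slot contents written only at relocation-resolution time, might be sketched like this. `DeferredGOT`, `reserve` and `resolve` are hypothetical names, and `resolve` merely plays the role that resolveRelocations() plays in MCJIT. The point is that slot contents depend on final (possibly remapped) symbol addresses, so they cannot be written any earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical GOT whose space is reserved at load time but whose slot
// contents are written only once final symbol addresses are known.
class DeferredGOT {
public:
  // Load time: reserve a slot for Symbol and return its index. The slot's
  // memory exists now; its value is a placeholder.
  size_t reserve(const std::string &Symbol) {
    assert(!Resolved && "this sketch does not add slots after resolution");
    Symbols.push_back(Symbol);
    Slots.push_back(0);
    return Slots.size() - 1;
  }

  // Resolution time: look up each symbol's final address and write it.
  void resolve(const std::function<uint64_t(const std::string &)> &Lookup) {
    for (size_t I = 0; I < Slots.size(); ++I)
      Slots[I] = Lookup(Symbols[I]);
    Resolved = true;
  }

  bool isResolved() const { return Resolved; }

  uint64_t slot(size_t I) const {
    assert(Resolved && "GOT slot read before resolve()");
    return Slots.at(I);
  }

private:
  std::vector<std::string> Symbols; // slot index -> symbol name
  std::vector<uint64_t> Slots;      // slot index -> address (0 until resolved)
  bool Resolved = false;
};
```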

> With regard to the 2 GB+ offset problem, we’re dependent on the memory manager. Even with a single object being loaded there is no guarantee that the memory allocated for the GOT section will be within 2 GB of the memory allocated for other sections unless the memory manager does something to make it so. An interface was added sometime in the past year (I think) that optionally pre-calculates the amount of memory that will be needed for an object load so that the memory manager can allocate all of this memory as a single block. I’m not sure this interface properly accounts for the possibility of GOT sections, and I don’t know how it works with multiple modules.


While this is true, it's actually not the case I'm worried about. The case I'm worried about is where we load enough object files that together they span more than 2GB of address space (this doesn't even have to be 2GB worth of code; for example, I hit this with msan). The current interface basically forces all code to fit within 2GB, which is precisely what the GOT is supposed to avoid.

Just to be very explicit, the case I'm concerned about is:

- Allocate Object file 1 with GOTPCREL to `foo`
- [ Allocate 2GB worth of other data ]
- Allocate Object file 2 with GOTPCREL to `foo`

Object file 2 will reuse Object file 1's GOT entry (though we still allocate space in Object file 2's own GOT, so it's not as if we're doing this to save memory).
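The arithmetic behind this failure can be checked directly. On x86-64, an R_X86_64_GOTPCREL relocation stores (address of the GOT slot + addend - address of the relocated field) in a signed 32-bit field. The sketch below uses hypothetical function names and made-up addresses: reusing Object file 1's slot from code placed roughly 3GB later overflows that field, while a slot in Object file 2's own GOT fits easily.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// R_X86_64_GOTPCREL: the 32-bit field receives GOTSlotAddr + Addend - PC,
// where PC is the address of the field being relocated.
bool gotpcrelFits(uint64_t GOTSlotAddr, int64_t Addend, uint64_t PC) {
  // Unsigned subtraction wraps; casting back yields the signed distance
  // for any two user-space addresses.
  int64_t Value = static_cast<int64_t>(GOTSlotAddr - PC) + Addend;
  return Value >= std::numeric_limits<int32_t>::min() &&
         Value <= std::numeric_limits<int32_t>::max();
}

// Encodes the field, asserting on overflow (when assertions are enabled).
int32_t encodeGOTPCREL(uint64_t GOTSlotAddr, int64_t Addend, uint64_t PC) {
  assert(gotpcrelFits(GOTSlotAddr, Addend, PC) &&
         "GOTPCREL displacement out of 32-bit range");
  return static_cast<int32_t>(static_cast<int64_t>(GOTSlotAddr - PC) + Addend);
}
```

With a slot in Object file 1's GOT at 0x10001000 and the relocated field in Object file 2 at 0xD0000010, `gotpcrelFits(0x10001000, -4, 0xD0000010)` is false; a slot in Object file 2's own GOT at 0xD0001000 encodes to 0xFEC.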

> The default memory manager attempts to use system address hints to allocate sections in the same region of the address space, but not all OSs support the flags we’d like to use, and the address requests are never guaranteed to be respected. FWIW, Address Sanitizer is very good at exposing issues of this sort.


Yes, I agree this is a concern, though it seems solvable to always allocate a single object file within 2GB; what doesn't seem right is imposing the restriction that all code ever loaded has to fit within 2GB.
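A per-object check along those lines could look like the sketch below. It only requires that one object's own allocations (code, data and its GOT) span no more than INT32_MAX bytes, and places no constraint on where different objects land relative to each other. `Allocation` and `objectFitsInRel32Window` are hypothetical names, and small relocation addends are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// One allocation (code, data, or GOT) belonging to a single object file.
struct Allocation {
  uint64_t Addr;
  uint64_t Size;
};

// True if the object's allocations span at most INT32_MAX bytes, so a rel32
// displacement between any two of them fits (ignoring small addends).
bool objectFitsInRel32Window(const std::vector<Allocation> &Allocs) {
  assert(!Allocs.empty() && "an object needs at least one allocation");
  uint64_t Lo = std::numeric_limits<uint64_t>::max();
  uint64_t Hi = 0;
  for (const Allocation &A : Allocs) {
    Lo = std::min(Lo, A.Addr);
    Hi = std::max(Hi, A.Addr + A.Size);
  }
  const uint64_t Limit =
      static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
  return Hi - Lo <= Limit;
}
```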

> I should also mention that there is some variation in how GOT-related issues are handled from architecture to architecture within RuntimeDyldELF. When I implemented the GOT support, I intended for it to be capable of supporting any architecture, but there was some support for GOT-related relocations for non-x86 platforms that pre-dated my GOT implementation, and I suspect those will continue to be used as long as they are working correctly. For instance, several architectures extend the allocated size of code sections and use the extra space at the end of the section to create stubs for PC-relative function calls.


Yes, I've seen this code. 

> Let me know if there’s anything more I can do to help you get things working.


Thanks for replying. I have a half-way functioning prototype that makes GOTs local to each object file again and also deduplicates GOTEntries where possible. I'll finish it up and post it here as soon as I can.

> -Andy

Re: [MCJIT] Multiple GOT handling in RuntimeDyldELF

Lang Hames
Hi Keno, Andy,

> > Let me know if there’s anything more I can do to help you get things working.
>
> Thanks for replying. I have a half-way functioning prototype that makes GOTs local to each object file again and also deduplicates GOTEntries where possible. I'll finish it up and post it here as soon as I can.

RuntimeDyldMachO uses this scheme (though it makes no effort to de-duplicate GOT entries yet) for the reasons that Keno highlighted. I'm glad to see it being implemented in RuntimeDyldELF.

Thanks for looking into this, Keno!

Cheers,
Lang.


