A number of newbie questions

Marcel Weiher
Hi,

I am currently experimenting with LLVM to provide native code
compilation services for a project of mine I call Objective-Smalltalk,
and so far I am quite pleased with the results.  I was able to
JIT-compile some functions that send Objective-C messages, and now
look forward to compiling full methods.

I do have a couple of questions that I haven't been able to answer  
after looking through what I think is the available documentation:

1. Executable size

Executables appear to be gargantuan: a framework that wraps the parts
required for the above functionality weighs in at 13 MB fully
stripped (strip -x) and at 72 MB (!) with debugging symbols.  Is there
any way of significantly reducing this size, at present or planned for
the future?

2. Global (Function) naming

It appears that I have to give 'functions' a global/module visible  
name in order to create them, which is a bit odd for the case of  
compiling methods, as their "name" is really more a function of where  
they get stuffed in the method table of the class in question,  
something I might not even know at the time I am compiling the  
method.  Also these names seem to actually exist in the global  
function/symbol namespace of the running program, or at least  
interact with it.

I currently just synthesize a dummy name from the address of the  
object in question, but that's really a bit of a hack.  Is there some  
way of interacting with LLVM without having to interact with this  
global namespace?

3. Modules / JITs / functions

As far as I can tell, I need a 'Module' in order to create a
function; at least, that's the only way I've been able to make it work
so far, but I am not really clear why this should be the case.  Of
course, I also need this Module to create the JIT (or do I?).  I've
now made the Module (or rather my wrapper) a singleton, effectively a
global, but I don't feel very comfortable about it.  I also
remember some issues with not being able to create a second JIT
later, so it seems I can have only one module per lifetime of a
process that wants to do jitting.

Is this correct or am I missing something?

4. Jitted functions / ownership / memory

Once a function is jitted I can get a function pointer to it and call  
it, that's great.  Can I also find out how long it is, for example if  
I wanted to write an object file?  All in all, the jit-result seems  
to be fairly opaque and hidden.  Is this intentional, or is there  
more I am missing?


Thanks,

Marcel

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Re: A number of newbie questions

Chris Lattner
On Mon, 9 Jan 2006, Marcel Weiher wrote:
> I am currently experimenting with LLVM to provide native code
> compilation services for a project of mine I call Objective-Smalltalk,
> and so far quite pleased with the results.  I was able to JIT-compile
> some functions that send Objective-C messages, and now look forward to
> compiling full methods.

Cool!

> I do have a couple of questions that I haven't been able to answer after
> looking through what I think is the available documentation:
>
> 1. Executable size
>
> Executables appear to be gargantuan, a framework that wraps the parts
> required for the above functionality weighs in at 13 MB fully stripped (
> -x ) and at 72 MB (!) with debugging symbols.  Is there any way of
> significantly reducing this size, at present or planned in the future?

It depends on what you're building.  A release build of LLVM (make
ENABLE_OPTIMIZED=1, with the results in llvm/Release) is significantly
smaller than a debug build.  Even with that, however, the binaries are
larger than they should be (5M?).  To my knowledge, no one has spent the
time to track down why.

> 2. Global (Function) naming
>
> It appears that I have to give 'functions' a global/module visible name
> in order to create them, which is a bit odd for the case of compiling
> methods, as their "name" is really more a function of where they get
> stuffed in the method table of the class in question, something I might
> not even know at the time I am compiling the method.  Also these names
> seem to actually exist in the global function/symbol namespace of the
> running program, or at least interact with it.

You can use "" for the name.  Multiple functions are allowed to have "" as
a name without problem.
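To make the point concrete, here is a toy C++ model of the behavior described above. `ToyModule` and `ToyFunction` are hypothetical names, not LLVM classes: the module owns every function it creates, but only functions with a non-empty name are entered into its name-lookup table, so any number of unnamed functions can coexist and are reachable only through the pointer returned when they are created.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a function definition (not llvm::Function).
struct ToyFunction {
    std::string name;  // empty string means "unnamed"
};

// Hypothetical stand-in for a module: it owns all of its functions,
// but only named ones are visible to lookups by name.
class ToyModule {
public:
    ToyFunction* createFunction(const std::string& name) {
        functions_.push_back(std::make_unique<ToyFunction>(ToyFunction{name}));
        ToyFunction* f = functions_.back().get();
        if (!name.empty()) {
            // Keeps the first function registered under a name; a real
            // symbol table must also resolve name clashes (this toy does not).
            symbols_.emplace(name, f);
        }
        return f;  // an unnamed function is reachable only via this pointer
    }

    ToyFunction* lookup(const std::string& name) const {
        auto it = symbols_.find(name);
        return it == symbols_.end() ? nullptr : it->second;
    }

    std::size_t size() const { return functions_.size(); }
    std::size_t namedCount() const { return symbols_.size(); }

private:
    std::vector<std::unique_ptr<ToyFunction>> functions_;  // ownership
    std::map<std::string, ToyFunction*> symbols_;          // name lookup only
};
```

A JIT client in this style keeps the returned object (or the native code pointer compiled from it) rather than looking anything up by name.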

> I currently just synthesize a dummy name from the address of the object
> in question, but that's really a bit of a hack.  Is there some way of
> interacting with LLVM without having to interact with this global
> namespace?

Yup :)
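With unnamed functions, a compiled method can be identified by the slot it is installed in rather than by a symbol. A minimal sketch, assuming a hypothetical per-class `MethodTable` keyed by selector string and a single `int(int)` method signature for simplicity. In practice the stored pointers would be native code addresses obtained from the JIT (e.g. via `ExecutionEngine::getPointerToFunction`); here ordinary C++ functions stand in for them.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Signature shared by every method in this toy example (hypothetical;
// real Objective-C methods follow the ObjC calling convention).
using MethodImpl = int (*)(int receiver);

// Hypothetical per-class method table: a method's identity is the
// selector it is installed under, not a global symbol name.
class MethodTable {
public:
    void install(const std::string& selector, MethodImpl impl) {
        table_[selector] = impl;  // a later install replaces an earlier one
    }

    int send(const std::string& selector, int receiver) const {
        auto it = table_.find(selector);
        if (it == table_.end()) {
            throw std::runtime_error("doesNotUnderstand: " + selector);
        }
        return it->second(receiver);
    }

private:
    std::map<std::string, MethodImpl> table_;
};

// Stand-ins for code a JIT might hand back.
int doubleIt(int x) { return 2 * x; }
int negate(int x) { return -x; }
```

Because the table, not the symbol namespace, decides what a selector maps to, the selector can be chosen after the function has been compiled.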

> 3. Modules / JITs / functions
>
> As far as I can tell, I need a 'Module' in order to create a function,
> at least that's the only way I've been able to make it work so far, but
> I am not really clear why this should be the case.

Yes, Function objects must be embedded into Module objects for the LLVM
code to be well formed.

> Of course, I also
> need this Module to create the JIT (or do I?).

Yes, the JIT does need a module to know where to get code to compile from.

> I've now made the Module
> (or rather my wrapper) a singleton, effectively a global, but I don't
> feel very comfortable about it.

This should work.  Think of it as just a container for the LLVM code you
are creating.

> I also remember some issues with not being able to create a second
> JIT later, so it seems like one module per lifetime of a process that
> wants to do jitting.

I'm not sure what you mean here.

> 4. Jitted functions / ownership / memory
>
> Once a function is jitted I can get a function pointer to it and call
> it, that's great.  Can I also find out how long it is, for example if I
> wanted to write an object file?
> All in all, the jit-result seems to be
> fairly opaque and hidden.  Is this intentional, or is there more I am
> missing?

There are ways, but there isn't an elegant public interface for this yet.
For a couple of reasons, it is tricky to JIT code to memory, then wrap it
up into an object file (in particular, the JIT'd code is already
relocated).  The start of a direct ELF writer is available in
lib/CodeGen/ELFWriter.cpp, but it is not complete yet.  It uses the same
codegen interfaces as the JIT to do the writing.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/

Re: A number of newbie questions

Marcel Weiher
Hi Chris,

thanks for your answers!

[large executables]

> It depends on what you're building.  A release build of LLVM (make
> ENABLE_OPTIMIZED=1, with the results in llvm/Release) is
> significantly smaller than a debug build.  Even with that, however,
> the binaries are larger than they should be (5M?).  To my knowledge,
> no one has spent the time to track down why.

OK, 5 MB seems a lot better; I'll try doing a release build to see if
that gets me to that point.

Ahh...yes, one thing I like about multi-CPU machines is that they  
make background compiles very smooth.  Anyway, the framework is now  
down to 5 MB, 4 MB after stripping with -x, and that compresses down  
to around 1.1 MB with gzip, so quite good enough for now.  Lovely!

[thanks for the ""-function-name trick]

>> 3. Modules / JITs / functions
[...]
>> I've now made the Module (or rather my wrapper) a singleton,  
>> effectively a global, but I don't feel very comfortable about it.
>
> This should work.  Think of it as just a container for the LLVM code
> you are creating.

Yeah, but I really don't like globals, especially if they accumulate  
stuff as this one does.  It would be *great* if there were a way to  
isolate these guys, but I haven't found one yet.

>> I also remember some issues with not being able to create a
>> second JIT later, so it seems like one module per lifetime of a  
>> process that wants to do jitting.
>
> I'm not sure what you mean here.

In my unit test code, I tried to allocate a new JIT for each test in  
order to isolate the tests (not really a conscious decision, more  
standard operating procedure).  The program crashed once I tried to  
use the second allocated JIT.

Combining this (possibly flawed) observation with the fact that a JIT  
has to be initialized with a module, it seems that you can only have  
a single module in a process (as having a second module would require  
a second JIT).

It is quite likely that I was doing something wrong at the time
(these were my very first baby steps), but from what I've gleaned it
*appears* that LLVM more or less expects these to be singletons, or
at the very least expects some sort of hierarchical invocation as you
would see in a command line compiler, and it also expects a process
to do a (big) compilation job and then exit.  Is this impression
correct or am I misinterpreting my initial experiences?


>> 4. Jitted functions / ownership / memory
>>
>> Once a function is jitted I can get a function pointer to it and  
>> call it, that's great.  Can I also find out how long it is, for  
>> example if I wanted to write an object file?
>> All in all, the jit-result seems to be fairly opaque and hidden.  
>> Is this intentional, or is there more I am missing?
>
> There are ways, but there isn't an elegant public interface for  
> this yet.
> For a couple of reasons, it is tricky to JIT code to memory, then  
> wrap it up into an object file (in particular, the JIT'd code is  
> already relocated).

OK, writing an object file was possibly not the best example, but it
would be good to be able to take control of the result and manage
its lifecycle.  For example, imagine an IDE-type environment where
you want to overwrite a particular method (and not necessarily with
code coming from LLVM).

> The start of a direct ELF writer is available in lib/CodeGen/
> ELFWriter.cpp, but it is not complete yet.  It uses the same  
> codegen interfaces as the JIT to do the writing.

Very cool, will have to take a look at that...though what I will  
need, at least initially, is Mach-O, not ELF... :-)

Thanks again,

Marcel

--
Marcel Weiher                          Metaobject Software Technologies
[hidden email]         www.metaobject.com
The simplicity of power            HOM, IDEAs, MetaAd etc.

Re: A number of newbie questions

Chris Lattner
On Mon, 9 Jan 2006, Marcel Weiher wrote:

> [large executables]
>
>> It depends on what you're building.  A release build of LLVM (make
>> ENABLE_OPTIMIZED=1, with the results in llvm/Release) is significantly
>> smaller than a debug build.  Even with that, however, the binaries are
>> larger than they should be (5M?).  To my knowledge, no one has spent
>> the time to track down why.
> Ahh...yes, one thing I like about multi-CPU machines is that they make
> background compiles very smooth.  Anyway, the framework is now down to 5
> MB, 4 MB after stripping with -x, and that compresses down to around 1.1
> MB with gzip, so quite good enough for now.  Lovely!

Great!

> [thanks for the ""-function-name trick]

No problem.

>>> 3. Modules / JITs / functions
> [...]
>>> I've now made the Module (or rather my wrapper) a singleton,
>>> effectively a global, but I don't feel very comfortable about it.
>>
>> This should work.  Think of it as just a container for the LLVM code
>> you are creating.
>
> Yeah, but I really don't like globals, especially if they accumulate
> stuff as this one does.  It would be *great* if there were a way to
> isolate these guys, but I haven't found one yet.

Yeah, I understand.  Just file it away wherever you put your JIT...
leading to...

>>> I also remember some issues with not being able to create a
>>> second JIT later, so it seems like one module per lifetime of a
>>> process that wants to do jitting.
>>
>> I'm not sure what you mean here.
>
> In my unit test code, I tried to allocate a new JIT for each test in
> order to isolate the tests (not really a conscious decision, more
> standard operating procedure).  The program crashed once I tried to use
> the second allocated JIT.

Hrm.  This is a bug; please try to track down how it's crashing and we
will try to fix it.

> Combining this (possibly flawed) observation with the fact that a JIT
> has to be initialized with a module, it seems that you can only have a
> single module in a process (as having a second module would require a
> second JIT).

Nope, other code creates multiple modules at the same time.  Nothing that
I'm aware of tries to create multiple JITs, though, so this is probably a
JIT bug.

> It is quite likely that I was doing something wrong at the time (these
> were my very first baby steps), but from what I've gleaned it *appears*
> that LLVM more or less expects these to be singletons, or at the very
> least expects some sort of hierarchical invocation as you would see in
> a command line compiler, and it also expects a process to do a (big)
> compilation job and then exit.  Is this impression correct or am I
> misinterpreting my initial experiences?

I wouldn't be surprised if there was a hidden assumption that the JIT had
to be a singleton, but Modules shouldn't be.
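Following the suggestion to keep the Module wherever the JIT lives, one way to avoid a process-wide global is a session object that owns both and is passed explicitly to the code that compiles methods. The sketch below uses hypothetical stub types (`ModuleStub`, `EngineStub`, `CompilerSession`) rather than LLVM's classes and shows only the ownership shape. Whether two real JIT instances can coexist is exactly the open question in this thread, so with the current JIT you may still need to create just one session per process.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; real code would wrap an LLVM Module and the
// execution engine created from it.
struct ModuleStub {
    std::vector<std::string> functions;
};

struct EngineStub {
    explicit EngineStub(ModuleStub& m) : module(m) {}
    ModuleStub& module;  // the engine compiles code from this module
};

// Owns the module and the engine together so their lifetimes match and
// nothing lives in a global.
class CompilerSession {
public:
    CompilerSession()
        : module_(std::make_unique<ModuleStub>()),
          engine_(std::make_unique<EngineStub>(*module_)) {}

    // Exactly one owner of the module and engine.
    CompilerSession(const CompilerSession&) = delete;
    CompilerSession& operator=(const CompilerSession&) = delete;

    ModuleStub& module() { return *module_; }
    EngineStub& engine() { return *engine_; }

private:
    // Members are destroyed in reverse declaration order, so the engine
    // goes away before the module it refers to.
    std::unique_ptr<ModuleStub> module_;
    std::unique_ptr<EngineStub> engine_;
};

// Example client: "compiles" a method into whichever session it is given.
void compileMethod(CompilerSession& session, const std::string& selector) {
    session.module().functions.push_back(selector);
}
```

Because the session is an ordinary value, a unit test can create its own on the stack and drop it at the end of the test, which is the isolation Marcel was after, once the JIT supports more than one instance.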

>>> 4. Jitted functions / ownership / memory
>>>
>>> Once a function is jitted I can get a function pointer to it and call
>>> it, that's great.  Can I also find out how long it is, for example if
>>> I wanted to write an object file?
>>> All in all, the jit-result seems to be fairly opaque and hidden.  Is
>>> this intentional, or is there more I am missing?
>>
>> There are ways, but there isn't an elegant public interface for this
>> yet.
>> For a couple of reasons, it is tricky to JIT code to memory, then wrap
>> it up into an object file (in particular, the JIT'd code is already
>> relocated).
>
> OK, writing an object file was possibly not the best example, but it
> would be good to be able to take control of the result and control its
> lifecycle.  For example, imagine an IDE-type environment where you want
> to overwrite a particular method (and not necessarily with code coming
> from LLVM).

Two things you can do.  First, after a Function has been JITed, you can
delete the LLVM IR for it, by calling Function::deleteBody().  You don't
want to delete the Function itself because the JIT retains pointers to it.
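Why the Function object must survive can be illustrated with a toy model (hypothetical `IRFunction` and `CodeCache` types, not LLVM's): a JIT-style cache keyed by the address of the function object. Dropping the body frees the IR while leaving the object, and therefore the cache key, valid; deleting the object itself would leave the cache holding a dangling key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR function: an identity plus a body.
struct IRFunction {
    std::string name;
    std::vector<std::string> body;  // stand-in for blocks/instructions

    bool hasBody() const { return !body.empty(); }
    void deleteBody() { body.clear(); }  // frees the IR, keeps the object
};

// Hypothetical JIT cache: native code is looked up by function *object*,
// which is why that object must outlive the compiled code.
class CodeCache {
public:
    std::uintptr_t compile(IRFunction* f) {
        auto it = code_.find(f);
        if (it != code_.end()) return it->second;  // already compiled
        assert(f->hasBody() && "cannot compile a function without a body");
        std::uintptr_t addr = nextAddr_;
        nextAddr_ += 0x100;  // pretend each function occupies 256 bytes
        code_[f] = addr;
        return addr;
    }

private:
    std::map<const IRFunction*, std::uintptr_t> code_;
    std::uintptr_t nextAddr_ = 0x1000;
};
```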

The second thing you can do is call
ExecutionEngine::recompileAndRelinkFunction.  This is useful if you have a
function whose implementation you want to change.  Generally you'd either
call deleteBody() and build a new body, or modify the current body, and
then call this method.  See the comments above it in the source for what
it does.

Also available is ExecutionEngine::freeMachineCodeForFunction, but that is
currently a no-op (because no one has implemented it yet, not because it
can't be implemented).
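Independently of these JIT facilities, the IDE scenario Marcel describes (replacing a method, possibly with code that did not come from LLVM) can be handled by having callers dispatch through a mutable slot instead of holding the raw code pointer. A minimal sketch with a hypothetical `MethodSlot` type and an `int(int)` signature chosen for simplicity: storing a new pointer redirects every later call through the slot, while a caller that cached the old pointer keeps running the old code. This does not reclaim the old code's memory.

```cpp
#include <cassert>

using MethodImpl = int (*)(int);

// Hypothetical indirection cell: callers keep a reference to the slot,
// never to the implementation, so redefining a method is a single store.
struct MethodSlot {
    MethodImpl impl;
    int call(int arg) const { return impl(arg); }
};

int oldVersion(int x) { return x + 1; }   // e.g. code the JIT produced earlier
int newVersion(int x) { return x * 10; }  // e.g. a hand-written replacement

// A caller that dispatches through the slot on every call.
int callThroughSlot(const MethodSlot& slot, int arg) { return slot.call(arg); }
```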

>> The start of a direct ELF writer is available in lib/CodeGen/
>> ELFWriter.cpp, but it is not complete yet.  It uses the same codegen
>> interfaces as the JIT to do the writing.
>
> Very cool, will have to take a look at that...though what I will need,
> at least initially, is Mach-O, not ELF... :-)

Looking into my mystical crystal ball, I wouldn't be surprised if that
(Mach-O writing) got implemented in the next 6-12 months, but that is
probably too long for you to wait.

-Chris
