LLVM-based JVM JIT for libgcj

LLVM-based JVM JIT for libgcj

Tom Tromey
I recently wrote an LLVM-based JIT plugin for libgcj and I thought
it'd be worthwhile to mention it here.

It is in cvs on sourceforge, but afaics anonymous cvs there is pretty
broken at the moment... so if you want a copy, ask and I will email it
to you.


Basically I hacked libgcj to (optionally) dynamically load a JIT module
at startup.  If a JIT is loaded then bytecode is passed to it rather
than to the libgcj bytecode interpreter.
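
A minimal sketch of the dispatch described above, assuming a hypothetical
hook type `ExecuteFn`; the names here are illustrative and are not libgcj's
actual plugin interface. The dynamic loading step itself is left out; the
sketch only shows how bytecode could be routed to a JIT when one is present
and to the interpreter otherwise.

```cpp
#include <cassert>
#include <string>

// Hypothetical signature for the hook a JIT plugin would export.
// This is not libgcj's real plugin interface.
using ExecuteFn = std::string (*)(const std::string &method);

// Stand-in for the built-in bytecode interpreter.
std::string interpret(const std::string &method) {
  return "interpreted " + method;
}

// Stand-in for the entry point of a loaded JIT module.
std::string jitCompileAndRun(const std::string &method) {
  return "jitted " + method;
}

class Runtime {
  ExecuteFn jit_ = nullptr;  // stays null when no plugin was loaded
public:
  // A real runtime would obtain this pointer from the loaded module
  // at startup; here it is simply passed in.
  void installJit(ExecuteFn fn) { jit_ = fn; }

  // Route bytecode to the JIT when present, otherwise to the interpreter.
  std::string execute(const std::string &method) const {
    return jit_ ? jit_(method) : interpret(method);
  }
};

// Usage: the same call site works with or without a JIT installed.
void exampleDispatch() {
  Runtime rt;
  assert(rt.execute("Hello.main") == "interpreted Hello.main");
  rt.installJit(jitCompileAndRun);
  assert(rt.execute("Hello.main") == "jitted Hello.main");
}
```

Keeping the hook behind a single nullable pointer means the interpreter path
needs no changes when no JIT is loaded.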

The LLVM JIT is pretty raw at the moment.  It can run "hello world"
and a few microbenchmarks (empty loops, method calls, that sort of
thing).  I haven't tested it seriously yet.  On my little benchmarks
it is 5x-6x faster than our interpreter.

Exception handling definitely does not work; I haven't even tried to
implement it yet.  I've been thinking about having some kind of simple
bridge between the LLVM and GCC worlds here -- very inefficient, but
at least I could get it working rather quickly.  Long term I'm hoping
someone else will be solving this problem... :-)


FWIW I actually did this work twice, once for libjit and once for
LLVM.  I'm happy to provide a comparison, from a jit-writing
perspective, if you're interested.

Thanks for writing LLVM.  It is awesome to be able to add a JIT to
libgcj this easily.

Tom

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: LLVM-based JVM JIT for libgcj

Chris Lattner
On Tue, 18 Apr 2006, Tom Tromey wrote:
> I recently wrote an LLVM-based JIT plugin for libgcj and I thought
> it'd be worthwhile to mention it here.

Cool!

> Exception handling definitely does not work, I didn't even try to
> implement it yet.  I've been thinking about having some kind of simple
> bridge between the LLVM and GCC worlds here -- very inefficient, but
> at least I could get it working rather quickly.  Long term I'm hoping
> someone else will be solving this problem... :-)

If I had to speculate, I would guess that LLVM will support dwarf-2 style
zero cost exceptions in the next 2-3 months.

> FWIW I actually did this work twice, once for libjit and once for
> LLVM.  I'm happy to provide a comparison, from a jit-writing
> perspective, if you're interested.

Given your experience with both, I'd be very interested in any thoughts
you have on how we can make LLVM better. :)

> Thanks for writing LLVM.  It is awesome to be able to add a JIT to
> libgcj this easily.

:)

-Chris

--
http://nondot.org/sabre/
http://llvm.org/


Re: LLVM-based JVM JIT for libgcj

Tom Tromey
>>>>> "Chris" == Chris Lattner <[hidden email]> writes:

>> FWIW I actually did this work twice, once for libjit and once for
>> LLVM.  I'm happy to provide a comparison, from a jit-writing
>> perspective, if you're interested.

Chris> Given your experience with both, I'd be very interested in any
Chris> thoughts you have on how we can make LLVM better. :)

libjit has a few advantages over LLVM in terms of the "gloss" -- how
it is packaged and what JIT development with it is like:

* The API documentation is better.

  libjit's documentation is not perfectly complete, but for my
  purposes it was generally more complete and better organized than
  LLVM's.  With LLVM I ended up reading the header files to figure
  everything out; with libjit I didn't.

  Also libjit uses texinfo... sometimes I think I'm the last remaining
  person who likes using info in Emacs, but this did make my life
  simpler, so I thought I'd mention it.  (Obviously this is a
  subjective thing.. can you tell I'm defensive about it?  :-)

  Not to belabor this too much, but I've always found doxygen output
  borderline unreadable... libjit also does comment extraction from
  the source for its documentation, but puts it into a more-or-less
  nicely structured context.

* libjit is a lot smaller.  Of course this is both a plus and a minus
  (in the sense that small usually means things are missing).
  However, in terms of development productivity, libjit is a win here:
  a rebuild and relink of my libjit-based code takes under a minute.
  I think it takes 20 minutes or more to link my LLVM-based JIT on my
  laptop.

* Likewise, libjit installs very simply: it is a couple of shared
  libraries (one for the library and an extra one containing the C++
  API).  At least with the default install, LLVM is a weird (to me)
  mix of static libraries and object files.
  llvm-config saved the day here, in terms of the Makefile
  hacking.

  I only saw today in the mail archives that there is a way to build
  LLVM as shared libraries -- I haven't tried it yet, so apologies if
  this is just my ignorance.

* One oddity with LLVM came because a BasicBlock is a Value.  I passed
  it as the wrong argument to an AllocaInst constructor... oops.
  (libjit's API is much simpler ... no names for instructions, new
  instructions are implicitly linked into the current block, etc.
  This has both plusses and minuses.  I did wonder how much it costs
  to have names everywhere...)


I think libjit only has one technical idea that is missing from LLVM.
In libjit you can create a new function and get a pointer to it, but
set things up so that the IR for the function is also created lazily.
As I understand it, right now in LLVM you can make the IR and lazily
compile it, but not lazily make the IR.  This seems pretty handy, at
least for my situation.  It also looks pretty easy to add to LLVM :-)
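
To make the "lazily make the IR" idea concrete, here is a small sketch under
stated assumptions: `LazyFunction` and the string-based `IR` type are toy
stand-ins invented for illustration, not libjit's or LLVM's API. A handle to
the function exists immediately, but the builder callback only runs the first
time the IR is actually needed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a function's IR: a list of textual instructions.
using IR = std::vector<std::string>;

class LazyFunction {
  std::function<IR()> builder_;  // called only when the IR is first needed
  IR ir_;
  bool built_ = false;
public:
  explicit LazyFunction(std::function<IR()> builder)
      : builder_(std::move(builder)) {}

  bool isBuilt() const { return built_; }

  // The first call builds the IR; later calls reuse the cached result.
  const IR &ir() {
    if (!built_) {
      ir_ = builder_();
      built_ = true;
    }
    return ir_;
  }
};

// Usage: the handle can be passed around before any IR exists.
void exampleLazyBuild() {
  int builds = 0;
  LazyFunction f([&builds] {
    ++builds;
    return IR{"load arg0", "ret"};
  });
  assert(!f.isBuilt() && builds == 0);
  assert(f.ir().size() == 2);  // first use triggers the build
  assert(f.ir().size() == 2);  // second use does not rebuild
  assert(builds == 1);
}
```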


I don't want to get you down or anything.  LLVM has many advantages
over libjit as well, which is why I chose to translate the JIT from
libjit to LLVM in the first place:

* LLVM has a friendlier license
* LLVM has a *much* more active community
* LLVM is much further ahead in every technical aspect: more ports,
  more optimizations, etc.


I hope this helps.  And, thanks again.

Tom


Re: LLVM-based JVM JIT for libgcj

Chris Lattner
On Tue, 18 Apr 2006, Tom Tromey wrote:
>>>>>> "Chris" == Chris Lattner <[hidden email]> writes:
>>> FWIW I actually did this work twice, once for libjit and once for
>>> LLVM.  I'm happy to provide a comparison, from a jit-writing
>>> perspective, if you're interested.
>
> Chris> Given your experience with both, I'd be very interested in any
> Chris> thoughts you have on how we can make LLVM better. :)

Nice writeup, thanks for taking the time to do it.

> libjit has a few advantages over LLVM in terms of the "gloss" -- how
> it is packaged, JIT development using it:
>
> * The API documentation is better.
>
>  libjit's documentation is not perfectly complete, but for my
>  purposes it was generally more complete and better organized than
>  LLVM's.  With LLVM I ended up reading the header files to figure
>  everything out; with libjit I didn't.
>
>  Also libjit uses texinfo... sometimes I think I'm the last remaining
>  person who likes using info in Emacs, but this did make my life
>  simpler, so I thought I'd mention it.  (Obviously this is a
>  subjective thing.. can you tell I'm defensive about it?  :-)
>
>  Not to belabor this too much, but I've always found doxygen output
>  borderline unreadable... libjit also does comment extraction from
>  the source for its documentation, but puts it into a more-or-less
>  nicely structured context.

Understood.  It certainly would be nice to have a "how to use the JIT"
document that is concise and targeted at this.  Also, unfortunately,
most of the docs for the LLVM API are still in the headers, which sucks.
:(

Perhaps after the release I can help improve this situation.

> * libjit is a lot smaller.  Of course this is both a plus and a minus
>  (in the sense that small usually means things are missing).
>  However, in terms of development productivity, libjit is a win here:
>  a rebuild and relink of my libjit-based code takes under a minute.
>  I think it takes 20 minutes or more to link my LLVM-based JIT on my
>  laptop.

Are you using a debug or a release build?  A release build (built with
make ENABLE_OPTIMIZED=1) is often 10x to 20x smaller than a debug build,
and links correspondingly faster.  On some machines, a release build
builds *faster* than a debug build because the debug symbols are so huge.
The only thing you lose with a release build is the ability to step into
LLVM libraries in a debugger.

> * Likewise, libjit installs very simply: it is a couple of shared
>  libraries (one for the library and an extra one containing the C++
>  API).  At least with the default install, LLVM is a weird (to me)
>  mix of static libraries and object files.
>  llvm-config saved the day here, in terms of the Makefile
>  hacking.

Yup, go llvm-config! :)

>  I only saw today in the mail archives that there is a way to build
>  LLVM as shared libraries -- I haven't tried it yet, so apologies if
>  this is just my ignorance.

I'd suggest sticking with llvm-config and not using shared libraries.

> * One oddity with LLVM came because a BasicBlock is a Value.  I passed
>  it as the wrong argument to an AllocaInst constructor... oops.
>  (libjit's API is much simpler ... no names for instructions, new
>  instructions are implicitly linked into the current block, etc.
>  This has both plusses and minuses.  I did wonder how much it costs
>  to have names everywhere...)

Instruction/BB names are completely optional (you can pass in "" for
everything, and everything will still work fine) but are quite handy when
trying to read the LLVM code.

It would be straightforward to add a new "easy" interface for creating
LLVM instructions.  Would something like this work well for you?

class InstructionCreator {
   BasicBlock *CurBB;
public:
   void setCurrentBlock(BasicBlock *);

   Value *createAdd(Value *LHS, Value *RHS, const std::string &Name = "");
   Value *createSub(Value *LHS, Value *RHS, const std::string &Name = "");
   ...
};

Given this, use would be much more implicit:

InstructionCreator IC;
IC.setCurrentBlock(FalseBB);
Value *A = IC.createAdd(LHS, RHS);
Value *B = IC.createSetEQ(A, RHS);
IC.createBr(B, TrueBB, FalseBB);
IC.setCurrentBlock(TrueBB);
...

If so, I can add this.  Do you have a suggestion for a name better than
"InstructionCreator"?

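The proposal above can be tried out with a toy model. This is only a sketch:
`BasicBlock` here is a made-up struct holding text, values are plain strings,
and none of it is LLVM's real API. It shows the one design point at issue,
namely that new instructions are appended to whichever block was last made
current, so call sites never pass an insertion point.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy types for illustration only; these are not LLVM's classes.
struct BasicBlock {
  std::string name;
  std::vector<std::string> insts;
};

class InstructionCreator {
  BasicBlock *cur_ = nullptr;
  int nextId_ = 0;

  // Append one instruction to the current block and return its result name.
  std::string emit(const std::string &op, const std::string &lhs,
                   const std::string &rhs) {
    assert(cur_ && "setCurrentBlock must be called first");
    std::string result = "%t" + std::to_string(nextId_++);
    cur_->insts.push_back(result + " = " + op + " " + lhs + ", " + rhs);
    return result;
  }
public:
  void setCurrentBlock(BasicBlock *bb) { cur_ = bb; }
  std::string createAdd(const std::string &l, const std::string &r) {
    return emit("add", l, r);
  }
  std::string createSub(const std::string &l, const std::string &r) {
    return emit("sub", l, r);
  }
};

// Usage: switching the current block redirects all later instructions.
void exampleCreator() {
  BasicBlock entry{"entry", {}};
  BasicBlock next{"next", {}};
  InstructionCreator ic;
  ic.setCurrentBlock(&entry);
  std::string a = ic.createAdd("%x", "%y");
  ic.setCurrentBlock(&next);
  std::string b = ic.createSub(a, "%y");
  assert(entry.insts.size() == 1 && entry.insts[0] == "%t0 = add %x, %y");
  assert(next.insts.size() == 1 && next.insts[0] == "%t1 = sub %t0, %y");
  assert(b == "%t1");
}
```
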
> I think libjit only has one technical idea that is missing from LLVM.
> In libjit you can create a new function and get a pointer to it, but
> set things up so that the IR for the function is also created lazily.
> As I understand it, right now in LLVM you can make the IR and lazily
> compile it, but not lazily make the IR.  This seems pretty handy, at
> least for my situation.  It also looks pretty easy to add to LLVM :-)

Yup, it would be great to have this. :)

> I don't want to get you down or anything.

Heh, no problem.  If we couldn't admit that improvement is possible, we
probably wouldn't improve. :)

-Chris

--
http://nondot.org/sabre/
http://llvm.org/


Re: LLVM-based JVM JIT for libgcj

Jakob Praher-2
In reply to this post by Tom Tromey
hi Tom,

I am really glad that someone has found time to step into that :-).

Tom Tromey wrote:
> I recently wrote an LLVM-based JIT plugin for libgcj and I thought
> it'd be worthwhile to mention it here.
>
> It is in cvs on sourceforge, but afaics anonymous cvs there is pretty
> broken at the moment... so if you want a copy, ask and I will email it
> to you.

wow. that is really nice.
I was looking for some contractor for working more on LLVM, but sadly
I have not succeeded so far.
I would definitely like to look into it.

>
>
> Basically I hacked libgcj to (optionally) dynamically load JIT module
> at startup.  If a JIT is loaded then bytecode is passed to it rather
> than to the libgcj bytecode interpreter.
>

> The LLVM JIT is pretty raw at the moment.  It can run "hello world"
> and a few microbenchmarks (empty loops, method calls, that sort of
> thing).  I haven't tested it seriously yet.  On my little benchmarks
> it is 5x-6x faster than our interpreter.
>
Looks promising.

> Exception handling definitely does not work, I didn't even try to
> implement it yet.  I've been thinking about having some kind of simple
> bridge between the LLVM and GCC worlds here -- very inefficient, but
> at least I could get it working rather quickly.  Long term I'm hoping
> someone else will be solving this problem... :-)
>
I would offer some help, but as I have said before, for lack of
funding I have only my very scarce free time.

>
> FWIW I actually did this work twice, once for libjit and once for
> LLVM.  I'm happy to provide a comparison, from a jit-writing
> perspective, if you're interested.

Yes very much! How did you find writing it directly in SSA form? What is
the footprint of libjit compared to the heavier LLVM JIT? How about
recompiling in libjit? Does it support CFG and dataflow analysis?

>
> Thanks for writing LLVM.  It is awesome to be able to add a JIT to
> libgcj this easily.

Yes kudos to the LLVM people!

-- Jakob


Re: LLVM-based JVM JIT for libgcj

Tom Tromey
>>>>> "Jakob" == Jakob Praher <[hidden email]> writes:

Jakob> I would definitely like to look into it.

I'll send it in private email.

Jakob> Yes very much! How did you find writing it directly in
Jakob> SSA-form.

Actually I used what Chris called "the alloca trick"... the JIT
doesn't really generate SSA form but instead uses alloca to allocate
space for the stack and locals (and temporaries where needed) and then
emits explicit loads and stores everywhere.  On irc Chris showed me
how to invoke the LLVM pass to turn this back into something sane :-)
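
As an illustration of the "alloca trick", here is a toy translator; the
bytecode subset, the `translate` function, and the textual output format are
all assumptions made for this sketch, and the output only resembles LLVM
assembly. Every local and operand-stack slot gets one alloca, and each
bytecode becomes explicit loads and stores. The cleanup pass mentioned above
is presumably LLVM's mem2reg, which promotes such allocas to SSA registers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Tiny subset of JVM-style bytecode, for illustration only.
enum class Op { ILoad, IStore, IAdd };
struct Insn { Op op; int arg; };

// Translate stack bytecode into pseudo-IR that keeps every local and every
// operand-stack slot in memory and uses explicit loads and stores.
std::vector<std::string> translate(const std::vector<Insn> &code,
                                   int numLocals, int maxStack) {
  std::vector<std::string> out;
  for (int i = 0; i < numLocals; ++i)
    out.push_back("%local" + std::to_string(i) + " = alloca i32");
  for (int i = 0; i < maxStack; ++i)
    out.push_back("%stack" + std::to_string(i) + " = alloca i32");

  int sp = 0;   // operand-stack depth, known statically at each bytecode
  int tmp = 0;  // counter for temporaries
  auto fresh = [&tmp] { return "%t" + std::to_string(tmp++); };
  auto load = [&](const std::string &slot) {
    std::string t = fresh();
    out.push_back(t + " = load " + slot);
    return t;
  };
  auto store = [&](const std::string &val, const std::string &slot) {
    out.push_back("store " + val + ", " + slot);
  };
  auto stackSlot = [](int i) { return "%stack" + std::to_string(i); };
  auto localSlot = [](int i) { return "%local" + std::to_string(i); };

  for (const Insn &in : code) {
    switch (in.op) {
    case Op::ILoad: {  // push a local onto the operand stack
      std::string v = load(localSlot(in.arg));
      store(v, stackSlot(sp++));
      break;
    }
    case Op::IStore: {  // pop the top of stack into a local
      std::string v = load(stackSlot(--sp));
      store(v, localSlot(in.arg));
      break;
    }
    case Op::IAdd: {  // pop two operands, push their sum
      std::string rhs = load(stackSlot(--sp));
      std::string lhs = load(stackSlot(--sp));
      std::string sum = fresh();
      out.push_back(sum + " = add " + lhs + ", " + rhs);
      store(sum, stackSlot(sp++));
      break;
    }
    }
  }
  return out;
}

// Usage: local2 = local0 + local1
void exampleAllocaTrick() {
  std::vector<Insn> code = {
      {Op::ILoad, 0}, {Op::ILoad, 1}, {Op::IAdd, 0}, {Op::IStore, 2}};
  std::vector<std::string> ir = translate(code, 3, 2);
  assert(ir.size() == 15);
  assert(ir[11] == "%t4 = add %t3, %t2");
  assert(ir.back() == "store %t5, %local2");
}
```

The translator never has to track which SSA value currently holds a local or
stack slot, which is what makes this approach quick to implement.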

Jakob> What is the footprint of libjit compared to the more heavy LLVM
Jakob> jit? How about recompiling in libjit? Has it support for CFG,
Jakob> dataflow analysis?

Oh, libjit is much smaller.  I don't think it does much by way of
optimization.

libjit takes a very simple approach to recompilation.  There is a
function you can call to request recompilation for a method.  Then
(AIUI -- didn't implement this in the JIT yet) libjit will call your
build function again, to re-create the IR.  The docs talk a bit about
being able to run more optimizations, but I think that is just the
general idea, since I don't think there are actually other optimizers
available.

Even this amount of support is worthwhile in a JVM, fwiw.  You can
generate better code once constant pool entries have been resolved,
and this pretty much has to be done lazily (not what the VM spec says,
but important for actual compatibility).
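
A rough sketch of why re-running the build function pays off, assuming toy
types: `ConstantPoolEntry`, `Method`, and the emitted strings are hypothetical
and are not libjit's API. The build function emits a slow, resolving call
while the constant pool entry is unresolved, and a direct call once
recompilation is requested after resolution.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Toy model of a constant pool entry that is resolved lazily.
struct ConstantPoolEntry {
  bool resolved = false;
  std::string target;  // known only after resolution
};

// A method whose code is regenerated by calling its build function again.
class Method {
  std::function<std::string()> build_;
  std::string code_;
public:
  explicit Method(std::function<std::string()> build)
      : build_(std::move(build)), code_(build_()) {}

  const std::string &code() const { return code_; }

  // Request recompilation: simply run the build function again.
  void recompile() { code_ = build_(); }
};

// Usage: the second build sees the resolved entry and emits a direct call.
void exampleRecompile() {
  ConstantPoolEntry entry;
  Method m([&entry] {
    return entry.resolved ? "call " + entry.target
                          : std::string("call resolve_and_invoke");
  });
  assert(m.code() == "call resolve_and_invoke");
  entry.resolved = true;
  entry.target = "Foo.bar";
  m.recompile();
  assert(m.code() == "call Foo.bar");
}
```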

Tom


Re: LLVM-based JVM JIT for libgcj

Tom Tromey
In reply to this post by Chris Lattner
Chris> Are you using a debug or a release build?  A release build (built with
Chris> make ENABLE_OPTIMIZED=1) is often 10x to 20x smaller than a debug
Chris> build, and links correspondingly faster.  On some machines, a release
Chris> build builds *faster* than a debug build because the debug symbols are
Chris> so huge. The only thing you lose with a release build is the ability
Chris> to step into LLVM libraries in a debugger.

Ok, I rebuilt with ENABLE_OPTIMIZED=1.  This did make a huge
difference -- my rebuild went down to 8 seconds (from 16 minutes... I
timed it this time; my earlier guess was off a bit).

Unfortunately it turns out I do need the debugging capabilities.
Darn.

Chris> I'd suggest sticking with llvm-config and not using shared
Chris> libraries.

I didn't dig into the Makefiles... are the libraries and whatnot built
-fPIC?  I ask because I want to dynamically load this code into
libgcj.  JVMs pretty much have to be shared libraries (or have a
separate version which is a shared library), at least if you want to
support the invocation API.

Chris> It would be straight-forward to add a new "easy" interface for
Chris> creating LLVM instructions.  Would something like this work well for
Chris> you?

I considered doing this myself but in the end didn't have much need
for it.  Anyway, don't add it on my account, I doubt I have too many
more problems like this in my code :-)

Tom


Re: LLVM-based JVM JIT for libgcj

Ralph Corderoy

Hi Tom,

> I didn't dig into the Makefiles... are the libraries and whatnot built
> -fPIC?

If you do `make Verb=' then you'll see all the actual command
invocations and can grep for bits of interest.

Cheers,


Ralph.



Re: LLVM-based JVM JIT for libgcj

Reid Spencer
The correct way to do that is:

make VERBOSE=1

you can also do:

make TOOL_VERBOSE=1

which implies VERBOSE=1 and also tells each tool (compiler, linker, etc)
to be verbose about the actions it is taking.

Reid.

On Wed, 2006-04-19 at 10:39 +0100, Ralph Corderoy wrote:

> Hi Tom,
>
> > I didn't dig into the Makefiles... are the libraries and whatnot built
> > -fPIC?
>
> If you do `make Verb=' then you'll see all the actual command
> invocations and can grep for bits of interest.
>
> Cheers,
>
>
> Ralph.


Re: LLVM-based JVM JIT for libgcj

Jakob Praher-2
In reply to this post by Tom Tromey
Tom Tromey wrote:
>>>>>> "Jakob" == Jakob Praher <[hidden email]> writes:
>
> Jakob> I would definitely like to look into it.
>
> I'll send it in private email.
Would be nice.

>
> Jakob> Yes very much! How did you find writing it directly in
> Jakob> SSA-form.
>
> Actually I used what Chris called "the alloca trick"... the JIT
> doesn't really generate SSA form but instead uses alloca to allocate
> space for the stack and locals (and temporaries where needed) and then
> emits explicit loads and stores everywhere.  On irc Chris showed me
> how to invoke the LLVM pass to turn this back into something sane :-)
>
Okay, is that because memory is not in SSA form?


> Jakob> What is the footprint of libjit compared to the more heavy LLVM
> Jakob> jit? How about recompiling in libjit? Has it support for CFG,
> Jakob> dataflow analysis?
>
> Oh, libjit is much smaller.  I don't think it does much by way of
> optimization.
>
Yes I can imagine that.

> libjit takes a very simple approach to recompilation.  There is a
> function you can call to request recompilation for a method.  Then
> (AIUI -- didn't implement this in the JIT yet) libjit will call your
> build function again, to re-create the IR.  The docs talk a bit about
> being able to run more optimizations, but I think that is just the
> general idea, since I don't think there are actually other optimizers
> available.

Okay. What is the IR then? Is it LLVM bytecode?
I have to confess that I don't really know much about the design, but
I am very interested.

>
> Even this amount of support is worthwhile in a JVM, fwiw.  You can
> generate better code once constant pool entries have been resolved,
> and this pretty much has to be done lazily (not what the VM spec says,
> but important for actual compatibility).

Kewl. I also once looked at kprobes to trap into read-only text
segments, to be able to trap for instance hot traces (like loops) for
inlining and so on. The DynInst API is interesting in this regard. Kprobes
can be used for user-process instrumentation as well.

After I have finished my thesis, I want to add generating LLVM bytecode
from Intel assembly. This would be interesting for instrumenting native
code as well :-)


-- Jakob
>
> Tom
