Hooking the global symbol resolver

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Hooking the global symbol resolver

Jonathan S. Shapiro-2
Okay, we're starting to dig in, and I've hit a question that will no
doubt seem strange.

Context: BitC is a polymorphic language. Since it has unboxed value
types, our approach to compiling a polymorphic function is to
polyinstantate it -- once for each signature.

The name mangling scheme is both stable and reversible. At the site of
the use occurrence, we can fully determine the mangled name of the
desired instantiation. Given the mangled name of an instantiation that
has not yet been synthesized, we can determine which procedure must be
synthesized.

At present, our front end implements a demand-driven instantiator that
proceeds from main() until all calls are statically resolved.

It seems to me, however, that this duplicates (nearly) functionality
that is probably already present in the LLVM JIT layer.

Here is what I am wondering:

Is there some way we can "hook" the global symbol resolver so that we
can run the polyinstantiator on demand? I *think* this could be done by
running the resolver normally, noticing failure on a symbol that we know
ought to exist, code generating the instance definition corresponding to
the needed symbol, and then re-performing the lookup at global scope.

What I'm not sure about here is several issues:

  1. Does this make sense?

  2. If so, where is the right place to hook the resolver?

  3. Are we going to run into trouble arising from the fact that we
     are likely to end up using the module compiler in a recursive
     way? That is: the referencing module's compile will (in effect) be
     paused in its symbol resolution phase while we introduce a new
     module and compile that. I can imagine implementations where this
     would wreak serious havoc on the implementation of environments
     within the LLVM infrastructure.

  4. Is there a better/cleaner approach? What other options should I
     consider?

We can, thankfully, ensure that this recursive instantiation exercise
terminates. :-)


shap

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Óscar Fuentes
"Jonathan S. Shapiro" <[hidden email]> writes:

[snip]

>   4. Is there a better/cleaner approach? What other options should I
>      consider?

My front-end is very similar to yours in the feature of the multiple
instantiations on demand, etc.

One thing I learnt about LLVM is that it's philosophy is to be a
friendly backend for frontends, but whatever your frontend does
internally is not LLMV's business.

Answuring your question, I'm not aware of any LLVM facility for what you
describe. And if you find one, I would advise against using it.

--
Oscar

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Gordon Henriksen-3
In reply to this post by Jonathan S. Shapiro-2
Hi Jonathan,

In the context of a static compiler, I would recommend that you implement your own “on the side” symbol table in order to track this state and perform on-demand instantiation as required. It is worthwhile to consider the LLVM module to be a passive output sink, not an active object.

The JIT compiler, by contrast, is an active object, cooperating with its environment via the ModuleProvider interface. Overriding materializeFunction should be useful for on-demand instantiation.

— Gordon

On Mar 26, 2008, at 17:07, Jonathan S. Shapiro wrote:
Okay, we're starting to dig in, and I've hit a question that will no doubt seem strange.

Context: BitC is a polymorphic language. Since it has unboxed value types, our approach to compiling a polymorphic function is to polyinstantate it -- once for each signature.

The name mangling scheme is both stable and reversible. At the site of the use occurrence, we can fully determine the mangled name of the desired instantiation. Given the mangled name of an instantiation that has not yet been synthesized, we can determine which procedure must be synthesized.

At present, our front end implements a demand-driven instantiator that proceeds from main() until all calls are statically resolved.

It seems to me, however, that this duplicates (nearly) functionality that is probably already present in the LLVM JIT layer.

Here is what I am wondering:

Is there some way we can "hook" the global symbol resolver so that we can run the polyinstantiator on demand? I *think* this could be done by running the resolver normally, noticing failure on a symbol that we know ought to exist, code generating the instance definition corresponding to the needed symbol, and then re-performing the lookup at global scope.

What I'm not sure about here is several issues:

 1. Does this make sense?

 2. If so, where is the right place to hook the resolver?

 3. Are we going to run into trouble arising from the fact that we
    are likely to end up using the module compiler in a recursive
    way? That is: the referencing module's compile will (in effect) be
    paused in its symbol resolution phase while we introduce a new
    module and compile that. I can imagine implementations where this
    would wreak serious havoc on the implementation of environments
    within the LLVM infrastructure.

 4. Is there a better/cleaner approach? What other options should I
    consider?

We can, thankfully, ensure that this recursive instantiation exercise terminates. :-)


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Jonathan S. Shapiro-2
On Thu, 2008-03-27 at 07:44 -0400, Gordon Henriksen wrote:
> In the context of a static compiler, I would recommend that you
> implement your own “on the side” symbol table in order to track this
> state and perform on-demand instantiation as required. It is
> worthwhile to consider the LLVM module to be a passive output sink,
> not an active object.

I think I understand what you are saying, but let me delve into this a
little further. My main point is that the technique we are after is
generally useful for languages having well-integrated template
mechanisms. The approach you advocate may still be the best approach,
but I'ld like to make sure that we are having the same conversation.


Let me illustrate the problem concretely.

Consider the BitC primitive definition of lists:

  (defunion (list 'a)
    (cons 'a (list 'a))
    nil)

Modulo surface syntax, this looks on first inspection to be exactly like
the corresponding definitions in ML or Haskell. But in ML or Haskell
there is only one run-time "expansion" of this type, because all of the
possible element types occupy exactly one fundamental unit of storage.
This is not true for BitC types. On a 32-bit machine having suitable
alignment, the above definition guarantees that variables and bindings
of type

  (list int64)

are pointers to boxes whose first element is 64 bits (the int64) and
whose second element is 32 bits (the next pointer). That is: the type
must be instantiated with different concrete representations for
different uses. Since we had to do this anyway, we use the same
technique to resolve type classes, thereby eliminating the need for
dictionary pointers. The BitC procedure:

  (define (add-one x) (+ x 1))

has the type:

  (forall (Arith 'a) (fn ('a) 'a))

which (given our instantiation approach) is best imagined to be a
template-like construct -- albeit one that is fully checked for
consistency and whose expansion is known not to fail (barring memory
exhaustion within the compiler).

When a client program uses a dynamic library providing these sorts of
constructs, we have two choices:

  1. Generate them statically at static link time. This works,
     but it is not robust across dynamic library version changes
     (which is an endemic problem in languages that support both
     templates and dynamic linking).

  2. Generate them dynamically at load time (or on first call).
     This is what we want to do.

That is: we want to use a continuous compilation strategy, which is
precisely what LLVM is supposedly attempting to achieve.



If we adopt the approach that you suggest, we will end up implementing
our own "generate on demand" infrastructure that has to operate in
collusion with the dynamic loader. We know how to do that, but it is a
moderately dicey business. Basically we have to run a pre-resolver
before we emit code to an LLVM module, after which LLVM will run a
second resolver. I certainly agree that this will work.

But I didn't have in mind originally to view Modules as active things.
It was not my intention to "extend" a module in mid compile. Rather, it
was my thought that we could provide the unresolved symbol by emitting
and compiling a second module. Where the LLVM infrastructure currently
has something like this:

  if (!(sym = llvm_resolve_global(GlobalScope, symName)))
    some_failure_action();

it would now look something like:

  sym = llvm_resolve_global(GlobalScope, symName);
  if (!sym && frontend_has_symbol_generator
      && frontend_generate_symbol(symname))
    // Note: if frontend_generate_symbol() has succeeded, it will have
    // constructed some LLVM Module and called the LLVM compiler to
    // admit it, with the consequence that GlobalScope will have been
    // updated to contain a binding for the desired symbol.
    sym = llvm_resolve_global(GlobalScope, symName);
  if (!sym)
    some_failure_action();

I don't think that it's any more complicated than that. This is
basically the test that our static polyinstantiator runs right now.

Note: I'm still not claiming that this is a good approach in the context
of LLVM. I don't have my head wrapped around LLVM enough to have an
opinion about that. I simply wanted to make sure that the question was
clearly framed.

I do, provisionally, think that this particular hook is consistent with
the notion of continuous compilation. It seems (to me) necessary (and
perhaps even sufficient) to let the front end participate in the
continuousness of continuous compilation.


shap

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Jonathan S. Shapiro-2
In reply to this post by Óscar Fuentes
On Wed, 2008-03-26 at 23:48 +0100, Óscar Fuentes wrote:
> "Jonathan S. Shapiro" <[hidden email]> writes:
> My front-end is very similar to yours in the feature of the multiple
> instantiations on demand, etc.

Oscar: after you have a chance to read my recent reply to Gordon, would
you be kind enough to let me know whether you still believe the
situations are similar. Also whether you think that what I am looking
for is likely to be generally useful?

> Answuring your question, I'm not aware of any LLVM facility for what you
> describe. And if you find one, I would advise against using it.

I understand the argument for separation of concerns, but your last
sentence concerns me. Why would you recommend against using this kind of
mechanism if it exists?


Jonathan

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Óscar Fuentes
"Jonathan S. Shapiro" <[hidden email]> writes:

> On Wed, 2008-03-26 at 23:48 +0100, Óscar Fuentes wrote:
>> "Jonathan S. Shapiro" <[hidden email]> writes:
>> My front-end is very similar to yours in the feature of the multiple
>> instantiations on demand, etc.
>
> Oscar: after you have a chance to read my recent reply to Gordon, would
> you be kind enough to let me know whether you still believe the
> situations are similar. Also whether you think that what I am looking
> for is likely to be generally useful?

I had no time to read in detail your reply to Gordon (I'll go back to it
later), but the similarities are astonishing. I suspect that it is not
just a coincidence that we use a lispy syntax (defmacro anyone?). I'll
like to learn about your project, so if this is possible, would you like
to provide a pointer to some description? (by private e-mail)

>> Answuring your question, I'm not aware of any LLVM facility for what you
>> describe. And if you find one, I would advise against using it.
>
> I understand the argument for separation of concerns, but your last
> sentence concerns me. Why would you recommend against using this kind of
> mechanism if it exists?

Whatever LLVM has, it is there either for the convenience of frontend ->
backend information flux (note that I exclude the reverse direction) or
for the convenience of LLVM itself. Chris et al. are no shy about
improving the interfaces when they perceive a clear gain, so there are
some chances that whatever you find convenient for your internal stuff
today will be not there tomorrow.

--
Oscar

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Óscar Fuentes
In reply to this post by Jonathan S. Shapiro-2
"Jonathan S. Shapiro" <[hidden email]> writes:

[snip]

>   if (!(sym = llvm_resolve_global(GlobalScope, symName)))
>     some_failure_action();
>
> it would now look something like:
>
>   sym = llvm_resolve_global(GlobalScope, symName);
>   if (!sym && frontend_has_symbol_generator
>       && frontend_generate_symbol(symname))
>     // Note: if frontend_generate_symbol() has succeeded, it will have
>     // constructed some LLVM Module and called the LLVM compiler to
>     // admit it, with the consequence that GlobalScope will have been
>     // updated to contain a binding for the desired symbol.
>     sym = llvm_resolve_global(GlobalScope, symName);
>   if (!sym)
>     some_failure_action();
>
> I don't think that it's any more complicated than that. This is
> basically the test that our static polyinstantiator runs right now.
>
> Note: I'm still not claiming that this is a good approach in the context
> of LLVM. I don't have my head wrapped around LLVM enough to have an
> opinion about that. I simply wanted to make sure that the question was
> clearly framed.
>
> I do, provisionally, think that this particular hook is consistent with
> the notion of continuous compilation. It seems (to me) necessary (and
> perhaps even sufficient) to let the front end participate in the
> continuousness of continuous compilation.

I'm all for hooks and delegation, but the problem here is that your
proposal is not general enough and is hard to generalize it. It does not
work for my project, for instance, although I face almost the same
requirements than you wrt dynamic generation. The symbol name is enough
for you, but not for me, and there is no way to teach LLVM about what
info I need.

This doesn't mean that the LLVM developers shouldn't consider your
proposal on the context of the typical LLVM user. Maybe your case is
common enough.

--
Oscar

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Jonathan S. Shapiro-2
On Thu, 2008-03-27 at 21:22 +0100, Óscar Fuentes wrote:
> I'm all for hooks and delegation, but the problem here is that your
> proposal is not general enough and is hard to generalize it. It does not
> work for my project, for instance, although I face almost the same
> requirements than you wrt dynamic generation. The symbol name is enough
> for you, but not for me, and there is no way to teach LLVM about what
> info I need.

Is this because you have a more complicated scenario, or is it because
your name mangling scheme is not sufficiently well designed?

The evolution of mangling schemes in C++ initially ignored
reversibility. This was a *huge* mistake, and it took years to correct
it. The original problem was linkers that could not handle very long
identifiers. Over time that issue was fixed, and eventually the world
converged on an invertible scheme. Modern compilers almost universally
use an invertible scheme today.

I know nothing at all about your language (though I would like to
correct that deficiency), but I am very confident that if an invertible
mangling scheme is possible for you in principle, the time spent to
develop one early will be repaid many times over later.


shap

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Hooking the global symbol resolver

Óscar Fuentes
"Jonathan S. Shapiro" <[hidden email]> writes:

> On Thu, 2008-03-27 at 21:22 +0100, Óscar Fuentes wrote:
>> I'm all for hooks and delegation, but the problem here is that your
>> proposal is not general enough and is hard to generalize it. It does not
>> work for my project, for instance, although I face almost the same
>> requirements than you wrt dynamic generation. The symbol name is enough
>> for you, but not for me, and there is no way to teach LLVM about what
>> info I need.
>
> Is this because you have a more complicated scenario, or is it because
> your name mangling scheme is not sufficiently well designed?

Both :-) I guess that I could invent a name mangling schema, but it
certainly would be very complex and generate names of several kilobytes
for *each* function invocation. The issue is that the actual function
that is generated depends on the invocation point. For instance:

(foo)
(bar)
;; The code generated for this call to `foo'
;; may differ from the first call:
(foo)

[snip]
> but I am very confident that if an invertible
> mangling scheme is possible for you in principle, the time spent to
> develop one early will be repaid many times over later.

I agree that your idea about the hook can be useful for most typical
front-ends, as my project can be considered "weird".

--
Oscar

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev