With regard to Bartlett-style collectors, they are also used in CMUCL/SBCL, OpenDylan, and various products based on Ravenbrook's Memory Pool System. Also, while Mono doesn't use a Bartlett-style collector, it does support pinning objects referenced from ambiguous stack roots, in an otherwise copying collector. While ITA has expressed concerns with SBCL's GC, they seem to involve the kind of problems copying collectors generally face with huge heaps, not anything particular to conservative stack scanning. In any case, people interested in the trade-offs involved might want to talk to people involved with these projects about their experiences.
Also, it is probably possible to support a more-precise method of stack scanning without additional support from LLVM. I bet the method of this paper: http://www.cs.kent.ac.uk/pubs/2009/3128/content.pdf, could be implemented more directly in LLVM using the exception handling framework and a custom stack walker. The basic idea would be to generate code in a landing pad to save the values (but not the addresses) of live roots to some memory location. Instead of call instructions, the front-end emits invoke instructions that can branch to that landing pad. During GC, a custom stack walker can walk up the stack and call the code in the landing pads to gather roots, without unwinding the stack. This approach allows LLVM to promote roots to SSA values, and encodes the right semantics: it's okay to create derived roots across a call (safepoint) so long as the original value is available for the landing pad. It does not allow the values to be changed during GC, but at least addresses the concern of false retention from ambiguous roots.
Thanks for the shout-out. It is interesting to note that the approach where we used branch-after-call to implement exceptions explicitly and then used them for both GC and the source language exceptions was even better than try/catch, at least with gcc as the backend. This trivially allows both copying for roots and register allocation.
Of course it was still slower than Bartlett.
It should be possible, if somewhat more difficult, to generate landing pads that merge control flow back to the non-exceptional case, after reloading the values of roots from some memory location. This approach expresses the data-flow semantic that the values of roots may be reloaded from memory across a call. The tricky part with this approach is how to express to handle the fact that LLVM's invoke instruction only defines a value along the non-exceptional edge. But I'm sure there is some cleverness that could address that.