Error handling in LLVMObject library

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Error handling in LLVMObject library

Alexey Samsonov-2
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Matthias Braun
I don't much about this area in the code so I can't say much about the current state of things. But in my experience of writing parsers there is a variant 4 that works nicely:

4: Record error and return dummy value:
uint64_t getSymbolAddress(...) {
   if (somethingfailed) {
       errors.push_back(ERRORCODE); // maybe also a message
       return ~0; // just return an invalid/error value.
   }
}

Now you don't have to check for the return value of getSymbolAddress() everywhere but just at the place where you need a legal SymbolAddress value which is typically far less places as most of the data is just stored away in class fields.

All you have to do know is add an
if (!error.empty()) {
   ...;
}
at the highest level parsing function...

- Matthias

On May 29, 2015, at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:

Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

David Majnemer
In reply to this post by Alexey Samsonov-2


On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

I'm afraid approach #3 will make LLVMObject unusable for lld.  Verifying that all relocations are OK is pretty expensive if you don't need them.
 

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Sean Silva-2
In reply to this post by Alexey Samsonov-2


On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Unfortunately this option isn't possible with the current libObject. libObject is specifically designed to avoid paging in or allocating any memory beyond simply mmap'ing the file. This makes it annoying to use, but also allows it to be used for demanding use cases like a linker. For many things, I've always wished we had a "libEasyObject" that avoided this annoyance, but it seems like a lot of work compared to the benefit.

-- Sean Silva
 

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Justin Bogner
In reply to this post by Alexey Samsonov-2
On Friday, May 29, 2015, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

The "error handling boilerplate" here isn't actually all that bad, and ErrorOr can generally just be used like an API that returns a pointer and null on err, but with better errors, ie:

  auto Addr = getSymbolAddress(...)
  if (auto EC = Addr.getError()) {
    // handle the error
  }
  // use *Addr via dereference

This is the same number of lines of code a (1), but harder to misuse. There are situations where it's not quite this nice, like ErrorOr<unique_ptr>, but that's worth solving in its own right if this is an issue.

I haven't looked too much at the code in question, but I'm pretty confident that this is the right way to go about it.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;vonosmas@gmail.com&#39;);" target="_blank">vonosmas@...

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Rafael Espíndola
In reply to this post by Alexey Samsonov-2
On 29 May 2015 at 19:06, Alexey Samsonov <[hidden email]> wrote:

> Hi everyone,
>
> Having proper error handling in LLVM's Object parsing library is a nice
> thing by itself, but it would additionally allow us to find bugs by fuzzing
> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
> input validation is essential.
>
> This is a generic discussion of state of affairs. I want to do some progress
> in fuzzing before we finish it (especially if we decide to make a
> significant intrusive changes), you may scroll down for my plan.
>
> The code in lib/Object calls report_fatal_error() far too often, both when
> we're (a) just constructing the specific implementation of ObjectFile, and
> (b) when we access its contents and find out the file is broken and can't be
> parsed properly.
>
> We should just go and fix (a): ObjectFile factory methods return
> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
> appropriately.
>
> (b) is harder. The current approach is to use std::error_code as a return
> type, and store the result in by-reference argument, for instance:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> I wanted to follow this approach in a proposed large MachO API change
> (http://reviews.llvm.org/D10111), but it raised discussion on whether this
> approach is right.
> Moving this discussion here. I see the following options:
>
> 1. Use the current approach:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> Pros:
>   * don't need to change a large number of (often virtual) API functions
>   * has a nice error handling pattern used in LLVM tools:
>   uint64_t Addr;
>   if (error(getSymbolAddress(Symbol, Addr)))
>     return;  // or continue, or do anything else.
>
> Cons:
>   * return value can just be silently ignored. Adding warn_unused_result
> attribute on per-function basis is ugly
>   * adds extra overhead for cases when we're certain the result would be
> valid.

I agree, this is really bad.

> 2. Switch to ErrorOr wrapper:
>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>
> Pros:
>   * handling the error is now mandatory and explicit.
>   * callers can explicitly skip error handling if they know the result would
> be valid:
>     uint64_t Addr = getSymbolAddress(Symbol).get();
>   and it would fail the assert if they are wrong.
>
> Cons:
>   * need to change lots of old code, or live with two versions of functions
>   * error handling boilerplate in regular code on call site is ugly:
>   auto AddrOrErr = getSymbolAddress(Symbol);
>   if (AddrOrErr.hasError())
>     return;  // or continue, or whatever
>   uint64_t Addr = AddrOrErr.get();
>   (can probably be improved with a macro)
>   * adds extra overhead for cases when we're certain the result would be
> valid.

I think this is a strict improvement, and would not object to doing it
in one big step if you wanted.

One further improvement is that it looks like the API was written
without thinking much about what is an error and what functions can
fail.

For the above function, the ErrorOr style is the best we can do since
to get the address of a symbol on a ELF .o file we have to find the
section and that index can be wrong (not that we actually report that
at the moment :-( ).

But for alignment we can use just use "uint32_t
getSymbolAlignment(DataRefImpl Symb) const" (see r238700).

Going further, it might even be better to replace getSymbolAddress
with a getSymbolValue that cannot fail and have the caller handle the
fact that the value is sometimes a virtual address and sometimes
section relative (it is likely more efficient too).


> On IRC discussion Lang suggested
> 3. Check the whole object file contents in constructor or validate() method,
> and get rid
> of all error codes in regular accessors.
>
> Pros:
>   * the interface is much cleaner
>   * no extra overhead for trusted (e.g. JIT) object files.
>
> Cons:
>   * significant change, fundamentally shifting the way object files are
> handled
>   * verifier function should now absolutely everything about the object
> file, and anticipate all possible use cases. Especially hard, assuming that
> ObjectFile interface allows user to pass any garbage as input arguments
> (e.g. as DataRefImpl in the example above).
>   * verifier can be slow, and might be an overkill if we strongly desire to
> parse some bits of object file lazily.


Please don't do this. At the library level we cannot know what parts
of the file a client would actually use.



> ================
>
> Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal
> incremental changes, that would allow fuzzer to move forward. Namely, I want
> to keep the changes to headers as small as possible, changing functions one
> by one, and preferring to use ErrorOr<> construct (option 2 listed above).
> An additional benefit of this is that each small incremental change would be
> accompanied by the test case generated by fuzzer, that exposed this problem.
>
> Let me know if you think it's a good or terrible idea.

IMHO the most important thing is to make sure that every error path
you add is tested. With tests we can refactor. For example, see the
refactoring in (r238024).

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Alexey Samsonov-2
OK, looks like there's a strong consensus that ErrorOr<> is the way to go. I'd probably
abstain from doing this in one huge change, as it would be too large to digest and review,
and not totally mechanical. I will probably proceed with smaller changes. accompanied with
test cases generated by fuzzer.

On Sun, May 31, 2015 at 5:01 PM, Rafael Espíndola <[hidden email]> wrote:
On 29 May 2015 at 19:06, Alexey Samsonov <[hidden email]> wrote:
> Hi everyone,
>
> Having proper error handling in LLVM's Object parsing library is a nice
> thing by itself, but it would additionally allow us to find bugs by fuzzing
> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
> input validation is essential.
>
> This is a generic discussion of state of affairs. I want to do some progress
> in fuzzing before we finish it (especially if we decide to make a
> significant intrusive changes), you may scroll down for my plan.
>
> The code in lib/Object calls report_fatal_error() far too often, both when
> we're (a) just constructing the specific implementation of ObjectFile, and
> (b) when we access its contents and find out the file is broken and can't be
> parsed properly.
>
> We should just go and fix (a): ObjectFile factory methods return
> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
> appropriately.
>
> (b) is harder. The current approach is to use std::error_code as a return
> type, and store the result in by-reference argument, for instance:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> I wanted to follow this approach in a proposed large MachO API change
> (http://reviews.llvm.org/D10111), but it raised discussion on whether this
> approach is right.
> Moving this discussion here. I see the following options:
>
> 1. Use the current approach:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> Pros:
>   * don't need to change a large number of (often virtual) API functions
>   * has a nice error handling pattern used in LLVM tools:
>   uint64_t Addr;
>   if (error(getSymbolAddress(Symbol, Addr)))
>     return;  // or continue, or do anything else.
>
> Cons:
>   * return value can just be silently ignored. Adding warn_unused_result
> attribute on per-function basis is ugly
>   * adds extra overhead for cases when we're certain the result would be
> valid.

I agree, this is really bad.

> 2. Switch to ErrorOr wrapper:
>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>
> Pros:
>   * handling the error is now mandatory and explicit.
>   * callers can explicitly skip error handling if they know the result would
> be valid:
>     uint64_t Addr = getSymbolAddress(Symbol).get();
>   and it would fail the assert if they are wrong.
>
> Cons:
>   * need to change lots of old code, or live with two versions of functions
>   * error handling boilerplate in regular code on call site is ugly:
>   auto AddrOrErr = getSymbolAddress(Symbol);
>   if (AddrOrErr.hasError())
>     return;  // or continue, or whatever
>   uint64_t Addr = AddrOrErr.get();
>   (can probably be improved with a macro)
>   * adds extra overhead for cases when we're certain the result would be
> valid.

I think this is a strict improvement, and would not object to doing it
in one big step if you wanted.

One further improvement is that it looks like the API was written
without thinking much about what is an error and what functions can
fail.

For the above function, the ErrorOr style is the best we can do since
to get the address of a symbol on a ELF .o file we have to find the
section and that index can be wrong (not that we actually report that
at the moment :-( ).

But for alignment we can use just use "uint32_t
getSymbolAlignment(DataRefImpl Symb) const" (see r238700).

Going further, it might even be better to replace getSymbolAddress
with a getSymbolValue that cannot fail and have the caller handle the
fact that the value is sometimes a virtual address and sometimes
section relative (it is likely more efficient too).


> On IRC discussion Lang suggested
> 3. Check the whole object file contents in constructor or validate() method,
> and get rid
> of all error codes in regular accessors.
>
> Pros:
>   * the interface is much cleaner
>   * no extra overhead for trusted (e.g. JIT) object files.
>
> Cons:
>   * significant change, fundamentally shifting the way object files are
> handled
>   * verifier function should now absolutely everything about the object
> file, and anticipate all possible use cases. Especially hard, assuming that
> ObjectFile interface allows user to pass any garbage as input arguments
> (e.g. as DataRefImpl in the example above).
>   * verifier can be slow, and might be an overkill if we strongly desire to
> parse some bits of object file lazily.


Please don't do this. At the library level we cannot know what parts
of the file a client would actually use.



> ================
>
> Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal
> incremental changes, that would allow fuzzer to move forward. Namely, I want
> to keep the changes to headers as small as possible, changing functions one
> by one, and preferring to use ErrorOr<> construct (option 2 listed above).
> An additional benefit of this is that each small incremental change would be
> accompanied by the test case generated by fuzzer, that exposed this problem.
>
> Let me know if you think it's a good or terrible idea.

IMHO the most important thing is to make sure that every error path
you add is tested. With tests we can refactor. For example, see the
refactoring in (r238024).

This revision uses report_fatal_error() instead of returning the error code. The only benefit of this
is providing a reasonable error message, but this limits the ability to use LLVMObject as a library
for parsing/querying invalid object files.

Can we sacrifice readable error messages if we doesn't want the library to crash? Or we should
add extra object_error enum for every type of error?
 

Cheers,
Rafael



--
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Reid Kleckner-2
In reply to this post by Alexey Samsonov-2
Is it possible to use the error-handling pattern that we use for frontends for libobject? Both clang and the .ll parser do this kind of thing:

Result *ParserClass::doSomething() {
  if (Thingy >= EndOfBuffer) {
    reportError("some message");
    return nullptr;
  }
  return ...;
}
void useit() {
  Result *R = Parser.doSomething();
  if (!R)
    return;
  ...
}

raw_ostream also uses this pattern with error_detected() and clear_error().

I like this pattern because the data describing the error can be as rich as we like (severity level, file location, string message, etc) without figuring out how to represent it inside some overly constrained std::error_code object. We also don't have to sort out things like failed instantations of ErrorOr<std::unique_ptr<...>>.

As long as the returned value has a clearly documented invalid state like nullptr, -1, or ~0ULL, it's not that error prone, since actually using the returned value will typically lead to a crash, assuming that our APIs are doing bounds checking with assertions.

On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

David Blaikie
In reply to this post by Alexey Samsonov-2


On Mon, Jun 1, 2015 at 10:44 AM, Alexey Samsonov <[hidden email]> wrote:
OK, looks like there's a strong consensus that ErrorOr<> is the way to go. I'd probably
abstain from doing this in one huge change, as it would be too large to digest and review,
and not totally mechanical. I will probably proceed with smaller changes. accompanied with
test cases generated by fuzzer.

On Sun, May 31, 2015 at 5:01 PM, Rafael Espíndola <[hidden email]> wrote:
On 29 May 2015 at 19:06, Alexey Samsonov <[hidden email]> wrote:
> Hi everyone,
>
> Having proper error handling in LLVM's Object parsing library is a nice
> thing by itself, but it would additionally allow us to find bugs by fuzzing
> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
> input validation is essential.
>
> This is a generic discussion of state of affairs. I want to do some progress
> in fuzzing before we finish it (especially if we decide to make a
> significant intrusive changes), you may scroll down for my plan.
>
> The code in lib/Object calls report_fatal_error() far too often, both when
> we're (a) just constructing the specific implementation of ObjectFile, and
> (b) when we access its contents and find out the file is broken and can't be
> parsed properly.
>
> We should just go and fix (a): ObjectFile factory methods return
> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error
> appropriately.
>
> (b) is harder. The current approach is to use std::error_code as a return
> type, and store the result in by-reference argument, for instance:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> I wanted to follow this approach in a proposed large MachO API change
> (http://reviews.llvm.org/D10111), but it raised discussion on whether this
> approach is right.
> Moving this discussion here. I see the following options:
>
> 1. Use the current approach:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> Pros:
>   * don't need to change a large number of (often virtual) API functions
>   * has a nice error handling pattern used in LLVM tools:
>   uint64_t Addr;
>   if (error(getSymbolAddress(Symbol, Addr)))
>     return;  // or continue, or do anything else.
>
> Cons:
>   * return value can just be silently ignored. Adding warn_unused_result
> attribute on per-function basis is ugly
>   * adds extra overhead for cases when we're certain the result would be
> valid.

I agree, this is really bad.

> 2. Switch to ErrorOr wrapper:
>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>
> Pros:
>   * handling the error is now mandatory and explicit.
>   * callers can explicitly skip error handling if they know the result would
> be valid:
>     uint64_t Addr = getSymbolAddress(Symbol).get();
>   and it would fail the assert if they are wrong.
>
> Cons:
>   * need to change lots of old code, or live with two versions of functions
>   * error handling boilerplate in regular code on call site is ugly:
>   auto AddrOrErr = getSymbolAddress(Symbol);
>   if (AddrOrErr.hasError())
>     return;  // or continue, or whatever
>   uint64_t Addr = AddrOrErr.get();
>   (can probably be improved with a macro)
>   * adds extra overhead for cases when we're certain the result would be
> valid.

I think this is a strict improvement, and would not object to doing it
in one big step if you wanted.

One further improvement is that it looks like the API was written
without thinking much about what is an error and what functions can
fail.

For the above function, the ErrorOr style is the best we can do since
to get the address of a symbol on a ELF .o file we have to find the
section and that index can be wrong (not that we actually report that
at the moment :-( ).

But for alignment we can use just use "uint32_t
getSymbolAlignment(DataRefImpl Symb) const" (see r238700).

Going further, it might even be better to replace getSymbolAddress
with a getSymbolValue that cannot fail and have the caller handle the
fact that the value is sometimes a virtual address and sometimes
section relative (it is likely more efficient too).


> On IRC discussion Lang suggested
> 3. Check the whole object file contents in constructor or validate() method,
> and get rid
> of all error codes in regular accessors.
>
> Pros:
>   * the interface is much cleaner
>   * no extra overhead for trusted (e.g. JIT) object files.
>
> Cons:
>   * significant change, fundamentally shifting the way object files are
> handled
>   * verifier function should now absolutely everything about the object
> file, and anticipate all possible use cases. Especially hard, assuming that
> ObjectFile interface allows user to pass any garbage as input arguments
> (e.g. as DataRefImpl in the example above).
>   * verifier can be slow, and might be an overkill if we strongly desire to
> parse some bits of object file lazily.


Please don't do this. At the library level we cannot know what parts
of the file a client would actually use.



> ================
>
> Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal
> incremental changes, that would allow fuzzer to move forward. Namely, I want
> to keep the changes to headers as small as possible, changing functions one
> by one, and preferring to use ErrorOr<> construct (option 2 listed above).
> An additional benefit of this is that each small incremental change would be
> accompanied by the test case generated by fuzzer, that exposed this problem.
>
> Let me know if you think it's a good or terrible idea.

IMHO the most important thing is to make sure that every error path
you add is tested. With tests we can refactor. For example, see the
refactoring in (r238024).

This revision uses report_fatal_error() instead of returning the error code. The only benefit of this
is providing a reasonable error message, but this limits the ability to use LLVMObject as a library
for parsing/querying invalid object files.

Can we sacrifice readable error messages if we doesn't want the library to crash? Or we should
add extra object_error enum for every type of error?

It seems reasonable to add an enumerator to the error category enumeration for any specific error path that's useful as a specific diagnostic. At least cases where we have observably different diagnostics already with test cases could be maintained - for other things we could be vague until we have a particular need to be less so... 

(but I don't work with this stuff, so owners/users may have more informed opinions on the matter than I do)

- David
 
 

Cheers,
Rafael



--
Alexey Samsonov
[hidden email]


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Rafael Espíndola
In reply to this post by Reid Kleckner-2
On 1 June 2015 at 13:48, Reid Kleckner <[hidden email]> wrote:

> Is it possible to use the error-handling pattern that we use for frontends
> for libobject? Both clang and the .ll parser do this kind of thing:
>
> Result *ParserClass::doSomething() {
>   if (Thingy >= EndOfBuffer) {
>     reportError("some message");
>     return nullptr;
>   }
>   return ...;
> }
> void useit() {
>   Result *R = Parser.doSomething();
>   if (!R)
>     return;
>   ...
> }
>

I really like any solution that splits the error handling and the
diagnostic handling. The above clang snippet is one way to do it.
Closer by is llvm's DiagnosticHandler. It is used now in the bitcode
reader and has quite a few nice characteristics:

* We only need 2 error codes: InvalidBitcodeSignature, CorruptedBitcode.
* We can provide detailed diagnostics
* It is library friendly but still supports simple users that just
want fatal errors.

The one annoyance is that it is somewhat tied in lib/IR. We need to
refactor parts of it into lib/Support.

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Rafael Espíndola
In reply to this post by Alexey Samsonov-2
>> IMHO the most important thing is to make sure that every error path
>> you add is tested. With tests we can refactor. For example, see the
>> refactoring in (r238024).
>
>
> This revision uses report_fatal_error() instead of returning the error code.
> The only benefit of this
> is providing a reasonable error message, but this limits the ability to use
> LLVMObject as a library
> for parsing/querying invalid object files.

It is better than the undefined behavior that preceded the
report_fatal_error, but it can be improved and now with a test to make
sure the new error handling code is working.

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Lang Hames
In reply to this post by Sean Silva-2
Out of interest, what parts of an object file commonly don't get mapped in by the linker? I'd have assumed linkers would usually end up touching just about everything.

Even assuming that whole file validation is too coarse for performance reasons, I'd prefer something coarser than what we currently have. For example, right now every access to a relocation has to deal with an error return, but I assume it's rare to parse a single relocation in isolation. Why don't we validate all relocations for each section up front, then have a simpler API for accessing them?

I don't have especially strong feelings about this, just a nagging sense that libObject's error handling is unnecessarily fine grained.

FWIW, of the other two proposals I prefer the ErrorOr approach.

Cheers,
Lang. 

On Fri, May 29, 2015 at 6:45 PM, Sean Silva <[hidden email]> wrote:


On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Unfortunately this option isn't possible with the current libObject. libObject is specifically designed to avoid paging in or allocating any memory beyond simply mmap'ing the file. This makes it annoying to use, but also allows it to be used for demanding use cases like a linker. For many things, I've always wished we had a "libEasyObject" that avoided this annoyance, but it seems like a lot of work compared to the benefit.

-- Sean Silva
 

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Rafael Espíndola

With comdats parts of the file might never be read.

There are also multiple levels of cache and while all the relocations of a file will fit in ram, it is probably still more efficient to validate them as they are read.

But I think (typing on a phone) that relocations are another case where the best is to have a more specific api: the only way the relocation is invalid from the perspective of the api we provide is if the symbol index is invalid. We can just return symbol_end for that and avoid the error_code and the ErrorOr.

Cheers,
Rafael

On Jun 1, 2015 6:19 PM, "Lang Hames" <[hidden email]> wrote:
Out of interest, what parts of an object file commonly don't get mapped in by the linker? I'd have assumed linkers would usually end up touching just about everything.

Even assuming that whole file validation is too coarse for performance reasons, I'd prefer something coarser than what we currently have. For example, right now every access to a relocation has to deal with an error return, but I assume it's rare to parse a single relocation in isolation. Why don't we validate all relocations for each section up front, then have a simpler API for accessing them?

I don't have especially strong feelings about this, just a nagging sense that libObject's error handling is unnecessarily fine grained.

FWIW, of the other two proposals I prefer the ErrorOr approach.

Cheers,
Lang. 

On Fri, May 29, 2015 at 6:45 PM, Sean Silva <[hidden email]> wrote:


On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Unfortunately this option isn't possible with the current libObject. libObject is specifically designed to avoid paging in or allocating any memory beyond simply mmap'ing the file. This makes it annoying to use, but also allows it to be used for demanding use cases like a linker. For many things, I've always wished we had a "libEasyObject" that avoided this annoyance, but it seems like a lot of work compared to the benefit.

-- Sean Silva
 

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Rui Ueyama
There are some occasions in which you read the same data many times from the same object file. In the new COFF linker, we read relocations tables twice  -- one to eliminate unreferenced comdat sections and the other to apply relocations. I'm planning to do comdat merging by contents,  and I'm going to read relocation tables many more times to see if relocations are also the same. After comdat dead-stripping, I'm sure there's no error in the relocation tables.

Am I expected to use libObject like that? Or do you recommend caching results?

On Mon, Jun 1, 2015 at 6:16 PM, Rafael Espíndola <[hidden email]> wrote:

With comdats parts of the file might never be read.

There are also multiple levels of cache and while all the relocations of a file will fit in ram, it is probably still more efficient to validate them as they are read.

But I think (typing on a phone) that relocations are another case where the best is to have a more specific api: the only way the relocation is invalid from the perspective of the api we provide is if the symbol index is invalid. We can just return symbol_end for that and avoid the error_code and the ErrorOr.

Cheers,
Rafael

On Jun 1, 2015 6:19 PM, "Lang Hames" <[hidden email]> wrote:
Out of interest, what parts of an object file commonly don't get mapped in by the linker? I'd have assumed linkers would usually end up touching just about everything.

Even assuming that whole file validation is too coarse for performance reasons, I'd prefer something coarser than what we currently have. For example, right now every access to a relocation has to deal with an error return, but I assume it's rare to parse a single relocation in isolation. Why don't we validate all relocations for each section up front, then have a simpler API for accessing them?

I don't have especially strong feelings about this, just a nagging sense that libObject's error handling is unnecessarily fine grained.

FWIW, of the other two proposals I prefer the ErrorOr approach.

Cheers,
Lang. 

On Fri, May 29, 2015 at 6:45 PM, Sean Silva <[hidden email]> wrote:


On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Unfortunately this option isn't possible with the current libObject. libObject is specifically designed to avoid paging in or allocating any memory beyond simply mmap'ing the file. This makes it annoying to use, but also allows it to be used for demanding use cases like a linker. For many things, I've always wished we had a "libEasyObject" that avoided this annoyance, but it seems like a lot of work compared to the benefit.

-- Sean Silva
 

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Error handling in LLVMObject library

Rafael Espíndola

Just from the description I think you can report an error the first time and assert the following ones.

Cheers,
Rafael

On Jun 1, 2015 9:42 PM, "Rui Ueyama" <[hidden email]> wrote:
There are some occasions in which you read the same data many times from the same object file. In the new COFF linker, we read relocations tables twice  -- one to eliminate unreferenced comdat sections and the other to apply relocations. I'm planning to do comdat merging by contents,  and I'm going to read relocation tables many more times to see if relocations are also the same. After comdat dead-stripping, I'm sure there's no error in the relocation tables.

Am I expected to use libObject like that? Or do you recommend caching results?

On Mon, Jun 1, 2015 at 6:16 PM, Rafael Espíndola <[hidden email]> wrote:

With comdats parts of the file might never be read.

There are also multiple levels of cache and while all the relocations of a file will fit in ram, it is probably still more efficient to validate them as they are read.

But I think (typing on a phone) that relocations are another case where the best is to have a more specific api: the only way the relocation is invalid from the perspective of the api we provide is if the symbol index is invalid. We can just return symbol_end for that and avoid the error_code and the ErrorOr.

Cheers,
Rafael

On Jun 1, 2015 6:19 PM, "Lang Hames" <[hidden email]> wrote:
Out of interest, what parts of an object file commonly don't get mapped in by the linker? I'd have assumed linkers would usually end up touching just about everything.

Even assuming that whole file validation is too coarse for performance reasons, I'd prefer something coarser than what we currently have. For example, right now every access to a relocation has to deal with an error return, but I assume it's rare to parse a single relocation in isolation. Why don't we validate all relocations for each section up front, then have a simpler API for accessing them?

I don't have especially strong feelings about this, just a nagging sense that libObject's error handling is unnecessarily fine grained.

FWIW, of the other two proposals I prefer the ErrorOr approach.

Cheers,
Lang. 

On Fri, May 29, 2015 at 6:45 PM, Sean Silva <[hidden email]> wrote:


On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <[hidden email]> wrote:
Hi everyone,

Having proper error handling in LLVM's Object parsing library is a nice thing by itself, but it would additionally allow us to find bugs by fuzzing (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean input validation is essential.

This is a generic discussion of state of affairs. I want to do some progress in fuzzing before we finish it (especially if we decide to make a significant intrusive changes), you may scroll down for my plan.

The code in lib/Object calls report_fatal_error() far too often, both when we're (a) just constructing the specific implementation of ObjectFile, and (b) when we access its contents and find out the file is broken and can't be parsed properly.

We should just go and fix (a): ObjectFile factory methods return ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate the error appropriately.

(b) is harder. The current approach is to use std::error_code as a return type, and store the result in by-reference argument, for instance:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

I wanted to follow this approach in a proposed large MachO API change
(http://reviews.llvm.org/D10111), but it raised discussion on whether this approach is right.
Moving this discussion here. I see the following options:

1. Use the current approach:
  std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);

Pros:
  * don't need to change a large number of (often virtual) API functions
  * has a nice error handling pattern used in LLVM tools:
  uint64_t Addr;
  if (error(getSymbolAddress(Symbol, Addr)))
    return;  // or continue, or do anything else.

Cons:
  * return value can just be silently ignored. Adding warn_unused_result attribute on per-function basis is ugly
  * adds extra overhead for cases when we're certain the result would be valid.

2. Switch to ErrorOr wrapper:
  ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);

Pros:
  * handling the error is now mandatory and explicit.
  * callers can explicitly skip error handling if they know the result would be valid:
    uint64_t Addr = getSymbolAddress(Symbol).get();
  and it would fail the assert if they are wrong.

Cons:
  * need to change lots of old code, or live with two versions of functions
  * error handling boilerplate in regular code on call site is ugly:
  auto AddrOrErr = getSymbolAddress(Symbol);
  if (AddrOrErr.hasError())
    return;  // or continue, or whatever
  uint64_t Addr = AddrOrErr.get();
  (can probably be improved with a macro)
  * adds extra overhead for cases when we're certain the result would be valid.

On IRC discussion Lang suggested
3. Check the whole object file contents in constructor or validate() method, and get rid
of all error codes in regular accessors.

Unfortunately this option isn't possible with the current libObject. libObject is specifically designed to avoid paging in or allocating any memory beyond simply mmap'ing the file. This makes it annoying to use, but also allows it to be used for demanding use cases like a linker. For many things, I've always wished we had a "libEasyObject" that avoided this annoyance, but it seems like a lot of work compared to the benefit.

-- Sean Silva
 

Pros:
  * the interface is much cleaner
  * no extra overhead for trusted (e.g. JIT) object files.

Cons:
  * significant change, fundamentally shifting the way object files are handled
  * verifier function should now absolutely everything about the object file, and anticipate all possible use cases. Especially hard, assuming that ObjectFile interface allows user to pass any garbage as input arguments (e.g. as DataRefImpl in the example above).
  * verifier can be slow, and might be an overkill if we strongly desire to parse some bits of object file lazily.

================

Instead of http://reviews.llvm.org/D10111, I'm going to proceed with minimal incremental changes, that would allow fuzzer to move forward. Namely, I want to keep the changes to headers as small as possible, changing functions one by one, and preferring to use ErrorOr<> construct (option 2 listed above). An additional benefit of this is that each small incremental change would be accompanied by the test case generated by fuzzer, that exposed this problem.

Let me know if you think it's a good or terrible idea.

-- 
Alexey Samsonov
[hidden email]

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev