[llvm-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
41 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
Hi Tim,

> You've certainly argued against them. You haven't provided adequate
> semantics for IR without them though.

I am thinking to use the approach suggested by Eli previously.
i.e. to use a function attribute, lets say "null-pointer-is-valid".

Thanks,
Manoj

On Thu, Apr 26, 2018 at 1:58 PM Tim Northover <[hidden email]>
wrote:

> > In addition, It is already not easy to convince Linux Kernel
maintainers to
> > accept clang specific patches.
> > Worse performance when compared to GCC may make it even harder to push
more
> > patches.
> > (There are already many complains about clang not supporting
optimizations
> > that Linux kernel is used to.
> > As a side note: x86 maintainers deliberately broke clang support in
upstream
> > (https://lkml.org/lkml/2018/4/2/115)

> Yeah, elements of the Linux community seem actively hostile to Clang.
> We shouldn't let their hostility dictate our technical policy.

> > I hope that I have made the case for not using address spaces.

> You've certainly argued against them. You haven't provided adequate
> semantics for IR without them though.

> Cheers.

> Tim.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
Hi,

Turning this on its head. Adding a warning saying something like
"Warning: NULL point check being optimized out at line X"  might be
useful.
The developer might spot that and think "Ah! maybe I put that check in
the wrong place!"

Kind Regards

James



On 18 April 2018 at 19:21, Tim Northover via cfe-dev
<[hidden email]> wrote:

> On 18 April 2018 at 18:13, Manoj Gupta via cfe-dev
> <[hidden email]> wrote:
>> Therefore, I would like to implement support for this flag (maybe with a
>> different name),
>
> I'd suggest -mdo-what-i-mean; the whole idea is horribly
> underspecified, and basically rips up the LangRef in favour of a
> nebulous set of good and bad optimizations (probably as dictated by
> the ones that have bitten someone who wrote bad code recently and was
> grumpy enough about it to complain at us).
>
> In particular I'm skeptical that the address space solution actually
> works. What they actually mean is probably not so much about removing
> the icmps, but about the control-flow decisions based on them. And
> LLVM might not be aware that any given conditional branch comes from a
> tainted comparison.
>
> Cheers.
>
> Tim.
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
<[hidden email]> wrote:
> My understanding is we only want to disable  the optimizations regarding
> undefined behavior
> related to null pointer deference optimizations. And address space checks
> will end up
> disabling more optimizations than needed.

[Repeating what others have already mentioned on this thread]

I this it is better to avoid framing this as "disable optimizations that exploit
UB" and instead frame this as "define some behavior that was undefined before".

> I did look at some of the optimizations/transforms and there are some that
> we definitely want to keep.
>
> Just a quick example from grepping:
> lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> ...........
>              // Don't create memset_pattern16s with address spaces.
>              StorePtr->getType()->getPointerAddressSpace() == 0 &&
>              (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>     // It looks like we can use PatternValue!
>     return LegalStoreKind::MemsetPattern;
>   }
>
> Even worse, Sanitizers do NOT work with address spaces which is a big deal
> breaker IMO.

IMO fixing these seems less engineering overhead in the long term than
introducing
yet another knob to the IR.

More importantly, we should _not_ be doing these optimizations without auditing
them individually.  For instance with -fno-delete-null-pointer checks, it isn't
okay to change a memset loop to a call to llvm.memset unless we've ensured llvm
DTRT for llvm.memset and null checks (i.e. the null check in "llvm.memset(ptr,
...); if (!ptr) {}" does not get optimized away).

> Since address spaces and null pointers are really orthogonal issues, I would
> prefer to
> not conflate them.

I'm not sure I agree with this.  Address spaces are a mechanism to provide
additional semantics to pointers.  In this case the additional property we
want is "address 0 may be dereferenceable", and using address spaces seems
appropriate.

> In addition, It is already not easy to convince Linux Kernel maintainers to
> accept clang specific patches.

Maybe I'm missing something, but I thought in this address space scheme we'd
still provide a -fno-delete-null-ptr-checks flag -- it is clang that would mark
pointers with address space n in this mode.

> From Linux kernel maintainers POV, Clang built kernels are already "getting
> lucky". So I am not too
> worried about missing a few cases.
> I'll be glad to fix the missing cases whenever reported.

It seems to me that adding back missing (but correct) optimizations when
reported is better than removing existing (but incorrect) optimizations when
reported.  If I were a kernel developer (which I am not) I'd rather have a
kernel that boots slower than a security vulnerability.

> I also have some other concerns with address spaces e.g. how to pick a safe
> address space not used by
>  any target e.g. many targets (in and out-of-tree) actively use non-zero
> address spaces.
> User code can also specify any address space via
> __attribute__((address_space(N))) so mixing object
> files will be tricky in such cases.

I think for that we can start a thread on cfe-dev and llvm-dev about reserving
address spaces.  While at it, we should reserve a range of address spaces for
LLVM middle-end use so that we can do more things like
-fno-delete-null-pointer-checks without worrying about address space clashes.

-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
> On Apr 28, 2018, at 4:12 PM, Sanjoy Das via cfe-dev <[hidden email]> wrote:
> On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
> <[hidden email]> wrote:
>> My understanding is we only want to disable  the optimizations regarding
>> undefined behavior
>> related to null pointer deference optimizations. And address space checks
>> will end up
>> disabling more optimizations than needed.
>
> [Repeating what others have already mentioned on this thread]
>
> I this it is better to avoid framing this as "disable optimizations that exploit
> UB" and instead frame this as "define some behavior that was undefined before".
>
>> I did look at some of the optimizations/transforms and there are some that
>> we definitely want to keep.
>>
>> Just a quick example from grepping:
>> lib/Transforms/Scalar/LoopIdiomRecognize.cpp
>> ...........
>>             // Don't create memset_pattern16s with address spaces.
>>             StorePtr->getType()->getPointerAddressSpace() == 0 &&
>>             (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>>    // It looks like we can use PatternValue!
>>    return LegalStoreKind::MemsetPattern;
>>  }
>>
>> Even worse, Sanitizers do NOT work with address spaces which is a big deal
>> breaker IMO.
>
> IMO fixing these seems less engineering overhead in the long term than
> introducing
> yet another knob to the IR.
>
> More importantly, we should _not_ be doing these optimizations without auditing
> them individually.  For instance with -fno-delete-null-pointer checks, it isn't
> okay to change a memset loop to a call to llvm.memset unless we've ensured llvm
> DTRT for llvm.memset and null checks (i.e. the null check in "llvm.memset(ptr,
> ...); if (!ptr) {}" does not get optimized away).
>
>> Since address spaces and null pointers are really orthogonal issues, I would
>> prefer to
>> not conflate them.
>
> I'm not sure I agree with this.  Address spaces are a mechanism to provide
> additional semantics to pointers.  In this case the additional property we
> want is "address 0 may be dereferenceable", and using address spaces seems
> appropriate.
>
>> In addition, It is already not easy to convince Linux Kernel maintainers to
>> accept clang specific patches.
>
> Maybe I'm missing something, but I thought in this address space scheme we'd
> still provide a -fno-delete-null-ptr-checks flag -- it is clang that would mark
> pointers with address space n in this mode.
>
>> From Linux kernel maintainers POV, Clang built kernels are already "getting
>> lucky". So I am not too
>> worried about missing a few cases.
>> I'll be glad to fix the missing cases whenever reported.
>
> It seems to me that adding back missing (but correct) optimizations when
> reported is better than removing existing (but incorrect) optimizations when
> reported.  If I were a kernel developer (which I am not) I'd rather have a
> kernel that boots slower than a security vulnerability.
>
>> I also have some other concerns with address spaces e.g. how to pick a safe
>> address space not used by
>> any target e.g. many targets (in and out-of-tree) actively use non-zero
>> address spaces.
>> User code can also specify any address space via
>> __attribute__((address_space(N))) so mixing object
>> files will be tricky in such cases.
>
> I think for that we can start a thread on cfe-dev and llvm-dev about reserving
> address spaces.  While at it, we should reserve a range of address spaces for
> LLVM middle-end use so that we can do more things like
> -fno-delete-null-pointer-checks without worrying about address space clashes.

The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request.  We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0.  But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

Such a design would hopefully include reserving ranges of address spaces for different
purposes.  Clang is already perfectly capable of remapping address space numbers
from the user-facing address_space attribute when generating IR.

John.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
Personally I don't think adding a new address space to express the idea that loads and stores may successfully dereference address zero is the best design.

The existing code to support the handling of address spaces does not inspire a lot of confidence. I am a simple non-GPU compiler developer. I have no idea how address spaces are supposed to work. When I read code that deals with them, I assume nothing about address spaces. I treat them conservatively. They are a black box, I ignore them.

This may sound bad, but this is normal. We all have different reasons to use and contribute to LLVM, and as much as we benefit by sharing designs and development resources as possible, sometimes we end up creating and playing in our own sandboxes and ignoring possible design space overlap with other users. A small amount of duplication of design and effort is OK if it allows people to make forward progress independently. I think this is an area where we want to do that.

I think we can bear the code complexity cost of having to check for both address spaces and this option, and I think both parties (address space users and -fno-delete-null-checks users) are willing to audit LLVM twice for these transformations.

On Sat, Apr 28, 2018 at 1:12 PM Sanjoy Das via cfe-dev <[hidden email]> wrote:
On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
<[hidden email]> wrote:
> My understanding is we only want to disable  the optimizations regarding
> undefined behavior
> related to null pointer deference optimizations. And address space checks
> will end up
> disabling more optimizations than needed.

[Repeating what others have already mentioned on this thread]

I this it is better to avoid framing this as "disable optimizations that exploit
UB" and instead frame this as "define some behavior that was undefined before".

> I did look at some of the optimizations/transforms and there are some that
> we definitely want to keep.
>
> Just a quick example from grepping:
> lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> ...........
>              // Don't create memset_pattern16s with address spaces.
>              StorePtr->getType()->getPointerAddressSpace() == 0 &&
>              (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>     // It looks like we can use PatternValue!
>     return LegalStoreKind::MemsetPattern;
>   }
>
> Even worse, Sanitizers do NOT work with address spaces which is a big deal
> breaker IMO.

IMO fixing these seems less engineering overhead in the long term than
introducing
yet another knob to the IR.

More importantly, we should _not_ be doing these optimizations without auditing
them individually.  For instance with -fno-delete-null-pointer checks, it isn't
okay to change a memset loop to a call to llvm.memset unless we've ensured llvm
DTRT for llvm.memset and null checks (i.e. the null check in "llvm.memset(ptr,
...); if (!ptr) {}" does not get optimized away).

> Since address spaces and null pointers are really orthogonal issues, I would
> prefer to
> not conflate them.

I'm not sure I agree with this.  Address spaces are a mechanism to provide
additional semantics to pointers.  In this case the additional property we
want is "address 0 may be dereferenceable", and using address spaces seems
appropriate.

> In addition, It is already not easy to convince Linux Kernel maintainers to
> accept clang specific patches.

Maybe I'm missing something, but I thought in this address space scheme we'd
still provide a -fno-delete-null-ptr-checks flag -- it is clang that would mark
pointers with address space n in this mode.

> From Linux kernel maintainers POV, Clang built kernels are already "getting
> lucky". So I am not too
> worried about missing a few cases.
> I'll be glad to fix the missing cases whenever reported.

It seems to me that adding back missing (but correct) optimizations when
reported is better than removing existing (but incorrect) optimizations when
reported.  If I were a kernel developer (which I am not) I'd rather have a
kernel that boots slower than a security vulnerability.

> I also have some other concerns with address spaces e.g. how to pick a safe
> address space not used by
>  any target e.g. many targets (in and out-of-tree) actively use non-zero
> address spaces.
> User code can also specify any address space via
> __attribute__((address_space(N))) so mixing object
> files will be tricky in such cases.

I think for that we can start a thread on cfe-dev and llvm-dev about reserving
address spaces.  While at it, we should reserve a range of address spaces for
LLVM middle-end use so that we can do more things like
-fno-delete-null-pointer-checks without worrying about address space clashes.

-- Sanjoy
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
Thanks everyone for the comments,

I actually have a working patch now for "-fno-delete-null-pointer-checks"
option based on the function attribute.
It is not a *huge* change and I only had to change a few places where IR
decision were based on null pointer references (Though I may have missed
some places).

Will be sending the patches for review after cleanup and testing.

-Manoj
On Mon, Apr 30, 2018 at 11:20 AM Reid Kleckner <[hidden email]> wrote:

> Personally I don't think adding a new address space to express the idea
that loads and stores may successfully dereference address zero is the best
design.

> The existing code to support the handling of address spaces does not
inspire a lot of confidence. I am a simple non-GPU compiler developer. I
have no idea how address spaces are supposed to work. When I read code that
deals with them, I assume nothing about address spaces. I treat them
conservatively. They are a black box, I ignore them.

> This may sound bad, but this is normal. We all have different reasons to
use and contribute to LLVM, and as much as we benefit by sharing designs
and development resources as possible, sometimes we end up creating and
playing in our own sandboxes and ignoring possible design space overlap
with other users. A small amount of duplication of design and effort is OK
if it allows people to make forward progress independently. I think this is
an area where we want to do that.

> I think we can bear the code complexity cost of having to check for both
address spaces and this option, and I think both parties (address space
users and -fno-delete-null-checks users) are willing to audit LLVM twice
for these transformations.

> On Sat, Apr 28, 2018 at 1:12 PM Sanjoy Das via cfe-dev <
[hidden email]> wrote:

>> On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
>> <[hidden email]> wrote:
>> > My understanding is we only want to disable  the optimizations
regarding
>> > undefined behavior
>> > related to null pointer deference optimizations. And address space
checks
>> > will end up
>> > disabling more optimizations than needed.

>> [Repeating what others have already mentioned on this thread]

>> I this it is better to avoid framing this as "disable optimizations that
exploit
>> UB" and instead frame this as "define some behavior that was undefined
before".

>> > I did look at some of the optimizations/transforms and there are some
that

>> > we definitely want to keep.
>> >
>> > Just a quick example from grepping:
>> > lib/Transforms/Scalar/LoopIdiomRecognize.cpp
>> > ...........
>> >              // Don't create memset_pattern16s with address spaces.
>> >              StorePtr->getType()->getPointerAddressSpace() == 0 &&
>> >              (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>> >     // It looks like we can use PatternValue!
>> >     return LegalStoreKind::MemsetPattern;
>> >   }
>> >
>> > Even worse, Sanitizers do NOT work with address spaces which is a big
deal
>> > breaker IMO.

>> IMO fixing these seems less engineering overhead in the long term than
>> introducing
>> yet another knob to the IR.

>> More importantly, we should _not_ be doing these optimizations without
auditing
>> them individually.  For instance with -fno-delete-null-pointer checks,
it isn't
>> okay to change a memset loop to a call to llvm.memset unless we've
ensured llvm
>> DTRT for llvm.memset and null checks (i.e. the null check in
"llvm.memset(ptr,
>> ...); if (!ptr) {}" does not get optimized away).

>> > Since address spaces and null pointers are really orthogonal issues, I
would
>> > prefer to
>> > not conflate them.

>> I'm not sure I agree with this.  Address spaces are a mechanism to
provide
>> additional semantics to pointers.  In this case the additional property
we
>> want is "address 0 may be dereferenceable", and using address spaces
seems
>> appropriate.

>> > In addition, It is already not easy to convince Linux Kernel
maintainers to
>> > accept clang specific patches.

>> Maybe I'm missing something, but I thought in this address space scheme
we'd
>> still provide a -fno-delete-null-ptr-checks flag -- it is clang that
would mark
>> pointers with address space n in this mode.

>> > From Linux kernel maintainers POV, Clang built kernels are already
"getting
>> > lucky". So I am not too
>> > worried about missing a few cases.
>> > I'll be glad to fix the missing cases whenever reported.

>> It seems to me that adding back missing (but correct) optimizations when
>> reported is better than removing existing (but incorrect) optimizations
when
>> reported.  If I were a kernel developer (which I am not) I'd rather have
a
>> kernel that boots slower than a security vulnerability.

>> > I also have some other concerns with address spaces e.g. how to pick a
safe
>> > address space not used by
>> >  any target e.g. many targets (in and out-of-tree) actively use
non-zero
>> > address spaces.
>> > User code can also specify any address space via
>> > __attribute__((address_space(N))) so mixing object
>> > files will be tricky in such cases.

>> I think for that we can start a thread on cfe-dev and llvm-dev about
reserving
>> address spaces.  While at it, we should reserve a range of address
spaces for
>> LLVM middle-end use so that we can do more things like
>> -fno-delete-null-pointer-checks without worrying about address space
clashes.

>> -- Sanjoy
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
[I want to preface this by saying that given I was late to this
discussion, I'd be fine if we proceed with a function attribute based
solution despite my objections]

On Mon, Apr 30, 2018 at 11:19 AM, Reid Kleckner <[hidden email]> wrote:
> Personally I don't think adding a new address space to express the idea that
> loads and stores may successfully dereference address zero is the best
> design.
>
> The existing code to support the handling of address spaces does not inspire
> a lot of confidence. I am a simple non-GPU compiler developer. I have no
> idea how address spaces are supposed to work. When I read code that deals
> with them, I assume nothing about address spaces. I treat them
> conservatively. They are a black box, I ignore them.

As an aside, there are non-GPU users of address spaces too.  For
instance, Azul uses (or at least used to when I was there :) )
represent GC-able pointers using address space 1, and we spent a lot
of time ensuring that all of the normal scalar optimizations kicked in
for address space 1 pointers (though I'm not sure if all of those
changes are upstream yet).

In any case, while I agree with your assessment of the problem, I
don't agree with the proposed solution.  "not everyone (wants to)
understands x" should not be a reason to re-invent a subset of x in
some other form.  Instead we should double down making the existing
thing more clearly and explicitly specified so that understanding it
is easy!

Of course, there are boundaries.  E.g. to me it isn't a big deal if a
backend wants to introduce an intrinsic or an attribute for
convenience despite it overlapping with existing things.  However,
here we're talking injecting new behavior into core LLVM passes so the
scope is larger.

> This may sound bad, but this is normal. We all have different reasons to use
> and contribute to LLVM, and as much as we benefit by sharing designs and
> development resources as possible, sometimes we end up creating and playing
> in our own sandboxes and ignoring possible design space overlap with other
> users. A small amount of duplication of design and effort is OK if it allows
> people to make forward progress independently. I think this is an area where
> we want to do that.

To me, his does not look like a small amount of duplication.  For
instance, we have to answer questions like what happens when we inline
a function marked "null-ptr-is-dereferenceable" into a regular
function (my current understanding is that we can't inline them) and
whether malloc() can return null or not.  I'd rather answer these
kinds of questions in one place if possible.

-- Sanjoy

> I think we can bear the code complexity cost of having to check for both
> address spaces and this option, and I think both parties (address space
> users and -fno-delete-null-checks users) are willing to audit LLVM twice for
> these transformations.
>
> On Sat, Apr 28, 2018 at 1:12 PM Sanjoy Das via cfe-dev
> <[hidden email]> wrote:
>>
>> On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
>> <[hidden email]> wrote:
>> > My understanding is we only want to disable  the optimizations regarding
>> > undefined behavior
>> > related to null pointer deference optimizations. And address space
>> > checks
>> > will end up
>> > disabling more optimizations than needed.
>>
>> [Repeating what others have already mentioned on this thread]
>>
>> I this it is better to avoid framing this as "disable optimizations that
>> exploit
>> UB" and instead frame this as "define some behavior that was undefined
>> before".
>>
>> > I did look at some of the optimizations/transforms and there are some
>> > that
>> > we definitely want to keep.
>> >
>> > Just a quick example from grepping:
>> > lib/Transforms/Scalar/LoopIdiomRecognize.cpp
>> > ...........
>> >              // Don't create memset_pattern16s with address spaces.
>> >              StorePtr->getType()->getPointerAddressSpace() == 0 &&
>> >              (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>> >     // It looks like we can use PatternValue!
>> >     return LegalStoreKind::MemsetPattern;
>> >   }
>> >
>> > Even worse, Sanitizers do NOT work with address spaces which is a big
>> > deal
>> > breaker IMO.
>>
>> IMO fixing these seems less engineering overhead in the long term than
>> introducing
>> yet another knob to the IR.
>>
>> More importantly, we should _not_ be doing these optimizations without
>> auditing
>> them individually.  For instance with -fno-delete-null-pointer checks, it
>> isn't
>> okay to change a memset loop to a call to llvm.memset unless we've ensured
>> llvm
>> DTRT for llvm.memset and null checks (i.e. the null check in
>> "llvm.memset(ptr,
>> ...); if (!ptr) {}" does not get optimized away).
>>
>> > Since address spaces and null pointers are really orthogonal issues, I
>> > would
>> > prefer to
>> > not conflate them.
>>
>> I'm not sure I agree with this.  Address spaces are a mechanism to provide
>> additional semantics to pointers.  In this case the additional property we
>> want is "address 0 may be dereferenceable", and using address spaces seems
>> appropriate.
>>
>> > In addition, It is already not easy to convince Linux Kernel maintainers
>> > to
>> > accept clang specific patches.
>>
>> Maybe I'm missing something, but I thought in this address space scheme
>> we'd
>> still provide a -fno-delete-null-ptr-checks flag -- it is clang that would
>> mark
>> pointers with address space n in this mode.
>>
>> > From Linux kernel maintainers POV, Clang built kernels are already
>> > "getting
>> > lucky". So I am not too
>> > worried about missing a few cases.
>> > I'll be glad to fix the missing cases whenever reported.
>>
>> It seems to me that adding back missing (but correct) optimizations when
>> reported is better than removing existing (but incorrect) optimizations
>> when
>> reported.  If I were a kernel developer (which I am not) I'd rather have a
>> kernel that boots slower than a security vulnerability.
>>
>> > I also have some other concerns with address spaces e.g. how to pick a
>> > safe
>> > address space not used by
>> >  any target e.g. many targets (in and out-of-tree) actively use non-zero
>> > address spaces.
>> > User code can also specify any address space via
>> > __attribute__((address_space(N))) so mixing object
>> > files will be tricky in such cases.
>>
>> I think for that we can start a thread on cfe-dev and llvm-dev about
>> reserving
>> address spaces.  While at it, we should reserve a range of address spaces
>> for
>> LLVM middle-end use so that we can do more things like
>> -fno-delete-null-pointer-checks without worrying about address space
>> clashes.
>>
>> -- Sanjoy
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On Mon, Apr 30, 2018 at 11:14 AM, John McCall <[hidden email]> wrote:

> The LLVM address space design has pushed well beyond the sensible boundaries
> of less-is-more and really needs some concerted effort to actually define the expected
> properties of different address spaces instead of a dozen different engineers applying
> a "don't do this optimization if the pointer is in a non-zero address space" rule to the
> optimizer with a shotgun.
>
> In fact, if we'd already done that, we wouldn't need any sort of address-space hack
> to support this request.  We'd just need a very simple audit of the places that check
> the "are dereferences of the zero address undefined behavior" bit to make sure that
> they honor it even in address space 0.  But instead that audit will be confused by a
> thousand places that just bail out for non-zero address spaces without further
> explanation.

I agree.  The pattern of bailing out if AddrSpace != 0 is unfortunate.

We also need to cap the amount of extra semantics that can be put on address
spaces.  For instance, we should probably never support trapping semantics on
loads/stores, even via address spaces.

-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
> On Apr 30, 2018, at 2:58 PM, Sanjoy Das <[hidden email]> wrote:
> On Mon, Apr 30, 2018 at 11:14 AM, John McCall <[hidden email]> wrote:
>> The LLVM address space design has pushed well beyond the sensible boundaries
>> of less-is-more and really needs some concerted effort to actually define the expected
>> properties of different address spaces instead of a dozen different engineers applying
>> a "don't do this optimization if the pointer is in a non-zero address space" rule to the
>> optimizer with a shotgun.
>>
>> In fact, if we'd already done that, we wouldn't need any sort of address-space hack
>> to support this request.  We'd just need a very simple audit of the places that check
>> the "are dereferences of the zero address undefined behavior" bit to make sure that
>> they honor it even in address space 0.  But instead that audit will be confused by a
>> thousand places that just bail out for non-zero address spaces without further
>> explanation.
>
> I agree.  The pattern of bailing out if AddrSpace != 0 is unfortunate.
>
> We also need to cap the amount of extra semantics that can be put on address
> spaces.  For instance, we should probably never support trapping semantics on
> loads/stores, even via address spaces.

I would say instead that address spaces are not the right way to support trapping
semantics on loads/stores.

John.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On 30 April 2018 at 11:14, John McCall via llvm-dev <[hidden email]> wrote:
> On Apr 28, 2018, at 4:12 PM, Sanjoy Das via cfe-dev <[hidden email]> wrote:
> On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
> <[hidden email]> wrote:
>> My understanding is we only want to disable  the optimizations regarding
>> undefined behavior
>> related to null pointer deference optimizations. And address space checks
>> will end up
>> disabling more optimizations than needed.
>
> [Repeating what others have already mentioned on this thread]
>
> I this it is better to avoid framing this as "disable optimizations that exploit
> UB" and instead frame this as "define some behavior that was undefined before".
>
>> I did look at some of the optimizations/transforms and there are some that
>> we definitely want to keep.
>>
>> Just a quick example from grepping:
>> lib/Transforms/Scalar/LoopIdiomRecognize.cpp
>> ...........
>>             // Don't create memset_pattern16s with address spaces.
>>             StorePtr->getType()->getPointerAddressSpace() == 0 &&
>>             (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>>    // It looks like we can use PatternValue!
>>    return LegalStoreKind::MemsetPattern;
>>  }
>>
>> Even worse, Sanitizers do NOT work with address spaces which is a big deal
>> breaker IMO.
>
> IMO fixing these seems less engineering overhead in the long term than
> introducing
> yet another knob to the IR.
>
> More importantly, we should _not_ be doing these optimizations without auditing
> them individually.  For instance with -fno-delete-null-pointer checks, it isn't
> okay to change a memset loop to a call to llvm.memset unless we've ensured llvm
> DTRT for llvm.memset and null checks (i.e. the null check in "llvm.memset(ptr,
> ...); if (!ptr) {}" does not get optimized away).
>
>> Since address spaces and null pointers are really orthogonal issues, I would
>> prefer to
>> not conflate them.
>
> I'm not sure I agree with this.  Address spaces are a mechanism to provide
> additional semantics to pointers.  In this case the additional property we
> want is "address 0 may be dereferenceable", and using address spaces seems
> appropriate.
>
>> In addition, It is already not easy to convince Linux Kernel maintainers to
>> accept clang specific patches.
>
> Maybe I'm missing something, but I thought in this address space scheme we'd
> still provide a -fno-delete-null-ptr-checks flag -- it is clang that would mark
> pointers with address space n in this mode.
>
>> From Linux kernel maintainers POV, Clang built kernels are already "getting
>> lucky". So I am not too
>> worried about missing a few cases.
>> I'll be glad to fix the missing cases whenever reported.
>
> It seems to me that adding back missing (but correct) optimizations when
> reported is better than removing existing (but incorrect) optimizations when
> reported.  If I were a kernel developer (which I am not) I'd rather have a
> kernel that boots slower than a security vulnerability.
>
>> I also have some other concerns with address spaces e.g. how to pick a safe
>> address space not used by
>> any target e.g. many targets (in and out-of-tree) actively use non-zero
>> address spaces.
>> User code can also specify any address space via
>> __attribute__((address_space(N))) so mixing object
>> files will be tricky in such cases.
>
> I think for that we can start a thread on cfe-dev and llvm-dev about reserving
> address spaces.  While at it, we should reserve a range of address spaces for
> LLVM middle-end use so that we can do more things like
> -fno-delete-null-pointer-checks without worrying about address space clashes.

The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request.  We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0.  But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

Such a design would hopefully include reserving ranges of address spaces for different
purposes.  Clang is already perfectly capable of remapping address space numbers
from the user-facing address_space attribute when generating IR.

From the peanut gallery: has any thought been given to moving away from hardcoded address space numbers entirely? Abstractly it seems like an extra layer of indirection here could help (eg, we could allow IR to define address spaces that can be part of pointer types, and those could nominate which target-specific pointer representation and interpretation they use -- perhaps by name instead of by number -- as well as other properties such as whether zero is a valid address).

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev


On Apr 30, 2018, at 15:05, John McCall via llvm-dev <[hidden email]> wrote:

On Apr 30, 2018, at 2:58 PM, Sanjoy Das <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 11:14 AM, John McCall <[hidden email]> wrote:
The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request.  We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0.  But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

I agree.  The pattern of bailing out if AddrSpace != 0 is unfortunate.

We also need to cap the amount of extra semantics that can be put on address
spaces.  For instance, we should probably never support trapping semantics on
loads/stores, even via address spaces.

I would say instead that address spaces are not the right way to support trapping
semantics on loads/stores.

Hi John,

I might be misunderstanding the thread here, but are there architectures other than Intel that support alternative address spaces? I’m asking because x86_64 dropped support for having the code, data, stack, and “extra” segments be different from each other; and the only two remaining segment registers, “FS” and “GS”, are only used in practice to alias the current address space. In fact, *user-space* instructions were later added to read/write the FS/GS segment bases, thus embracing the fact that these segment registers are used in practice to alias the current address space.[1]

I don’t think LLVM needs to model FS/GS as anything other than aliases into the existing address space.

Dave

[1] – Note, these new user-space instructions require permission from the kernel to execute, and popular kernels haven’t enabled them. Last I knew, the Linux folks seem receptive to the idea of enabling these instructions, but the conversation keeps stalling on implementation details.

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
On 30 April 2018 at 13:26, David Zarzycki via llvm-dev <[hidden email]> wrote:


On Apr 30, 2018, at 15:05, John McCall via llvm-dev <[hidden email]> wrote:

On Apr 30, 2018, at 2:58 PM, Sanjoy Das <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 11:14 AM, John McCall <[hidden email]> wrote:
The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request.  We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0.  But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

I agree.  The pattern of bailing out if AddrSpace != 0 is unfortunate.

We also need to cap the amount of extra semantics that can be put on address
spaces.  For instance, we should probably never support trapping semantics on
loads/stores, even via address spaces.

I would say instead that address spaces are not the right way to support trapping
semantics on loads/stores.

Hi John,

I might be misunderstanding the thread here, but are there architectures other than Intel that support alternative address spaces?

Yes, they're also used by GPU targets IIUC.
 
I’m asking because x86_64 dropped support for having the code, data, stack, and “extra” segments be different from each other; and the only two remaining segment registers, “FS” and “GS”, are only used in practice to alias the current address space. In fact, *user-space* instructions were later added to read/write the FS/GS segment bases, thus embracing the fact that these segment registers are used in practice to alias the current address space.[1]

I don’t think LLVM needs to model FS/GS as anything other than aliases into the existing address space.

Dave

[1] – Note, these new user-space instructions require permission from the kernel to execute, and popular kernels haven’t enabled them. Last I knew, the Linux folks seem receptive to the idea of enabling these instructions, but the conversation keeps stalling on implementation details.

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On Apr 30, 2018, at 4:26 PM, David Zarzycki <[hidden email]> wrote:
On Apr 30, 2018, at 15:05, John McCall via llvm-dev <[hidden email]> wrote:
On Apr 30, 2018, at 2:58 PM, Sanjoy Das <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 11:14 AM, John McCall <[hidden email]> wrote:
The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request.  We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0.  But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

I agree.  The pattern of bailing out if AddrSpace != 0 is unfortunate.

We also need to cap the amount of extra semantics that can be put on address
spaces.  For instance, we should probably never support trapping semantics on
loads/stores, even via address spaces.

I would say instead that address spaces are not the right way to support trapping
semantics on loads/stores.

Hi John,

I might be misunderstanding the thread here, but are there architectures other than Intel that support alternative address spaces?

Yes.  They're commonplace in GPUs and also used in some distributed system architectures.  Also, any x32-like ABI can support a short/long pointer distinction.

John.

I’m asking because x86_64 dropped support for having the code, data, stack, and “extra” segments be different from each other; and the only two remaining segment registers, “FS” and “GS”, are only used in practice to alias the current address space. In fact, *user-space* instructions were later added to read/write the FS/GS segment bases, thus embracing the fact that these segment registers are used in practice to alias the current address space.[1]

I don’t think LLVM needs to model FS/GS as anything other than aliases into the existing address space.

Dave

[1] – Note, these new user-space instructions require permission from the kernel to execute, and popular kernels haven’t enabled them. Last I knew, the Linux folks seem receptive to the idea of enabling these instructions, but the conversation keeps stalling on implementation details.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On Apr 30, 2018, at 4:24 PM, Richard Smith <[hidden email]> wrote:
On 30 April 2018 at 11:14, John McCall via llvm-dev <[hidden email]> wrote:
> On Apr 28, 2018, at 4:12 PM, Sanjoy Das via cfe-dev <[hidden email]> wrote:
> On Thu, Apr 19, 2018 at 3:56 PM, Manoj Gupta via llvm-dev
> <[hidden email]> wrote:
>> My understanding is we only want to disable  the optimizations regarding
>> undefined behavior
>> related to null pointer deference optimizations. And address space checks
>> will end up
>> disabling more optimizations than needed.
>
> [Repeating what others have already mentioned on this thread]
>
> I this it is better to avoid framing this as "disable optimizations that exploit
> UB" and instead frame this as "define some behavior that was undefined before".
>
>> I did look at some of the optimizations/transforms and there are some that
>> we definitely want to keep.
>>
>> Just a quick example from grepping:
>> lib/Transforms/Scalar/LoopIdiomRecognize.cpp
>> ...........
>>             // Don't create memset_pattern16s with address spaces.
>>             StorePtr->getType()->getPointerAddressSpace() == 0 &&
>>             (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
>>    // It looks like we can use PatternValue!
>>    return LegalStoreKind::MemsetPattern;
>>  }
>>
>> Even worse, Sanitizers do NOT work with address spaces which is a big deal
>> breaker IMO.
>
> IMO fixing these seems less engineering overhead in the long term than
> introducing
> yet another knob to the IR.
>
> More importantly, we should _not_ be doing these optimizations without auditing
> them individually.  For instance with -fno-delete-null-pointer checks, it isn't
> okay to change a memset loop to a call to llvm.memset unless we've ensured llvm
> DTRT for llvm.memset and null checks (i.e. the null check in "llvm.memset(ptr,
> ...); if (!ptr) {}" does not get optimized away).
>
>> Since address spaces and null pointers are really orthogonal issues, I would
>> prefer to
>> not conflate them.
>
> I'm not sure I agree with this.  Address spaces are a mechanism to provide
> additional semantics to pointers.  In this case the additional property we
> want is "address 0 may be dereferenceable", and using address spaces seems
> appropriate.
>
>> In addition, It is already not easy to convince Linux Kernel maintainers to
>> accept clang specific patches.
>
> Maybe I'm missing something, but I thought in this address space scheme we'd
> still provide a -fno-delete-null-ptr-checks flag -- it is clang that would mark
> pointers with address space n in this mode.
>
>> From Linux kernel maintainers POV, Clang built kernels are already "getting
>> lucky". So I am not too
>> worried about missing a few cases.
>> I'll be glad to fix the missing cases whenever reported.
>
> It seems to me that adding back missing (but correct) optimizations when
> reported is better than removing existing (but incorrect) optimizations when
> reported.  If I were a kernel developer (which I am not) I'd rather have a
> kernel that boots slower than a security vulnerability.
>
>> I also have some other concerns with address spaces e.g. how to pick a safe
>> address space not used by
>> any target e.g. many targets (in and out-of-tree) actively use non-zero
>> address spaces.
>> User code can also specify any address space via
>> __attribute__((address_space(N))) so mixing object
>> files will be tricky in such cases.
>
> I think for that we can start a thread on cfe-dev and llvm-dev about reserving
> address spaces.  While at it, we should reserve a range of address spaces for
> LLVM middle-end use so that we can do more things like
> -fno-delete-null-pointer-checks without worrying about address space clashes.

The LLVM address space design has pushed well beyond the sensible boundaries
of less-is-more and really needs some concerted effort to actually define the expected
properties of different address spaces instead of a dozen different engineers applying
a "don't do this optimization if the pointer is in a non-zero address space" rule to the
optimizer with a shotgun.

In fact, if we'd already done that, we wouldn't need any sort of address-space hack
to support this request.  We'd just need a very simple audit of the places that check
the "are dereferences of the zero address undefined behavior" bit to make sure that
they honor it even in address space 0.  But instead that audit will be confused by a
thousand places that just bail out for non-zero address spaces without further
explanation.

Such a design would hopefully include reserving ranges of address spaces for different
purposes.  Clang is already perfectly capable of remapping address space numbers
from the user-facing address_space attribute when generating IR.

From the peanut gallery: has any thought been given to moving away from hardcoded address space numbers entirely? Abstractly it seems like an extra layer of indirection here could help (eg, we could allow IR to define address spaces that can be part of pointer types, and those could nominate which target-specific pointer representation and interpretation they use -- perhaps by name instead of by number -- as well as other properties such as whether zero is a valid address).

I think that would be great design, yeah.  Even if we stuck with identifying them by numbers, I would hope that we would allow them to be described at the top-level this way, and that backward compatibility would be handled by just filling in maximally conservative information.

John.

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On Mon, Apr 30, 2018 at 04:26:47PM -0400, David Zarzycki via llvm-dev wrote:
> I might be misunderstanding the thread here, but are there architectures
> other than Intel that support alternative address spaces?

Yes, efficient support for separate kernel and user VA used to be quite
common and effectively means two separate address spaces.

Joerg
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [cfe-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
On 30 Apr 2018, at 21:26, David Zarzycki via llvm-dev <[hidden email]> wrote:
>
> I might be misunderstanding the thread here, but are there architectures other than Intel that support alternative address spaces? I’m asking because x86_64 dropped support for having the code, data, stack, and “extra” segments be different from each other; and the only two remaining segment registers, “FS” and “GS”, are only used in practice to alias the current address space. In fact, *user-space* instructions were later added to read/write the FS/GS segment bases, thus embracing the fact that these segment registers are used in practice to alias the current address space.[1]

I’m not 100% sure if you’re asking whether processors support different address space, or whether you’re asking whether targets use LLVM’s notion of an address space, so I’ll try to answer both.

To the first interpretation of your question:

Others have pointed out that GPUs have different memory regions (shared mutable, shared immutable, local, and so on).  Any processor with an MMU supports some notion of address spaces, the simplest of which involves multiple completely distinct address spaces.  This is somewhat complicated by shared memory.  In the C abstract machine, there is no difference between pointers to shared and unshared memory, which is unfortunate as the safety of storing pointers in such regions can vary.  In OpenCL, the host can map regions into which it is safe to store pointers that are valid on both the host and device, which a more sane language than C would regard as a separate address space.

In terms of out-of-tree architectures, a large number of embedded processors have different regions for (for example), stack, code ROM, data ROM, and heap.  Some have different overlapping shared regions.  The architecture that I’ve worked on for the last 6 years, CHERI, provides a flexible notion of address spaces allowing a model like segmentation at the coarse granularity for sandboxing legacy code (with 64-bit integers as pointers), or fine-grained memory safety by representing every pointer as a 128-bit hardware-enforced type that encodes bounds and permissions.

To the second interpretation of your question:

GPUs use different address spaces for their different memory types, as do out-of-tree embedded targets.  Azul and the (apparently now dead) CLR back end using LLVM used AS1 to indicate that a pointer was to GC’d memory.  We use AS200 to indicate a 128-bit fat pointer and AS0 to indicate a 64-bit pointer (which is implicitly relative to a default 128-bit pointer in a special register).

It’s worth noting that LLVM’s notion of an address space is a property of the pointer, whereas embedded C regards it as a property of the underlying memory.  This means that it is always syntactically valid to cast between address spaces in LLVM IR, though the result may be a non-dereferencable pointer.  This is somewhat problematic for optimisers, because this information is not well expressed (for us, for example, casting from AS0 to AS200 always results in a pointer that is valid if the original is valid, but casting from AS200 to AS0 may be null.  We’ve had to do a lot of cleanup on optimisers to prevent them from generating broken code as a result).

The current model of an AS conflates two notions: a different region of memory (potentially with different properties) and a different kind of pointer (potentially with different properties).  It would be nice to decouple these and provide a mechanism similar to function attributes that would allow properties on pointers to be expressed, in an orthogonal manner to address spaces.  This would require moving some information (such as pointer size) into the attributes, but would probably be a long-term cleaner approach.

This would probably be easier after the typeless pointer work is completed, so that pointers are all of type PTR but with attributes indicating their other properties.  The AMD GPU, for example, could benefit from having an attribute indicating that -1, rather than 0, is the ‘invalid pointer’ value for some kinds.  Other useful information would include aliasing scopes (which could be updated on inlining), values that are guaranteed not to be dereferenced, whether out-of-bounds values are representable, and so on.

David

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev
Fair warning, the following is a devil's advocate position, but it's
also a serious question.

Given the entire point of this flag appears to be bug mitigation, why
not frame this as a sanitizer?  If we had a hypothetical
-fsanitize=dereference which tried to catch dereferenced pointers
derived from null, wouldn't that handle the case at hand?

i.e. Why are we focused on *hiding* the bug instead of *finding and
fixing* the bug?

Philip

p.s. The above is written with the understanding that the code compiled
with this flag doesn’t actually place valid objects at null.  I know
that is not true for some embedded systems, but that seems distinct from
this proposal.

On 04/18/2018 10:13 AM, Manoj Gupta via llvm-dev wrote:

> Hi,
>
> This is regarding support for -fno-delete-null-pointer-checks in clang (PR
> 9251).
> Linux kernel uses this flag to keep null pointer checks from getting
> optimized away.  Since clang does not currently support this flag, it often
> invites comments from kernel devs that clang is not yet *ready* to compile
> Linux kernel.
> I have also heard that developers working on firmware, bare-metal tools and
> other low level tools often want to use this flag as well.
>
> Therefore, I would like to implement support for this flag (maybe with a
> different name), and would like to know the opinions on how it should be
> implemented.
>
> Some options/opinions:
>
> 1 (From Duncan Sands comment at PR 9251)
>> This could be implemented by putting pointers in a non-default address
> space when this flag is passed.
> Any ideas how will this work for languages/targets that actively make use
> of non-default address spaces e.g. OpenCL/GPU targets.  Also, I believe
> that allocas still need to kept in default address space so they need to be
> bitcast to the *no-default* address space before use.
>
> 2. Use this flag like any other optimization flag. Find and fix all
> optimizations in clang/llvm related to null pointer accesses.
>
> Thanks,
> Manoj
>
> Background:
> https://lkml.org/lkml/2018/4/4/601
>
>   From Linus Torvalds:
>
> Note that we explicitly use "-fno-delete-null-pointer-checks" because
> we do *not* want NULL pointer check removal even in the case of a bug.
>
> See commit a3ca86aea507 ("Add '-fno-delete-null-pointer-checks' to gcc
> CFLAGS") for the reason: we had buggy code that accessed a pointer
> before the NULL pointer check, but the bug was "benign" as long as the
> compiler didn't actually remove the check.
>
> And note how the buggy code *looked* safe. It was doing the right
> check, it was just that the check was hidden and disabled by another
> bug.
>
> Removing the NULL pointer check turned a benign bug into a trivially
> exploitable one by just mapping user space data at NULL (which avoided
> the kernel oops, and then made the kernel use the user value!).
>
> Now, admittedly we have a ton of other security features in place to
> avoid these kinds of issues - SMAP helps on the hardware side, and not
> allowing users to mmap() NULL in the first place helps with most
> distributions, but the point remains: the kernel generally really
> doesn't want optimizations that are perhaps allowed by the standard,
> but that result in code generation that doesn't match the source code.
>
> The NULL pointer removal is one such thing: don't remove checks in the
> kernel based on "standard says". It's ok to do optimizations that are
> based on "hardware does the exact same thing", but not on "the
> standard says this is undefined so we can remove it".
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
On 5/12/2018 9:23 PM, Philip Reames via llvm-dev wrote:
> Fair warning, the following is a devil's advocate position, but it's
> also a serious question.
>
> Given the entire point of this flag appears to be bug mitigation, why
> not frame this as a sanitizer?  If we had a hypothetical
> -fsanitize=dereference which tried to catch dereferenced pointers
> derived from null, wouldn't that handle the case at hand?

It's called "-fsanitize=null": it catches stuff like "x[3]" where x is
null. It's not quite complete; we don't check for arithmetic on a null
pointer.

Yes, that would handle the situation in question, but putting implicit
null checks all over the place is pretty expensive; I don't think most
people would turn that on in production.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev


On Mon, May 14, 2018 at 12:07 PM Friedman, Eli <[hidden email]> wrote:
On 5/12/2018 9:23 PM, Philip Reames via llvm-dev wrote:
> Fair warning, the following is a devil's advocate position, but it's
> also a serious question.
>
> Given the entire point of this flag appears to be bug mitigation, why
> not frame this as a sanitizer?  If we had a hypothetical
> -fsanitize=dereference which tried to catch dereferenced pointers
> derived from null, wouldn't that handle the case at hand?

It's called "-fsanitize=null": it catches stuff like "x[3]" where x is
null. It's not quite complete; we don't check for arithmetic on a null
pointer.

Yes, that would handle the situation in question, but putting implicit
null checks all over the place is pretty expensive; I don't think most
people would turn that on in production.

-Eli

We had a similar discussion on an internal thread a while back if we can use "-fsanitize=null" where clang
would generate ud2 instruction for null pointer dereferences. Unfortunately, this doesn't work in kernel context.

Quoting the reply from our kernel team:

"It will not cause a kernel panic: it's an exception trigger, and it's
up to the exception handler to decide if it will return (WARN) or not
(BUG). In the referenced function, this is calling WARN_ON() which
will resume execution. (And note that the BUG() implementations are
specifically marked with __attribute__((noreturn)). "
 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Implementing -fno-delete-null-pointer-checks in clang

Tim Northover via llvm-dev
In reply to this post by Tim Northover via llvm-dev


On 05/14/2018 12:07 PM, Friedman, Eli wrote:

> On 5/12/2018 9:23 PM, Philip Reames via llvm-dev wrote:
>> Fair warning, the following is a devil's advocate position, but it's
>> also a serious question.
>>
>> Given the entire point of this flag appears to be bug mitigation, why
>> not frame this as a sanitizer?  If we had a hypothetical
>> -fsanitize=dereference which tried to catch dereferenced pointers
>> derived from null, wouldn't that handle the case at hand?
>
> It's called "-fsanitize=null": it catches stuff like "x[3]" where x is
> null. It's not quite complete; we don't check for arithmetic on a null
> pointer.
>
> Yes, that would handle the situation in question, but putting implicit
> null checks all over the place is pretty expensive; I don't think most
> people would turn that on in production.
(again, knowingly playing devil's advocate)

But wouldn't it catch - during development, before release - the bug
which motivates the desire for the mitigation?  If so, why worry about
the mitigation?  Is the concern insufficient testing/coverage?

And alternatively, the performance overhead could be likely to be
improved if someone wanted to spend some time doing so.   Between fully
exploiting implicit null checks (support in tree today) and imprecise
fault locations(*), I suspect the overhead could be made quite small.

* Terminology: "implicit null check" is simply constructing a load or
store which is guaranteed to fault if the preceding null check would
have failed.  "imprecise fault location" is allowing the failure to
happen earlier or later in the execution so long as additional side
effect is not visible.  If you're okay with earlier side effects not
being visible, that opens up a lot of flexibility. Similar, allowing
some side effects  (but not say, IO) does as well.

Philip
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
123