[llvm-dev] beneficial optimization of undef examples needed

[llvm-dev] beneficial optimization of undef examples needed

Peter Lawrence via llvm-dev
All,
     These discussions seem to be based on the premise that there is a
need for the compiler to exploit undefined behavior for performance
optimization reasons.

So far the only beneficial optimization I am aware of that relies on some
form of “undefined” is Dan Gohman’s original project for LP64 targets of
promoting i32 induction variables to i64 and hoisting sign-extension out
of the loop.

But “undef” / “poison” never appears in either the original or the transformed
IR for these types of loops; instead, properties of “+nsw” are used to
justify the transformation.  The transformation does not just fall out because
we’ve done a good job at defining “undef” / “poison” IR nodes.
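
For readers unfamiliar with that transformation, here is a minimal C sketch of its before and after shapes. The function names are invented for illustration. Signed overflow is undefined in C, so clang marks the `++i` increment as `add nsw`; that flag is what lets the optimizer assume the 32-bit counter never wraps and use a 64-bit counter instead.

```c
#include <stdint.h>

/* Before: on an LP64 target the 32-bit index must be sign-extended
   to 64 bits for the address computation on every iteration. */
int64_t sum_i32_index(const int32_t *a, int32_t n) {
    int64_t sum = 0;
    for (int32_t i = 0; i < n; ++i) {
        sum += a[(int64_t)i];   /* explicit cast stands in for the implicit sext */
    }
    return sum;
}

/* After: the induction variable is 64 bits wide, so the only
   sign-extension left (of n) happens once, outside the loop. */
int64_t sum_i64_index(const int32_t *a, int32_t n) {
    int64_t sum = 0;
    int64_t wide_n = n;         /* hoisted sign-extension */
    for (int64_t i = 0; i < wide_n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Both functions return the same result for any valid input; the second is roughly what the optimizer is entitled to produce from the first. Note that neither version involves an explicit undef or poison value, which is the point being made above.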

So I’d like to see some concrete examples of where the compiler can
do useful optimization based on “undef” / “poison” appearing explicitly
in the IR; finding some would surely advance this discussion.



Peter Lawrence.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] beneficial optimization of undef examples needed

John Regehr via llvm-dev
I'll repeat that open-ended requests that would end up generating lots
of work for other people probably aren't going to get great results here.

John



On 6/16/17 4:03 PM, Peter Lawrence via llvm-dev wrote:

> So I’d like to see some concrete examples of where the compiler can
> do useful optimization based on “undef” / “poison” appearing explicitly
> In the IR,  finding some would surely advance this discussion.

Re: [llvm-dev] beneficial optimization of undef examples needed

Matthias Braun via llvm-dev
Luckily someone already did the work writing a bunch of examples down:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

And +1 for keeping this on-topic on how to implement poison.

- Matthias

On Jun 16, 2017, at 3:19 PM, John Regehr via llvm-dev <[hidden email]> wrote:

> I'll repeat that open-ended requests that would end up generating lots of work for other people probably aren't going to get great results here.
>
> John




Re: [llvm-dev] beneficial optimization of undef examples needed

ORiordan, Martin via llvm-dev
Hi Peter,

Why we need an undef value is covered here:
http://sunfishcode.github.io/blog/2014/07/14/undef-introduction.html
(in short, to do SSA construction well).
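
To make the SSA-construction point concrete, here is a small C sketch (the function name is invented for this note). The program is well defined, yet once `x` is promoted to an SSA register, the merge point after the first `if` needs some value for the path on which `x` was never assigned. That slot is what undef fills.

```c
/* Returns 42 when cond is nonzero and 0 otherwise. */
int select_value(int cond) {
    int x;                  /* deliberately left uninitialized */
    if (cond)
        x = 42;
    /* Merge point: in SSA form, x becomes
       phi(42 from the "then" edge, undef from the fall-through edge). */
    if (cond)
        return x;           /* x is read only on paths where it was assigned */
    return 0;
}
```

Without undef, SSA construction would have to invent a concrete value such as 0 for the fall-through edge, and later passes would treat that invented value as meaningful.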

For a while we did not have a literal representation for poison.
However, in practice having both undef and poison was problematic (see
the paper), so we decided to ditch undef and keep poison.

However for the new poison to provide the same functionality as the
old undef (which is now going away), we need a literal representation
for the new poison.

-- Sanjoy


Re: [llvm-dev] beneficial optimization of undef examples needed

ORiordan, Martin via llvm-dev
Hi Peter,

Undef is certainly useful for vector operations in the back end. It allows shorter instruction sequences for vectors which have some, but not all, elements marked as undef. Examples include lowering a vector shuffle as a swap, combining arithmetic, and similar.

For example, in slightly lispy notation, folding
(+ x (vector i32 undef 5))
and
(+ x (vector i32 4 undef))
to
(+ x (vector i32 4 5))
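
One way to read the fold above, sketched in C as a toy model rather than as LLVM code: two constant vectors can be replaced by a single constant when they agree on every lane that both define, because each undef lane may take whatever value the other vector needs. `Lane` and `merge_undef_lanes` are hypothetical names introduced only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of a constant vector: either a known value or undef. */
typedef struct {
    bool is_undef;
    int32_t value;   /* meaningful only when is_undef is false */
} Lane;

/* Try to build one vector that can stand in for both a and b.
   Returns false if some lane is defined in both with different values;
   in that case the contents of out are unspecified. */
bool merge_undef_lanes(const Lane *a, const Lane *b, Lane *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (a[i].is_undef) {
            out[i] = b[i];          /* take b's lane (it may itself be undef) */
        } else if (b[i].is_undef || a[i].value == b[i].value) {
            out[i] = a[i];          /* a is defined and b does not disagree */
        } else {
            return false;           /* conflicting defined lanes */
        }
    }
    return true;
}
```

With the two operands from the example, (undef 5) and (4 undef), the merge succeeds and yields (4 5).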

There should also be optimisations available for bitwise operations on machine words that are partially undef, but I haven't written any yet.

Working with variables that are entirely undef is of less interest to me. For example, folding (add 5 undef) to undef leads to less code, but it's still not code that does anything useful.

Cheers,

Jon

On Sat, Jun 17, 2017 at 1:02 AM, via llvm-dev <[hidden email]> wrote:

Today's Topics:

   1. beneficial optimization of undef examples needed
      (Peter Lawrence via llvm-dev)
   2. Re: [GlobalISel][AArch64] Toward flipping the switch for O0:
      Please give it a try! (Quentin Colombet via llvm-dev)
   3. Re: beneficial optimization of undef examples needed
      (John Regehr via llvm-dev)
   4. Re: How does sanitizers in compiler-rt work?
      (Dipanjan Das via llvm-dev)
   5. Re: LLC does not do proper copy propagation (or copy
      coalescing) (Alex Susu via llvm-dev)
   6. Re: [GlobalISel][AArch64] Toward flipping the switch for O0:
      Please give it a try! (Quentin Colombet via llvm-dev)
   7. Re: beneficial optimization of undef examples needed
      (Matthias Braun via llvm-dev)
   8. Re: [GlobalISel][AArch64] Toward flipping the switch for O0:
      Please give it a try! (Eric Christopher via llvm-dev)
   9. Re: Wide load/store optimization question
      (Matthias Braun via llvm-dev)


----------------------------------------------------------------------

Message: 2
Date: Fri, 16 Jun 2017 15:06:36 -0700
From: Quentin Colombet via llvm-dev <[hidden email]>
To: Diana Picus <[hidden email]>
Cc: llvm-dev <[hidden email]>, Justin Bogner
        <[hidden email]>, Ahmed Bougacha <[hidden email]>, Aditya
        Nandakumar <[hidden email]>, nd <[hidden email]>
Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the
        switch for O0: Please give it a try!
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"


> On Jun 14, 2017, at 7:27 AM, Diana Picus <[hidden email]> wrote:
>
> On 12 June 2017 at 18:54, Diana Picus <[hidden email]> wrote:
> Hi all,
>
> I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.
>
>
> I did some more investigations on a machine similar to the one running the buildbot. For paq8p and scimark2, I get these results for O0:
>
> PAQ8p:
> Fast isel: 666.344
> Global isel: 731.384
>
> SciMark2-C:
> Fast isel: 463.908
> Global isel: 496.22
>
> The current timeout is 500s (so in this particular case we didn't hit it for scimark2, and it ran successfully to completion). I don't think the difference between FastISel and GlobalISel is too atrocious, so I would propose increasing the timeout for these 2 benchmarks. I'm not sure if we can do this on a per-bot basis, but I see some precedent for setting custom timeout thresholds for various benchmarks on different architectures (sometimes with comments that it's done so we can run O0 on that particular benchmark).
>
> Something along these lines works:
> https://reviews.llvm.org/differential/diff/102547/
>
> What do you guys think about this approach?

Looks reasonable to me.

>
> Thanks,
> Diana
>
> PS: The buildbot is using the Makefiles because that's what our other AArch64 test-suite bots use. Moving all of them to CMake is a transition for another time.
>
> At the moment, it lives in an internal buildmaster that I've setup for this purpose. If we fix it and it proves to be stable for a week or two, I'll move it to the public master.
>
> Cheers,
> Diana
>
> [1] http://master2.llvm.validation.linaro.org/builders/clang-cmake-aarch64-global-isel
>
>
> On 6 June 2017 at 19:11, Quentin Colombet <[hidden email]> wrote:
> Thanks Kristof.
>
> Sounds like we'll need to investigate though I'd say it is not blocking the switch.
>
> At this point I think everybody is on board to flip the switch.
> @Eric, how does that sound to you?
>
> Thanks,
> Q
>
> On 1 June 2017, at 07:46, Kristof Beyls <[hidden email]> wrote:
>
>>
>>> On 31 May 2017, at 17:07, Quentin Colombet <[hidden email]> wrote:
>>>>
>>>> Latest comparisons on my side, after picking up r304244, i.e. the correct Localizer pass.
>>>> * CTMark compile time, comparing "-O0 -g" vs '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0': about 6% increase with globalisel. This was about 3.5% before the Localizer pass landed.
>>>
>>> That one is surprising too. I wouldn’t have expected this pass to show up in the compile time profile, at least not to this extent.
>>> What is the biggest offender?
>>
>> Hmmm. So I took the 3.5% compile time overhead from my last measurement before the localizer landed, from around 24th of May.
>> When using -ftime-report, I see the Localizer pass typically taking very roughly about 1% of compile time.
>> Maybe another part of GlobalISel became a bit slower since I did that 3.5% measurement?
>> Or maybe the Localizer pass changes the structure of the program so that another later pass gets a different compile time profile?
>> Basically, I'd have to do more experiments to figure that one out.
>>
>> As far as where time is spent in the gisel-passes itself, on average, I saw the following on the latest CTMark experiment I ran:
>> Avg compile time spent in IRTranslator: 4.61%
>> Avg compile time spent in InstructionSelect: 7.51%
>> Avg compile time spent in Legalizer: 1.06%
>> Avg compile time spent in Localizer: 0.76%
>> Avg compile time spent in RegBankSelect: 2.12%
>>
>>>
>>>> * My usual performance benchmarking run: 8.5% slow-down. This was about 9.5% before the Localizer pass landed, so a slight improvement.
>>>> * Code size: 3.14% larger. This was about 2.8% before the Localizer pass landed, so a slight regression.
>>>
>>> That one is surprising. Do you have an idea of what is happening?
>>> Alternatively if you can point me to the biggest offender, I can have a look.
>>
>> So the biggest offenders on the mem_bytes metric in LNT are:
>> Benchmark                                                  O0 -g  O0 -g gisel-with-localizer  O0 -g gisel-without-localizer
>> SingleSource/Benchmarks/Misc/perlin                        14272  14640                       18344                          25.95%
>> SingleSource/Benchmarks/Dhrystone/dry                      16560  17144                       20160                          18.21%
>> SingleSource/Benchmarks/Stanford/QueensProfile             13912  14192                       15136                          6.79%
>> MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url  71400  72272                       75504                          4.53%
>>
>> I haven't had time to investigate what exact changes make the code size go up that much with the localizer pass in those cases...
>>
>>>
>>> The only thing I can think of is that we duplicate constants that are expensive to materialize. If that’s the case, we were discussing with Ahmed an alternative to the localizer pass that would operate during InstructionSelect so may be worth pursuing.
>>
>
>
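
The last column of the mem_bytes figures quoted above has no header. The numbers are consistent with the difference between the third and second numeric columns, taken as a percentage of the first; for perlin, (18344 - 14640) / 14272 is about 25.95%. That reading is an inference from the figures, not something stated in the thread. A small C helper, with a hypothetical name, that reproduces all four rows under that assumption:

```c
/* Percentage shown in the unlabeled fourth column, ASSUMING it is
   (third column - second column) / first column. */
double delta_percent(long first, long second, long third) {
    return 100.0 * (double)(third - second) / (double)first;
}
```

For example, `delta_percent(16560, 17144, 20160)` evaluates to roughly 18.21, matching the Dhrystone row.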

------------------------------

Message: 4
Date: Fri, 16 Jun 2017 15:23:06 -0700
From: Dipanjan Das via llvm-dev <[hidden email]>
To: llvm-dev <[hidden email]>
Subject: Re: [llvm-dev] How does sanitizers in compiler-rt work?
Message-ID:
        <CAEK-7JLpnet2zF82z5v-RvUKaYrbmZRGtHpc-=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi Vedant,

Thanks for the pointers. Please find my replies inline.

On 16 June 2017 at 14:48, Vedant Kumar <[hidden email]> wrote:

>
> On Jun 16, 2017, at 4:11 AM, Dipanjan Das via llvm-dev <
> [hidden email]> wrote:
>
>
> Can anybody give me any pointer on how compiler-rt, especially the
> sanitizers work? Do they operate on IR as any other LLVM pass? Or are they
> integral part of the frontend itself? I couldn't spot any documentation on
> the internals of compiler-rt project? What happens (sequence of actions)
> when I pass -fsanitize=dataflow to clang?
>
>
> Passing -fsanitize=dataflow tells clang to insert the dataflow sanitizer's
> instrumentation pass into the normal compilation pipeline. The
> instrumentation occurs at the LLVM IR level. The pass may insert calls into
> runtime functions which are provided by compiler-rt. Therefore, in order to
> link a program compiled with -fsanitize=dataflow, the appropriate runtime
> library from compiler-rt is required.
>
>
> Precisely, I intend to alter the behaviour of DFSan to suit my need.
>
>
> What is your need, exactly?
>
>
Instead of manually inserting the dfsan_create_label() and
dfsan_set_label() calls in the source, I want to automatically insert those
calls in the IR for all the input variables in scanf(). I intend to run the
DFsan pass afterwards, thus instrumenting the IR further as required.


> Therefore, I need to know how it gets integrated in the tool-chain.
> Initially, my idea was to insert the dfsan_set_label() calls to the IR and
> pass it to DFSan. However, I am not sure if it's designed to run on the
> source only, not on IR.
>
>
> You should take a look at lib/Transforms/Instrumentation/DataFlowSanitizer.cpp.
> There doesn't appear to be much done at the source level.
>
> best,
> vedant
>
>
> --
>
> Thanks & Regards,
> Dipanjan
>
>
>


--

Thanks & Regards,
Dipanjan

------------------------------

Message: 5
Date: Sat, 17 Jun 2017 02:28:22 +0300
From: Alex Susu via llvm-dev <[hidden email]>
To: llvm-dev <[hidden email]>
Subject: Re: [llvm-dev] LLC does not do proper copy propagation (or
        copy coalescing)
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=utf-8; format=flowed

   Hello.
     Wei-Ren, as I've pointed out in the previous email: the piece of code below has the
deficiency that it uses register R5 instead of using R0 - this happens because in LLVM IR
I created 2 variables, varIndexInner and varIndexOuter, since I have 2 loops and the
variable has to be iterated in the inner loop and I need to preserve its value when going
to the next iteration for the outer loop.
       // NOTE: my processor accepts loops in the form of REPEAT(num_times)..END_REPEAT
       R0 = ...
       REPEAT(256)
         R5 = R0; // basically unnecessary reg. copy
         REPEAT(256)
           R10 = LS[R4];
           R2 = LS[R5];
           R4 = R4 + R1;
           R5 = R5 + R1; // should be R0 = R0 + R1
           R10 = R2 * R10;
           R3 = R3 + R10;
         END_REPEAT;
         REDUCE R3;
         R0 = R5; // basically unnecessary reg. copy
       END_REPEAT;
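
A scalar C model of the listing above may help show why the two copies are redundant: R0 is neither read nor written inside the inner loop, so the inner loop could advance R0 directly. This is only a sketch under assumptions. The trip counts are shrunk from 256 to 4, `REDUCE R3` is omitted because its semantics are not given here, and the function names are invented. It says nothing about live intervals or about how the coalescer would have to find the rewrite.

```c
#include <stdint.h>

enum { OUTER = 4, INNER = 4 };  /* small stand-ins for the two REPEAT(256) counts */

/* Shape of the code as generated: the inner loop advances a copy, r5. */
int32_t dot_with_copies(const int32_t *ls, int32_t r0, int32_t r4, int32_t r1) {
    int32_t r3 = 0;
    for (int outer = 0; outer < OUTER; ++outer) {
        int32_t r5 = r0;                 /* R5 = R0 */
        for (int inner = 0; inner < INNER; ++inner) {
            int32_t r10 = ls[r4];
            int32_t r2 = ls[r5];
            r4 += r1;
            r5 += r1;
            r10 = r2 * r10;
            r3 += r10;
        }
        r0 = r5;                         /* R0 = R5 */
    }
    return r3;
}

/* Desired shape: the inner loop advances r0 itself, so both copies vanish. */
int32_t dot_without_copies(const int32_t *ls, int32_t r0, int32_t r4, int32_t r1) {
    int32_t r3 = 0;
    for (int outer = 0; outer < OUTER; ++outer) {
        for (int inner = 0; inner < INNER; ++inner) {
            int32_t r10 = ls[r4];
            int32_t r2 = ls[r0];
            r4 += r1;
            r0 += r1;
            r10 = r2 * r10;
            r3 += r10;
        }
    }
    return r3;
}
```

Called with the same memory and starting registers, the two functions return the same accumulator value.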


     The reason RegisterCoalescer.cpp is not able to optimize the problem I mentioned
is that R0 and R5 have interfering live intervals.

     I'm trying to implement a case to handle this optimization I want in
RegisterCoalescer.cpp, but it seems a bit complicated. (However, it seems more natural to
do a standard copy propagation with Data Flow Analysis on the MachineBasicBlocks with
virtual registers, after coming out of SSA form. Muchnick's book from 1997 talks in detail
about this in Section 12.5.)

     More exactly the registers and copies concerned for the above ASM code (copying text
from the stderr of llc) are:
       BB#0:
         vreg99 = 0 // IMPORTANT: this instruction is dead and I guess if it is DCE-ed
                    // RegisterCoalescer.cpp would be able to optimize my code

       BB#1:
         vreg94 = some_data_offset

       BB#3:
         vreg99 = COPY vreg94 // This copy does propagate

       BB#4:
         vreg61 = LOAD vreg99
         vreg99 = ADD vreg99, 1
         jmp_cond BB#4, BB#9

       BB#9:
         vreg94 = COPY vreg99 // This copy does NOT propagate
         jmp_cond BB#3

     Can somebody tell me how I can run Dead Code Elimination and then
RegisterCoalescer again in LLC, in order to see if I can maybe optimize this piece of code?

     I'm interested in doing this optimization since the code runs on a very wide SIMD
processor and every instruction counts.

   Thank you very much,
     Alex



On 6/15/2017 11:41 PM, 陳韋任 wrote:
>         I see 3 options to address my problem:
>           - implement a case that handles this in PHI elimination (PHIElimination.cpp);
>           - create a new pass that does copy propagation (based on DFA) on machine
>     instructions before Register Allocation;
>           - optimize copy coalescing such as the standard one or the one activated by
>     -pbqp-coalescing in lib/CodeGen/RegAllocPBQP.cpp (there is an email also about PBQP
>     coalescing at http://lists.llvm.org/pipermail/llvm-dev/2016-June/100523.html).
>
>
> Usually this is done by copy coalescing. Do you know why yours cannot be eliminated? Is
> your case not handled well by the existing copy coalescing (RegisterCoalescer.cpp, for
> example)?
>
> ​HTH,
> chenwj​
>
> --
> Wei-Ren Chen (陳韋任)
> Homepage: https://people.cs.nctu.edu.tw/~chenwj


------------------------------

Message: 6
Date: Fri, 16 Jun 2017 16:43:35 -0700
From: Quentin Colombet via llvm-dev <[hidden email]>
To: Quentin Colombet <[hidden email]>
Cc: llvm-dev <[hidden email]>, Justin Bogner
        <[hidden email]>, Ahmed Bougacha <[hidden email]>, Aditya
        Nandakumar <[hidden email]>, nd <[hidden email]>
Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the
        switch for O0: Please give it a try!
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.


*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
In particular,
1. The APIs are still evolving and can still possibly change significantly
2. The TableGen backend to reuse the existing SD patterns is still at its early stage
3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!


*** Short-Term Proposal ***

What we would like to do instead short-term is:
A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enabled (i.e., equivalent of -global-isel -global-isel-abort=2)
B. Advertise this option in the next open source release to allow compiler enthusiasts to try it and report problems
C. Have GISel always built so we can push things in the right place, MachineVerifier in mind, and stop doing some weird gymnastics

What do people think?


*** Your Help Is Needed ***

- Please share your experience in using the GISel APIs and how we can make them better. Moving forward we’ll have those conversations on open source instead of internally/with a narrower audience.
- Report any performance problem you identify
- Propose patches!

Cheers,
-Quentin




------------------------------

Message: 7
Date: Fri, 16 Jun 2017 16:48:15 -0700
From: Matthias Braun via llvm-dev <[hidden email]>
To: John Regehr <[hidden email]>
Cc: [hidden email]
Subject: Re: [llvm-dev] beneficial optimization of undef examples
        needed
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Luckily someone already did the work writing a bunch of examples down:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

And +1 for keeping this on-topic on how to implement poison.

- Matthias


------------------------------

Message: 8
Date: Fri, 16 Jun 2017 23:58:21 +0000
From: Eric Christopher via llvm-dev <[hidden email]>
To: Quentin Colombet <[hidden email]>
Cc: llvm-dev <[hidden email]>, Justin Bogner
        <[hidden email]>, Ahmed Bougacha <[hidden email]>, Aditya
        Nandakumar <[hidden email]>, nd <[hidden email]>
Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the
        switch for O0: Please give it a try!
Message-ID:
        <CALehDX5M=+[hidden email]>
Content-Type: text/plain; charset="utf-8"

On Fri, Jun 16, 2017 at 4:43 PM Quentin Colombet <[hidden email]>
wrote:

> *** Short-Term Proposal ***
>
> What we would like to do instead short-term is:
> A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to
> enable GISel with fallbacks and warnings enabled (i.e., the equivalent of
> -global-isel -global-isel-abort=2)
> B. Advertise this option in the next open source release to allow compiler
> enthusiasts to try it and report problems
> C. Have GISel always built so we can push things in the right place,
> MachineVerifier in mind, and stop doing some weird gymnastics
>
> What do people think?
>
>
How about -fexperimental-global-isel as a flag to clang?

-eric


>
> *** Your Help Is Needed ***
>
> - Please share your experience in using the GISel APIs and how we can make
> them better. Moving forward we’ll have those conversations on open source
> instead of internally/with a narrower audience.
> - Report any performance problem you identify
> - Propose patches!
>
> Cheers,
> -Quentin
>
>
>
> On Jun 16, 2017, at 3:06 PM, Quentin Colombet via llvm-dev <
> [hidden email]> wrote:
>
>
> On Jun 14, 2017, at 7:27 AM, Diana Picus <[hidden email]> wrote:
>
> On 12 June 2017 at 18:54, Diana Picus <[hidden email]> wrote:
>
>> Hi all,
>>
>> I added a buildbot [1] running the test-suite with -O0 -global-isel. It
>> runs into the same 2 timeouts that I reported previously on this thread
>> (paq8p and scimark2). It would be nice to make it green before flipping the
>> switch.
>>
>>
> I did some more investigations on a machine similar to the one running the
> buildbot. For paq8p and scimark2, I get these results for O0:
>
> PAQ8p:
> Fast isel: 666.344
> Global isel: 731.384
>
> SciMark2-C:
> Fast isel: 463.908
> Global isel: 496.22
>
> The current timeout is 500s (so in this particular case we didn't hit it
> for scimark2, and it ran successfully to completion). I don't think the
> difference between FastISel and GlobalISel is too atrocious, so I would
> propose increasing the timeout for these 2 benchmarks. I'm not sure if we
> can do this on a per-bot basis, but I see some precedent for setting custom
> timeout thresholds for various benchmarks on different architectures
> (sometimes with comments that it's done so we can run O0 on that particular
> benchmark).
>
> Something along these lines works:
> https://reviews.llvm.org/differential/diff/102547/
>
> What do you guys think about this approach?
>
>
> Looks reasonable to me.
>
>
> Thanks,
> Diana
>
> PS: The buildbot is using the Makefiles because that's what our other
> AArch64 test-suite bots use. Moving all of them to CMake is a transition
> for another time.
>
>

------------------------------

Message: 9
Date: Fri, 16 Jun 2017 17:05:46 -0700
From: Matthias Braun via llvm-dev <[hidden email]>
To: 陳韋任 <[hidden email]>
Cc: LLVM Developers Mailing List <[hidden email]>, upcfrost
        <[hidden email]>
Subject: Re: [llvm-dev] Wide load/store optimization question
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"


> On Jun 16, 2017, at 2:43 PM, 陳韋任 via llvm-dev <[hidden email]> wrote:
>
>
>
> 2017-06-17 4:36 GMT+08:00 upcfrost <[hidden email] <mailto:[hidden email]>>:
> Hi,
>
> Same here, my backend only has 64-bit load/store. But I still use 64-bit virtual registers and expand/declare the missing instructions myself.
>
> I'll try looking into sparc backend, thanks. Also, only after writing this post I found a bunch of built-in transforms. Still trying to understand how to use those.
>
> By the way, constraint-wise (alignment), is there any difference between virt regclass and regtuple?

That question makes no sense.
- Every virtual register has a register class assigned.
- You can construct special register classes that represent register tuples, so that when the allocator chooses an entry from that register class it has really chosen a tuple of machine registers (even though it looks like a single register with funny aliasing as far as LLVM codegen is concerned).

- Matthias

------------------------------

------------------------------

End of llvm-dev Digest, Vol 156, Issue 97
*****************************************



Re: [llvm-dev] beneficial optimization of undef examples needed

ORiordan, Martin via llvm-dev
In reply to this post by ORiordan, Martin via llvm-dev
Sanjoy,
            You have changed the subject.  We still need real world examples
showing how taking advantage of “undef” results in beneficial optimization.

My belief is that they don’t exist. My reasoning is this: real world programmers
are likely to run UBSan before compiling (or if they don’t they should), therefore
it is highly unlikely that any “undef” will actually exist during compilation of their
source code.

Yes, “Undef” can be created during SSA construction, but we already discussed
that in a previous email (it’s a register allocation problem).


The only other way I can see of “undef” occurring in IR is if we go out of our
way to optimize for example

        if (   ((i64)X + (i64)Y) < INT_MIN   ||   ((i64)X + (i64)Y) > INT_MAX   ) {
                . . .
                Z = X  “+nsw”  Y;
                . . .
        }

Here “Z” could in theory be replaced with “undef”, but there is no point in doing so.
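
For concreteness, the guard in the example above can be sketched in plain C. This is only an illustration under stated assumptions: the names add_overflows and guarded_add are made up here, int64_t stands in for the i64 widening, and int is assumed to be narrower than 64 bits.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when x + y does not fit in an int.
 * The sum is computed in 64 bits, so the check itself cannot overflow
 * (assuming int is narrower than 64 bits). */
bool add_overflows(int x, int y) {
    int64_t wide = (int64_t)x + (int64_t)y;
    return wide < INT_MIN || wide > INT_MAX;
}

/* Performs the narrow addition only on the path where it is well defined.
 * The other path is the region where Z = X "+nsw" Y would overflow. */
int guarded_add(int x, int y, int fallback) {
    if (add_overflows(x, y)) {
        return fallback;
    }
    return x + y;
}
```

The narrow add inside the overflowing region is the one whose result could be replaced, which is the situation the example describes.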

Similarly with provably out-of-bounds GEP, similarly with provably invalid pointers,
etc, but again there is no point in doing so.


So is there any other way of having “undef” appear in the IR?


Peter Lawrence.



> On Jun 16, 2017, at 7:45 PM, Sanjoy Das <[hidden email]> wrote:
>
> Hi Peter,
>
> Why we need an undef value is covered here:
> http://sunfishcode.github.io/blog/2014/07/14/undef-introduction.html
> (in short, to do SSA construction well).
>
> For a while we did not have a literal representation for poison.
> However, in practice having both undef and poison was problematic (see
> the paper), so we decided to ditch undef and keep poison.
>
> However for the new poison to provide the same functionality as the
> old undef (which is now going away), we need a literal representation
> for the new poison.
>
> -- Sanjoy
>
> On Fri, Jun 16, 2017 at 3:03 PM, Peter Lawrence via llvm-dev
> <[hidden email]> wrote:
>> All,
>>     These discussions seem to be based on the premise that there is a
>> need for the compiler to exploit undefined behavior for performance
>> optimization reasons.
>>
>> So far the only beneficial optimization I am aware of that relies on some
>> form of “undefined” is Dan Gohman’s original project for LP64 targets of
>> promoting i32 induction variables to i64 and hoisting sign-extension out
>> of the loop.
>>
>> But “undef” / “poison” never appears in either the original or the transformed
>> IR for these types of loops, instead properties of “+nsw” are used to
>> justify the transformation.  The transformation does not just fall out because
>> we’ve done a good job at defining “undef” / “poison” IR nodes.
>>
>> So I’d like to see some concrete examples of where the compiler can
>> do useful optimization based on “undef” / “poison” appearing explicitly
>> in the IR; finding some would surely advance this discussion.
>>
>>
>>
>> Peter Lawrence.
>>
>>
Re: [llvm-dev] beneficial optimization of undef examples needed

ORiordan, Martin via llvm-dev


On Mon, Jun 19, 2017 at 7:36 AM Peter Lawrence via llvm-dev <[hidden email]> wrote:
> Sanjoy,
>             You have changed the subject.  We still need real world examples
> showing how taking advantage of “undef” results in beneficial optimization.
>
> My belief is that they don’t exist. My reasoning is this: real world programmers
> are likely to run UBSan before compiling (or if they don’t they should), therefore
> it is highly unlikely that any “undef” will actually exist during compilation of their
> source code.

Wait - that ^ doesn't seem to follow. The point of undefined behavior optimizations isn't to only break programs that would trigger UBSan. The point is to optimize on the assumption that programs won't trip UBSan, i.e. won't exhibit undefined behavior.
 
> Yes, “Undef” can be created during SSA construction, but we already discussed
> that in a previous email (it’s a register allocation problem).
>
>
> The only other way I can see of “undef” occurring in IR is if we go out of our
> way to optimize for example
>
>         if (   ((i64)X + (i64)Y) < INT_MIN   ||   ((i64)X + (i64)Y) > INT_MAX   ) {
>                 . . .
>                 Z = X  “+nsw”  Y;
>                 . . .
>         }
>
> Here “Z” could in theory be replaced with “undef”, but there is no point in doing so.
>
> Similarly with provably out-of-bounds GEP, similarly with provably invalid pointers,
> etc, but again there is no point in doing so.

Why isn't there any point in doing so? One common situation where exploiting UB (aka "assuming UB can't happen") may be beneficial is when the code in isolation is fine, but once it is inlined the compiler can prove that certain paths lead to contradictions. That way, even if the conditions guarding those paths can't be analyzed to prove they are unreachable (or that values are not needed, etc.), the compiler can skip all that and just collapse the code (remove the unreachable code, delete the value computation, etc.) anyway.
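
A minimal C sketch of that inlining situation follows. It is an illustration rather than code from any real project; the function names value_or_default and first_plus_value are made up for this example.

```c
#include <stddef.h>

/* Fine in isolation: the null check is needed because callers may pass NULL. */
int value_or_default(const int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}

/* The caller dereferences p before the call. Once value_or_default is
 * inlined here, the compiler may assume p != NULL (a null dereference
 * would be UB) and drop the check and the `return -1` path, without
 * having to analyze what callers actually pass. */
int first_plus_value(const int *p) {
    int first = *p;
    return first + value_or_default(p);
}
```

Whether a given compiler actually removes the check depends on its inlining and optimization settings; the point is only that the assumption makes the removal legal.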
 


Re: [llvm-dev] beneficial optimization of undef examples needed

ORiordan, Martin via llvm-dev
David,
          I’m not asking for the logic behind optimizing UB;
I’m asking for real-world examples (a SPEC benchmark would do)
where optimizing UB actually proves to be beneficial.

I believe that in the real world it doesn’t happen, and I disagree
with the generalization of “constant-propagation after inlining is
beneficial” to “UB-propagation after inlining is beneficial”.

Rather, I believe that undef-propagation is the root cause of our problems.

But even if you disagree with my logic, we still don’t have any real-world
examples showing the benefit of “UB-propagation after inlining”.


Peter Lawrence.




Re: [llvm-dev] beneficial optimization of undef examples needed

ORiordan, Martin via llvm-dev
In reply to this post by ORiordan, Martin via llvm-dev
Hi Peter,

On Mon, Jun 19, 2017 at 7:36 AM, Peter Lawrence
<[hidden email]> wrote:
>             You have changed the subject.  We still need real world examples

Not sure how -- I thought I was pretty clear on how my answer was
related to your question.

> showing how taking advantage of “undef” results in beneficial optimization.
>
> My belief is that they don’t exist, my reasoning is this: real world programmers
> are likely to run UBSan before compiling (or if they don’t they should), therefore
> it is highly unlikely that any “undef” will actually exist during compilation of their
> source code.

The canonical example motivating undef is this:

  int a;
  bool initialized;
  if (condition) {
    a = 5;
    initialized = true;
  }
  ...
  int result = 0;
  if (condition) {
    result += a;
  }

You can't optimize the PHI node that you'll get inside the block
guarded by 'condition' to 5 without something like undef.
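
Spelled out as a small C sketch (the function names compute and compute_folded are made up for this illustration, and the unused `initialized` flag is left out):

```c
#include <stdbool.h>

/* As in the example: `a` is assigned only when `condition` holds, and it
 * is read only under the same condition, so no uninitialized read occurs. */
int compute(bool condition) {
    int a;                  /* deliberately left uninitialized */
    if (condition) {
        a = 5;
    }
    int result = 0;
    if (condition) {
        result += a;        /* in SSA form: a = phi(5, undef) */
    }
    return result;
}

/* What the optimizer may produce once phi(5, undef) is folded to 5. */
int compute_folded(bool condition) {
    return condition ? 5 : 0;
}
```

The two functions agree for both values of `condition`; the fold is only justified because the value flowing in from the path that skips the assignment is allowed to be anything.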

> Yes, “Undef” can be created during SSA construction, but we already discussed
> that in a previous email (it’s a register allocation problem).
>
> The only other way I can see of “undef” occurring in IR is if we go out of our
> way to optimize for example
>
>         if (   ((i64)X + (i64)Y) < INT_MIN   ||   ((i64)X + (i64)Y) > INT_MAX   ) {
>                 . . .
>                 Z = X  “+nsw”  Y;
>                 . . .
>         }
>
> Here “Z” could in theory be replaced with “undef”, but there is no point in doing so.
>
> Similarly with provably out-of-bounds GEP, similarly with provably invalid pointers,
> etc, but again there is no point in doing so.

These would be poison, not undef?

Just to reiterate what I've said before:

 - If you're willing to have both undef and poison, then you don't
need a literal representation of poison to do the kind of
transformations poison allows (e.g. A s< (A +nsw 1)).
 - However, having both undef and poison is a bad idea for several
reasons outlined in the paper.
 - Since we want the new poison to "fill in" for the undef too, we
need a literal representation for the new poison to address cases
like the PHI node case above.

We don't care about optimizing code that actually has undefined
behavior -- we only care about optimizing code *under the assumption*
that it does not have UB.  Both the old undef and the new poison have
been carefully designed to address this use case.
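
As a source-level illustration of the A s< (A +nsw 1) case, here is a minimal C sketch. The function names are made up for this example, and it assumes the usual lowering of signed `int` addition to an `add nsw` (i.e. no -fwrapv).

```c
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is undefined in C, so the addition carries no-signed-wrap
 * semantics. Under the assumption that no UB occurs, `a + 1` is always
 * greater than `a`, and the comparison may be folded to true.
 * INT_MAX is the one input that must not be passed in. */
bool less_than_successor(int a) {
    return a < a + 1;
}

/* The form the optimizer may reduce the function above to. */
bool less_than_successor_folded(int a) {
    (void)a;
    return true;
}
```

For every input that does not overflow, the two functions return the same result, which is all the optimizer has to preserve.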

> So is there any other way of having “undef” appear in the IR ?

Loading from uninitialized memory is another way you can have undef
appear in the IR today.

-- Sanjoy