[llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019


Jeremy Morse via llvm-dev

---------------------------
Wed, Sep 11, 2019:
---------------------------

- LICM vs Loop Sink Strategy (Whitney)
  - LICM and the SCEV expander hoist code without regard to the
    resulting increase in live ranges. This is a long-standing issue;
    historically the preference has been to keep LICM more aggressive.
  - Two questions from the IBM side:
    a. This problem is not specific to the POWER platform, so are
       other people interested?
    b. Where would be the best place to address this issue?
       - Since it is hard to build an accurate register pressure
         estimator in opt, this is probably best done fairly late,
         perhaps after instruction scheduling.
       - A good place to start would be instruction rematerialization
         in the register allocator.
       - The problem is that the rematerialization logic in the register
         allocator can only handle a single instruction (rather than a
         group of instructions) at a time.
       - Start by handling one instruction at a time, then apply the
         same logic iteratively to groups of instructions and measure
         the impact on performance and compile time.
       - The live-range editor may have utilities to help with code
         motion.
       - Lazy code motion may be a good long-term solution, but no one
         seems to be actively working on it.

- Announcements:
- flang call moved so we are no longer in conflict!

- Philip is working on making the loop vectorizer robust in the face of
  multiple exits. There are two subproblems:
  1. The vectorizer currently gives up because SCEV does not provide
     exit counts (possibly due to a bug). This is relatively easy to
     fix, and Philip will have a patch for it soon.
  2. The loop exit cannot be analyzed because it is data dependent;
     this case is currently handled via predication. There is a lot of
     room for improvement, especially for read-only loops.
  Please let him know if you are interested.


- Status Updates
  - Data Dependence Graph (https://reviews.llvm.org/D65350) (Bardia)
    - All review comments have been addressed. Waiting for approval.
  - Bugzilla bugs update (Vivek)
    - Florian has a patch fixing loop bugs related to the max trip count.

----------------------------
Tentative Agenda for Sept 25
----------------------------

Presentation from Marc Moreno Maza about his work on delinearization.

- Status Updates
  - Follow up on multi-dimensional array indexing RFC (Siddharth)
  - Impact of Loop Rotation on existing passes (Min-Yih)
  - Data Dependence Graph (https://reviews.llvm.org/D65350) (Bardia)
  - Bugzilla bugs update (Vivek)
  - Others?


Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab


_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Jeremy Morse via llvm-dev
Hi,


On Sep 11, 2019, at 17:51, Bardia Mahjour via llvm-dev <[hidden email]> wrote:

> ---------------------------
> Wed, Sep 11, 2019:
> ---------------------------
>
> - LICM vs Loop Sink Strategy (Whitney)
> - LICM and SCEV expander hoist code with no regards to increased
> live-ranges. This is a long standing issue where historically
> preference has been to keep LICM more aggressive.


This issue also recently motivated adding metadata to disable LICM (llvm.licm.disable). https://reviews.llvm.org/D64557


Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Jeremy Morse via llvm-dev

Thanks Florian.

Tim, you said:
> Some cases can be undone by rematerialization, but not all, and it can involve a lot of effort which increases compile time.

Do you have examples of cases where rematerialization is not possible? We are interested in learning about any previous attempts to address the issue in the RA. Have you tried it?

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab
[hidden email] (905) 413-2336



From: Florian Hahn <[hidden email]>
To: Bardia Mahjour <[hidden email]>
Cc: via llvm-dev <[hidden email]>, [hidden email]
Date: 2019/09/13 11:16 AM
Subject: [EXTERNAL] Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019
Sent by: [hidden email]

Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Jeremy Morse via llvm-dev
Sorry for reviving this old thread.
Is this the case that you are talking about?
void use(int *);
void f(int *p) {
  for (int i = 0; i < 1000; ++i) {
    use(p);
    use(p + 1);
    use(p + 2);
    use(p + 3);
  }
}

LICM hoists all the (p + N) computations out of the loop, and there is
nothing that could sink them back.
entry:
  %add.ptr = getelementptr inbounds i32, i32* %p, i64 1
  %add.ptr1 = getelementptr inbounds i32, i32* %p, i64 2
  %add.ptr2 = getelementptr inbounds i32, i32* %p, i64 3
...
for.body:
...
  tail call void @_Z3usePi(i32* %p)
  tail call void @_Z3usePi(i32* nonnull %add.ptr)
  tail call void @_Z3usePi(i32* nonnull %add.ptr1)
  tail call void @_Z3usePi(i32* nonnull %add.ptr2)

With more calls to use(), these common expressions will be
pre-computed before the loop, spilled, and then reloaded inside the
loop. Sinking or rematerializing any individual instruction into the
loop is not profitable, because that would simply shorten the live
range of (p+N) at the cost of extending the live range of (p).

I see this problem in ARM MTE stack instrumentation. We use a virtual
frame pointer there, which makes every local variable access look like
(p+N) in the above example.

On Fri, Sep 13, 2019 at 8:36 AM Bardia Mahjour via llvm-dev
<[hidden email]> wrote:

>
> Thanks Florian.
>
> Tim you said:
> > Some cases can be undone by rematerialization, but not all, and it can involve a lot of effort which increases compile time.
>
> Do you have examples of cases where rematerialization is not possible? We are interested in learning about any previous attempts at trying to address the issue in RA. Have you tried it?

Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Jeremy Morse via llvm-dev

Hi Evgenii,

The specific issue we ran into turned out to be caused by the expansion of a remainder instruction, which prevented it from being considered for rematerialization by the RA. However, the example you provided falls into the general category of problems with LICM and live-range extension, which is where we started. I don't know the details, but it looks like when determining the cost of a sink or rematerialization we need to take a more holistic view than deciding instruction by instruction. Is that possible?

Adding Hussain to the discussion as well.

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab



From: Evgenii Stepanov <[hidden email]>
To: Bardia Mahjour <[hidden email]>
Cc: Florian Hahn <[hidden email]>, LLVM Dev <[hidden email]>, [hidden email]
Date: 2020/01/07 02:15 PM
Subject: [EXTERNAL] Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Jeremy Morse via llvm-dev
Hi Evgenii,
 
As Bardia mentioned, I had started work in the direction of extending LiveRangeEdit to allow grouped remat decisions, in response to an srem/urem remat issue which turned out to be more easily solved by modifying the SDAG combiner.
 
The SDAG combiner change is now mostly done, so I am going back to complete the LRE work, building on the discussion here: http://lists.llvm.org/pipermail/llvm-dev/2016-December/107718.html and its follow-up here: http://llvm.1065342.n5.nabble.com/llvm-dev-Register-Rematerialization-td119906.html.
 
The goal is to allow LRE to make decisions that (i) take into account the live ranges of multiple independent values in the same basic block, or (ii) remat a group of dependent instructions in one basic block.
 
The former should solve the case you are dealing with, and I would be happy to work with you to test the solution with ARM MTE once I'm done implementing it.
 
Cheers,
Hussain
 
----- Original message -----
From: Bardia Mahjour/Toronto/IBM
To: Evgenii Stepanov <[hidden email]>
Cc: Florian Hahn <[hidden email]>, LLVM Dev <[hidden email]>, [hidden email], Hussain Kadhem/Canada/IBM@IBM
Subject: Re: [EXTERNAL] Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019
Date: Thu, Jan 9, 2020 11:13 AM
 