[llvm-dev] [RFC] Abstract Parallel IR Optimizations

Tim Northover via llvm-dev
This is an RFC to add analyses and transformation passes into LLVM to
optimize programs based on an abstract notion of a parallel region.

  == this is _not_ a proposal to add a new encoding of parallelism ==

We currently perform poorly when it comes to optimizations for parallel
codes. In fact, parallelizing your loops might actually prevent various
optimizations that would have been applied otherwise. One solution to
this problem is to teach the compiler about the semantics of the parallel
representation in use. While this sounds tedious at first, it turns
out that we can perform key optimizations with reasonable implementation
effort (and thereby also reasonable maintenance costs). However, we have
various parallel representations that are already in use (KMPC,
GOMP, CILK runtime, ...) or proposed (Tapir, IntelPIR, ...).
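
To make the "parallelizing can prevent optimizations" point concrete, here
is a minimal sketch, assuming an outlining-based lowering in which the
loop body becomes a separate function invoked through a runtime call. All
names (fakeRuntimeFork, SharedVars, outlinedBody, ...) are hypothetical
and chosen for illustration; this is not the actual KMPC interface.

```cpp
#include <vector>

// Hypothetical stand-in for a runtime entry point (think of a fork call in
// an OpenMP runtime). A real runtime would invoke `body` on several threads
// and would live in a separate library; this sketch calls it once so the
// example stays self-contained.
using OutlinedFn = void (*)(void *shared);
void fakeRuntimeFork(OutlinedFn body, void *shared) { body(shared); }

// Variables shared with the outlined body are passed through one struct.
struct SharedVars {
  std::vector<int> *data;
  const int *scale;
};

// The loop body after outlining. `scale` now arrives through an opaque
// pointer, so without knowledge of what the runtime call does with its
// arguments the compiler cannot easily tell that it is the constant 2.
void outlinedBody(void *raw) {
  auto *vars = static_cast<SharedVars *>(raw);
  for (int &x : *vars->data)
    x *= *vars->scale;
}

// Sequential version: the constant is visible right next to the loop.
void scaleSequential(std::vector<int> &data) {
  const int scale = 2;
  for (int &x : data)
    x *= scale;
}

// "Parallelized" version: same computation, but split across a call
// boundary that hides the constant from the loop.
void scaleOutlined(std::vector<int> &data) {
  const int scale = 2;
  SharedVars vars{&data, &scale};
  fakeRuntimeFork(outlinedBody, &vars);
}
```

Both functions compute the same result. The difference is only in how
much the compiler can see across the call to the (here faked) runtime.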

Our proposal seeks to introduce parallelism-specific optimizations for
multiple representations while minimizing the implementation overhead.
This is done through an abstract notion of a parallel region which hides
the actual representation from the analysis and optimization passes. In
the schema below, our current five optimizations (described in detail
in [0]) are shown on the left, the abstract parallel IR interface is
in the middle, and the representation-specific implementations are on
the right.

         Optimization          (A)nalysis/(T)ransformation         Impl.
     CodePlacementOpt \  /---> ParallelRegionInfo (A) ---------|-> KMPCImpl (A)
       RegionExpander -\ |                                     |   GOMPImpl (A)
   AttributeAnnotator -|-|---> ParallelCommunicationInfo (A) --/   ...
   BarrierElimination -/ |
VariablePrivatization /  \---> ParallelIR/Builder (T) -----------> KMPCImpl (T)
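
The layering in the schema can be sketched in a few lines of C++. This is
only an illustration under simplifying assumptions: the class and function
names (AbstractParallelRegion, OutlinedRegion, EmbeddedRegion,
removeBackToBackBarriers) are hypothetical and do not correspond to the
interfaces in [1], and the "optimization" is a toy criterion (a barrier
that directly follows another barrier is redundant). The point is that
the pass is written once against the abstract interface while each
representation supplies its own implementation.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class OpKind { Work, Barrier };

// Hypothetical abstract view of a parallel region: a sequence of
// operations that a pass can query and modify.
class AbstractParallelRegion {
public:
  virtual ~AbstractParallelRegion() = default;
  virtual std::size_t size() const = 0;
  virtual OpKind kindAt(std::size_t i) const = 0;
  virtual void erase(std::size_t i) = 0;
};

// Stand-in for a runtime-library based representation.
class OutlinedRegion final : public AbstractParallelRegion {
  std::vector<OpKind> ops_;

public:
  explicit OutlinedRegion(std::vector<OpKind> ops) : ops_(std::move(ops)) {}
  std::size_t size() const override { return ops_.size(); }
  OpKind kindAt(std::size_t i) const override { return ops_.at(i); }
  void erase(std::size_t i) override {
    ops_.erase(ops_.begin() + static_cast<std::ptrdiff_t>(i));
  }
};

// Stand-in for an embedded region; it stores its operations differently
// ('b' is a barrier, anything else is work) to mimic a second encoding.
class EmbeddedRegion final : public AbstractParallelRegion {
  std::string ops_;

public:
  explicit EmbeddedRegion(std::string ops) : ops_(std::move(ops)) {}
  std::size_t size() const override { return ops_.size(); }
  OpKind kindAt(std::size_t i) const override {
    return ops_.at(i) == 'b' ? OpKind::Barrier : OpKind::Work;
  }
  void erase(std::size_t i) override { ops_.erase(i, 1); }
};

// Toy pass written only against the abstract interface. It removes every
// barrier that immediately follows another barrier and returns how many
// barriers were removed.
std::size_t removeBackToBackBarriers(AbstractParallelRegion &region) {
  std::size_t removed = 0;
  for (std::size_t i = 1; i < region.size();) {
    if (region.kindAt(i) == OpKind::Barrier &&
        region.kindAt(i - 1) == OpKind::Barrier) {
      region.erase(i);
      ++removed;
    } else {
      ++i;
    }
  }
  return removed;
}
```

Adding support for another representation then amounts to adding one
more subclass; the pass itself stays unchanged.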

In our setting, a parallel region can be an outlined function called
through a runtime library but also a fork-join/attach-reattach region
embedded in otherwise sequential code. The new passes will provide
parallelism-specific optimizations to all of them (if applicable).
There are various reasons why we believe this is a worthwhile effort
that belongs in the LLVM codebase, including:

  1) We improve the performance of parallel programs, today.
  2) It serves as a meaningful baseline for future discussions on
     (optimized) parallel representations.
  3) It allows us to determine the pros and cons of the different schemes
     when it comes to actual optimizations and inputs.
  4) It helps to identify problems that might arise once we start to
     transform parallel programs but _before_ we commit to a specific

Our prototype for the OpenMP KMPC library (used by clang) already shows
significant speedups for various benchmarks [0]. It also exposed a (to
me) previously unknown problem between restrict/noalias pointers and
(potential) barriers (see Section 3 in [0]).

We are currently in the process of cleaning up the code, extending the
support for OpenMP constructs, and adding a second implementation for
embedded parallel regions. However, a first horizontal prototype
implementation is already available for review [1].

Input of any kind is welcome and reviewers are needed!


[0] http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
[1] https://reviews.llvm.org/D47300

  Sorry if you received this message multiple times!


Johannes Doerfert
PhD Student / Researcher

Compiler Design Lab (Professor Hack) / Argonne National Laboratory
Saarland Informatics Campus, Germany / Lemont, IL 60439, USA
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : [hidden email] / [hidden email]
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
