[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.


Alberto Barbaro via llvm-dev
Introduction
-----------------
Currently llvm-mca only accepts assembly code as input. We would like to
extend llvm-mca to support object files, allowing users to analyze the
performance of binaries. The proposed changes (which involve both
clang and llvm) optionally introduce an object file section, but this can be
stripped-out if desired.

For the llvm-mca binary support feature to be useful, a user needs to tell
llvm-mca which portions of their code they would like analyzed. Currently,
this is accomplished via assembly comments. However, assembly comments are not
preserved in object files, which is what motivated this RFC. For the proposed
binary support, we need to introduce changes to clang and llvm to allow the
user's object code to be recognized by llvm-mca:

* We need a way for a user to identify a region/block of code they want
   analyzed by llvm-mca.
* We need the information defining the user's region of code to be maintained
   in the object file so that llvm-mca can analyze the desired region(s) from the
   object file.

We define a "code region" as a subset of a user's program that is to be
analyzed via llvm-mca. The sequence of instructions to be analyzed is
represented as a pair: <start, end> where 'start' marks the beginning of
the code to be analyzed and 'end' terminates the sequence. The instructions
between 'start' and 'end' form the region that can be analyzed by llvm-mca at a
later time.

Example
-----------
Before we go into the details of this proposed change, let's first look at a
simple example:

// example.c -- Analyze a dot-product expression.
double test(double x, double y) {
   double result = 0.0;
   __mca_code_region_start(42);
   result += x * y;
   __mca_code_region_end();
   return result;
}

In the example above, we have identified a code region, in this case a single
dot-product expression. For the sake of brevity and simplicity, we've chosen
a very simple example, but in reality a more complicated example could use
multiple expressions. We have also denoted this region as number 42. That
identifier is only for the user, and simplifies reading an llvm-mca analysis
report later.

When this code is compiled, the region markers (the mca_code_region markers)
are transformed into assembly labels. While the markers are presented as
function calls, in reality they are no-ops.

test:
    pushq   %rbp
    movq    %rsp, %rbp
    movsd   %xmm0, -8(%rbp)
    movsd   %xmm1, -16(%rbp)
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
    xorps   %xmm0, %xmm0
    movsd   %xmm0, -24(%rbp)
    movsd   -8(%rbp), %xmm0
    mulsd   -16(%rbp), %xmm0
    addsd   -24(%rbp), %xmm0
    movsd   %xmm0, -24(%rbp)
.Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
    movsd   -24(%rbp), %xmm0
    popq    %rbp
    retq
    .section        .mca_code_regions,"",@progbits
    .quad   42
    .quad   .Lmca_code_region_start_0
    .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0

The assembly has been trimmed to show the portions relevant to this RFC.
Notice the labels enclose the user's defined region, and that they preserve the
user's arbitrary region identifier, the ever-so-important region 42.

In the object file section .mca_code_regions, we have noted the user's region
identifier (.quad 42), start address, and region size. A more complicated
example can have multiple regions defined within a single .mca_code_regions
section. This section can be read by llvm-mca, allowing llvm-mca to take
object files as input instead of assembly source.
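
As a rough illustration of the layout described above, here is a minimal C
sketch of how a reader could decode the section contents. It assumes that each
record is three little-endian 64-bit values (identifier, start address, size in
bytes), matching the three .quad directives in the example, and that the raw
section bytes have already been extracted from the object file. The names
mca_region, mca_read_le64 and mca_parse_regions are made up for this sketch;
they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record of the assumed .mca_code_regions layout: three .quad values. */
struct mca_region {
    uint64_t id;    /* user-supplied region identifier, e.g. 42 */
    uint64_t start; /* address of the start label */
    uint64_t size;  /* number of bytes between the start and end labels */
};

/* Read a little-endian 64-bit value starting at p. */
uint64_t mca_read_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Decode up to max_out records from the raw section bytes.
 * Returns the number of records decoded, or -1 if len is not a
 * multiple of the 24-byte record size.
 */
long mca_parse_regions(const unsigned char *data, size_t len,
                       struct mca_region *out, size_t max_out) {
    const size_t record_size = 3 * sizeof(uint64_t);
    assert(data != NULL || len == 0);
    if (len % record_size != 0)
        return -1;
    size_t n = len / record_size;
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; ++i) {
        const unsigned char *p = data + i * record_size;
        out[i].id = mca_read_le64(p);
        out[i].start = mca_read_le64(p + 8);
        out[i].size = mca_read_le64(p + 16);
    }
    return (long)n;
}
```

A caller would pass the bytes of the section and then report each decoded
region by its identifier. Handling of big-endian targets and of relocations in
unlinked objects is left out of this sketch.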

Details
---------
We need a way for a user to identify a region/block of code they want analyzed
by llvm-mca. We solve this problem by introducing two intrinsics that a user can
specify to identify regions of code for analysis.

The two intrinsics are llvm.mca.code.region.start and
llvm.mca.code.region.end. A user can identify a code region by inserting the
__mca_code_region_start and __mca_code_region_end markers. These are simply
clang builtins and are transformed into the aforementioned intrinsics during
compilation. The code between the intrinsics is what we call a "code region"
and is meant to be easily identifiable by llvm-mca; any code between a start/end
pair can be analyzed by llvm-mca at a later time. A user can define multiple
non-overlapping code regions within their program.

The llvm.mca.code.region.start intrinsic takes an integer constant as its only
argument. This argument is implemented as a metadata i32, and is only used
when generating llvm-mca reports. This value allows a user to more easily
identify a specific code region. llvm.mca.code.region.end takes no arguments.
Since we disallow nesting of regions, the first 'end' intrinsic lexically
following a 'start' intrinsic represents the end of that code region.

Now that we have a solution for identifying regions for analysis, we need a
way to preserve that information so it can be read at a later time. To
accomplish this we propose adding a new section (.mca_code_regions) to the
object file generated by llvm. During code generation, the start/end intrinsics
described above will be transformed into start/end labels in assembly. When
llvm generates the object file from the user's code, these start/end labels
form a pair of values identifying the start address of the user's code region
and its size. The size is the number of bytes between the start and end
labels. Note that the labels are emitted during assembly printing. We hope
that these labels have no influence on code generation or basic-block
placement. However, the target assembler strategy for handling labels is
outside of our control.

This proposed change affects the size of a binary, but only if the user calls
the start/end builtins mentioned above. We expect the .mca_code_regions
section to be very small (three 8-byte values, i.e. 24 bytes, per region in
the example above), and it can trivially be stripped by tools like 'strip' or
'objcopy'.

Implementation Status
------------------------------
We currently have the proposed changes implemented at the URL posted below.
This initial patch only targets ELF object files, and does not handle
relocatable addresses. Since the start of a code region is represented as an
assembly label, and referenced in the .mca_code_regions section, that address
is relocatable. That value can be represented as a section-relative relocatable
symbol (.text + addend), but we are not handling that case yet. Instead, the
proposed changes only handle linked/executable object files.

For purposes of review and to communicate the idea, the change is
presented as a monolithic patch here:

https://reviews.llvm.org/D54603

If accepted, the patch will be split into three smaller patches:
1. The introduction of the builtins to clang.
2. The llvm portion (the added intrinsics).
3. The llvm-mca portion.

Thanks!

-Matt
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
I would really like to add this feature to llvm-mca.

I have spoken with Matt off-line about this feature multiple times.
I am happy with the suggested approach. Matt prototyped it, and it seems to work okay for us.

However, it would be really nice to get feedback from somebody else (not necessarily people involved in the llvm-mca project).
For example, I am interested in what people think about the whole design (i.e. the idea of introducing two new intrinsics, and generating the information in a separate section of the binary object file).

About the suggested design:
I like the idea of being able to identify code regions using a numeric identifier.
However, what happens if a code region spans multiple basic blocks?

My understanding is that code regions are not allowed to overlap. So, it makes sense that `__mca_code_region_end()` doesn't take an ID as input.
However, what if `__mca_code_region_end()` ends up in a different basic block?

`__mca_code_region_start()` always has to dominate `__mca_code_region_end()`. This is trivial to verify when both calls are in the same basic block; however, we need to make sure that the relationship still holds when the `end()` call is in a different basic block.
That would not be enough. I think we should also verify that `__mca_code_region_end()` always post-dominates the call to `__mca_code_region_start()`.
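
To make the two conditions concrete, here is a minimal C sketch of such a check on a toy control-flow graph. It is only an illustration and makes several assumptions: blocks are numbered 0..n-1 with at most 32 blocks, there is a single entry and a single exit block, and dominator sets are computed with the textbook iterative bitset algorithm rather than with LLVM's own dominator tree. All names (`mca_cfg`, `mca_compute_dom`, `mca_region_is_well_formed`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MCA_MAX_BLOCKS 32

/* A tiny CFG: bit p of preds[n] means there is an edge p -> n. */
struct mca_cfg {
    int num_blocks;
    uint32_t preds[MCA_MAX_BLOCKS];
    uint32_t succs[MCA_MAX_BLOCKS];
};

void mca_cfg_add_edge(struct mca_cfg *g, int from, int to) {
    g->succs[from] |= UINT32_C(1) << to;
    g->preds[to] |= UINT32_C(1) << from;
}

/*
 * Iterative data-flow solution: dom[b] = {b} united with the intersection
 * of dom[p] over all incoming edges p -> b. Passing the successor masks as
 * `incoming`, with the exit block as root, yields post-dominators.
 */
void mca_compute_dom(int n, const uint32_t *incoming, int root,
                     uint32_t *dom) {
    assert(n >= 1 && n <= MCA_MAX_BLOCKS);
    uint32_t all = (n == 32) ? UINT32_MAX : (UINT32_C(1) << n) - 1;
    for (int b = 0; b < n; ++b)
        dom[b] = all;
    dom[root] = UINT32_C(1) << root;
    for (int changed = 1; changed;) {
        changed = 0;
        for (int b = 0; b < n; ++b) {
            if (b == root)
                continue;
            uint32_t meet = all;
            for (int p = 0; p < n; ++p)
                if ((incoming[b] >> p) & 1)
                    meet &= dom[p];
            uint32_t next = meet | (UINT32_C(1) << b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = 1;
            }
        }
    }
}

/* Returns 1 if start_block dominates end_block AND end_block
 * post-dominates start_block, 0 otherwise. */
int mca_region_is_well_formed(const struct mca_cfg *g, int entry,
                              int exit_block, int start_block,
                              int end_block) {
    uint32_t dom[MCA_MAX_BLOCKS], pdom[MCA_MAX_BLOCKS];
    mca_compute_dom(g->num_blocks, g->preds, entry, dom);
    mca_compute_dom(g->num_blocks, g->succs, exit_block, pdom);
    int start_dominates_end = (dom[end_block] >> start_block) & 1;
    int end_postdominates_start = (pdom[start_block] >> end_block) & 1;
    return start_dominates_end && end_postdominates_start;
}
```

On a diamond-shaped CFG (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3), a region starting in block 0 and ending in block 3 passes both checks, while a region starting in block 1 and ending in block 3 fails the dominance check. The sketch works at block granularity only; when both markers sit in the same block, their order within that block still has to be checked separately.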

My question is: what happens with basic block reordering? We don't know the layout of basic blocks until we reach code emission. How does it work for regions that span multiple basic blocks? I think your RFC should clarify this aspect.

As a side note: at the moment, llvm-mca doesn't know how to deal with branches. So, for simplicity we could force code regions to only contain instructions from a single basic block.

However, in the future we may want to teach llvm-mca how to analyze branchy code too. For example, we could introduce a simple control-flow analysis in llvm-mca, and use external "branch trace" information (for example, a perf trace generated by an external tool) to decorate branches with branch probabilities (similarly to what we currently do in LLVM with PGO). We could then use that knowledge to model branch prediction and simulate what happens in the presence of multiple branches.

So, the idea of having regions that potentially span multiple basic blocks is not bad in general. However, I think you should better clarify what the constraints are (at least, you should answer my questions from before).

If we decide to use those new intrinsics, then those should be experimental (at least to start).




Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Hi Andrea,

Thanks for your input.

On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
[... snip ...]
> About the suggested design:
> I like the idea of being able to identify code regions using a numeric
> identifier.
> However, what happens if a code region spans through multiple basic blocks?

The current patch does not take into consideration cases where the
region start and end intrinsics are placed in different basic blocks.
Such would be the case if a region is defined to span multiple blocks.
This would be similar to the current case where a user places a
#LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
another.  However, as you point out below, if the user does this in the
source code via intrinsics (just what this patch is proposing), then
there is a chance that optimizations might change the layout of the
instructions and confuse the ordering of the MCA intrinsics.

Since MCA does not follow branches (MCA just treats a branch as it would
a non-branching instruction), it seems that a user should be aware that
defining MCA code regions that span multiple blocks might result in an
unexpected analysis.  While we do not discourage this, it seems like
such a case will probably not produce an expected result for the user.
We could introduce a warning, or automatically divide the regions so
that a single region can only contain a single block.
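
For illustration only, the "automatically divide" option could look roughly
like the following C sketch. It assumes that the tool already knows the
sorted start addresses of the basic blocks, which is information the current
patch does not record; the names mca_span and mca_split_region are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mca_span {
    uint64_t start; /* first byte of the sub-region */
    uint64_t size;  /* length of the sub-region in bytes */
};

/*
 * Split the region [start, start + size) at every basic-block boundary
 * that falls strictly inside it. block_starts must be sorted ascending.
 * Writes at most max_out spans to out and returns how many were written.
 */
size_t mca_split_region(uint64_t start, uint64_t size,
                        const uint64_t *block_starts, size_t num_blocks,
                        struct mca_span *out, size_t max_out) {
    assert(out != NULL || max_out == 0);
    uint64_t end = start + size;
    uint64_t cur = start;
    size_t n = 0;
    for (size_t i = 0; i < num_blocks && cur < end; ++i) {
        uint64_t b = block_starts[i];
        if (b <= cur)
            continue; /* boundary at or before the current position */
        if (b >= end)
            break;    /* boundary at or past the end of the region */
        if (n == max_out)
            return n;
        out[n].start = cur;
        out[n].size = b - cur;
        ++n;
        cur = b;
    }
    if (cur < end && n < max_out) {
        out[n].start = cur; /* tail piece up to the region end */
        out[n].size = end - cur;
        ++n;
    }
    return n;
}
```

Each resulting span could then be reported as its own region, so that a
single analysis never crosses a block boundary.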

> My understanding is that code regions are not allowed to overlap. So, it
> makes sense if ` __mca_code_region_end()` doesn't take an ID as input.
> However, what if ` __mca_code_region_end()` ends in a different basic block?
>
> `__mca_code_region_start()` has to always dominate `
> __mca_code_region_end()`. This is trivial to verify when both calls are in
> a same basic block; however, we need to make sure that the relationship is
> still the same when the `end()` call is in a different basic block.
> That would not be enough. I think we should also verify  that `
> __mca_code_region_end()` always post-dominates the call to
> `__mca_code_region_start()`.

In any case this patch should probably check dominance of the
intrinsics, even though MCA does not follow branches and does not
explicitly forbid a region from containing multiple blocks.

>
> My question is: what happens with basic block reordering? We don't know the
> layout of basic blocks until we reach code emission. How does it work for
> regions that span through multiple basic blocks? I think your RFC should
> clarify this aspect.
>
> As a side note: at the moment, llvm-mca doesn't know how to deal with
> branches. So, for simplicity we could force code regions to only contain
> instructions from a single basic block.
>
> However, in the future we may want to teach llvm-mca how to analyze branchy
> code too. For example, we could introduce a simple control-flow analysis in
> llvm-mca, and use an external "branch trace" information (for example, a
> perf trace generated by an external tool) to decorate branches with
> branch probabilities (similarly to what we currently do in LLVM with PGO).
> We could then use that knowledge to model branch prediction and simulate
> what happens in the presence of multiple branches.
>
> So, the idea of having regions that potentially span multiple basic blocks
> is not bad in general. However, I think you should better clarify what are
> the constraints (at least, you should answer to my questions from before).

I agree! Thanks for pointing that out.

> If we decide to use those new intrinsics, then those should be experimental
> (at least to start).

Agreed.

-Matt


Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
I want to clarify a few restrictions of llvm-mca code regions that this RFC proposes:

1) All llvm-mca code regions must start with an llvm.mca.code.region.start intrinsic and end with
an llvm.mca.code.region.end intrinsic. This rule is enforced at the IR level in the IR verifier.

2) llvm-mca code regions cannot nest. This restriction implies that an llvm.mca.code.region.start
must be followed by an llvm.mca.code.region.end intrinsic without any other llvm.mca start intrinsics
between the two. The current implementation in the patch enforces this restriction at the
IR level via the IR verifier.

3) An llvm-mca code region cannot span multiple basic blocks. llvm-mca does not follow
branches (yet). Instead, a branch instruction is treated by llvm-mca like any other instruction.
The current patch associated with this RFC does not enforce this restriction. I plan on updating
the patch to enforce that a code region can only belong to a single basic block. This is a simple
check, ensuring that both the llvm.mca.code.region.start and the accompanying end intrinsic live
in the same basic block. I imagine adding this check at the IR level, where we also verify points 1 and 2
above. That will keep the code-region verification logic isolated to the IR verifier. The start/end
intrinsics should not have any uses, so I'm not sure that they would be moved/sunk on behalf
of any other instruction. In other words, I do not imagine that a start and end would be split
apart due to later MI optimizations. If I discover that such a case occurs, then I might add the
basic-block check prior to emitting the code region data to the object file. Once llvm-mca is
updated to handle branches, we can remove this constraint.
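
The three restrictions above can be sketched as a single pass over the markers of a function in program order.
The following C code is only a model of that logic, not the verifier code from the patch: it assumes the markers
have already been collected into an array together with the index of the basic block each one lives in, and the
names mca_marker, mca_check and mca_check_markers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum mca_marker_kind { MCA_START, MCA_END };

struct mca_marker {
    enum mca_marker_kind kind;
    int block; /* index of the basic block containing the marker */
};

enum mca_check {
    MCA_OK,
    MCA_ERR_NESTED_START,  /* rule 2: start seen while a region is open */
    MCA_ERR_UNMATCHED_END, /* rule 1: end without a preceding start */
    MCA_ERR_UNTERMINATED,  /* rule 1: start without a following end */
    MCA_ERR_SPANS_BLOCKS   /* rule 3: start and end in different blocks */
};

/* Check a function's markers, given in program order. */
enum mca_check mca_check_markers(const struct mca_marker *m, size_t n) {
    assert(m != NULL || n == 0);
    int open = 0;        /* is a region currently open? */
    int open_block = -1; /* block of the currently open start marker */
    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MCA_START) {
            if (open)
                return MCA_ERR_NESTED_START;
            open = 1;
            open_block = m[i].block;
        } else {
            if (!open)
                return MCA_ERR_UNMATCHED_END;
            if (m[i].block != open_block)
                return MCA_ERR_SPANS_BLOCKS;
            open = 0;
        }
    }
    return open ? MCA_ERR_UNTERMINATED : MCA_OK;
}
```

Once llvm-mca learns to follow branches, the MCA_ERR_SPANS_BLOCKS case would be the one to relax, while the
pairing and nesting checks would stay as they are.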

-Matt


Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Thanks for clarifying it, Matt.

In general, I quite like your suggested design.

My only concern is about the semantics of the two new intrinsics. Your design doesn't allow mca ranges to span multiple basic blocks. That constraint is acceptable for now, since llvm-mca doesn't know how to deal with control flow.
However, I am a bit concerned about what might happen in future if we decide to let users specify code regions that span multiple basic blocks. Basically, I don't particularly like the idea of changing the semantics of an already existing intrinsic. A design that already accounts for that particular scenario/future work would be ideal. That being said, marking those new intrinsics as 'experimental' may be a good compromise (at least for now).

So, I am quite happy overall with the direction of this RFC.
However, I am interested to hear from other developers about your suggested design.

> This initial patch only targets ELF object files, and does not handle
> relocatable addresses. Since the start of a code region is represented as an
> assembly label, and referenced in the .mca_code_regions section, that address
> is relocatable.

This may be okay for now. However, it would be nice to remove that constraint in future and add support for generic object files.

-Andrea



Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
So, I have been thinking a bit more about this whole design.

The more I think about your suggested design, the more I am convinced that we should do something more to support ranges in binary object files too.
My understanding is that the reason we don't support object files in general is the presence of relocations. A region start marker is effectively symbol-relative, and the symbol (a function) would be relocated in the final executable.
You mentioned to me that resolving even a 'simple' symbol-relative relocation is not trivial, because it requires specific knowledge about the binary format and the target (i.e. how relocations are encoded is target-specific). I am surprised that there is no utility library for resolving relocations, but I am not familiar with that part of the compiler. I was hoping that there was a target-specific interface to use in this case.

An alternative approach would require that you define your own "symbol-relative" reference. After all, ranges are just sequences of instructions in a function. If a function symbol is described by the symbol table, then you should be able to obtain its offset in the .text section. So, you could potentially encode your own symbol+offset. However, the linker would not be able to understand your "custom relocation", and the information about regions in the final ELF would be basically broken. So, that would not be a solution.
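
To make the symbol+offset idea concrete, here is a minimal sketch in C. It is only a toy model under my own assumptions: the toy_symbol and toy_region structs and the resolve_region_start function are hypothetical names, the symbol table is a plain array rather than a parsed ELF .symtab, and nothing here comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol-table entry: a function name and its address. */
struct toy_symbol {
    const char *name;
    uint64_t address;
};

/* Hypothetical region record: the region starts `offset` bytes into
 * `function` and covers `size` bytes. */
struct toy_region {
    uint64_t id;
    const char *function;
    uint64_t offset;
    uint64_t size;
};

/* Computes the region start as symbol address + offset.
 * Returns false when the function is not in the symbol table. */
bool resolve_region_start(const struct toy_symbol *symtab, size_t num_symbols,
                          const struct toy_region *region, uint64_t *start_out)
{
    assert(region != NULL && start_out != NULL);
    for (size_t i = 0; i < num_symbols; ++i) {
        if (strcmp(symtab[i].name, region->function) == 0) {
            *start_out = symtab[i].address + region->offset;
            return true;
        }
    }
    return false;
}
```

The lookup itself is simple; the problem described above remains that a record like this is invisible to the linker, so nothing keeps it in sync if the linker changes the code within the function.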

Honestly, I don't know what the best approach is in this case.
As a compromise, it would not be a bad idea to add the ability to specify ranges from the command line. What do you think?
Still, from a user's point of view, the idea that we don't support object files in general sounds like a big limitation.
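
As a rough illustration of what a command-line range could look like, here is a small C sketch that parses a string of the form START:SIZE (for example 0x401052:0x1c). The format and the name parse_range are purely hypothetical; this is not an existing llvm-mca option that I am aware of.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses "START:SIZE", where both numbers use C literal syntax
 * (decimal, 0x hex, or leading-0 octal). Rejects a missing field,
 * trailing characters, a sign, and a zero size. */
bool parse_range(const char *text, uint64_t *start_out, uint64_t *size_out)
{
    assert(text != NULL && start_out != NULL && size_out != NULL);
    char *end = NULL;

    /* strtoull would accept whitespace and a sign; require a digit. */
    if (!isdigit((unsigned char)text[0]))
        return false;
    errno = 0;
    unsigned long long start = strtoull(text, &end, 0);
    if (errno != 0 || *end != ':')
        return false;

    const char *size_text = end + 1;
    if (!isdigit((unsigned char)size_text[0]))
        return false;
    errno = 0;
    unsigned long long size = strtoull(size_text, &end, 0);
    if (errno != 0 || *end != '\0' || size == 0)
        return false;

    *start_out = start;
    *size_out = size;
    return true;
}
```

A range given this way sidesteps relocations entirely, because the user supplies final addresses, but it also pushes the work of finding those addresses onto the user.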

About the new experimental intrinsics: those would definitely work well for the simple case where instructions are from the same basic block.
However, some/most of the constraints that you plan to add will have to change if in future we decide to allow ranges that potentially cross multiple basic blocks. How will the rules/constraints on those new intrinsics change? I just want to make sure that the suggested design is future-proof.

-Andrea

On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <[hidden email]> wrote:
Thanks for clarifying it Matt.

In general, I quite like your suggested design.

My only concern is about the semantic of the two new intrinsics. You design doesn't allow mca ranges to span through multiple basic blocks. That constraint is acceptable for now, since llvm-mca doesn't know how to deal with control flow.
However, I am a bit concerned about what might happen in future if we decide to let users specify code regions that span through multiple basic blocks. Basically, I don't particularly like the idea of changing the semantic of already existing intrinsic. A design that already accounts for that particular scenario/future work would be ideal. That being said, marking those new intrinsics as 'experimental' may be a good compromise (at least for now).

So, I am quite happy overall with the direction of this RFC.
However, I am interesting to hear from other developers about your suggested design.

> This initial patch only targets ELF object files, and does not handle
relocatable addresses. Since the start of a code region is represented as an
assembly label, and referenced in the .mca_code_regions section, that address
is relocatable.

This may be okay for now. However, it would be nice to remove that constraint in future and add support to generic object files.

-Andrea

On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
I want to clarify a few restrictions of llvm-mca code regions that this RFC proposes:

1) All llvm-mca code regions must start with an llvm.mca.code.region.start intrinsic and end with
an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR level in the IR verifier.

2) llvm-mca code regions cannot nest.  This restriction implies that an llvm.mca.code.region.start
must have an llvm.mca.code.region.end intrinsic without any other llvm.mca start intrinsics
between the two. The current implementation in the patch enforces this restriction at the
IR level via the IR Verifier.

3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca does not follow
branches (yet).  Instead, a branch instruction is treated by llvm-mca like any other instruction.
The current patch associated with this RFC does not enforce this restriction.  I plan on updating
the patch to enforce that a code region can only belong to a single basic block.  This is a simple
check, ensuring that both the llvm.mca.code.region.start and accompanying end intrinsics live
in the same basic block. I imagine adding this check at the IR level when we also verify points 1 and 2
above.  That will keep the code-region verification logic isolated to the IR verifier.  The start/end
intrinsics should not have any uses, so I do not expect them to be moved/sunk on behalf
of any other instruction.  In other words, I do not imagine that a start and end would be split
apart due to later MI optimizations.  If I discover that such a case occurs, then I might add the
basic-block check prior to emitting the code region data to the object file.  Once llvm-mca is
updated to handle branches, then we can remove this constraint.
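
For illustration only, the three restrictions above can be modelled as a single pass over the region markers of a function. This is a minimal sketch and not code from the patch or from the IR verifier: the marker struct, the status enum, and check_regions are hypothetical names, and each marker is assumed to carry the index of the basic block it lives in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a region marker; not an LLVM data structure. */
enum marker_kind { REGION_START, REGION_END };

struct marker {
    enum marker_kind kind;
    int block; /* index of the basic block containing the marker */
};

enum region_status {
    REGIONS_OK,
    REGIONS_NESTED,        /* rule 2: a start was seen while a region is open */
    REGIONS_UNMATCHED_END, /* rule 1: an end without a preceding start */
    REGIONS_UNTERMINATED,  /* rule 1: a start without a following end */
    REGIONS_CROSS_BLOCK    /* rule 3: start and end live in different blocks */
};

/* Walk the markers in program order and report the first violation. */
enum region_status check_regions(const struct marker *m, size_t n)
{
    int open = 0;        /* nonzero while a region is open */
    int open_block = -1; /* block of the currently open start marker */

    assert(m != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (m[i].kind == REGION_START) {
            if (open)
                return REGIONS_NESTED;
            open = 1;
            open_block = m[i].block;
        } else {
            if (!open)
                return REGIONS_UNMATCHED_END;
            if (m[i].block != open_block)
                return REGIONS_CROSS_BLOCK;
            open = 0;
        }
    }
    return open ? REGIONS_UNTERMINATED : REGIONS_OK;
}
```

Keeping all three checks in one routine mirrors the idea of isolating the code-region verification logic in a single place; if the single-block constraint is lifted later, only the REGIONS_CROSS_BLOCK test would need to change.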

-Matt

> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of Matt Davis via llvm-
> dev
> Sent: Wednesday, November 21, 2018 8:47 AM
> To: Andrea Di Biagio <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
> <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
>
> Hi Andrea,
>
> Thanks for your input.
>
> On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> [... snip ...]
> > About the suggested design:
> > I like the idea of being able to identify code regions using a numeric
> > identifier.
> > However, what happens if a code region spans through multiple basic blocks?
>
> The current patch does not take into consideration cases where the
> region start and end intrinsics are placed in different basic blocks.
> Such would be the case if a region is defined to span multiple blocks.
> This would be similar to the current case where a user places a
> #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> another.  However, as you point out below, if the user does this in the
> source code via intrinsics (just what this patch is proposing), then
> there is a chance that optimizations might change the layout of the
> instructions and confuse the ordering of the MCA intrinsics.
>
> Since MCA does not follow branches (MCA just treats a branch as it would
> a non-branching instruction), it seems that a user should be aware that
> defining MCA code regions that span multiple blocks might result in an
> unexpected analysis.  While we do not discourage this, it seems like
> such a case will probably not produce an expected result for the user.
> We could introduce a warning, or automatically divide the regions so
> that a single region can only contain a single block.
>
> > My understanding is that code regions are not allowed to overlap. So, it
> > makes sense if `__mca_code_region_end()` doesn't take an ID as input.
> > However, what if `__mca_code_region_end()` ends in a different basic block?
> >
> > `__mca_code_region_start()` has to always dominate
> > `__mca_code_region_end()`. This is trivial to verify when both calls are in
> > the same basic block; however, we need to make sure that the relationship is
> > still the same when the `end()` call is in a different basic block.
> > That would not be enough. I think we should also verify that
> > `__mca_code_region_end()` always post-dominates the call to
> > `__mca_code_region_start()`.
>
> In any case this patch should probably check dominance of the
> intrinsics, even though MCA does not follow branches and does not
> explicitly forbid a region from containing multiple blocks.
>
> >
> > My question is: what happens with basic block reordering? We don't know the
> > layout of basic blocks until we reach code emission. How does it work for
> > regions that span through multiple basic blocks? I think your RFC should
> > clarify this aspect.
> >
> > As a side note: at the moment, llvm-mca doesn't know how to deal with
> > branches. So, for simplicity we could force code regions to only contain
> > instructions from a single basic block.
> >
> > However, in the future we may want to teach llvm-mca how to analyze branchy
> > code too. For example, we could introduce a simple control-flow analysis in
> > llvm-mca, and use external "branch trace" information (for example, a
> > perf trace generated by an external tool) to decorate branches with
> > branch probabilities (similarly to what we currently do in LLVM with PGO).
> > We could then use that knowledge to model branch prediction and simulate
> > what happens in the presence of multiple branches.
> >
> > So, the idea of having regions that potentially span multiple basic blocks
> > is not bad in general. However, I think you should better clarify what
> > the constraints are (at least, you should answer my questions from before).
>
> I agree! Thanks for pointing that out.
>
> > If we decide to use those new intrinsics, then those should be experimental
> > (at least to start).
>
> Agreed.
>
> -Matt
>

Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Hi Andrea,

On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:

> So, I have been thinking a bit more about this whole design.
>
> The more I think about your suggested design, the more I am convinced that
> we should do something more to support ranges in binary object files too.
> My understanding is that the reason why we don't support object files in
> general, is because of the presence of relocations. That is because a
> region start marker is effectively symbol relative, and the symbol (a
> function) would be relocated in the final executable.
> You mentioned to me that resolving even a 'simple' symbol-relative
> relocation is not trivial, because it requires specific knowledge about the
> binary format, and the target (i.e. how relocations are encoded is target
> specific). I am surprised that there is not a utility library for resolving
> relocations, but I am not familiar with that part of the compiler. I was
> hoping that there was a target specific interface to use in this case...

There might be a better way of resolving the relocs, but from what I saw
looking at llvm-objdump and other related tools, it seems that resolving
the relocated symbol is a target-specific effort.  I also spent some time
sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
performs the reloc resolution.  I should clarify that I too am not an
expert in llvm's utilities for performing symbol/reloc resolution, and
perhaps someone in the community can point me in the right direction.  I
can clearly see the reloc data in the object file via tools like
objdump; however, accessing the relocs via
llvm::object::ObjectFile::relocations() did not produce address values
that we could use (values of zero).
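
As a purely illustrative sketch (not code from the patch): the RFC's example emits each .mca_code_regions entry as three .quad values, namely the region ID, the start address, and the size. Assuming 8-byte little-endian fields, as in that x86-64 example, an entry could be decoded as shown below. The struct and the helpers decode_region and resolve_start are hypothetical names. resolve_start only models the section-relative form (.text + addend) mentioned in the RFC; it is not how LLVM resolves relocations, and real relocation handling is target specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one .mca_code_regions entry. */
struct mca_region {
    uint64_t id;    /* user-chosen region identifier, e.g. 42 */
    uint64_t start; /* start address of the region */
    uint64_t size;  /* number of bytes between the start and end labels */
};

enum { MCA_RECORD_SIZE = 24 }; /* three 8-byte .quad values */

/* Read a 64-bit little-endian value from a byte buffer. */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode entry number `index` from the raw section contents.
 * Returns 1 on success and 0 if the entry lies outside the section. */
int decode_region(const unsigned char *sec, size_t len, size_t index,
                  struct mca_region *out)
{
    assert(sec != NULL && out != NULL);
    if (index >= len / MCA_RECORD_SIZE)
        return 0;
    const unsigned char *p = sec + index * MCA_RECORD_SIZE;
    out->id = read_le64(p);
    out->start = read_le64(p + 8);
    out->size = read_le64(p + 16);
    return 1;
}

/* Model of a section-relative reference: address of .text plus an addend. */
uint64_t resolve_start(uint64_t text_base, uint64_t addend)
{
    return text_base + addend;
}
```

In a linked executable the start field can be used as decoded; in a relocatable object it would first have to be resolved, which is the part this thread identifies as unsolved.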

I was hoping that, for a first pass at this patch, supporting just
executables would be okay.  That keeps this initial patch set simple,
and hopefully will encourage others to take a peek at it, since it's
less daunting than what it might otherwise be.  Of course, there is the
concern that this initial patch will lock us into a design that will be
more complicated to unravel later.

> An alternative approach would require that you define your own
> "symbol-relative" reference. After all, ranges are just sequences of
> instructions in a function. If a function symbol is described by the symbol
> table, then you should be able to obtain its offset in the .text section.
> So, you could potentially encode your own symbol+offset. However, the
> linker would not be able to understand your "custom relocation", and
> information about regions in the final ELF would be basically broken.
> So, that would not be a solution...
>
> Honestly, I don't know what the best approach is in this case.
> As a compromise, it would not be a bad idea to add the ability to specify
> ranges from the command line. What do you think?
> Still, from a user point of view, the idea that we don't support object
> files in general sounds like a big limitation.

I agree, only supporting executables is a limitation.  However, I'd
like to land the base support now and add in the additional
features/support after this large patch set lands.  But I can see
where landing the whole thing entirely also makes sense.

> About the new experimental intrinsics: those would definitely work well for
> the simple case where instructions are from the same basic block.
> However, some/most of the constraints that you plan to add will have to
> change if in future we decide to allow ranges that potentially cross
> multiple basic blocks. How will the rules/constraints on those new
> intrinsics change? I just want to make sure that the suggested design is
> future-proof.

Since the llvm/clang parts of the code are just responsible for
collecting where a range starts/ends, I hope that we can remove some
of the baked-in constraints that are specified in IR/Verifier.cpp.
As you pointed out earlier in this thread, we might want to
introduce a dominance check if/when we lift the one-basic-block
restriction.
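
To make the dominance idea concrete, here is a minimal sketch of the two conditions raised earlier in the thread: the block holding the start marker dominates the block holding the end marker, and the end block post-dominates the start block. It is not LLVM's DominatorTree and not part of the patch. The cfg struct, compute_dom, and region_is_well_formed are hypothetical names, and the sketch assumes at most 32 blocks, a single entry and a single exit, and that every block is reachable from the entry and can reach the exit.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical CFG: succ[i] is a bitmask of the successors of block i. */
struct cfg {
    int n;
    uint32_t succ[MAX_BLOCKS];
};

/* Iterative dataflow: dom[i] becomes the set of blocks that dominate i.
 * With reverse != 0 the edges are flipped, so with root = exit block the
 * result is the post-dominator sets. */
static void compute_dom(const struct cfg *g, int root, int reverse,
                        uint32_t dom[MAX_BLOCKS])
{
    uint32_t all = g->n == 32 ? 0xFFFFFFFFu : ((1u << g->n) - 1u);
    uint32_t pred[MAX_BLOCKS] = {0};

    for (int i = 0; i < g->n; i++)
        for (int j = 0; j < g->n; j++)
            if (g->succ[i] & (1u << j)) {
                if (reverse)
                    pred[i] |= 1u << j; /* flipped edge j -> i */
                else
                    pred[j] |= 1u << i; /* original edge i -> j */
            }

    for (int i = 0; i < g->n; i++)
        dom[i] = all;
    dom[root] = 1u << root;

    for (int changed = 1; changed;) {
        changed = 0;
        for (int i = 0; i < g->n; i++) {
            if (i == root)
                continue;
            uint32_t meet = all;
            for (int p = 0; p < g->n; p++)
                if (pred[i] & (1u << p))
                    meet &= dom[p];
            uint32_t next = meet | (1u << i);
            if (next != dom[i]) {
                dom[i] = next;
                changed = 1;
            }
        }
    }
}

/* Returns 1 if start_block dominates end_block and end_block
 * post-dominates start_block. Ordering inside one block is not checked. */
int region_is_well_formed(const struct cfg *g, int entry, int exit_block,
                          int start_block, int end_block)
{
    uint32_t dom[MAX_BLOCKS], pdom[MAX_BLOCKS];

    assert(g->n > 0 && g->n <= MAX_BLOCKS);
    compute_dom(g, entry, 0, dom);
    compute_dom(g, exit_block, 1, pdom);
    return ((dom[end_block] >> start_block) & 1u) &&
           ((pdom[start_block] >> end_block) & 1u);
}
```

On a diamond-shaped CFG (block 0 branches to 1 and 2, both fall into 3), a region from block 0 to block 3 passes, while a region from block 0 to block 1 fails the post-dominance condition because the path through block 2 never reaches the end marker.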

-Matt

>
> -Andrea
>
> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <[hidden email]>
> wrote:
>
> > Thanks for clarifying it Matt.
> >
> > In general, I quite like your suggested design.
> >
> > My only concern is about the semantics of the two new intrinsics. Your
> > design doesn't allow mca ranges to span multiple basic blocks. That
> > constraint is acceptable for now, since llvm-mca doesn't know how to deal
> > with control flow.
> > However, I am a bit concerned about what might happen in the future if we
> > decide to let users specify code regions that span multiple basic
> > blocks. Basically, I don't particularly like the idea of changing the
> > semantics of an already existing intrinsic. A design that already accounts for
> > that particular scenario/future work would be ideal. That being said,
> > marking those new intrinsics as 'experimental' may be a good compromise (at
> > least for now).
> >
> > So, I am quite happy overall with the direction of this RFC.
> > However, I am interested to hear from other developers about your
> > suggested design.
> >
> > > This initial patch only targets ELF object files, and does not handle
> > > relocatable addresses. Since the start of a code region is represented as
> > > an assembly label, and referenced in the .mca_code_regions section, that
> > > address is relocatable.
> >
> > This may be okay for now. However, it would be nice to remove that
> > constraint in the future and add support for generic object files.
> >
> > -Andrea
> >
> > On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
> >
> >> I want to clarify a few restrictions of llvm-mca code regions that this
> >> RFC proposes:
> >>
> >> 1) All llvm-mca code regions must start with an
> >> llvm.mca.code.region.start intrinsic and end with an
> >> llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> >> level in the IR verifier.
> >>
> >> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> >> llvm.mca.code.region.start must have an llvm.mca.code.region.end
> >> intrinsic without any other llvm.mca start intrinsics between the two.
> >> The current implementation in the patch enforces this restriction at the
> >> IR level via the IR Verifier.
> >>
> >> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> >> does not follow branches (yet).  Instead, a branch instruction is treated
> >> by llvm-mca like any other instruction.  The current patch associated
> >> with this RFC does not enforce this restriction.  I plan on updating the
> >> patch to enforce that a code region can only belong to a single basic
> >> block.  This is a simple check, ensuring that both the
> >> llvm.mca.code.region.start and accompanying end intrinsics live in the
> >> same basic block. I imagine adding this check at the IR level when we
> >> also verify points 1 and 2 above.  That will keep the code-region
> >> verification logic isolated to the IR verifier.  The start/end intrinsics
> >> should not have any uses, so I do not expect them to be moved/sunk on
> >> behalf of any other instruction.  In other words, I do not imagine that a
> >> start and end would be split apart due to later MI optimizations.  If I
> >> discover that such a case occurs, then I might add the basic-block check
> >> prior to emitting the code region data to the object file.  Once llvm-mca
> >> is updated to handle branches, then we can remove this constraint.
> >>
> >> -Matt
> >>
> >> > -----Original Message-----
> >> > From: llvm-dev <[hidden email]> On Behalf Of Matt
> >> Davis via llvm-
> >> > dev
> >> > Sent: Wednesday, November 21, 2018 8:47 AM
> >> > To: Andrea Di Biagio <[hidden email]>
> >> > Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
> >> > <[hidden email]>; [hidden email]
> >> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> >> llvm-mca.
> >> >
> >> > Hi Andrea,
> >> >
> >> > Thanks for your input.
> >> >
> >> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >> > [... snip ...]
> >> > > About the suggested design:
> >> > > I like the idea of being able to identify code regions using a numeric
> >> > > identifier.
> >> > > However, what happens if a code region spans through multiple basic
> >> blocks?
> >> >
> >> > The current patch does not take into consideration cases where the
> >> > region start and end intrinsics are placed in different basic blocks.
> >> > Such would be the case if a region is defined to span multiple blocks.
> >> > This would be similar to the current case where a user places a
> >> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> >> > another.  However, as you point out below, if the user does this in the
> >> > source code via intrinsics (just what this patch is proposing), then
> >> > there is a chance that optimizations might change the layout of the
> >> > instructions and confuse the ordering of the MCA intrinsics.
> >> >
> >> > Since MCA does not follow branches (MCA just treats a branch as it would
> >> > a non-branching instruction), it seems that a user should be aware that
> >> > defining MCA code regions that span multiple blocks might result in an
> >> > unexpected analysis.  While we do not discourage this, it seems like
> >> > such a case will probably not produce an expected result for the user.
> >> > We could introduce a warning, or automatically divide the regions so
> >> > that a single region can only contain a single block.
> >> >
> >> > > My understanding is that code regions are not allowed to overlap. So, it
> >> > > makes sense if `__mca_code_region_end()` doesn't take an ID as input.
> >> > > However, what if `__mca_code_region_end()` ends in a different basic block?
> >> > >
> >> > > `__mca_code_region_start()` has to always dominate
> >> > > `__mca_code_region_end()`. This is trivial to verify when both calls are in
> >> > > the same basic block; however, we need to make sure that the relationship is
> >> > > still the same when the `end()` call is in a different basic block.
> >> > > That would not be enough. I think we should also verify that
> >> > > `__mca_code_region_end()` always post-dominates the call to
> >> > > `__mca_code_region_start()`.
> >> >
> >> > In any case this patch should probably check dominance of the
> >> > intrinsics, even though MCA does not follow branches and does not
> >> > explicitly forbid a region from containing multiple blocks.
> >> >
> >> > >
> >> > > My question is: what happens with basic block reordering? We don't know the
> >> > > layout of basic blocks until we reach code emission. How does it work for
> >> > > regions that span through multiple basic blocks? I think your RFC should
> >> > > clarify this aspect.
> >> > >
> >> > > As a side note: at the moment, llvm-mca doesn't know how to deal with
> >> > > branches. So, for simplicity we could force code regions to only
> >> contain
> >> > > instructions from a single basic block.
> >> > >
> >> > > However, in the future we may want to teach llvm-mca how to analyze branchy
> >> > > code too. For example, we could introduce a simple control-flow analysis in
> >> > > llvm-mca, and use external "branch trace" information (for example, a
> >> > > perf trace generated by an external tool) to decorate branches with
> >> > > branch probabilities (similarly to what we currently do in LLVM with PGO).
> >> > > We could then use that knowledge to model branch prediction and simulate
> >> > > what happens in the presence of multiple branches.
> >> > >
> >> > > So, the idea of having regions that potentially span multiple basic blocks
> >> > > is not bad in general. However, I think you should better clarify what
> >> > > the constraints are (at least, you should answer my questions from before).
> >> >
> >> > I agree! Thanks for pointing that out.
> >> >
> >> > > If we decide to use those new intrinsics, then those should be
> >> experimental
> >> > > (at least to start).
> >> >
> >> > Agreed.
> >> >
> >> > -Matt
> >> >
> >> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> >> [hidden email]>
> >> > > wrote:
> >> > >
> >> > > > Introduction
> >> > > > -----------------
> >> > > > Currently llvm-mca only accepts assembly code as input. We would
> >> like to
> >> > > > extend llvm-mca to support object files, allowing users to analyze
> >> the
> >> > > > performance of binaries. The proposed changes (which involve both
> >> > > > clang and llvm) optionally introduce an object file section, but
> >> this can
> >> > > > be
> >> > > > stripped-out if desired.
> >> > > >
> >> > > > For the llvm-mca binary support feature to be useful, a user needs
> >> to tell
> >> > > > llvm-mca which portions of their code they would like analyzed.
> >> Currently,
> >> > > > this is accomplished via assembly comments. However, assembly
> >> comments are
> >> > > > not
> >> > > > preserved in object files, and this has encouraged this RFC. For the
> >> > > > proposed
> >> > > > binary support, we need to introduce changes to clang and llvm to
> >> allow the
> >> > > > user's object code to be recognized by llvm-mca:
> >> > > >
> >> > > > * We need a way for a user to identify a region/block of code they
> >> want
> >> > > >    analyzed by llvm-mca.
> >> > > > * We need the information defining the user's region of code to be
> >> > > > maintained
> >> > > >    in the object file so that llvm-mca can analyze the desired
> >> region(s)
> >> > > > from the
> >> > > >    object file.
> >> > > >
> >> > > > We define a "code region" as a subset of a user's program that is
> >> to be
> >> > > > analyzed via llvm-mca. The sequence of instructions to be analyzed
> >> is
> >> > > > represented as a pair: <start, end> where the 'start' marks the
> >> beginning
> >> > > > of
> >> > > > the user's source code and 'end' terminates the sequence. The
> >> instructions
> >> > > > between 'start' and 'end' form the region that can be analyzed by
> >> llvm-mca
> >> > > > at a
> >> > > > later time.
> >> > > >
> >> > > > Example
> >> > > > -----------
> >> > > > Before we go into the details of this proposed change, let's first
> >> look at
> >> > > > a
> >> > > > simple example:
> >> > > >
> >> > > > // example.c -- Analyze a dot-product expression.
> >> > > > double test(double x, double y) {
> >> > > >    double result = 0.0;
> >> > > >    __mca_code_region_start(42);
> >> > > >    result += x * y;
> >> > > >    __mca_code_region_end();
> >> > > >    return result;
> >> > > > }
> >> > > >
> >> > > > In the example above, we have identified a code region, in this
> >> case a
> >> > > > single
> >> > > > dot-product expression. For the sake of brevity and simplicity,
> >> we've
> >> > > > chosen
> >> > > > a very simple example, but in reality a more complicated example
> >> could use
> >> > > > multiple expressions. We have also denoted this region as number
> >> 42. That
> >> > > > identifier is only for the user, and simplifies reading an llvm-mca
> >> > > > analysis
> >> > > > report later.
> >> > > >
> >> > > > When this code is compiled, the region markers (the mca_code_region
> >> > > > markers)
> >> > > > are transformed into assembly labels. While the markers are
> >> presented as
> >> > > > function calls, in reality they are no-ops.
> >> > > >
> >> > > > test:
> >> > > > pushq   %rbp
> >> > > > movq    %rsp, %rbp
> >> > > > movsd   %xmm0, -8(%rbp)
> >> > > > movsd   %xmm1, -16(%rbp)
> >> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >> > > > xorps   %xmm0, %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > movsd   -8(%rbp), %xmm0
> >> > > > mulsd   -16(%rbp), %xmm0
> >> > > > addsd   -24(%rbp), %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >> > > > movsd   -24(%rbp), %xmm0
> >> > > > popq    %rbp
> >> > > > retq
> >> > > > .section        .mca_code_regions,"",@progbits
> >> > > > .quad   42
> >> > > > .quad   .Lmca_code_region_start_0
> >> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >> > > >
> >> > > > The assembly has been trimmed to show the portions relevant to this
> >> RFC.
> >> > > > Notice the labels enclose the user's defined region, and that they
> >> > > > preserve the
> >> > > > user's arbitrary region identifier, the ever-so-important region 42.
> >> > > >
> >> > > > In the object file section .mca_code_regions, we have noted the
> >> user's
> >> > > > region
> >> > > > identifier (.quad 42), start address, and region size. A more
> >> complicated
> >> > > > example can have multiple regions defined within a single
> >> .mca_code_regions
> >> > > > section. This section can be read by llvm-mca, allowing llvm-mca to
> >> take
> >> > > > object files as input instead of assembly source.
> >> > > >
> >> > > > Details
> >> > > > ---------
> >> > > > We need a way for a user to identify a region/block of code they
> >> want
> >> > > > analyzed
> >> > > > by llvm-mca. We solve this problem by introducing two intrinsics
> >> that a
> >> > > > user can
> >> > > > specify, for identifying regions of code for analysis.
> >> > > >
> >> > > > The two intrinsics are: llvm.mca.code.region.start and
> >> > > > llvm.mca.code.region.end. A user can identify a code region by
> >> inserting
> >> > > > the
> >> > > > mca_code_region_start and mca_code_region_end markers. These are
> >> simply
> >> > > > clang builtins and are transformed into the aforementioned
> >> intrinsics
> >> > > > during
> >> > > > compilation. The code between the intrinsics are what we call "code
> >> > > > regions"
> >> > > > and are to be easily identifiable by llvm-mca; any code between a
> >> start/end
> >> > > > pair can be analyzed by llvm-mca at a later time. A user can define
> >> > > > multiple
> >> > > > non-overlapping code regions within their program.
> >> > > >
> >> > > > The llvm.mca.code.region.start intrinsic takes an integer constant
> >> as its
> >> > > > only
> >> > > > argument. This argument is implemented as a metadata i32, and is
> >> only used
> >> > > > when generating llvm-mca reports. This value allows a user to more
> >> easily
> >> > > > identify a specific code region. llvm.mca.code.region.end takes no
> >> > > > arguments.
> >> > > > Since we disallow nesting of regions, the first 'end' intrinsic
> >> lexically
> >> > > > following a 'start' intrinsic represents the end of that code
> >> region.
> >> > > >
> >> > > > Now that we have a solution for identifying regions for analysis,
> >> we now
> >> > > > need a
> >> > > > way for preserving that information to be read at a later time. To
> >> > > > accomplish
> >> > > > this we propose adding a new section (.mca_code_regions) to the
> >> object file
> >> > > > generated by llvm. During code generation, the start/end intrinsics
> >> > > > described
> >> > > > above will be transformed into start/end labels in assembly. When
> >> llvm
> >> > > > generates the object file from the user's code, these start/end
> >> labels
> >> > > > form a
> >> > > > pair of values identifying the start of the user's code region, and
> >> size.
> >> > > > The
> >> > > > size represents the number of bytes between the start and end
> >> address of
> >> > > > the
> >> > > > labels. Note that the labels are emitted during assembly printing.
> >> We hope
> >> > > > that these labels have no influence on code generation or
> >> basic-block
> >> > > > placement. However, the target assembler strategy for handling
> >> labels is
> >> > > > outside of our control.
> >> > > >
> >> > > > This proposed change affects the size of a binary, but only if the
> >> user
> >> > > > calls
> >> > > > the start/end builtins mentioned above. The additional size of the
> >> > > > .mca_code_regions section, which we imagine to be very small (to
> >> the order
> >> > > > of a
> >> > > > few bytes), can trivially be stripped by tools like 'strip' or
> >> 'objcopy'.
> >> > > >
> >> > > > Implementation Status
> >> > > > ------------------------------
> >> > > > We currently have the proposed changes implemented at the url
> >> posted below.
> >> > > > This initial patch only targets ELF object files, and does not
> >> handle
> >> > > > relocatable addresses. Since the start of a code region is
> >> represented as
> >> > > > an
> >> > > > assembly label, and referenced in the .mca_code_regions section,
> >> that
> >> > > > address
> >> > > > is relocatable. That value can be represented as section-relative
> >> > > > relocatable
> >> > > > symbol (.text + addend), but we are not handling that case yet.
> >> Instead,
> >> > > > the
> >> > > > proposed changes only handle linked/executable object files.
> >> > > >
> >> > > > For purposes of review and to communicate the idea, the change is
> >> > > > presented as a monolithic patch here:
> >> > > >
> >> > > > https://reviews.llvm.org/D54603
> >> > > >
> >> > > > The change is presented as a monolithic patch; however, if accepted
> >> > > > the patch will be split into three smaller patches:
> >> > > > 1. The introduction of the builtins to clang.
> >> > > > 2. The llvm portion (the added intrinsics).
> >> > > > 3. The llvm-mca portion.
> >> > > >
> >> > > > Thanks!
> >> > > >
> >> > > > -Matt
> >> > > > _______________________________________________
> >> > > > LLVM Developers mailing list
> >> > > > [hidden email]
> >> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> > > >
> >>
> >

Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Hi Matt/Andrea,

I see pros and cons for IACA-style markers vs intrinsics.
On the one hand, IACA-style markers are very magical, and not very visible in either the source or the object code. Using IACA-style markers has the advantage that you can use llvm-mca as a drop-in replacement for IACA, or even to compare their outputs on the exact same binary. They also do not require tooling on the compiler side and allow comparing the output of several compilers.
On the other hand, IACA-style markers do not have an equivalent on other architectures, and I'm not sure inventing new ones is a good idea :) I think the latter makes them pretty much a no-go for llvm-mca as I don't think we'll want to teach each target how to parse code regions. That's much better handled in a target-agnostic way by the object. Intel got away with them because they only had to support one architecture.

tl;dr: In the case of llvm-mca, I like your design better than the markers.

In terms of the future-proofness of only allowing regions within a basic block, are we confident we can ever actually simulate branches apart from an "always taken, perfectly predicted" loop? Even this simple need requires knowing quite a few details on the frontend. The current design could handle this use case with the addition of an external "loop mode" option to MCA. If there are no other strong use cases, I would advocate for experimental intrinsics unless people can contribute other example use cases.

On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <[hidden email]> wrote:
Hi Andrea,

On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> So, I have been thinking a bit more about this whole design.
>
> The more I think about your suggested design, the more I am convinced that
> we should do something more to support ranges in binary object files too.
> My understanding is that the reason why we don't support object files in
> general, is because of the presence of relocations. That is because a
> region start marker is effectively symbol relative, and the symbol (a
> function) would be relocated in the final executable.
> You mentioned to me that resolving even a 'simple' symbol-relative
> relocation is not trivial, because it requires specific knowledge about the
> binary format, and the target (i.e. how relocations are encoded is target
> specific). I am surprised that there is not a utility library for resolving
> relocations, but I am not familiar with that part of the compiler. I was
> hoping that there was a target specific interface to use in this case...

There might be a better way of resolving the relocs, but from what I saw
looking at llvm-objdump and other related tools, it seems that resolving
the relocated symbol is a target-specific effort.  I also spent some time
sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
performs the reloc resolution.  I should clarify that I too am not an
expert in llvm's utilities for performing symbol/reloc resolution, and
perhaps someone in the community can point me in the right direction.  I
can clearly see the reloc data in the object file via tools like
objdump; however, accessing the relocs via
llvm::object::ObjectFile::relocations() did not produce address values
that we could use (values of zero).

I was hoping that, for a first pass at this patch, supporting just
executables would be okay.  That keeps this initial patch set simple,
and hopefully will encourage others to take a peek at it, since it's
less daunting than what it might otherwise be.  Of course, there is the
concern that this initial patch will lock us into a design that will be
more complicated to unravel later.
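
For a linked executable, reading the section back is straightforward
because the start addresses are already final.  A minimal sketch, assuming
each entry is three little-endian 64-bit values (ID, start address, size in
bytes) as in the .quad directives of the RFC example; the struct and
function names are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the proposed .mca_code_regions section (assumed layout). */
typedef struct {
    uint64_t id;     /* user-chosen region identifier, e.g. 42 */
    uint64_t start;  /* address of the first instruction in the region */
    uint64_t size;   /* number of bytes between the start and end labels */
} mca_region;

/* Read a little-endian 64-bit value without assuming host endianness. */
static uint64_t read_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Decode up to max_out entries from the raw section bytes.
 * Returns the number of entries written; trailing partial data is ignored. */
size_t parse_mca_regions(const unsigned char *data, size_t len,
                         mca_region *out, size_t max_out) {
    assert(out != NULL || max_out == 0);
    size_t n = 0;
    while (len >= 24 && n < max_out) {
        out[n].id    = read_le64(data);
        out[n].start = read_le64(data + 8);
        out[n].size  = read_le64(data + 16);
        data += 24;
        len  -= 24;
        ++n;
    }
    return n;
}
```

For a relocatable object the start field would still need its relocation
applied before it could be used, which is the part the initial patch does
not handle.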

> An alternative approach would require that you define your own
> "symbol-relative" reference. After all, ranges are just sequences of
> instructions in a function. If a function symbol is described by the symbol
> table, then you should be able to obtain its offset in the .text section.
> So, you could potentially encode your own symbol+offset. However, the
> linker would not be able to understand your "custom relocation", and
> information about regions in the final elf would be basically broken.
> So, that would not be a solution...
>
> I honestly don't know what the best approach is in this case.
> As a compromise, it would not be a bad idea to add the ability to specify
> ranges from command line. What do you think?
> Still, from a user point of view, the idea that we don't support object
> files in general sounds like a big limitation.

I agree, only supporting executables is a limitation.  However, I'd
like to land the base support now and add in the additional
features/support after this large patch set lands.  But I can see
where landing the whole thing entirely also makes sense.

> About the new experimental intrinsics: those would definitely work well for
> the simple case where instructions are from the same basic block.
> However, some/most of the constraints that you plan to add will have to
> change if in future we decide to allow ranges that potentially cross
> multiple basic blocks. How will the rules/constraints on those new
> intrinsics change? I just want to make sure that the suggested design is
> future-proof.

Since the llvm/clang parts of the code are just responsible for
collecting where a range starts/ends, I hope that we can remove some
of the baked-in constraints that are specified in IR/Verifier.cpp.
As you pointed out earlier in this thread, we might want to
introduce a dominance check if/when we lift the one-basic-block
restriction.
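
To make the current rules concrete, here is a small illustration of the
three restrictions (every start has an end, no nesting, start and end in
the same basic block) checked over a flat, ordered list of markers.  This
is not the code in IR/Verifier.cpp; the types and names are made up for
the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MARK_START, MARK_END } mark_kind;

typedef struct {
    mark_kind kind;
    int block;  /* hypothetical index of the basic block holding the marker */
} marker;

/* Returns true if the markers, taken in program order, form a sequence of
 * non-nested start/end pairs where each pair lives in a single block. */
bool regions_well_formed(const marker *m, size_t n) {
    assert(m != NULL || n == 0);
    bool in_region = false;
    int region_block = -1;
    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MARK_START) {
            if (in_region)
                return false;  /* a second start before an end: nesting */
            in_region = true;
            region_block = m[i].block;
        } else {
            if (!in_region)
                return false;  /* an end without a matching start */
            if (m[i].block != region_block)
                return false;  /* region spans more than one basic block */
            in_region = false;
        }
    }
    return !in_region;  /* a trailing start without an end is also invalid */
}
```

Lifting the one-basic-block restriction would mean replacing the block
comparison with the dominance check mentioned above.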

-Matt

>
> -Andrea
>
> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <[hidden email]>
> wrote:
>
> > Thanks for clarifying it Matt.
> >
> > In general, I quite like your suggested design.
> >
> > My only concern is about the semantics of the two new intrinsics. Your
> > design doesn't allow mca ranges to span multiple basic blocks. That
> > constraint is acceptable for now, since llvm-mca doesn't know how to deal
> > with control flow.
> > However, I am a bit concerned about what might happen in future if we
> > decide to let users specify code regions that span multiple basic
> > blocks. Basically, I don't particularly like the idea of changing the
> > semantics of an already existing intrinsic. A design that already accounts for
> > that particular scenario/future work would be ideal. That being said,
> > marking those new intrinsics as 'experimental' may be a good compromise (at
> > least for now).
> >
> > So, I am quite happy overall with the direction of this RFC.
> > However, I am interested to hear from other developers about your
> > suggested design.
> >
> > > This initial patch only targets ELF object files, and does not handle
> > relocatable addresses. Since the start of a code region is represented as
> > an
> > assembly label, and referenced in the .mca_code_regions section, that
> > address
> > is relocatable.
> >
> > This may be okay for now. However, it would be nice to remove that
> > constraint in future and add support to generic object files.
> >
> > -Andrea
> >
> > On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
> >
> >> I want to clarify a few restrictions of llvm-mca code regions that this
> >> RFC proposes:
> >>
> >> 1) All llvm-mca code regions must start with an
> >> llvm.mca.code.region.start intrinsic and end with
> >> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> >> level in the IR verifier.
> >>
> >> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> >> llvm.mca.code.region.start
> >> must have a llvm.mca.code.region.end intrinsic without any other llvm.mca
> >> start intrinsics
> >> between the two. The current implementation in the patch enforces this
> >> restriction at the
> >> IR level via the IR Verifier.
> >>
> >> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> >> does not follow
> >> branches (yet).  Instead, a branch instruction is treated by llvm-mca
> >> like any other instruction.
> >> The current patch associated with this RFC does not enforce this
> >> restriction.  I plan on updating
> >> the patch to enforce that a code region can only belong to a single basic
> >> block.  This is a simple
> >> check, ensuring that both the llvm.mca.code.region.start and accompanying
> >> end intrinsics live
> >> in the same basic block. I imagine adding this check at the IR level when
> >> we also verify points 1 and 2
> >> above.  That will keep the code-region verification logic isolated to the
> >> IR verifier.  The start/end
> >> intrinsics should not have any uses, so I'm not sure that they would be
> >> moved/sunk on behalf
> >> of any other instruction.  In other words, I do not imagine that a start
> >> and end would be split
> >> apart due to later MI optimizations.  If I discover that such a case
> >> occurs, then I might add the
> >> basic-block check prior to emitting the code region data to the object
> >> file.  Once llvm-mca is
> >> updated to handle branches, then we can remove this constraint.
> >>
> >> -Matt
> >>
> >> > -----Original Message-----
> >> > From: llvm-dev <[hidden email]> On Behalf Of Matt
> >> Davis via llvm-
> >> > dev
> >> > Sent: Wednesday, November 21, 2018 8:47 AM
> >> > To: Andrea Di Biagio <[hidden email]>
> >> > Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
> >> > <[hidden email]>; [hidden email]
> >> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> >> llvm-mca.
> >> >
> >> > Hi Andrea,
> >> >
> >> > Thanks for your input.
> >> >
> >> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >> > [... snip ...]
> >> > > About the suggested design:
> >> > > I like the idea of being able to identify code regions using a numeric
> >> > > identifier.
> >> > > However, what happens if a code region spans through multiple basic
> >> blocks?
> >> >
> >> > The current patch does not take into consideration cases where the
> >> > region start and end intrinsics are placed in different basic blocks.
> >> > Such would be the case if a region is defined to span multiple blocks.
> >> > This would be similar to the current case where a user places a
> >> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> >> > another.  However, as you point out below, if the user does this in the
> >> > source code via intrinsics (just what this patch is proposing), then
> >> > there is a chance that optimizations might change the layout of the
> >> > instructions and confuse the ordering of the MCA intrinsics.
> >> >
> >> > Since MCA does not follow branches (MCA just treats a branch as it would
> >> > a non-branching instruction), it seems that a user should be aware that
> >> > defining MCA code regions that span multiple blocks might result in an
> >> > unexpected analysis.  While we do not discourage this, it seems like
> >> > such a case will probably not produce an expected result for the user.
> >> > We could introduce a warning, or automatically divide the regions so
> >> > that a single region can only contain a single block.
> >> >
> >> > > My understanding is that code regions are not allowed to overlap. So,
> >> it
> >> > > makes sense if ` __mca_code_region_end()` doesn't take an ID as input.
> >> > > However, what if ` __mca_code_region_end()` ends in a different basic
> >> block?
> >> > >
> >> > > `__mca_code_region_start()` has to always dominate `
> >> > > __mca_code_region_end()`. This is trivial to verify when both calls
> >> are in
> >> > > a same basic block; however, we need to make sure that the
> >> relationship is
> >> > > still the same when the `end()` call is in a different basic block.
> >> > > That would not be enough. I think we should also verify  that `
> >> > > __mca_code_region_end()` always post-dominates the call to
> >> > > `__mca_code_region_start()`.
> >> >
> >> > In any case this patch should probably check dominance of the
> >> > intrinsics, even though MCA does not follow branches and MCA does not
> >> > explicitly forbid a region from containing multiple blocks.
> >> >
> >> > >
> >> > > My question is: what happens with basic block reordering? We don't
> >> know the
> >> > > layout of basic blocks until we reach code emission. How does it work
> >> for
> >> > > regions that span through multiple basic blocks? I think your RFC
> >> should
> >> > > clarify this aspect.
> >> > >
> >> > > As a side note: at the moment, llvm-mca doesn't know how to deal with
> >> > > branches. So, for simplicity we could force code regions to only
> >> contain
> >> > > instructions from a single basic block.
> >> > >
> >> > > However, in future we may want to teach llvm-mca how to analyze
> >> branchy
> >> > > code too. For example, we could introduce a simple control-flow
> >> analysis in
> >> > > llvm-mca, and use an external "branch trace" information (for
> >> example, a
> >> > > perf trace generated by an external tool) to decorate branches with
> >> > > branch probabilities (similarly to what we currently do in LLVM with
> >> PGO).
> >> > > We could then use that knowledge to model branch prediction and
> >> simulate
> >> > > what happens in the presence of multiple branches.
> >> > >
> >> > > So, the idea of having regions that potentially span multiple basic
> >> blocks
> >> > > is not bad in general. However, I think you should better clarify
> >> what are
> >> > > the constraints (at least, you should answer to my questions from
> >> before).
> >> >
> >> > I agree! Thanks for pointing that out.
> >> >
> >> > > If we decide to use those new intrinsics, then those should be
> >> experimental
> >> > > (at least to start).
> >> >
> >> > Agreed.
> >> >
> >> > -Matt
> >> >
> >> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> >> [hidden email]>
> >> > > wrote:
> >> > >
> >> > > > Introduction
> >> > > > -----------------
> >> > > > Currently llvm-mca only accepts assembly code as input. We would
> >> like to
> >> > > > extend llvm-mca to support object files, allowing users to analyze
> >> the
> >> > > > performance of binaries. The proposed changes (which involve both
> >> > > > clang and llvm) optionally introduce an object file section, but
> >> this can
> >> > > > be
> >> > > > stripped-out if desired.
> >> > > >
> >> > > > For the llvm-mca binary support feature to be useful, a user needs
> >> to tell
> >> > > > llvm-mca which portions of their code they would like analyzed.
> >> Currently,
> >> > > > this is accomplished via assembly comments. However, assembly
> >> comments are
> >> > > > not
> >> > > > preserved in object files, and this has encouraged this RFC. For the
> >> > > > proposed
> >> > > > binary support, we need to introduce changes to clang and llvm to
> >> allow the
> >> > > > user's object code to be recognized by llvm-mca:
> >> > > >
> >> > > > * We need a way for a user to identify a region/block of code they
> >> want
> >> > > >    analyzed by llvm-mca.
> >> > > > * We need the information defining the user's region of code to be
> >> > > > maintained
> >> > > >    in the object file so that llvm-mca can analyze the desired
> >> region(s)
> >> > > > from the
> >> > > >    object file.
> >> > > >
> >> > > > We define a "code region" as a subset of a user's program that is
> >> to be
> >> > > > analyzed via llvm-mca. The sequence of instructions to be analyzed
> >> is
> >> > > > represented as a pair: <start, end> where the 'start' marks the
> >> beginning
> >> > > > of
> >> > > > the user's source code and 'end' terminates the sequence. The
> >> instructions
> >> > > > between 'start' and 'end' form the region that can be analyzed by
> >> llvm-mca
> >> > > > at a
> >> > > > later time.
> >> > > >
> >> > > > Example
> >> > > > -----------
> >> > > > Before we go into the details of this proposed change, let's first
> >> look at
> >> > > > a
> >> > > > simple example:
> >> > > >
> >> > > > // example.c -- Analyze a dot-product expression.
> >> > > > double test(double x, double y) {
> >> > > >    double result = 0.0;
> >> > > >    __mca_code_region_start(42);
> >> > > >    result += x * y;
> >> > > >    __mca_code_region_end();
> >> > > >    return result;
> >> > > > }
> >> > > >
> >> > > > In the example above, we have identified a code region, in this
> >> case a
> >> > > > single
> >> > > > dot-product expression. For the sake of brevity and simplicity,
> >> we've
> >> > > > chosen
> >> > > > a very simple example, but in reality a more complicated example
> >> could use
> >> > > > multiple expressions. We have also denoted this region as number
> >> 42. That
> >> > > > identifier is only for the user, and simplifies reading an llvm-mca
> >> > > > analysis
> >> > > > report later.
> >> > > >
> >> > > > When this code is compiled, the region markers (the mca_code_region
> >> > > > markers)
> >> > > > are transformed into assembly labels. While the markers are
> >> presented as
> >> > > > function calls, in reality they are no-ops.
> >> > > >
> >> > > > test:
> >> > > > pushq   %rbp
> >> > > > movq    %rsp, %rbp
> >> > > > movsd   %xmm0, -8(%rbp)
> >> > > > movsd   %xmm1, -16(%rbp)
> >> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >> > > > xorps   %xmm0, %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > movsd   -8(%rbp), %xmm0
> >> > > > mulsd   -16(%rbp), %xmm0
> >> > > > addsd   -24(%rbp), %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >> > > > movsd   -24(%rbp), %xmm0
> >> > > > popq    %rbp
> >> > > > retq
> >> > > > .section        .mca_code_regions,"",@progbits
> >> > > > .quad   42
> >> > > > .quad   .Lmca_code_region_start_0
> >> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >> > > >
> >> > > > The assembly has been trimmed to show the portions relevant to this
> >> RFC.
> >> > > > Notice the labels enclose the user's defined region, and that they
> >> > > > preserve the
> >> > > > user's arbitrary region identifier, the ever-so-important region 42.
> >> > > >
> >> > > > In the object file section .mca_code_regions, we have noted the
> >> user's
> >> > > > region
> >> > > > identifier (.quad 42), start address, and region size. A more
> >> complicated
> >> > > > example can have multiple regions defined within a single
> >> .mca_code_regions
> >> > > > section. This section can be read by llvm-mca, allowing llvm-mca to
> >> take
> >> > > > object files as input instead of assembly source.
> >> > > >
> >> > > > Details
> >> > > > ---------
> >> > > > We need a way for a user to identify a region/block of code they
> >> want
> >> > > > analyzed
> >> > > > by llvm-mca. We solve this problem by introducing two intrinsics
> >> that a
> >> > > > user can
> >> > > > specify, for identifying regions of code for analysis.
> >> > > >
> >> > > > The two intrinsics are: llvm.mca.code.region.start and
> >> > > > llvm.mca.code.region.end. A user can identify a code region by
> >> > > > inserting the mca_code_region_start and mca_code_region_end markers.
> >> > > > These are simply clang builtins and are transformed into the
> >> > > > aforementioned intrinsics during compilation. The code between the
> >> > > > intrinsics is what we call a "code region" and is to be easily
> >> > > > identifiable by llvm-mca; any code between a start/end pair can be
> >> > > > analyzed by llvm-mca at a later time. A user can define multiple
> >> > > > non-overlapping code regions within their program.
> >> > > >
> >> > > > The llvm.mca.code.region.start intrinsic takes an integer constant
> >> as its
> >> > > > only
> >> > > > argument. This argument is implemented as a metadata i32, and is
> >> only used
> >> > > > when generating llvm-mca reports. This value allows a user to more
> >> easily
> >> > > > identify a specific code region. llvm.mca.code.region.end takes no
> >> > > > arguments.
> >> > > > Since we disallow nesting of regions, the first 'end' intrinsic
> >> lexically
> >> > > > following a 'start' intrinsic represents the end of that code
> >> region.
> >> > > >
> >> > > > Now that we have a solution for identifying regions for analysis,
> >> we now
> >> > > > need a
> >> > > > way for preserving that information to be read at a later time. To
> >> > > > accomplish
> >> > > > this we propose adding a new section (.mca_code_regions) to the
> >> object file
> >> > > > generated by llvm. During code generation, the start/end intrinsics
> >> > > > described
> >> > > > above will be transformed into start/end labels in assembly. When
> >> llvm
> >> > > > generates the object file from the user's code, these start/end
> >> labels
> >> > > > form a
> >> > > > pair of values identifying the start of the user's code region, and
> >> size.
> >> > > > The
> >> > > > size represents the number of bytes between the start and end
> >> address of
> >> > > > the
> >> > > > labels. Note that the labels are emitted during assembly printing.
> >> We hope
> >> > > > that these labels have no influence on code generation or
> >> basic-block
> >> > > > placement. However, the target assembler strategy for handling
> >> labels is
> >> > > > outside of our control.
> >> > > >
> >> > > > This proposed change affects the size of a binary, but only if the
> >> user
> >> > > > calls
> >> > > > the start/end builtins mentioned above. The additional size of the
> >> > > > .mca_code_regions section, which we imagine to be very small (to
> >> the order
> >> > > > of a
> >> > > > few bytes), can trivially be stripped by tools like 'strip' or
> >> 'objcopy'.
> >> > > >
> >> > > > Implementation Status
> >> > > > ------------------------------
> >> > > > We currently have the proposed changes implemented at the url
> >> posted below.
> >> > > > This initial patch only targets ELF object files, and does not
> >> handle
> >> > > > relocatable addresses. Since the start of a code region is
> >> represented as
> >> > > > an
> >> > > > assembly label, and referenced in the .mca_code_regions section,
> >> that
> >> > > > address
> >> > > > is relocatable. That value can be represented as section-relative
> >> > > > relocatable
> >> > > > symbol (.text + addend), but we are not handling that case yet.
> >> Instead,
> >> > > > the
> >> > > > proposed changes only handle linked/executable object files.
> >> > > >
> >> > > > For purposes of review and to communicate the idea, the change is
> >> > > > presented as a monolithic patch here:
> >> > > >
> >> > > > https://reviews.llvm.org/D54603
> >> > > >
> >> > > > The change is presented as a monolithic patch; however, if accepted
> >> > > > the patch will be split into three smaller patches:
> >> > > > 1. The introduction of the builtins to clang.
> >> > > > 2. The llvm portion (the added intrinsics).
> >> > > > 3. The llvm-mca portion.
> >> > > >
> >> > > > Thanks!
> >> > > >
> >> > > > -Matt
> >> > > > _______________________________________________
> >> > > > LLVM Developers mailing list
> >> > > > [hidden email]
> >> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> > > >
> >>
> >


Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
+1 to what Clement said.
I believe the intrinsics are a better design to support many architectures.

IACA users are probably decorating their code with IACA_START / IACA_END macros. One possibility is to provide a header that defines these macros in terms of the new intrinsics.
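
A rough sketch of what such a header could look like. This is an assumption on my side and not part of the patch: the macro names follow IACA's iacaMarks.h, the builtin names are the ones proposed in the RFC, the region ID 0 is arbitrary, and the __has_builtin fallback turns the markers into no-ops on compilers that lack the builtins. The function scaled_product is only there to show usage.

```c
/* Fall back gracefully on compilers that do not provide __has_builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__mca_code_region_start) && \
    __has_builtin(__mca_code_region_end)
/* Builtins as proposed in the RFC; only available with the patch applied. */
#define IACA_START { __mca_code_region_start(0); }
#define IACA_END   { __mca_code_region_end(); }
#else
/* Without the builtins the markers expand to empty blocks (no-ops). */
#define IACA_START { }
#define IACA_END   { }
#endif

/* Example use: the marked region covers the multiply-add only. */
double scaled_product(double x, double y) {
    double result = 0.0;
    IACA_START
    result += x * y;
    IACA_END
    return result;
}
```

Existing code that already uses the IACA macros would then only need to switch which header it includes.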

On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <[hidden email]> wrote:
Hi Matt/Andrea,

I see pros and cons for IACA-style markers vs intrinsics.
On the one hand, IACA-style markers are very magical, and not very visible in either the source or the object code. Using IACA-style markers has the advantage that you can use llvm-mca as a drop-in replacement for IACA, or even to compare their outputs on the exact same binary. They also do not require tooling on the compiler side and allow comparing the output of several compilers.
On the other hand, IACA-style markers do not have an equivalent on other architectures, and I'm not sure inventing new ones is a good idea :) I think the latter makes them pretty much a no-go for llvm-mca as I don't think we'll want to teach each target how to parse code regions. That's much better handled in a target-agnostic way by the object. Intel got away with them because they only had to support one architecture.

tl;dr: In the case of llvm-mca, I like your design better than the markers.

In terms of the future-proofness of only allowing regions within a basic block, are we confident we can ever actually simulate branches apart from an "always taken, perfectly predicted" loop? Even this simple need requires knowing quite a few details on the frontend. The current design could handle this use case with the addition of an external "loop mode" option to MCA. If there are no other strong use cases, I would advocate for experimental intrinsics unless people can contribute other example use cases.

On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <[hidden email]> wrote:
Hi Andrea,

On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> So, I have been thinking a bit more about this whole design.
>
> The more I think about your suggested design, the more I am convinced that
> we should do something more to support ranges in binary object files too.
> My understanding is that the reason why we don't support object files in
> general, is because of the presence of relocations. That is because a
> region start marker is effectively symbol relative, and the symbol (a
> function) would be relocated in the final executable.
> You mentioned to me that resolving even a 'simple' symbol-relative
> relocation is not trivial, because it requires specific knowledge about the
> binary format, and the target (i.e. how relocations are encoded is target
> specific). I am surprised that there is not a utility library for resolving
> relocations, but I am not familiar with that part of the compiler. I was
> hoping that there was a target specific interface to use in this case...

There might be a better way of resolving the relocs, but from what I saw
looking at llvm-objdump and other related tools, it seems that resolving
the relocated symbol is a target-specific effort.  I also spent some time
sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
performs the reloc resolution.  I should clarify that I too am not an
expert in llvm's utilities for performing symbol/reloc resolution, and
perhaps someone in the community can point me in the right direction.  I
can clearly see the reloc data in the object file via tools like
objdump; however, accessing the relocs via
llvm::object::ObjectFile::relocations() did not produce address values
that we could use (values of zero).

I was hoping that, for a first pass at this patch, supporting just
executables would be okay.  That keeps this initial patch set simple,
and hopefully will encourage others to take a peek at it, since it's
less daunting than what it might otherwise be.  Of course, there is the
concern that this initial patch will lock us into a design that will be
more complicated to unravel later.

> An alternative approach would require that you define your own
> "symbol-relative" reference. After all, ranges are just sequences of
> instructions in a function. If a function symbol is described by the symbol
> table, then you should be able to obtain its offset in the .text section.
> So, you could potentially encode your own symbol+offset. However, the
> linker would not be able to understand your "custom relocation", and
> information about regions in the final elf would be basically broken.
> So, that would not be a solution...
>
> I honestly don't know what the best approach is in this case.
> As a compromise, it would not be a bad idea to add the ability to specify
> ranges from command line. What do you think?
> Still, from a user point of view, the idea that we don't support object
> files in general sounds like a big limitation.

I agree, only supporting executables is a limitation.  However, I'd
like to land the base support now and add in the additional
features/support after this large patch set lands.  But I can see
where landing the whole thing entirely also makes sense.

> About the new experimental intrinsics: those would definitely work well for
> the simple case where instructions are from the same basic block.
> However, some/most of the constraints that you plan to add will have to
> change if in future we decide to allow ranges that potentially cross
> multiple basic blocks. How will the rules/constraints on those new
> intrinsics change? I just want to make sure that the suggested design is
> future-proof.

Since the llvm/clang parts of the code are just responsible for
collecting where a range starts/ends, I hope that we can remove some
of the baked-in constraints that are specified in IR/Verifier.cpp.
As you pointed out earlier in this thread, we might want to
introduce a dominance check if/when we lift the one-basic-block
restriction.
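
To make the current constraints concrete, here is a rough C sketch of
the three rules (paired start/end, no nesting, same basic block) applied
to a toy model of a function.  This is not the code in IR/Verifier.cpp;
Kind, Block and regions_well_formed are hypothetical names used only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds: only the region markers matter here. */
typedef enum { OTHER, REGION_START, REGION_END } Kind;

/* Toy basic block: a flat list of instruction kinds. */
typedef struct {
    const Kind *insts;
    size_t count;
} Block;

/* Returns true if every start marker is closed by an end marker in the
 * same block, with no second start in between, and no stray end. */
static bool regions_well_formed(const Block *blocks, size_t nblocks) {
    assert(blocks != NULL || nblocks == 0);
    for (size_t b = 0; b < nblocks; ++b) {
        bool open = false;
        for (size_t i = 0; i < blocks[b].count; ++i) {
            switch (blocks[b].insts[i]) {
            case REGION_START:
                if (open)
                    return false; /* nested start */
                open = true;
                break;
            case REGION_END:
                if (!open)
                    return false; /* end without a start in this block */
                open = false;
                break;
            default:
                break;
            }
        }
        if (open)
            return false; /* region would leak into another block */
    }
    return true;
}
```

Lifting the one-basic-block rule would mean replacing the per-block scan
with something like the dominance check discussed earlier.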

-Matt

>
> -Andrea
>
> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <[hidden email]>
> wrote:
>
> > Thanks for clarifying it Matt.
> >
> > In general, I quite like your suggested design.
> >
> > My only concern is about the semantics of the two new intrinsics. Your
> > design doesn't allow mca ranges to span through multiple basic blocks. That
> > constraint is acceptable for now, since llvm-mca doesn't know how to deal
> > with control flow.
> > However, I am a bit concerned about what might happen in future if we
> > decide to let users specify code regions that span through multiple basic
> > blocks. Basically, I don't particularly like the idea of changing the
> > semantics of already existing intrinsics. A design that already accounts for
> > that particular scenario/future work would be ideal. That being said,
> > marking those new intrinsics as 'experimental' may be a good compromise (at
> > least for now).
> >
> > So, I am quite happy overall with the direction of this RFC.
> > However, I am interested to hear from other developers about your
> > suggested design.
> >
> > > This initial patch only targets ELF object files, and does not handle
> > > relocatable addresses. Since the start of a code region is represented as
> > > an assembly label, and referenced in the .mca_code_regions section, that
> > > address is relocatable.
> >
> > This may be okay for now. However, it would be nice to remove that
> > constraint in future and add support to generic object files.
> >
> > -Andrea
> >
> > On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
> >
> >> I want to clarify a few restrictions of llvm-mca code regions that this
> >> RFC proposes:
> >>
> >> 1) All llvm-mca code regions must start with an
> >> llvm.mca.code.region.start intrinsic and end with
> >> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> >> level in the IR verifier.
> >>
> >> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> >> llvm.mca.code.region.start
> >> must have a llvm.mca.code.region.end intrinsic without any other llvm.mca
> >> start intrinsics
> >> between the two. The current implementation in the patch enforces this
> >> restriction at the
> >> IR level via the IR Verifier.
> >>
> >> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> >> does not follow
> >> branches (yet).  Instead, a branch instruction is treated by llvm-mca
> >> like any other instruction.
> >> The current patch associated with this RFC does not enforce this
> >> restriction.  I plan on updating
> >> the patch to enforce that a code region can only belong to a single basic
> >> block.  This is a simple
> >> check, ensuring that both the llvm.mca.code.region.start and accompanying
> >> end intrinsics live
> >> in the same basic block. I imagine adding this check at the IR level when
> >> we also verify points 1 and 2
> >> above.  That will keep the code-region verification logic isolated to the
> >> IR verifier.  The start/end
> >> intrinsics should not have any uses, so I'm not sure that they would be
> >> moved/sunk on behalf
> >> of any other instruction.  In other words, I do not imagine that a start
> >> and end would be split
> >> apart due to later MI optimizations.  If I discover that such a case
> >> occurs, then I might add the
> >> basic-block check prior to emitting the code region data to the object
> >> file.  Once llvm-mca is
> >> updated to handle branches, then we can remove this constraint.
> >>
> >> -Matt
> >>
> >> > -----Original Message-----
> >> > From: llvm-dev <[hidden email]> On Behalf Of Matt
> >> Davis via llvm-
> >> > dev
> >> > Sent: Wednesday, November 21, 2018 8:47 AM
> >> > To: Andrea Di Biagio <[hidden email]>
> >> > Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
> >> > <[hidden email]>; [hidden email]
> >> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> >> llvm-mca.
> >> >
> >> > Hi Andrea,
> >> >
> >> > Thanks for your input.
> >> >
> >> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >> > [... snip ...]
> >> > > About the suggested design:
> >> > > I like the idea of being able to identify code regions using a numeric
> >> > > identifier.
> >> > > However, what happens if a code region spans through multiple basic
> >> blocks?
> >> >
> >> > The current patch does not take into consideration cases where the
> >> > region start and end intrinsics are placed in different basic blocks.
> >> > Such would be the case if a region is defined to span multiple blocks.
> >> > This would be similar to the current case where a user places a
> >> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> >> > another.  However, as you point out below, if the user does this in the
> >> > source code via intrinsics (just what this patch is proposing), then
> >> > there is a chance that optimizations might change the layout of the
> >> > instructions and confuse the ordering of the MCA intrinsics.
> >> >
> >> > Since MCA does not follow branches (MCA just treats a branch as it would
> >> > a non-branching instruction), it seems that a user should be aware that
> >> > defining MCA code regions that span multiple blocks might result in an
> >> > unexpected analysis.  While we do not discourage this, it seems like
> >> > such a case will probably not produce an expected result for the user.
> >> > We could introduce a warning, or automatically divide the regions so
> >> > that a single region can only contain a single block.
> >> >
> >> > > My understanding is that code regions are not allowed to overlap. So,
> >> it
> >> > > makes sense if ` __mca_code_region_end()` doesn't take an ID as input.
> >> > > However, what if ` __mca_code_region_end()` ends in a different basic
> >> block?
> >> > >
> >> > > `__mca_code_region_start()` has to always dominate `
> >> > > __mca_code_region_end()`. This is trivial to verify when both calls
> >> are in
> >> > > a same basic block; however, we need to make sure that the
> >> relationship is
> >> > > still the same when the `end()` call is in a different basic block.
> >> > > That would not be enough. I think we should also verify  that `
> >> > > __mca_code_region_end()` always post-dominates the call to
> >> > > `__mca_code_region_start()`.
> >> >
> >> > In any case this patch should probably check dominance of the
> >> > intrinsics, even though MCA does not follow branches and MCA does not
> >> > explicitly forbid a region from containing multiple blocks.
> >> >
> >> > >
> >> > > My question is: what happens with basic block reordering? We don't
> >> know the
> >> > > layout of basic blocks until we reach code emission. How does it work
> >> for
> >> > > regions that span through multiple basic blocks? I think your RFC
> >> should
> >> > > clarify this aspect.
> >> > >
> >> > > As a side note: at the moment, llvm-mca doesn't know how to deal with
> >> > > branches. So, for simplicity we could force code regions to only
> >> contain
> >> > > instructions from a single basic block.
> >> > >
> >> > > However, In future we may want to teach llvm-mca how to analyze
> >> branchy
> >> > > code too. For example, we could introduce a simple control-flow
> >> analysis in
> >> > > llvm-mca, and use an external "branch trace" information (for
> >> example, a
> >> > > perf trace generated by an external tool) to decorate branches with
> >> > > branch probabilities (similarly to what we currently do in LLVM with
> >> PGO).
> >> > > We could then use that knowledge to model branch prediction and
> >> simulate
> >> > > what happens in the presence of multiple branches.
> >> > >
> >> > > So, the idea of having regions that potentially span multiple basic
> >> blocks
> >> > > is not bad in general. However, I think you should better clarify
> >> what are
> >> > > the constraints (at least, you should answer to my questions from
> >> before).
> >> >
> >> > I agree! Thanks for pointing that out.
> >> >
> >> > > If we decide to use those new intrinsics, then those should be
> >> experimental
> >> > > (at least to start).
> >> >
> >> > Agreed.
> >> >
> >> > -Matt
> >> >
> >> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> >> [hidden email]>
> >> > > wrote:
> >> > >
> >> > > > Introduction
> >> > > > -----------------
> >> > > > Currently llvm-mca only accepts assembly code as input. We would
> >> like to
> >> > > > extend llvm-mca to support object files, allowing users to analyze
> >> the
> >> > > > performance of binaries. The proposed changes (which involve both
> >> > > > clang and llvm) optionally introduce an object file section, but
> >> this can
> >> > > > be
> >> > > > stripped-out if desired.
> >> > > >
> >> > > > For the llvm-mca binary support feature to be useful, a user needs
> >> to tell
> >> > > > llvm-mca which portions of their code they would like analyzed.
> >> Currently,
> >> > > > this is accomplished via assembly comments. However, assembly
> >> comments are
> >> > > > not
> >> > > > preserved in object files, and this has encouraged this RFC. For the
> >> > > > proposed
> >> > > > binary support, we need to introduce changes to clang and llvm to
> >> allow the
> >> > > > user's object code to be recognized by llvm-mca:
> >> > > >
> >> > > > * We need a way for a user to identify a region/block of code they
> >> want
> >> > > >    analyzed by llvm-mca.
> >> > > > * We need the information defining the user's region of code to be
> >> > > > maintained
> >> > > >    in the object file so that llvm-mca can analyze the desired
> >> region(s)
> >> > > > from the
> >> > > >    object file.
> >> > > >
> >> > > > We define a "code region" as a subset of a user's program that is
> >> to be
> >> > > > analyzed via llvm-mca. The sequence of instructions to be analyzed
> >> is
> >> > > > represented as a pair: <start, end> where the 'start' marks the
> >> beginning
> >> > > > of
> >> > > > the user's source code and 'end' terminates the sequence. The
> >> instructions
> >> > > > between 'start' and 'end' form the region that can be analyzed by
> >> llvm-mca
> >> > > > at a
> >> > > > later time.
> >> > > >
> >> > > > Example
> >> > > > -----------
> >> > > > Before we go into the details of this proposed change, let's first
> >> look at
> >> > > > a
> >> > > > simple example:
> >> > > >
> >> > > > // example.c -- Analyze a dot-product expression.
> >> > > > double test(double x, double y) {
> >> > > >    double result = 0.0;
> >> > > >    __mca_code_region_start(42);
> >> > > >    result += x * y;
> >> > > >    __mca_code_region_end();
> >> > > >    return result;
> >> > > > }
> >> > > >
> >> > > > In the example above, we have identified a code region, in this
> >> case a
> >> > > > single
> >> > > > dot-product expression. For the sake of brevity and simplicity,
> >> we've
> >> > > > chosen
> >> > > > a very simple example, but in reality a more complicated example
> >> could use
> >> > > > multiple expressions. We have also denoted this region as number
> >> 42. That
> >> > > > identifier is only for the user, and simplifies reading an llvm-mca
> >> > > > analysis
> >> > > > report later.
> >> > > >
> >> > > > When this code is compiled, the region markers (the mca_code_region
> >> > > > markers)
> >> > > > are transformed into assembly labels. While the markers are
> >> presented as
> >> > > > function calls, in reality they are no-ops.
> >> > > >
> >> > > > test:
> >> > > > pushq   %rbp
> >> > > > movq    %rsp, %rbp
> >> > > > movsd   %xmm0, -8(%rbp)
> >> > > > movsd   %xmm1, -16(%rbp)
> >> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >> > > > xorps   %xmm0, %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > movsd   -8(%rbp), %xmm0
> >> > > > mulsd   -16(%rbp), %xmm0
> >> > > > addsd   -24(%rbp), %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >> > > > movsd   -24(%rbp), %xmm0
> >> > > > popq    %rbp
> >> > > > retq
> >> > > > .section        .mca_code_regions,"",@progbits
> >> > > > .quad   42
> >> > > > .quad   .Lmca_code_region_start_0
> >> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >> > > >
> >> > > > The assembly has been trimmed to show the portions relevant to this
> >> RFC.
> >> > > > Notice the labels enclose the user's defined region, and that they
> >> > > > preserve the
> >> > > > user's arbitrary region identifier, the ever-so-important region 42.
> >> > > >
> >> > > > In the object file section .mca_code_regions, we have noted the
> >> user's
> >> > > > region
> >> > > > identifier (.quad 42), start address, and region size. A more
> >> complicated
> >> > > > example can have multiple regions defined within a single
> >> .mca_code_regions
> >> > > > section. This section can be read by llvm-mca, allowing llvm-mca to
> >> take
> >> > > > object files as input instead of assembly source.
> >> > > >
> >> > > > Details
> >> > > > ---------
> >> > > > We need a way for a user to identify a region/block of code they
> >> want
> >> > > > analyzed
> >> > > > by llvm-mca. We solve this problem by introducing two intrinsics
> >> that a
> >> > > > user can
> >> > > > specify, for identifying regions of code for analysis.
> >> > > >
> >> > > > The two intrinsics are: llvm.mca.code.region.start and
> >> > > > llvm.mca.code.region.end. A user can identify a code region by
> >> inserting
> >> > > > the
> >> > > > mca_code_region_start and mca_code_region_end markers. These are
> >> simply
> >> > > > clang builtins and are transformed into the aforementioned
> >> intrinsics
> >> > > > during
> >> > > > compilation. The code between each pair of intrinsics is what we call a
> >> > > > "code region",
> >> > > > and is to be easily identifiable by llvm-mca; any code between a
> >> start/end
> >> > > > pair can be analyzed by llvm-mca at a later time. A user can define
> >> > > > multiple
> >> > > > non-overlapping code regions within their program.
> >> > > >
> >> > > > The llvm.mca.code.region.start intrinsic takes an integer constant
> >> as its
> >> > > > only
> >> > > > argument. This argument is implemented as a metadata i32, and is
> >> only used
> >> > > > when generating llvm-mca reports. This value allows a user to more
> >> easily
> >> > > > identify a specific code region. llvm.mca.code.region.end takes no
> >> > > > arguments.
> >> > > > Since we disallow nesting of regions, the first 'end' intrinsic
> >> lexically
> >> > > > following a 'start' intrinsic represents the end of that code
> >> region.
> >> > > >
> >> > > > Now that we have a solution for identifying regions for analysis,
> >> we now
> >> > > > need a
> >> > > > way for preserving that information to be read at a later time. To
> >> > > > accomplish
> >> > > > this we propose adding a new section (.mca_code_regions) to the
> >> object file
> >> > > > generated by llvm. During code generation, the start/end intrinsics
> >> > > > described
> >> > > > above will be transformed into start/end labels in assembly. When
> >> llvm
> >> > > > generates the object file from the user's code, these start/end
> >> labels
> >> > > > form a
> >> > > > pair of values identifying the start of the user's code region, and
> >> size.
> >> > > > The
> >> > > > size represents the number of bytes between the start and end
> >> address of
> >> > > > the
> >> > > > labels. Note that the labels are emitted during assembly printing.
> >> We hope
> >> > > > that these labels have no influence on code generation or
> >> basic-block
> >> > > > placement. However, the target assembler strategy for handling
> >> labels is
> >> > > > outside of our control.
> >> > > >
> >> > > > This proposed change affects the size of a binary, but only if the
> >> user
> >> > > > calls
> >> > > > the start/end builtins mentioned above. The additional size of the
> >> > > > .mca_code_regions section, which we imagine to be very small (to
> >> the order
> >> > > > of a
> >> > > > few bytes), can trivially be stripped by tools like 'strip' or
> >> 'objcopy'.
> >> > > >
> >> > > > Implementation Status
> >> > > > ------------------------------
> >> > > > We currently have the proposed changes implemented at the url
> >> posted below.
> >> > > > This initial patch only targets ELF object files, and does not
> >> handle
> >> > > > relocatable addresses. Since the start of a code region is
> >> represented as
> >> > > > an
> >> > > > assembly label, and referenced in the .mca_code_regions section,
> >> that
> >> > > > address
> >> > > > is relocatable. That value can be represented as section-relative
> >> > > > relocatable
> >> > > > symbol (.text + addend), but we are not handling that case yet.
> >> Instead,
> >> > > > the
> >> > > > proposed changes only handle linked/executable object files.
> >> > > >
> >> > > > For purposes of review and to communicate the idea, the change is
> >> > > > presented as a monolithic patch here:
> >> > > >
> >> > > > https://reviews.llvm.org/D54603
> >> > > >
> >> > > > The change is presented as a monolithic patch; however, if accepted
> >> > > > the patch will be split into three smaller patches:
> >> > > > 1. The introduction of the builtins to clang.
> >> > > > 2. The llvm portion (the added intrinsics).
> >> > > > 3. The llvm-mca portion.
> >> > > >
> >> > > > Thanks!
> >> > > >
> >> > > > -Matt
> >>
> >


Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Thanks for the feedback Guillaume and Clement!

In response to Clement:

> > In terms of future-proofness of only allowing regions within a basic
> > block, are we confident we can actually ever simulate branches apart from
> > "always taken, perfectly predicted" loop? Even this simple need requires
> > knowing quite a few details on the frontend. The current design could
> > handle this use case with the addition of an external "loop mode" option to
> > MCA. If there are no other strong use cases, I would advocate for
> > experimental intrinsics unless people can contribute other example use
> > cases.

In short, I am in agreement and think that handling of branching or loop
constructs should be isolated to the llvm-mca driver/front-end.  The
only thing the code regions should be concerned with is identifying
blocks of instructions that will later be used by the front end.
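
As a rough illustration of how little the front end needs from the
object file, here is a C sketch that decodes region records laid out as
in the RFC's assembly example (three .quad values: ID, start address,
size in bytes).  RegionRecord and decode_regions are hypothetical names,
and the sketch assumes little-endian 64-bit fields as on x86-64; it is
not the reader used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record of the .mca_code_regions section as shown in the RFC. */
typedef struct {
    uint64_t id;    /* user-chosen region identifier, e.g. 42 */
    uint64_t start; /* address of the start label */
    uint64_t size;  /* bytes between the start and end labels */
} RegionRecord;

/* Read a little-endian 64-bit value from p. */
static uint64_t read_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Decode up to max_out records from the raw section bytes.
 * Returns the number of records written, or -1 if len is not a
 * multiple of the 24-byte record size. */
static int decode_regions(const uint8_t *data, size_t len,
                          RegionRecord *out, size_t max_out) {
    assert(data != NULL && out != NULL);
    if (len % 24 != 0)
        return -1;
    size_t count = len / 24;
    if (count > max_out)
        count = max_out;
    for (size_t i = 0; i < count; ++i) {
        const uint8_t *p = data + i * 24;
        out[i].id = read_le64(p);
        out[i].start = read_le64(p + 8);
        out[i].size = read_le64(p + 16);
    }
    return (int)count;
}
```

Everything about branches or loops would then be layered on top of these
records in the llvm-mca driver rather than encoded in them.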

We can place limitations on how those blocks are formed. For example, the
current implementation forces regions to be isolated to a single basic
block.  However, we anticipate lifting this restriction once branching
is handled.

-Matt


On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:

> +1 to what Clement said.
> I believe the intrinsics are a better design to support many architectures.
>
> IACA users are probably decorating their code with IACA_START / IACA_END
> macros. One possibility is to provide a header that defines these macros in
> terms of the new intrinsics.
>
> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <[hidden email]> wrote:
>
> > Hi Matt/Andrea,
> >
> > I see pros and cons for IACA-style markers vs intrinsics.
> > On the one hand, IACA-style markers are very magical, and not very visible
> > in both the source and object code. Using IACA-style markers has the
> > advantage that you can use llvm-mca as a drop-in replacement for IACA, or
> > even to compare their outputs on the exact same binary. They also do not
> > require tooling on the compiler side and allow comparing the output of
> > several compilers.
> >
> > On the other hand, IACA-style markers do not have an equivalent on other
> > architectures, and I'm not sure inventing new ones is a good idea :) I
> > think the latter makes them pretty much a no-go for llvm-mca as I don't
> > think we'll want to teach each target how to parse code regions. That's
> > much better handled in a target-agnostic way by the object. Intel got away
> > with them because they only had to support one architecture.
> >
> > tl;dr: In the case of llvm-mca, I like your design better than the markers.
> >
> > In terms of future-proofness of only allowing regions within a basic
> > > block, are we confident we can actually ever simulate branches apart from
> > > "always taken, perfectly predicted" loop? Even this simple need requires
> > knowing quite a few details on the frontend. The current design could
> > handle this use case with the addition of an external "loop mode" option to
> > MCA. If there are no other strong use cases, I would advocate for
> > experimental intrinsics unless people can contribute other example use
> > cases.
> >
> > On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <[hidden email]> wrote:
> >
> >> Hi Andrea,
> >>
> >> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> >> > So, I have been thinking a bit more about this whole design.
> >> >
> >> > The more I think about your suggested design, the more I am convinced
> >> that
> >> > we should do something more to support ranges in binary object files
> >> too.
> >> > My understanding is that the reason why we don't support object files in
> >> > general, is because of the presence of relocations. That is because a
> >> > region start marker is effectively symbol relative, and the symbol (a
> >> > function) would be relocated in the final executable.
> >> > You mentioned to me that resolving even a 'simple' symbol-relative
> >> > relocation is not trivial, because it requires specific knowledge about
> >> the
> >> > binary format, and the target (i.e. how relocations are encoded is
> >> target
> >> > specific). I am surprised that there is not a utility library for
> >> resolving
> >> > relocations.. but I am not familiar with that part of the compiler. I
> >> was
> >> > hoping that there was a target specific interface to use in this case...
> >>
> >> There might be a better way of resolving the relocs, but from what I saw
> >> looking at llvm-objdump and other related tools, it seems that resolving
> >> the relocated symbol is a target specific effort.  I also spent some time
> >> sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
> >> performs the reloc resolution.  I should clarify that I too am not an
> >> expert in llvm's utilities for performing symbol/reloc resolution, and
> >> perhaps someone in the community can point me in the right direction.  I
> >> can clearly see the reloc data in the object file via tools like
> >> objdump; however, accessing the relocs via
> >> llvm::object::ObjectFile::relocations() did not produce address values
> >> that we could use (values of zero).
> >>
> >> I was hoping that, for a first pass at this patch, supporting just
> >> executables would be okay.  That keeps this initial patch set simple,
> >> and hopefully will encourage others to take a peek at it, since it's
> >> less daunting than what it might otherwise be.  Of course, there is the
> >> concern that this initial patch will lock us into a design that will be
> >> more complicated to unravel later.
> >>
> >> > An alternative approach would require that you define your own
> >> > "symbol-relative" reference. After all, ranges are just sequences of
> >> > instructions in a function. If a function symbol is described by the
> >> symbol
> >> > table, then you should be able to obtain its offset in the .text
> >> section.
> >> > So, you could potentially encode your own symbol+offset. However, the
> >> > linker would not be able to understand your "custom relocation", and
> >> > information about regions in the final elf would be basically broken.
> >> > So, that would not be a solution...
> >> >
> >> > I don't know honestly what is the best approach to use in this case.
> >> > As a compromise, it would not be a bad idea to add the ability to
> >> specify
> >> > ranges from command line. What do you think?
> >> > Still, from a user point of view, the idea that we don't support object
> >> > files in general sounds like a big limitation.
> >>
> >> I agree, only supporting executables is a limitation.  However, I'd
> >> like to land the base support now and add in the additional
> >> features/support after this large patch set lands.  But I can see
> >> where landing the whole thing entirely also makes sense.
> >>
> >> > About the new experimental intrinsics: those would definitely work well
> >> for
> >> > the simple case where instructions are from the same basic block.
> >> > However, some/most of the constraints that you plan to add will have to
> >> > change if in future we decide to allow ranges that potentially cross
> >> > multiple basic blocks. How will the rules/constraints on those new
> >> > intrinsics change? I just want to make sure that the suggested design is
> >> > future-proof.
> >>
> >> Since the llvm/clang parts of the code are just responsible for
> >> collecting where a range starts/ends, I hope that we can remove some
> >> of the baked-in constraints that are specified in IR/Verifier.cpp.
> >> As you pointed out earlier in this thread, we might want to
> >> introduce a dominance check if/when we lift the one-basic-block
> >> restriction.
> >>
> >> -Matt
> >>
> >> >
> >> > -Andrea
> >> >
> >> > On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <
> >> [hidden email]>
> >> > wrote:
> >> >
> >> > > Thanks for clarifying it Matt.
> >> > >
> >> > > In general, I quite like your suggested design.
> >> > >
> >> > > My only concern is about the semantics of the two new intrinsics. Your
> >> > > design doesn't allow mca ranges to span through multiple basic
> >> blocks. That
> >> > > constraint is acceptable for now, since llvm-mca doesn't know how to
> >> deal
> >> > > with control flow.
> >> > > However, I am a bit concerned about what might happen in future if we
> >> > > decide to let users specify code regions that span through multiple
> >> basic
> >> > > blocks. Basically, I don't particularly like the idea of changing the
> >> > > semantics of already existing intrinsics. A design that already
> >> accounts for
> >> > > that particular scenario/future work would be ideal. That being said,
> >> > > marking those new intrinsics as 'experimental' may be a good
> >> compromise (at
> >> > > least for now).
> >> > >
> >> > > So, I am quite happy overall with the direction of this RFC.
> >> > > However, I am interested to hear from other developers about your
> >> > > suggested design.
> >> > >
> >> > > > This initial patch only targets ELF object files, and does not
> >> > > > handle relocatable addresses. Since the start of a code region is
> >> > > > represented as an assembly label, and referenced in the
> >> > > > .mca_code_regions section, that address is relocatable.
> >> > >
> >> > > This may be okay for now. However, it would be nice to remove that
> >> > > constraint in future and add support to generic object files.
> >> > >
> >> > > -Andrea
> >> > >
> >> > > On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
> >> > >
> >> > >> I want to clarify a few restrictions of llvm-mca code regions that
> >> this
> >> > >> RFC proposes:
> >> > >>
> >> > >> 1) All llvm-mca code regions must start with an
> >> > >> llvm.mca.code.region.start intrinsic and end with
> >> > >> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the
> >> IR
> >> > >> level in the IR verifier.
> >> > >>
> >> > >> 2) llvm-mca code regions cannot nest.  This restriction implies that
> >> an
> >> > >> llvm.mca.code.region.start
> >> > >> must have a llvm.mca.code.region.end intrinsic without any other
> >> llvm.mca
> >> > >> start intrinsics
> >> > >> between the two. The current implementation in the patch enforces
> >> this
> >> > >> restriction at the
> >> > >> IR level via the IR Verifier.
> >> > >>
> >> > >> 3) An llvm-mca code region cannot span multiple basic blocks.
> >> llvm-mca
> >> > >> does not follow
> >> > >> branches (yet).  Instead, a branch instruction is treated by llvm-mca
> >> > >> like any other instruction.
> >> > >> The current patch associated with this RFC does not enforce this
> >> > >> restriction.  I plan on updating
> >> > >> the patch to enforce that a code region can only belong to a single
> >> basic
> >> > >> block.  This is a simple
> >> > >> check, ensuring that both the llvm.mca.code.region.start and
> >> accompanying
> >> > >> end intrinsics live
> >> > >> in the same basic block. I imagine adding this check at the IR level
> >> when
> >> > >> we also verify points 1 and 2
> >> > >> above.  That will keep the code-region verification logic isolated
> >> to the
> >> > >> IR verifier.  The start/end
> >> > >> intrinsics should not have any uses, so I'm not sure that they would
> >> be
> >> > >> moved/sunk on behalf
> >> > >> of any other instruction.  In other words, I do not imagine that a
> >> start
> >> > >> and end would be split
> >> > >> apart due to later MI optimizations.  If I discover that such a case
> >> > >> occurs, then I might add the
> >> > >> basic-block check prior to emitting the code region data to the
> >> object
> >> > >> file.  Once llvm-mca is
> >> > >> updated to handle branches, then we can remove this constraint.
> >> > >>
> >> > >> -Matt
> >> > >>
> >> > >> > -----Original Message-----
> >> > >> > From: llvm-dev <[hidden email]> On Behalf Of Matt
> >> > >> Davis via llvm-
> >> > >> > dev
> >> > >> > Sent: Wednesday, November 21, 2018 8:47 AM
> >> > >> > To: Andrea Di Biagio <[hidden email]>
> >> > >> > Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
> >> > >> > <[hidden email]>; [hidden email]
> >> > >> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> >> > >> llvm-mca.
> >> > >> >
> >> > >> > Hi Andrea,
> >> > >> >
> >> > >> > Thanks for your input.
> >> > >> >
> >> > >> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >> > >> > [... snip ...]
> >> > >> > > About the suggested design:
> >> > >> > > I like the idea of being able to identify code regions using a
> >> numeric
> >> > >> > > identifier.
> >> > >> > > However, what happens if a code region spans through multiple
> >> basic
> >> > >> blocks?
> >> > >> >
> >> > >> > The current patch does not take into consideration cases where the
> >> > >> > region start and end intrinsics are placed in different basic
> >> blocks.
> >> > >> > Such would be the case if a region is defined to span multiple
> >> blocks.
> >> > >> > This would be similar to the current case where a user places a
> >> > >> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END
> >> in
> >> > >> > another.  However, as you point out below, if the user does this
> >> in the
> >> > >> > source code via intrinsics (just what this patch is proposing),
> >> then
> >> > >> > there is a chance that optimizations might change the layout of the
> >> > >> > instructions and confuse the ordering of the MCA intrinsics.
> >> > >> >
> >> > >> > Since MCA does not follow branches (MCA just treats a branch as it
> >> would
> >> > >> > a non-branching instruction), it seems that a user should be aware
> >> that
> >> > >> > defining MCA code regions that span multiple blocks might result
> >> in an
> >> > >> > unexpected analysis.  While we do not discourage this, it seems
> >> like
> >> > >> > such a case will probably not produce an expected result for the
> >> user.
> >> > >> > We could introduce a warning, or automatically divide the regions
> >> so
> >> > >> > that a single region can only contain a single block.
> >> > >> >
> >> > >> > > My understanding is that code regions are not allowed to
> >> overlap. So,
> >> > >> it
> >> > >> > > makes sense if ` __mca_code_region_end()` doesn't take an ID as
> >> input.
> >> > >> > > However, what if ` __mca_code_region_end()` ends in a different
> >> basic
> >> > >> block?
> >> > >> > >
> >> > >> > > `__mca_code_region_start()` has to always dominate `
> >> > >> > > __mca_code_region_end()`. This is trivial to verify when both
> >> calls
> >> > >> are in
> >> > >> > > a same basic block; however, we need to make sure that the
> >> > >> relationship is
> >> > >> > > still the same when the `end()` call is in a different basic
> >> block.
> >> > >> > > That would not be enough. I think we should also verify  that `
> >> > >> > > __mca_code_region_end()` always post-dominates the call to
> >> > >> > > `__mca_code_region_start()`.
> >> > >> >
> >> > >> > In any case this patch should probably check dominance of the
> >> > >> > intrinsics, even though MCA does not follow branches and MCA does
> >> > >> > not explicitly forbid a region from containing multiple blocks.
> >> > >> >
> >> > >> > >
> >> > >> > > My question is: what happens with basic block reordering? We
> >> don't
> >> > >> know the
> >> > >> > > layout of basic blocks until we reach code emission. How does it
> >> work
> >> > >> for
> >> > >> > > regions that span through multiple basic blocks? I think your
> >> RFC
> >> > >> should
> >> > >> > > clarify this aspect.
> >> > >> > >
> >> > >> > > As a side note: at the moment, llvm-mca doesn't know how to deal
> >> with
> >> > >> > > branches. So, for simplicity we could force code regions to only
> >> > >> contain
> >> > >> > > instructions from a single basic block.
> >> > >> > >
> >> > >> > > However, in future we may want to teach llvm-mca how to analyze
> >> > >> branchy
> >> > >> > > code too. For example, we could introduce a simple control-flow
> >> > >> analysis in
> >> > >> > > llvm-mca, and use an external "branch trace" information (for
> >> > >> example, a
> >> > >> > > perf trace generated by an external tool) to decorate branches with
> >> > >> > > branch probabilities (similarly to what we currently do in LLVM
> >> with
> >> > >> PGO).
> >> > >> > > We could then use that knowledge to model branch prediction and
> >> > >> simulate
> >> > >> > > what happens in the presence of multiple branches.
> >> > >> > >
> >> > >> > > So, the idea of having regions that potentially span multiple
> >> basic
> >> > >> blocks
> >> > >> > > is not bad in general. However, I think you should better clarify
> >> > >> what are
> >> > >> > > the constraints (at least, you should answer to my questions from
> >> > >> before).
> >> > >> >
> >> > >> > I agree! Thanks for pointing that out.
> >> > >> >
> >> > >> > > If we decide to use those new intrinsics, then those should be
> >> > >> experimental
> >> > >> > > (at least to start).
> >> > >> >
> >> > >> > Agreed.
> >> > >> >
> >> > >> > -Matt
> >> > >> >
> >> > >> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> >> > >> [hidden email]>
> >> > >> > > wrote:
> >> > >> > >
> >> > >> > > > Introduction
> >> > >> > > > -----------------
> >> > >> > > > Currently llvm-mca only accepts assembly code as input. We
> >> would
> >> > >> like to
> >> > >> > > > extend llvm-mca to support object files, allowing users to
> >> analyze
> >> > >> the
> >> > >> > > > performance of binaries. The proposed changes (which involve
> >> both
> >> > >> > > > clang and llvm) optionally introduce an object file section,
> >> but
> >> > >> this can
> >> > >> > > > be
> >> > >> > > > stripped-out if desired.
> >> > >> > > >
> >> > >> > > > For the llvm-mca binary support feature to be useful, a user
> >> needs
> >> > >> to tell
> >> > >> > > > llvm-mca which portions of their code they would like analyzed.
> >> > >> Currently,
> >> > >> > > > this is accomplished via assembly comments. However, assembly
> >> > >> comments are
> >> > >> > > > not
> >> > >> > > > preserved in object files, and this has encouraged this RFC.
> >> For the
> >> > >> > > > proposed
> >> > >> > > > binary support, we need to introduce changes to clang and llvm
> >> to
> >> > >> allow the
> >> > >> > > > user's object code to be recognized by llvm-mca:
> >> > >> > > >
> >> > >> > > > * We need a way for a user to identify a region/block of code
> >> they
> >> > >> want
> >> > >> > > >    analyzed by llvm-mca.
> >> > >> > > > * We need the information defining the user's region of code
> >> to be
> >> > >> > > > maintained
> >> > >> > > >    in the object file so that llvm-mca can analyze the desired
> >> > >> region(s)
> >> > >> > > > from the
> >> > >> > > >    object file.
> >> > >> > > >
> >> > >> > > > We define a "code region" as a subset of a user's program that
> >> is
> >> > >> to be
> >> > >> > > > analyzed via llvm-mca. The sequence of instructions to be
> >> analyzed
> >> > >> is
> >> > >> > > > represented as a pair: <start, end> where the 'start' marks the
> >> > >> beginning
> >> > >> > > > of
> >> > >> > > > the user's source code and 'end' terminates the sequence. The
> >> > >> instructions
> >> > >> > > > between 'start' and 'end' form the region that can be analyzed
> >> by
> >> > >> llvm-mca
> >> > >> > > > at a
> >> > >> > > > later time.
> >> > >> > > >
> >> > >> > > > Example
> >> > >> > > > -----------
> >> > >> > > > Before we go into the details of this proposed change, let's
> >> first
> >> > >> look at
> >> > >> > > > a
> >> > >> > > > simple example:
> >> > >> > > >
> >> > >> > > > // example.c -- Analyze a dot-product expression.
> >> > >> > > > double test(double x, double y) {
> >> > >> > > >    double result = 0.0;
> >> > >> > > >    __mca_code_region_start(42);
> >> > >> > > >    result += x * y;
> >> > >> > > >    __mca_code_region_end();
> >> > >> > > >    return result;
> >> > >> > > > }
> >> > >> > > >
> >> > >> > > > In the example above, we have identified a code region, in this
> >> > >> case a
> >> > >> > > > single
> >> > >> > > > dot-product expression. For the sake of brevity and simplicity,
> >> > >> we've
> >> > >> > > > chosen
> >> > >> > > > a very simple example, but in reality a more complicated
> >> example
> >> > >> could use
> >> > >> > > > multiple expressions. We have also denoted this region as
> >> number
> >> > >> 42. That
> >> > >> > > > identifier is only for the user, and simplifies reading an
> >> llvm-mca
> >> > >> > > > analysis
> >> > >> > > > report later.
> >> > >> > > >
> >> > >> > > > When this code is compiled, the region markers (the
> >> mca_code_region
> >> > >> > > > markers)
> >> > >> > > > are transformed into assembly labels. While the markers are
> >> > >> presented as
> >> > >> > > > function calls, in reality they are no-ops.
> >> > >> > > >
> >> > >> > > > test:
> >> > >> > > > pushq   %rbp
> >> > >> > > > movq    %rsp, %rbp
> >> > >> > > > movsd   %xmm0, -8(%rbp)
> >> > >> > > > movsd   %xmm1, -16(%rbp)
> >> > >> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >> > >> > > > xorps   %xmm0, %xmm0
> >> > >> > > > movsd   %xmm0, -24(%rbp)
> >> > >> > > > movsd   -8(%rbp), %xmm0
> >> > >> > > > mulsd   -16(%rbp), %xmm0
> >> > >> > > > addsd   -24(%rbp), %xmm0
> >> > >> > > > movsd   %xmm0, -24(%rbp)
> >> > >> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >> > >> > > > movsd   -24(%rbp), %xmm0
> >> > >> > > > popq    %rbp
> >> > >> > > > retq
> >> > >> > > > .section        .mca_code_regions,"",@progbits
> >> > >> > > > .quad   42
> >> > >> > > > .quad   .Lmca_code_region_start_0
> >> > >> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >> > >> > > >
> >> > >> > > > The assembly has been trimmed to show the portions relevant to
> >> this
> >> > >> RFC.
> >> > >> > > > Notice the labels enclose the user's defined region, and that
> >> they
> >> > >> > > > preserve the
> >> > >> > > > user's arbitrary region identifier, the ever-so-important
> >> region 42.
> >> > >> > > >
> >> > >> > > > In the object file section .mca_code_regions, we have noted the
> >> > >> user's
> >> > >> > > > region
> >> > >> > > > identifier (.quad 42), start address, and region size. A more
> >> > >> complicated
> >> > >> > > > example can have multiple regions defined within a single
> >> > >> .mca_code_regions
> >> > >> > > > section. This section can be read by llvm-mca, allowing
> >> llvm-mca to
> >> > >> take
> >> > >> > > > object files as input instead of assembly source.
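As a rough illustration of the record layout described above (identifier, start address, and size, each emitted as a .quad), a reader of the section might decode it along the following lines. This is only a sketch and not the code in D54603: the names mca_region, read_le64 and decode_regions are made up for illustration, the three-quad record layout is taken from the assembly listing, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one .mca_code_regions record: three .quad values,
   in the order shown in the assembly listing. */
typedef struct {
    uint64_t id;    /* user-chosen identifier, e.g. 42 */
    uint64_t start; /* address of the start label */
    uint64_t size;  /* end label minus start label, in bytes */
} mca_region;

/* Read one little-endian 64-bit value (assumption: x86-64 style target). */
static uint64_t read_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Decode up to max records from the raw section bytes.
   Returns the number of complete records decoded; trailing bytes that do
   not form a full 24-byte record are ignored. */
static size_t decode_regions(const unsigned char *data, size_t len,
                             mca_region *out, size_t max) {
    size_t n = 0;
    assert(data != NULL || len == 0);
    while (len >= 24 && n < max) {
        out[n].id = read_le64(data);
        out[n].start = read_le64(data + 8);
        out[n].size = read_le64(data + 16);
        data += 24;
        len -= 24;
        ++n;
    }
    return n;
}
```

With the single region from the example, the section would hold one 24-byte record whose first quad decodes to 42. How the section bytes are located in the object file is left out here, since that depends on the object-file reader being used.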
> >> > >> > > >
> >> > >> > > > Details
> >> > >> > > > ---------
> >> > >> > > > We need a way for a user to identify a region/block of code
> >> they
> >> > >> want
> >> > >> > > > analyzed
> >> > >> > > > by llvm-mca. We solve this problem by introducing two
> >> intrinsics
> >> > >> that a
> >> > >> > > > user can
> >> > >> > > > specify, for identifying regions of code for analysis.
> >> > >> > > >
> >> > >> > > > The two intrinsics are: llvm.mca.code.regions.start and
> >> > >> > > > llvm.mca.code.regions.end. A user can identify a code region by
> >> > >> inserting
> >> > >> > > > the
> >> > >> > > > mca_code_region_start and mca_code_region_end markers. These
> >> are
> >> > >> simply
> >> > >> > > > clang builtins and are transformed into the aforementioned
> >> > >> intrinsics
> >> > >> > > > during
> >> > >> > > > compilation. The code between the intrinsics are what we call
> >> "code
> >> > >> > > > regions"
> >> > >> > > > and are to be easily identifiable by llvm-mca; any code
> >> between a
> >> > >> start/end
> >> > >> > > > pair can be analyzed by llvm-mca at a later time. A user can
> >> define
> >> > >> > > > multiple
> >> > >> > > > non-overlapping code regions within their program.
> >> > >> > > >
> >> > >> > > > The llvm.mca.code.region.start intrinsic takes an integer
> >> constant
> >> > >> as its
> >> > >> > > > only
> >> > >> > > > argument. This argument is implemented as a metadata i32, and
> >> is
> >> > >> only used
> >> > >> > > > when generating llvm-mca reports. This value allows a user to
> >> more
> >> > >> easily
> >> > >> > > > identify a specific code region. llvm.mca.code.region.end
> >> takes no
> >> > >> > > > arguments.
> >> > >> > > > Since we disallow nesting of regions, the first 'end' intrinsic
> >> > >> lexically
> >> > >> > > > following a 'start' intrinsic represents the end of that code
> >> > >> region.
> >> > >> > > >
> >> > >> > > > Now that we have a solution for identifying regions for
> >> analysis,
> >> > >> we now
> >> > >> > > > need a
> >> > >> > > > way for preserving that information to be read at a later
> >> time. To
> >> > >> > > > accomplish
> >> > >> > > > this we propose adding a new section (.mca_code_regions) to the
> >> > >> object file
> >> > >> > > > generated by llvm. During code generation, the start/end
> >> intrinsics
> >> > >> > > > described
> >> > >> > > > above will be transformed into start/end labels in assembly.
> >> When
> >> > >> llvm
> >> > >> > > > generates the object file from the user's code, these start/end
> >> > >> labels
> >> > >> > > > form a
> >> > >> > > > pair of values identifying the start of the user's code
> >> region, and
> >> > >> size.
> >> > >> > > > The
> >> > >> > > > size represents the number of bytes between the start and end
> >> > >> address of
> >> > >> > > > the
> >> > >> > > > labels. Note that the labels are emitted during assembly
> >> printing.
> >> > >> We hope
> >> > >> > > > that these labels have no influence on code generation or
> >> > >> basic-block
> >> > >> > > > placement. However, the target assembler strategy for handling
> >> > >> labels is
> >> > >> > > > outside of our control.
> >> > >> > > >
> >> > >> > > > This proposed change affects the size of a binary, but only if
> >> the
> >> > >> user
> >> > >> > > > calls
> >> > >> > > > the start/end builtins mentioned above. The additional size of
> >> the
> >> > >> > > > .mca_code_regions section, which we imagine to be very small
> >> (to
> >> > >> the order
> >> > >> > > > of a
> >> > >> > > > few bytes), can trivially be stripped by tools like 'strip' or
> >> > >> 'objcopy'.
> >> > >> > > >
> >> > >> > > > Implementation Status
> >> > >> > > > ------------------------------
> >> > >> > > > We currently have the proposed changes implemented at the url
> >> > >> posted below.
> >> > >> > > > This initial patch only targets ELF object files, and does not
> >> > >> handle
> >> > >> > > > relocatable addresses. Since the start of a code region is
> >> > >> represented as
> >> > >> > > > an
> >> > >> > > > assembly label, and referenced in the .mca_code_regions
> >> section,
> >> > >> that
> >> > >> > > > address
> >> > >> > > > is relocatable. That value can be represented as
> >> section-relative
> >> > >> > > > relocatable
> >> > >> > > > symbol (.text + addend), but we are not handling that case yet.
> >> > >> Instead,
> >> > >> > > > the
> >> > >> > > > proposed changes only handle linked/executable object files.
> >> > >> > > >
> >> > >> > > > For purposes of review and to communicate the idea, the change
> >> is
> >> > >> > > > presented as a monolithic patch here:
> >> > >> > > >
> >> > >> > > > https://reviews.llvm.org/D54603
> >> > >> > > >
> >> > >> > > > The change is presented as a monolithic patch; however, if
> >> accepted
> >> > >> > > > the patch will be split into three smaller patches:
> >> > >> > > > 1. The introduction of the builtins to clang.
> >> > >> > > > 2. The llvm portion (the added intrinsics).
> >> > >> > > > 3. The llvm-mca portion.
> >> > >> > > >
> >> > >> > > > Thanks!
> >> > >> > > >
> >> > >> > > > -Matt

Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Hi Matt,

I can see a near future where perf-analysis tooling uses branch history
profiler captures to determine how often loops/branches are taken and
feeds that into llvm-mca, especially for hot/branchy loop analysis
reports etc. Are you confident that your approach will be easily
extendable for this?

Similarly, being able to generally embed the profile markers in object
libraries for reuse is going to be important for some people - I'd like
to see more of a plan of how this will be achieved. I understand that it
might not be easy for some exe formats.

Sorry if I'm being too critical, but I'm a bit worried that we end up
with an initial implementation that will take a lot of reworking to meet
our final aims.

Thanks, Simon.

On 10/12/2018 19:32, Matt Davis wrote:

> Thanks for the feedback Guillaume and Clement!
>
> In response to Clement:
>
>>> In terms of future-proofness of only allowing regions within a basic
>>> block, are we confident we can actually ever simulate branches apart from
>>> "always taken, perfectly predicted" loop? Even this simple need requires
>>> knowing quite a few details on the frontend. The current design could
>>> handle this use case with the addition of an external "loop mode" option to
>>> MCA. If there are no other strong use cases, I would advocate for
>>> experimental intrinsics unless people can contribute other example use
>>> cases.
> In short, I am in agreement and think that handling of branching or loop
> constructs should be isolated to the llvm-mca driver/front-end.  The
> only thing the code regions should be concerned with is identifying
> blocks of instructions that will later be used by the front end.
>
> We can place limitations on how those blocks are formed. For example, the
> current implementation forces regions to be isolated to a single basic
> block.  However, we anticipate lifting this restriction once branching
> is handled.
>
> -Matt
>
>
> On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
>> +1 to what Clement said.
>> I believe the intrinsics are a better design to support many architectures.
>>
>> IACA users are probably decorating their code with IACA_START / IACA_END
>> macros. One possibility is to provide a header that defines these macros in
>> terms of the new intrinsics.
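To make the header idea above concrete, here is a minimal sketch of what such a compatibility header could look like. It is an assumption-laden illustration: the file name mca_markers.h is invented, the region ID 0 is arbitrary (the IACA macros carry no ID), and the proposed __mca_code_region_start / __mca_code_region_end builtins only exist in a compiler carrying the RFC patch, so the sketch falls back to no-ops everywhere else.

```c
/* mca_markers.h -- hypothetical compatibility header (a sketch only). */
#ifndef MCA_MARKERS_H
#define MCA_MARKERS_H

/* Older compilers do not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__mca_code_region_start) && \
    __has_builtin(__mca_code_region_end)
/* Builtins proposed in the RFC; the ID 0 is an arbitrary choice. */
#define IACA_START __mca_code_region_start(0)
#define IACA_END __mca_code_region_end()
#else
/* Without the patch the markers expand to nothing useful. */
#define IACA_START ((void)0)
#define IACA_END ((void)0)
#endif

#endif /* MCA_MARKERS_H */

/* Example use, mirroring the RFC's example.c. */
static double scaled_sum(double x, double y) {
    double result = 0.0;
    IACA_START;
    result += x * y;
    IACA_END;
    return result;
}
```

Code already decorated with IACA_START / IACA_END would then only need to switch which header it includes. Note that here the macros are written as statements followed by a semicolon, which may differ from how existing IACA users spell them.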
>>
>> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <[hidden email]> wrote:
>>
>>> Hi Matt/Andrea,
>>>
>>> I see pros and cons for IACA-style markers vs intrinsics.
>>> On the one hand, IACA-style markers are very magical, and not very visible
>>> in both the source and object code. Using IACA-style markers has the
>>> advantage that you can use llvm-mca as a drop-in replacement for IACA, or
>>> even to compare their outputs on the exact same binary. They also do not
>>> require tooling on the compiler side and allow comparing the output of
>>> several compilers.
>>>
>>> On the other hand, IACA-style markers do not have an equivalent on other
>>> architectures, and I'm not sure inventing new ones is a good idea :) I
>>> think the latter makes them pretty much a no-go for llvm-mca as I don't
>>> think we'll want to teach each target how to parse code regions. That's
>>> much better handled in a target-agnostic way by the object. Intel got away
>>> with them because they only had to support one architecture.
>>>
>>> tl;dr: In the case of llvm-mca, I like your design better than the markers.
>>>
>>> In terms of future-proofness of only allowing regions within a basic
>>> block, are we confident we can actually ever simulate branches apart from
>>> "always taken, perfectly predicted" loop? Even this simple need requires
>>> knowing quite a few details on the frontend. The current design could
>>> handle this use case with the addition of an external "loop mode" option to
>>> MCA. If there are no other strong use cases, I would advocate for
>>> experimental intrinsics unless people can contribute other example use
>>> cases.
>>>
>>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <[hidden email]> wrote:
>>>
>>>> Hi Andrea,
>>>>
>>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
>>>>> So, I have been thinking a bit more about this whole design.
>>>>>
>>>>> The more I think about your suggested design, the more I am convinced
>>>> that
>>>>> we should do something more to support ranges in binary object files
>>>> too.
>>>>> My understanding is that the reason why we don't support object files in
>>>>> general, is because of the presence of relocations. That is because a
>>>>> region start marker is effectively symbol relative, and the symbol (a
>>>>> function) would be relocated in the final executable.
>>>>> You mentioned to me that resolving even a 'simple' symbol-relative
>>>>> relocation is not trivial, because it requires specific knowledge about
>>>> the
>>>>> binary format, and the target (i.e. how relocations are encoded is
>>>> target
>>>>> specific). I am surprised that there is not a utility library for
>>>> resolving
>>>>> relocations.. but I am not familiar with that part of the compiler. I
>>>> was
>>>>> hoping that there was a target specific interface to use in this case...
>>>> There might be a better way of resolving the relocs, but from what I saw
>>>> looking at llvm-objdump and other related tools, it seems that resolving
>>>> the relocated symbol is a target specific effort.  I also spent some time
>>>> sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
>>>> performs the reloc resolution.  I should clarify that I too am not an
>>>> expert in llvm's utilities for performing symbol/reloc resolution, and
>>>> perhaps someone in the community can point me in the right direction.  I
>>>> can clearly see the reloc data in the object file via tools like
>>>> objdump; however, accessing the relocs via
>>>> llvm::object::ObjectFile::relocations() did not produce address values
>>>> that we could use (values of zero).
>>>>
>>>> I was hoping that, for a first pass at this patch, supporting just
>>>> executables would be okay.  That keeps this initial patch set simple,
>>>> and hopefully will encourage others to take a peek at it, since it's
>>>> less daunting than what it might otherwise be.  Of course, there is the
>>>> concern that this initial patch will lock us into a design that will be
>>>> more complicated to unravel later.
>>>>
>>>>> An alternative approach would require that you define your own
>>>>> "symbol-relative" reference. After all, ranges are just a sequences of
>>>>> instructions in a function. If a function symbol is described by the
>>>> symbol
>>>>> table, then you should be able to obtain its offset in the .text
>>>> section.
>>>>> So, you could potentially encode your own symbol+offset. However, the
>>>>> linker would not be able to understand your "custom relocation", and
>>>>> information about regions in the final elf would be basically broken.
>>>>> So, that would not be a solution...
>>>>>
>>>>> I don't know honestly what is the best approach to use in this case.
>>>>> As a compromise, it would not be a bad idea to add the ability to
>>>> specify
>>>>> ranges from command line. What do you think?
>>>>> Still, from a user point of view, the idea that we don't support object
>>>>> files in general sounds like a big limitation.
>>>> I agree, only supporting executables is a limitation.  However, I'd
>>>> like to land the base support now and add in the additional
>>>> features/support after this large patch set lands.  But I can see
>>>> where landing the whole thing entirely also makes sense.
>>>>
>>>>> About the new experimental intrinsics: those would definitely work well
>>>> for
>>>>> the simple case where instructions are from the same basic block.
>>>>> However, some/most of the constraints that you plan to add will have to
>>>>> change if in future we decide to allow ranges that potentially cross
>>>>> multiple basic blocks. How will the rules/constraints on those new
>>>>> intrinsics change? I just want to make sure that the suggested design is
>>>>> future-proof.
>>>> Since the llvm/clang parts of the code are just responsible for
>>>> collecting where a range starts/ends, I hope that we can remove some
>>>> of the baked-in constraints that are specified in IR/Verifier.cpp.
>>>> As you pointed out earlier in this thread, we might want to
>>>> introduce a dominance check if/when we lift the one-basic-block
>>>> restriction.
>>>>
>>>> -Matt
>>>>
>>>>> -Andrea
>>>>>
>>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <
>>>> [hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Thanks for clarifying it Matt.
>>>>>>
>>>>>> In general, I quite like your suggested design.
>>>>>>
>>>>>> My only concern is about the semantics of the two new intrinsics. Your
>>>>>> design doesn't allow mca ranges to span through multiple basic
>>>> blocks. That
>>>>>> constraint is acceptable for now, since llvm-mca doesn't know how to
>>>> deal
>>>>>> with control flow.
>>>>>> However, I am a bit concerned about what might happen in future if we
>>>>>> decide to let users specify code regions that span through multiple
>>>> basic
>>>>>> blocks. Basically, I don't particularly like the idea of changing the
>>>>>> semantics of an already existing intrinsic. A design that already
>>>> accounts for
>>>>>> that particular scenario/future work would be ideal. That being said,
>>>>>> marking those new intrinsics as 'experimental' may be a good
>>>> compromise (at
>>>>>> least for now).
>>>>>>
>>>>>> So, I am quite happy overall with the direction of this RFC.
>>>>>> However, I am interested to hear from other developers about your
>>>>>> suggested design.
>>>>>>
>>>>>>> This initial patch only targets ELF object files, and does not
>>>> handle
>>>>>> relocatable addresses. Since the start of a code region is
>>>> represented as
>>>>>> an
>>>>>> assembly label, and referenced in the .mca_code_regions section, that
>>>>>> address
>>>>>> is relocatable.
>>>>>>
>>>>>> This may be okay for now. However, it would be nice to remove that
>>>>>> constraint in future and add support to generic object files.
>>>>>>
>>>>>> -Andrea
>>>>>>
>>>>>> On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
>>>>>>
>>>>>>> I want to clarify a few restrictions of llvm-mca code regions that
>>>> this
>>>>>>> RFC proposes:
>>>>>>>
>>>>>>> 1) All llvm-mca code regions must start with an
>>>>>>> llvm.mca.code.region.start intrinsic and end with
>>>>>>> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the
>>>> IR
>>>>>>> level in the IR verifier.
>>>>>>>
>>>>>>> 2) llvm-mca code regions cannot nest.  This restriction implies that
>>>> an
>>>>>>> llvm.mca.code.region.start
>>>>>>> must have a llvm.mca.code.region.end intrinsic without any other
>>>> llvm.mca
>>>>>>> start intrinsics
>>>>>>> between the two. The current implementation in the patch enforces
>>>> this
>>>>>>> restriction at the
>>>>>>> IR level via the IR Verifier.
>>>>>>>
>>>>>>> 3) An llvm-mca code region cannot span multiple basic blocks.
>>>> llvm-mca
>>>>>>> does not follow
>>>>>>> branches (yet).  Instead, a branch instruction is treated by llvm-mca
>>>>>>> like any other instruction.
>>>>>>> The current patch associated with this RFC does not enforce this
>>>>>>> restriction.  I plan on updating
>>>>>>> the patch to enforce that a code region can only belong to a single
>>>> basic
>>>>>>> block.  This is a simple
>>>>>>> check, ensuring that both the llvm.mca.code.region.start and
>>>> accompanying
>>>>>>> end intrinsics live
>>>>>>> in the same basic block. I imagine adding this check at the IR level
>>>> when
>>>>>>> we also verify points 1 and 2
>>>>>>> above.  That will keep the code-region verification logic isolated
>>>> to the
>>>>>>> IR verifier.  The start/end
>>>>>>> intrinsics should not have any uses, so I'm not sure that they would
>>>> be
>>>>>>> moved/sunk on behalf
>>>>>>> of any other instruction.  In other words, I do not imagine that a
>>>> start
>>>>>>> and end would be split
>>>>>>> apart due to later MI optimizations.  If I discover that such a case
>>>>>>> occurs, then I might add the
>>>>>>> basic-block check prior to emitting the code region data to the
>>>> object
>>>>>>> file.  Once llvm-mca is
>>>>>>> updated to handle branches, then we can remove this constraint.
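The three restrictions listed above can be sketched as a single pass over the start/end markers in program order. This is not the check in IR/Verifier.cpp; it is a simplified model in which each marker only records its kind and the index of the basic block it sits in, and all names (mca_marker, check_markers) are hypothetical.

```c
#include <stddef.h>

typedef enum { MCA_START, MCA_END } mca_marker_kind;

typedef struct {
    mca_marker_kind kind;
    int block; /* index of the basic block holding the marker */
} mca_marker;

/* Returns 0 when the marker sequence satisfies all three rules, otherwise
   the number (1, 2 or 3) of the first rule found to be violated:
     1: every region has both a start and an end,
     2: regions do not nest,
     3: a region stays within one basic block. */
static int check_markers(const mca_marker *m, size_t n) {
    int open = 0;        /* is a region currently open? */
    int open_block = -1; /* block of the currently open start marker */
    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MCA_START) {
            if (open)
                return 2; /* a start inside an open region */
            open = 1;
            open_block = m[i].block;
        } else {
            if (!open)
                return 1; /* an end without a matching start */
            if (m[i].block != open_block)
                return 3; /* start and end in different blocks */
            open = 0;
        }
    }
    return open ? 1 : 0; /* a start that is never closed */
}
```

If the single-block restriction is lifted later, only the rule 3 comparison would need to change in this model, presumably to something like the dominance check discussed earlier in the thread.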
>>>>>>>
>>>>>>> -Matt
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: llvm-dev <[hidden email]> On Behalf Of Matt
>>>>>>> Davis via llvm-
>>>>>>>> dev
>>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
>>>>>>>> To: Andrea Di Biagio <[hidden email]>
>>>>>>>> Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
>>>>>>>> <[hidden email]>; [hidden email]
>>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
>>>>>>> llvm-mca.
>>>>>>>> Hi Andrea,
>>>>>>>>
>>>>>>>> Thanks for your input.
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
>>>>>>>> [... snip ...]
>>>>>>>>> About the suggested design:
>>>>>>>>> I like the idea of being able to identify code regions using a
>>>> numeric
>>>>>>>>> identifier.
>>>>>>>>> However, what happens if a code region spans through multiple
>>>> basic
>>>>>>> blocks?
>>>>>>>> The current patch does not take into consideration cases where the
>>>>>>>> region start and end intrinsics are placed in different basic
>>>> blocks.
>>>>>>>> Such would be the case if a region is defined to span multiple
>>>> blocks.
>>>>>>>> This would be similar to the current case where a user places a
>>>>>>>> #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END
>>>> in
>>>>>>>> another.  However, as you point out below, if the user does this
>>>> in the
>>>>>>>> source code via intrinsics (just what this patch is proposing),
>>>> then
>>>>>>>> there is a chance that optimizations might change the layout of the
>>>>>>>> instructions and confuse the ordering of the MCA intrinsics.
>>>>>>>>
>>>>>>>> Since MCA does not follow branches (MCA just treats a branch as it
>>>> would
>>>>>>>> a non-branching instruction), it seems that a user should be aware
>>>> that
>>>>>>>> defining MCA code regions that span multiple blocks might result
>>>> in an
>>>>>>>> unexpected analysis.  While we do not discourage this, it seems
>>>> like
>>>>>>>> such a case will probably not produce an expected result for the
>>>> user.
>>>>>>>> We could introduce a warning, or automatically divide the regions
>>>> so
>>>>>>>> that a single region can only contain a single block.
>>>>>>>>
>>>>>>>>> My understanding is that code regions are not allowed to
>>>> overlap. So,
>>>>>>> it
>>>>>>>>> makes sense if ` __mca_code_region_end()` doesn't take an ID as
>>>> input.
>>>>>>>>> However, what if ` __mca_code_region_end()` ends in a different
>>>> basic
>>>>>>> block?
>>>>>>>>> `__mca_code_region_start()` has to always dominate `
>>>>>>>>> __mca_code_region_end()`. This is trivial to verify when both
>>>> calls
>>>>>>> are in
>>>>>>>>> a same basic block; however, we need to make sure that the
>>>>>>> relationship is
>>>>>>>>> still the same when the `end()` call is in a different basic
>>>> block.
>>>>>>>>> That would not be enough. I think we should also verify  that `
>>>>>>>>> __mca_code_region_end()` always post-dominates the call to
>>>>>>>>> `__mca_code_region_start()`.
>>>>>>>> In any case this patch should probably check dominance of the
>>>>>>>> intrinsics, even though MCA does not follow branches and MCA does
>>>>>>>> not explicitly forbid a region from containing multiple blocks.
>>>>>>>>
>>>>>>>>> My question is: what happens with basic block reordering? We
>>>> don't
>>>>>>> know the
>>>>>>>>> layout of basic blocks until we reach code emission. How does it
>>>> work
>>>>>>> for
>>>>>>>>> regions that span through multiple basic blocks? I think your
>>>> RFC
>>>>>>> should
>>>>>>>>> clarify this aspect.
>>>>>>>>>
>>>>>>>>> As a side note: at the moment, llvm-mca doesn't know how to deal
>>>> with
>>>>>>>>> branches. So, for simplicity we could force code regions to only
>>>>>>> contain
>>>>>>>>> instructions from a single basic block.
>>>>>>>>>
>>>>>>>>> However, in future we may want to teach llvm-mca how to analyze
>>>>>>> branchy
>>>>>>>>> code too. For example, we could introduce a simple control-flow
>>>>>>> analysis in
>>>>>>>>> llvm-mca, and use an external "branch trace" information (for
>>>>>>> example, a
>>>>>>>>> perf trace generated by an external tool) to decorate branches with
>>>>>>>>> branch probabilities (similarly to what we currently do in LLVM
>>>> with
>>>>>>> PGO).
>>>>>>>>> We could then use that knowledge to model branch prediction and
>>>>>>> simulate
>>>>>>>>> what happens in the presence of multiple branches.
>>>>>>>>>
>>>>>>>>> So, the idea of having regions that potentially span multiple
>>>> basic
>>>>>>> blocks
>>>>>>>>> is not bad in general. However, I think you should better clarify
>>>>>>> what are
>>>>>>>>> the constraints (at least, you should answer to my questions from
>>>>>>> before).
>>>>>>>> I agree! Thanks for pointing that out.
>>>>>>>>
>>>>>>>>> If we decide to use those new intrinsics, then those should be
>>>>>>> experimental
>>>>>>>>> (at least to start).
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>> -Matt
>>>>>>>>
>>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
>>>>>>> [hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Introduction
>>>>>>>>>> -----------------
>>>>>>>>>> Currently llvm-mca only accepts assembly code as input. We
>>>> would
>>>>>>> like to
>>>>>>>>>> extend llvm-mca to support object files, allowing users to
>>>> analyze
>>>>>>> the
>>>>>>>>>> performance of binaries. The proposed changes (which involve
>>>> both
>>>>>>>>>> clang and llvm) optionally introduce an object file section,
>>>> but
>>>>>>> this can
>>>>>>>>>> be
>>>>>>>>>> stripped-out if desired.
>>>>>>>>>>
>>>>>>>>>> For the llvm-mca binary support feature to be useful, a user
>>>> needs
>>>>>>> to tell
>>>>>>>>>> llvm-mca which portions of their code they would like analyzed.
>>>>>>> Currently,
>>>>>>>>>> this is accomplished via assembly comments. However, assembly
>>>>>>> comments are
>>>>>>>>>> not
>>>>>>>>>> preserved in object files, and this has encouraged this RFC.
>>>> For the
>>>>>>>>>> proposed
>>>>>>>>>> binary support, we need to introduce changes to clang and llvm
>>>> to
>>>>>>> allow the
>>>>>>>>>> user's object code to be recognized by llvm-mca:
>>>>>>>>>>
>>>>>>>>>> * We need a way for a user to identify a region/block of code
>>>> they
>>>>>>> want
>>>>>>>>>>     analyzed by llvm-mca.
>>>>>>>>>> * We need the information defining the user's region of code
>>>> to be
>>>>>>>>>> maintained
>>>>>>>>>>     in the object file so that llvm-mca can analyze the desired
>>>>>>> region(s)
>>>>>>>>>> from the
>>>>>>>>>>     object file.
>>>>>>>>>>
>>>>>>>>>> We define a "code region" as a subset of a user's program that
>>>> is
>>>>>>> to be
>>>>>>>>>> analyzed via llvm-mca. The sequence of instructions to be
>>>> analyzed
>>>>>>> is
>>>>>>>>>> represented as a pair: <start, end> where the 'start' marks the
>>>>>>> beginning
>>>>>>>>>> of
>>>>>>>>>> the user's source code and 'end' terminates the sequence. The
>>>>>>> instructions
>>>>>>>>>> between 'start' and 'end' form the region that can be analyzed
>>>> by
>>>>>>> llvm-mca
>>>>>>>>>> at a
>>>>>>>>>> later time.
>>>>>>>>>>
>>>>>>>>>> Example
>>>>>>>>>> -----------
>>>>>>>>>> Before we go into the details of this proposed change, let's
>>>> first
>>>>>>> look at
>>>>>>>>>> a
>>>>>>>>>> simple example:
>>>>>>>>>>
>>>>>>>>>> // example.c -- Analyze a dot-product expression.
>>>>>>>>>> double test(double x, double y) {
>>>>>>>>>>     double result = 0.0;
>>>>>>>>>>     __mca_code_region_start(42);
>>>>>>>>>>     result += x * y;
>>>>>>>>>>     __mca_code_region_end();
>>>>>>>>>>     return result;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> In the example above, we have identified a code region, in this
>>>>>>> case a
>>>>>>>>>> single
>>>>>>>>>> dot-product expression. For the sake of brevity and simplicity,
>>>>>>> we've
>>>>>>>>>> chosen
>>>>>>>>>> a very simple example, but in reality a more complicated
>>>> example
>>>>>>> could use
>>>>>>>>>> multiple expressions. We have also denoted this region as
>>>> number
>>>>>>> 42. That
>>>>>>>>>> identifier is only for the user, and simplifies reading an
>>>> llvm-mca
>>>>>>>>>> analysis
>>>>>>>>>> report later.
>>>>>>>>>>
>>>>>>>>>> When this code is compiled, the region markers (the
>>>> mca_code_region
>>>>>>>>>> markers)
>>>>>>>>>> are transformed into assembly labels. While the markers are
>>>>>>> presented as
>>>>>>>>>> function calls, in reality they are no-ops.
>>>>>>>>>>
>>>>>>>>>> test:
>>>>>>>>>> pushq   %rbp
>>>>>>>>>> movq    %rsp, %rbp
>>>>>>>>>> movsd   %xmm0, -8(%rbp)
>>>>>>>>>> movsd   %xmm1, -16(%rbp)
>>>>>>>>>> .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
>>>>>>>>>> xorps   %xmm0, %xmm0
>>>>>>>>>> movsd   %xmm0, -24(%rbp)
>>>>>>>>>> movsd   -8(%rbp), %xmm0
>>>>>>>>>> mulsd   -16(%rbp), %xmm0
>>>>>>>>>> addsd   -24(%rbp), %xmm0
>>>>>>>>>> movsd   %xmm0, -24(%rbp)
>>>>>>>>>> .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
>>>>>>>>>> movsd   -24(%rbp), %xmm0
>>>>>>>>>> popq    %rbp
>>>>>>>>>> retq
>>>>>>>>>> .section        .mca_code_regions,"",@progbits
>>>>>>>>>> .quad   42
>>>>>>>>>> .quad   .Lmca_code_region_start_0
>>>>>>>>>> .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
>>>>>>>>>>
>>>>>>>>>> The assembly has been trimmed to show the portions relevant to
>>>> this
>>>>>>> RFC.
>>>>>>>>>> Notice the labels enclose the user's defined region, and that
>>>> they
>>>>>>>>>> preserve the
>>>>>>>>>> user's arbitrary region identifier, the ever-so-important
>>>> region 42.
>>>>>>>>>> In the object file section .mca_code_regions, we have noted the
>>>>>>> user's
>>>>>>>>>> region
>>>>>>>>>> identifier (.quad 42), start address, and region size. A more
>>>>>>> complicated
>>>>>>>>>> example can have multiple regions defined within a single
>>>>>>> .mca_code_regions
>>>>>>>>>> section. This section can be read by llvm-mca, allowing
>>>> llvm-mca to
>>>>>>> take
>>>>>>>>>> object files as input instead of assembly source.
>>>>>>>>>>
>>>>>>>>>> Details
>>>>>>>>>> ---------
>>>>>>>>>> We need a way for a user to identify a region/block of code
>>>> they
>>>>>>> want
>>>>>>>>>> analyzed
>>>>>>>>>> by llvm-mca. We solve this problem by introducing two
>>>> intrinsics
>>>>>>> that a
>>>>>>>>>> user can
>>>>>>>>>> specify, for identifying regions of code for analysis.
>>>>>>>>>>
>>>>>>>>>> The two intrinsics are: llvm.mca.code.regions.start and
>>>>>>>>>> llvm.mca.code.regions.end. A user can identify a code region by
>>>>>>> inserting
>>>>>>>>>> the
>>>>>>>>>> mca_code_region_start and mca_code_region_end markers. These
>>>> are
>>>>>>> simply
>>>>>>>>>> clang builtins and are transformed into the aforementioned
>>>>>>> intrinsics
>>>>>>>>>> during
>>>>>>>>>> compilation. The code between the intrinsics are what we call
>>>> "code
>>>>>>>>>> regions"
>>>>>>>>>> and are to be easily identifiable by llvm-mca; any code
>>>> between a
>>>>>>> start/end
>>>>>>>>>> pair can be analyzed by llvm-mca at a later time. A user can
>>>> define
>>>>>>>>>> multiple
>>>>>>>>>> non-overlapping code regions within their program.
>>>>>>>>>>
>>>>>>>>>> The llvm.mca.code.region.start intrinsic takes an integer
>>>> constant
>>>>>>> as its
>>>>>>>>>> only
>>>>>>>>>> argument. This argument is implemented as a metadata i32, and
>>>> is
>>>>>>> only used
>>>>>>>>>> when generating llvm-mca reports. This value allows a user to
>>>> more
>>>>>>> easily
>>>>>>>>>> identify a specific code region. llvm.mca.code.region.end
>>>> takes no
>>>>>>>>>> arguments.
>>>>>>>>>> Since we disallow nesting of regions, the first 'end' intrinsic
>>>>>>> lexically
>>>>>>>>>> following a 'start' intrinsic represents the end of that code
>>>>>>> region.
>>>>>>>>>> Now that we have a solution for identifying regions for
>>>> analysis,
>>>>>>> we now
>>>>>>>>>> need a
>>>>>>>>>> way for preserving that information to be read at a later
>>>> time. To
>>>>>>>>>> accomplish
>>>>>>>>>> this we propose adding a new section (.mca_code_regions) to the
>>>>>>> object file
>>>>>>>>>> generated by llvm. During code generation, the start/end
>>>> intrinsics
>>>>>>>>>> described
>>>>>>>>>> above will be transformed into start/end labels in assembly.
>>>> When
>>>>>>> llvm
>>>>>>>>>> generates the object file from the user's code, these start/end
>>>>>>> labels
>>>>>>>>>> form a
>>>>>>>>>> pair of values identifying the start of the user's code
>>>> region, and
>>>>>>> size.
>>>>>>>>>> The
>>>>>>>>>> size represents the number of bytes between the start and end
>>>>>>> address of
>>>>>>>>>> the
>>>>>>>>>> labels. Note that the labels are emitted during assembly
>>>> printing.
>>>>>>> We hope
>>>>>>>>>> that these labels have no influence on code generation or
>>>>>>> basic-block
>>>>>>>>>> placement. However, the target assembler strategy for handling
>>>>>>> labels is
>>>>>>>>>> outside of our control.
>>>>>>>>>>
>>>>>>>>>> This proposed change affects the size of a binary, but only if
>>>> the
>>>>>>> user
>>>>>>>>>> calls
>>>>>>>>>> the start/end builtins mentioned above. The additional size of
>>>> the
>>>>>>>>>> .mca_code_regions section, which we imagine to be very small
>>>> (to
>>>>>>> the order
>>>>>>>>>> of a
>>>>>>>>>> few bytes), can trivially be stripped by tools like 'strip' or
>>>>>>> 'objcopy'.
>>>>>>>>>> Implementation Status
>>>>>>>>>> ------------------------------
>>>>>>>>>> We currently have the proposed changes implemented at the url
>>>>>>> posted below.
>>>>>>>>>> This initial patch only targets ELF object files, and does not
>>>>>>> handle
>>>>>>>>>> relocatable addresses. Since the start of a code region is
>>>>>>> represented as
>>>>>>>>>> an
>>>>>>>>>> assembly label, and referenced in the .mca_code_regions
>>>> section,
>>>>>>> that
>>>>>>>>>> address
>>>>>>>>>> is relocatable. That value can be represented as
>>>> section-relative
>>>>>>>>>> relocatable
>>>>>>>>>> symbol (.text + addend), but we are not handling that case yet.
>>>>>>> Instead,
>>>>>>>>>> the
>>>>>>>>>> proposed changes only handle linked/executable object files.
>>>>>>>>>>
>>>>>>>>>> For purposes of review and to communicate the idea, the change
>>>> is
>>>>>>>>>> presented as a monolithic patch here:
>>>>>>>>>>
>>>>>>>>>> https://reviews.llvm.org/D54603
>>>>>>>>>>
>>>>>>>>>> The change is presented as a monolithic patch; however, if
>>>> accepted
>>>>>>>>>> the patch will be split into three smaller patches:
>>>>>>>>>> 1. The introduction of the builtins to clang.
>>>>>>>>>> 2. The llvm portion (the added intrinsics).
>>>>>>>>>> 3. The llvm-mca portion.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> -Matt
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Alberto Barbaro via llvm-dev
Thanks for the response, Simon. My reply is inline:

> From: Simon Pilgrim <[hidden email]>
> Sent: Monday, December 10, 2018 1:40 PM
>
> Hi Matt,
>
> I can see a near future where perf-analysis tooling uses branch history
> profiler captures to determine how often loops/branches are taken and
> feeds that into llvm-mca, especially for hot/branchy loop analysis
> reports etc. Are you confident that your approach will be easily
> extendable for this?

That is a very interesting use case. Restricting a code region to a single basic block
is a limitation for any tool that wants to analyze branches. However, I believe
it will be easy to lift this restriction (it's just a check in IR/Verifier). The
llvm-mca driver itself does not impose this limitation.
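
As an illustration only, the single-block and no-nesting rules amount to a check along the following lines. This is a minimal sketch and not the code from the patch or from IR/Verifier.cpp: Marker, Block and checkRegions are hypothetical names, and a function is modeled simply as a list of blocks whose instructions are reduced to "start", "end" or "other".

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an IR instruction: only the
// region markers matter for this check.
enum class Marker { Other, RegionStart, RegionEnd };

// A basic block is modeled as a plain sequence of markers.
using Block = std::vector<Marker>;

// Returns an empty string if every region starts and ends inside one
// block without nesting; otherwise describes the first violation found.
std::string checkRegions(const std::vector<Block> &Function) {
  for (const Block &BB : Function) {
    bool Open = false; // true while we are between a start and its end
    for (Marker M : BB) {
      if (M == Marker::RegionStart) {
        if (Open)
          return "nested region start";
        Open = true;
      } else if (M == Marker::RegionEnd) {
        if (!Open)
          return "region end without a start in the same block";
        Open = false;
      }
    }
    // A region still open at the end of the block would have to span
    // into another block, which the current restriction forbids.
    if (Open)
      return "region start without an end in the same block";
  }
  return "";
}
```

Lifting the single-block restriction would mean dropping the end-of-block test and replacing it with something like the dominance check discussed earlier in the thread.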

If the information is coming from a profile report, then we'd most
likely need to extend the llvm-mca driver to accept profile reports. Currently,
code regions, from the perspective of the llvm-mca driver, are very simple: they are
just a collection of MCInst. The binary support in this RFC+patch
disassembles only the address range that begins at the marker's start address and
spans the specified number of bytes. It might be useful to add another driver argument
so that a user (or tool) can specify, from the command line, a range of instructions to
analyze. I recently added a class for handling inputs, llvm::mca::CodeRegionGenerator,
which is just responsible for taking some input and creating a list of MCInst that llvm-mca uses.
We could subclass this to handle profile reports.
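
For reference, the address ranges mentioned above come from the .mca_code_regions section, which in the RFC example holds three .quad values per region: the user's ID, the start address and the size in bytes. The following is a rough sketch of decoding such records, not the code from the patch. CodeRegionRecord, readLE64 and parseRegions are hypothetical names, and the sketch assumes a little-endian target and a linked executable (so the start address is already resolved).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the section as shown in the RFC example:
//   .quad <id>, .quad <start address>, .quad <size in bytes>
struct CodeRegionRecord {
  uint64_t Id;
  uint64_t Start;
  uint64_t Size;
};

// Reads an 8-byte little-endian value (assumption: little-endian target).
static uint64_t readLE64(const uint8_t *P) {
  uint64_t V = 0;
  for (int I = 7; I >= 0; --I)
    V = (V << 8) | P[I];
  return V;
}

// Decodes the raw section contents into records. Trailing bytes that do
// not form a complete 24-byte record are ignored.
std::vector<CodeRegionRecord>
parseRegions(const std::vector<uint8_t> &Section) {
  constexpr std::size_t RecordSize = 3 * sizeof(uint64_t);
  std::vector<CodeRegionRecord> Records;
  for (std::size_t Off = 0; Off + RecordSize <= Section.size();
       Off += RecordSize) {
    const uint8_t *P = Section.data() + Off;
    Records.push_back({readLE64(P), readLE64(P + 8), readLE64(P + 16)});
  }
  return Records;
}
```

Each record then describes the byte range [Start, Start + Size) to disassemble into the list of MCInst for that region.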
 
> Similarly, being able to generally embed the profile markers in object
> libraries for reuse is going to be important for some people - I'd like
> to see more of a plan of how this will be achieved. I understand that it
> might not be easy for some exe formats.

That is definitely a limitation. This initial patch+RFC only handles linked
executables (i.e., the llvm-mca marker symbol addresses are resolved).
I'm working on a better solution so that this will not be a restriction.
In fact, I'll probably delay trying to land any patches until I solve relocations
(or use a different solution for identifying start/end addresses for llvm-mca code regions).

> Sorry if I'm being too critical, but I'm a bit worried that we end up
> with an initial implementation that will take a lot of reworking to meet
> our final aims.
>
> Thanks, Simon.

I understand your criticisms and value your input. Thanks a ton!

-Matt

 

> On 10/12/2018 19:32, Matt Davis wrote:
> > Thanks for the feedback Guillaume and Clement!
> >
> > In response to Clement:
> >
> >>> In terms of future-proofness of only allowing regions within a basic
> >>> block, are we confident we can actually ever simulate branches apart from
> >>> "always taken, perfectly predicted" loop? Even this simple need requires
> >>> knowing quite a few details on the frontend. The current design could
> >>> handle this use case with the addition of an external "loop mode" option to
> >>> MCA. If there are no other strong use cases, I would advocate for
> >>> experimental intrinsics unless people can contribute other example use
> >>> cases.
> > In short, I am in agreement and think that handling of branching or loop
> > constructs should be isolated to the llvm-mca driver/front-end.  The
> > only thing the code regions should be concerned with is identifying
> > blocks of instructions that will later be used by the front end.
> >
> > We can place limitations on how those blocks are formed. For example, the
> > current implementation forces regions to be isolated to a single basic
> > block.  However, we anticipate lifting this restriction once branching
> > is handled.
> >
> > -Matt
> >
> >
> > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
> >> +1 to what Clement said.
> >> I believe the intrinsics are a better design to support many architectures.
> >>
> >> IACA users are probably decorating their code with IACA_START / IACA_END
> >> macros. One possibility is to provide a header that defines these macros in
> >> terms of the new intrinsics.
> >>
> >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <[hidden email]>
> wrote:
> >>
> >>> Hi Matt/Andrea,
> >>>
> >>> I see pros and cons for IACA-style markers vs intrinsics.
> >>> On the one hand, IACA-style markers are very magical, and not very visible
> >>> in both the source and object code. Using IACA-style markers has the
> >>> advantage that you can use llvm-mca as a drop-in replacement for IACA, or
> >>> even to compare their outputs on the exact same binary. They also do not
> >>> require tooling on the compiler side and allow comparing the output of
> >>> several compilers.
> >>>
> >>> On the other hand, IACA-style markers do not have an equivalent on other
> >>> architectures, and I'm not sure inventing new ones is a good idea :) I
> >>> think the latter makes them pretty much a no-go for llvm-mca as I don't
> >>> think we'll want to teach each target how to parse code regions. That's
> >>> much better handled in a target-agnostic way by the object. Intel got away
> >>> with them because they only had to support one architecture.
> >>>
> >>> tl;dr: In the case of llvm-mca, I like your design better than the markers.
> >>>
> >>> In terms of future-proofness of only allowing regions within a basic
> >>> block, are we confident we can actually ever simulate branches apart from
> >>> "always taken, perfectly predicted" loop? Even this simple need requires
> >>> knowing quite a few details on the frontend. The current design could
> >>> handle this use case with the addition of an external "loop mode" option to
> >>> MCA. If there are no other strong use cases, I would advocate for
> >>> experimental intrinsics unless people can contribute other example use
> >>> cases.
> >>>
> >>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <[hidden email]>
> wrote:
> >>>
> >>>> Hi Andrea,
> >>>>
> >>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> >>>>> So, I have been thinking a bit more about this whole design.
> >>>>>
> >>>>> The more I think about your suggested design, the more I am convinced
> >>>> that
> >>>>> we should do something more to support ranges in binary object files
> >>>> too.
> >>>>> My understanding is that the reason why we don't support object files in
> >>>>> general, is because of the presence of relocations. That is because a
> >>>>> region start marker is effectively symbol relative, and the symbol (a
> >>>>> function) would be relocated in the final executable.
> >>>>> You mentioned to me that resolving even a 'simple' symbol-relative
> >>>>> relocation is not trivial, because it requires specific knowledge about
> >>>> the
> >>>>> binary format, and the target (i.e. how relocations are encoded is
> >>>> target
> >>>>> specific). I am surprised that there is not a utility library for
> >>>> resolving
> >>>>> relocations.. but I am not familiar with that part of the compiler. I
> >>>> was
> >>>>> hoping that there was a target specific interface to use in this case...
> >>>> There might be a better way of resolving the relocs, but from what I saw
> >>>> looking at llvm-objdump and other related tools, it seems that resolving
> >>>> the relocated symbol is a target specific effort.  I also spent some time
> >>>> sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
> >>>> performs the reloc resolution.  I should clarify that I too am not an
> >>>> expert in llvm's utilities for performing symbol/reloc resolution, and
> >>>> perhaps someone in the community can point me in the right direction.  I
> >>>> can clearly see the reloc data in the object file via tools like
> >>>> objdump; however, accessing the relocs via
> >>>> llvm::object::ObjectFile::relocations() did not produce address values
> >>>> that we could use (values of zero).
> >>>>
> >>>> I was hoping that, for a first pass at this patch, supporting just
> >>>> executables would be okay.  That keeps this initial patch set simple,
> >>>> and hopefully will encourage others to take a peek at it, since it's
> >>>> less daunting than what it might otherwise be.  Of course, there is the
> >>>> concern that this initial patch will lock us into a design that will be
> >>>> more complicated to unravel later.
> >>>>
> >>>>> An alternative approach would require that you define your own
> >>>>> "symbol-relative" reference. After all, ranges are just sequences of
> >>>>> instructions in a function. If a function symbol is described by the
> >>>> symbol
> >>>>> table, then you should be able to obtain its offset in the .text
> >>>> section.
> >>>>> So, you could potentially encode your own symbol+offset. However, the
> >>>>> linker would not be able to understand your "custom relocation", and
> >>>>> information about regions in the final elf would be basically broken.
> >>>>> So, that would not be a solution...
> >>>>>
> >>>>> I don't know honestly what is the best approach to use in this case.
> >>>>> As a compromise, it would not be a bad idea to add the ability to
> >>>> specify
> >>>>> ranges from command line. What do you think?
> >>>>> Still, from a user point of view, the idea that we don't support object
> >>>>> files in general sounds like a big limitation.
> >>>> I agree, only supporting executables is a limitation.  However, I'd
> >>>> like to land the base support now and add in the additional
> >>>> features/support after this large patch set lands.  But I can see
> >>>> where landing the whole thing entirely also makes sense.
> >>>>
> >>>>> About the new experimental intrinsics: those would definitely work well
> >>>> for
> >>>>> the simple case where instructions are from the same basic block.
> >>>>> However, some/most of the constraints that you plan to add will have to
> >>>>> change if in future we decide to allow ranges that potentially cross
> >>>>> multiple basic blocks. How will the rules/constraints on those new
> >>>>> intrinsics change? I just want to make sure that the suggested design is
> >>>>> future-proof.
> >>>> Since the llvm/clang parts of the code are just responsible for
> >>>> collecting where a range starts/ends, I hope that we can remove some
> >>>> of the baked-in constraints that are specified in IR/Verifier.cpp.
> >>>> As you pointed out earlier in this thread, we might want to
> >>>> introduce a dominance check if/when we lift the one-basic-block
> >>>> restriction.
> >>>>
> >>>> -Matt
> >>>>
> >>>>> -Andrea
> >>>>>
> >>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <
> >>>> [hidden email]>
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks for clarifying it Matt.
> >>>>>>
> >>>>>> In general, I quite like your suggested design.
> >>>>>>
> >>>>>> My only concern is about the semantics of the two new intrinsics. Your
> >>>>>> design doesn't allow mca ranges to span through multiple basic
> >>>> blocks. That
> >>>>>> constraint is acceptable for now, since llvm-mca doesn't know how to
> >>>> deal
> >>>>>> with control flow.
> >>>>>> However, I am a bit concerned about what might happen in future if we
> >>>>>> decide to let users specify code regions that span through multiple
> >>>> basic
> >>>>>> blocks. Basically, I don't particularly like the idea of changing the
> >>>>>> semantic of already existing intrinsic. A design that already
> >>>> accounts for
> >>>>>> that particular scenario/future work would be ideal. That being said,
> >>>>>> marking those new intrinsics as 'experimental' may be a good
> >>>> compromise (at
> >>>>>> least for now).
> >>>>>>
> >>>>>> So, I am quite happy overall with the direction of this RFC.
> >>>>>> However, I am interested to hear from other developers about your
> >>>>>> suggested design.
> >>>>>>
> >>>>>>> This initial patch only targets ELF object files, and does not
> >>>> handle
> >>>>>> relocatable addresses. Since the start of a code region is
> >>>> represented as
> >>>>>> an
> >>>>>> assembly label, and referenced in the .mca_code_regions section, that
> >>>>>> address
> >>>>>> is relocatable.
> >>>>>>
> >>>>>> This may be okay for now. However, it would be nice to remove that
> >>>>>> constraint in future and add support to generic object files.
> >>>>>>
> >>>>>> -Andrea
> >>>>>>
> >>>>>> On Thu, Nov 22, 2018 at 7:21 PM <[hidden email]> wrote:
> >>>>>>
> >>>>>>> I want to clarify a few restrictions of llvm-mca code regions that
> >>>> this
> >>>>>>> RFC proposes:
> >>>>>>>
> >>>>>>> 1) All llvm-mca code regions must start with an
> >>>>>>> llvm.mca.code.region.start intrinsic and end with
> >>>>>>> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the
> >>>> IR
> >>>>>>> level in the IR verifier.
> >>>>>>>
> >>>>>>> 2) llvm-mca code regions cannot nest.  This restriction implies that
> >>>> an
> >>>>>>> llvm.mca.code.region.start
> >>>>>>> must have a llvm.mca.code.region.end intrinsic without any other
> >>>> llvm.mca
> >>>>>>> start intrinsics
> >>>>>>> between the two. The current implementation in the patch enforces
> >>>> this
> >>>>>>> restriction at the
> >>>>>>> IR level via the IR Verifier.
> >>>>>>>
> >>>>>>> 3) An llvm-mca code region cannot span multiple basic blocks.
> >>>> llvm-mca
> >>>>>>> does not follow
> >>>>>>> branches (yet).  Instead, a branch instruction is treated by llvm-mca
> >>>>>>> like any other instruction.
> >>>>>>> The current patch associated with this RFC does not enforce this
> >>>>>>> restriction.  I plan on updating
> >>>>>>> the patch to enforce that a code region can only belong to a single
> >>>> basic
> >>>>>>> block.  This is a simple
> >>>>>>> check, ensuring that both the llvm.mca.code.region.start and
> >>>> accompanying
> >>>>>>> end intrinsics live
> >>>>>>> in the same basic block. I imagine adding this check at the IR level
> >>>> when
> >>>>>>> we also verify points 1 and 2
> >>>>>>> above.  That will keep the code-region verification logic isolated
> >>>> to the
> >>>>>>> IR verifier.  The start/end
> >>>>>>> intrinsics should not have any uses, so I'm not sure that they would
> >>>> be
> >>>>>>> moved/sunk on behalf
> >>>>>>> of any other instruction.  In other words, I do not imagine that a
> >>>> start
> >>>>>>> and end would be split
> >>>>>>> apart due to later MI optimizations.  If I discover that such a case
> >>>>>>> occurs, then I might add the
> >>>>>>> basic-block check prior to emitting the code region data to the
> >>>> object
> >>>>>>> file.  Once llvm-mca is
> >>>>>>> updated to handle branches, then we can remove this constraint.
> >>>>>>>
> >>>>>>> -Matt
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: llvm-dev <[hidden email]> On Behalf Of Matt
> >>>>>>> Davis via llvm-
> >>>>>>>> dev
> >>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
> >>>>>>>> To: Andrea Di Biagio <[hidden email]>
> >>>>>>>> Cc: llvm-dev <[hidden email]>; Di Biagio, Andrea
> >>>>>>>> <[hidden email]>; [hidden email]
> >>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> >>>>>>> llvm-mca.
> >>>>>>>> Hi Andrea,
> >>>>>>>>
> >>>>>>>> Thanks for your input.
> >>>>>>>>
> >>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >>>>>>>> [... snip ...]
> >>>>>>>>> About the suggested design:
> >>>>>>>>> I like the idea of being able to identify code regions using a
> >>>> numeric
> >>>>>>>>> identifier.
> >>>>>>>>> However, what happens if a code region spans through multiple
> >>>> basic
> >>>>>>> blocks?
> >>>>>>>> The current patch does not take into consideration cases where the
> >>>>>>>> region start and end intrinsics are placed in different basic
> >>>> blocks.
> >>>>>>>> Such would be the case if a region is defined to span multiple
> >>>> blocks.
> >>>>>>>> This would be similar to the current case where a user places a
> >>>>>>>> #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-
> END
> >>>> in
> >>>>>>>> another.  However, as you point out below, if the user does this
> >>>> in the
> >>>>>>>> source code via intrinsics (just what this patch is proposing),
> >>>> then
> >>>>>>>> there is a chance that optimizations might change the layout of the
> >>>>>>>> instructions and confuse the ordering of the MCA intrinsics.
> >>>>>>>>
> >>>>>>>> Since MCA does not follow branches (MCA just treats a branch as it
> >>>> would
> >>>>>>>> a non-branching instruction), it seems that a user should be aware
> >>>> that
> >>>>>>>> defining MCA code regions that span multiple blocks might result
> >>>> in an
> >>>>>>>> unexpected analysis.  While we do not discourage this, it seems
> >>>> like
> >>>>>>>> such a case will probably not produce an expected result for the
> >>>> user.
> >>>>>>>> We could introduce a warning, or automatically divide the regions
> >>>> so
> >>>>>>>> that a single region can only contain a single block.
> >>>>>>>>
> >>>>>>>>> My understanding is that code regions are not allowed to
> >>>> overlap. So,
> >>>>>>> it
> >>>>>>>>> makes sense if ` __mca_code_region_end()` doesn't take an ID as
> >>>> input.
> >>>>>>>>> However, what if ` __mca_code_region_end()` ends in a different
> >>>> basic
> >>>>>>> block?
> >>>>>>>>> `__mca_code_region_start()` has to always dominate `
> >>>>>>>>> __mca_code_region_end()`. This is trivial to verify when both
> >>>> calls
> >>>>>>> are in
> >>>>>>>>> a same basic block; however, we need to make sure that the
> >>>>>>> relationship is
> >>>>>>>>> still the same when the `end()` call is in a different basic
> >>>> block.
> >>>>>>>>> That would not be enough. I think we should also verify  that `
> >>>>>>>>> __mca_code_region_end()` always post-dominates the call to
> >>>>>>>>> `__mca_code_region_start()`.
> >>>>>>>> In any case this patch should probably check dominance of the
> >>>>>>>> intrinsics, even though MCA does not follow branches and MCA does
> >>>> not
> >>>>>>>> explicitly forbid a region from containing multiple blocks.
> >>>>>>>>
> >>>>>>>>> My question is: what happens with basic block reordering? We
> >>>> don't
> >>>>>>> know the
> >>>>>>>>> layout of basic blocks until we reach code emission. How does it
> >>>> work
> >>>>>>> for
> >>>>>>>>> regions that span through multiple basic blocks? I think your
> >>>> RFC
> >>>>>>> should
> >>>>>>>>> clarify this aspect.
> >>>>>>>>>
> >>>>>>>>> As a side note: at the moment, llvm-mca doesn't know how to deal
> >>>> with
> >>>>>>>>> branches. So, for simplicity we could force code regions to only
> >>>>>>> contain
> >>>>>>>>> instructions from a single basic block.
> >>>>>>>>>
> >>>>>>>>> However, in future we may want to teach llvm-mca how to analyze
> >>>>>>> branchy
> >>>>>>>>> code too. For example, we could introduce a simple control-flow
> >>>>>>> analysis in
> >>>>>>>>> llvm-mca, and use an external "branch trace" information (for
> >>>>>>> example, a
> >>>>>>>>> perf trace generated by an external tool) to decorate branches
> >>>> with
> >>>>>>>>> branch probabilities (similarly to what we currently do in LLVM
> >>>> with
> >>>>>>> PGO).
> >>>>>>>>> We could then use that knowledge to model branch prediction and
> >>>>>>> simulate
> >>>>>>>>> what happens in the presence of multiple branches.
> >>>>>>>>>
> >>>>>>>>> So, the idea of having regions that potentially span multiple
> >>>> basic
> >>>>>>> blocks
> >>>>>>>>> is not bad in general. However, I think you should better clarify
> >>>>>>> what are
> >>>>>>>>> the constraints (at least, you should answer to my questions from
> >>>>>>> before).
> >>>>>>>> I agree! Thanks for pointing that out.
> >>>>>>>>
> >>>>>>>>> If we decide to use those new intrinsics, then those should be
> >>>>>>> experimental
> >>>>>>>>> (at least to start).
> >>>>>>>> Agreed.
> >>>>>>>>
> >>>>>>>> -Matt
> >>>>>>>>
> >>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> >>>>>>> [hidden email]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Introduction
> >>>>>>>>>> -----------------
> >>>>>>>>>> Currently llvm-mca only accepts assembly code as input. We
> >>>> would
> >>>>>>> like to
> >>>>>>>>>> extend llvm-mca to support object files, allowing users to
> >>>> analyze
> >>>>>>> the
> >>>>>>>>>> performance of binaries. The proposed changes (which involve
> >>>> both
> >>>>>>>>>> clang and llvm) optionally introduce an object file section,
> >>>> but
> >>>>>>> this can
> >>>>>>>>>> be
> >>>>>>>>>> stripped-out if desired.
> >>>>>>>>>>
> >>>>>>>>>> For the llvm-mca binary support feature to be useful, a user
> >>>> needs
> >>>>>>> to tell
> >>>>>>>>>> llvm-mca which portions of their code they would like analyzed.
> >>>>>>> Currently,
> >>>>>>>>>> this is accomplished via assembly comments. However, assembly
> >>>>>>> comments are
> >>>>>>>>>> not
> >>>>>>>>>> preserved in object files, and this has encouraged this RFC.
> >>>> For the
> >>>>>>>>>> proposed
> >>>>>>>>>> binary support, we need to introduce changes to clang and llvm
> >>>> to
> >>>>>>> allow the
> >>>>>>>>>> user's object code to be recognized by llvm-mca:
> >>>>>>>>>>
> >>>>>>>>>> * We need a way for a user to identify a region/block of code
> >>>> they
> >>>>>>> want
> >>>>>>>>>>     analyzed by llvm-mca.
> >>>>>>>>>> * We need the information defining the user's region of code
> >>>> to be
> >>>>>>>>>> maintained
> >>>>>>>>>>     in the object file so that llvm-mca can analyze the desired
> >>>>>>> region(s)
> >>>>>>>>>> from the
> >>>>>>>>>>     object file.
> >>>>>>>>>>
> >>>>>>>>>> We define a "code region" as a subset of a user's program that
> >>>>>>>>>> is to be analyzed via llvm-mca. The sequence of instructions to
> >>>>>>>>>> be analyzed is represented as a pair: <start, end> where 'start'
> >>>>>>>>>> marks the beginning of the sequence and 'end' terminates it. The
> >>>>>>>>>> instructions between 'start' and 'end' form the region that can
> >>>>>>>>>> be analyzed by llvm-mca at a later time.
> >>>>>>>>>>
> >>>>>>>>>> Example
> >>>>>>>>>> -----------
> >>>>>>>>>> Before we go into the details of this proposed change, let's
> >>>>>>>>>> first look at a simple example:
> >>>>>>>>>>
> >>>>>>>>>> // example.c -- Analyze a dot-product expression.
> >>>>>>>>>> double test(double x, double y) {
> >>>>>>>>>>     double result = 0.0;
> >>>>>>>>>>     __mca_code_region_start(42);
> >>>>>>>>>>     result += x * y;
> >>>>>>>>>>     __mca_code_region_end();
> >>>>>>>>>>     return result;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> In the example above, we have identified a code region, in this
> >>>>>>>>>> case a single dot-product expression. For the sake of brevity
> >>>>>>>>>> and simplicity, we've chosen a very simple example, but in
> >>>>>>>>>> reality a more complicated example could use multiple
> >>>>>>>>>> expressions. We have also denoted this region as number 42. That
> >>>>>>>>>> identifier is only for the user, and simplifies reading an
> >>>>>>>>>> llvm-mca analysis report later.
> >>>>>>>>>>
> >>>>>>>>>> When this code is compiled, the region markers (the
> >>>>>>>>>> mca_code_region markers) are transformed into assembly labels.
> >>>>>>>>>> While the markers are presented as function calls, in reality
> >>>>>>>>>> they are no-ops.
> >>>>>>>>>>
> >>>>>>>>>> test:
> >>>>>>>>>> pushq   %rbp
> >>>>>>>>>> movq    %rsp, %rbp
> >>>>>>>>>> movsd   %xmm0, -8(%rbp)
> >>>>>>>>>> movsd   %xmm1, -16(%rbp)
> >>>>>>>>>> .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >>>>>>>>>> xorps   %xmm0, %xmm0
> >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> >>>>>>>>>> movsd   -8(%rbp), %xmm0
> >>>>>>>>>> mulsd   -16(%rbp), %xmm0
> >>>>>>>>>> addsd   -24(%rbp), %xmm0
> >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> >>>>>>>>>> .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >>>>>>>>>> movsd   -24(%rbp), %xmm0
> >>>>>>>>>> popq    %rbp
> >>>>>>>>>> retq
> >>>>>>>>>> .section        .mca_code_regions,"",@progbits
> >>>>>>>>>> .quad   42
> >>>>>>>>>> .quad   .Lmca_code_region_start_0
> >>>>>>>>>> .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >>>>>>>>>>
> >>>>>>>>>> The assembly has been trimmed to show the portions relevant to
> >>>>>>>>>> this RFC. Notice the labels enclose the user's defined region,
> >>>>>>>>>> and that they preserve the user's arbitrary region identifier,
> >>>>>>>>>> the ever-so-important region 42.
> >>>>>>>>>> In the object file section .mca_code_regions, we have noted the
> >>>>>>>>>> user's region identifier (.quad 42), start address, and region
> >>>>>>>>>> size. A more complicated example can have multiple regions
> >>>>>>>>>> defined within a single .mca_code_regions section. This section
> >>>>>>>>>> can be read by llvm-mca, allowing llvm-mca to take object files
> >>>>>>>>>> as input instead of assembly source.
> >>>>>>>>>>
> >>>>>>>>>> Details
> >>>>>>>>>> ---------
> >>>>>>>>>> We need a way for a user to identify a region/block of code
> >>>>>>>>>> they want analyzed by llvm-mca. We solve this problem by
> >>>>>>>>>> introducing two intrinsics that a user can specify, for
> >>>>>>>>>> identifying regions of code for analysis.
> >>>>>>>>>>
> >>>>>>>>>> The two intrinsics are: llvm.mca.code.regions.start and
> >>>>>>>>>> llvm.mca.code.regions.end. A user can identify a code region by
> >>>>>>>>>> inserting the mca_code_region_start and mca_code_region_end
> >>>>>>>>>> markers. These are simply clang builtins and are transformed
> >>>>>>>>>> into the aforementioned intrinsics during compilation. The code
> >>>>>>>>>> between the intrinsics is what we call a "code region" and is
> >>>>>>>>>> to be easily identifiable by llvm-mca; any code between a
> >>>>>>>>>> start/end pair can be analyzed by llvm-mca at a later time. A
> >>>>>>>>>> user can define multiple non-overlapping code regions within
> >>>>>>>>>> their program.
> >>>>>>>>>>
> >>>>>>>>>> The llvm.mca.code.region.start intrinsic takes an integer
> >>>>>>>>>> constant as its only argument. This argument is implemented as
> >>>>>>>>>> a metadata i32, and is only used when generating llvm-mca
> >>>>>>>>>> reports. This value allows a user to more easily identify a
> >>>>>>>>>> specific code region. llvm.mca.code.region.end takes no
> >>>>>>>>>> arguments. Since we disallow nesting of regions, the first 'end'
> >>>>>>>>>> intrinsic lexically following a 'start' intrinsic represents
> >>>>>>>>>> the end of that code region.
> >>>>>>>>>>
> >>>>>>>>>> Now that we have a solution for identifying regions for analysis,
> >>>>>>>>>> we need a way to preserve that information so it can be read at
> >>>>>>>>>> a later time. To accomplish this we propose adding a new section
> >>>>>>>>>> (.mca_code_regions) to the object file generated by llvm. During
> >>>>>>>>>> code generation, the start/end intrinsics described above will
> >>>>>>>>>> be transformed into start/end labels in assembly. When llvm
> >>>>>>>>>> generates the object file from the user's code, these start/end
> >>>>>>>>>> labels form a pair of values identifying the start of the user's
> >>>>>>>>>> code region and its size. The size represents the number of
> >>>>>>>>>> bytes between the addresses of the start and end labels. Note
> >>>>>>>>>> that the labels are emitted during assembly printing. We hope
> >>>>>>>>>> that these labels have no influence on code generation or
> >>>>>>>>>> basic-block placement. However, the target assembler strategy
> >>>>>>>>>> for handling labels is outside of our control.
> >>>>>>>>>>
> >>>>>>>>>> This proposed change affects the size of a binary, but only if
> >>>>>>>>>> the user calls the start/end builtins mentioned above. The
> >>>>>>>>>> additional size of the .mca_code_regions section, which we
> >>>>>>>>>> imagine to be very small (on the order of a few bytes), can
> >>>>>>>>>> trivially be stripped by tools like 'strip' or 'objcopy'.
> >>>>>>>>>>
> >>>>>>>>>> Implementation Status
> >>>>>>>>>> ------------------------------
> >>>>>>>>>> We currently have the proposed changes implemented at the url
> >>>>>>>>>> posted below. This initial patch only targets ELF object files,
> >>>>>>>>>> and does not handle relocatable addresses. Since the start of a
> >>>>>>>>>> code region is represented as an assembly label, and referenced
> >>>>>>>>>> in the .mca_code_regions section, that address is relocatable.
> >>>>>>>>>> That value can be represented as a section-relative relocatable
> >>>>>>>>>> symbol (.text + addend), but we are not handling that case yet.
> >>>>>>>>>> Instead, the proposed changes only handle linked/executable
> >>>>>>>>>> object files.
> >>>>>>>>>>
> >>>>>>>>>> For purposes of review and to communicate the idea, the change
> >>>>>>>>>> is presented as a monolithic patch here:
> >>>>>>>>>>
> >>>>>>>>>> https://reviews.llvm.org/D54603
> >>>>>>>>>>
> >>>>>>>>>> The change is presented as a monolithic patch; however, if
> >>>>>>>>>> accepted the patch will be split into three smaller patches:
> >>>>>>>>>> 1. The introduction of the builtins to clang.
> >>>>>>>>>> 2. The llvm portion (the added intrinsics).
> >>>>>>>>>> 3. The llvm-mca portion.
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> -Matt
> >>>>>>>>>>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev