[llvm-dev] [RFC] New Clang target selection options for ARM/AArch64


Shawn Webb via llvm-dev

Hi,

Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.

To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.

I look forward to your feedback.

Thanks,
David Spickett.



RFC New Clang target feature selection options for ARM/AArch64
--------------------------------------------------------------

In this RFC we propose changes to ARM and AArch64 target selection, with the top-level goals to:
- validate that given options make sense within architectural restrictions
- make option discovery and documentation easier
- unify the list of extensions that command lines and asm directives use
- bring the options closer to GCC's where appropriate

Current Options Comparison
--------------------------

                       | GCC           | Clang         |
                       |-------------------------------|
                       | ARM | AArch64 | ARM | AArch64 |
|----------------------|-----|---------|-----|---------|
| -march with '+<ext>' | Y   | Y       | Y   | Y       |
| checks extensions    | Y   | N       | N   | N       |
| .arch with '+<ext>'  | N   | N       | N   | Y       |
| .arch_extension      | Y   | Y       | Y   | N       |
| .fpu                 | Y   | N       | Y   | N       |
| -mfpu                | Y   | N       | Y   | N       |
| checks FPUs          | N   | n/a     | N   | n/a     |
|----------------------|-----|---------|-----|---------|

Examples of each of these can be found at the end of this document.

Problems With the Current Options
---------------------------------

- You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
- Combinations of options are not checked for compatibility.
- Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
- Users need to work out what FPU they need for ARM, when this should be implied by the selected arch and extensions.
- Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.

Proposed solution
------------------

ARM and AArch64:
- Make the TargetParser the single source for extension names, removing the AsmParser tables.
- Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
- Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
- Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
- Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)
- Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.
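
To illustrate the "reject with a list of valid extensions" point, here is a minimal Python sketch. It is not the proposed implementation: the function name check_march and the table are hypothetical, and the two table rows are simply copied from the GCC notes quoted in the examples at the end of this document.

```python
# Valid modifier names per base architecture. These two rows are copied from
# the GCC notes quoted in the examples of this document; a real table would
# come from TargetParser and be far larger.
VALID_EXTENSIONS = {
    "armv8-a": ["crc", "simd", "crypto", "nocrypto", "nofp"],
    "armv6zk": ["fp", "nofp", "vfpv2"],
}


def check_march(march):
    """Return a list of diagnostic strings for an '<arch>+<ext>...' value."""
    arch, *extensions = march.split("+")
    if arch not in VALID_EXTENSIONS:
        return [f"error: unknown architecture '{arch}'"]
    diagnostics = []
    valid = VALID_EXTENSIONS[arch]
    for ext in extensions:
        if ext not in valid:
            # Name the offending extension, then list what would be accepted.
            diagnostics.append(f"error: '{arch}' does not support feature '{ext}'")
            diagnostics.append("note: valid feature names are: " + " ".join(valid))
    return diagnostics


for line in check_march("armv8-a+dotprod"):
    print(line)
```

Because the same table drives both the check and the note, the list shown to the user cannot drift from what is actually accepted, which is the motivation for a single source of extension names.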

ARM:
- Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax
(allow them to be recognised, they could still be rejected for compatibility).
- Add an 'auto' value for -mfpu and make it the default, meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
- Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
- Reject invalid arch/cpu and extension combinations with an error diagnostic.
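
The '-mfpu=auto' behaviour could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the function select_fpu is hypothetical, and the CPU and FPU names are placeholders rather than entries from any real table.

```python
# Placeholder table: the CPU and FPU names are invented for this sketch and
# do not describe any real core.
IMPLIED_FPU = {
    "cpu-a": "fpu-x",
    "cpu-b": "fpu-y",
}


def select_fpu(cpu, mfpu="auto"):
    """Return (chosen FPU, list of warnings) for an -mcpu/-mfpu pair."""
    implied = IMPLIED_FPU[cpu]
    if mfpu == "auto":
        # Default case: the FPU follows from the selected CPU.
        return implied, []
    if mfpu != implied:
        # An explicit -mfpu wins, but the user is told what was overridden.
        warning = f"warning: -mfpu={mfpu} overrides '{implied}' implied by -mcpu={cpu}"
        return mfpu, [warning]
    return mfpu, []


print(select_fpu("cpu-a"))
print(select_fpu("cpu-a", "fpu-y"))
```

The sketch only warns when the explicit value differs from the implied one; whether an explicit but matching -mfpu should also warn is left open here.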

Optional features
-----------------

AArch64:
- add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.

ARM:
- Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.

Options Comparison With the Proposed Solution
----------------------------------------------

Anything in brackets has changed from the previous table.

                       | GCC           | Clang             |
                       |-----------------------------------|
                       | ARM | AArch64 | ARM     | AArch64 |
|----------------------|-----|---------|---------|---------|
| -march with '+<ext>' | Y   | Y       | Y       | Y       |
| checks extensions    | Y   | N       | (Y)     | (Y)     |
| .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
| .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
| -mfpu                | Y   | N       | Y       | N       |
| .fpu                 | Y   | N       | Y       | N       |
| checks FPUs          | N   | n/a     | (Y)     | n/a     |
|----------------------|-----|---------|---------|---------|

Implementation
--------------

Use of Table-gen
================

The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of preprocessor macros. There are several advantages of porting TargetParser to tablegen:
- more readable than the current macros
- allows default/optional values more easily
- we can generate code and documentation from the same source
- easier to add new properties

Drawbacks:
- it requires a new tablegen backend to generate the include files
- additional indirection which could make debugging and future changes more difficult

We think the benefits outweigh the disadvantages in this case.

To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
2. put the TargetParser for each backend in the library group for that backend. This requires one of:
    * Relaxing the requirement that target parsers must be built even if the backend is not.
    * Modifying the CMake scripts to build the target parsers even if the backend is not being built.

Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.

Using existing SubTarget features
=================================

If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.

We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct preprocessor feature test macros.

Consider this AArch64 march:
-march=armv8.4-a+crypto+nosha2

The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.
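
The resolution Clang would need to perform for this example can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype: the helper names are hypothetical, only the features named above are modelled, and features that are mandatory in the base architecture are ignored.

```python
def crypto_members(arch):
    """Features that '+crypto' stands for on the given base architecture."""
    if arch == "armv8.4-a":
        return {"aes", "sha2", "sha3", "sm4"}
    return {"aes", "sha2"}


# feature -> features it depends on (only the dependency named above)
REQUIRES = {"sha3": {"sha2"}}


def disable(enabled, feature):
    """Remove a feature and, transitively, everything that depends on it."""
    pending = [feature]
    while pending:
        current = pending.pop()
        enabled.discard(current)
        for dependent, needs in REQUIRES.items():
            if current in needs and dependent in enabled:
                pending.append(dependent)


def resolve(march):
    """Return (base arch, enabled features) for an '<arch>+<mod>...' value."""
    arch, *modifiers = march.split("+")
    enabled = set()
    for mod in modifiers:  # modifiers apply left to right
        if mod == "crypto":
            enabled |= crypto_members(arch)
        elif mod == "nocrypto":
            for member in crypto_members(arch):
                disable(enabled, member)
        elif mod.startswith("no"):
            disable(enabled, mod[2:])
        else:
            enabled.add(mod)
    return arch, enabled


arch, features = resolve("armv8.4-a+crypto+nosha2")
print(arch, sorted(features))
```

With the final feature set known in the frontend, deciding which feature test macros to define becomes a lookup on that set rather than a guess about what the backend will do.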

New Errors and Warnings
=======================

Whether these are errors or warnings by default is up for debate. This is a suggestion to begin with.
(these apply to cmd lines and directives unless stated)

Errors:
- unknown extension in an assembly directive (currently fails silently)
- extension incompatible with base arch, message shows the base arch it requires.
- extension requires another which is disabled later, message shows which one is required.
- extension requires another which is not enabled, message shows requirements.
- ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.

Warnings:
- ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
- mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
- mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)

Proposed diagnostic names: (in the same order as above)
- "target-feature" (top level group)
    - "incompatible-feature"
      - "extension-requirement-disabled"
    - "extension-requires"
    - "incompatible-fpu"
    - "implied-fpu-unused"
    - "mandatory-feature-ignored"
    - "mandatory-feature-disabled"
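
The two "extension requires another" errors differ only in whether the required extension was switched off after the dependent one was enabled. A minimal Python sketch of that distinction follows. The function check_requirements is hypothetical, only the crypto/simd relation stated in this document is modelled, and treating simd as enabled by the base architecture is an assumption of the sketch.

```python
# extension -> extensions it requires. Only the relation stated in this
# document (crypto requires simd) is modelled.
REQUIRES = {"crypto": {"simd"}}


def check_requirements(march, base_features=("simd",)):
    """Return (diagnostic name, message) pairs for unmet requirements."""
    _arch, *modifiers = march.split("+")
    # Position at which each feature was enabled; -1 means "from the base".
    enabled_at = {feature: -1 for feature in base_features}
    disabled_at = {}
    for position, mod in enumerate(modifiers):
        if mod.startswith("no"):
            enabled_at.pop(mod[2:], None)
            disabled_at[mod[2:]] = position
        else:
            enabled_at[mod] = position
            disabled_at.pop(mod, None)
    diagnostics = []
    for ext, position in sorted(enabled_at.items()):
        for needed in sorted(REQUIRES.get(ext, ())):
            if needed in enabled_at:
                continue
            if disabled_at.get(needed, -1) > position:
                # The requirement was removed after 'ext' was enabled.
                name = "extension-requirement-disabled"
            else:
                name = "extension-requires"
            diagnostics.append((name, f"'{ext}' requires '{needed}'"))
    return diagnostics


print(check_requirements("armv8-a+crypto+nosimd"))
print(check_requirements("armv8-a+nosimd+crypto"))
```

Keeping the diagnostic name next to each message is what would let the usual -W* options promote or demote the groups independently.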

"Negative" Backend Features
===========================

There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
It would simplify the implementation if those were replaced with positive options. As in one that adds the top 16 D registers and one that enables double precision operations.

This is a relatively simple change to LLVM but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.
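
As a rough Python illustration of the inversion, the sketch below rewrites a feature set so that a removed capability becomes the absence of a positive feature. The positive names 'd32' and 'fp64' are hypothetical (this RFC does not name them), and the sketch assumes that an FPU is present at all.

```python
# 'd32' and 'fp64' are hypothetical names for the positive counterparts;
# the RFC does not name them.
NEGATIVE_TO_POSITIVE = {"d16": "d32", "fp-only-sp": "fp64"}


def to_positive(features):
    """Rewrite a feature set so that removed capabilities become the
    absence of a positive feature instead of the presence of a negative one."""
    positive = {f for f in features if f not in NEGATIVE_TO_POSITIVE}
    for negative, replacement in NEGATIVE_TO_POSITIVE.items():
        if negative not in features:
            # The capability was not removed, so the positive form is set.
            positive.add(replacement)
    return positive


print(sorted(to_positive({"vfp4", "d16"})))
print(sorted(to_positive({"vfp4"})))
```

With only positive features, dependency rules can all be phrased as "X requires Y", without special cases for features whose presence means less hardware.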

.arch_extension Directive
=========================

Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.

Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.

We can handle this in a few ways:
- Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
- Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
- Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.

The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.
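
A minimal Python sketch of tracking the current base target is shown below. The class DirectiveState is hypothetical, only '.arch' and '.arch_extension crypto' are handled, and resetting the extension set on '.arch' is an assumption of the sketch rather than something this RFC specifies.

```python
def crypto_members(arch):
    """Meaning of 'crypto' per base architecture, as described above."""
    if arch == "armv8.4-a":
        return {"aes", "sha2", "sha3", "sm4"}
    return {"aes", "sha2"}


class DirectiveState:
    """Tracks the base architecture seen by the assembler so far."""

    def __init__(self, command_line_arch):
        # The command line provides the initial base target.
        self.arch = command_line_arch
        self.features = set()

    def handle(self, line):
        directive, _, operand = line.strip().partition(" ")
        operand = operand.strip()
        if directive == ".arch":
            # Assumption: a new base architecture discards earlier extensions.
            self.arch = operand
            self.features = set()
        elif directive == ".arch_extension":
            if operand == "crypto":
                # Resolved against whichever base architecture is current.
                self.features |= crypto_members(self.arch)
            else:
                self.features.add(operand)


state = DirectiveState("armv8.2-a")
state.handle(".arch_extension crypto")
print(sorted(state.features))
state.handle(".arch armv8.4-a")
state.handle(".arch_extension crypto")
print(sorted(state.features))
```

Handling '.cpu' in the same way would additionally need a CPU to architecture mapping, which is omitted here.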

ARM Assembly Directives
=======================

As discussed for AArch64, the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser, giving them access to a complete list of features.

'.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.

Current Command Line Option Examples
------------------------------------

Clang ARM
=========

Extensions can be used with '+<{no}extension>' syntax on march or mcpu, there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.

$ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
(can't use dotprod with v8-a)

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(should be invalid but is allowed)

GCC ARM
=======

For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.

$ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
$ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o

$ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp

$ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(same example given for Clang above, should be invalid)

Clang AArch64
=============

The '+' syntax still applies but mfpu is replaced with '+' extensions.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o

Dependencies within extensions are not checked. For example crypto requires simd, but it can be disabled in the same march option.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o

Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a but it is allowed.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o

GCC AArch64
===========

For GCC AArch64 mfpu is also dropped in favour of '+' extensions.

$ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?

Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.

$ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
$ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
$ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
(should not be allowed)
$ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
(should not be allowed)

Current Assembly Directive Examples
-----------------------------------

Clang .arch/.arch_extension
===========================

AArch64 uses .arch and '+' syntax, ARM uses .arch_extension/.arch and does not support '+' syntax in either.

In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.

(example is using AArch64)
.arch armv8.2-a # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

.arch armv8.2-a+dotprod # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

ARM uses the .arch_extension directive which is one extension per use, with no '+'.

.arch armv7-a #error: instruction requires: crc armv8
CRC32B r0, r1, r2

.arch armv8-a+crc #error: Unknown arch name
CRC32B r0, r1, r2

.arch armv8-a # no error
.arch_extension crc
CRC32B r0, r1, r2

You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.

ARM does check validity of extensions provided with '.arch_extension'.

.arch armv7-a
.arch_extension crc
CRC32B r0, r1, r2

main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
.arch_extension crc

AArch64 only rejects known extensions that aren't supported at all.

.arch armv8-a+pan # unsupported architectural extension: pan
nop

Neither ARM nor AArch64 knows about the interdependencies between extensions. So the example from the command lines applies here too.

(example is using AArch64)
.arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
nop

GCC .arch/.arch_extension
=========================

GCC is more consistent across the two arches: both use .arch and .arch_extension. Neither understands the '+' syntax.

.arch armv8-a+crc # invalid

.arch armv8-a # valid
.arch_extension crc

.arch_extension crc # valid
.arch_extension crc+crypto #invalid

For extensions that vary based on base architecture, GCC tracks the last known arch.

Clang .fpu
==========

.fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:1: error: unknown directive
.fpu neon
^

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:6: error: Unknown FPU name
.fpu clearly-not-valid
     ^

(same example as 'Clang ARM' command lines, should be invalid)
$ cat /tmp/test.s
.fpu neon-fp16
$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o

GCC .fpu
========

.fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.

This is correctly rejected from a command line:
$ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2

Whereas the directive is accepted:
$ cat /tmp/test.s
.fpu neon
nop
$ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o

For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.

$ cat /tmp/test.s
.fpu neon
$ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
/tmp/test.s: Assembler messages:
/tmp/test.s:1: Error: unknown pseudo-op: `.fpu'

$ cat /tmp/test.s
.arch_extension simd
$ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o

References
----------

Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev
On 9/21/2018 3:05 AM, David Spickett via llvm-dev wrote:
> Hi,
>
> Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.
>
> To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.
>
> I look forward to your feedback.

This looks like you've put a lot of thought into it; I have some minor comments on the details, but I think this is looking in the right direction.

One thing this doesn't mention is clang's "target" attribute for functions; have you considered that at all?

> ARM and AArch64:
> - Make the TargetParser the single source for extension names, removing the AsmParser tables.
> - Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
> - Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
> - Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
> - Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)

Could you go into a bit more detail about mandatory features?  I'm pretty sure people are using the extension functionality to turn off features which are technically mandatory according to the reference manual, like floating-point in armv8a.

> To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
> 1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
> 2. put the TargetParser for each backend in the library group for that backend. This requires one of:
>     * Relaxing the requirement that target parsers must be built even if the backend is not.
>     * Modifying the CMake scripts to build the target parsers even if the backend is not being built.

Maybe you could put it into some existing library, like libLLVMTarget.

> "Negative" Backend Features
> ===========================
>
> There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
> It would simplify the implementation if those were replaced with positive options. As in one that adds the top 16 D registers and one that enables double precision operations.
>
> This is a relatively simple change to LLVM but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.

This seems mostly orthogonal?  At least I mean, I guess it might make the translation from TargetParser features to LLVM features slightly easier, but it seems like there could be some unexpected implications, so I don't want to tie it to this change.  If you think it's really worthwhile, please make sure it's a separate patch.

Also, these aren't the only negative features... although maybe they're the only negative features that are relevant for TargetParser?

-Eli


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev
On Fri, 21 Sep 2018 at 11:06, David Spickett via llvm-dev
<[hidden email]> wrote:
> Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.

Hi David,

This is *awesome*. Thanks for such a detailed analysis!


> In this RFC we propose changes to ARM and AArch64 target selection, with the top-level goals to:
> - validate that given options make sense within architectural restrictions
> - make option discovery and documentation easier
> - unify the list of extensions that command lines and asm directives use
> - bring the options closer to GCC's where appropriate

One additional goal we had in the past, when we first wrote the
TargetParser, was to use the *existing* target description table-gen
files to generate the parser tables.

This means new changes to cores, sub-arches, and fixes to existing
ones will *automatically* be translated to command line and assembly
parsing.



> Proposed solution
> ------------------
>
> ARM and AArch64:
> - Make the TargetParser the single source for extension names, removing the AsmParser tables.
> - Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
> - Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
> - Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.

SGTM.

> - Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)
> - Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.

I'd be more comfortable if these weren't enabled by default, but were
present in -Wall.

Writing generic and precise build systems is a nightmare, which is the
biggest reason why compilers generally ignore nonsense options
silently.


> ARM:
> - Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax
> (allow them to be recognised, they could still be rejected for compatibility).
> - Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
> - Reject invalid arch/cpu and extension combinations with an error diagnostic.

SGTM.

> - Add an 'auto' value for -mfpu and make it the default, meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.

I'd have assumed -mfpu is already "auto" by default. Or is this to
just override a previous option?

ex: clang -mcpu cortex-a8 -mfpu vfp4 -mfpu auto -> defaults back to VFP3.



> Optional features
> -----------------
>
> AArch64:
> - add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.
>
> ARM:
> - Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.

I see no reason to be inconsistent with GNU tools here. We can have
more, but we should not have less or different behaviour.


> Use of Table-gen
> ================
>
> We think the benefits outweigh the disadvantages in this case.

Agreed!


> To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
> 1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
> 2. put the TargetParser for each backend in the library group for that backend. This requires one of:
>     * Relaxing the requirement that target parsers must be built even if the backend is not.
>     * Modifying the CMake scripts to build the target parsers even if the backend is not being built.
>
> Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.

Option 1 makes everyone pay the cost, and it can be a lot harder to
make flexible and "zero-cost". This is the reason why it was changed
from a class-based model to a static function / table model.

I had a go at option 2 years ago and it works. You need to fiddle a
bit with the CMake file in lib/Targets (to prepare the inc files even
if targets aren't being built, because Clang needs to use it for all
supported targets regardless).

It wasn't upstreamed because the hard part is re-using the existing
table-gen files for a new back-end, which would generate the tables.
The problem is not so much writing the new back-end as making sure the
data we need isn't redundant or contradictory (it was both) across all
table-gen files. We also had to add new options to the targets (define
new classes, etc) which were solely used by the parser, so they were
harder to justify on their own and needed much more extensive
validation than we had bandwidth for.


> Consider this AArch64 march:
> -march=armv8.4-a+crypto+nosha2
>
> The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.

Is this complex logic done by GCC's front-end as well?

It would be pretty cool to have it smart like that, but we also have
to be careful to have a rock solid model before improving on GCC's
(potentially broken) functionality, and hopefully someone talking to
them on the side.

The amount of noise that comes every time we change the command line
options interpretation is non-trivial. :)


> Errors:
> - unknown extension in an assembly directive (currently fails silently)

IIRC, this is by design.

Imagine a macro that defines .cpu in an asm file to multiple things,
and the rest of the file has .fpu all over the place, with support for
all .cpu options, but with the guarantee that those functions will
only be compiled/executed if the .cpu is correct.

This may sound weird, but some libraries (ex. unwind) actually depend
on weird behaviour like that.


> - extension incompatible with base arch, message shows the base arch it requires.
> - extension requires another which is disabled later, message shows which one is required.
> - extension requires another which is not enabled, message shows requirements.
> - ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.

Define "incompatible". Older Arm cores could have new features that
weren't even defined in their own standard, because manufacturers
upgraded the extras but not the core.

I'm happy to have errors for things that are impossible, like "ARMv5
AArch64" or enabling and disabling intersecting groups that cannot be
represented in the compiler.

I'm happy to have warnings, possibly only under -Wall, for nonsense
options like "ARMv5 VFP4" or "ARMv8A IWMMX".


> Warnings:
> - ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.

This is nice.

> - mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)

Maybe under -Wall?

> - mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)

Arm is a flexible architecture, and build systems are crazy. This will
likely confuse a lot of builds in the wild.

I'd avoid it unless in -Wall.


> .arch_extension Directive
> =========================
>
> We can handle this in a few ways:
> - Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.

.arch_extension was implemented because GCC does it. I'm not sure what
you mean by that, but I'm not happy with removing it, as it will break
scores of assembly files out there.

> - Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.

This makes sense, but will likely require changes in a lot of existing
low-level assembly files, which choose a generic .cpu and vary
.fpu/.arch_extension to implement independent functionality (like
unwinders).

If you read the GNU manuals, the assembly directives are more about
allowing the assembler to relax checks than about enforcing them.

I personally like strong checks, but the problems we have with inline
assembly will come crashing in assembly files if we start tightening
the checks there, too.

It's a worthy long goal, but it's a loooong goal and you don't want
your current TargetParser work to depend on that.


> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
> (should be invalid but is allowed)
>
> $ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
> (same example given for Clang above, should be invalid)

If both are allowed, I'd recommend you not to change it in this
current pass. Let's get the parser fixed before changing overall
behaviour.


> Dependencies within extensions are not checked. For example crypto requires simd, but it can be disabled in the same march option.
>
> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
>
> Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.
>
> $ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
> (should not be allowed)

This is unlikely to change, let alone in the time frame of your work.

I strongly recommend that you do not change *any* user-facing
behaviour until the underlying parser changes are done and released
upstream.

--
cheers,
--renato
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev

Hi Eli, Renato,

Thanks for your feedback. There's a lot more to some of these things than I knew. I've addressed your points below.

The overall summary is:
- Start by converting the TargetParser to TableGen, with no user-facing changes
- Add warnings based on that, behind -Wall, starting with command lines, since directives have larger implications that need investigation

Thanks,
David Spickett.


mandatory features
==================

>> Could you go into a bit more detail about mandatory features?  I'm pretty sure people are using the extension functionality to turn off features which are technically mandatory according to the reference manual, like floating-point in armv8a.

>> I'd be more comfortable if these weren't enabled by default, but were
>> present in -Wall.

It seems like a large portion of the architecturally invalid combinations have a technical rationale. So I'd amend that point from:

"- Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)"

to

" - Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'."

So the option is still applied as written, and as Renato suggested we don't make these errors by default. The diagnostic is visible if you want to check these things, but it's not going to break existing code. Anyone who wants an error can always upgrade it.
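
As a rough illustration of the amended point, here is a minimal Python sketch of "warn, but still apply the option". It is not Clang's implementation: `MANDATORY_FEATURES`, `apply_march` and the warning wording are hypothetical names made up for this example, and the single table entry only reflects the armv8-a floating-point case mentioned in this thread.

```python
# Illustrative table only: the thread mentions floating-point as technically
# mandatory in armv8-a; a real table would come from the target descriptions.
MANDATORY_FEATURES = {"armv8-a": {"fp"}}


def apply_march(march):
    """Return (base, applied_modifiers, warnings) for a -march style string."""
    base, *modifiers = march.split("+")
    applied = []
    warnings = []
    for modifier in modifiers:
        disabling = modifier.startswith("no")
        name = modifier[2:] if disabling else modifier
        if name in MANDATORY_FEATURES.get(base, set()):
            action = "disabled" if disabling else "enabled"
            warnings.append(
                f"'{name}' is a mandatory feature of {base} but is "
                f"explicitly {action} with '+{modifier}'"
            )
        # The option is applied as written; the warning does not change it.
        applied.append((name, not disabling))
    return base, applied, warnings


base, applied, warnings = apply_march("armv8-a+nofp+crc")
print(applied)
for message in warnings:
    print("warning:", message)
```

The point of the sketch is that the diagnostic and the feature change are separate: "+nofp" still disables floating-point, and the warning is only extra information that a build can ignore or promote to an error.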

Use of Tablegen
===============

>> Maybe you could put it into some existing library, like libLLVMTarget.

This would be a little easier but with what Renato said...

>> Option 1 makes everyone pay the cost and can be a lot harder to make
>> it flexible and "zero-cost".

I think we want to stick with option 2. From what I understand, this also helps with the goal of reusing the existing tablegen files.

>> One additional goal we had in the past, when we first wrote the
>> TargetParser was to use the *existing* target description table-gen
>> files to generate the parser tables.

I should have been clearer: "unify the list of extensions that command lines and asm directives use" is more or less the same goal, just muddled. As discussed above I think we can achieve this. I'm sure I'll hit the same issues you did, Renato, but hopefully they aren't showstoppers.

"target" attribute (Eli)
==================

>> One thing this doesn't mention is clang's "target" attribute for functions; have you considered that at all?

I hadn't, thanks for pointing that out.

As far as I can tell, we only support cpu names via the target attribute:
__attribute__((target("arch=cortex-a75")))
Whereas this doesn't work:
__attribute__((target("arch=armv8-a+crc")))

We don't plan to add this as part of this work, but of course you could specify invalid combinations with a CPU and some combination of other directives and options. These would produce warnings in line with the ones already mentioned.

I need to do some more investigation to work out exactly what invalid combinations you could produce. So this will be a later part of the work, if at all.

"Negative" backend features (Eli)
===========================

>> This seems mostly orthogonal?  At least I mean, I guess it might make the translation from TargetParser features to LLVM features slightly easier, but it seems like there could be some unexpected implications, so I don't want to tie it to this change.

Agreed, it would certainly be a separate patch. It might not be needed so I will work on the tablegen conversion without changing any of this and see how it goes.

>> they're the only negative features that are relevant for TargetParser?

Yes, the rest are for internal use, i.e. they are not enabled by a specific option but, for example, by a particular CPU.

'auto' FPU value (Renato)
================

>> I'd have assumed -mfpu is already "auto" by default. Or is this to
>> just override a previous option?
>>
>> ex: clang -mcpu cortex-a8 -mfpu vfp4 -mfpu auto -> defaults back to VFP3.

I don't see any reference to this in the code or the docs, and clang rejects something similar:
./clang --target=arm-arm-none-eabi -mcpu=cortex-a8 -mfpu=vfp4 -mfpu=auto -c /tmp/test.c
clang-8: error: the clang compiler does not support '-mfpu=auto'

Maybe I'm missing something.

ACLE macros (Renato)
===========

>> The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.

>> Is this complex logic done by GCC's front-end as well?

I don't think so, you might be right there. We will look into exactly what GCC has implemented before making any moves here.
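
For reference, the dependency handling in the quoted example can be sketched in a few lines of Python. This is only a model of the behaviour described in this thread, not what Clang or GCC actually implement: the `IMPLIES` and `REQUIRES` tables hold just the relationships from the example (crypto turns on AES/SHA2/SHA3/SM4, and SHA3 depends on SHA2), and the function names are hypothetical.

```python
# Relationships taken from the example in this thread; not real target data.
IMPLIES = {"crypto": {"aes", "sha2", "sha3", "sm4"}}  # group -> features it enables
REQUIRES = {"sha3": {"sha2"}}                         # feature -> its prerequisites


def disable(name, enabled):
    """Remove a feature (or group) and everything that depends on it."""
    for feature in IMPLIES.get(name, {name}):
        enabled.discard(feature)
        for dependent, prerequisites in REQUIRES.items():
            if feature in prerequisites and dependent in enabled:
                disable(dependent, enabled)


def resolve(march):
    """Return (base, enabled_features) after applying modifiers left to right."""
    base, *modifiers = march.split("+")
    enabled = set()
    for modifier in modifiers:
        if modifier.startswith("no"):
            disable(modifier[2:], enabled)
        else:
            enabled |= IMPLIES.get(modifier, {modifier})
    return base, enabled


base, enabled = resolve("armv8.4-a+crypto+nosha2")
print(base, sorted(enabled))
```

With these tables, "+nosha2" removes SHA2 and then SHA3 as its dependent, leaving only AES and SM4 enabled. That final set is what the front-end would need to know in order to define the matching feature test macros.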

Errors (Renato)
======

>> Errors:
>> - unknown extension in an assembly directive (currently fails silently)
>>    IIRC, this is by design.

If that's the case then we'll keep the behaviour. Again with warnings under -Wall.

My impression of it came more from me trying to work out what was a valid option at all. However if we can improve the documentation and consistency between directives and command line options that won't be an issue.

>> Define "incompatible". Older Arm cores could have new features that
>> wasn't even define in its own standard because manufacturers upgraded
>> the extra but not the core.

Good point. I suppose "incompatible" in the way I wrote it means "not listed as an off-the-shelf config", which, as you say, doesn't cover everything. So yes, agreed on defaulting to warnings behind -Wall.

>> - mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)

Agreed. Also, as discussed above, it should not ignore the option (unless it is actually a no-op in that situation or completely impossible).

>> .arch_extension was implemented because GCC does it. I'm not sure what
>> you mean by that, but I'm not happy with removing it, as it will break
>> scores of assembly files out there.

I included it to present the choice between being GCC compatible and being consistent within Clang itself. I've quickly realised that the former is more important.

>> This makes sense, but will likely require changes in a lot of existing
>> low-level assembly files, which choose a generic .cpu and vary
>> .fpu/.arch_extension to implement independent functionality (like
>> unwinders).

Again I didn't know about that use case. It's definitely a later goal and I think there needs to be more investigation before we could make any changes.

>> I strongly recommend that you do not change *any* user-facing
>> behaviour until the underlying parser changes are done and released
>> upstream.

After what I've read here I'm fully on board with that.



Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev
Hello David,

Not a lot to add over what Eli and Renato have already mentioned.

> 'auto' FPU value (Renato)
> ================
>
> >> I'd have assumed -mfpu is already "auto" by default. Or is this to
> >> just override a previous option?
> >>
> >> ex: clang -mcpu cortex-a8 -mfpu vfp4 -mfpu auto -> defaults back to VFP3.
>
> I don't see any reference to this in the code or the docs, and clang rejects something similar:
> ./clang --target=arm-arm-none-eabi -mcpu=cortex-a8 -mfpu=vfp4 -mfpu=auto -c /tmp/test.c
> clang-8: error: the clang compiler does not support '-mfpu=auto'
>
> Maybe I'm missing something.
>

I think what Renato meant was that some CPUs already imply an
FPU and hence -mfpu is already "kind of" auto. For example
-mcpu=cortex-m4 implies an FPU.
ARM_CPU_NAME("cortex-m4", ARMV7EM, FK_FPV4_SP_D16, true, ARM::AEK_NONE)

I think the example "clang -mcpu cortex-a8 -mfpu vfp4 -mfpu auto ->
defaults back to VFP3." was a theoretical question: if -mfpu=auto came
after an explicit -mfpu=vfp4, would this select the default FPU,
overriding the previous -mfpu=vfp4?

> >> Define "incompatible". Older Arm cores could have new features that
> wasn't even define in its own standard because manufacturers upgraded
> the extra but not the core.
>
> Good point, I suppose "incompatible" in the way I wrote it means "not listed as an off the shelf config". Which you're right, doesn't cover everything. So yes, agreed on defaulting to warnings behind -Wall.
>
I think that post-Cortex CPUs are far more locked down in terms of
configurations. If you have an FPU it will always be of the same type.
I agree with Renato that the outside world is considerably messier
than we'd like though. For example, the compiler-rt builtins for Cortex-M4
currently cheat and specify a full double precision FPU, as there
are some functions (never called and hence never included in an M4
binary, but present in the archive) that use double precision floating
point. In compiler-rt's case I think that we should fix that, but it
illustrates that there may be other projects that would be affected if
we are too strict.

Looking forward to seeing the results of the work.

Peter

Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev

Hi Peter,

>> I think what Renato was meaning was that some CPUs already imply an
>> FPU and hence -mfpu is already "kind of" auto. For example
>> -mcpu=cortex-m4 implies a FPU.
>> ARM_CPU_NAME("cortex-m4", ARMV7EM, FK_FPV4_SP_D16, true, ARM::AEK_NONE)

Ok I understand now, I'm mixing up current and (possible) future features.

>> clang -mcpu cortex-a8 -mfpu vfp4 -mfpu auto ->defaults back to VFP3.

We would want to keep the current logic here, which I've observed to be that the last mfpu wins, overriding the default from mcpu.

$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a8 -mfpu=vfpv4-d16 -mfpu=vfpv4 -c /tmp/test.c -o /tmp/test.o -###
<...> "-target-feature" "-d16" "-target-feature" "+vfp4" <...>
$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a8 -mfpu=vfpv4 -mfpu=vfpv4-d16 -c /tmp/test.c -o /tmp/test.o -###
<...> "-target-feature" "+d16" "-target-feature" "+vfp4" <...>
(vfpv4-d16 wins)

So yes, with the explicit 'auto' you'd revert to the cpu provided fpu:

$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a8 -mfpu=vfpv4 -mfpu=auto -c /tmp/test.c -o /tmp/test.o -###
<...> "-target-feature" "-d16" "-target-feature" "+vfp3" <...>

(and without the 'auto' we'd emit a warning telling you that you overrode the 'vfpv3' from mcpu with 'vfpv4' from mfpu)
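
The selection rule described above can be summarised in a small Python sketch. It assumes the proposed (not yet existing) 'auto' value, and the `CPU_DEFAULT_FPU` table, the `select_fpu` function and the warning wording are all hypothetical. The only default listed is the cortex-a8 one discussed in this thread.

```python
# Assumed default from this thread: -mcpu=cortex-a8 implies 'vfpv3'.
CPU_DEFAULT_FPU = {"cortex-a8": "vfpv3"}


def select_fpu(cpu, mfpu_values):
    """Return (fpu, warning) given the -mfpu values in command line order."""
    default = CPU_DEFAULT_FPU[cpu]
    # The last -mfpu on the command line wins; no -mfpu at all means 'auto'.
    chosen = mfpu_values[-1] if mfpu_values else "auto"
    if chosen == "auto":
        # 'auto' reverts to whatever the CPU implies, without a warning.
        return default, None
    warning = None
    if chosen != default:
        warning = f"-mfpu={chosen} overrides '{default}' implied by -mcpu={cpu}"
    return chosen, warning


print(select_fpu("cortex-a8", ["vfpv4-d16", "vfpv4"]))
print(select_fpu("cortex-a8", ["vfpv4", "auto"]))
```

Treating a missing -mfpu the same as an explicit 'auto' keeps a single code path for the default case, which is one way to read "make it the default" in the RFC.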

>> but it illustrates that there may be other projects that would be affected if we are too strict.

Definitely going to tread lightly when introducing new checks. Just having the option to enable them is a good place to start.

Thanks,
David Spickett.



Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev


On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:

Hi,

Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.

To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.

I look forward to your feedback.

Thanks,
David Spickett.



RFC New Clang target feature selection options for ARM/AArch64
--------------------------------------------------------------

In this RFC we propose changes to ARM and AArch64 target selection. With the top level goals to:
- validate that given options make sense within architectural restrictions
- make option discovery and documentation easier
- unify the list of extensions that command lines and asm directives use
- bring the options closer to GCC's where appropriate

Current Options Comparison
--------------------------

                       | GCC           | Clang         |
                       |-------------------------------|
                       | ARM | AArch64 | ARM | AArch64 |
|----------------------|-----|---------|-----|---------|
| -march with '+<ext>' | Y   | Y       | Y   | Y       |
| checks extensions    | Y   | N       | N   | N       |
| .arch with '+<ext>'  | N   | N       | N   | Y       |
| .arch_extension      | Y   | Y       | Y   | N       |
| .fpu                 | Y   | N       | Y   | N       |
| -mfpu                | Y   | N       | Y   | N       |
| checks FPUs          | N   | n/a     | N   | n/a     |
|----------------------|-----|---------|-----|---------|

Examples of each of these can be found at the end of this document.

Problems With the Current Options
---------------------------------

- You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
- Combinations of options are not checked for compatibility.
- Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
- Users need to work out what FPU they need for ARM, this should be implied by the selected arch and extensions.
- Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.

Proposed solution
------------------

ARM and AArch64:
- Make the TargetParser the single source for extension names, removing the AsmParser tables.
- Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
- Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
- Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
- Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)
- Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.

ARM:
- Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax (allow them to be recognised, they could still be rejected for compatibility).
- Add an 'auto' value for -mfpu and make it the default. Meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
- Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
- Reject invalid arch/cpu and extension combinations with an error diagnostic.

Optional features
-----------------

AArch64:
- add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.

ARM:
- Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.

Options Comparison With the Proposed Solution
----------------------------------------------

Anything in brackets has changed from the previous table.

                       | GCC           | Clang             |
                       |-----------------------------------|
                       | ARM | AArch64 | ARM     | AArch64 |
|----------------------|-----|---------|---------|---------|
| -march with '+<ext>' | Y   | Y       | Y       | Y       |
| checks extensions    | Y   | N       | (Y)     | (Y)     |
| .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
| .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
| -mfpu                | Y   | N       | Y       | N       |
| .fpu                 | Y   | N       | Y       | N       |
| checks FPUs          | N   | n/a     | (Y)     | n/a     |
|----------------------|-----|---------|---------|---------|

Implementation
--------------

Use of Table-gen
================

The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of preprocessor macros. There are several advantages of porting TargetParser to tablegen:
- more readable than the current macros
- allows default/optional values more easily
- we can generate code and documentation from the same source
- easier to add new properties

Drawbacks:
- it requires a new tablegen backend to generate the include files
- additional indirection which could make debugging and future changes more difficult

We think the benefits outweigh the disadvantages in this case.

To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
2. put the TargetParser for each backend in the library group for that backend. This requires one of:
    * Relaxing the requirement that target parsers must be built even if the backend is not.
    * Modifying the CMake scripts to build the target parsers even if the backend is not being built.

Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.

Using existing SubTarget features
=================================

If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.

We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct preprocessor feature test macros.

Consider this AArch64 march:
-march=armv8.4-a+crypto+nosha2

The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.

New Errors and Warnings
=======================

Whether these are errors or warnings by default is up for debate; this is a suggestion to begin with.
(These apply to both command lines and directives unless stated otherwise.)

Errors:
- unknown extension in an assembly directive (currently fails silently)
- extension incompatible with base arch, message shows the base arch it requires.
- extension requires another which is disabled later, message shows which one is required.
- extension requires another which is not enabled, message shows requirements.
- ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.

Warnings:
- ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
- mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
- mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)

Proposed diagnostic names: (in the same order as above)
- "target-feature" (top level group)
    - "incompatible-feature"
      - "extension-requirement-disabled"
    - "extension-requires"
    - "incompatible-fpu"
    - "implied-fpu-unused"
    - "mandatory-feature-ignored"
    - "mandatory-feature-disabled"
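
To make some of these checks concrete, here is a rough sketch of how they could be classified. The function `check_extensions`, the data tables and the architecture ordering are assumptions made for illustration. The minimum architecture for dotprod is inferred from the examples at the end of this document, and the pairing of diagnostic names to checks is one reading of the list above.

```python
# Illustrative data only; real tables would come from TargetParser.
ARCH_ORDER = ["armv8-a", "armv8.1-a", "armv8.2-a", "armv8.3-a", "armv8.4-a"]
MIN_ARCH = {"dotprod": "armv8.2-a"}  # assumed minimum base architecture
REQUIRES = {"crypto": ["simd"]}      # crypto requires simd (see References)


def check_extensions(base, modifiers, mandatory=()):
    """Return a list of (severity, diagnostic name, extension) tuples."""
    diags = []
    enabled = []
    for mod in modifiers:
        negated = mod.startswith("no")
        name = mod[2:] if negated else mod
        if name in mandatory:
            # Mandatory features can be neither added nor removed.
            kind = ("mandatory-feature-disabled" if negated
                    else "mandatory-feature-ignored")
            diags.append(("warning", kind, name))
            continue
        if negated:
            # Disabling something that an enabled extension depends on.
            for ext in enabled:
                if name in REQUIRES.get(ext, []):
                    diags.append(
                        ("error", "extension-requirement-disabled", ext))
            continue
        needed = MIN_ARCH.get(name)
        if needed and ARCH_ORDER.index(base) < ARCH_ORDER.index(needed):
            diags.append(("error", "incompatible-feature", name))
            continue
        enabled.append(name)
    return diags
```

In this sketch `check_extensions("armv8-a", ["crypto", "nosimd"])` reports an "extension-requirement-disabled" error for crypto, and `check_extensions("armv8-a", ["dotprod"])` reports "incompatible-feature".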

"Negative" Backend Features
===========================

There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
It would simplify the implementation if those were replaced with positive options: one that adds the top 16 D registers and one that enables double precision operations.

This is a relatively simple change to LLVM but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.
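
As an illustration of why this is a breaking change for library users, the sketch below translates a feature list from the negative form to a positive form. 'd16' and 'fp-only-sp' are the existing features named above. The replacement names 'd32' and 'fp64' and the helper `to_positive_features` are hypothetical, chosen only for this example.

```python
# 'd16' and 'fp-only-sp' are the existing negative features named above.
# 'd32' and 'fp64' are hypothetical names for their positive replacements.
NEGATIVE_TO_POSITIVE = {"d16": "d32", "fp-only-sp": "fp64"}


def to_positive_features(features):
    """Translate a list of '+name'/'-name' features to the positive form."""
    result = []
    for item in features:
        sign, name = item[0], item[1:]
        if name in NEGATIVE_TO_POSITIVE:
            # Enabling a capability-removing feature is the same as
            # disabling the positive feature, and vice versa.
            sign = "-" if sign == "+" else "+"
            name = NEGATIVE_TO_POSITIVE[name]
        result.append(sign + name)
    return result
```

Under these assumed names, `["+neon", "+d16", "-fp-only-sp"]` becomes `["+neon", "-d32", "+fp64"]`, so any client that builds feature strings by hand would need updating.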

.arch_extension Directive
=========================

Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.

Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.

We can handle this in a few ways:
- Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
- Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
- Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.

The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.
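
A minimal sketch of that last option is shown below. The class `DirectiveState` and its methods are hypothetical, the table is built only from the 'crypto' example above, and resetting the extension set on each '.arch' is an assumption rather than settled behaviour.

```python
# Illustrative table built from the 'crypto' example above.
CRYPTO_BY_ARCH = {
    "armv8.1-a": ["sha2", "aes"],
    "armv8.2-a": ["sha2", "aes"],
    "armv8.3-a": ["sha2", "aes"],
    "armv8.4-a": ["sha2", "aes", "sm4", "sha3"],
}


class DirectiveState:
    """Hypothetical assembler state that remembers the current base arch."""

    def __init__(self, command_line_arch):
        # The base starts out as whatever the command line implied.
        self.base = command_line_arch
        self.enabled = set()

    def arch(self, name):
        # Assumption: a new '.arch' replaces the base and resets extensions.
        self.base = name
        self.enabled = set()

    def arch_extension(self, name):
        # 'crypto' is expanded according to the base arch tracked so far.
        if name == "crypto":
            self.enabled.update(CRYPTO_BY_ARCH[self.base])
        else:
            self.enabled.add(name)
```

For example, starting from a command line of armv8.2-a, `.arch_extension crypto` enables sha2 and aes. After a later `.arch armv8.4-a`, the same directive also enables sm4 and sha3.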

ARM Assembly Directives
=======================

As discussed for AArch64, the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser, giving them access to a complete list of features.

'.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.

Current Command Line Option Examples
------------------------------------

Clang ARM
=========

Extensions can be used with '+<{no}extension>' syntax on march or mcpu, but there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.

$ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
(can't use dotprod with v8-a)

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(should be invalid but is allowed)

GCC ARM
=======

For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.

$ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
$ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o

$ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp

$ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(same example given for Clang above, should be invalid)

Clang AArch64
=============

The '+' syntax still applies but mfpu is replaced with '+' extensions.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o

Dependencies between extensions are not checked. For example, crypto requires simd, but simd can be disabled in the same march option.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o


It is a bit late to reply, but can the options be specified independently of "-march"? For example, "-march=armv8-a -mcrypto -mnosimd", similar to "-msse" and "-mavx" on x86.
This is for situations where certain packages (e.g. media packages) want to enable certain features based on runtime CPU detection.
To enable e.g. "crypto", they are also forced to choose a march, but that could override the architecture specified by the build system (or could get overridden by the -march specified by the build system). For example, it makes little sense for "-march=armv8-a+extension" to override the build system's "-march=armv8.3-a", or vice versa, when the only desire is to enable the specific extension additively.

The additive alternative is to use "-Xclang -target-feature -Xclang +feature" which is pretty ugly.
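
The conflict between the two "-march" options can be sketched as below, under the assumption that the last "-march" on the command line wins. The helper `effective_march` is hypothetical and is not a model of the real driver.

```python
def effective_march(args):
    """Return the '-march=' value in effect, assuming the last one wins."""
    value = None
    for arg in args:
        if arg.startswith("-march="):
            value = arg[len("-march="):]
    return value


# Build system flags first, then the package's own flags.
build_then_package = ["-march=armv8.3-a", "-march=armv8-a+crypto"]
# The same flags in the opposite order.
package_then_build = ["-march=armv8-a+crypto", "-march=armv8.3-a"]
```

In the first ordering the build system's armv8.3-a is lost. In the second the package's crypto extension is lost. Neither gives armv8.3-a with crypto added.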

Thanks

Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a, but the option is accepted.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o

GCC AArch64
===========

For GCC AArch64 mfpu is also dropped in favour of '+' extensions.

$ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?

Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.

$ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
$ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
$ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
(should not be allowed)
$ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
(should not be allowed)

Current Assembly Directive Examples
-----------------------------------

Clang .arch/.arch_extension
===========================

AArch64 uses .arch with '+' syntax. ARM uses .arch_extension and .arch, and does not support '+' syntax in either.

In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.

(example is using AArch64)
.arch armv8.2-a # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

.arch armv8.2-a+dotprod # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

ARM uses the .arch_extension directive which is one extension per use, with no '+'.

.arch armv7-a #error: instruction requires: crc armv8
CRC32B r0, r1, r2

.arch armv8-a+crc #error: Unknown arch name
CRC32B r0, r1, r2

.arch armv8-a # no error
.arch_extension crc
CRC32B r0, r1, r2

You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.

ARM does check validity of extensions provided with '.arch_extension'.

.arch armv7-a
.arch_extension crc
CRC32B r0, r1, r2

main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
.arch_extension crc

AArch64 only rejects known extensions that aren't supported at all.

.arch armv8-a+pan # unsupported architectural extension: pan
nop

Neither ARM nor AArch64 knows about the interdependencies between extensions, so the example from the command lines applies here too.

(example is using AArch64)
.arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
nop

GCC .arch/.arch_extension
=========================

GCC is more consistent across the two arches: both use .arch and .arch_extension, and neither understands the '+' syntax.

.arch armv8-a+crc # invalid

.arch armv8-a # valid
.arch_extension crc

.arch_extension crc # valid
.arch_extension crc+crypto #invalid

For extensions that vary based on base architecture, GCC tracks the last known arch.

Clang .fpu
==========

.fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:1: error: unknown directive
.fpu neon
^

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:6: error: Unknown FPU name
.fpu clearly-not-valid
     ^

(same example as 'Clang ARM' command lines, should be invalid)
$ cat /tmp/test.s
.fpu neon-fp16
$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o

GCC .fpu
========

.fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.

This is correctly rejected from a command line:
$ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2

Whereas the directive is accepted:
$ cat /tmp/test.s
.fpu neon
nop
$ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o

For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.

$ cat /tmp/test.s
.fpu neon
$ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
/tmp/test.s: Assembler messages:
/tmp/test.s:1: Error: unknown pseudo-op: `.fpu'

$ cat /tmp/test.s
.arch_extension simd
$ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o

References
----------

Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev

Hi Manoj,


Not too late at all, we have not got to that point of the work yet.


Are there examples of this kind of build setup that are available publicly? I think I understand the problem, but it'd help to see one in action, to see if there are any other Arm extensions that are already being added like this, and whether those systems support GCC and how.


Thanks,

David Spickett.


From: Manoj Gupta <[hidden email]>
Sent: 10 April 2019 16:34:16
To: David Spickett
Cc: [hidden email]; [hidden email]; nd
Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
 


On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:


Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev


On Wed, Apr 10, 2019 at 9:03 AM David Spickett <[hidden email]> wrote:

Hi Manoj,


Not too late at all, we have not got to that point of the work yet.


Are there examples of this kind of build setup that are available publicly? I think I understand the problem, but it'd help to see one in action, to see if there are any other Arm extensions that are already being added like this, and whether those systems support GCC and how.


One example where we had to use "-Xclang -target-feature ... " is here:

This had to be done because the Chrome OS build system passes the exact "-march=value" depending on the ISA supported by the Chromebook, but crc32c wants crc+crypto for its runtime CPU detection based code.

As a result, one of the "-march" options, either the one specified by Chrome OS or the one from the crc32c build, has to lose the race, as there was no other way to specify crc+crypto additively.

Thanks,
Manoj
 

Thanks,

David Spickett.


From: Manoj Gupta <[hidden email]>
Sent: 10 April 2019 16:34:16
To: David Spickett
Cc: [hidden email]; [hidden email]; nd
Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
 


On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:

Hi,

Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.

To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.

I look forward to your feedback.

Thanks,
David Spickett.



RFC New Clang target feature selection options for ARM/AArch64
--------------------------------------------------------------

In this RFC we propose changes to ARM and AArch64 target selection, with the following top-level goals:
- validate that given options make sense within architectural restrictions
- make option discovery and documentation easier
- unify the list of extensions that command lines and asm directives use
- bring the options closer to GCC's where appropriate

Current Options Comparison
--------------------------

                       | GCC           | Clang         |
                       |-------------------------------|
                       | ARM | AArch64 | ARM | AArch64 |
|----------------------|-----|---------|-----|---------|
| -march with '+<ext>' | Y   | Y       | Y   | Y       |
| checks extensions    | Y   | N       | N   | N       |
| .arch with '+<ext>'  | N   | N       | N   | Y       |
| .arch_extension      | Y   | Y       | Y   | N       |
| .fpu                 | Y   | N       | Y   | N       |
| -mfpu                | Y   | N       | Y   | N       |
| checks FPUs          | N   | n/a     | N   | n/a     |
|----------------------|-----|---------|-----|---------|

Examples of each of these can be found at the end of this document.

Problems With the Current Options
---------------------------------

- You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
- Combinations of options are not checked for compatibility.
- Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
- Users need to work out what FPU they need for ARM; this should be implied by the selected arch and extensions.
- Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.

Proposed solution
------------------

ARM and AArch64:
- Make the TargetParser the single source for extension names, removing the AsmParser tables.
- Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
- Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
- Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
- Emit a warning (and ignore the option) when a mandatory feature of the base architecture is enabled with '+extension' or disabled with '+noextension'.
- Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.
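
As a rough illustration of the "reject unknown names and list the valid ones" behaviour, here is a minimal Python sketch. The table and the function name check_extension are hypothetical; the armv8-a entry simply copies the list from the GCC ARM output quoted later in this document, and a real implementation would read it from TargetParser.

```python
# Illustrative table only. The armv8-a entry is copied from the GCC ARM
# output quoted later in this RFC; a real table would come from TargetParser.
VALID_EXTENSIONS = {
    "armv8-a": ["crc", "simd", "crypto", "nocrypto", "nofp"],
}


def check_extension(arch, ext):
    """Raise ValueError naming the valid extensions if `ext` is not accepted."""
    valid = VALID_EXTENSIONS[arch]
    if ext not in valid:
        raise ValueError(
            f"'{arch}' does not support feature '{ext}'; "
            f"valid feature names are: {' '.join(valid)}"
        )


check_extension("armv8-a", "crc")  # accepted, nothing is raised
```

Calling check_extension("armv8-a", "dotprod") raises, and the message carries the full list so the user does not have to look it up elsewhere.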

ARM:
- Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax (allow them to be recognised; they could still be rejected for compatibility).
- Add an 'auto' value for -mfpu and make it the default, meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
- Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
- Reject invalid arch/cpu and extension combinations with an error diagnostic.
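
The '-mfpu=auto' rules in this list could be sketched roughly as below. This is only an illustration in Python: the function select_fpu and both tables are hypothetical, and the FPU values in them are placeholders rather than real architecture data.

```python
# Placeholder tables for illustration; the values are assumptions, not the
# real architecture data.
IMPLIED_FPU = {"armv7-m": "none", "armv8-a": "fp-armv8"}
VALID_FPUS = {
    "armv7-m": {"none"},
    "armv8-a": {"none", "fp-armv8", "neon-fp-armv8"},
}


def select_fpu(arch, mfpu="auto"):
    """Return (fpu, warnings); raise ValueError for an incompatible FPU."""
    implied = IMPLIED_FPU[arch]
    if mfpu == "auto":
        # Default: the FPU is implied by march/mcpu.
        return implied, []
    if mfpu not in VALID_FPUS[arch]:
        valid = " ".join(sorted(VALID_FPUS[arch]))
        raise ValueError(
            f"'{mfpu}' is not valid for '{arch}'; valid FPUs are: {valid}"
        )
    warnings = []
    if mfpu != implied:
        # An explicit -mfpu wins, but the user is told what was overridden.
        warnings.append(f"-mfpu={mfpu} overrides implied FPU '{implied}'")
    return mfpu, warnings
```

With these placeholder tables, select_fpu("armv7-m", "neon-fp16") is rejected, which is the outcome the RFC wants for the '-march=armv7-m -mfpu=neon-fp16' example.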

Optional features
-----------------

AArch64:
- add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.

ARM:
- Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.

Options Comparison With the Proposed Solution
----------------------------------------------

Anything in brackets has changed from the previous table.

                       | GCC           | Clang             |
                       |-----------------------------------|
                       | ARM | AArch64 | ARM     | AArch64 |
|----------------------|-----|---------|---------|---------|
| -march with '+<ext>' | Y   | Y       | Y       | Y       |
| checks extensions    | Y   | N       | (Y)     | (Y)     |
| .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
| .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
| -mfpu                | Y   | N       | Y       | N       |
| .fpu                 | Y   | N       | Y       | N       |
| checks FPUs          | N   | n/a     | (Y)     | n/a     |
|----------------------|-----|---------|---------|---------|

Implementation
--------------

Use of Table-gen
================

The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of preprocessor macros. There are several advantages to porting TargetParser to tablegen:
- more readable than the current macros
- allows default/optional values more easily
- we can generate code and documentation from the same source
- easier to add new properties

Drawbacks:
- it requires a new tablegen backend to generate the include files
- additional indirection which could make debugging and future changes more difficult

We think the benefits outweigh the disadvantages in this case.

To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
2. put the TargetParser for each backend in the library group for that backend. This requires one of:
    * Relaxing the requirement that target parsers must be built even if the backend is not.
    * Modifying the CMake scripts to build the target parsers even if the backend is not being built.

Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.

Using existing SubTarget features
=================================

If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.

We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct preprocessor feature test macros.

Consider this AArch64 march:
-march=armv8.4-a+crypto+nosha2

The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.
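
To make the resolution concrete, here is a minimal Python sketch of the kind of dependency tracking Clang would need for this example. The tables and the names resolve_march, disable and feature_macros are hypothetical, not LLVM's data or API; the macro names follow the ACLE "__ARM_FEATURE_<NAME>" pattern and should be treated as illustrative.

```python
# Hypothetical tables for illustration; not LLVM's actual data.
EXPANSIONS = {("armv8.4-a", "crypto"): ["aes", "sha2", "sha3", "sm4"]}
REQUIRES = {"sha3": {"sha2"}}  # feature -> features it depends on


def disable(enabled, feature):
    """Remove `feature` and, recursively, everything that depends on it."""
    enabled.discard(feature)
    for dependent, needs in REQUIRES.items():
        if feature in needs and dependent in enabled:
            disable(enabled, dependent)


def resolve_march(march):
    """Return (base, enabled_features) for a '-march' value."""
    base, *modifiers = march.split("+")
    enabled = set()
    for mod in modifiers:
        # Naive: assumes no extension name itself starts with "no".
        if mod.startswith("no"):
            disable(enabled, mod[2:])
        else:
            enabled.update(EXPANSIONS.get((base, mod), [mod]))
    return base, enabled


def feature_macros(enabled):
    # Macro spellings follow the ACLE "__ARM_FEATURE_<NAME>" pattern;
    # treat the exact names as illustrative.
    return sorted(f"__ARM_FEATURE_{name.upper()}" for name in enabled)


base, enabled = resolve_march("armv8.4-a+crypto+nosha2")
print(base, sorted(enabled))
print(feature_macros(enabled))
```

The point of the sketch is that the front end, not only the backend, has to walk the dependency table; otherwise it would still define the SHA3 macro after '+nosha2'.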

New Errors and Warnings
=======================

Whether these are errors or warnings by default is up for debate. This is a suggestion to begin with.
(these apply to cmd lines and directives unless stated)

Errors:
- unknown extension in an assembly directive (currently fails silently)
- extension incompatible with base arch, message shows the base arch it requires.
- extension requires another which is disabled later, message shows which one is required.
- extension requires another which is not enabled, message shows requirements.
- ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.

Warnings:
- ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
- mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
- mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)

Proposed diagnostic names: (in the same order as above)
- "target-feature" (top level group)
    - "incompatible-feature"
      - "extension-requirement-disabled"
    - "extension-requires"
    - "incompatible-fpu"
    - "implied-fpu-unused"
    - "mandatory-feature-ignored"
    - "mandatory-feature-disabled"
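
A rough Python sketch of how some of these diagnostics might be produced from a single '-march' value follows. Everything here is an assumption for illustration: the Diagnostic type, the check_march function and the three small tables are hypothetical, and only a subset of the proposed names is covered ("extension-requires" and the FPU diagnostics are left out).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    severity: str  # "error" or "warning"
    name: str      # one of the proposed diagnostic names
    message: str


# Illustrative tables; the entries are assumptions based on this RFC's examples.
MIN_VERSION = {"dotprod": (8, 2)}   # extension -> earliest base arch
REQUIRES = {"crypto": {"simd"}}     # extension -> extensions it needs
MANDATORY = {"armv8.1-a": {"crc"}}  # base arch -> features always present


def arch_version(base):
    """'armv8.2-a' -> (8, 2); 'armv8-a' -> (8, 0)."""
    major, minor = re.fullmatch(r"armv(\d+)(?:\.(\d+))?-a", base).groups()
    return int(major), int(minor or 0)


def check_march(march):
    """Return the list of diagnostics for one '-march' value."""
    base, *mods = march.split("+")
    mandatory = MANDATORY.get(base, set())
    diags, enabled = [], []
    for mod in mods:
        if mod.startswith("no"):
            name = mod[2:]
            if name in mandatory:
                diags.append(Diagnostic("warning", "mandatory-feature-disabled",
                                        f"'{name}' is mandatory for {base}"))
                continue
            # Anything enabled earlier that needs `name` is now broken.
            for ext in enabled:
                if name in REQUIRES.get(ext, set()):
                    diags.append(Diagnostic("error", "extension-requirement-disabled",
                                            f"'{ext}' requires '{name}'"))
            if name in enabled:
                enabled.remove(name)
        elif mod in mandatory:
            diags.append(Diagnostic("warning", "mandatory-feature-ignored",
                                    f"'{mod}' is already part of {base}"))
        elif arch_version(base) < MIN_VERSION.get(mod, (0, 0)):
            major, minor = MIN_VERSION[mod]
            diags.append(Diagnostic("error", "incompatible-feature",
                                    f"'{mod}' requires armv{major}.{minor}-a or later"))
        else:
            enabled.append(mod)
    return diags
```

For example, check_march("armv8-a+crypto+nosimd") reports "extension-requirement-disabled" and check_march("armv8-a+dotprod") reports "incompatible-feature", matching the two command lines that are currently accepted silently.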

"Negative" Backend Features
===========================

There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
It would simplify the implementation if those were replaced with positive options. As in one that adds the top 16 D registers and one that enables double precision operations.

This is a relatively simple change to LLVM but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.

.arch_extension Directive
=========================

Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.

Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.

We can handle this in a few ways:
- Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
- Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
- Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.

The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.
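
Tracking the current base target could look roughly like the Python sketch below. The class AsmTargetState is hypothetical, and it assumes that a new '.arch' resets any previously added extensions; the 'crypto' expansions are the ones described above.

```python
# What 'crypto' expands to per base architecture, as described in this RFC.
CRYPTO_BY_ARCH = {
    "armv8.4-a": {"aes", "sha2", "sha3", "sm4"},
}
CRYPTO_DEFAULT = {"aes", "sha2"}


class AsmTargetState:
    """Hypothetical assembler state that remembers the current base target."""

    def __init__(self, base_arch):
        self.base_arch = base_arch  # initial value comes from the command line
        self.features = set()

    def arch(self, name):
        # Assumption: a new '.arch' resets previously added extensions.
        self.base_arch = name
        self.features = set()

    def arch_extension(self, name):
        # Expand the extension against whatever base is current right now.
        if name == "crypto":
            self.features |= CRYPTO_BY_ARCH.get(self.base_arch, CRYPTO_DEFAULT)
        else:
            self.features.add(name)


state = AsmTargetState("armv8.2-a")  # as if from -march=armv8.2-a
state.arch_extension("crypto")       # adds aes and sha2 only
state.arch("armv8.4-a")              # like a '.arch armv8.4-a' directive
state.arch_extension("crypto")       # now also adds sha3 and sm4
```

The same '.arch_extension crypto' line therefore yields a different feature set depending on the last '.arch' seen, which is the behaviour the third option describes.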

ARM Assembly Directives
=======================

As discussed for AArch64, the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser, giving them access to a complete list of features.

'.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.

Current Command Line Option Examples
------------------------------------

Clang ARM
=========

Extensions can be used with '+<{no}extension>' syntax on march or mcpu, but there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.

$ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
(can't use dotprod with v8-a)

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(should be invalid but is allowed)

GCC ARM
=======

For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.

$ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
$ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o

$ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp

$ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(same example given for Clang above, should be invalid)

Clang AArch64
=============

The '+' syntax still applies but mfpu is replaced with '+' extensions.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o

Dependencies within extensions are not checked. For example crypto requires simd, but it can be disabled in the same march option.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o


It is a bit late to reply, but can the options be specified independently of "-march", i.e. -march=armv8-a -mcrypto -mnosimd etc., similar to "-msse" and "-mavx" on x86?
This is for situations where certain packages, e.g. media packages, want to enable certain features based on runtime CPU detection.
To enable e.g. "crypto", they are also forced to choose a march, but that could override the architecture specified by the build system (or could get overridden by the -march specified by the build system). For example, it makes little sense for "-march=armv8-a+extension" to override the build system's "-march=armv8.3-a", or vice versa, when the only desire is to enable the specific extension additively.

The additive alternative is to use "-Xclang -target-feature -Xclang +feature", which is pretty ugly.

Thanks

Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a but it is allowed.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o

GCC AArch64
===========

For GCC AArch64 mfpu is also dropped in favour of '+' extensions.

$ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?

Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.

$ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
$ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
$ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
(should not be allowed)
$ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
(should not be allowed)

Current Assembly Directive Examples
-----------------------------------

Clang .arch/.arch_extension
===========================

AArch64 uses .arch and '+' syntax, ARM uses .arch_extension/.arch and does not support '+' syntax in either.

In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.

(example is using AArch64)
.arch armv8.2-a # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

.arch armv8.2-a+dotprod # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

ARM uses the .arch_extension directive which is one extension per use, with no '+'.

.arch armv7-a #error: instruction requires: crc armv8
CRC32B r0, r1, r2

.arch armv8-a+crc #error: Unknown arch name
CRC32B r0, r1, r2

.arch armv8-a # no error
.arch_extension crc
CRC32B r0, r1, r2

You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.

ARM does check validity of extensions provided with '.arch_extension'.

.arch armv7-a
.arch_extension crc
CRC32B r0, r1, r2

main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
.arch_extension crc

AArch64 only rejects known extensions that aren't supported at all.

.arch armv8-a+pan # unsupported architectural extension: pan
nop

Neither ARM nor AArch64 knows about the interdependencies between extensions, so the example from the command lines applies here too.

(example is using AArch64)
.arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
nop

GCC .arch/.arch_extension
=========================

GCC is more consistent across the two arches: both use .arch and .arch_extension, and neither understands the '+' syntax.

.arch armv8-a+crc # invalid

.arch armv8-a # valid
.arch_extension crc

.arch_extension crc # valid
.arch_extension crc+crypto #invalid

For extensions that vary based on base architecture, GCC tracks the last known arch.

Clang .fpu
==========

.fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:1: error: unknown directive
.fpu neon
^

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:6: error: Unknown FPU name
.fpu clearly-not-valid
     ^

(same example as 'Clang ARM' command lines, should be invalid)
$ cat /tmp/test.s
.fpu neon-fp16
$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o

GCC .fpu
========

.fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.

This is correctly rejected from a command line:
$ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2

Whereas the directive is accepted:
$ cat /tmp/test.s
.fpu neon
nop
$ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o

For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.

$ cat /tmp/test.s
.fpu neon
$ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
/tmp/test.s: Assembler messages:
/tmp/test.s:1: Error: unknown pseudo-op: `.fpu'

$ cat /tmp/test.s
.arch_extension simd
$ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o

References
----------

Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html


Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev

Hi Manoj,


I tried a few other options myself:

* function 'target' attribute - the list of extensions this supports isn't complete and it doesn't enable the ACLE macros needed for intrinsics

* manually defining ACLE macros - this allows intrinsics and is additive but assumes that you're not relying on codegen to emit instructions. I don't think it helps the bug linked from the GN source either. (https://crbug.com/934016)


So what you have is the best we can do right now.


Looking forward, the problem I have with a '-mcrypto' is twofold:

* What about other optional extensions? We'd need to add one for each, or at least every one people ask for and ideally get that into GCC too.

* crypto specifically can mean different things depending on the base architecture. From Clang's point of view that's fine as it just adds a target feature 'crypto'. However later we might want to allow people to select which set of crypto extensions is added and we hit the same issue. (maybe you'd go with -mcrypto as an 'auto' and -mcrypto8.4-a etc, which is also ugly)


The other option would be to allow -march without a base architecture. E.g.

-march=armv8-a+crc -march=+crypto


Or have them combine into some common set, which breaks existing behaviour:

-march=armv8-a+crc -march=armv8.4-a+dsp -> -march=armv8-a+dsp+crc


Which gets into a lot of issues around how you choose the set of features. Smallest subset to target the minimal core, or largest to allow CPU detection code to compile?


So allowing -march=+<ext> is the one that won't break existing builds, but it would be Clang only for now. I don't know enough to say whether the other architectures would be able to support that.
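
To show what I mean by -march without a base architecture, here is a small Python sketch of one possible way to combine several -march values. It is purely hypothetical (combine_march is not an existing Clang function), and it assumes that a full -march keeps today's last-one-wins behaviour while '+<ext>'-only values are collected separately and survive regardless of order.

```python
def combine_march(values):
    """Combine '-march' values given in command line order.

    A full value such as 'armv8-a+crc' replaces any earlier full value
    (the existing last-one-wins behaviour). A value with no base, such as
    '+crypto', is additive. Assumption: additive extensions are kept
    separately, so they apply whichever full value ends up winning.
    """
    base = None
    base_exts = []
    extra_exts = []
    for value in values:
        head, *mods = value.split("+")
        if head:
            base, base_exts = head, mods
        else:
            extra_exts.extend(mods)
    if base is None:
        raise ValueError("no base architecture given")
    result = []
    for ext in base_exts + extra_exts:
        if ext not in result:  # drop duplicates, keep first occurrence
            result.append(ext)
    return "+".join([base, *result])


print(combine_march(["armv8-a+crc", "+crypto"]))
```

Under that assumption a package can pass '+crypto' without knowing or overriding the base architecture chosen by the build system.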


My instinct is that this is something the build system needs to handle since it presumably has to support GCC as well. I understand that still leaves you specifying a base architecture per file, when you'd rather pull that from the main march.

(plus you're putting arch specific special cases in your build config, which isn't great either)


>>> As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.


Is this not deterministic? I would assume either Chrome OS always wins or crc32c always wins. Tell me if I'm wrong.


Thanks,

David Spickett.



From: Manoj Gupta <[hidden email]>
Sent: 10 April 2019 17:15:00
To: David Spickett
Cc: [hidden email]; [hidden email]; nd; Peter Smith
Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
 


On Wed, Apr 10, 2019 at 9:03 AM David Spickett <[hidden email]> wrote:

Hi Manoj,


Not too late at all, we have not got to that point of the work yet.


Are there examples of this kind of build setup that are available publicly? I think I understand the problem, but it would help to see one in action, to find out whether any other Arm extensions are already being added like this, and whether those systems support GCC and how.


One example where we had to use "-Xclang -target-feature ... " is here:
https://chromium.googlesource.com/chromium/src/+/refs/heads/master/third_party/crc32c/BUILD.gn#120

This had to be done because the Chrome OS build system passes the exact "-march=value" depending on the ISA supported by the Chromebook, but crc32c wants crc+crypto for its runtime CPU detection based code.

As a result, one of the "-march" options, either the one specified by Chrome OS or the one from the crc32c build, has to lose the race, as there was no other way to specify crc+crypto additively.

Thanks,
Manoj
 

Thanks,

David Spickett.


From: Manoj Gupta <[hidden email]>
Sent: 10 April 2019 16:34:16
To: David Spickett
Cc: [hidden email]; [hidden email]; nd
Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
 


On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:

Hi,

Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.

To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.

I look forward to your feedback.

Thanks,
David Spickett.



RFC New Clang target feature selection options for ARM/AArch64
--------------------------------------------------------------

In this RFC we propose changes to ARM and AArch64 target selection, with the following top-level goals:
- validate that given options make sense within architectural restrictions
- make option discovery and documentation easier
- unify the list of extensions that command lines and asm directives use
- bring the options closer to GCC's where appropriate

Current Options Comparison
--------------------------

                       | GCC           | Clang         |
                       |-------------------------------|
                       | ARM | AArch64 | ARM | AArch64 |
|----------------------|-----|---------|-----|---------|
| -march with '+<ext>' | Y   | Y       | Y   | Y       |
| checks extensions    | Y   | N       | N   | N       |
| .arch with '+<ext>'  | N   | N       | N   | Y       |
| .arch_extension      | Y   | Y       | Y   | N       |
| .fpu                 | Y   | N       | Y   | N       |
| -mfpu                | Y   | N       | Y   | N       |
| checks FPUs          | N   | n/a     | N   | n/a     |
|----------------------|-----|---------|-----|---------|

Examples of each of these can be found at the end of this document.

Problems With the Current Options
---------------------------------

- You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
- Combinations of options are not checked for compatibility.
- Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
- Users need to work out what FPU they need for ARM; this should be implied by the selected arch and extensions.
- Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.

Proposed solution
------------------

ARM and AArch64:
- Make the TargetParser the single source for extension names, removing the AsmParser tables.
- Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
- Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
- Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
- Emit a warning (and ignore the option) when a mandatory feature of the base architecture is enabled with '+extension' or disabled with '+noextension'.
- Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.

ARM:
- Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax (allow them to be recognised; they could still be rejected for compatibility).
- Add an 'auto' value for -mfpu and make it the default, meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
- Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
- Reject invalid arch/cpu and extension combinations with an error diagnostic.

Optional features
-----------------

AArch64:
- add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.

ARM:
- Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.

Options Comparison With the Proposed Solution
----------------------------------------------

Anything in brackets has changed from the previous table.

                       | GCC           | Clang             |
                       |-----------------------------------|
                       | ARM | AArch64 | ARM     | AArch64 |
|----------------------|-----|---------|---------|---------|
| -march with '+<ext>' | Y   | Y       | Y       | Y       |
| checks extensions    | Y   | N       | (Y)     | (Y)     |
| .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
| .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
| -mfpu                | Y   | N       | Y       | N       |
| .fpu                 | Y   | N       | Y       | N       |
| checks FPUs          | N   | n/a     | (Y)     | n/a     |
|----------------------|-----|---------|---------|---------|

Implementation
--------------

Use of Table-gen
================

The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of preprocessor macros. There are several advantages to porting TargetParser to tablegen:
- more readable than the current macros
- allows default/optional values more easily
- we can generate code and documentation from the same source
- easier to add new properties

Drawbacks:
- it requires a new tablegen backend to generate the include files
- additional indirection which could make debugging and future changes more difficult

We think the benefits outweigh the disadvantages in this case.

To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
2. put the TargetParser for each backend in the library group for that backend. This requires one of:
    * Relaxing the requirement that target parsers must be built even if the backend is not.
    * Modifying the CMake scripts to build the target parsers even if the backend is not being built.

Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.

Using existing SubTarget features
=================================

If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.

We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct preprocessor feature test macros.

Consider this AArch64 march:
-march=armv8.4-a+crypto+nosha2

The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.

New Errors and Warnings
=======================

Whether these are errors or warnings by default is up for debate. This is a suggestion to begin with.
(these apply to cmd lines and directives unless stated)

Errors:
- unknown extension in an assembly directive (currently fails silently)
- extension incompatible with base arch, message shows the base arch it requires.
- extension requires another which is disabled later, message shows which one is required.
- extension requires another which is not enabled, message shows requirements.
- ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.

Warnings:
- ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
- mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
- mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)

Proposed diagnostic names: (in the same order as above)
- "target-feature" (top level group)
    - "incompatible-feature"
      - "extension-requirement-disabled"
    - "extension-requires"
    - "incompatible-fpu"
    - "implied-fpu-unused"
    - "mandatory-feature-ignored"
    - "mandatory-feature-disabled"
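
To make the split between errors and warnings concrete, here is a rough sketch of a checker that produces some of the diagnostics listed above for an AArch64 -march value. The compatibility tables are assumptions for illustration (only 'dotprod needs v8.2-a' and 'crypto requires simd' come from the examples in this document), and check_march is a hypothetical name.

```python
# Assumed tables for illustration; not the real TargetParser data.
ARCH_ORDER = ["armv8-a", "armv8.1-a", "armv8.2-a", "armv8.3-a", "armv8.4-a"]
MIN_ARCH = {"dotprod": "armv8.2-a"}   # extension -> earliest base arch
REQUIRES = {"crypto": ["simd"]}       # extension -> extensions it needs
MANDATORY = {"armv8.1-a": {"crc"}}    # arch -> features it makes mandatory


def check_march(march):
    """Return a list of (severity, diagnostic name, extension) tuples."""
    base, *mods = march.split("+")
    base_index = ARCH_ORDER.index(base)

    # Mandatory features accumulate from earlier architectures.
    mandatory = set()
    for arch in ARCH_ORDER[: base_index + 1]:
        mandatory |= MANDATORY.get(arch, set())

    enabled, disabled, diags = [], set(), []
    for mod in mods:
        if mod.startswith("no"):
            name = mod[2:]
            if name in mandatory:
                diags.append(("warning", "mandatory-feature-disabled", name))
            else:
                disabled.add(name)
        elif mod in mandatory:
            diags.append(("warning", "mandatory-feature-ignored", mod))
        elif ARCH_ORDER.index(MIN_ARCH.get(mod, ARCH_ORDER[0])) > base_index:
            diags.append(("error", "incompatible-feature", mod))
        else:
            enabled.append(mod)

    # An enabled extension whose requirement was disabled is an error.
    for ext in enabled:
        for needed in REQUIRES.get(ext, []):
            if needed in disabled:
                diags.append(("error", "extension-requirement-disabled", ext))
    return diags


print(check_march("armv8-a+crypto+nosimd"))
print(check_march("armv8-a+dotprod"))
```

Both command lines are accepted silently today, as the examples at the end of the document show.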

"Negative" Backend Features
===========================

There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
It would simplify the implementation if those were replaced with positive options: one that adds the top 16 D registers and one that enables double precision operations.

This is a relatively simple change to LLVM, but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.
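
To illustrate why this is a breaking change for library users, the sketch below rewrites a backend feature list from the negative form into a positive one. The replacement names 'd32' and 'fp64' are hypothetical (this document does not name the positive features), as is the helper to_positive.

```python
# Hypothetical positive replacements for the two negative features.
POSITIVE_NAME = {"d16": "d32", "fp-only-sp": "fp64"}


def to_positive(features):
    """Rewrite '+d16' style entries using the assumed positive names."""
    out = []
    for feat in features:
        sign, name = feat[0], feat[1:]
        if name in POSITIVE_NAME:
            # Enabling a negative feature means disabling its positive
            # counterpart, and vice versa.
            flipped = "-" if sign == "+" else "+"
            out.append(flipped + POSITIVE_NAME[name])
        else:
            out.append(feat)
    return out


print(to_positive(["+vfp3", "+d16", "+fp-only-sp"]))
```

Any tool that passes '+d16' or '+fp-only-sp' to LLVM directly would need an equivalent rewrite, which is where the test churn comes from.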

.arch_extension Directive
=========================

Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.

Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.

We can handle this in a few ways:
- Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
- Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
- Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.

The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.
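
A rough sketch of that last option follows: the assembler keeps the current base architecture and expands 'crypto' against it each time '.arch_extension' is seen. The class name, the expansion table and the choice to reset the feature set on '.arch' are assumptions for illustration, not a description of the existing AsmParser.

```python
# Assumed expansion table, taken from the 'crypto' example above.
CRYPTO_EXPANSION = {
    "armv8.2-a": ["aes", "sha2"],
    "armv8.4-a": ["aes", "sha2", "sha3", "sm4"],
}


class AsmTargetState:
    """Tracks the base target implied by the command line or by .arch."""

    def __init__(self, base_from_command_line):
        self.base = base_from_command_line
        self.features = set()

    def directive(self, line):
        name, _, arg = line.strip().partition(" ")
        arg = arg.strip()
        if name == ".arch":
            # Assumption: a new base architecture resets the extensions.
            self.base = arg
            self.features = set()
        elif name == ".arch_extension":
            if arg == "crypto":
                # Meaning depends on the base architecture being tracked.
                self.features.update(CRYPTO_EXPANSION[self.base])
            else:
                self.features.add(arg)


state = AsmTargetState("armv8.2-a")
state.directive(".arch_extension crypto")
print(sorted(state.features))
state.directive(".arch armv8.4-a")
state.directive(".arch_extension crypto")
print(sorted(state.features))
```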

ARM Assembly Directives
=======================

As discussed for AArch64, the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser, giving them access to a complete list of features.

'.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.

Current Command Line Option Examples
------------------------------------

Clang ARM
=========

Extensions can be used with '+<{no}extension>' syntax on march or mcpu, but there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.

$ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
(can't use dotprod with v8-a)

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(should be invalid but is allowed)
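
For comparison, the FPU checks proposed under "New Errors and Warnings" would reject the last command. A minimal sketch of that check, assuming abbreviated compatibility tables (the FPU lists and the implied defaults below are illustrative guesses, not Clang's real data) and a hypothetical select_fpu helper:

```python
# Assumed, abbreviated tables for illustration; not Clang's real data.
VALID_FPUS = {
    "armv7-m": ["none"],
    "armv8-a": ["none", "fp-armv8", "neon-fp-armv8", "crypto-neon-fp-armv8"],
}
IMPLIED_FPU = {"armv7-m": "none", "armv8-a": "neon-fp-armv8"}


def select_fpu(march, mfpu="auto"):
    """Return (chosen_fpu, diagnostics) for a base arch and -mfpu value."""
    implied = IMPLIED_FPU[march]
    if mfpu == "auto":
        # The FPU is implied by the architecture.
        return implied, []
    if mfpu not in VALID_FPUS[march]:
        # Error carries the list of valid FPUs for the message.
        return None, [("error", "incompatible-fpu", VALID_FPUS[march])]
    if mfpu != implied:
        # Explicit -mfpu wins; warn about what was overridden.
        return mfpu, [("warning", "implied-fpu-unused", implied)]
    return mfpu, []


print(select_fpu("armv7-m", "neon-fp16"))
print(select_fpu("armv8-a", "fp-armv8"))
```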

GCC ARM
=======

For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.

$ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
$ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o

$ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp

$ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(same example given for Clang above, should be invalid)

Clang AArch64
=============

The '+' syntax still applies but mfpu is replaced with '+' extensions.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o

Dependencies between extensions are not checked. For example, crypto requires simd, but simd can be disabled in the same march option.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o


It is a bit late to reply, but can the options be specified independently of "-march", i.e. "-march=armv8-a -mcrypto -mnosimd" etc., similar to "-msse" and "-mavx" on x86?
This is for situations where certain packages, e.g. media packages, want to enable certain features based on runtime CPU detection.
To enable e.g. "crypto", they are also forced to choose a march, but that could override the architecture specified by the build system (or could get overridden by the -march specified by the build system). E.g. it makes little sense for "-march=armv8-a+extension" to override the build system's "-march=armv8.3-a", and vice versa, when the only desire is to enable the specific extension additively.

The additive alternative is to use "-Xclang -target-feature -Xclang +feature" which is pretty ugly.

Thanks

Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a but it is allowed.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o

GCC AArch64
===========

For GCC AArch64 mfpu is also dropped in favour of '+' extensions.

$ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?

Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.

$ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
$ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
$ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
(should not be allowed)
$ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
(should not be allowed)

Current Assembly Directive Examples
-----------------------------------

Clang .arch/.arch_extension
===========================

AArch64 uses .arch with '+' syntax; ARM uses .arch_extension/.arch and does not support '+' syntax in either.

In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.

(example is using AArch64)
.arch armv8.2-a # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

.arch armv8.2-a+dotprod # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

ARM uses the .arch_extension directive which is one extension per use, with no '+'.

.arch armv7-a #error: instruction requires: crc armv8
CRC32B r0, r1, r2

.arch armv8-a+crc #error: Unknown arch name
CRC32B r0, r1, r2

.arch armv8-a # no error
.arch_extension crc
CRC32B r0, r1, r2

You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.

ARM does check validity of extensions provided with '.arch_extension'.

.arch armv7-a
.arch_extension crc
CRC32B r0, r1, r2

main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
.arch_extension crc

AArch64 only rejects known extensions that aren't supported at all.

.arch armv8-a+pan # unsupported architectural extension: pan
nop

Neither ARM nor AArch64 knows about the interdependencies between extensions, so the example from the command lines applies here too.

(example is using AArch64)
.arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
nop

GCC .arch/.arch_extension
=========================

GCC is more consistent across the two arches: both use .arch and .arch_extension, and neither understands the '+' syntax.

.arch armv8-a+crc # invalid

.arch armv8-a # valid
.arch_extension crc

.arch_extension crc # valid
.arch_extension crc+crypto #invalid

For extensions that vary based on base architecture, GCC tracks the last known arch.

Clang .fpu
==========

.fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:1: error: unknown directive
.fpu neon
^

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:6: error: Unknown FPU name
.fpu clearly-not-valid
     ^

(same example as 'Clang ARM' command lines, should be invalid)
$ cat /tmp/test.s
.fpu neon-fp16
$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o

GCC .fpu
========

.fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.

This is correctly rejected from a command line:
$ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2

Whereas the directive is accepted:
$ cat /tmp/test.s
.fpu neon
nop
$ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o

For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.

$ cat /tmp/test.s
.fpu neon
$ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
/tmp/test.s: Assembler messages:
/tmp/test.s:1: Error: unknown pseudo-op: `.fpu'

$ cat /tmp/test.s
.arch_extension simd
$ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o

References
----------

Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev
Hi David,

Thanks for the detailed information.

On Tue, Apr 16, 2019 at 2:58 AM David Spickett <[hidden email]> wrote:

Hi Manoj,


I tried a few other options myself:

* function 'target' attribute - the list of extensions this supports isn't complete and it doesn't enable the ACLE macros needed for intrinsics

* manually defining ACLE macros - this allows intrinsics and is additive but assumes that you're not relying on codegen to emit instructions. I don't think it helps the bug linked from the GN source either. (https://crbug.com/934016)


So what you have is the best we can do right now.


Looking forward, the problem I have with a '-mcrypto' is twofold:

* What about other optional extensions? We'd need to add one for each, or at least every one people ask for and ideally get that into GCC too.

Yes, ideally we want this per extension. crypto is where we have the problem now but it can/will change in future.
 

* crypto specifically can mean different things depending on the base architecture. From Clang's point of view that's fine as it just adds a target feature 'crypto'. However later we might want to allow people to select which set of crypto extensions is added and we hit the same issue. (maybe you'd go with -mcrypto as an 'auto' and -mcrypto8.4-a etc, which is also ugly)


The other option would be to allow -march without a base architecture. E.g.

-march=armv8-a+crc -march=+crypto


Specifying -march without a base architecture is fine and probably would work for our problem but I feel it is not too different than specifying each feature individually. However, if it is too cumbersome to specify each feature independently like "-mcrypto"/"-mdsp" , my vote is for this.
 

Or have them combine into some common set, which breaks existing behaviour:

-march=armv8-a+crc -march=armv8.4-a+dsp -> -march=armv8-a+dsp+crc


Which gets into a lot of issues around how you choose the set of features. Smallest subset to target the minimal core, or largest to allow CPU detection code to compile?

I suspect that most code would be written to have the lowest "arch"+feature, since it is only the feature that the code was written to handle with runtime detection. Do any more, and the program could crash because features that the CPU doesn't support got enabled.


So allowing march=+<ext> is the one that won't break existing builds but would be Clang-only for now. I don't know enough to say whether the other architectures would be able to support that.
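
A small sketch of how the base-less form could combine with an earlier -march is given below. It assumes that a full -march value replaces whatever came before it and that '-march=+<ext>' only adds to the current base; this option does not exist in Clang or GCC today and combine_march is a hypothetical name.

```python
def combine_march(values):
    """Fold a sequence of -march values into one effective value."""
    base, exts = None, []
    for value in values:
        head, *mods = value.split("+")
        if head:
            # Full form: assumed to replace whatever came before it.
            base, exts = head, list(mods)
        elif base is None:
            raise ValueError("'+<ext>' given before any base architecture")
        else:
            # Base-less form: only adds extensions to the current base.
            for mod in mods:
                if mod not in exts:
                    exts.append(mod)
    if base is None:
        raise ValueError("no base architecture given")
    return "+".join([base, *exts])


# Build system picks the base; the package adds crypto on top of it.
print(combine_march(["armv8.3-a", "+crypto"]))
print(combine_march(["armv8-a+crc", "+crypto"]))
```

Under these assumptions the build system's base architecture is preserved and the package only contributes the extension it needs.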


My instinct is that this is something the build system needs to handle since it presumably has to support GCC as well. I understand that still leaves you specifying a base architecture per file, when you'd rather pull that from the main march.

(plus you're putting arch specific special cases in your build config, which isn't great either)

I believe we'll have the same problem with GCC as well, if we were using GCC. 


>>> As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.


Is this not deterministic? I would assume either Chrome OS always wins or crc32c always wins. Tell me if I'm wrong.

 

Right now, it is the Chrome OS flags that win, but that means the crypto feature is disabled for those files, resulting in a compile error. It could also be the other way round, depending on the build system. Either way, it would be best to avoid this problem.

Thanks,
Manoj

Thanks,

David Spickett.



From: Manoj Gupta <[hidden email]>
Sent: 10 April 2019 17:15:00
To: David Spickett
Cc: [hidden email]; [hidden email]; nd; Peter Smith
Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
 


On Wed, Apr 10, 2019 at 9:03 AM David Spickett <[hidden email]> wrote:

Hi Manoj,


Not too late at all, we have not got to that point of the work yet.


Are there examples of this kind of build setup that are available publicly? I think I understand the problem but it'd help to see one in action. To see if there are any other Arm extensions that are already being added like this and whether those systems support GCC and how.


One example where we had to use "-Xclang -target-feature ... " is here:

This had to be done because the Chrome OS build system passes the exact "-march=value" depending on the ISA supported by the Chromebook, but crc32c wants crc+crypto for its runtime CPU detection based code.

As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.

Thanks,
Manoj
 

Thanks,

David Spickett.


From: Manoj Gupta <[hidden email]>
Sent: 10 April 2019 16:34:16
To: David Spickett
Cc: [hidden email]; [hidden email]; nd
Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
 


On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:

Hi,

Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.

To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.

I look forward to your feedback.

Thanks,
David Spickett.



RFC New Clang target feature selection options for ARM/AArch64
--------------------------------------------------------------

In this RFC we propose changes to ARM and AArch64 target selection. With the top level goals to:
- validate that given options make sense within architectural restrictions
- make option discovery and documentation easier
- unify the list of extensions that command lines and asm directives use
- bring the options closer to GCC's where appropriate

Current Options Comparison
--------------------------

                       | GCC           | Clang         |
                       |-------------------------------|
                       | ARM | AArch64 | ARM | AArch64 |
|----------------------|-----|---------|-----|---------|
| -march with '+<ext>' | Y   | Y       | Y   | Y       |
| checks extensions    | Y   | N       | N   | N       |
| .arch with '+<ext>'  | N   | N       | N   | Y       |
| .arch_extension      | Y   | Y       | Y   | N       |
| .fpu                 | Y   | N       | Y   | N       |
| -mfpu                | Y   | N       | Y   | N       |
| checks FPUs          | N   | n/a     | N   | n/a     |
|----------------------|-----|---------|-----|---------|

Examples of each of these can be found at the end of this document.

Problems With the Current Options
---------------------------------

- You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
- Combinations of options are not checked for compatibility.
- Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
- Users need to work out what FPU they need for ARM, this should be implied by the selected arch and extensions.
- Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.

Proposed solution
------------------

ARM and AArch64:
- Make the TargetParser the single source for extension names, removing the AsmParser tables.
- Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
- Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
- Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
- Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)
- Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.

ARM:
- Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax
(allow them to be recognised, they could still be rejected for compatibility).
- Add an 'auto' value for -mfpu and make it the default. Meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
- Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
- Reject invalid arch/cpu and extension combinations with an error diagnostic.

Optional features
-----------------

AArch64:
- add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.

ARM:
- Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.

Options Comparison With the Proposed Solution
----------------------------------------------

Anything in brackets has changed from the previous table.

                       | GCC           | Clang             |
                       |-----------------------------------|
                       | ARM | AArch64 | ARM     | AArch64 |
|----------------------|-----|---------|---------|---------|
| -march with '+<ext>' | Y   | Y       | Y       | Y       |
| checks extensions    | Y   | N       | (Y)     | (Y)     |
| .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
| .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
| -mfpu                | Y   | N       | Y       | N       |
| .fpu                 | Y   | N       | Y       | N       |
| checks FPUs          | N   | n/a     | (Y)     | n/a     |
|----------------------|-----|---------|---------|---------|

Implementation
--------------

Use of Table-gen
================

The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of pre processor macros. There are several advantages of porting TargetParser to tablegen:
- more readable than the current macros
- allows default/optional values more easily
- we can generate code and documentation from the same source
- easier to add new properties

Drawbacks:
- it requires a new tablegen backend to generate the include files
- additional indirection which could make debugging and future changes more difficult

We think the benefits outweigh the disadvantages in this case.

To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
2. put the TargetParser for each backend in the library group for that backend. This requires one of:
    * Relaxing the requirement that target parsers must be built even if the backend is not.
    * Modifying the CMake scripts to build the target parsers even if the backend is not being built.

Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.

Using existing SubTarget features
=================================

If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.

We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct pre processor feature test macros.

Consider this AArch64 march:
-march=armv8.4-a+crypto+nosha2

The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependant on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.

New Errors and Warnings
=======================

Whether these are errors or warnings by default is up for debate. This is a suggestion to begin with.
(these apply to cmd lines and directives unless stated)

Errors:
- unknown extension in an assembly directive (currently fails silently)
- extension incompatible with base arch, message shows the base arch it requires.
- extension requires another which is disabled later, message shows which one is required.
- extension requires another which is not enabled, message shows requirements.
- ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.

Warnings:
- ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
- mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
- mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)

Proposed diagnostic names: (in the same order as above)
- "target-feature" (top level group)
    - "incompatible-feature"
      - "extension-requirement-disabled"
    - "extension-requires"
    - "incompatible-fpu"
    - "implied-fpu-unused"
    - "mandatory-feature-ignored"
    - "mandatory-feature-disabled"

"Negative" Backend Features
===========================

There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
It would simplify the implementation if those were replaced with positive options. As in one that adds the top 16 D registers and one that enables double precision operations.

This is a relatively simple change to LLVM but it will effect a large number of tests and would be a breaking change for users of LLVM as a library.

.arch_extension Directive
=========================

Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.

Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.

We can handle this in a few of ways:
- Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
- Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
- Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.

The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.

ARM Assembly Directives
=======================

As discussed for AArch64 the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser. Giving them access to a complete list of features.

'.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.

Current Command Line Option Examples
------------------------------------

Clang ARM
=========

Extensions can be used with '+<{no}extension>' syntax on march or mcpu, there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.

$ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
(can't use dotprod with v8-a)

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(should be invalid but is allowed)

GCC ARM
=======

For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.

$ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
$ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o

$ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp

$ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
(same example given for Clang above, should be invalid)

Clang AArch64
=============

The '+' syntax still applies but mfpu is replaced with '+' extensions.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
$ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o

Dependencies within extensions are not checked. For example crypto requires simd, but it can be disabled in the same march option.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o


It is a bit late to reply but can the options be specified independently of "-march". i.e.  -march=armv8-a -mcrypto -mnosimd etc. similar to "-msse", "-mavx" on x86.
This is for situations where certain packages e.g. media packages want to enable certain features based on runtime cpu detection.
To enable e.g. "crypto", they are also forced to choose a march, but that could override the architecture specified by the build system 
( or could get overridden by the -march specified by build system). e.g. it makes little sense for "-march=armv8-a+extension" to override the build system "-march=armv8.3-a"
 and vice-versa when the only desire is to enable the specific extension additively.

The additive alternative is to use "-Xclang -target-feature -Xclang +feature" which is pretty ugly.

Thanks

Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a but it is allowed.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o

GCC AArch64
===========

For GCC AArch64 mfpu is also dropped in favour of '+' extensions.

$ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?

Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.

$ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
$ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
$ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
(should not be allowed)
$ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
(should not be allowed)

Current Assembly Directive Examples
-----------------------------------

Clang .arch/.arch_extension
===========================

AArch64 uses .arch and '+' syntax, ARM uses .arch_extension/.arch and does not support '+' syntax in either.

In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.

(example is using AArch64)
.arch armv8.2-a # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

.arch armv8.2-a+dotprod # error: instruction requires: dotprod
udot v0.2s, v1.8b, v2.8b

ARM uses the .arch_extension directive which is one extension per use, with no '+'.

.arch armv7-a #error: instruction requires: crc armv8
CRC32B r0, r1, r2

.arch armv8-a+crc #error: Unknown arch name
CRC32B r0, r1, r2

.arch armv8-a # no error
.arch_extension crc
CRC32B r0, r1, r2

You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.

ARM does check validity of extensions provided with '.arch_extension'.

.arch armv7-a
.arch_extension crc
CRC32B r0, r1, r2

main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
.arch_extension crc

AArch64 only rejects known extensions that aren't supported at all.

.arch armv8-a+pan # unsupported architectural extension: pan
nop

Neither ARM nor AArch64 knows about the interdependencies between extensions, so the example from the command lines applies here too.

(example is using AArch64)
.arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
nop
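
As a rough illustration of the dependency check that is missing here, the following Python sketch resolves a '+ext'/'+noext' string and reports extensions whose requirements ended up disabled. It is only a model, not the AsmParser or TargetParser: REQUIRES, BASE_DEFAULTS and resolve are hypothetical names, the only dependency listed is crypto requiring simd, and it assumes the base architecture enables fp and simd by default.

```python
# Hypothetical dependency table; only the crypto -> simd edge is taken from this RFC.
REQUIRES = {"crypto": {"simd"}}
# Assumption for this sketch: the base architecture turns on fp and simd.
BASE_DEFAULTS = {"fp", "simd"}

def resolve(arch_string):
    """Apply modifiers left to right, then check every enabled extension."""
    _base, *mods = arch_string.split("+")
    enabled = set(BASE_DEFAULTS)
    for mod in mods:
        if mod.startswith("no"):
            enabled.discard(mod[2:])   # '+nosimd' switches simd off
        else:
            enabled.add(mod)           # '+crypto' switches crypto on
    errors = []
    for ext in sorted(enabled):
        for dep in sorted(REQUIRES.get(ext, set()) - enabled):
            errors.append(f"extension '{ext}' requires '{dep}', which is not enabled")
    return enabled, errors

enabled, errors = resolve("armv8-a+crypto+nosimd")
print(errors)
```

Checking after all modifiers have been applied, rather than one at a time, means the order '+nosimd+crypto' and '+crypto+nosimd' are diagnosed the same way.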

GCC .arch/.arch_extension
=========================

GCC is more consistent across the two arches: both use .arch and .arch_extension, and neither understands the '+' syntax.

.arch armv8-a+crc # invalid

.arch armv8-a # valid
.arch_extension crc

.arch_extension crc # valid
.arch_extension crc+crypto #invalid

For extensions that vary based on base architecture, GCC tracks the last known arch.
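
A small Python sketch of what "tracking the last known arch" could look like follows. It is an illustration only, not how GAS is implemented: DirectiveTracker and expand are hypothetical names, the crypto expansion follows the description elsewhere in this RFC (sha2 and aes before v8.4-a, plus sm4 and sha3 on v8.4-a), and resetting the feature set on each .arch is an assumption of the sketch.

```python
def expand(ext, base):
    """Expand an extension name according to the current base architecture."""
    if ext == "crypto":
        if base == "armv8.4-a":
            return {"aes", "sha2", "sha3", "sm4"}
        return {"aes", "sha2"}
    return {ext}

class DirectiveTracker:
    """Remembers the base arch set by the command line or the last .arch."""

    def __init__(self, base="armv8-a"):
        self.base = base
        self.features = set()

    def handle(self, line):
        directive, _, arg = line.partition(" ")
        arg = arg.strip()
        if "+" in arg:
            # Mirrors the GCC behaviour shown above: no '+' syntax in directives.
            raise ValueError(f"'+' is not accepted in '{directive}'")
        if directive == ".arch":
            self.base = arg
            self.features = set()  # assumption: a new .arch starts from scratch
        elif directive == ".arch_extension":
            self.features |= expand(arg, self.base)
        else:
            raise ValueError(f"unknown directive '{directive}'")

t = DirectiveTracker()
t.handle(".arch armv8.2-a")
t.handle(".arch_extension crypto")
print(sorted(t.features))
```

The point of the design is that '.arch_extension crypto' carries no architecture of its own, so its meaning has to be looked up against whatever base was seen last.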

Clang .fpu
==========

.fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.

$ ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:1: error: unknown directive
.fpu neon
^

$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
/tmp/test.s:1:6: error: Unknown FPU name
.fpu clearly-not-valid
     ^

(same example as 'Clang ARM' command lines, should be invalid)
$ cat /tmp/test.s
.fpu neon-fp16
$ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o

GCC .fpu
========

.fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.

This is correctly rejected from a command line:
$ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2

Whereas the directive is accepted:
$ cat /tmp/test.s
.fpu neon
nop
$ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o

For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.

$ cat /tmp/test.s
.fpu neon
$ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
/tmp/test.s: Assembler messages:
/tmp/test.s:1: Error: unknown pseudo-op: `.fpu'

$ cat /tmp/test.s
.arch_extension simd
$ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o

References
----------

Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev
On Tue, 16 Apr 2019 at 20:35, Manoj Gupta via llvm-dev
<[hidden email]> wrote:

>
> Hi David,
>
> Thanks for the detailed information.
>
> On Tue, Apr 16, 2019 at 2:58 AM David Spickett <[hidden email]> wrote:
>>
>> Hi Manoj,
>>
>>
>> I tried a few other options myself:
>>
>> * function 'target' attribute - the list of extensions this supports isn't complete and it doesn't enable the ACLE macros needed for intrinsics
>>
>> * manually defining ACLE macros - this allows intrinsics and is additive but assumes that you're not relying on codegen to emit instructions. I don't think it helps the bug linked from the GN source either. (https://crbug.com/934016)
>>
>>
>> So what you have is the best we can do right now.
>>
>>
>> Looking forward, the problem I have with a '-mcrypto' option is twofold:
>>
>> * What about other optional extensions? We'd need to add one for each, or at least every one people ask for and ideally get that into GCC too.
>
> Yes, ideally we want this per extension. crypto is where we have the problem now but it can/will change in future.
>
>>
>> * crypto specifically can mean different things depending on the base architecture. From Clang's point of view that's fine as it just adds a target feature 'crypto'. However later we might want to allow people to select which set of crypto extensions is added and we hit the same issue. (maybe you'd go with -mcrypto as an 'auto' and -mcrypto8.4-a etc, which is also ugly)
>>
>>
>> The other option would be to allow -march without a base architecture. E.g.
>>
>> -march=armv8-a+crc -march=+crypto
>>
>>
> Specifying -march without a base architecture is fine and probably would work for our problem but I feel it is not too different than specifying each feature individually. However, if it is too cumbersome to specify each feature independently like "-mcrypto"/"-mdsp" , my vote is for this.
>
>>
>> Or have them combine into some common set, which breaks existing behaviour:
>>
>> -march=armv8-a+crc -march=armv8.4-a+dsp -> -march=armv8-a+dsp+crc
>>
>>
>> Which gets into a lot of issues around how you choose the set of features. Smallest subset to target the minimal core, or largest to allow CPU detection code to compile?
>
> I suspect that most code would be written to have the lowest "arch"+feature since it is only the feature that the code was written to handle with runtime detection. Enable any more, and the program could crash because features that the CPU doesn't support got enabled.
>>
>>
>> So allowing march=+<ext> is the one that won't break existing builds but would be Clang only for now. I don't know enough to say whether the other Architectures would be able to support that.
>>
>>
>> My instinct is that this is something the build system needs to handle since it presumably has to support GCC as well. I understand that still leaves you specifying a base architecture per file, when you'd rather pull that from the main march.
>>
>> (plus you're putting arch specific special cases in your build config, which isn't great either)
>
> I believe we'll have the same problem with GCC as well, if we were using GCC.
>>
>>
>> >>> As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.
>>
>>
>> Is this not deterministic? I would assume either Chrome OS always wins or crc32c always wins. Tell me if I'm wrong.
>>
>>
>
> Right now, it is the Chrome OS flags that win, but that means the crypto feature is disabled for those files, resulting in a compile error. It could also be the other way round, depending on the build system. Either way, it would be best to avoid this problem.
>

Just a thought: would it be possible to use an assembly directive like
".arch armv8-a+crypto" in the source files that need the crypto
extensions, which would then override the command line options?

Peter

> Thanks,
> Manoj
>
>> Thanks,
>>
>> David Spickett.
>>
>>
>> ________________________________
>> From: Manoj Gupta <[hidden email]>
>> Sent: 10 April 2019 17:15:00
>> To: David Spickett
>> Cc: [hidden email]; [hidden email]; nd; Peter Smith
>> Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
>>
>>
>>
>> On Wed, Apr 10, 2019 at 9:03 AM David Spickett <[hidden email]> wrote:
>>
>> Hi Manoj,
>>
>>
>> Not too late at all, we have not got to that point of the work yet.
>>
>>
>> Are there examples of this kind of build setup that are available publicly? I think I understand the problem but it'd help to see one in action. To see if there are any other Arm extensions that are already being added like this and whether those systems support GCC and how.
>>
>>
>> One example where we had to use "-Xclang -target-feature ... " is here:
>> https://chromium.googlesource.com/chromium/src/+/refs/heads/master/third_party/crc32c/BUILD.gn#120
>>
>> This had to be done because Chrome OS build system passes the exact "-march=value" depending on the ISA supported by Chromebook but crc32c wants crc+crypto for runtime cpu detection based code.
>>
>> As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.
>>
>> Thanks,
>> Manoj
>>
>>
>> Thanks,
>>
>> David Spickett.
>>
>> ________________________________
>> From: Manoj Gupta <[hidden email]>
>> Sent: 10 April 2019 16:34:16
>> To: David Spickett
>> Cc: [hidden email]; [hidden email]; nd
>> Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
>>
>>
>>
>> On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:
>>
>> Hi,
>>
>> Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.
>>
>> To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.
>>
>> I look forward to your feedback.
>>
>> Thanks,
>> David Spickett.
>>
>>
>>
>> RFC New Clang target feature selection options for ARM/AArch64
>> --------------------------------------------------------------
>>
>> In this RFC we propose changes to ARM and AArch64 target selection. With the top level goals to:
>> - validate that given options make sense within architectural restrictions
>> - make option discovery and documentation easier
>> - unify the list of extensions that command lines and asm directives use
>> - bring the options closer to GCC's where appropriate
>>
>> Current Options Comparison
>> --------------------------
>>
>>                        | GCC           | Clang         |
>>                        |-------------------------------|
>>                        | ARM | AArch64 | ARM | AArch64 |
>> |----------------------|-----|---------|-----|---------|
>> | -march with '+<ext>' | Y   | Y       | Y   | Y       |
>> | checks extensions    | Y   | N       | N   | N       |
>> | .arch with '+<ext>'  | N   | N       | N   | Y       |
>> | .arch_extension      | Y   | Y       | Y   | N       |
>> | .fpu                 | Y   | N       | Y   | N       |
>> | -mfpu                | Y   | N       | Y   | N       |
>> | checks FPUs          | N   | n/a     | N   | n/a     |
>> |----------------------|-----|---------|-----|---------|
>>
>> Examples of each of these can be found at the end of this document.
>>
>> Problems With the Current Options
>> ---------------------------------
>>
>> - You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
>> - Combinations of options are not checked for compatibility.
>> - Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
>> - Users need to work out what FPU they need for ARM, this should be implied by the selected arch and extensions.
>> - Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.
>>
>> Proposed solution
>> ------------------
>>
>> ARM and AArch64:
>> - Make the TargetParser the single source for extension names, removing the AsmParser tables.
>> - Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
>> - Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
>> - Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
>> - Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)
>> - Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.
>>
>> ARM:
>> - Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax
>> (allow them to be recognised, they could still be rejected for compatibility).
>> - Add an 'auto' value for -mfpu and make it the default. Meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
>> - Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
>> - Reject invalid arch/cpu and extension combinations with an error diagnostic.
>>
>> Optional features
>> -----------------
>>
>> AArch64:
>> - add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.
>>
>> ARM:
>> - Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.
>>
>> Options Comparison With the Proposed Solution
>> ----------------------------------------------
>>
>> Anything in brackets has changed from the previous table.
>>
>>                        | GCC           | Clang             |
>>                        |-----------------------------------|
>>                        | ARM | AArch64 | ARM     | AArch64 |
>> |----------------------|-----|---------|---------|---------|
>> | -march with '+<ext>' | Y   | Y       | Y       | Y       |
>> | checks extensions    | Y   | N       | (Y)     | (Y)     |
>> | .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
>> | .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
>> | -mfpu                | Y   | N       | Y       | N       |
>> | .fpu                 | Y   | N       | Y       | N       |
>> | checks FPUs          | N   | n/a     | (Y)     | n/a     |
>> |----------------------|-----|---------|---------|---------|
>>
>> Implementation
>> --------------
>>
>> Use of Table-gen
>> ================
>>
>> The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of preprocessor macros. There are several advantages of porting TargetParser to tablegen:
>> - more readable than the current macros
>> - allows default/optional values more easily
>> - we can generate code and documentation from the same source
>> - easier to add new properties
>>
>> Drawbacks:
>> - it requires a new tablegen backend to generate the include files
>> - additional indirection which could make debugging and future changes more difficult
>>
>> We think the benefits outweigh the disadvantages in this case.
>>
>> To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
>> 1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
>> 2. put the TargetParser for each backend in the library group for that backend. This requires one of:
>>     * Relaxing the requirement that target parsers must be built even if the backend is not.
>>     * Modifying the CMake scripts to build the target parsers even if the backend is not being built.
>>
>> Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.
>>
>> Using existing SubTarget features
>> =================================
>>
>> If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.
>>
>> We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct preprocessor feature test macros.
>>
>> Consider this AArch64 march:
>> -march=armv8.4-a+crypto+nosha2
>>
>> The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.
>>
>> New Errors and Warnings
>> =======================
>>
>> Whether these are errors or warnings by default is up for debate. This is a suggestion to begin with.
>> (these apply to cmd lines and directives unless stated)
>>
>> Errors:
>> - unknown extension in an assembly directive (currently fails silently)
>> - extension incompatible with base arch, message shows the base arch it requires.
>> - extension requires another which is disabled later, message shows which one is required.
>> - extension requires another which is not enabled, message shows requirements.
>> - ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.
>>
>> Warnings:
>> - ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
>> - mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
>> - mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)
>>
>> Proposed diagnostic names: (in the same order as above)
>> - "target-feature" (top level group)
>>     - "incompatible-feature"
>>       - "extension-requirement-disabled"
>>     - "extension-requires"
>>     - "incompatible-fpu"
>>     - "implied-fpu-unused"
>>     - "mandatory-feature-ignored"
>>     - "mandatory-feature-disabled"
>>
>> "Negative" Backend Features
>> ===========================
>>
>> There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
>> It would simplify the implementation if those were replaced with positive options. As in one that adds the top 16 D registers and one that enables double precision operations.
>>
>> This is a relatively simple change to LLVM but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.
>>
>> .arch_extension Directive
>> =========================
>>
>> Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.
>>
>> Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.
>>
>> We can handle this in a few ways:
>> - Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
>> - Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
>> - Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.
>>
>> The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.
>>
>> ARM Assembly Directives
>> =======================
>>
>> As discussed for AArch64, the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser, giving them access to a complete list of features.
>>
>> '.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.
>>
>> Current Command Line Option Examples
>> ------------------------------------
>>
>> Clang ARM
>> =========
>>
>> Extensions can be used with '+<{no}extension>' syntax on march or mcpu, there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.
>>
>> $ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
>> $ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
>> (can't use dotprod with v8-a)
>>
>> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
>> (should be invalid but is allowed)
>>
>> GCC ARM
>> =======
>>
>> For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.
>>
>> $ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
>> $ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o
>>
>> $ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
>> arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
>> arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp
>>
>> $ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
>> (same example given for Clang above, should be invalid)
>>
>> Clang AArch64
>> =============
>>
>> The '+' syntax still applies but mfpu is replaced with '+' extensions.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
>> clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o
>>
>> Dependencies within extensions are not checked. For example crypto requires simd, but it can be disabled in the same march option.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
>>
>>
>> It is a bit late to reply, but can the options be specified independently of "-march", i.e. -march=armv8-a -mcrypto -mnosimd etc., similar to "-msse" and "-mavx" on x86?
>> This is for situations where certain packages e.g. media packages want to enable certain features based on runtime cpu detection.
>> To enable e.g. "crypto", they are also forced to choose a march, but that could override the architecture specified by the build system
>> ( or could get overridden by the -march specified by build system). e.g. it makes little sense for "-march=armv8-a+extension" to override the build system "-march=armv8.3-a"
>>  and vice-versa when the only desire is to enable the specific extension additively.
>>
>> The additive alternative is to use "-Xclang -target-feature -Xclang +feature" which is pretty ugly.
>>
>> Thanks
>>
>> Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a but it is allowed.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
>>
>> GCC AArch64
>> ===========
>>
>> For GCC AArch64 mfpu is also dropped in favour of '+' extensions.
>>
>> $ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
>> aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?
>>
>> Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.
>>
>> $ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
>> cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
>> $ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
>> $ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
>> (should not be allowed)
>> $ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
>> (should not be allowed)
>>
>> Current Assembly Directive Examples
>> -----------------------------------
>>
>> Clang .arch/.arch_extension
>> ===========================
>>
>> AArch64 uses .arch and '+' syntax, ARM uses .arch_extension/.arch and does not support '+' syntax in either.
>>
>> In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.
>>
>> (example is using AArch64)
>> .arch armv8.2-a # error: instruction requires: dotprod
>> udot v0.2s, v1.8b, v2.8b
>>
>> .arch armv8.2-a+dotprod # error: instruction requires: dotprod
>> udot v0.2s, v1.8b, v2.8b
>>
>> ARM uses the .arch_extension directive which is one extension per use, with no '+'.
>>
>> .arch armv7-a #error: instruction requires: crc armv8
>> CRC32B r0, r1, r2
>>
>> .arch armv8-a+crc #error: Unknown arch name
>> CRC32B r0, r1, r2
>>
>> .arch armv8-a # no error
>> .arch_extension crc
>> CRC32B r0, r1, r2
>>
>> You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.
>>
>> ARM does check validity of extensions provided with '.arch_extension'.
>>
>> .arch armv7-a
>> .arch_extension crc
>> CRC32B r0, r1, r2
>>
>> main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
>> .arch_extension crc
>>
>> AArch64 only rejects known extensions that aren't supported at all.
>>
>> .arch armv8-a+pan # unsupported architectural extension: pan
>> nop
>>
>> Neither ARM nor AArch64 knows about the interdependencies between extensions, so the example from the command lines applies here too.
>>
>> (example is using AArch64)
>> .arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
>> nop
>>
>> GCC .arch/.arch_extension
>> =========================
>>
>> GCC is more consistent across the two arches: both use .arch and .arch_extension, and neither understands the '+' syntax.
>>
>> .arch armv8-a+crc # invalid
>>
>> .arch armv8-a # valid
>> .arch_extension crc
>>
>> .arch_extension crc # valid
>> .arch_extension crc+crypto #invalid
>>
>> For extensions that vary based on base architecture, GCC tracks the last known arch.
>>
>> Clang .fpu
>> ==========
>>
>> .fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
>> /tmp/test.s:1:1: error: unknown directive
>> .fpu neon
>> ^
>>
>> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
>> /tmp/test.s:1:6: error: Unknown FPU name
>> .fpu clearly-not-valid
>>      ^
>>
>> (same example as 'Clang ARM' command lines, should be invalid)
>> $ cat /tmp/test.s
>> .fpu neon-fp16
>> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
>>
>> GCC .fpu
>> ========
>>
>> .fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.
>>
>> This is correctly rejected from a command line:
>> $ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
>> arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
>> arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2
>>
>> Whereas the directive is accepted:
>> $ cat /tmp/test.s
>> .fpu neon
>> nop
>> $ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o
>>
>> For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.
>>
>> $ cat /tmp/test.s
>> .fpu neon
>> $ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
>> /tmp/test.s: Assembler messages:
>> /tmp/test.s:1: Error: unknown pseudo-op: `.fpu'
>>
>> $ cat /tmp/test.s
>> .arch_extension simd
>> $ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o
>>
>> References
>> ----------
>>
>> Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
>> GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
>> GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
>> GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
>> GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html
>>

Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64

Shawn Webb via llvm-dev


On Wed, Apr 17, 2019 at 2:52 AM Peter Smith <[hidden email]> wrote:
On Tue, 16 Apr 2019 at 20:35, Manoj Gupta via llvm-dev
<[hidden email]> wrote:
>
> Hi David,
>
> Thanks for the detailed information.
>
> On Tue, Apr 16, 2019 at 2:58 AM David Spickett <[hidden email]> wrote:
>>
>> Hi Manoj,
>>
>>
>> I tried a few other options myself:
>>
>> * function 'target' attribute - the list of extensions this supports isn't complete and it doesn't enable the ACLE macros needed for intrinsics
>>
>> * manually defining ACLE macros - this allows intrinsics and is additive but assumes that you're not relying on codegen to emit instructions. I don't think it helps the bug linked from the GN source either. (https://crbug.com/934016)
>>
>>
>> So what you have is the best we can do right now.
>>
>>
>> Looking forward, the problem I have with a '-mcrypto' option is twofold:
>>
>> * What about other optional extensions? We'd need to add one for each, or at least every one people ask for and ideally get that into GCC too.
>
> Yes, ideally we want this per extension. crypto is where we have the problem now but it can/will change in future.
>
>>
>> * crypto specifically can mean different things depending on the base architecture. From Clang's point of view that's fine as it just adds a target feature 'crypto'. However later we might want to allow people to select which set of crypto extensions is added and we hit the same issue. (maybe you'd go with -mcrypto as an 'auto' and -mcrypto8.4-a etc, which is also ugly)
>>
>>
>> The other option would be to allow -march without a base architecture. E.g.
>>
>> -march=armv8-a+crc -march=+crypto
>>
>>
> Specifying -march without a base architecture is fine and probably would work for our problem but I feel it is not too different than specifying each feature individually. However, if it is too cumbersome to specify each feature independently like "-mcrypto"/"-mdsp" , my vote is for this.
>
>>
>> Or have them combine into some common set, which breaks existing behaviour:
>>
>> -march=armv8-a+crc -march=armv8.4-a+dsp -> -march=armv8-a+dsp+crc
>>
>>
>> Which gets into a lot of issues around how you choose the set of features. Smallest subset to target the minimal core, or largest to allow CPU detection code to compile?
>
> I suspect that most code would be written to have the lowest "arch"+feature since it is only the feature that the code was written to handle with runtime detection. Enable any more, and the program could crash because features that the CPU doesn't support got enabled.
>>
>>
>> So allowing march=+<ext> is the one that won't break existing builds but would be Clang only for now. I don't know enough to say whether the other Architectures would be able to support that.
>>
>>
>> My instinct is that this is something the build system needs to handle since it presumably has to support GCC as well. I understand that still leaves you specifying a base architecture per file, when you'd rather pull that from the main march.
>>
>> (plus you're putting arch specific special cases in your build config, which isn't great either)
>
> I believe we'll have the same problem with GCC as well, if we were using GCC.
>>
>>
>> >>> As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.
>>
>>
>> Is this not deterministic? I would assume either Chrome OS always wins or crc32c always wins. Tell me if I'm wrong.
>>
>>
>
> Right now, it is the Chrome OS flags that win, but that means the crypto feature is disabled for those files, resulting in a compile error. It could also be the other way round, depending on the build system. Either way, it would be best to avoid this problem.
>

Just a thought: would it be possible to use an assembly directive like
".arch armv8-a+crypto" in the source files that need the crypto
extensions, so that these would override the command line options?

For me, a source code annotation is fine, but I can also imagine that some people would prefer to handle it via compiler options, especially if the same file could be used for multiple architectures
and some of the targeted architectures may not support that feature.
Nevertheless, is it possible to make the directives more like ".feature +crypto" so that an arch does not have to be provided?
 
Thanks,
Manoj

Peter

> Thanks,
> Manoj
>
>> Thanks,
>>
>> David Spickett.
>>
>>
>> ________________________________
>> From: Manoj Gupta <[hidden email]>
>> Sent: 10 April 2019 17:15:00
>> To: David Spickett
>> Cc: [hidden email]; [hidden email]; nd; Peter Smith
>> Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
>>
>>
>>
>> On Wed, Apr 10, 2019 at 9:03 AM David Spickett <[hidden email]> wrote:
>>
>> Hi Manoj,
>>
>>
>> Not too late at all, we have not got to that point of the work yet.
>>
>>
>> Are there examples of this kind of build setup that are available publicly? I think I understand the problem but it'd help to see one in action. To see if there are any other Arm extensions that are already being added like this and whether those systems support GCC and how.
>>
>>
>> One example where we had to use "-Xclang -target-feature ... " is here:
>> https://chromium.googlesource.com/chromium/src/+/refs/heads/master/third_party/crc32c/BUILD.gn#120
>>
>> This had to be done because Chrome OS build system passes the exact "-march=value" depending on the ISA supported by Chromebook but crc32c wants crc+crypto for runtime cpu detection based code.
>>
>> As a result,  one of the "-march" options either specified by Chrome OS or crc32c build has to lose the race as there was no other way to specify crc+crypto additively.
>>
>> Thanks,
>> Manoj
>>
>>
>> Thanks,
>>
>> David Spickett.
>>
>> ________________________________
>> From: Manoj Gupta <[hidden email]>
>> Sent: 10 April 2019 16:34:16
>> To: David Spickett
>> Cc: [hidden email]; [hidden email]; nd
>> Subject: Re: [llvm-dev] [RFC] New Clang target selection options for ARM/AArch64
>>
>>
>>
>> On Fri, Sep 21, 2018 at 3:06 AM David Spickett via llvm-dev <[hidden email]> wrote:
>>
>> Hi,
>>
>> Below is a document detailing changes we'd like to make to Clang/LLVM to improve the usability of the target options for ARM and AArch64.
>>
>> To keep things simple the proposed changes are listed at the start and you can find the supporting examples at the end of the document.
>>
>> I look forward to your feedback.
>>
>> Thanks,
>> David Spickett.
>>
>>
>>
>> RFC New Clang target feature selection options for ARM/AArch64
>> --------------------------------------------------------------
>>
>> In this RFC we propose changes to ARM and AArch64 target selection, with the top level goals to:
>> - validate that given options make sense within architectural restrictions
>> - make option discovery and documentation easier
>> - unify the list of extensions that command lines and asm directives use
>> - bring the options closer to GCC's where appropriate
>>
>> Current Options Comparison
>> --------------------------
>>
>>                        | GCC           | Clang         |
>>                        |-------------------------------|
>>                        | ARM | AArch64 | ARM | AArch64 |
>> |----------------------|-----|---------|-----|---------|
>> | -march with '+<ext>' | Y   | Y       | Y   | Y       |
>> | checks extensions    | Y   | N       | N   | N       |
>> | .arch with '+<ext>'  | N   | N       | N   | Y       |
>> | .arch_extension      | Y   | Y       | Y   | N       |
>> | .fpu                 | Y   | N       | Y   | N       |
>> | -mfpu                | Y   | N       | Y   | N       |
>> | checks FPUs          | N   | n/a     | N   | n/a     |
>> |----------------------|-----|---------|-----|---------|
>>
>> Examples of each of these can be found at the end of this document.
>>
>> Problems With the Current Options
>> ---------------------------------
>>
>> - You cannot select all extensions through an assembly directive, since the AsmParser's list is a separate subset of the complete one in TargetParser.
>> - Combinations of options are not checked for compatibility.
>> - Many extensions are tied to their base architecture, though it is valid to add them individually to a previous v8.x-a architecture.
>> - Users need to work out what FPU they need for ARM; this should be implied by the selected arch and extensions.
>> - Discovery of valid extensions is difficult, both for the user and for the purposes of generating documentation.
>>
>> Proposed solution
>> ------------------
>>
>> ARM and AArch64:
>> - Make the TargetParser the single source for extension names, removing the AsmParser tables.
>> - Reject unknown extension names with a diagnostic that includes a list of valid extensions for that architecture/CPU.
>> - Reject invalid combinations of architecture/CPU and extensions with an error diagnostic.
>> - Add independent subtarget features for each extension so that v8.x+1-a extensions can be used individually with earlier v8.x-a architectures where allowed.
>> - Emit a warning when a mandatory feature of the base architecture is enabled with '+extension', or disabled with '+noextension'. (and ignore the option)
>> - Errors caused by the solution above should be able to be downgraded to warnings with the usual -W* options. This applies only to cases where there is a reasonable interpretation of the options chosen.
>>
>> ARM:
>> - Allow all possible ARM extensions in the '.arch_extension' directive, without the '+' syntax
>> (allow them to be recognised, they could still be rejected for compatibility).
>> - Add an 'auto' value for -mfpu and make it the default. Meaning that the FPU is implied by mcpu/march. If mfpu is not auto, it should override other options and a warning should be emitted.
>> - Reject invalid mfpu and march/mcpu combinations with an error diagnostic.
>> - Reject invalid arch/cpu and extension combinations with an error diagnostic.
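The '-mfpu=auto' behaviour in the ARM list above could be sketched roughly as follows. The two tables are illustrative assumptions rather than the real TargetParser data, resolve_fpu is a hypothetical name, and the diagnostic names are the ones proposed later in the RFC.

```python
# Illustrative tables only; a real implementation would derive these from
# the target parser rather than hard-coding them.
IMPLIED_FPU = {"armv7-m": "none", "armv8-a": "fp-armv8"}
VALID_FPUS = {
    "armv7-m": {"none"},
    "armv8-a": {"none", "fp-armv8", "neon-fp-armv8"},
}


def resolve_fpu(march, mfpu="auto"):
    """Return (selected FPU or None, list of (severity, name, message))."""
    implied = IMPLIED_FPU[march]
    if mfpu == "auto":
        # Default case: the FPU is implied by the architecture.
        return implied, []
    if mfpu not in VALID_FPUS[march]:
        valid = ", ".join(sorted(VALID_FPUS[march]))
        message = f"valid FPUs for {march}: {valid}"
        return None, [("error", "incompatible-fpu", message)]
    diagnostics = []
    if mfpu != implied:
        # An explicit -mfpu overrides the implied value, with a warning.
        message = f"{implied} overridden by {mfpu}"
        diagnostics.append(("warning", "implied-fpu-unused", message))
    return mfpu, diagnostics
```

With these made-up tables, the armv7-m plus neon-fp16 example from later in the document is rejected, while an explicit but compatible FPU on armv8-a is accepted with a warning.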
>>
>> Optional features
>> -----------------
>>
>> AArch64:
>> - add the '.arch_extension' directive, with the same behaviour as ARM (no '+', one extension per directive). This brings Clang in line with GCC which has this directive for both architectures. Clang does however allow you to achieve the same thing by using '+' with '.arch'.
>>
>> ARM:
>> - Allow '+' in '.arch' and '.cpu'. GCC does not allow this, but it would make ARM/AArch64 more consistent within Clang.
>>
>> Options Comparison With the Proposed Solution
>> ----------------------------------------------
>>
>> Anything in brackets has changed from the previous table.
>>
>>                        | GCC           | Clang             |
>>                        |-----------------------------------|
>>                        | ARM | AArch64 | ARM     | AArch64 |
>> |----------------------|-----|---------|---------|---------|
>> | -march with '+<ext>' | Y   | Y       | Y       | Y       |
>> | checks extensions    | Y   | N       | (Y)     | (Y)     |
>> | .arch with '+<ext>'  | N   | N       | (Y)     | Y       | (optional)
>> | .arch_extension      | Y   | Y       | Y       | (Y)     | (optional)
>> | -mfpu                | Y   | N       | Y       | N       |
>> | .fpu                 | Y   | N       | Y       | N       |
>> | checks FPUs          | N   | n/a     | (Y)     | n/a     |
>> |----------------------|-----|---------|---------|---------|
>>
>> Implementation
>> --------------
>>
>> Use of Table-gen
>> ================
>>
>> The current implementation of TargetParser has a number of FIXME comments saying that it should be changed to use tablegen instead of preprocessor macros. There are several advantages to porting TargetParser to tablegen:
>> - more readable than the current macros
>> - allows default/optional values more easily
>> - we can generate code and documentation from the same source
>> - easier to add new properties
>>
>> Drawbacks:
>> - it requires a new tablegen backend to generate the include files
>> - additional indirection which could make debugging and future changes more difficult
>>
>> We think the benefits outweigh the disadvantages in this case.
>>
>> To do this, we would need to move TargetParser to break the cyclic dependency of LLVMSupport -> llvm-tblgen -> LLVMSupport. There are 2 options for this:
>> 1. create a new LLVMTargetParser library that contains all parsers for architectures that use it.
>> 2. put the TargetParser for each backend in the library group for that backend. This requires one of:
>>     * Relaxing the requirement that target parsers must be built even if the backend is not.
>>     * Modifying the CMake scripts to build the target parsers even if the backend is not being built.
>>
>> Option 1 is simpler but option 2 would allow us to make use of the existing tablegen files in the backends so it is preferred.
>>
>> Using existing SubTarget features
>> =================================
>>
>> If we go with option 2 above, we can reuse the existing subtarget features to work out any dependencies.
>>
>> We have a prototype that took option 1 above. The command line is converted into a sequence of options and resolved by the LLVM backend. This means that Clang does not know exactly what will be enabled. It needs to know this to output the correct preprocessor feature test macros.
>>
>> Consider this AArch64 march:
>> -march=armv8.4-a+crypto+nosha2
>>
>> The base arch is armv8.4-a, the crypto extension turns on AES/SHA2/SHA3/SM4. The nosha2 disables SHA2/SHA3 (since SHA3 is dependent on SHA2). Each of these features has an ACLE feature test macro, so Clang needs to know that nosha2 also disables SHA3.
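As a rough illustration of the resolution Clang would need to perform for the example above, here is a small sketch. The tables encode only the relationships stated in this RFC (what 'crypto' expands to, and SHA3 depending on SHA2), and the function names are hypothetical.

```python
# Feature -> features it depends on (only the one dependency named above).
REQUIRES = {"sha3": {"sha2"}}
# What 'crypto' expands to, per the RFC text.
CRYPTO_V84 = ["aes", "sha2", "sha3", "sm4"]
CRYPTO_DEFAULT = ["aes", "sha2"]


def expand(base, name):
    """Expand a group name such as 'crypto' into individual features."""
    if name == "crypto":
        return CRYPTO_V84 if base == "armv8.4-a" else CRYPTO_DEFAULT
    return [name]


def disable(enabled, feature):
    """Disable a feature and, recursively, everything that depends on it."""
    enabled.discard(feature)
    for dependent, needs in REQUIRES.items():
        if feature in needs and dependent in enabled:
            disable(enabled, dependent)


def resolve(march):
    """Return (base, set of enabled features) for an -march string."""
    base, *modifiers = march.split("+")
    enabled = set()
    for modifier in modifiers:
        if modifier.startswith("no"):
            for feature in expand(base, modifier[2:]):
                disable(enabled, feature)
        else:
            enabled.update(expand(base, modifier))
    return base, enabled
```

For "armv8.4-a+crypto+nosha2" this leaves only aes and sm4 enabled, which is the set the feature test macros would have to reflect.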
>>
>> New Errors and Warnings
>> =======================
>>
>> Whether these are errors or warnings by default is up for debate. This is a suggestion to begin with.
>> (these apply to cmd lines and directives unless stated)
>>
>> Errors:
>> - unknown extension in an assembly directive (currently fails silently)
>> - extension incompatible with base arch, message shows the base arch it requires.
>> - extension requires another which is disabled later, message shows which one is required.
>> - extension requires another which is not enabled, message shows requirements.
>> - ARM mfpu option is not 'auto' and is incompatible with the base arch, message shows list of valid FPUs.
>>
>> Warnings:
>> - ARM mfpu option is not auto and another option implies a different FPU than the mfpu value. The mfpu value will be used, and the message will show what was overridden.
>> - mandatory feature of the base arch is enabled with '+' (option is redundant so is ignored)
>> - mandatory feature of a base arch is disabled with '+no<feature>' (option makes no sense so the extension remains enabled)
>>
>> Proposed diagnostic names: (in the same order as above)
>> - "target-feature" (top level group)
>>     - "incompatible-feature"
>>       - "extension-requirement-disabled"
>>     - "extension-requires"
>>     - "incompatible-fpu"
>>     - "implied-fpu-unused"
>>     - "mandatory-feature-ignored"
>>     - "mandatory-feature-disabled"
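A minimal sketch of how some of these checks might look for an AArch64 -march string, using the proposed diagnostic names. The tables are small illustrative assumptions (dotprod needing v8.2-a, crypto needing simd, crc treated as mandatory from v8.1-a, fp and simd on by default), and check and parse_version are hypothetical helpers.

```python
MIN_ARCH = {"dotprod": (8, 2)}      # earliest base arch for an extension
REQUIRES = {"crypto": {"simd"}}     # extension -> extensions it needs
MANDATORY_SINCE = {"crc": (8, 1)}   # assumed mandatory from this version on
DEFAULTS = {"fp", "simd"}           # assumed enabled by the base arch


def parse_version(base):
    """'armv8.2-a' -> (8, 2); 'armv8-a' -> (8, 0)."""
    body = base[len("armv"):].split("-")[0]
    major, _, minor = body.partition(".")
    return int(major), int(minor or 0)


def check(march):
    """Return a list of (severity, diagnostic name, message) tuples."""
    base, *modifiers = march.split("+")
    version = parse_version(base)
    enabled, diags = set(DEFAULTS), []
    for modifier in modifiers:
        negated = modifier.startswith("no")
        name = modifier[2:] if negated else modifier
        if name in MANDATORY_SINCE and version >= MANDATORY_SINCE[name]:
            kind = "mandatory-feature-disabled" if negated else "mandatory-feature-ignored"
            diags.append(("warning", kind, name))
        elif negated:
            enabled.discard(name)
            for ext in sorted(enabled):
                if name in REQUIRES.get(ext, set()):
                    diags.append(("error", "extension-requirement-disabled", f"{ext} requires {name}"))
        elif version < MIN_ARCH.get(name, (0, 0)):
            major, minor = MIN_ARCH[name]
            diags.append(("error", "incompatible-feature", f"{name} requires armv{major}.{minor}-a"))
        else:
            for need in sorted(REQUIRES.get(name, set()) - enabled):
                diags.append(("error", "extension-requires", f"{name} requires {need}"))
            enabled.add(name)
    return diags
```

With these tables, "armv8-a+dotprod" and "armv8-a+crypto+nosimd" (both accepted silently today) each produce an error, and "armv8.2-a+dotprod" produces none.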
>>
>> "Negative" Backend Features
>> ===========================
>>
>> There are a couple of features in ARM which remove capabilities rather than adding them. These are 'd16' (removes the top 16 D registers) and 'fp-only-sp' (removes double precision).
>> It would simplify the implementation if those were replaced with positive options. As in one that adds the top 16 D registers and one that enables double precision operations.
>>
>> This is a relatively simple change to LLVM but it will affect a large number of tests and would be a breaking change for users of LLVM as a library.
>>
>> .arch_extension Directive
>> =========================
>>
>> Regardless of '.arch_extension' being added to AArch64, it has some issues that need to be addressed for the rest of these changes.
>>
>> Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.
>>
>> We can handle this in a few ways:
>> - Remove .arch_extension in favour of .arch. This conflicts with the option above to add it to AArch64 to bring us in line with GCC, and will break a lot of code written for older versions of Clang.
>> - Only accept options which do not vary with base architecture. For ARM, only the FPU options vary, and there is the .fpu directive for those. If we do decide to add .arch_extension to AArch64 this will mean that things like crypto will only be valid in .arch.
>> - Track the current base target, as implied by the command line or the last .arch/.cpu directive. This makes the directives as similar to the command lines as they can be without breaking backwards compatibility.
>>
>> The last option makes the most sense to us, certainly if we want to add .arch_extension to AArch64 in a straightforward way.
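The "track the current base target" option might be sketched like this. DirectiveState and crypto_features are hypothetical names, the 'crypto' expansion follows the description above, and resetting the feature set on '.arch' is an assumption of this sketch rather than something the RFC specifies.

```python
def crypto_features(base):
    """Meaning of 'crypto' for a base architecture, per the text above."""
    if base == "armv8.4-a":
        return {"aes", "sha2", "sha3", "sm4"}
    return {"aes", "sha2"}


class DirectiveState:
    """Tracks the base arch implied by the command line or the last .arch."""

    def __init__(self, command_line_base):
        self.base = command_line_base
        self.features = set()

    def handle(self, line):
        directive, _, argument = line.strip().partition(" ")
        if directive == ".arch":
            # Assumption: a new base architecture clears earlier extensions.
            self.base, self.features = argument, set()
        elif directive == ".arch_extension":
            if argument == "crypto":
                # Interpreted relative to whatever base is current right now.
                self.features |= crypto_features(self.base)
            else:
                self.features.add(argument)
```

The same '.arch_extension crypto' line then enables aes and sha2 under a v8.2-a base, and additionally sha3 and sm4 once a '.arch armv8.4-a' directive has been seen.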
>>
>> ARM Assembly Directives
>> =======================
>>
>> As discussed for AArch64, the ARM assembly directives ('.arch', '.cpu', '.fpu', '.arch_extension') should be updated to use the new target parser, giving them access to a complete list of features.
>>
>> '.arch' and '.cpu' supporting the '+' syntax is mentioned as an optional goal above. This makes ARM/AArch64 consistent within Clang but breaks from GCC's features.
>>
>> Current Command Line Option Examples
>> ------------------------------------
>>
>> Clang ARM
>> =========
>>
>> Extensions can be used with '+<{no}extension>' syntax on march or mcpu, there is no checking that the combinations are valid. The FPU is selected with -mfpu and this is not validated either.
>>
>> $ ./clang --target=arm-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
>> $ ./clang --target=arm-arm-none-eabi -mcpu=cortex-a53+dotprod -c /tmp/test.c -o /tmp/test.o
>> (can't use dotprod with v8-a)
>>
>> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
>> (should be invalid but is allowed)
>>
>> GCC ARM
>> =======
>>
>> For GCC it is the same except that mfpu defaults to 'auto', meaning that the value is implied by other options. Extensions are checked for compatibility with the base architecture but FPUs are not.
>>
>> $ ./arm-eabi-gcc -mcpu=cortex-a53 -mfpu=neon -c /tmp/test.c -o /tmp/test.o
>> $ ./arm-eabi-gcc -march=armv8-a -mfpu=auto -c /tmp/test.c -o /tmp/test.o
>>
>> $ ./arm-eabi-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
>> arm-eabi-gcc: error: 'armv8-a' does not support feature 'dotprod'
>> arm-eabi-gcc: note: valid feature names are: crc simd crypto nocrypto nofp
>>
>> $ ./arm-eabi-gcc -march=armv7-m -mfpu=neon-fp16 -c /tmp/test.c -o /tmp/test.o
>> (same example given for Clang above, should be invalid)
>>
>> Clang AArch64
>> =============
>>
>> The '+' syntax still applies but mfpu is replaced with '+' extensions.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
>> clang-7: warning: argument unused during compilation: '-mfpu=none' [-Wunused-command-line-argument]
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8.2-a+nofp -c /tmp/test.c -o /tmp/test.o
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto -c /tmp/test.c -o /tmp/test.o
>>
>> Dependencies within extensions are not checked. For example crypto requires simd, but it can be disabled in the same march option.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
>>
>>
>> It is a bit late to reply, but can the options be specified independently of "-march"? i.e. -march=armv8-a -mcrypto -mnosimd etc., similar to "-msse", "-mavx" on x86.
>> This is for situations where certain packages e.g. media packages want to enable certain features based on runtime cpu detection.
>> To enable e.g. "crypto", they are also forced to choose a march, but that could override the architecture specified by the build system
>> ( or could get overridden by the -march specified by build system). e.g. it makes little sense for "-march=armv8-a+extension" to override the build system "-march=armv8.3-a"
>>  and vice-versa when the only desire is to enable the specific extension additively.
>>
>> The additive alternative is to use "-Xclang -target-feature -Xclang +feature" which is pretty ugly.
>>
>> Thanks
>>
>> Dependencies between an extension and the base arch are not checked either. Dot product cannot be used with v8.0-a but it is allowed.
>>
>> $ ./clang --target=aarch64-arm-none-eabi -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
>>
>> GCC AArch64
>> ===========
>>
>> For GCC AArch64 mfpu is also dropped in favour of '+' extensions.
>>
>> $ ./aarch64-elf-gcc -march=armv8.2-a -mfpu=none -c /tmp/test.c -o /tmp/test.o
>> aarch64-elf-gcc: error: unrecognized command line option '-mfpu=none'; did you mean '-gz=none'?
>>
>> Extensions are rejected if not recognised but not checked for compatibility. Hence the Clang crypto/simd example above is allowed with GCC too.
>>
>> $ ./aarch64-elf-gcc -march=armv8.2-a+food -c /tmp/test.c -o /tmp/test.o
>> cc1: error: invalid feature modifier in '-march=armv8.2-a+food'
>> $ ./aarch64-elf-gcc -march=armv8.2-a+dotprod -c /tmp/test.c -o /tmp/test.o
>> $ ./aarch64-elf-gcc -march=armv8-a+dotprod -c /tmp/test.c -o /tmp/test.o
>> (should not be allowed)
>> $ ./aarch64-elf-gcc -march=armv8-a+crypto+nosimd -c /tmp/test.c -o /tmp/test.o
>> (should not be allowed)
>>
>> Current Assembly Directive Examples
>> -----------------------------------
>>
>> Clang .arch/.arch_extension
>> ===========================
>>
>> AArch64 uses .arch and '+' syntax, ARM uses .arch_extension/.arch and does not support '+' syntax in either.
>>
>> In both arches, the list of possible extensions is not complete since it is separate from the one in TargetParser. So there is no way to enable dotprod (amongst other things) with a directive.
>>
>> (example is using AArch64)
>> .arch armv8.2-a # error: instruction requires: dotprod
>> udot v0.2s, v1.8b, v2.8b
>>
>> .arch armv8.2-a+dotprod # error: instruction requires: dotprod
>> udot v0.2s, v1.8b, v2.8b
>>
>> ARM uses the .arch_extension directive which is one extension per use, with no '+'.
>>
>> .arch armv7-a #error: instruction requires: crc armv8
>> CRC32B r0, r1, r2
>>
>> .arch armv8-a+crc #error: Unknown arch name
>> CRC32B r0, r1, r2
>>
>> .arch armv8-a # no error
>> .arch_extension crc
>> CRC32B r0, r1, r2
>>
>> You can see here that though ARM march/mcpu would understand +crc, the assembly directive does not.
>>
>> ARM does check validity of extensions provided with '.arch_extension'.
>>
>> .arch armv7-a
>> .arch_extension crc
>> CRC32B r0, r1, r2
>>
>> main.s:20:17: error: architectural extension 'crc' is not allowed for the current base architecture
>> .arch_extension crc
>>
>> AArch64 only rejects known extensions that aren't supported at all.
>>
>> .arch armv8-a+pan # unsupported architectural extension: pan
>> nop
>>
>> Neither ARM nor AArch64 knows about the interdependencies between extensions. So the example from the command lines applies here too.
>>
>> (example is using AArch64)
>> .arch armv8-a+crypto+nosimd # no error/warning, crypto requires simd
>> nop
>>
>> GCC .arch/.arch_extension
>> =========================
>>
>> GCC is more consistent across the two arches, both use .arch and .arch_extension. Neither understands the '+' syntax.
>>
>> .arch armv8-a+crc # invalid
>>
>> .arch armv8-a # valid
>> .arch_extension crc
>>
>> .arch_extension crc # valid
>> .arch_extension crc+crypto #invalid
>>
>> For extensions that vary based on base architecture, GCC tracks the last known arch.
>>
>> Clang .fpu
>> ==========
>>
>> .fpu is only available for ARM. Values are not checked for compatibility, only rejected if completely unknown.
>>
>> ./clang --target=aarch64-arm-none-eabi -march=armv8-a -c /tmp/test.s -o /tmp/test.o
>> /tmp/test.s:1:1: error: unknown directive
>> .fpu neon
>> ^
>>
>> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
>> /tmp/test.s:1:6: error: Unknown FPU name
>> .fpu clearly-not-valid
>>      ^
>>
>> (same example as 'Clang ARM' command lines, should be invalid)
>> $ cat /tmp/test.s
>> .fpu neon-fp16
>> $ ./clang --target=arm-arm-none-eabi -march=armv7-m -c /tmp/test.s -o /tmp/test.o
>>
>> GCC .fpu
>> ========
>>
>> .fpu is provided for ARM only and the FPU names are not checked against the base arch or CPU.
>>
>> This is correctly rejected from a command line:
>> $ ./arm-eabi-gcc -march=armv6zk+neon -c /tmp/test.s -o /tmp/test.o
>> arm-eabi-gcc: error: 'armv6zk' does not support feature 'neon'
>> arm-eabi-gcc: note: valid feature names are: fp nofp vfpv2
>>
>> Whereas the directive is accepted:
>> $ cat /tmp/test.s
>> .fpu neon
>> nop
>> $ ./arm-eabi-gcc -march=armv6zk -c /tmp/test.s -o /tmp/test.o
>>
>> For AArch64 .fpu is removed in favour of .arch_extension. Instead of directly selecting an FPU it is implied by the extensions used.
>>
>> $ cat /tmp/test.s
>> .fpu neon
>> $ ./aarch64-elf-gcc -march=armv8-a+simd -c /tmp/test.s -o /tmp/test.o
>> /tmp/test.s: Assembler messages:
>> /tmp/test.s:1: Error: unknown pseudo-op: `.fpu'
>>
>> $ cat /tmp/test.s
>> .arch_extension simd
>> $ ./aarch64-elf-gcc -march=armv8-a -c /tmp/test.s -o /tmp/test.o
>>
>> References
>> ----------
>>
>> Crypto extension requires SIMD: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500e/CJHDEBAF.html
>> GCC ARM options: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
>> GCC ARM directives: https://sourceware.org/binutils/docs/as/ARM-Directives.html
>> GCC AArch64 options: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
>> GCC AArch64 directives: https://sourceware.org/binutils/docs/as/AArch64-Directives.html
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev