[llvm-dev] Fusing contract fadd/fsub with normal fmul

Yichao Yu via llvm-dev
Hi,

On LLVM 5.0 (current trunk), an fadd/fsub and an fmul that are both
marked with `contract` or `fast` can be merged into an fma instruction
by the backend.
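A minimal Python sketch (not LLVM code) of what merging the two instructions changes numerically. `unfused_mul_add` rounds the product before adding, as a separate `fmul` and `fadd` would; `fused_mul_add` is a hypothetical stand-in for a hardware FMA that evaluates `a*b + c` exactly with `fractions.Fraction` and rounds once. The inputs are chosen so that the exact product, `1 - 2**-60`, rounds to `1.0` on its own.

```python
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    # Separate fmul and fadd: the product is rounded to double first.
    product = a * b
    return product + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    # Hypothetical FMA stand-in: exact a*b + c, rounded once to double.
    # Fraction(float) is exact, and Fraction -> float is correctly rounded.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused_mul_add(a, b, c))              # 0.0: the product rounded to 1.0
print(fused_mul_add(a, b, c) == -2.0**-60)   # True: the exact result survives
```

The single skipped rounding of the product is the entire difference between the contracted and uncontracted forms.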

I'm wondering about the exact semantics of this new flag, as well as of
`fast`. In particular, would it be valid to do this when only the
`fadd`/`fsub` (and not the `fmul`) is marked with `contract`, or at
least with `fast`? The reasoning is that fusing has a similar effect
to performing the `fadd`/`fsub` not to IEEE spec, so a single flag on
that instruction should be enough to permit the transformation.
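The two readings of the flags can be written as a small predicate. This is a hypothetical Python model, not LLVM's actual combiner logic: flags are plain strings, `fast` is treated as implying `contract` (as described above), `both_flags_rule` requires permission on both instructions, and `fadd_only_rule` is the relaxation being asked about.

```python
def permits_contraction(flags: frozenset) -> bool:
    # Assumption for this sketch: `contract` or `fast` grants permission.
    return "contract" in flags or "fast" in flags


def both_flags_rule(fmul_flags: frozenset, fadd_flags: frozenset) -> bool:
    # Fuse only if neither instruction forbids contraction.
    return permits_contraction(fmul_flags) and permits_contraction(fadd_flags)


def fadd_only_rule(fmul_flags: frozenset, fadd_flags: frozenset) -> bool:
    # Proposed relaxation: the fadd/fsub's flags alone decide (fmul_flags ignored).
    return permits_contraction(fadd_flags)


# The reduction case: a plain fmul feeding an fadd marked `fast`.
plain_fmul = frozenset()
fast_fadd = frozenset({"fast"})
print(both_flags_rule(plain_fmul, fast_fadd))  # False
print(fadd_only_rule(plain_fmul, fast_fadd))   # True
```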

The particular case I'm interested in is a vectorized loop with a
reduction, like `s += a[i] * b[i]` in pseudo-C code. Our front end
recognizes this and marks the `+` as `fast` to enable vectorization.
It would be great if this could also let the reduction be done with
`fma` instructions.
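The reduction `s += a[i] * b[i]` can be sketched in Python both ways: `dot_plain` rounds each product before accumulating, while `dot_fma` accumulates with a single rounding per iteration, as a contracted loop would. `fused_mul_add` is again a hypothetical exact-arithmetic stand-in for an FMA, and the inputs are illustrative.

```python
from fractions import Fraction
from typing import Sequence


def fused_mul_add(x: float, y: float, acc: float) -> float:
    # Hypothetical FMA stand-in: exact x*y + acc, rounded once.
    return float(Fraction(x) * Fraction(y) + Fraction(acc))


def dot_plain(a: Sequence[float], b: Sequence[float]) -> float:
    s = 0.0
    for x, y in zip(a, b):
        s += x * y  # fmul rounds, then fadd rounds
    return s


def dot_fma(a: Sequence[float], b: Sequence[float]) -> float:
    s = 0.0
    for x, y in zip(a, b):
        s = fused_mul_add(x, y, s)  # one rounding per iteration
    return s


a = [1.0, 1.0 + 2.0**-30]
b = [-1.0, 1.0 - 2.0**-30]
print(dot_plain(a, b))             # 0.0
print(dot_fma(a, b) == -2.0**-60)  # True
```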

Yichao Yu
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] Fusing contract fadd/fsub with normal fmul

Hal Finkel via llvm-dev
It seems like the contract flag is underspecified in this regard. I'd
lean, however, toward requiring it on both instructions in order to
contract them. That way, inlining a function in which contraction was
prohibited into a function in which it was permitted could not
effectively remove the rounding of the callee's final result.
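The inlining concern can be modeled in a few lines of Python. Everything here is hypothetical: `callee` stands for a function compiled without contraction, so its returned product is rounded; `caller` adds to that result where contraction is allowed; and `fused_mul_add` emulates an FMA with exact rational arithmetic. A rule that looked only at the `fadd` could fuse the pair after inlining and drop the callee's rounding.

```python
from fractions import Fraction


def fused_mul_add(x: float, y: float, z: float) -> float:
    # Hypothetical FMA stand-in: exact x*y + z, rounded once.
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def callee(x: float, y: float) -> float:
    # Contraction prohibited here: the returned product is rounded to double.
    return x * y


def caller(x: float, y: float, z: float) -> float:
    # Contraction permitted here, but the callee's result is already rounded.
    return callee(x, y) + z


def caller_inlined_fadd_only_rule(x: float, y: float, z: float) -> float:
    # After inlining, a rule that checks only the fadd could fuse the pair.
    return fused_mul_add(x, y, z)


x, y, z = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print(caller(x, y, z))                                     # 0.0
print(caller_inlined_fadd_only_rule(x, y, z) == -2.0**-60)  # True
```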

  -Hal


On 06/09/2017 10:04 PM, Yichao Yu via llvm-dev wrote:
> [...]

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Re: [llvm-dev] Fusing contract fadd/fsub with normal fmul

Sanjay Patel via llvm-dev
For reference, the patches for the 'contract' fast-math flag (FMF) are listed here:
https://bugs.llvm.org/show_bug.cgi?id=25721#c6

If we can make the documentation better, that would certainly be a welcome patch.

It would be better to see the IR for your example(s), but I think you'd need 'contract' on both the fmul and fadd to generate an FMA. Conservatively, we wouldn't alter the result if either component somehow required strict FP. To vectorize, you probably need 'fast' on both ops because vectorization would be changing the order of operations (reassociation).
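The reassociation point can be illustrated with a small Python sketch: splitting a sum into independent partial sums, as a vectorized reduction does with its vector lanes, changes the order of the additions and therefore the rounding. Two lanes and the final combining order are simplifying assumptions; the values are picked so that `1e16 + 1.0` rounds back to `1e16`.

```python
from typing import Sequence


def sum_sequential(xs: Sequence[float]) -> float:
    # The scalar loop's order: ((x0 + x1) + x2) + ...
    s = 0.0
    for x in xs:
        s += x
    return s


def sum_partial(xs: Sequence[float], lanes: int = 2) -> float:
    # Vectorizer-style order: lane k accumulates xs[k], xs[k + lanes], ...
    acc = [0.0] * lanes
    for i, x in enumerate(xs):
        acc[i % lanes] += x
    # Combine the lanes at the end (this combining order is an assumption).
    total = 0.0
    for part in acc:
        total += part
    return total


xs = [1e16, 1.0, -1e16, 1.0]
print(sum_sequential(xs))  # 1.0: 1e16 + 1.0 rounds back to 1e16
print(sum_partial(xs))     # 2.0: the large terms cancel within one lane
```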


On Fri, Jun 9, 2017 at 9:04 PM, Yichao Yu via llvm-dev <[hidden email]> wrote:
> [...]


Re: [llvm-dev] Fusing contract fadd/fsub with normal fmul

Yichao Yu via llvm-dev
On Mon, Jun 12, 2017 at 9:40 AM, Sanjay Patel <[hidden email]> wrote:
> For reference, the FMF 'contract' patches are listed here:
> https://bugs.llvm.org/show_bug.cgi?id=25721#c6
>
> If we can make the documentation better, that would certainly be a welcome
> patch.
>
> It would be better to see the IR for your example(s), but I think you'd need

The IR of the scalar loop is
```
if13:                                             ; preds = %scalar.ph, %if13
 %s.124 = phi double [ %51, %if13 ], [ %bc.merge.rdx, %scalar.ph ]
 %"i#672.023" = phi i64 [ %52, %if13 ], [ %bc.resume.val, %scalar.ph ]
 %46 = getelementptr double, double* %13, i64 %"i#672.023"
 %47 = load double, double* %46, align 8
 %48 = getelementptr double, double* %15, i64 %"i#672.023"
 %49 = load double, double* %48, align 8
 %50 = fmul double %47, %49
 %51 = fadd fast double %s.124, %50
 %52 = add nuw nsw i64 %"i#672.023", 1
 %53 = icmp slt i64 %52, %9
 br i1 %53, label %if13, label %L11.outer.split.L11.outer.split.split_crit_edge.outer.loopexit
```

And it can be vectorized to

```
vector.body:                                      ; preds = %vector.body, %vector.ph
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
 %vec.phi = phi <4 x double> [ %19, %vector.ph ], [ %40, %vector.body ]
 %vec.phi94 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %41, %vector.body ]
 %vec.phi95 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %42, %vector.body ]
 %vec.phi96 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %43, %vector.body ]
 %20 = getelementptr double, double* %13, i64 %index
 %21 = bitcast double* %20 to <4 x double>*
 %wide.load = load <4 x double>, <4 x double>* %21, align 8
 %22 = getelementptr double, double* %20, i64 4
 %23 = bitcast double* %22 to <4 x double>*
 %wide.load100 = load <4 x double>, <4 x double>* %23, align 8
 %24 = getelementptr double, double* %20, i64 8
 %25 = bitcast double* %24 to <4 x double>*
 %wide.load101 = load <4 x double>, <4 x double>* %25, align 8
 %26 = getelementptr double, double* %20, i64 12
 %27 = bitcast double* %26 to <4 x double>*
 %wide.load102 = load <4 x double>, <4 x double>* %27, align 8
 %28 = getelementptr double, double* %15, i64 %index
 %29 = bitcast double* %28 to <4 x double>*
 %wide.load103 = load <4 x double>, <4 x double>* %29, align 8
 %30 = getelementptr double, double* %28, i64 4
 %31 = bitcast double* %30 to <4 x double>*
 %wide.load104 = load <4 x double>, <4 x double>* %31, align 8
 %32 = getelementptr double, double* %28, i64 8
 %33 = bitcast double* %32 to <4 x double>*
 %wide.load105 = load <4 x double>, <4 x double>* %33, align 8
 %34 = getelementptr double, double* %28, i64 12
 %35 = bitcast double* %34 to <4 x double>*
 %wide.load106 = load <4 x double>, <4 x double>* %35, align 8
 %36 = fmul <4 x double> %wide.load, %wide.load103
 %37 = fmul <4 x double> %wide.load100, %wide.load104
 %38 = fmul <4 x double> %wide.load101, %wide.load105
 %39 = fmul <4 x double> %wide.load102, %wide.load106
 %40 = fadd fast <4 x double> %vec.phi, %36
 %41 = fadd fast <4 x double> %vec.phi94, %37
 %42 = fadd fast <4 x double> %vec.phi95, %38
 %43 = fadd fast <4 x double> %vec.phi96, %39
 %index.next = add i64 %index, 16
 %44 = icmp eq i64 %index.next, %n.vec
 br i1 %44, label %middle.block, label %vector.body
```
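A Python model of the vector loop above, meant for reading the IR rather than reproducing it bit for bit: there are four `<4 x double>` accumulators (`%vec.phi` through `%vec.phi96`), each iteration consumes 16 elements (`%index.next = add i64 %index, 16`), and each `fmul`/`fadd` pair stays separate because only the `fadd` is `fast`. Assumptions, labeled in the comments too: the incoming `s` seeds lane 0 of the first accumulator (what `%19` appears to hold), the leftover elements run through the scalar loop, and the accumulators are combined in a simple order; the real `middle.block` may combine them differently.

```python
from typing import List, Sequence

VF = 4           # lanes per <4 x double>
UF = 4           # interleaved accumulators: %vec.phi .. %vec.phi96
STEP = VF * UF   # elements per vector.body iteration (%index + 16)


def vectorized_dot(a: Sequence[float], b: Sequence[float], s: float = 0.0) -> float:
    n = len(a)
    n_vec = n - n % STEP  # plays the role of %n.vec
    # acc[u][lane]: one <4 x double> accumulator per interleaved copy.
    acc: List[List[float]] = [[0.0] * VF for _ in range(UF)]
    acc[0][0] = s  # assumption: the incoming s seeds lane 0 of %vec.phi
    for index in range(0, n_vec, STEP):
        for u in range(UF):
            for lane in range(VF):
                i = index + u * VF + lane
                prod = a[i] * b[i]    # plain fmul: product rounded
                acc[u][lane] += prod  # fadd fast: sum rounded again
    # middle.block (order assumed): combine accumulators, then lanes.
    total = 0.0
    for lane in range(VF):
        lane_sum = 0.0
        for u in range(UF):
            lane_sum += acc[u][lane]
        total += lane_sum
    # Scalar remainder loop (%if13) for the last n % 16 elements.
    for i in range(n_vec, n):
        total += a[i] * b[i]
    return total


a = [float(i) for i in range(1, 21)]  # 20 elements: one vector step + 4 scalar
b = [2.0] * 20
print(vectorized_dot(a, b))  # 420.0, exact because every value is a small integer
```

Contracting this loop would replace each `prod`/`+=` pair with one fused step per lane, which is exactly what the flag question above decides.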

If contracting a normal `fmul` with a `fast` `fadd` were allowed, both loops could use FMA.

> 'contract' on both the fmul and fadd to generate an FMA. Conservatively, we
> wouldn't alter the result if either component somehow required strict FP. To
> vectorize, you probably need 'fast' on both ops because vectorization would
> be changing the order of operations (reassociation).