Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics


Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Arnaud A. de Grandmaison
Hi All,

In the language reference manual, the access behavior of the memcpy,
memmove and memset intrinsics is not well defined with respect to the
volatile flag. The LRM even states that "it is unwise to depend on it".
This forces optimization passes to be conservatively correct and
prevents optimizations.

A very simple example of this:

$ cat test.c

#include <stdint.h>

struct R {
  uint16_t a;
  uint16_t b;
};

volatile struct R * const addr = (volatile struct R *) 416;

void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}

$ clang -O2 -o - -emit-llvm -S -c test.c
; ModuleID = 'test.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.R = type { i16, i16 }

@addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8

define void @test(i16 zeroext %a) nounwind uwtable {
  %r.sroa.0 = alloca i16, align 2
  %r.sroa.1 = alloca i16, align 2
  store i16 %a, i16* %r.sroa.0, align 2
  store i16 1, i16* %r.sroa.1, align 2
  %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
  store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32
  %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
  store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

In the generated IR, the loads are marked as volatile. In this case,
this is due to the new SROA being conservatively correct. We should,
however, expect the test function to look like this much simpler one
(which was actually the case with llvm <= 3.1):

define void @test(i16 zeroext %a) nounwind uwtable {
  store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2
  store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}


I propose to specify the volatile access behavior for those intrinsics:
instead of one flag, they should have two independent volatile flags,
one for destination accesses and one for source accesses.
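Concretely, this would split the trailing i1 isvolatile argument of the current intrinsics in two. A sketch of what call sites could look like under the proposal (the flag order shown here is hypothetical, not a settled spelling):

```llvm
; Today: a single i1 isvolatile flag covers both the source and the
; destination accesses of the copy.
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i32 1, i1 true)

; Proposed: independent flags; here only the destination access is
; volatile, so loads from %src could still be optimized freely.
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i32 1,
                                     i1 true, i1 false)
```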

If there is general agreement, I plan to proceed with the following steps:
1. Specify the access behavior (this patch).
2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into
the more precise form by replicating the single volatile flag, and rework
the MemIntrinsic hierarchy to provide (is|set)SrcVolatile() and
(is|set)DestVolatile(), implementing (set|is)Volatile in terms of the
former two methods. This will conservatively preserve semantics. No
functional change so far. Commits 1 & 2.
3. Audit all uses of isVolatile() / setVolatile() and move them to the
more precise form. From this point on, more aggressive / precise
optimizations can happen. Commit 3.
4. Teach clang to use the new form.
5. Optionally remove the old interface form, as there should be no
in-tree users left. This would, however, be an API change breaking
external code.
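The conservative flag replication of step 2 can be sketched with plain functions (hypothetical helper names, not the actual LLVM API):

```cpp
#include <cassert>
#include <utility>

// Step 2, sketched: upgrading an old-style intrinsic replicates its
// single isvolatile flag to both the destination and the source flag,
// which conservatively preserves the original semantics.
std::pair<bool, bool> upgradeVolatile(bool OldIsVolatile) {
  bool DestVolatile = OldIsVolatile;
  bool SrcVolatile = OldIsVolatile;
  return {DestVolatile, SrcVolatile};
}

// The legacy query, implemented in terms of the two new flags: the
// intrinsic is considered volatile if either access is volatile.
bool isVolatileCompat(bool DestVolatile, bool SrcVolatile) {
  return DestVolatile || SrcVolatile;
}
```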

Cheers,

--
Arnaud de Grandmaison


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Attachment: 0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch (7K)
Re: Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Andrew Trick
I can't think of a better way to do this, so I think it's ok.

I also submitted a complementary patch on llvm-commits clarifying volatile semantics.

-Andy

On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison <[hidden email]> wrote:

> [snip]

Re: Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Chandler Carruth
LGTM as well.

I'm particularly happy to specify that there is no guarantee about the width of the resulting loads and/or stores, as I think that's important for frontends not to depend on (or to request a change if they need to depend on it).

Thanks Andy for the additional clarification; that helps pin things down even more.


On Tue, Jan 29, 2013 at 10:30 AM, Andrew Trick <[hidden email]> wrote:
[snip]



Re: Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Arnaud Allard de Grandmaison
Thanks Andy and Chandler,

After specifying the volatile access behaviour, the second step was to
autoupgrade the memmove/memcpy intrinsics, and implement
(is|set)Volatile in terms of (is|set)(Src|Dest)Volatile, with no
functional change.

0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch is the
one you already reviewed, unaltered.

0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch
implements this second step. It requires clang tests to be patched with
clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch as
the codegen'd memcpy / memmove will use the new format.

Finally, 0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch
is purely mechanical: it upgrades all llvm tests (except
auto_upgrade_intrinsics :)) to use the new format for memcpy/memmove.

There are no functional changes; "make check-llvm check-clang" passes
after each patch is applied.

When those patches are accepted, I will commit them and start
propagating the new interface in the codebase.

Cheers,
--
Arnaud A. de Grandmaison

On 01/28/2013 05:54 PM, Arnaud A. de Grandmaison wrote:

> [snip]

--
Arnaud A. de Grandmaison



Attachments:
0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch (7K)
0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch (22K)
0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch (180K)
clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch (22K)
Re: Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Arnaud Allard de Grandmaison
Same patches as before, but 0002-memcpy has been updated to put the
(is|set)SrcVolatile methods where they logically belong:
MemTransferInst. This makes the (is|set)Volatile methods look a bit
ugly, to keep compatibility with the existing behaviour, but they will
hopefully disappear once all users have moved to the new interface, in
the next series of patches.
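A rough sketch of the resulting shape (simplified, not the real LLVM class definitions): memset only writes a destination, so the destination flag sits on the common base, the source flag only on the transfer instructions, and the compatibility methods there must consult both:

```cpp
#include <cassert>

// Simplified sketch of the proposed hierarchy, not the actual LLVM
// classes: the base covers memset, which only has a destination access.
struct MemIntrinsicSketch {
  bool DestVolatile = false;
  bool isDestVolatile() const { return DestVolatile; }
  void setDestVolatile(bool V) { DestVolatile = V; }
  virtual bool isVolatile() const { return DestVolatile; }
  virtual void setVolatile(bool V) { DestVolatile = V; }
  virtual ~MemIntrinsicSketch() = default;
};

// memcpy / memmove additionally read a source, so only they carry the
// source flag. The compatibility shims: volatile if either access is
// volatile; setting the legacy flag replicates it to both sides.
struct MemTransferInstSketch : MemIntrinsicSketch {
  bool SrcVolatile = false;
  bool isSrcVolatile() const { return SrcVolatile; }
  void setSrcVolatile(bool V) { SrcVolatile = V; }
  bool isVolatile() const override { return SrcVolatile || DestVolatile; }
  void setVolatile(bool V) override { SrcVolatile = V; DestVolatile = V; }
};
```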

I plan to give Phabricator a try for the next patches, but could not
for the current ones, as they touch both llvm and clang.

Cheers,
--
Arnaud A. de Grandmaison

On 01/31/2013 06:13 PM, Arnaud A. de Grandmaison wrote:

> [snip]

--
Arnaud A. de Grandmaison



Attachments:
0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch (7K)
0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch (22K)
0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch.bz2 (28K)
clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch (22K)