[llvm-dev] ThinLTO + CFI

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Hi,

I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.

Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.

I was wondering if there was a way to mitigate this limitation.

a.c
=============================
typedef int (*fptr_t) (void);
fptr_t get_fptr();
int main(int argc, char *argv[])
{
  fptr_t fp = get_fptr();
  return fp();
}


b.c
=============================
typedef int (*fptr_t) (void);
int foo(void) { return 11; }
int bar(void) { return 22; }

static fptr_t fptr = bar;
static int i = 53;

fptr_t get_fptr(void)
{
  if (i >= 0)
    fptr = foo;
  else
    fptr = bar;

  return fptr;
}

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>
> I was wondering if there was a way to mitigate this limitation.
>
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
>
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
>
> static fptr_t fptr = bar;
> static int i = 53;
>
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
>
>  return fptr;
> }
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>
> I was wondering if there was a way to mitigate this limitation.
>
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
>
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
>
> static fptr_t fptr = bar;
> static int i = 53;
>
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
>
>  return fptr;
> }
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>
> I was wondering if there was a way to mitigate this limitation.
>
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
>
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
>
> static fptr_t fptr = bar;
> static int i = 53;
>
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
>
>  return fptr;
> }
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Note that the renaming of functions happens with full LTO as well, but I think we might end up doing it more often with ThinLTO because with ThinLTO we need to decide which functions need jump tables (and therefore need to be renamed) before cross-module importing/inlining, as opposed to after inlining with full LTO.

Regarding the issue with constants, once the importing issue is solved I think the direct call would still go through the jump table. If that ends up being a problem for you I guess we can consider introducing an optimization pass to replace direct calls to jump tables with direct calls to real functions.

Regarding the link order issue, can you work around it by repeating each entry in your orderfile with ".cfi" appended? Something like:

fun1
fun1.cfi
fun2
fun2.cfi

Regarding bitcode files, the files we create for ThinLTO contain multiple modules, and you can use the llvm-modextract tool to extract the individual modules and inspect them with standard tools such as llvm-dis.

Peter

On Wed, Apr 18, 2018 at 4:49 PM, via llvm-dev <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>
> I was wondering if there was a way to mitigate this limitation.
>
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
>
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
>
> static fptr_t fptr = bar;
> static int i = 53;
>
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
>
>  return fptr;
> }
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev


On Wed, Apr 18, 2018 at 4:49 PM, <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.

Teresa


Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>
> I was wondering if there was a way to mitigate this limitation.
>
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
>
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
>
> static fptr_t fptr = bar;
> static int i = 53;
>
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
>
>  return fptr;
> }
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413




--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Teresa, Peter,

Thanks for your help!
I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.

Thanks.
Dmitry.


On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:



On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.

Teresa


Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> 
> Hi,
> 
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> 
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> 
> I was wondering if there was a way to mitigate this limitation.
> 
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
> 
> 
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
> 
> static fptr_t fptr = bar;
> static int i = 53;
> 
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
> 
>  return fptr;
> }
> 
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413




-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Regarding the orderfile, yes, I was thinking more about ordering the real functions.

In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.

Peter

On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
Teresa, Peter,

Thanks for your help!
I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.

Thanks.
Dmitry.


On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:



On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.

Teresa


Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> 
> Hi,
> 
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> 
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> 
> I was wondering if there was a way to mitigate this limitation.
> 
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
> 
> 
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
> 
> static fptr_t fptr = bar;
> static int i = 53;
> 
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
> 
>  return fptr;
> }
> 
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413




-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev


On Wed, Apr 18, 2018 at 6:11 PM, Teresa Johnson <[hidden email]> wrote:


On Wed, Apr 18, 2018 at 4:49 PM, <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.

FYI I decided to take a stab at an LLVM assembly format today. I have hacked some code to do the printing side, and am playing with formats. I'll send out an RFC on the format hopefully early next week.

Teresa


Teresa


Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>
> Hi,
>
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>
> I was wondering if there was a way to mitigate this limitation.
>
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
>
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
>
> static fptr_t fptr = bar;
> static int i = 53;
>
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
>
>  return fptr;
> }
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413




--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
Hi Peter,

We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.

In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.

I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.

Thanks.
Dmitry.



> 
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
> 
> 
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
> 
> static fptr_t fptr = bar;
> static int i = 53;
> 
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
> 
>  return fptr;
> }
> 




On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:

Regarding the orderfile, yes, I was thinking more about ordering the real functions.

In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.

Peter

On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
Teresa, Peter,

Thanks for your help!
I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.

Thanks.
Dmitry.


On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:



On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.

Teresa


Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> 
> Hi,
> 
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> 
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> 
> I was wondering if there was a way to mitigate this limitation.
> 
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
> 
> 
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
> 
> static fptr_t fptr = bar;
> static int i = 53;
> 
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
> 
>  return fptr;
> }
> 
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413




-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




--
-- 
Peter


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?

In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".

> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.

As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
Do you have an example of what you are seeing?

Peter

On Thu, Apr 26, 2018 at 4:54 PM, <[hidden email]> wrote:
Hi Peter,

We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.

In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.

I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.

Thanks.
Dmitry.



> 
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
> 
> 
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
> 
> static fptr_t fptr = bar;
> static int i = 53;
> 
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
> 
>  return fptr;
> }
> 




On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:

Regarding the orderfile, yes, I was thinking more about ordering the real functions.

In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.

Peter

On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
Teresa, Peter,

Thanks for your help!
I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.

Thanks.
Dmitry.


On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:



On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
Hi Teresa,

Thanks for the info!
This example is my attempt to reduce FreeBSD kernel to something more manageable :)

I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…

Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.

Teresa


Hopefully Peter can chime in regarding CFI related issues.

Thanks.
Dmitry. 


On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:

Hi Dmitry,

Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!

In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?

Teresa

On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?

Thanks!


> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> 
> Hi,
> 
> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> 
> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> 
> I was wondering if there was a way to mitigate this limitation.
> 
> a.c
> =============================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> int main(int argc, char *argv[])
> {
>  fptr_t fp = get_fptr();
>  return fp();
> }
> 
> 
> b.c
> =============================
> typedef int (*fptr_t) (void);
> int foo(void) { return 11; }
> int bar(void) { return 22; }
> 
> static fptr_t fptr = bar;
> static int i = 53;
> 
> fptr_t get_fptr(void)
> {
>  if (i >= 0)
>    fptr = foo;
>  else
>    fptr = bar;
> 
>  return fptr;
> }
> 
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413




-- 
Teresa Johnson |  Software Engineer |  [hidden email] |  408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




--
-- 
Peter




--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.

The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.

Maybe the problem is not with renaming but with the missing type metadata in this particular case.
Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?

--------------------------------
define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
  %3 = add nsw i32 %0, -2
  store i32 %3, i32* @i, align 4, !tbaa !5
  %4 = icmp sgt i32 %0, 1
  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9

; <label>:8:                                      ; preds = %2
  %9 = ptrtoint i32 ()* %5 to i64
  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
  unreachable, !nosanitize !9

; <label>:10:                                     ; preds = %2
  %11 = tail call i32 %5() #5
  ret i32 %11
}

. . .
declare hidden i32 @foo() #3
declare hidden i32 @bar() #3


—————————test case---------------
b.c
=================
typedef int (*fptr_t) (void);
fptr_t get_fptr();
extern int i;

int main(int argc, char *argv[])
{
  i = argc - 2;
  fptr_t fp = get_fptr();
  return fp();
}

v.c
================
int i;
typedef int (*fptr_t) (void);
int foo(void) {  return 11; }
int bar(void) {  return 22; }
fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }


> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
>
> > We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
>
> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
>
> > Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>
> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
> Do you have an example of what you are seeing?
>
> Peter
>
> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
> Hi Peter,
>
> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
>
> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>
> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
>
> Thanks.
> Dmitry.
>
>
>
>>>> >
>>>> > a.c
>>>> > =============================
>>>> > typedef int (*fptr_t) (void);
>>>> > fptr_t get_fptr();
>>>> > int main(int argc, char *argv[])
>>>> > {
>>>> >  fptr_t fp = get_fptr();
>>>> >  return fp();
>>>> > }
>>>> >
>>>> >
>>>> > b.c
>>>> > =============================
>>>> > typedef int (*fptr_t) (void);
>>>> > int foo(void) { return 11; }
>>>> > int bar(void) { return 22; }
>>>> >
>>>> > static fptr_t fptr = bar;
>>>> > static int i = 53;
>>>> >
>>>> > fptr_t get_fptr(void)
>>>> > {
>>>> >  if (i >= 0)
>>>> >    fptr = foo;
>>>> >  else
>>>> >    fptr = bar;
>>>> >
>>>> >  return fptr;
>>>> > }
>>>> >
>
>
>
>
>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
>>
>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
>>
>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
>>
>> Peter
>>
>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
>> Teresa, Peter,
>>
>> Thanks for your help!
>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
>>
>> Thanks.
>> Dmitry.
>>
>>
>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
>>>
>>>
>>>
>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
>>> Hi Teresa,
>>>
>>> Thanks for the info!
>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
>>>
>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
>>>
>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
>>>
>>> Teresa
>>>
>>>
>>> Hopefully Peter can chime in regarding CFI related issues.
>>>
>>> Thanks.
>>> Dmitry.
>>>
>>>
>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
>>>>
>>>> Hi Dmitry,
>>>>
>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
>>>>
>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
>>>>
>>>> Teresa
>>>>
>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> > On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>>>> >
>>>> > Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>>>> >
>>>> > I was wondering if there was a way to mitigate this limitation.
>>>> >
>>>> > a.c
>>>> > =============================
>>>> > typedef int (*fptr_t) (void);
>>>> > fptr_t get_fptr();
>>>> > int main(int argc, char *argv[])
>>>> > {
>>>> >  fptr_t fp = get_fptr();
>>>> >  return fp();
>>>> > }
>>>> >
>>>> >
>>>> > b.c
>>>> > =============================
>>>> > typedef int (*fptr_t) (void);
>>>> > int foo(void) { return 11; }
>>>> > int bar(void) { return 22; }
>>>> >
>>>> > static fptr_t fptr = bar;
>>>> > static int i = 53;
>>>> >
>>>> > fptr_t get_fptr(void)
>>>> > {
>>>> >  if (i >= 0)
>>>> >    fptr = foo;
>>>> >  else
>>>> >    fptr = bar;
>>>> >
>>>> >  return fptr;
>>>> > }
>>>> >
>>>> > _______________________________________________
>>>> > LLVM Developers mailing list
>>>> > [hidden email]
>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
>>>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>>
>> --
>> --
>> Peter
>
>
>
>
> --
> --
> Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables. I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.

I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?





> On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
>
> For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
>
> The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
>
> Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
>
> --------------------------------
> define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
>  %3 = add nsw i32 %0, -2
>  store i32 %3, i32* @i, align 4, !tbaa !5
>  %4 = icmp sgt i32 %0, 1
>  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
>  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
>  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
>  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
>
> ; <label>:8:                                      ; preds = %2
>  %9 = ptrtoint i32 ()* %5 to i64
>  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
>  unreachable, !nosanitize !9
>
> ; <label>:10:                                     ; preds = %2
>  %11 = tail call i32 %5() #5
>  ret i32 %11
> }
>
> . . .
> declare hidden i32 @foo() #3
> declare hidden i32 @bar() #3
>
>
> —————————test case---------------
> b.c
> =================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> extern int i;
>
> int main(int argc, char *argv[])
> {
>  i = argc - 2;
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
> v.c
> ================
> int i;
> typedef int (*fptr_t) (void);
> int foo(void) {  return 11; }
> int bar(void) {  return 22; }
> fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
>
>
>> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
>>
>>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
>>
>> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
>>
>>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>>
>> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
>> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
>> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
>> Do you have an example of what you are seeing?
>>
>> Peter
>>
>> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
>> Hi Peter,
>>
>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
>>
>> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>>
>> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
>>
>> Thanks.
>> Dmitry.
>>
>>
>>
>>>>>>
>>>>>> a.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> fptr_t get_fptr();
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>> fptr_t fp = get_fptr();
>>>>>> return fp();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> b.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> int foo(void) { return 11; }
>>>>>> int bar(void) { return 22; }
>>>>>>
>>>>>> static fptr_t fptr = bar;
>>>>>> static int i = 53;
>>>>>>
>>>>>> fptr_t get_fptr(void)
>>>>>> {
>>>>>> if (i >= 0)
>>>>>>   fptr = foo;
>>>>>> else
>>>>>>   fptr = bar;
>>>>>>
>>>>>> return fptr;
>>>>>> }
>>>>>>
>>
>>
>>
>>
>>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
>>>
>>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
>>>
>>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
>>>
>>> Peter
>>>
>>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
>>> Teresa, Peter,
>>>
>>> Thanks for your help!
>>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
>>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
>>>
>>> Thanks.
>>> Dmitry.
>>>
>>>
>>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
>>>> Hi Teresa,
>>>>
>>>> Thanks for the info!
>>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
>>>>
>>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
>>>>
>>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
>>>>
>>>> Teresa
>>>>
>>>>
>>>> Hopefully Peter can chime in regarding CFI related issues.
>>>>
>>>> Thanks.
>>>> Dmitry.
>>>>
>>>>
>>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
>>>>>
>>>>> Hi Dmitry,
>>>>>
>>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
>>>>>
>>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
>>>>>
>>>>> Teresa
>>>>>
>>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
>>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>>>>>>
>>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>>>>>>
>>>>>> I was wondering if there was a way to mitigate this limitation.
>>>>>>
>>>>>> a.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> fptr_t get_fptr();
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>> fptr_t fp = get_fptr();
>>>>>> return fp();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> b.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> int foo(void) { return 11; }
>>>>>> int bar(void) { return 22; }
>>>>>>
>>>>>> static fptr_t fptr = bar;
>>>>>> static int i = 53;
>>>>>>
>>>>>> fptr_t get_fptr(void)
>>>>>> {
>>>>>> if (i >= 0)
>>>>>>   fptr = foo;
>>>>>> else
>>>>>>   fptr = bar;
>>>>>>
>>>>>> return fptr;
>>>>>> }
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> [hidden email]
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>
>>
>>
>>
>> --
>> --
>> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

thinlto+cfi-directcalls.diff (3K) Download Attachment
thinlto+cfi-movepass.diff (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
Hi Dmitry, the direct call patch seems like a good start; however, by falling through to the F->replaceUsesExceptBlockAddr(FDecl) at the bottom of LowerTypeTestsModule::importFunction() I believe it will replace all uses of a function (for a definition outside the TU it's defined in) with the direct function reference instead of just direct calls. Also, I believe the logic for replaceDirectCalls() should be more restrictive, currently it will also rewrite calls where the function use is as a function pointer argument instead of the call target.

On Mon, Apr 30, 2018 at 1:05 PM <[hidden email]> wrote:
Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables. I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.

I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?





> On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
>
> For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
>
> The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
>
> Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
>
> --------------------------------
> define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
>  %3 = add nsw i32 %0, -2
>  store i32 %3, i32* @i, align 4, !tbaa !5
>  %4 = icmp sgt i32 %0, 1
>  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
>  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
>  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
>  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
>
> ; <label>:8:                                      ; preds = %2
>  %9 = ptrtoint i32 ()* %5 to i64
>  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
>  unreachable, !nosanitize !9
>
> ; <label>:10:                                     ; preds = %2
>  %11 = tail call i32 %5() #5
>  ret i32 %11
> }
>
> . . .
> declare hidden i32 @foo() #3
> declare hidden i32 @bar() #3
>
>
> —————————test case---------------
> b.c
> =================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> extern int i;
>
> int main(int argc, char *argv[])
> {
>  i = argc - 2;
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
> v.c
> ================
> int i;
> typedef int (*fptr_t) (void);
> int foo(void) {  return 11; }
> int bar(void) {  return 22; }
> fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
>
>
>> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
>>
>>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
>>
>> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
>>
>>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>>
>> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
>> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
>> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
>> Do you have an example of what you are seeing?
>>
>> Peter
>>
>> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
>> Hi Peter,
>>
>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
>>
>> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>>
>> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
>>
>> Thanks.
>> Dmitry.
>>
>>
>>
>>>>>>
>>>>>> a.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> fptr_t get_fptr();
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>> fptr_t fp = get_fptr();
>>>>>> return fp();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> b.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> int foo(void) { return 11; }
>>>>>> int bar(void) { return 22; }
>>>>>>
>>>>>> static fptr_t fptr = bar;
>>>>>> static int i = 53;
>>>>>>
>>>>>> fptr_t get_fptr(void)
>>>>>> {
>>>>>> if (i >= 0)
>>>>>>   fptr = foo;
>>>>>> else
>>>>>>   fptr = bar;
>>>>>>
>>>>>> return fptr;
>>>>>> }
>>>>>>
>>
>>
>>
>>
>>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
>>>
>>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
>>>
>>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
>>>
>>> Peter
>>>
>>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
>>> Teresa, Peter,
>>>
>>> Thanks for your help!
>>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
>>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
>>>
>>> Thanks.
>>> Dmitry.
>>>
>>>
>>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
>>>> Hi Teresa,
>>>>
>>>> Thanks for the info!
>>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
>>>>
>>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
>>>>
>>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
>>>>
>>>> Teresa
>>>>
>>>>
>>>> Hopefully Peter can chime in regarding CFI related issues.
>>>>
>>>> Thanks.
>>>> Dmitry.
>>>>
>>>>
>>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
>>>>>
>>>>> Hi Dmitry,
>>>>>
>>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
>>>>>
>>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
>>>>>
>>>>> Teresa
>>>>>
>>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
>>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>>>>>>
>>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>>>>>>
>>>>>> I was wondering if there was a way to mitigate this limitation.
>>>>>>
>>>>>> a.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> fptr_t get_fptr();
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>> fptr_t fp = get_fptr();
>>>>>> return fp();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> b.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> int foo(void) { return 11; }
>>>>>> int bar(void) { return 22; }
>>>>>>
>>>>>> static fptr_t fptr = bar;
>>>>>> static int i = 53;
>>>>>>
>>>>>> fptr_t get_fptr(void)
>>>>>> {
>>>>>> if (i >= 0)
>>>>>>   fptr = foo;
>>>>>> else
>>>>>>   fptr = bar;
>>>>>>
>>>>>> return fptr;
>>>>>> }
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> [hidden email]
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   <a href="tel:(408)%20460-2413" value="+14084602413" target="_blank">408-460-2413
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson |    Software Engineer |      [hidden email] |   <a href="tel:(408)%20460-2413" value="+14084602413" target="_blank">408-460-2413
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>
>>
>>
>>
>> --
>> --
>> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
On Mon, Apr 30, 2018 at 1:05 PM, <[hidden email]> wrote:
Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables.

I would have thought that it could be beneficial to place the jump table next to one of the target functions, as it may reduce the chance of an additional page fault when performing an indirect call via a jump table. Do your benchmarks show otherwise?
 
I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.

I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?

The issue with running the passes early is that the llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is somewhat fragile, so it needs to see the IR first before one of the other passes can break it. (This is probably something that we want to fix by using a different pattern.) I don't think that the same issue exists with any of the patterns that LowerTypeTests looks for, so it would probably be fine to move it later.

Peter






> On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
>
> For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
>
> The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
>
> Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
>
> --------------------------------
> define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
>  %3 = add nsw i32 %0, -2
>  store i32 %3, i32* @i, align 4, !tbaa !5
>  %4 = icmp sgt i32 %0, 1
>  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
>  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
>  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
>  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
>
> ; <label>:8:                                      ; preds = %2
>  %9 = ptrtoint i32 ()* %5 to i64
>  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
>  unreachable, !nosanitize !9
>
> ; <label>:10:                                     ; preds = %2
>  %11 = tail call i32 %5() #5
>  ret i32 %11
> }
>
> . . .
> declare hidden i32 @foo() #3
> declare hidden i32 @bar() #3
>
>
> —————————test case---------------
> b.c
> =================
> typedef int (*fptr_t) (void);
> fptr_t get_fptr();
> extern int i;
>
> int main(int argc, char *argv[])
> {
>  i = argc - 2;
>  fptr_t fp = get_fptr();
>  return fp();
> }
>
> v.c
> ================
> int i;
> typedef int (*fptr_t) (void);
> int foo(void) {  return 11; }
> int bar(void) {  return 22; }
> fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
>
>
>> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
>>
>>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
>>
>> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
>>
>>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>>
>> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
>> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
>> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
>> Do you have an example of what you are seeing?
>>
>> Peter
>>
>> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
>> Hi Peter,
>>
>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
>>
>> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
>>
>> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
>>
>> Thanks.
>> Dmitry.
>>
>>
>>
>>>>>>
>>>>>> a.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> fptr_t get_fptr();
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>> fptr_t fp = get_fptr();
>>>>>> return fp();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> b.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> int foo(void) { return 11; }
>>>>>> int bar(void) { return 22; }
>>>>>>
>>>>>> static fptr_t fptr = bar;
>>>>>> static int i = 53;
>>>>>>
>>>>>> fptr_t get_fptr(void)
>>>>>> {
>>>>>> if (i >= 0)
>>>>>>   fptr = foo;
>>>>>> else
>>>>>>   fptr = bar;
>>>>>>
>>>>>> return fptr;
>>>>>> }
>>>>>>
>>
>>
>>
>>
>>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
>>>
>>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
>>>
>>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
>>>
>>> Peter
>>>
>>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
>>> Teresa, Peter,
>>>
>>> Thanks for your help!
>>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
>>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
>>>
>>> Thanks.
>>> Dmitry.
>>>
>>>
>>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
>>>> Hi Teresa,
>>>>
>>>> Thanks for the info!
>>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
>>>>
>>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
>>>>
>>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
>>>>
>>>> Teresa
>>>>
>>>>
>>>> Hopefully Peter can chime in regarding CFI related issues.
>>>>
>>>> Thanks.
>>>> Dmitry.
>>>>
>>>>
>>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
>>>>>
>>>>> Hi Dmitry,
>>>>>
>>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
>>>>>
>>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
>>>>>
>>>>> Teresa
>>>>>
>>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
>>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
>>>>>>
>>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
>>>>>>
>>>>>> I was wondering if there was a way to mitigate this limitation.
>>>>>>
>>>>>> a.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> fptr_t get_fptr();
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>> fptr_t fp = get_fptr();
>>>>>> return fp();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> b.c
>>>>>> =============================
>>>>>> typedef int (*fptr_t) (void);
>>>>>> int foo(void) { return 11; }
>>>>>> int bar(void) { return 22; }
>>>>>>
>>>>>> static fptr_t fptr = bar;
>>>>>> static int i = 53;
>>>>>>
>>>>>> fptr_t get_fptr(void)
>>>>>> {
>>>>>> if (i >= 0)
>>>>>>   fptr = foo;
>>>>>> else
>>>>>>   fptr = bar;
>>>>>>
>>>>>> return fptr;
>>>>>> }
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> [hidden email]
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>
>>
>>
>>
>> --
>> --
>> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev


> On Apr 30, 2018, at 5:43 PM, Vlad Tsyrklevich <[hidden email]> wrote:
>
> Hi Dmitry, the direct call patch seems like a good start; however, by falling through to the F->replaceUsesExceptBlockAddr(FDecl) at the bottom of LowerTypeTestsModule::importFunction() I believe it will replace all uses of a function (for a definition outside the TU it's defined in) with the direct function reference instead of just direct calls.

You’re right, it was meant to bail after direct call replacement.

> Also, I believe the logic for replaceDirectCalls() should be more restrictive, currently it will also rewrite calls where the function use is as a function pointer argument instead of the call target.

I’m checking if the user is a call instruction and that it’s getCalledFunction() returns non-null. I thought this would only give me direct calls. If not, what would be a good check?

I will create a Phab review to make it easier to comment.


>
> On Mon, Apr 30, 2018 at 1:05 PM <[hidden email]> wrote:
> Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables. I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.
>
> I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?
>
>
>
>
>
> > On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
> >
> > For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
> >
> > The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> > In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
> >
> > Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> > Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
> >
> > --------------------------------
> > define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
> >  %3 = add nsw i32 %0, -2
> >  store i32 %3, i32* @i, align 4, !tbaa !5
> >  %4 = icmp sgt i32 %0, 1
> >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
> >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> >
> > ; <label>:8:                                      ; preds = %2
> >  %9 = ptrtoint i32 ()* %5 to i64
> >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
> >  unreachable, !nosanitize !9
> >
> > ; <label>:10:                                     ; preds = %2
> >  %11 = tail call i32 %5() #5
> >  ret i32 %11
> > }
> >
> > . . .
> > declare hidden i32 @foo() #3
> > declare hidden i32 @bar() #3
> >
> >
> > —————————test case---------------
> > b.c
> > =================
> > typedef int (*fptr_t) (void);
> > fptr_t get_fptr();
> > extern int i;
> >
> > int main(int argc, char *argv[])
> > {
> >  i = argc - 2;
> >  fptr_t fp = get_fptr();
> >  return fp();
> > }
> >
> > v.c
> > ================
> > int i;
> > typedef int (*fptr_t) (void);
> > int foo(void) {  return 11; }
> > int bar(void) {  return 22; }
> > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> >
> >
> >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
> >>
> >>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
> >>
> >> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
> >>
> >>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> >>
> >> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
> >> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
> >> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
> >> Do you have an example of what you are seeing?
> >>
> >> Peter
> >>
> >> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
> >> Hi Peter,
> >>
> >> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
> >>
> >> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> >>
> >> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
> >>
> >> Thanks.
> >> Dmitry.
> >>
> >>
> >>
> >>>>>>
> >>>>>> a.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> fptr_t get_fptr();
> >>>>>> int main(int argc, char *argv[])
> >>>>>> {
> >>>>>> fptr_t fp = get_fptr();
> >>>>>> return fp();
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> b.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> int foo(void) { return 11; }
> >>>>>> int bar(void) { return 22; }
> >>>>>>
> >>>>>> static fptr_t fptr = bar;
> >>>>>> static int i = 53;
> >>>>>>
> >>>>>> fptr_t get_fptr(void)
> >>>>>> {
> >>>>>> if (i >= 0)
> >>>>>>   fptr = foo;
> >>>>>> else
> >>>>>>   fptr = bar;
> >>>>>>
> >>>>>> return fptr;
> >>>>>> }
> >>>>>>
> >>
> >>
> >>
> >>
> >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
> >>>
> >>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
> >>>
> >>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
> >>>
> >>> Peter
> >>>
> >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
> >>> Teresa, Peter,
> >>>
> >>> Thanks for your help!
> >>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
> >>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
> >>>
> >>> Thanks.
> >>> Dmitry.
> >>>
> >>>
> >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
> >>>> Hi Teresa,
> >>>>
> >>>> Thanks for the info!
> >>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
> >>>>
> >>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> >>>>
> >>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
> >>>>
> >>>> Teresa
> >>>>
> >>>>
> >>>> Hopefully Peter can chime in regarding CFI related issues.
> >>>>
> >>>> Thanks.
> >>>> Dmitry.
> >>>>
> >>>>
> >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
> >>>>>
> >>>>> Hi Dmitry,
> >>>>>
> >>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
> >>>>>
> >>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
> >>>>>
> >>>>> Teresa
> >>>>>
> >>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
> >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>>
> >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> >>>>>>
> >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> >>>>>>
> >>>>>> I was wondering if there was a way to mitigate this limitation.
> >>>>>>
> >>>>>> a.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> fptr_t get_fptr();
> >>>>>> int main(int argc, char *argv[])
> >>>>>> {
> >>>>>> fptr_t fp = get_fptr();
> >>>>>> return fp();
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> b.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> int foo(void) { return 11; }
> >>>>>> int bar(void) { return 22; }
> >>>>>>
> >>>>>> static fptr_t fptr = bar;
> >>>>>> static int i = 53;
> >>>>>>
> >>>>>> fptr_t get_fptr(void)
> >>>>>> {
> >>>>>> if (i >= 0)
> >>>>>>   fptr = foo;
> >>>>>> else
> >>>>>>   fptr = bar;
> >>>>>>
> >>>>>> return fptr;
> >>>>>> }
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> LLVM Developers mailing list
> >>>>>> [hidden email]
> >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> [hidden email]
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> --
> >>> Peter
> >>
> >>
> >>
> >>
> >> --
> >> --
> >> Peter
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > [hidden email]
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev


> On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <[hidden email]> wrote:
>
> On Mon, Apr 30, 2018 at 1:05 PM,  <[hidden email]> wrote:
> Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables.
>
> I would have thought that it could be beneficial to place the jump table next to one of the target functions, as it may reduce the chance of an additional page fault when performing an indirect call via a jump table. Do your benchmarks show otherwise?

The problem is that the jump table entries end up in a large section of <name>.lto.o and the entire section is placed right next to one of the real functions. It breaks all locality in our case.

>  
> I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.
>
> I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?
>
> The issue with running the passes early is that the llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is somewhat fragile, so it needs to see the IR first before one of the other passes can break it. (This is probably something that we want to fix by using a different pattern.) I don't think that the same issue exists with any of the patterns that LowerTypeTests looks for, so it would probably be fine to move it later.

Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems OK, not sure how much we can trust the tests though ;)
I’ll create a Phab review for this as well.


>
> Peter
>
>
>
>
>
>
> > On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
> >
> > For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
> >
> > The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> > In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
> >
> > Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> > Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
> >
> > --------------------------------
> > define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
> >  %3 = add nsw i32 %0, -2
> >  store i32 %3, i32* @i, align 4, !tbaa !5
> >  %4 = icmp sgt i32 %0, 1
> >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
> >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> >
> > ; <label>:8:                                      ; preds = %2
> >  %9 = ptrtoint i32 ()* %5 to i64
> >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
> >  unreachable, !nosanitize !9
> >
> > ; <label>:10:                                     ; preds = %2
> >  %11 = tail call i32 %5() #5
> >  ret i32 %11
> > }
> >
> > . . .
> > declare hidden i32 @foo() #3
> > declare hidden i32 @bar() #3
> >
> >
> > —————————test case---------------
> > b.c
> > =================
> > typedef int (*fptr_t) (void);
> > fptr_t get_fptr();
> > extern int i;
> >
> > int main(int argc, char *argv[])
> > {
> >  i = argc - 2;
> >  fptr_t fp = get_fptr();
> >  return fp();
> > }
> >
> > v.c
> > ================
> > int i;
> > typedef int (*fptr_t) (void);
> > int foo(void) {  return 11; }
> > int bar(void) {  return 22; }
> > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> >
> >
> >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
> >>
> >>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
> >>
> >> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
> >>
> >>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> >>
> >> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
> >> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
> >> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
> >> Do you have an example of what you are seeing?
> >>
> >> Peter
> >>
> >> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
> >> Hi Peter,
> >>
> >> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
> >>
> >> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> >>
> >> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
> >>
> >> Thanks.
> >> Dmitry.
> >>
> >>
> >>
> >>>>>>
> >>>>>> a.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> fptr_t get_fptr();
> >>>>>> int main(int argc, char *argv[])
> >>>>>> {
> >>>>>> fptr_t fp = get_fptr();
> >>>>>> return fp();
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> b.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> int foo(void) { return 11; }
> >>>>>> int bar(void) { return 22; }
> >>>>>>
> >>>>>> static fptr_t fptr = bar;
> >>>>>> static int i = 53;
> >>>>>>
> >>>>>> fptr_t get_fptr(void)
> >>>>>> {
> >>>>>> if (i >= 0)
> >>>>>>   fptr = foo;
> >>>>>> else
> >>>>>>   fptr = bar;
> >>>>>>
> >>>>>> return fptr;
> >>>>>> }
> >>>>>>
> >>
> >>
> >>
> >>
> >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
> >>>
> >>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
> >>>
> >>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
> >>>
> >>> Peter
> >>>
> >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
> >>> Teresa, Peter,
> >>>
> >>> Thanks for your help!
> >>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
> >>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
> >>>
> >>> Thanks.
> >>> Dmitry.
> >>>
> >>>
> >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
> >>>> Hi Teresa,
> >>>>
> >>>> Thanks for the info!
> >>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
> >>>>
> >>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> >>>>
> >>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
> >>>>
> >>>> Teresa
> >>>>
> >>>>
> >>>> Hopefully Peter can chime in regarding CFI related issues.
> >>>>
> >>>> Thanks.
> >>>> Dmitry.
> >>>>
> >>>>
> >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
> >>>>>
> >>>>> Hi Dmitry,
> >>>>>
> >>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
> >>>>>
> >>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
> >>>>>
> >>>>> Teresa
> >>>>>
> >>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
> >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>>
> >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> >>>>>>
> >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> >>>>>>
> >>>>>> I was wondering if there was a way to mitigate this limitation.
> >>>>>>
> >>>>>> a.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> fptr_t get_fptr();
> >>>>>> int main(int argc, char *argv[])
> >>>>>> {
> >>>>>> fptr_t fp = get_fptr();
> >>>>>> return fp();
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> b.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> int foo(void) { return 11; }
> >>>>>> int bar(void) { return 22; }
> >>>>>>
> >>>>>> static fptr_t fptr = bar;
> >>>>>> static int i = 53;
> >>>>>>
> >>>>>> fptr_t get_fptr(void)
> >>>>>> {
> >>>>>> if (i >= 0)
> >>>>>>   fptr = foo;
> >>>>>> else
> >>>>>>   fptr = bar;
> >>>>>>
> >>>>>> return fptr;
> >>>>>> }
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> LLVM Developers mailing list
> >>>>>> [hidden email]
> >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> [hidden email]
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> --
> >>> Peter
> >>
> >>
> >>
> >>
> >> --
> >> --
> >> Peter
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > [hidden email]
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> --
> Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev


On Tue, May 1, 2018 at 11:10 AM, <[hidden email]> wrote:


> On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <[hidden email]> wrote:
>
> On Mon, Apr 30, 2018 at 1:05 PM,  <[hidden email]> wrote:
> Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables.
>
> I would have thought that it could be beneficial to place the jump table next to one of the target functions, as it may reduce the chance of an additional page fault when performing an indirect call via a jump table. Do your benchmarks show otherwise?

The problem is that the jump table entries end up in a large section of <name>.lto.o and the entire section is placed right next to one of the real functions. It breaks all locality in our case.

Is <name>.lto.o being compiled with function sections? That should allow the linker to move just the jump table. I know that lld and the gold plugin do that by default, but I don't know what your linker does.


> I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.
>
> I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?
>
> The issue with running the passes early is that the llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is somewhat fragile, so it needs to see the IR first before one of the other passes can break it. (This is probably something that we want to fix by using a different pattern.) I don't think that the same issue exists with any of the patterns that LowerTypeTests looks for, so it would probably be fine to move it later.

Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems OK, not sure how much we can trust the tests though ;)
I’ll create a Phab review for this as well.

Thanks, I'll take a look.

Peter 


>
> Peter
>
>
>
>
>
>
> > On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
> >
> > For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
> >
> > The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> > In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
> >
> > Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> > Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
> >
> > --------------------------------
> > define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
> >  %3 = add nsw i32 %0, -2
> >  store i32 %3, i32* @i, align 4, !tbaa !5
> >  %4 = icmp sgt i32 %0, 1
> >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
> >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> >
> > ; <label>:8:                                      ; preds = %2
> >  %9 = ptrtoint i32 ()* %5 to i64
> >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
> >  unreachable, !nosanitize !9
> >
> > ; <label>:10:                                     ; preds = %2
> >  %11 = tail call i32 %5() #5
> >  ret i32 %11
> > }
> >
> > . . .
> > declare hidden i32 @foo() #3
> > declare hidden i32 @bar() #3
> >
> >
> > —————————test case---------------
> > b.c
> > =================
> > typedef int (*fptr_t) (void);
> > fptr_t get_fptr();
> > extern int i;
> >
> > int main(int argc, char *argv[])
> > {
> >  i = argc - 2;
> >  fptr_t fp = get_fptr();
> >  return fp();
> > }
> >
> > v.c
> > ================
> > int i;
> > typedef int (*fptr_t) (void);
> > int foo(void) {  return 11; }
> > int bar(void) {  return 22; }
> > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> >
> >
> >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
> >>
> >>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
> >>
> >> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
> >>
> >>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> >>
> >> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
> >> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
> >> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
> >> Do you have an example of what you are seeing?
> >>
> >> Peter
> >>
> >> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
> >> Hi Peter,
> >>
> >> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
> >>
> >> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> >>
> >> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
> >>
> >> Thanks.
> >> Dmitry.
> >>
> >>
> >>
> >>>>>>
> >>>>>> a.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> fptr_t get_fptr();
> >>>>>> int main(int argc, char *argv[])
> >>>>>> {
> >>>>>> fptr_t fp = get_fptr();
> >>>>>> return fp();
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> b.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> int foo(void) { return 11; }
> >>>>>> int bar(void) { return 22; }
> >>>>>>
> >>>>>> static fptr_t fptr = bar;
> >>>>>> static int i = 53;
> >>>>>>
> >>>>>> fptr_t get_fptr(void)
> >>>>>> {
> >>>>>> if (i >= 0)
> >>>>>>   fptr = foo;
> >>>>>> else
> >>>>>>   fptr = bar;
> >>>>>>
> >>>>>> return fptr;
> >>>>>> }
> >>>>>>
> >>
> >>
> >>
> >>
> >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
> >>>
> >>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
> >>>
> >>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
> >>>
> >>> Peter
> >>>
> >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
> >>> Teresa, Peter,
> >>>
> >>> Thanks for your help!
> >>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
> >>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
> >>>
> >>> Thanks.
> >>> Dmitry.
> >>>
> >>>
> >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
> >>>> Hi Teresa,
> >>>>
> >>>> Thanks for the info!
> >>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
> >>>>
> >>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> >>>>
> >>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
> >>>>
> >>>> Teresa
> >>>>
> >>>>
> >>>> Hopefully Peter can chime in regarding CFI related issues.
> >>>>
> >>>> Thanks.
> >>>> Dmitry.
> >>>>
> >>>>
> >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
> >>>>>
> >>>>> Hi Dmitry,
> >>>>>
> >>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
> >>>>>
> >>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
> >>>>>
> >>>>> Teresa
> >>>>>
> >>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
> >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>>
> >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> >>>>>>
> >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> >>>>>>
> >>>>>> I was wondering if there was a way to mitigate this limitation.
> >>>>>>
> >>>>>> a.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> fptr_t get_fptr();
> >>>>>> int main(int argc, char *argv[])
> >>>>>> {
> >>>>>> fptr_t fp = get_fptr();
> >>>>>> return fp();
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> b.c
> >>>>>> =============================
> >>>>>> typedef int (*fptr_t) (void);
> >>>>>> int foo(void) { return 11; }
> >>>>>> int bar(void) { return 22; }
> >>>>>>
> >>>>>> static fptr_t fptr = bar;
> >>>>>> static int i = 53;
> >>>>>>
> >>>>>> fptr_t get_fptr(void)
> >>>>>> {
> >>>>>> if (i >= 0)
> >>>>>>   fptr = foo;
> >>>>>> else
> >>>>>>   fptr = bar;
> >>>>>>
> >>>>>> return fptr;
> >>>>>> }
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> LLVM Developers mailing list
> >>>>>> [hidden email]
> >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> [hidden email]
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> --
> >>> Peter
> >>
> >>
> >>
> >>
> >> --
> >> --
> >> Peter
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > [hidden email]
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> --
> Peter




--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
It’s not the functions themselves, it’s the jump tables that end up in the same section of the thin-link temporary object. If I have, say, foo and bar and both of them need a jump table entry, both entries will be in the same section of the temp object. The real functions will be renamed to foo.cfi and bar.cfi. If I list “foo.cfi, foo, bar.cfi, bar” in the link order file, the linker (I use lld) will place foo.cfi followed by the entire jump table, followed by bar. In our case the jump table section is about 80K, so the expectation of foo being on the same page as bar breaks down.

 

> On May 1, 2018, at 11:16 AM, Peter Collingbourne <[hidden email]> wrote:
>
>
>
> On Tue, May 1, 2018 at 11:10 AM,  <[hidden email]> wrote:
>
>
> > On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <[hidden email]> wrote:
> >
> > On Mon, Apr 30, 2018 at 1:05 PM,  <[hidden email]> wrote:
> > Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables.
> >
> > I would have thought that it could be beneficial to place the jump table next to one of the target functions, as it may reduce the chance of an additional page fault when performing an indirect call via a jump table. Do your benchmarks show otherwise?
>
> The problem is that the jump table entries end up in a large section of <name>.lto.o and the entire section is placed right next to one of the real functions. It breaks all locality in our case.
>
> Is <name>.lto.o being compiled with function sections? That should allow the linker to move just the jump table. I know that lld and the gold plugin do that by default, but I don't know what your linker does.
>
> >  
> > I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.
> >
> > I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?
> >
> > The issue with running the passes early is that the llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is somewhat fragile, so it needs to see the IR first before one of the other passes can break it. (This is probably something that we want to fix by using a different pattern.) I don't think that the same issue exists with any of the patterns that LowerTypeTests looks for, so it would probably be fine to move it later.
>
> Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems OK, not sure how much we can trust the tests though ;)
> I’ll create a Phab review for this as well.
>
> Thanks, I'll take a look.
>
> Peter
>
>
> >
> > Peter
> >
> >
> >
> >
> >
> >
> > > On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
> > >
> > > For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
> > >
> > > The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> > > In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
> > >
> > > Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> > > Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
> > >
> > > --------------------------------
> > > define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
> > >  %3 = add nsw i32 %0, -2
> > >  store i32 %3, i32* @i, align 4, !tbaa !5
> > >  %4 = icmp sgt i32 %0, 1
> > >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> > >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> > >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
> > >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> > >
> > > ; <label>:8:                                      ; preds = %2
> > >  %9 = ptrtoint i32 ()* %5 to i64
> > >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
> > >  unreachable, !nosanitize !9
> > >
> > > ; <label>:10:                                     ; preds = %2
> > >  %11 = tail call i32 %5() #5
> > >  ret i32 %11
> > > }
> > >
> > > . . .
> > > declare hidden i32 @foo() #3
> > > declare hidden i32 @bar() #3
> > >
> > >
> > > —————————test case---------------
> > > b.c
> > > =================
> > > typedef int (*fptr_t) (void);
> > > fptr_t get_fptr();
> > > extern int i;
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >  i = argc - 2;
> > >  fptr_t fp = get_fptr();
> > >  return fp();
> > > }
> > >
> > > v.c
> > > ================
> > > int i;
> > > typedef int (*fptr_t) (void);
> > > int foo(void) {  return 11; }
> > > int bar(void) {  return 22; }
> > > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> > >
> > >
> > >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
> > >>
> > >>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
> > >>
> > >> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
> > >>
> > >>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> > >>
> > >> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
> > >> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
> > >> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
> > >> Do you have an example of what you are seeing?
> > >>
> > >> Peter
> > >>
> > >> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
> > >> Hi Peter,
> > >>
> > >> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
> > >>
> > >> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> > >>
> > >> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
> > >>
> > >> Thanks.
> > >> Dmitry.
> > >>
> > >>
> > >>
> > >>>>>>
> > >>>>>> a.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> fptr_t get_fptr();
> > >>>>>> int main(int argc, char *argv[])
> > >>>>>> {
> > >>>>>> fptr_t fp = get_fptr();
> > >>>>>> return fp();
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> b.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> int foo(void) { return 11; }
> > >>>>>> int bar(void) { return 22; }
> > >>>>>>
> > >>>>>> static fptr_t fptr = bar;
> > >>>>>> static int i = 53;
> > >>>>>>
> > >>>>>> fptr_t get_fptr(void)
> > >>>>>> {
> > >>>>>> if (i >= 0)
> > >>>>>>   fptr = foo;
> > >>>>>> else
> > >>>>>>   fptr = bar;
> > >>>>>>
> > >>>>>> return fptr;
> > >>>>>> }
> > >>>>>>
> > >>
> > >>
> > >>
> > >>
> > >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
> > >>>
> > >>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
> > >>>
> > >>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
> > >>>
> > >>> Peter
> > >>>
> > >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
> > >>> Teresa, Peter,
> > >>>
> > >>> Thanks for your help!
> > >>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
> > >>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
> > >>>
> > >>> Thanks.
> > >>> Dmitry.
> > >>>
> > >>>
> > >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
> > >>>> Hi Teresa,
> > >>>>
> > >>>> Thanks for the info!
> > >>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
> > >>>>
> > >>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> > >>>>
> > >>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
> > >>>>
> > >>>> Teresa
> > >>>>
> > >>>>
> > >>>> Hopefully Peter can chime in regarding CFI related issues.
> > >>>>
> > >>>> Thanks.
> > >>>> Dmitry.
> > >>>>
> > >>>>
> > >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
> > >>>>>
> > >>>>> Hi Dmitry,
> > >>>>>
> > >>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
> > >>>>>
> > >>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
> > >>>>>
> > >>>>> Teresa
> > >>>>>
> > >>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
> > >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
> > >>>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>>
> > >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> > >>>>>>
> > >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> > >>>>>>
> > >>>>>> I was wondering if there was a way to mitigate this limitation.
> > >>>>>>
> > >>>>>> a.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> fptr_t get_fptr();
> > >>>>>> int main(int argc, char *argv[])
> > >>>>>> {
> > >>>>>> fptr_t fp = get_fptr();
> > >>>>>> return fp();
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> b.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> int foo(void) { return 11; }
> > >>>>>> int bar(void) { return 22; }
> > >>>>>>
> > >>>>>> static fptr_t fptr = bar;
> > >>>>>> static int i = 53;
> > >>>>>>
> > >>>>>> fptr_t get_fptr(void)
> > >>>>>> {
> > >>>>>> if (i >= 0)
> > >>>>>>   fptr = foo;
> > >>>>>> else
> > >>>>>>   fptr = bar;
> > >>>>>>
> > >>>>>> return fptr;
> > >>>>>> }
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> LLVM Developers mailing list
> > >>>>>> [hidden email]
> > >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> LLVM Developers mailing list
> > >>> [hidden email]
> > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> --
> > >>> Peter
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> --
> > >> Peter
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > [hidden email]
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
> >
> > --
> > --
> > Peter
>
>
>
>
> --
> --
> Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] ThinLTO + CFI

Tom Stellard via llvm-dev
On Tue, May 1, 2018 at 11:29 AM, <[hidden email]> wrote:
It’s not the functions themselves, it’s the jump tables that end up in the same section of the thin-link temporary object. If I have, say, foo and bar and both of them need a jump table entry, both entries will be in the same section of the temp object. The real functions will be renamed to foo.cfi and bar.cfi. If I list “foo.cfi, foo, bar.cfi, bar” in the link order file, the linker (I use lld) will place foo.cfi followed by the entire jump table, followed by bar. In our case the jump table section is about 80K, so the expectation of foo being on the same page as bar breaks down.

I see what the problem is, it's that we use a custom section name for jump tables. Normally, each function (the jump table for each type is a separate function) gets a separate section, but that isn't the case for functions with custom section names.

If you comment out this line of LowerTypeTests.cpp:
does that help reduce the size of the individual sections?

Peter


> On May 1, 2018, at 11:16 AM, Peter Collingbourne <[hidden email]> wrote:
>
>
>
> On Tue, May 1, 2018 at 11:10 AM,  <[hidden email]> wrote:
>
>
> > On Apr 30, 2018, at 6:04 PM, Peter Collingbourne <[hidden email]> wrote:
> >
> > On Mon, Apr 30, 2018 at 1:05 PM,  <[hidden email]> wrote:
> > Replacing direct calls to jump table entries with calls to real targets recovers most of the performance loss in my benchmark. Dealing with link order files is a bit cumbersome though: I can’t keep both, say “foo” and “foo.cfi” next to each other in link order, because if both exist, the linker reorders the jump table next to the real function. This is not what we want when the goal is to get rid of calls through jump tables.
> >
> > I would have thought that it could be beneficial to place the jump table next to one of the target functions, as it may reduce the chance of an additional page fault when performing an indirect call via a jump table. Do your benchmarks show otherwise?
>
> The problem is that the jump table entries end up in a large section of <name>.lto.o and the entire section is placed right next to one of the real functions. It breaks all locality in our case.
>
> Is <name>.lto.o being compiled with function sections? That should allow the linker to move just the jump table. I know that lld and the gold plugin do that by default, but I don't know what your linker does.
>
> > 
> > I had to manually pick the functions which were renamed to “foo.cfi” and replace them in the link order file. The process can be automated, but it’s a bit flaky if we have to do it to make CFI work.
> >
> > I attached 2 patches here: one is the direct call replacement and the other is moving type test lowering pass to run later in the pipeline. Interestingly, running the pass later seems to help with performance as well. Though the comment in PassManagerBuilder implies that this pass needs to run early. Why is it different from full LTO?
> >
> > The issue with running the passes early is that the llvm.assume(llvm.type.test) pattern that WholeProgramDevirt looks for is somewhat fragile, so it needs to see the IR first before one of the other passes can break it. (This is probably something that we want to fix by using a different pattern.) I don't think that the same issue exists with any of the patterns that LowerTypeTests looks for, so it would probably be fine to move it later.
>
> Moving WholeProgramDevirt breaks some tests. LowerTypeTests seems OK, not sure how much we can trust the tests though ;)
> I’ll create a Phab review for this as well.
>
> Thanks, I'll take a look.
>
> Peter
>
>
> >
> > Peter
> >
> >
> >
> >
> >
> >
> > > On Apr 27, 2018, at 11:04 AM, via llvm-dev <[hidden email]> wrote:
> > >
> > > For the test case below, I get the following IR for main() on entry to ThinLTO backend invocation of LowerTypeTestsModule::lower(). This is after I moved this pass down in the pipeline so it’s invoked after inlining.
> > >
> > > The declarations for foo() and bar() are read in at the time of module import, Importer.importFunctions() in lto::thinBackend(). They do not have type metadata attached to them.
> > > In lowerTypeTestCall() we check if the pointer in the type test is of a known type, so we look at bitcast and then select operands. foo and bar in select are global objects with no type metadata, so the type check cannot be guaranteed to be true and it can’t be eliminated. In full LTO case this works as expected: both foo and bar are the same known type and type check is gone.
> > >
> > > Maybe the problem is not with renaming but with the missing type metadata in this particular case.
> > > Though having so many direct calls routed through the jump table still seems problematic. Is there a feasible solution?
> > >
> > > --------------------------------
> > > define hidden i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 !type !3 !type !4 {
> > >  %3 = add nsw i32 %0, -2
> > >  store i32 %3, i32* @i, align 4, !tbaa !5
> > >  %4 = icmp sgt i32 %0, 1
> > >  %5 = select i1 %4, i32 ()* @foo, i32 ()* @bar
> > >  %6 = bitcast i32 ()* %5 to i8*, !nosanitize !9
> > >  %7 = tail call i1 @llvm.type.test(i8* nonnull %6, metadata !"_ZTSFivE"), !nosanitize !9
> > >  br i1 %7, label %10, label %8, !prof !10, !nosanitize !9
> > >
> > > ; <label>:8:                                      ; preds = %2
> > >  %9 = ptrtoint i32 ()* %5 to i64
> > >  tail call void @__ubsan_handle_cfi_check_fail_abort(i8* getelementptr inbounds ({ i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }, { i8, { [4 x i8]*, i32, i32 }, { i16, i16, [13 x i8] }* }* @anon.fad58de7366495db4650cfefac2fcd61.1, i64 0, i32 0), i64 %9, i64 undef) #4, !nosanitize !9
> > >  unreachable, !nosanitize !9
> > >
> > > ; <label>:10:                                     ; preds = %2
> > >  %11 = tail call i32 %5() #5
> > >  ret i32 %11
> > > }
> > >
> > > . . .
> > > declare hidden i32 @foo() #3
> > > declare hidden i32 @bar() #3
> > >
> > >
> > > —————————test case---------------
> > > b.c
> > > =================
> > > typedef int (*fptr_t) (void);
> > > fptr_t get_fptr();
> > > extern int i;
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >  i = argc - 2;
> > >  fptr_t fp = get_fptr();
> > >  return fp();
> > > }
> > >
> > > v.c
> > > ================
> > > int i;
> > > typedef int (*fptr_t) (void);
> > > int foo(void) {  return 11; }
> > > int bar(void) {  return 22; }
> > > fptr_t get_fptr(void) {  return (i >= 0) ? foo : bar; }
> > >
> > >
> > >> On Apr 26, 2018, at 5:29 PM, Peter Collingbourne <[hidden email]> wrote:
> > >>
> > >>> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct?
> > >>
> > >> In fact it is all calls that go through a function pointer type that is used anywhere in the program for an indirect call, but depending on your program that could be very close to "yes".
> > >>
> > >>> Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> > >>
> > >> As far as I know, renaming happens during the LowerTypeTests pass, after the type checks are lowered.
> > >> Lowering: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1620
> > >> Renaming: http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#1642
> > >> Do you have an example of what you are seeing?
> > >>
> > >> Peter
> > >>
> > >> On Thu, Apr 26, 2018 at 4:54 PM,  <[hidden email]> wrote:
> > >> Hi Peter,
> > >>
> > >> We could probably tolerate a certain amount of unused jump table entries. However, I just realized that all non-inline imported calls end up going through a jump table entry. Is that correct? Initially I thought you meant calls promoted from indirect. While this can be fixed by replacing direct calls to jump tables with direct calls to real targets, I found other cases where ThinLTO+CFI has issues.
> > >>
> > >> In ThinLTO backend, type test lowering happens very early in the pipeline, before inlining. When the type check after the call to get_fptr() is lowered (in my original example, below), the compiler cannot see that both targets belong to the same type and that the type check will always return ‘true’ and can be eliminated. Moving the type check lowering pass further down the pipeline (after inlining) still does not solve the problem because CFI renaming happens early and symbols attached to the jump table do not have a matching type.
> > >>
> > >> I’m trying to think if there’s a way to delay renaming until ThinLTO backend type check lowering pass. It would help with solving both problems.
> > >>
> > >> Thanks.
> > >> Dmitry.
> > >>
> > >>
> > >>
> > >>>>>>
> > >>>>>> a.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> fptr_t get_fptr();
> > >>>>>> int main(int argc, char *argv[])
> > >>>>>> {
> > >>>>>> fptr_t fp = get_fptr();
> > >>>>>> return fp();
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> b.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> int foo(void) { return 11; }
> > >>>>>> int bar(void) { return 22; }
> > >>>>>>
> > >>>>>> static fptr_t fptr = bar;
> > >>>>>> static int i = 53;
> > >>>>>>
> > >>>>>> fptr_t get_fptr(void)
> > >>>>>> {
> > >>>>>> if (i >= 0)
> > >>>>>>   fptr = foo;
> > >>>>>> else
> > >>>>>>   fptr = bar;
> > >>>>>>
> > >>>>>> return fptr;
> > >>>>>> }
> > >>>>>>
> > >>
> > >>
> > >>
> > >>
> > >>> On Apr 19, 2018, at 6:18 PM, Peter Collingbourne <[hidden email]> wrote:
> > >>>
> > >>> Regarding the orderfile, yes, I was thinking more about ordering the real functions.
> > >>>
> > >>> In that case it sounds like your best option may be to implement the optimization pass to make direct calls go directly to the real function. From a performance perspective I don't think it would make much difference if there are unused jump table entries.
> > >>>
> > >>> Peter
> > >>>
> > >>> On Thu, Apr 19, 2018 at 6:09 PM, via llvm-dev <[hidden email]> wrote:
> > >>> Teresa, Peter,
> > >>>
> > >>> Thanks for your help!
> > >>> I need to re-run my experiments as the compiler I used did not have the latest changes like r327254.
> > >>> The fact that the decision about routing calls through jump table entries is made early may be problematic. In my experiments with FreeBSD kernel, ThinLTO produced thousands jump table entries compared to only dozens with full LTO. As for re-ordering jump table entries, I don’t think it’s going to work as they are placed in the same section. Including *.cfi names into a link order file will take care of re-ordering real functions routed through jump table entries, but in our case we need to force some functions to be on the same page. So not having jump table entries for the functions that don't really need them would be ideal.
> > >>>
> > >>> Thanks.
> > >>> Dmitry.
> > >>>
> > >>>
> > >>>> On Apr 18, 2018, at 6:11 PM, Teresa Johnson <[hidden email]> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Apr 18, 2018 at 4:49 PM,  <[hidden email]> wrote:
> > >>>> Hi Teresa,
> > >>>>
> > >>>> Thanks for the info!
> > >>>> This example is my attempt to reduce FreeBSD kernel to something more manageable :)
> > >>>>
> > >>>> I will take a look at why globals are not being imported in this case. What’s the best tool to look into ThinLTO objects and their summaries? Most dumping tools don’t seem to like ThinLTO bitcode files…
> > >>>>
> > >>>> Sadly there isn't a really great way to dump the summaries. =( There was a patch awhile back by a GSOC student to dump in YAML format, but there was resistance from some who preferred dumping to llvm assembly via llvm-dis and support reading in the summary from llvm assembly. It's been on my list of things to do, hasn't yet risen high enough in priority to work on that. For now, you have to use llvm-bcanalyzer -dump and look at the raw format.
> > >>>>
> > >>>> Teresa
> > >>>>
> > >>>>
> > >>>> Hopefully Peter can chime in regarding CFI related issues.
> > >>>>
> > >>>> Thanks.
> > >>>> Dmitry.
> > >>>>
> > >>>>
> > >>>>> On Apr 17, 2018, at 9:37 AM, Teresa Johnson <[hidden email]> wrote:
> > >>>>>
> > >>>>> Hi Dmitry,
> > >>>>>
> > >>>>> Sorry for the late reply. For CFI specific code generation, pcc is a better person to answer. But on the issue of global variables being optimized, that hasn't happened yet. That would be great if you wanted to pick that up!
> > >>>>>
> > >>>>> In your original email example, it seems like the file static i=53 could be constant propagated since there are no other defs, and the code in get_fptr simplified during the compile step, but I assume this is part of a more complex example where it is not possible to do this? Also note that with r327254 we started importing global variables. Do you know why we don't import in your case? I wonder if it has to do with it being CFI inserted code?
> > >>>>>
> > >>>>> Teresa
> > >>>>>
> > >>>>> On Tue, Apr 17, 2018 at 9:17 AM <[hidden email]> wrote:
> > >>>>> I watched  Teresa’s talk on ThinLTO from last year’s CppCon, and it sounded like adding global variable information to the summaries was in the works, or at least in planning. Can someone (Teresa?) please share the current status? If it’s part of future plans, are there any specific proposals that can be picked up and worked on?
> > >>>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>>
> > >>>>>> On Apr 9, 2018, at 6:51 PM, via llvm-dev <[hidden email]> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I’m working on setting up ThinLTO+CFI for a C application which uses a lot of function pointers. While functionally it appears stable, it’s performance is significantly degraded, to the tune of double digit percentage points compared to regular LTO+CFI.
> > >>>>>>
> > >>>>>> Looking into possible causes I see that under ThinLTO+CFI iCall type checks almost always generate jump table entries for indirect calls, which creates another level of indirection for every such call. On top of that it breaks the link order layout because real function names point to jump table entries. It appears that I’m hitting a limitation in ThinLTO on how much information it can propagate across modules, particularly information about constants. In the example below, the fact that “i” is effectively a constant, is lost under ThinLTO, and the inlined copy of b.c:get_fptr() in a.c does not eliminate the conditional, which, for CFI purposes requires to generate a type check/jump table.
> > >>>>>>
> > >>>>>> I was wondering if there was a way to mitigate this limitation.
> > >>>>>>
> > >>>>>> a.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> fptr_t get_fptr();
> > >>>>>> int main(int argc, char *argv[])
> > >>>>>> {
> > >>>>>> fptr_t fp = get_fptr();
> > >>>>>> return fp();
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> b.c
> > >>>>>> =============================
> > >>>>>> typedef int (*fptr_t) (void);
> > >>>>>> int foo(void) { return 11; }
> > >>>>>> int bar(void) { return 22; }
> > >>>>>>
> > >>>>>> static fptr_t fptr = bar;
> > >>>>>> static int i = 53;
> > >>>>>>
> > >>>>>> fptr_t get_fptr(void)
> > >>>>>> {
> > >>>>>> if (i >= 0)
> > >>>>>>   fptr = foo;
> > >>>>>> else
> > >>>>>>   fptr = bar;
> > >>>>>>
> > >>>>>> return fptr;
> > >>>>>> }
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> LLVM Developers mailing list
> > >>>>>> [hidden email]
> > >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Teresa Johnson |   Software Engineer |      [hidden email] |   408-460-2413
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Teresa Johnson |    Software Engineer |      [hidden email] |   408-460-2413
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> LLVM Developers mailing list
> > >>> [hidden email]
> > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> --
> > >>> Peter
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> --
> > >> Peter
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > [hidden email]
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
> >
> > --
> > --
> > Peter
>
>
>
>
> --
> --
> Peter




--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
12