[llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
Hello,

A new patch:
https://reviews.llvm.org/D47159

proposes transformations like:
printf("Hello, %s %d", "world", 123) -> printf("Hello, world 123")
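
To make the proposed fold concrete, here is a minimal C sketch that formats the original call and the folded call into buffers and compares the results. snprintf stands in for printf so the output can be inspected, and the helper name fold_is_equivalent is made up for this illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the original call and the folded call into buffers and report
 * whether they produce identical output (1) or not (0). */
int fold_is_equivalent(void) {
    char before[64];
    char after[64];

    /* Original call: two variadic arguments. */
    int n1 = snprintf(before, sizeof before, "Hello, %s %d", "world", 123);
    /* Folded call: both constants merged into the format string. */
    int n2 = snprintf(after, sizeof after, "Hello, world 123");

    assert(n1 >= 0 && n2 >= 0);
    return n1 == n2 && strcmp(before, after) == 0;
}
```

This only checks the C-level semantics of the fold. It says nothing about whether removing arguments from the IR call is safe on every target, which is the open question here.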

As Eli noted:

"I'm not sure we can rewrite calls to varargs functions safely in general given the current state of the C ABI rules in LLVM.

Sometimes clang does weird things to conform with the ABI rules, because the LLVM type system isn't the same as the C system. For most functions, it's pretty easy to tell it happened: if the IR signature of the function doesn't match the expected signature, something weird happened, so we can just bail out. But varargs functions don't specify a complete signature, so we can't tell if the clang ABI code was forced to do something weird, like split an argument into multiple values, or insert a padding value. For example, for the target mips64-unknown-linux-gnu, a call like printf("asdf%Lf", 1.0L); gets lowered to the following:

%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i64 undef, fp128 0xL00000000000000003FFF000000000000) #2"


I would like to hear more opinions on whether this is safe. It seems that Clang produces some weird IR for mips, while the x86 IR looks fine.

Could anyone from Clang/LLVM share more information about "varargs vs ABI vs LLVM vs Clang", and on whether we can safely rewrite calls to varargs functions under some conditions?

Thanks



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev


On 05/22/2018 04:32 AM, Dávid Bolvanský via llvm-dev wrote:
> Hello,
>
> A new patch:
> https://reviews.llvm.org/D47159
>
> proposes transformations like:
> printf("Hello, %s %d", "world", 123) -> printf("Hello, world 123")

To clarify, the real question here comes up when you can only substitute some of the arguments? If you can substitute all of the arguments, then you can turn this into a call to puts.

In any case, why do you want to do this? Also, doesn't the formatting used by printf depend on the process's current locale?

 -Hal


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
Thanks.

Yes, the goal is to substitute only some of the arguments. As far as I know, the formatting used by printf depends on the locale only for the double and float types, so I would not fold double/float constants into the format string.

Why? To reduce the number of constants (some of them could be merged into the format string) and the number of arguments passed when calling printf/fprintf/sprintf, etc.


Re: [llvm-dev] [cfe-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
I can imagine it being useful to make suggestions as a clang-tidy check, rather than an “optimisation”. In particular, you might want to suggest to callers of printf that they might as well fold constants into the string, and maybe use puts instead.

> On 05/22/2018 04:32 AM, Dávid Bolvanský via llvm-dev wrote:
>> Hello,
>>
>> A new patch:
>> https://reviews.llvm.org/D47159
>>
>> proposes transformations like:
>> printf("Hello, %s %d", "world", 123) -> printf("Hello, world 123")

-- Dean

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev

On 05/22/2018 10:42 AM, Dávid Bolvanský wrote:
> Thanks.
>
> Yes, to substitute only some of the arguments. Formatting used by printf depends on the locale but only for double, float types I think - yes, I would not place double/float constants into the format string.

Okay. I think it's true that integers will be the same regardless of locale (so long as the ' flag is not used, as that brings in a dependence on LC_NUMERIC).
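
As a rough illustration of the locale point, the sketch below formats an int with plain %d and a double with %.1f under a caller-chosen LC_NUMERIC locale. The helper names are made up for this example, and the de_DE.UTF-8 locale mentioned afterwards is an assumption; whether it is installed depends on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format an int with plain %d (no ' flag) under the given LC_NUMERIC locale.
 * If the locale is not installed, setlocale() fails and the current locale
 * simply stays in effect. */
void format_int_in_locale(const char *locale_name, int value,
                          char *out, size_t out_size) {
    assert(locale_name != NULL && out != NULL && out_size > 0);
    setlocale(LC_NUMERIC, locale_name);
    snprintf(out, out_size, "%d", value);
    setlocale(LC_NUMERIC, "C");  /* back to the startup default */
}

/* Same idea for a double: the decimal-point character comes from
 * LC_NUMERIC, so this output may differ between locales. */
void format_double_in_locale(const char *locale_name, double value,
                             char *out, size_t out_size) {
    assert(locale_name != NULL && out != NULL && out_size > 0);
    setlocale(LC_NUMERIC, locale_name);
    snprintf(out, out_size, "%.1f", value);
    setlocale(LC_NUMERIC, "C");  /* back to the startup default */
}
```

Calling format_int_in_locale with "C" and with "de_DE.UTF-8" should give the same digits either way. On a system where that German locale is installed, format_double_in_locale would be expected to switch to a decimal comma, which is why folding floating-point constants into the format string is not safe.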


> Why? To reduce number of constants (some of them could be merged into the format string) and number of args when calling printf/fprintf/sprintf, etc..

Sure, but it seems to me unlikely that this will affect performance. Is it a code-size optimization (this actually isn't obvious to me because the string representation might be longer than the binary form of the constant plus the extra instructions)?

 -Hal


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
It could save unnecessary format-string parsing in printf/fprintf/sprintf at runtime.

For code that calls fprintf heavily, e.g. fprintf(f, "%s: %s", TAG, msg);, I think it could be quite useful.
After this transformation we would get fprintf(f, "ABC: %s", msg); which saves one push/mov instruction and some parsing in printf on every call. We would simply replace the string constant "%s: %s" with "ABC: %s", and the possibly orphaned "ABC" constant could be removed completely.
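
A minimal sketch of the string-level part of this rewrite, assuming the folded constant is the first variadic argument and the first conversion in the format string is a plain %s. The function name fold_first_string_arg is hypothetical. Note the escaping step: a '%' inside the constant has to become "%%" once it is part of the format string. Whether the argument can then be dropped from the call without breaking the ABI is a separate question that this sketch does not address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace the first conversion in fmt with the constant string arg, but only
 * if that conversion is exactly "%s". Any '%' in arg is doubled so it stays
 * literal. Returns 0 on success, -1 if the fold does not apply or out is
 * too small. */
int fold_first_string_arg(const char *fmt, const char *arg,
                          char *out, size_t out_size) {
    assert(fmt != NULL && arg != NULL && out != NULL);
    size_t n = 0;
    const char *p = fmt;

    /* Copy literal text up to the first conversion specifier. */
    while (*p != '\0' && !(p[0] == '%' && p[1] != '%')) {
        size_t len = (p[0] == '%') ? 2 : 1;  /* "%%" is copied as a pair */
        if (n + len >= out_size) return -1;
        memcpy(out + n, p, len);
        n += len;
        p += len;
    }

    /* Bail out unless that specifier is exactly "%s". */
    if (p[0] != '%' || p[1] != 's') return -1;
    p += 2;

    /* Splice in the constant, escaping '%' as "%%". */
    for (const char *a = arg; *a != '\0'; ++a) {
        size_t len = (*a == '%') ? 2 : 1;
        if (n + len >= out_size) return -1;
        if (*a == '%') out[n++] = '%';
        out[n++] = *a;
    }

    /* Copy the rest of the format string unchanged, including the NUL. */
    size_t rest = strlen(p);
    if (n + rest >= out_size) return -1;
    memcpy(out + n, p, rest + 1);
    return 0;
}
```

With fmt "%s: %s" and arg "ABC" this yields "ABC: %s", matching the example above. Formats whose first conversion carries flags, a width or a precision are rejected rather than guessed at.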



Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
On Tue, May 22, 2018 at 12:59 PM, Dávid Bolvanský via llvm-dev <[hidden email]> wrote:
> It could save useless parsing in s/f/printf during runtime.

A mix of calls to puts and calls to printf with format strings containing just a conversion specifier can help towards such a goal without mutating constants beyond the format string.

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev

On 05/22/2018 11:59 AM, Dávid Bolvanský wrote:
> It could save useless parsing in s/f/printf during runtime.

Sure. But it is not clear that matters. printf is expensive anyway. Maybe this matters more for snprintf? Have you benchmarked this?


> E.g. for heavy "fprint"ing code like fprintf(f, "%s: %s", TAG, msg); I think it could be quite useful.
> After this transformation we would get fprintf(f, "ABC: %s", msg);  --> We could save one push/mov instruction + less parsing in printf every time we call it. We would just replace string constant "%s: %s" with "ABC: %s" and possibly orphaned "ABC" constant could be removed completely.

Possibly. You also might end up substituting the string into many other strings, resulting in many other longer strings, and thus increasing the size of the executable.

 -Hal




-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
1. With 1,000,000 printf calls, the time difference is about 0.1 s. No "benchmark" for fprintf/snprintf yet.

2. We could disable this when optimizing for size, or set a limit on the length of the strings to be substituted.

3. Anyway, I don't know whether we can safely rewrite varargs calls. If it is not possible (we saw what happened with printf on mips), then there is nothing to do here anyway :)
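
For anyone who wants to repeat a measurement like the one in point 1, here is a rough timing sketch. It uses snprintf rather than printf so that terminal I/O does not dominate, and the helper name time_snprintf is made up for this example. It is not the benchmark used for the 0.1 s figure above, and an optimizing compiler may itself simplify some of these calls, so the numbers should be treated with care.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Time `iterations` snprintf calls and return the CPU seconds used.
 * If folded is nonzero, the constant "ABC" is already merged into the
 * format string; otherwise it is passed as a variadic argument. */
double time_snprintf(int folded, long iterations, const char *msg) {
    assert(msg != NULL && iterations >= 0);
    char buf[128];
    volatile int sink = 0;  /* keeps the results observably used */

    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        if (folded)
            sink += snprintf(buf, sizeof buf, "ABC: %s", msg);
        else
            sink += snprintf(buf, sizeof buf, "%s: %s", "ABC", msg);
    }
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_snprintf(0, 1000000, msg) with time_snprintf(1, 1000000, msg) over several runs gives a first impression of whether the fold is measurable at all for a given libc.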

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
Converting to puts is usually not possible: puts appends a newline to its output. The only really appropriate thing to convert to, that works in general, is fwrite. But we can't convert to that because we can't form the 'stdout' parameter (stdout might be a macro rather than a global, or might have a nontrivial mangling, so LLVM can't synthesize it). Also, converting printf("Hello, %s", "world") to printf("Hello, world") is likely a pessimization rather than an optimization for performance: printing a string via %s just needs to write the string, whereas printing a format string needs to scan for %s.

Having said all that, the opposite conversion (from printf("Hello, %s", "world") to printf("%s", "Hello, world")) may be marginally worthwhile. And there are some non-trivial tradeoffs here if you want to optimize for size. (E.g., some format string refactorings may permit more string constant reuse.)
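
The puts restriction can be sketched as a small check. This is a deliberately conservative illustration with a hypothetical helper name, puts_argument_for, not a description of what LLVM actually implements: a fully constant string is treated as a puts candidate only if it ends in a newline (which puts adds back) and contains no '%' at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decide whether printf(s) could be replaced by puts(...).
 * On success, copies s without its trailing '\n' into out and returns 1.
 * Returns 0 if s has no trailing newline, contains '%', or does not fit. */
int puts_argument_for(const char *s, char *out, size_t out_size) {
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);

    if (len == 0 || s[len - 1] != '\n') return 0;  /* puts would add a newline */
    if (strchr(s, '%') != NULL) return 0;          /* conservative: skip any '%' */
    if (len > out_size) return 0;                  /* need len-1 chars plus NUL */

    memcpy(out, s, len - 1);
    out[len - 1] = '\0';
    return 1;
}
```

For "Hello, world\n" the function returns 1 and produces "Hello, world"; for "Hello, world" without the newline it returns 0, which is the common case described above where puts is not usable.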

On 22 May 2018 at 10:26, Hubert Tong via llvm-dev <[hidden email]> wrote:
On Tue, May 22, 2018 at 12:59 PM, Dávid Bolvanský via llvm-dev <[hidden email]> wrote:
It could save useless parsing in s/f/printf during runtime.
A mix of calls to puts and calls to printf with format strings containing just a conversion specifier can help towards such a goal without mutating constants beyond the format string.
 

E.g. for code that calls fprintf heavily, like fprintf(f, "%s: %s", TAG, msg); I think it could be quite useful.
After this transformation we would get fprintf(f, "ABC: %s", msg); so we could save one push/mov instruction and some parsing in printf on every call. We would just replace the string constant "%s: %s" with "ABC: %s", and the possibly orphaned "ABC" constant could be removed completely.
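A minimal sketch of the format-string rewrite described above, written as ordinary C rather than as an LLVM pass. The helper name fold_first_string_arg is hypothetical. It only handles the case where the first conversion in the format is a plain %s, and it doubles any '%' in the folded constant so that the constant stays literal text in the new format string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes into out a copy of fmt in which the first
   conversion, which must be a plain "%s", is replaced by the constant
   string arg. Returns 0 on success, or -1 if the first conversion is not
   "%s" or if out is too small. */
int fold_first_string_arg(const char *fmt, const char *arg,
                          char *out, size_t out_size) {
    assert(fmt != NULL && arg != NULL && out != NULL);
    size_t n = 0;
    const char *p = fmt;

    /* Copy literal text up to the first real conversion specification. */
    while (*p != '\0') {
        if (*p == '%') {
            if (p[1] != '%')
                break;                      /* first real conversion */
            if (n + 2 >= out_size)          /* "%%" is a literal percent */
                return -1;
            out[n++] = '%';
            out[n++] = '%';
            p += 2;
            continue;
        }
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *p++;
    }
    if (p[0] != '%' || p[1] != 's')
        return -1;                          /* only a plain %s is handled */
    p += 2;

    /* Splice in the constant, escaping '%' so it is not reinterpreted. */
    for (const char *a = arg; *a != '\0'; ++a) {
        size_t need = (*a == '%') ? 2 : 1;
        if (n + need >= out_size)
            return -1;
        if (*a == '%')
            out[n++] = '%';
        out[n++] = *a;
    }

    /* Copy the rest of the format string, including its terminator. */
    size_t rest = strlen(p);
    if (n + rest >= out_size)
        return -1;
    memcpy(out + n, p, rest + 1);
    return 0;
}
```

Calling it with the format "%s: %s" and the constant "ABC" yields "ABC: %s". Whether the rewritten call is safe at the IR level is a separate question, since it also requires dropping the matching variadic operand.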



2018-05-22 18:36 GMT+02:00 Hal Finkel <[hidden email]>:


On 05/22/2018 10:42 AM, Dávid Bolvanský wrote:
Thanks.

Yes, the idea is to substitute only some of the arguments. I think the formatting used by printf depends on the locale only for the float and double types, so I would not place float/double constants into the format string.

Okay. I think it's true that integers will be the same regardless of locale (so long as the ' flag is not used, as that brings in a dependence on LC_NUMERIC).
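The locale question can be probed with a small C sketch like the one below. The function names are hypothetical, and any locale name other than "C" (for example "de_DE.UTF-8") is an assumption about what happens to be installed on the system; the helpers return -1 when the requested locale is unavailable. For simplicity the sketch leaves LC_NUMERIC set to "C" on return rather than restoring the previous locale.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if "%d" renders value identically under the "C" locale and
   under locale_name, 0 if the text differs, -1 if locale_name is not
   available. No ' flag is used, so no digit grouping is requested. */
int int_text_is_stable(int value, const char *locale_name) {
    assert(locale_name != NULL);
    char in_c[64], in_other[64];

    setlocale(LC_NUMERIC, "C");
    snprintf(in_c, sizeof in_c, "%d", value);
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;
    snprintf(in_other, sizeof in_other, "%d", value);
    setlocale(LC_NUMERIC, "C");

    return strcmp(in_c, in_other) == 0;
}

/* Same comparison for "%f". The decimal-point character comes from
   LC_NUMERIC, so this can return 0 for locales that use a comma. */
int double_text_is_stable(double value, const char *locale_name) {
    assert(locale_name != NULL);
    char in_c[64], in_other[64];

    setlocale(LC_NUMERIC, "C");
    snprintf(in_c, sizeof in_c, "%f", value);
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;
    snprintf(in_other, sizeof in_other, "%f", value);
    setlocale(LC_NUMERIC, "C");

    return strcmp(in_c, in_other) == 0;
}
```

A rewrite that folds only integer and string constants, and never uses the ' flag, stays on the side of this comparison that does not depend on the runtime locale.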


Why? To reduce the number of constants (some of them could be merged into the format string) and the number of arguments when calling printf/fprintf/sprintf, etc.

Sure, but it seems to me unlikely that this will affect performance. Is it a code-size optimization (this actually isn't obvious to me because the string representation might be longer than the binary form of the constant plus the extra instructions)?
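To make the size concern concrete, here is a small C sketch comparing the number of bytes an int occupies as decimal text with its binary size. The helper name is hypothetical, and the comparison ignores instruction encoding, which varies by target.

```c
#include <stdio.h>

/* Returns how many characters "%d" would produce for value, not counting
   the terminating null byte. snprintf with a null buffer and size 0 only
   computes the length. */
int decimal_text_length(int value) {
    return snprintf(NULL, 0, "%d", value);
}
```

For example, 123 takes 3 characters of text, while 1000000000 takes 10, which is longer than a typical 4-byte int. Folding such a constant also removes the 2 characters of "%d" from the format string, so the net change depends on the value.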

 -Hal



2018-05-22 16:22 GMT+02:00 Hal Finkel <[hidden email]>:


On 05/22/2018 04:32 AM, Dávid Bolvanský via llvm-dev wrote:
Hello,

A new patch:

proposes transformations like:
printf("Hello, %s %d", "world", 123) -> printf("Hello, world 123")

To clarify, the real question here comes up when you can only substitute some of the arguments? If you can substitute all of the arguments, then you can turn this into a call to puts.

In any case, why do you want to do this? Also, doesn't the formatting used by printf depend on the process's current locale?

 -Hal


As Eli noted:

"I'm not sure we can rewrite calls to varargs functions safely in general given the current state of the C ABI rules in LLVM.

Sometimes clang does weird things to conform with the ABI rules, because the LLVM type system isn't the same as the C system. For most functions, it's pretty easy to tell it happened: if the IR signature of the function doesn't match the expected signature, something weird happened, so we can just bail out. But varargs functions don't specify a complete signature, so we can't tell if the clang ABI code was forced to do something weird, like split an argument into multiple values, or insert a padding value. For example, for the target mips64-unknown-linux-gnu, a call like printf("asdf%Lf", 1.0L); gets lowered to the following:

%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i64 undef, fp128 0xL00000000000000003FFF000000000000) #2"


I would like to hear more opinions on whether this is safe or not. It seems that Clang produces some weird IR for MIPS, but e.g. the x86 IR seems OK.

Could anyone from Clang/LLVM provide more information about "varargs vs ABI vs LLVM vs Clang",
and about whether we can rewrite calls to varargs functions safely under some conditions?

Thanks




Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
Interesting ideas, thanks!

But if only a transformation like printf("Hello, %s", "world") to printf("%s", "Hello, world") makes some sense, I think it is not worth doing at all.


Anyway, thank you for all your suggestions.

2018-05-23 2:11 GMT+02:00 Richard Smith <[hidden email]>:
Converting to puts is usually not possible: puts appends a newline to its output. The only really appropriate thing to convert to, that works in general, is fwrite. But we can't convert to that because we can't form the 'stdout' parameter (stdout might be a macro rather than a global, or might have a nontrivial mangling, so LLVM can't synthesize it). Also, converting printf("Hello, %s", "world") to printf("Hello, world") is likely a pessimization rather than an optimization for performance: printing a string via %s just needs to write the string, whereas printing a format string needs to scan for %s.

Having said all that, the opposite conversion (from printf("Hello, %s", "world") to printf("%s", "Hello, world")) may be marginally worthwhile. And there are some non-trivial tradeoffs here if you want to optimize for size. (Eg, some format string refactorings may permit more string constant reuse.)

Re: [llvm-dev] Rewriting calls to varargs functions

Sudhindra kulkarni via llvm-dev
In reply to this post by Sudhindra kulkarni via llvm-dev
On Tue, May 22, 2018 at 11:36:32AM -0500, Hal Finkel via llvm-dev wrote:
> Sure, but it seems to me unlikely that this will affect performance. Is
> it a code-size optimization (this actually isn't obvious to me because
> the string representation might be longer than the binary form of the
> constant plus the extra instructions)?

More importantly, there is a quite non-trivial chance that the format
string is duplicated elsewhere, so this can often be a pure loss.

Joerg
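A back-of-the-envelope sketch of the duplication point above, in C. It assumes two call sites that originally share the format "%s: %s" with different constant tags, and that identical string constants are merged into one copy; the tags "ABC" and "XYZ" and the function names are made up for illustration.

```c
#include <stddef.h>

/* String data before folding: one shared format plus the two tags.
   sizeof on a string literal counts the terminating null byte. */
size_t string_bytes_before(void) {
    return sizeof("%s: %s") + sizeof("ABC") + sizeof("XYZ");
}

/* String data after folding: each call site now needs its own format,
   and the tags are assumed to be unused elsewhere and dropped. */
size_t string_bytes_after(void) {
    return sizeof("ABC: %s") + sizeof("XYZ: %s");
}
```

Under these assumptions the string data grows from 15 to 16 bytes, and it grows further if the tags are still referenced elsewhere and cannot be removed.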