Hi all,

I've written a patch to extend FileCheck to support matching arithmetic expressions involving variables [1] (e.g. to match REG+1 where REG is a variable with a numeric value). It was suggested to me in the review to introduce the concept of numeric variables and to allow specifying the base the values are written in.

[1] https://reviews.llvm.org/D49084

I think the syntax should satisfy the requirements below:

* based off the [[]] construct since anything else might overload an existing valid syntax (e.g. $$ is supposed to match literally now)
* consistent with the syntax for expressions using @LINE
* consistent with using ':' to define a regular variable
* allows specifying the base of the number a numeric variable is being set to
* allows specifying the base of the result of the numeric expression

I've come up with the following syntax, for which I'd like feedback:

Numeric variable definition: [[#X<base:]] (e.g. [[#ADDR<16:]]) where X is the numeric variable being defined and <base is optional, in which case base defaults to 10
Numeric variable use: [[#X>base]] (e.g. [[#ADDR>2]]) where >base is optional, in which case base defaults to 10
Numeric expression: [[exp>base]] (e.g. [[#ADDR+2>16]]) where the expression must contain at least one numeric variable

I'm not a big fan of the > for the output base being inside the expression but [[exp]]>base would match >base literally.

Any suggestions / opinions?

Best regards,

Thomas
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
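To make the definition and use forms proposed in this first message concrete, here is a minimal Python sketch of how they might be recognized. The regular expressions and the names parse_definition and parse_use are assumptions made for illustration only; they are not taken from the patch under review and do not describe FileCheck's actual parser.

```python
import re

# Hypothetical grammar for the "<base" / ">base" syntax described in the message.
# Definition: [[#NAME<base:]] with "<base" optional (defaults to 10).
DEFINITION = re.compile(r"\[\[#(?P<name>[A-Za-z_]\w*)(?:<(?P<base>\d+))?:\]\]")
# Use or expression: [[#EXPR>base]] with ">base" optional (defaults to 10).
USE = re.compile(r"\[\[#(?P<expr>[^\]<>:]+?)(?:>(?P<base>\d+))?\]\]")


def parse_definition(text):
    """Return (variable name, input base) or None if text is not a definition."""
    m = DEFINITION.fullmatch(text)
    if m is None:
        return None
    return m.group("name"), int(m.group("base") or 10)


def parse_use(text):
    """Return (expression text, output base) or None if text is not a use."""
    m = USE.fullmatch(text)
    if m is None:
        return None
    return m.group("expr"), int(m.group("base") or 10)
```

With these assumptions, parse_definition("[[#ADDR<16:]]") yields the name ADDR with base 16, and parse_use("[[#ADDR+2>16]]") yields the expression ADDR+2 with output base 16.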
Hi Thomas,

In general, I think this is a good proposal. However, I don't think that using ">" or "<" to specify the base (at least alone) is a good idea, as it might clash with future ideas to do comparisons etc. I also think it would be nice to have the syntax consistent between definition and use. My first thought on a reasonable alternative was to use commas to separate the two parts, so something like:

[[# VAR, 16:]] to capture a hexadecimal number (where the spaces are optional).
[[# VAR, 16]] to use a variable, converted to a hexadecimal string.

In both cases, the base component is optional, and defaults to decimal.

This led me to think that it might be better to use something similar to printf style for the latter half, so to capture a hexadecimal number with a leading "0x" would be: "0x[[# VAR, %x:]]" and to use it would be "0x[[# VAR, %x]]". Indeed, that would allow straightforward conversions between formats, so say you defined it by capturing a decimal integer and used it to match a hexadecimal in upper case, with leading 0x and 8 digits following the 0x:

CHECK: [[# VAR, %d:]] # Defines
CHECK: 0x[[# VAR + 1, %8X]] # Uses

Of course, if we go down that route, it would probably make more sense to reverse the two sides (e.g. to become "[[# %d, VAR:]]" to capture a decimal and "[[# %8X, VAR + 1]]" to use it).

Regards,

James

On 12 July 2018 at 15:34, Thomas Preudhomme via llvm-dev <[hidden email]> wrote:
> Hi all,
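The printf-style conversion James describes (capture with one specifier, match with another) can be illustrated with a short Python sketch. The function names are hypothetical, only %d, %x and %X are handled, values are assumed to be non-negative, and the width is assumed to mean "at least this many digits, zero-padded", which is how the "8 digits following the 0x" example reads (in C printf, a plain %8X would pad with spaces instead).

```python
import re

# Hypothetical subset of printf-style specifiers: optional width, then d, x or X.
SPEC = re.compile(r"%(?P<width>\d*)(?P<conv>[dxX])")


def parse_spec(spec):
    """Split a specifier such as '%8X' into (minimum digit count, conversion)."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return int(m.group("width") or 0), m.group("conv")


def to_value(text, spec):
    """Convert captured text to an integer, reading it in the specifier's base."""
    _, conv = parse_spec(spec)
    return int(text, 10 if conv == "d" else 16)


def render(value, spec):
    """Format a non-negative integer as the text a later CHECK line would match."""
    width, conv = parse_spec(spec)
    digits = {"d": "{:d}", "x": "{:x}", "X": "{:X}"}[conv].format(value)
    # Assumption: the width is a zero-padded minimum number of digits.
    return digits.rjust(width, "0")


# CHECK: [[# VAR, %d:]]          captures "41", so VAR = 41
# CHECK: 0x[[# VAR + 1, %8X]]    would then match "0x0000002A"
var = to_value("41", "%d")
expected = "0x" + render(var + 1, "%8X")
```

Keeping the capture base and the output format as two separate steps is what makes the decimal-to-hex conversion in the example fall out naturally.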
Hi James,

I like that suggestion very much but I think keeping the order of the two sides as initially proposed makes more sense. In printf/scanf the string is first because the primary use of these functions is to do I/O, so you first specify what you are going to output/input and then where to capture variables. The primary objective of FileCheck variables and expressions is to capture/print them; the specifier is an add-on to allow some conversion. Does that make sense?

In the interest of speeding things up I plan to start implementing this proposal tomorrow unless someone gives some more feedback.

Best regards,

Thomas

On Fri, 13 Jul 2018 at 15:51, James Henderson <[hidden email]> wrote:
> Of course, if we go down that route, it would probably make more sense to reverse the two sides (e.g. to become "[[# %d, VAR:]]" to capture a decimal and "[[# %8X, VAR + 1]]" to use it).
> -----Original Message-----
> From: llvm-dev [mailto:[hidden email]] On Behalf Of Thomas Preudhomme via llvm-dev
> Sent: Monday, July 16, 2018 6:24 AM
> To: [hidden email]
> Cc: [hidden email]
> Subject: Re: [llvm-dev] Syntax for FileCheck numeric variables and expressions
>
> Hi James,
>
> I like that suggestion very much but I think keeping the order of the
> two sides as initially proposed makes more sense. In printf/scanf the
> string is first because the primary use of these functions is to do
> I/O and so you first specify what you are going to output/input and
> then where to capture variables. The primary objective of FileCheck
> variables and expressions is to capture/print them, the specifier is
> an addon to allow some conversion. Does it make sense?

My immediate reaction is that I'd rather not have FileCheck get into the business of handling printf specifiers. OTOH, while LLVM tools do typically print lowercase hex, that's not guaranteed, and looking at the output of other tools can be useful too. So, a way to specify the case for a hex conversion seems worthwhile.

I had also been thinking in terms of the trailing colon to distinguish definition from use; as James suggested, that's sort-of consistent with the current syntax.

This is starting to make parsing the insides of [[]] much more involved, so you'll want to pay attention to making that code well-structured and readable.

--paulr
To be clear, I do not intend to add support for a hex specifier in the current patch; I just want to make sure the syntax we choose is going to allow it later. My immediate use case is decimal integers and I intend to write the code so that it's easy to extend to more types of numeric variables and expressions later. This way we'll only add specifiers that are actually required by actual testcases.

Best regards,

Thomas

On Mon, 16 Jul 2018 at 18:39, <[hidden email]> wrote:
> My immediate reaction is that I'd rather not have FileCheck get into the business of handling printf specifiers.
On Tue, 17 Jul 2018 at 10:02 Thomas Preudhomme via llvm-dev <[hidden email]> wrote:
> To be clear, I do not intend to add support for hex specifier in the current patch, I just want to make sure the syntax we choose is going to allow it later.

I also added FileCheck expressions to our fork of LLVM in order to allow testing both the 128-bit and the 256-bit versions of our CHERI ISA in a single test case [1].
I used [[@EXPR foo * 2 + 1]] for FileCheck expressions [2]. I'm not particularly happy with this syntax since it is quite verbose (but then again we don't need it that often so it doesn't really matter). It also doesn't allow saving the expression result, so it needs to be repeated everywhere. I could probably use [[@EXPR:OUTVAR INVAR + 42]] but I haven't really had the need for that yet.

We currently need the following two features:

- Simple arithmetic with multiple operations. Example:
`cld $gp, $zero, [[@EXPR 2 * $CAP_SIZE - 8]]($c11)`

- Conversion to hex (upper and lower case since not all tools are consistent here) and to decimal.
Example: // READOBJ-NEXT: 0x50 R_MIPS_64/R_MIPS_NONE/R_MIPS_NONE .data 0x[[@EXPR hex($CAP_SIZE * 2)]]

Alex

[1] For most test cases the simple -DVAR=value flag in FileCheck is good enough: we have a %cheri_FileCheck lit substitution that expands to `FileCheck '-D$CAP_SIZE=16/32'`. This works for most IR level tests since usually the only thing that is different is "align 16" vs "align 32". However, when checking the assembly output or linker addresses we often need something more complex.

[2] A test case showing all the currently supported expressions can be found here: <https://github.com/CTSRD-CHERI/llvm/blob/master/test/FileCheck/expressions.txt>
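As a rough illustration of the two features Alex lists (multi-operation arithmetic over a -D variable, and conversion to hex), the following Python sketch evaluates the two example expressions for a given $CAP_SIZE. It is not the CHERI fork's implementation: the function names are hypothetical, only +, - and * on non-negative integers are supported, and hex() is assumed to produce lower-case digits without a "0x" prefix, since the example writes the prefix outside the brackets.

```python
import ast
import operator
import re

# Supported binary operators in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _eval_int(node):
    """Evaluate an AST made only of integer literals and +, -, *."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_int(node.left), _eval_int(node.right))
    raise ValueError("unsupported expression")


def evaluate(expr, variables):
    """Return the text an expression such as '2 * $CAP_SIZE - 8' should match."""
    # Substitute each $NAME with its numeric value before parsing.
    source = re.sub(r"\$(\w+)", lambda m: str(variables[m.group(1)]), expr)
    body = ast.parse(source.strip(), mode="eval").body
    is_hex_call = (
        isinstance(body, ast.Call)
        and isinstance(body.func, ast.Name)
        and body.func.id == "hex"
        and len(body.args) == 1
        and not body.keywords
    )
    if is_hex_call:
        # Assumption: hex(...) yields lower-case digits with no "0x" prefix.
        return format(_eval_int(body.args[0]), "x")
    return str(_eval_int(body))
```

For example, with CAP_SIZE set to 16 the first expression evaluates to 24, and with CAP_SIZE set to 32 the hex expression evaluates to the digits 40 (64 in decimal).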
Hi Alex,

Thanks for the feedback. My first thought was that introducing the new pseudo var @EXPR is a nice way to generalize that syntax beyond @LINE since it would also evaluate to an arithmetic value. On the other hand there is a small inconsistency because @LINE evaluates to a value which can be part of an expression while @EXPR is an expression, and so the @ syntax as a whole becomes defined as introducing something which is not a regular variable, i.e. a negative definition.

I'll stick with the # syntax because # is usually associated with numbers and can be defined as introducing an integer expression/variable. The one question I have is whether the # should be next to the variable name or next to the [[ as proposed by James. I like the former better *but* I think the latter makes more sense since [[#VAR + 1]] would suggest that the [[<something>]] syntax already allows numeric expressions without numeric variables, which is not the case. Having the # right at the start also clearly indicates that the whole expression might have a conversion specifier. Finally, the # syntax can allow defining a variable with the result of an arithmetic expression:
[[#BAR, %x:]]
[[# FOO:BAR+12]]

So BAR takes a hex value in lower case, 12 (in decimal) gets added to that value and the result is put into FOO. In that case there should be no format specifier when defining FOO, i.e. a format specifier in a definition is only allowed when there's nothing after the colon. Of course we could allow hex immediates with 0x syntax if needed. Again, I'm not advocating for implementing all this from the start, but we should make sure that the syntax would allow it if we realize we need this later, and I think James's proposal does.

It seems this syntax would suit all your current uses (albeit with some rewriting necessary), did I miss something?

Best regards,

Thomas

On Tue, 17 Jul 2018 at 21:59, Alexander Richardson <[hidden email]> wrote:
> I used [[@EXPR foo * 2 + 1]] for FileCheck expressions [2].
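The [[#BAR, %x:]] followed by [[# FOO:BAR+12]] example from Thomas's reply to Alex can be walked through with a small Python sketch. The helper names and the dictionary used as a variable table are assumptions for illustration, and only expressions of the form VAR, VAR+N or VAR-N with a decimal N are handled.

```python
import re

# Hypothetical expression shape for this sketch: VAR, VAR+N or VAR-N.
_EXPR = re.compile(r"\s*([A-Za-z_]\w*)\s*(?:([+-])\s*(\d+))?\s*")


def define_from_match(variables, name, matched_text, spec="%d"):
    """[[#NAME, SPEC:]]: store the matched text, read in the base given by SPEC."""
    base = 16 if spec in ("%x", "%X") else 10
    variables[name] = int(matched_text, base)


def define_from_expression(variables, name, expr):
    """[[# NAME:EXPR]]: store the value of EXPR; the literal N is decimal."""
    m = _EXPR.fullmatch(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    var, op, literal = m.groups()
    value = variables[var]
    if op is not None:
        value += int(literal) if op == "+" else -int(literal)
    variables[name] = value


variables = {}
define_from_match(variables, "BAR", "1f", "%x")      # BAR = 0x1f = 31
define_from_expression(variables, "FOO", "BAR+12")   # FOO = 31 + 12 = 43
```

Note that the definition from an expression takes no format specifier, matching the remark that a specifier in a definition only makes sense when the value comes from matched text.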
On Wed, 18 Jul 2018 at 13:50 Thomas Preudhomme <[hidden email]> wrote:
> Hi Alex,

Hi Thomas,

That would indeed work fine for me and it would be easy to update our tests with a few regex replaces.

I think I prefer the [[# %FMTSPEC, EXPR]] syntax since that would also make it possible to have commas in the expression part. This might be useful if we allow function-call like expressions such as [[# %X, pow(10, FOO) + 20]].

Alex
Hi Alexander,

Please forgive me if I'm missing the obvious but I do not see how the order helps allow a comma in the expression. It seems to me that what would allow it is to make FMTSPEC mandatory, or at least the comma that separates it (i.e. [[#,EXPR]] for the default format specifier). In any case a comma in a function-call like expression can be distinguished from the comma for the format specifier since the former is always inside a parenthesized expression.

That said I don't have a strong opinion about the ordering of the expression with respect to the format specifier. I find EXPR, FMTSPEC more natural but at least 2 people (James and you) expressed a preference for the reverse order so I'll assume that's the general preference.

Best regards,

Thomas

P.S.: My apologies for only asking now but how do you prefer to be called? Alexander vs Alex vs something else?

On Sun, 22 Jul 2018 at 20:23, Alexander Richardson <[hidden email]> wrote:
> I think I prefer the [[# %FMTSPEC, EXPR]] syntax since that would also make it possible to have commas in the expression part.
>> > >> > [2] A test case showing all the currently supported expressions can be found here: <https://github.com/CTSRD-CHERI/llvm/blob/master/test/FileCheck/expressions.txt> >> > >> > >> >> >> >> On Mon, 16 Jul 2018 at 18:39, <[hidden email]> wrote: >> >> > >> >> > >> >> > >> >> > > -----Original Message----- >> >> > > From: llvm-dev [mailto:[hidden email]] On Behalf Of >> >> > > Thomas Preudhomme via llvm-dev >> >> > > Sent: Monday, July 16, 2018 6:24 AM >> >> > > To: [hidden email] >> >> > > Cc: [hidden email] >> >> > > Subject: Re: [llvm-dev] Syntax for FileCheck numeric variables and >> >> > > expressions >> >> > > >> >> > > Hi James, >> >> > > >> >> > > I like that suggestion very much but I think keeping the order of the >> >> > > two sides as initially proposed makes more sense. In printf/scanf the >> >> > > string is first because the primary use of these functions is to do >> >> > > I/O and so you first specify what you are going to output/input and >> >> > > then where to capture variables. The primary objective of FileCheck >> >> > > variables and expressions is to capture/print them, the specifier is >> >> > > an addon to allow some conversion. Does it make sense? >> >> > >> >> > My immediate reaction is that I'd rather not have FileCheck get into >> >> > the business of handling printf specifiers. OTOH, while LLVM tools >> >> > do typically print lowercase hex, that's not guaranteed, and looking >> >> > at the output of other tools can be useful too. So, a way to specify >> >> > the case for a hex conversion seems worthwhile. >> >> > >> >> > I had also been thinking in terms of the trailing colon to distinguish >> >> > definition from use, as James suggested, that's sort-of consistent >> >> > with the current syntax. >> >> > >> >> > This is starting to make parsing the insides of [[]] much more involved, >> >> > so you'll want to pay attention to making that code well-structured and >> >> > readable. 
>> >> > --paulr >> >> > >> >> > > >> >> > > In the interest of speeding things up I plan to start implementing >> >> > > this proposal starting tomorrow unless someone gives some more >> >> > > feedback. >> >> > > >> >> > > Best regards, >> >> > > >> >> > > Thomas >> >> > > >> >> > > On Fri, 13 Jul 2018 at 15:51, James Henderson >> >> > > <[hidden email]> wrote: >> >> > > > >> >> > > > Hi Thomas, >> >> > > > >> >> > > > In general, I think this is a good proposal. However, I don't think that >> >> > > using ">" or "<" to specify base (at least alone) is a good idea, as it >> >> > > might clash with future ideas to do comparisons etc. I also think it would >> >> > > be nice to have the syntax consistent between definition and use. My first >> >> > > thought on a reasonable alternative was to use commas to separate the two >> >> > > parts, so something like: >> >> > > > >> >> > > > [[# VAR, 16:]] to capture a hexadecimal number (where the spaces are >> >> > > optional). [[# VAR, 16]] to use a variable, converted to a hexadecimal >> >> > > string. In both cases, the base component is optional, and defaults to >> >> > > decimal. >> >> > > > >> >> > > > This led me to thing that it might be better to use something similar to >> >> > > printf style for the latter half, so to capture a hexadecimal number with >> >> > > a leading "0x" would be: "0x[[# VAR, %x:]]" and to use it would be "0x[[# >> >> > > VAR, %x]]". Indeed, that would allow straightforward conversions between >> >> > > formats, so say you defined it by capturing a decimal integer and using it >> >> > > to match a hexadecimal in upper case, with leading 0x and 8 digits >> >> > > following the 0x: >> >> > > > >> >> > > > CHECK: [[# VAR, %d:]] # Defines >> >> > > > CHECK: 0x[[# VAR + 1, %8X]] # Uses >> >> > > > >> >> > > > Of course, if we go down that route, it would probably make more sense >> >> > > to reverse the two sides (e.g. 
to become "[[# %d, VAR:]]" to capture a >> >> > > decimal and "[[# %8X, VAR + 1]]" to use it). >> >> > > > >> >> > > > Regards, >> >> > > > >> >> > > > James >> >> > > > >> >> > > > On 12 July 2018 at 15:34, Thomas Preudhomme via llvm-dev <llvm- >> >> > > [hidden email]> wrote: >> >> > > >> >> >> > > >> Hi all, >> >> > > >> >> >> > > >> I've written a patch to extend FileCheck to support matching >> >> > > >> arithmetic expressions involving variable [1] (eg. to match REG+1 >> >> > > >> where REG is a variable with a numeric value). It was suggested to me >> >> > > >> in the review to introduce the concept of numeric variable and to >> >> > > >> allow for specifying the base the value are written in. >> >> > > >> >> >> > > >> [1] https://reviews.llvm.org/D49084 >> >> > > >> >> >> > > >> I think the syntax should satisfy the below requirements: >> >> > > >> >> >> > > >> * based off the [[]] construct since anything else might overload an >> >> > > >> existing valid syntax (eg. $$ is supposed to match literally now) >> >> > > >> * consistent with syntax for expressions using @LINE >> >> > > >> * consistent with using ':' to define regular variable >> >> > > >> * allows to specify base of the number a numeric variable is being set >> >> > > to >> >> > > >> * allows to specify base of the result of the numeric expression >> >> > > >> >> >> > > >> I've come up with the following syntax for which I'd like feedback: >> >> > > >> >> >> > > >> Numeric variable definition: [[#X<base:]] (eg. [[#ADDR<16:]]) where X >> >> > > >> is the numeric variable being defined and <base is optional in which >> >> > > >> case base defaults to 10 >> >> > > >> Numeric variable use: [[#X>base]] (eg. [[#ADDR]]>2) where <base is >> >> > > >> optional in which case base defaults 10 >> >> > > >> Numeric expression: [[exp>base]] (eg. 
[[#ADDR+2>16]] where expression >> >> > > >> must contain at least one numeric variable >> >> > > >> >> >> > > >> >> >> > > >> I'm not a big fan of the > for the output base being inside the >> >> > > >> expression but [[exp]]>base would match >base literally. >> >> > > >> >> >> > > >> Any suggestions / opinions? >> >> > > >> >> >> > > >> Best regards, >> >> > > >> >> >> > > >> Thomas >> >> > > >> _______________________________________________ >> >> > > >> LLVM Developers mailing list >> >> > > >> [hidden email] >> >> > > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > > > >> >> > > > >> >> > > _______________________________________________ >> >> > > LLVM Developers mailing list >> >> > > [hidden email] >> >> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> [hidden email] >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
Regarding the format, I could be persuaded that either works. In particular, if one is easier to implement than the other, I'd go with the easier implementation, personally.

James

On 26 July 2018 at 10:28, Thomas Preudhomme via llvm-dev <[hidden email]> wrote:
> Hi Alexander,
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
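On Thu, 26 Jul 2018 at 10:28 Thomas Preudhomme <[hidden email]> wrote:
> Hi Alexander,
>
> Please forgive me if I'm missing the obvious but I do not see how the order helps allowing a comma in the expression. It seems to me that what would allow it is to make FMTSPEC mandatory or at least the comma to separate it (ie. [[#,EXPR]] for the default format specifier). In any case comma in a function-call like expression can be distinguished from comma for the format specifier since one is always inside a parenthesized expression.

Hi Thomas,

I thought that FMTSPEC first might be easier to implement because you can just check whether the first non-whitespace character after # is a %. If it is, parse a fmtspec followed by a comma; if not, treat everything else as the expression. But you are right: a function-like syntax would always contain parentheses, so there is no ambiguity.

I think [[#,EXPR]] looks a bit strange, and I think we can distinguish the default format from an explicit format specifier based on whether the first character after the # is a %. I.e. [[#EXPR]] means default format and [[#%x,EXPR]] is hex. Does that sound reasonable?

> That said I don't have a strong opinion about the ordering of the expression wrt. the format specifier. I find EXPR, FMTSPEC more natural but at 2 persons (James and you) expressed preference for the reverse order so I'll assume that's the general preference.

I don't have a strong preference whether it should come before or after, and I agree with James that whatever is easiest to implement should be done.

Thanks,
Alex

> P.S.: My apologies for only asking now but how do you prefer to be called? Alexander Vs Alex Vs something else?

Most people call me Alex, but if you prefer Alexander that is also fine.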
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
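The rule Alex describes (look at the first non-whitespace character after the #, and only parse a format specifier if it is a %) could be sketched roughly as below. This is only an illustration, not FileCheck's actual parser: the function name split_numeric_block is hypothetical, and the set of accepted conversions (%u, %d, %x, %X with an optional width such as %8X) is an assumption taken from the examples in this thread.

```python
import re


def split_numeric_block(block):
    """Split the text found between '[[#' and ']]' into (fmtspec, expr).

    fmtspec is None when the block does not start with '%', which means
    the default format applies. (Hypothetical helper, for illustration.)
    """
    text = block.strip()
    if not text.startswith("%"):
        # No format specifier: everything is the expression.
        return None, text
    # A specifier must be followed by a comma; the rest is the expression,
    # which may itself contain commas (e.g. inside a function call).
    m = re.match(r"(%[0-9]*[udxX])\s*,\s*(.*)$", text, re.S)
    if m is None:
        raise ValueError("malformed format specifier in %r" % block)
    return m.group(1), m.group(2).strip()
```

Because only the first comma after the specifier is treated as a separator, an expression such as pow(10, FOO) + 20 keeps its own commas intact.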
Hi Alex,
On Fri, 27 Jul 2018 at 11:53, Alexander Richardson <[hidden email]> wrote:
> I think [[#,EXPR]] looks a bit strange and I think we can determine default format vs format specifier based on the first character after the # being a % or not. I.e. [[#EXPR]] means default format and [[#%x,EXPR]] is hex. Does that sound reasonable?

Yes it does.

I've started reworking the changes I made to FileCheck.rst to document the agreed-upon syntax. At the moment I'm thinking about supporting:

- %u, %d, %x and %X as input and output format specifiers
- an optional format specifier (defaulting to %u)
- basic numeric variable definition
- numeric expression use involving a variable and an immediate

In particular, I do *not* plan to implement the following:

- defining a numeric variable from a numeric expression
- arithmetic operations other than - and +
- arithmetic expressions involving several variables

I'll make sure that these can easily be added later, and I will mention in the doc that the syntax for these features has already been agreed as well.

Feel free to give me feedback on the set of features I intend to implement in this initial patch.

Best regards,

Thomas
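As a rough illustration of the initial feature set listed above (capture a number with %u, %d, %x or %X, then substitute "variable + immediate" or "variable - immediate" in a chosen format), here is a small sketch. The names capture and substitute are hypothetical, field widths are ignored, and how negative results should be written is deliberately left out since the thread has not settled it at this point.

```python
# Base implied by each conversion character discussed in the thread.
BASES = {"u": 10, "d": 10, "x": 16, "X": 16}


def capture(text, fmtspec="%u"):
    """Turn matched text into an integer, e.g. 'ff' with %x gives 255."""
    return int(text, BASES[fmtspec[-1]])


def substitute(value, op, immediate, fmtspec="%u"):
    """Evaluate 'value op immediate' and write it in the given format."""
    if op not in ("+", "-"):
        # Only + and - are in scope for the initial patch.
        raise ValueError("unsupported operator: %r" % op)
    result = value + immediate if op == "+" else value - immediate
    conv = fmtspec[-1]
    # %x and %X map onto Python's own hex presentation types.
    return format(result, conv if conv in ("x", "X") else "d")
```

For example, a value captured as decimal 12 and used as [[# %X, VAR + 1]] would be written as D in this sketch.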
This looks like a reasonable subset of features to me. My only question is related to this one:

> - arithmetic expression involving several variables

Is it actually harder to write FileCheck to handle this case than to not handle it? I'm (naively) assuming that the variables will be in some form of container, and are just substituted in. If it is harder, that's fine. Otherwise, I just say do it.

James

On 31 July 2018 at 11:51, Thomas Preudhomme via llvm-dev <[hidden email]> wrote:
> Hi Alex,
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
I can certainly envision a use case for a [BASE + LENGTH + 4] computation to verify the address of a next-thingy. Comes up in DWARF dumps all the time.
--paulr

From: llvm-dev [mailto:[hidden email]] On Behalf Of James Henderson via llvm-dev

> This looks like a reasonable subset of features to me. My only question is related to this one:
>
> > - arithmetic expression involving several variables
>
> Is it actually harder to write FileCheck to handle this case than to not handle it? I'm (naively) assuming that the variables will be in some form of container, and are just substituted in. If it is harder, that's fine. Otherwise, I just say do it.
>
> James
>
> On 31 July 2018 at 11:51, Thomas Preudhomme via llvm-dev <[hidden email]> wrote:
> > Hi Alex,
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
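To make James's "variables in a container, just substituted in" assumption and Paul's BASE + LENGTH + 4 use case concrete, here is a minimal sketch of an evaluator that accepts any number of variables and decimal immediates joined by + and -. The evaluate function and the plain dict used as the variable table are illustrative assumptions, not the data structures used in the actual patch.

```python
import re

# One token: a variable name, a decimal immediate, or a + / - operator.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[0-9]+|[+-])")


def evaluate(expr, variables):
    """Evaluate e.g. 'BASE + LENGTH + 4' using a dict of variable values."""
    expr = expr.strip()
    pos, total, sign, expect_operand = 0, 0, 1, True
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError("unexpected text at offset %d" % pos)
        tok, pos = m.group(1), m.end()
        if tok in ("+", "-"):
            if expect_operand:
                raise ValueError("operator where an operand was expected")
            sign = 1 if tok == "+" else -1
            expect_operand = True
        else:
            if not expect_operand:
                raise ValueError("two operands in a row")
            # Immediates are decimal; anything else is looked up by name.
            value = int(tok) if tok.isdigit() else variables[tok]
            total += sign * value
            expect_operand = False
    if expect_operand:
        raise ValueError("expression is empty or ends with an operator")
    return total
```

In this sketch, handling several variables costs nothing extra compared with handling one, which is the point James raises; an undefined variable simply surfaces as a KeyError from the lookup.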
Hi,

I had some more thoughts. Summary of my proposal is at the bottom, but basically I wonder if we need to look again at the syntax a little.

+++Details+++

In https://reviews.llvm.org/D49964, the proposed test at the time of writing has the sizes of a compressed and a decompressed version of a section hard-coded in. This feels a little fragile to me, and really what's interesting is whether the compressed section is smaller than the decompressed section. That then led me to think that the current test harness we use for some of our tools allows us to capture an integer and then compare it against another integer to report success or failure. Sometimes, the value in the comparison is computed from a captured number too. I don't have a fully thought-out syntax for this, but I think it should be complementary to the variable expression syntax.

Example proposal:

[[# %x < VAR - 10]]
[[# < VAR - 10]]

The first would match a hex number that is more than 10 less than the value of VAR, and the second would do the same using whatever the default pattern is. Thus the format specifier still works as before. The only difference is that we replace the ',' with a comparison operator (equally valid would be '<=', '>' etc). That then led me to wonder, why not use '==' (or just '=') to indicate equality, instead of ',', i.e.:

[[# %x == VAR - 10]]

Related to this, it occurred to me that sometimes we might want to capture the variable for reuse later, but also verify that it is based on some other variable (e.g. END is 4 higher than BEGIN, and written in hex). So maybe both can live alongside each other:

[[# %x, VAR1 < VAR2 - 10]]

This would capture a hex number and store it in VAR1, but fail if that number is not more than ten less than VAR2. Maybe we might want to use a colon to delineate the capture side from the verification side.

+++Summary+++

I think the following would be my proposal:

[[# %x, VAR1 : < VAR2 - 10]] // Capture VAR1 from hex, fail if it doesn't meet the right-hand expression.
[[# %x, VAR3 :]] // Capture VAR3 from hex, always succeeding.
[[# %x = VAR4 + 10]] // Capture a hex string that must equal VAR4 + 10.

Thus if nothing is after the colon, just capture a variable (which I think is what we agreed on before). Anything after a colon is used as a variable expression that must match the captured expression.

That would suggest that the following should not be valid syntax, but maybe ',' could be treated as synonymous with '=', or maybe the colon can be omitted in cases where no verification is needed (and thus the following becomes an assignment)?

[[# %x, VAR5]]

Thoughts?

James

On 31 July 2018 at 17:36, <[hidden email]> wrote:
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
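The capture-plus-constraint idea in James's summary, as in [[# %x, VAR1 : < VAR2 - 10]], can be sketched as follows. Everything here is illustrative: check_capture is a hypothetical helper, the right-hand side is assumed to have been evaluated to an integer already, and the choice not to store the variable when the constraint fails is an assumption of this sketch rather than something the thread decides.

```python
import operator

# Comparison operators mentioned in the proposal.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def check_capture(text, base, op, bound, variables, name=None):
    """Check captured text against 'op bound'; store it under name on success."""
    value = int(text, base)
    ok = COMPARATORS[op](value, bound)
    if ok and name is not None:
        # Assumption: a failed constraint leaves the variable table untouched.
        variables[name] = value
    return ok
```

A possible use, mirroring the first line of the summary with VAR2 equal to 0x40:

```python
variables = {"VAR2": 0x40}
matched = check_capture("2f", 16, "<", variables["VAR2"] - 10, variables, "VAR1")
```

Here matched is True because 0x2f (47) is less than 0x40 - 10 (54), and VAR1 is stored as 47.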
Hi James,

Yes, I think your summary proposal is a good one, though I disagree with the colon being optional because it would be ambiguous with looking for the value of VAR5 in the %x format. If anything, [[# %x, VAR5]] is equivalent to [[#:%x, VAR5]] (or [[#:%x = VAR5]] with your proposal). My other suggestion would be to use == rather than =, since = could be confused with assignment.

Note that I'll stick to only implementing = for now, as supporting <, <=, > or >= requires different logic from what I'm doing now.

By the way, FYI, I have already started working on the new syntax. There is still a fair amount to do as I was busy with other tasks, but I'm progressing.

Best regards,

Thomas

On Fri, 17 Aug 2018 at 10:39, James Henderson <[hidden email]> wrote:
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
Sounds good to me. I think we're at the point where there seems to be a broad agreement on the style, and it'll be easier to see it once it's put into practice, so it'll be good to see what you've come up with once it's ready!

James

On 22 August 2018 at 10:07, Thomas Preudhomme <[hidden email]> wrote:
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
Hi there,
The patch is taking shape, with all features implemented and most tests passing, and I need agreement on the few issues left.

== Conversion to hex of negative values ==

In short, what should ??? be matched against in the following case:

    -30
    ???

    CHECK: [[# %d, VAR:]]
    CHECK-NEXT: [[# %x, VAR]]

Should an error occur because VAR is negative? Or should VAR's bit pattern be interpreted as a positive value, in which case what width should VAR have? I'm leaning towards giving an error.

== Implicit conversion ==

During the previous discussion I suggested having a default conversion of %u. For ease of use I'm now thinking of instead using the conversion that the variable being used was defined with, e.g.

    C
    ???

    CHECK: [[# %X, VAR:]]
    CHECK-NEXT: [[# VAR + 1]]

would match ??? against D rather than 13.

For expressions with multiple variables, I'm thinking of forbidding a mix of variables defined with %x and %X, having %x and %X win over %d and %u, and having %d win over %u, i.e. with

    C
    c
    -12
    10
    ???
    !!!
    @@@
    %%%
    $$$

    CHECK: [[# %X, VAR1:]]
    CHECK-NEXT: [[# %x, VAR2:]]
    CHECK-NEXT: [[# %d, VAR3:]]
    CHECK-NEXT: [[# %u, VAR4:]]
    CHECK-NEXT: [[# VAR1 + VAR2]]
    CHECK-NEXT: [[# VAR1 + VAR3 + VAR4]]
    CHECK-NEXT: [[# VAR2 + VAR3 + VAR4]]
    CHECK-NEXT: [[# VAR3 + VAR4]]
    CHECK-NEXT: [[# VAR4 - 12]]

- the CHECK-NEXT for ??? would give an error
- !!! would be matched against A
- @@@ would be matched against a
- %%% would *successfully* be matched against -2
- $$$ would fail to be matched against -2 (because the conversion is implicitly %u)

What do you think of this idea? Do you agree it improves usability? Are there corner cases I've forgotten?

== Format specifier ==

In a previous email I talked about input and output conversion. I don't think that makes sense, and indeed the examples only ever show one conversion at a time. I think it's better to talk about a match or substitution conversion.
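The implicit-conversion precedence proposed above can be sketched as a small function over the conversions that an expression's variables were defined with. This is only an illustration of the rules as stated in this message: implicit_format and render are hypothetical names, and rejecting negative values for anything but %d follows the "leaning towards giving an error" option, which is not yet a settled decision.

```python
HEX_LOWER, HEX_UPPER, SIGNED, UNSIGNED = "%x", "%X", "%d", "%u"


def implicit_format(operand_formats):
    """Pick the conversion for an expression without an explicit specifier."""
    used = set(operand_formats)
    if HEX_LOWER in used and HEX_UPPER in used:
        raise ValueError("cannot mix variables defined with %x and %X")
    # Hex wins over %d and %u, and %d wins over %u.
    for candidate in (HEX_LOWER, HEX_UPPER, SIGNED, UNSIGNED):
        if candidate in used:
            return candidate
    return UNSIGNED  # no variables at all: the default suggested earlier


def render(value, fmt):
    """Write value in the given conversion (assumption: negatives need %d)."""
    if value < 0 and fmt != SIGNED:
        raise ValueError("cannot write negative value %d as %s" % (value, fmt))
    return format(value, {"%x": "x", "%X": "X"}.get(fmt, "d"))
```

With VAR1 = 12 (%X), VAR2 = 12 (%x), VAR3 = -12 (%d) and VAR4 = 10 (%u), VAR1 + VAR3 + VAR4 is 10 and renders as A, VAR3 + VAR4 renders as -2 under %d, and VAR4 - 12 is rejected under the implicit %u, in line with the outcomes listed in the message.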
That is, expressions evaluate to a value, and this is what gets stored in variables (alongside the conversion, to allow the above suggestion), while the conversion describes in which base to write the value for matching it. Does that model make sense?

Best regards,

Thomas
_______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
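For readers following the thread, the implicit-conversion rules proposed in the 5 September message above can be sketched roughly as follows. This is only an illustrative Python model, not FileCheck code: the names `implicit_format`, `render` and `evaluate` are hypothetical, the fallback to %u for an expression without variables is taken from the earlier default mentioned in that message, and treating a negative value under %x, %X or %u as an error is an assumption based on the preference stated there (the question was still open at that point).

```python
# Hypothetical model of the proposal; none of these names exist in FileCheck.

# Precedence proposed in the message: hex wins over %d, which wins over %u.
PRECEDENCE = ("%x", "%X", "%d", "%u")


def implicit_format(formats):
    """Pick the format for an expression that has no explicit specifier,
    given the formats its variables were defined with."""
    kinds = set(formats)
    if {"%x", "%X"} <= kinds:
        raise ValueError("variables defined with %x and %X cannot be mixed")
    for fmt in PRECEDENCE:
        if fmt in kinds:
            return fmt
    return "%u"  # assumption: fall back to the earlier default of %u


def render(value, fmt):
    """Write value the way fmt would, giving the text to match."""
    if value < 0 and fmt != "%d":
        # Assumption: negative values are an error for %x, %X and %u.
        raise ValueError(f"cannot write negative value {value} with {fmt}")
    if fmt == "%x":
        return format(value, "x")
    if fmt == "%X":
        return format(value, "X")
    return str(value)


# Values and formats from the example input lines: C, c, -12, 10.
VAR1, VAR2, VAR3, VAR4 = (12, "%X"), (12, "%x"), (-12, "%d"), (10, "%u")


def evaluate(*variables, immediate=0):
    """Sum the variables plus an immediate, rendered with the implicit format."""
    value = sum(v for v, _ in variables) + immediate
    return render(value, implicit_format(f for _, f in variables))


print(evaluate(VAR1, VAR3, VAR4))  # A
print(evaluate(VAR2, VAR3, VAR4))  # a
print(evaluate(VAR3, VAR4))        # -2
```

Under the same assumptions, `evaluate(VAR1, VAR2)` and `evaluate(VAR4, immediate=-12)` both raise `ValueError`, mirroring the ??? and $$$ cases of the example.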
Hi James,
Thanks for the feedback. Replies are inline.

On Mon, 10 Sep 2018 at 15:26, James Henderson <[hidden email]> wrote:
>
> Hi Thomas,
>
> Sorry for the delayed response - I've been on annual leave for the past week.
>
> > == Conversion to hex of negative values ==
> My initial thought was that we should issue an error, but actually, I'm not sure that's the right thing to do, as I could imagine needing to add together signed integers and hex-captured values. A good example of this are ELF relocations, where it is quite normal to need to add together a signed addend and an unsigned offset.

In case that was not clear, I was not suggesting preventing expressions that mix hex and decimal values, only giving an error when a negative resulting value is converted to a hex output, ie.

[[#%x, FOO:]]
[[#%d, BAR:]]
[[#%x, FOO+BAR]]

would be allowed as long as FOO+BAR is positive.

> Instead, I think you should assume a 4-byte sized value by default, and use something like %lx to represent 8-byte values (and similar patterns for 2-byte or possibly even single-byte ones), emitting an error if not possible, because the size is too large.

Should I do the same for signed and unsigned decimal conversions then, ie. give an error if the expression cannot be expressed in 32 bits unless there is an l modifier?

> > == Implicit conversion ==
> I think having a default conversion of the same format as the captured variable makes sense. However, I think we are just risking confusion if we try to do any implicit conversions from one type to another. In such a case, I think I'd be inclined to force the user to provide a format specifier if the types of the input variables are different and emit an error otherwise, although maybe there is a case for %d and %u being combined. On the other hand, I think combining %d and %x types could get particularly confusing. Here's what I'd propose:
> 1) Drop the distinction between %x and %X (see below). Hex values should implicitly match other hex values, regardless of case.
> 2) Ban implicit conversions when combining %x/X with %d/u. I don't see an inherent reason why one should be better than the other and a default could cause confusion/false positives etc.
> 3) If %d/u is mixed and the expression is negative, match %d, otherwise match %u (because %d and %u are identical in this situation).
>
> Related aside: if %u is explicitly specified, and the result is negative, I recommend emitting an error, since it's not possible to represent a negative as an unsigned integer without assuming some size of values and underflowing.
>
> On the distinction between %x and %X, I'm not sure if we need it. It seems unlikely that the user cares about the case of the hex digits in the majority of instances where expressions are required. The only case for distinguishing that I can think of is if a user wants to test the actual output style, in which case they can still use {{[0-9A-Fa-f]}}.

Yes, I suppose they could use a CHECK or CHECK-NEXT with a numeric expression for testing the value and a CHECK-SAME with [0-9a-f] or [0-9A-F] to test for the format on a separate line. I'll incorporate the change, thanks.

> > == Format specifier ==
> Your comments here seem sensible to me.
>
> Regards,
>
> James

Best regards,

Thomas
The one question I wonder is if the # should be >> >>>> >> >> next to the variable name or next to the [[ as proposed by James. I >> >>>> >> >> like the former better *but* I think the latter makes more sense since >> >>>> >> >> [[#VAR + 1]] would suggest that the [[<something>]] syntax already >> >>>> >> >> allows numeric expression without numeric variable which is not the >> >>>> >> >> case. Having the # right at the start also clearly indicates that the >> >>>> >> >> whole expression might have a conversion specifier. Finally, the # >> >>>> >> >> syntax can allow defining a variable with the result of an arithmetic >> >>>> >> >> expression: >> >>>> >> >> [[#BAR, %x:]] >> >>>> >> >> [[# FOO:BAR+12]] >> >>>> >> >> >> >>>> >> >> So BAR takes an hex value in lower case syntax, value gets added 12 >> >>>> >> >> (in decimal) and the result is put into FOO. In which case there >> >>>> >> >> should be no format specifier when defining FOO. ie. format specifier >> >>>> >> >> for definition is only when there's nothing about the colon. Of course >> >>>> >> >> we could allow hex immediate with 0x syntax if needed. Again, I'm not >> >>>> >> >> advocating for implementing all this from the start, but make sure >> >>>> >> >> that the syntax would allow it if we realize we need this later and I >> >>>> >> >> think Jame's proposal does. >> >>>> >> >> >> >>>> >> >> It seems this syntax would suit all your current uses (albeit the >> >>>> >> >> rewriting necessary), did I miss something? >> >>>> >> >> >> >>>> >> > >> >>>> >> > Hi Thomas, >> >>>> >> > >> >>>> >> > That would indeed work fine for me and it would be easy to update our tests with a few regex replaces. >> >>>> >> > >> >>>> >> > I think I prefer the [[# %FMTSPEC, EXPR]] syntax since that would also make it possible to have commas in the expression part. This might be useful if we allow function-call like expressions such as [[# %X, pow(10, FOO) + 20]]. 
>> >>>> >> > >> >>>> >> > >> >>>> >> > Alex >> >>>> >> > >> >>>> >> > >> >>>> >> > >> >>>> >> >> >> >>>> >> >> Best regards, >> >>>> >> >> >> >>>> >> >> Thomas >> >>>> >> >> >> >>>> >> >> On Tue, 17 Jul 2018 at 21:59, Alexander Richardson >> >>>> >> >> <[hidden email]> wrote: >> >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> > On Tue, 17 Jul 2018 at 10:02 Thomas Preudhomme via llvm-dev <[hidden email]> wrote: >> >>>> >> >> >> >> >>>> >> >> >> To be clear, I do not intend to add support for hex specifier in the >> >>>> >> >> >> current patch, I just want to make sure the syntax we choose is going >> >>>> >> >> >> to allow it later. My immediate use case is decimal integer and I >> >>>> >> >> >> intend to write the code so that it's easy to extend to more type of >> >>>> >> >> >> numeric variables and expressions later. This way we'll only add >> >>>> >> >> >> specifier that are actually required by actual testcases. >> >>>> >> >> >> >> >>>> >> >> > >> >>>> >> >> > I also added FileCheck expressions to our fork of LLVM in order to allow testing both a 128-bit and a 256-bits versions of our CHERI ISA in a single test case [1]. >> >>>> >> >> > I used [[@EXPR foo * 2 + 1]] for FileCheck expressions [2]. I'm not particularly happy with this syntax since it is quite verbose (but then again we don't need it that often so it doesn't really matter). It also doesn't allow saving the expression result so it needs to be repeated everywhere. I could probably use [[@EXPR:OUTVAR INVAR + 42]] but I haven't really had the need for that yet. >> >>>> >> >> > >> >>>> >> >> > We currently need the following two features: >> >>>> >> >> > >> >>>> >> >> > - Simple arithmetic with multiple operations. Example: >> >>>> >> >> > `cld $gp, $zero, [[@EXPR 2 * $CAP_SIZE - 8]]($c11)` >> >>>> >> >> > >> >>>> >> >> > - Conversion to hex (upper and lower case since not all tools are consistent here) and to decimal. 
>> >>>> >> >> > Example: // READOBJ-NEXT: 0x50 R_MIPS_64/R_MIPS_NONE/R_MIPS_NONE .data 0x[[@EXPR hex($CAP_SIZE * 2)]] >> >>>> >> >> > >> >>>> >> >> > Alex >> >>>> >> >> > >> >>>> >> >> > [1] For most test cases the simple -DVAR=value flag in FileCheck is good enough: we have a %cheri_FileCheck lit substitution that expands to `FileCheck '-D$CAP_SIZE=16/32'` . This works for most IR level tests since usually the only thing that is different is "align 16" vs "align 32". However, when checking the assembly output or linker addresses we often need something more complex. >> >>>> >> >> > >> >>>> >> >> > [2] A test case showing all the currently supported expressions can be found here: <https://github.com/CTSRD-CHERI/llvm/blob/master/test/FileCheck/expressions.txt> >> >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> >> >> >>>> >> >> >> On Mon, 16 Jul 2018 at 18:39, <[hidden email]> wrote: >> >>>> >> >> >> > >> >>>> >> >> >> > >> >>>> >> >> >> > >> >>>> >> >> >> > > -----Original Message----- >> >>>> >> >> >> > > From: llvm-dev [mailto:[hidden email]] On Behalf Of >> >>>> >> >> >> > > Thomas Preudhomme via llvm-dev >> >>>> >> >> >> > > Sent: Monday, July 16, 2018 6:24 AM >> >>>> >> >> >> > > To: [hidden email] >> >>>> >> >> >> > > Cc: [hidden email] >> >>>> >> >> >> > > Subject: Re: [llvm-dev] Syntax for FileCheck numeric variables and >> >>>> >> >> >> > > expressions >> >>>> >> >> >> > > >> >>>> >> >> >> > > Hi James, >> >>>> >> >> >> > > >> >>>> >> >> >> > > I like that suggestion very much but I think keeping the order of the >> >>>> >> >> >> > > two sides as initially proposed makes more sense. In printf/scanf the >> >>>> >> >> >> > > string is first because the primary use of these functions is to do >> >>>> >> >> >> > > I/O and so you first specify what you are going to output/input and >> >>>> >> >> >> > > then where to capture variables. 
The primary objective of FileCheck >> >>>> >> >> >> > > variables and expressions is to capture/print them, the specifier is >> >>>> >> >> >> > > an addon to allow some conversion. Does it make sense? >> >>>> >> >> >> > >> >>>> >> >> >> > My immediate reaction is that I'd rather not have FileCheck get into >> >>>> >> >> >> > the business of handling printf specifiers. OTOH, while LLVM tools >> >>>> >> >> >> > do typically print lowercase hex, that's not guaranteed, and looking >> >>>> >> >> >> > at the output of other tools can be useful too. So, a way to specify >> >>>> >> >> >> > the case for a hex conversion seems worthwhile. >> >>>> >> >> >> > >> >>>> >> >> >> > I had also been thinking in terms of the trailing colon to distinguish >> >>>> >> >> >> > definition from use, as James suggested, that's sort-of consistent >> >>>> >> >> >> > with the current syntax. >> >>>> >> >> >> > >> >>>> >> >> >> > This is starting to make parsing the insides of [[]] much more involved, >> >>>> >> >> >> > so you'll want to pay attention to making that code well-structured and >> >>>> >> >> >> > readable. >> >>>> >> >> >> > --paulr >> >>>> >> >> >> > >> >>>> >> >> >> > > >> >>>> >> >> >> > > In the interest of speeding things up I plan to start implementing >> >>>> >> >> >> > > this proposal starting tomorrow unless someone gives some more >> >>>> >> >> >> > > feedback. >> >>>> >> >> >> > > >> >>>> >> >> >> > > Best regards, >> >>>> >> >> >> > > >> >>>> >> >> >> > > Thomas >> >>>> >> >> >> > > >> >>>> >> >> >> > > On Fri, 13 Jul 2018 at 15:51, James Henderson >> >>>> >> >> >> > > <[hidden email]> wrote: >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > Hi Thomas, >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > In general, I think this is a good proposal. However, I don't think that >> >>>> >> >> >> > > using ">" or "<" to specify base (at least alone) is a good idea, as it >> >>>> >> >> >> > > might clash with future ideas to do comparisons etc. 
I also think it would >> >>>> >> >> >> > > be nice to have the syntax consistent between definition and use. My first >> >>>> >> >> >> > > thought on a reasonable alternative was to use commas to separate the two >> >>>> >> >> >> > > parts, so something like: >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > [[# VAR, 16:]] to capture a hexadecimal number (where the spaces are >> >>>> >> >> >> > > optional). [[# VAR, 16]] to use a variable, converted to a hexadecimal >> >>>> >> >> >> > > string. In both cases, the base component is optional, and defaults to >> >>>> >> >> >> > > decimal. >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > This led me to thing that it might be better to use something similar to >> >>>> >> >> >> > > printf style for the latter half, so to capture a hexadecimal number with >> >>>> >> >> >> > > a leading "0x" would be: "0x[[# VAR, %x:]]" and to use it would be "0x[[# >> >>>> >> >> >> > > VAR, %x]]". Indeed, that would allow straightforward conversions between >> >>>> >> >> >> > > formats, so say you defined it by capturing a decimal integer and using it >> >>>> >> >> >> > > to match a hexadecimal in upper case, with leading 0x and 8 digits >> >>>> >> >> >> > > following the 0x: >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > CHECK: [[# VAR, %d:]] # Defines >> >>>> >> >> >> > > > CHECK: 0x[[# VAR + 1, %8X]] # Uses >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > Of course, if we go down that route, it would probably make more sense >> >>>> >> >> >> > > to reverse the two sides (e.g. to become "[[# %d, VAR:]]" to capture a >> >>>> >> >> >> > > decimal and "[[# %8X, VAR + 1]]" to use it). 
>> >>>> >> >> >> > > > >> >>>> >> >> >> > > > Regards, >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > James >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > On 12 July 2018 at 15:34, Thomas Preudhomme via llvm-dev <llvm- >> >>>> >> >> >> > > [hidden email]> wrote: >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> Hi all, >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> I've written a patch to extend FileCheck to support matching >> >>>> >> >> >> > > >> arithmetic expressions involving variable [1] (eg. to match REG+1 >> >>>> >> >> >> > > >> where REG is a variable with a numeric value). It was suggested to me >> >>>> >> >> >> > > >> in the review to introduce the concept of numeric variable and to >> >>>> >> >> >> > > >> allow for specifying the base the value are written in. >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> [1] https://reviews.llvm.org/D49084 >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> I think the syntax should satisfy the below requirements: >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> * based off the [[]] construct since anything else might overload an >> >>>> >> >> >> > > >> existing valid syntax (eg. $$ is supposed to match literally now) >> >>>> >> >> >> > > >> * consistent with syntax for expressions using @LINE >> >>>> >> >> >> > > >> * consistent with using ':' to define regular variable >> >>>> >> >> >> > > >> * allows to specify base of the number a numeric variable is being set >> >>>> >> >> >> > > to >> >>>> >> >> >> > > >> * allows to specify base of the result of the numeric expression >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> I've come up with the following syntax for which I'd like feedback: >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> Numeric variable definition: [[#X<base:]] (eg. [[#ADDR<16:]]) where X >> >>>> >> >> >> > > >> is the numeric variable being defined and <base is optional in which >> >>>> >> >> >> > > >> case base defaults to 10 >> >>>> >> >> >> > > >> Numeric variable use: [[#X>base]] (eg. 
[[#ADDR]]>2) where <base is >> >>>> >> >> >> > > >> optional in which case base defaults 10 >> >>>> >> >> >> > > >> Numeric expression: [[exp>base]] (eg. [[#ADDR+2>16]] where expression >> >>>> >> >> >> > > >> must contain at least one numeric variable >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> I'm not a big fan of the > for the output base being inside the >> >>>> >> >> >> > > >> expression but [[exp]]>base would match >base literally. >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> Any suggestions / opinions? >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> Best regards, >> >>>> >> >> >> > > >> >> >>>> >> >> >> > > >> Thomas >> >>>> >> >> >> > > >> _______________________________________________ >> >>>> >> >> >> > > >> LLVM Developers mailing list >> >>>> >> >> >> > > >> [hidden email] >> >>>> >> >> >> > > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>>> >> >> >> > > > >> >>>> >> >> >> > > > >> >>>> >> >> >> > > _______________________________________________ >> >>>> >> >> >> > > LLVM Developers mailing list >> >>>> >> >> >> > > [hidden email] >> >>>> >> >> >> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>>> >> >> >> _______________________________________________ >> >>>> >> >> >> LLVM Developers mailing list >> >>>> >> >> >> [hidden email] >> >>>> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>>> _______________________________________________ >> >>>> LLVM Developers mailing list >> >>>> [hidden email] >> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>>> >> >>>> >> >>> >> >>> >> > > > LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
On 12 September 2018 at 14:50, Thomas Preudhomme
<[hidden email]> wrote:
>
> Hi James,
>
> Thanks for the feedback. Replies are inline.
>
> On Mon, 10 Sep 2018 at 15:26, James Henderson
> <[hidden email]> wrote:
> >
> > Hi Thomas,
> >
> > Sorry for the delayed response - I've been on annual leave for the past week.
> >
> > > == Conversion to hex of negative values ==
> > My initial thought was that we should issue an error, but actually, I'm not sure that's the right thing to do, as I could imagine needing to add together signed integers and hex-captured values. A good example of this are ELF relocations, where it is quite normal to need to add together a signed addend and an unsigned offset.
>
> In case that was not clear, I was not suggesting preventing
> expressions with hex and decimal values, only converting the resulting
> value if negative to an hex output. ie.
>
> [[#%x, FOO:]]
> [[#%d, BAR:]]
> [[#%x, FOO+BAR]]
>
> would be allowed as long as FOO+BAR is positive.

provide an explicit output format.

>
> >
> > Instead, I think you should assume a 4-byte sized value by default, and use something like %lx to represent 8-byte values (and similar patterns for 2-byte or possibly even single-byte ones), emitting an error if not possible, because the size is too large.
>
> Should I do the same for signed and unsigned decimal conversions then,
> ie. give an error if the expression cannot be expressed in 32bit
> unless there is a l modifier?

I don't think output as signed should result in an error here, regardless of size, since there is always a clear mapping from an unsigned to a signed (exception might be if an unsigned can't fit in a signed 64-bit value), though I could probably be persuaded otherwise. For unsigned output, I think the same rules as hex output would make sense. Essentially, we need to know where a negative value maps to in the unsigned range, so we have to make some sort of assumption about sizes. I'm not too fussed about what cases should error and should not in this area, so feel free to make your own judgement.

>
> >
> > > == Implicit conversion ==
> > I think having a default conversion of the same format as the captured variable makes sense. However, I think we are just risking confusion if we try to do any implicit conversions from one type to another. In such a case, I think I'd be inclined to force the user to provide a format specifier if the types of the input variables are different and emit an error otherwise, although maybe there is a case for %d and %u being combined. On the other hand, I think combining %d and %x types could get particularly confusing. Here's what I'd propose:
> > 1) Drop the distinction between %x and %X (see below). Hex values should implicitly match other hex values, regardless of case.
> > 2) Ban implicit conversions when combining %x/X with %d/u. I don't see an inherent reason why one should be better than the other and a default could cause confusion/false positives etc.
> > 3) If %d/u is mixed and the expression is negative, match %d, otherwise match %u (because %d and %u are identical in this situation).
> >
> > Related aside: if %u is explicitly specified, and the result is negative, I recommend emitting an error, since it's not possible to represent a negative as an unsigned integer without assuming some size of values and underflowing.
> >
> > On the distinction between %x and %X, I'm not sure if we need it. It seems unlikely that the user cares about the case of the hex digits in the majority of instances where expressions are required. The only case for distinguishing that I can think of is if a user wants to test the actual output style, in which case they can still use {{[0-9A-Fa-f]}}.
>
> Yes, I suppose they could use a CHECK or CHECK-NEXT with numeric
> expression for testing the value and a CHECK-SAME with [0-9a-f] or
> [0-9A-F] to test for format on a separate line. I'll incorporate the
> change, thanks.
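The sizing and implicit-conversion rules discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FileCheck's implementation: the names to_hex, implicit_format and render are hypothetical, and the width_bytes parameter stands in for the size modifier being proposed (4 for the suggested default, 8 for something like %lx).

```python
def to_hex(value, width_bytes=4, upper=False):
    """Render value as hex, wrapping negatives into an assumed width.

    Hypothetical helper: width_bytes models the proposed size modifier.
    """
    bits = 8 * width_bytes
    # Reject anything that fits neither the signed nor the unsigned range.
    if value < -(1 << (bits - 1)) or value >= (1 << bits):
        raise ValueError(f"{value} does not fit in {width_bytes} bytes")
    # Masking maps a negative value to its two's complement bit pattern.
    return format(value & ((1 << bits) - 1), "X" if upper else "x")


def implicit_format(operand_formats, value):
    """Pick the format of an expression that has no explicit specifier.

    Follows the three proposed rules: hex operands match each other
    regardless of case, hex never mixes implicitly with decimal, and a
    %d/%u mix resolves to %d only when the result is negative.
    """
    if not operand_formats:
        raise ValueError("expression uses no variables")
    kinds = {"hex" if f in ("%x", "%X") else f for f in operand_formats}
    if "hex" in kinds:
        if kinds != {"hex"}:
            raise ValueError("mixing hex and decimal needs an explicit format")
        return "%x"  # case choice is arbitrary in this sketch
    if kinds == {"%d", "%u"}:
        return "%d" if value < 0 else "%u"
    (only,) = kinds
    return only


def render(fmt, value, width_bytes=4):
    """Produce the string an expression with format fmt would have to match."""
    if fmt == "%d":
        return str(value)
    if fmt == "%u":
        if value < 0:
            raise ValueError("negative value with %u")
        return str(value)
    if fmt in ("%x", "%X"):
        return to_hex(value, width_bytes, upper=(fmt == "%X"))
    raise ValueError(f"unsupported format {fmt}")
```

Under these assumptions, to_hex(-30) gives ffffffe2 while to_hex(-30, width_bytes=8) gives ffffffffffffffe2, which is why some default width has to be agreed before a negative value can be matched as hex.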
>
> >
> > > == Format specifier ==
> > Your comments here seem sensible to me.
> >
> > Regards,
> >
> > James
>
> Best regards,
>
> Thomas
>
> >
> > On 5 September 2018 at 14:39, Thomas Preudhomme <[hidden email]> wrote:
> >>
> >> Hi there,
> >>
> >> The patch is taking shape with all features implemented and most tests
> >> passing and I need an agreement on the few issues left.
> >>
> >> == Conversion to hex of negative values ==
> >>
> >> In short, what should ??? be matched against in the following case:
> >>
> >> -30
> >> ???
> >> CHECK: [[# %d, VAR:]]
> >> CHECK-NEXT: [[# %x, VAR]]
> >>
> >> Should an error occur because VAR is negative? Or should VAR's bit pattern
> >> be interpreted as a positive value, but then what width should VAR have?
> >> I'm leaning towards giving an error.
> >>
> >> == Implicit conversion ==
> >>
> >> During the previous discussion I suggested having a default conversion
> >> of %u. For ease of use I'm thinking of instead using the mode the
> >> variable being used was defined in, eg.
> >>
> >> C
> >> ???
> >> CHECK: [[# %x, VAR:]]
> >> CHECK-NEXT: [[# VAR + 1]]
> >>
> >> would match ??? against D rather than 13. For expressions with multiple
> >> variables, I'm thinking of forbidding a mix of variables defined with %x
> >> and %X, having %x and %X win over %d and %u, and %d win over %u. ie.
> >> with
> >>
> >> C
> >> c
> >> -12
> >> 10
> >> ???
> >> !!!
> >> @@@
> >> %%%
> >> $$$
> >> CHECK: [[# %X, VAR1:]]
> >> CHECK-NEXT: [[# %x, VAR2:]]
> >> CHECK-NEXT: [[# %d, VAR3:]]
> >> CHECK-NEXT: [[# %u, VAR4:]]
> >> CHECK-NEXT: [[# VAR1 + VAR2]]
> >> CHECK-NEXT: [[# VAR1 + VAR3 + VAR4]]
> >> CHECK-NEXT: [[# VAR2 + VAR3 + VAR4]]
> >> CHECK-NEXT: [[# VAR3 + VAR4]]
> >> CHECK-NEXT: [[# VAR4 - 12]]
> >>
> >> - the CHECK-NEXT for ??? would give an error
> >> - !!! would be matched against A
> >> - @@@ would be matched against a
> >> - %%% would *successfully* be matched against -2
> >> - $$$ would fail to be matched against -2 (because conversion is implicitly %u)
> >>
> >> What do you think of this idea? Do you agree it improves usability? Are
> >> there corner cases I've forgotten?
> >>
> >> == Format specifier ==
> >>
> >> In a previous email I talked about input and output conversion. I don't
> >> think it makes sense and indeed examples only show one conversion each
> >> time. I think it's better to talk about a match or substitution
> >> conversion. That is, expressions evaluate to a value and this is what
> >> gets stored in variables (alongside the conversion to allow the above
> >> suggestion) and the conversion describes in which base to write the
> >> value for matching it. Does that model make sense?
> >>
> >> Best regards,
> >>
> >> Thomas
> >> On Wed, 22 Aug 2018 at 10:11, James Henderson
> >> <[hidden email]> wrote:
> >> >
> >> > Sounds good to me. I think we're at the point where there seems to be a broad agreement on the style, and it'll be easier to see it once it's put into practice, so it'll be good to see what you've come up with once it's ready!
> >> >
> >> > James
> >> >
> >> > On 22 August 2018 at 10:07, Thomas Preudhomme <[hidden email]> wrote:
> >> >>
> >> >> Hi James,
> >> >>
> >> >> Yes I think your summary proposal is a good one though I disagree with the colon being optional because there is ambiguity with looking for the value of VAR5 in the %x format. If anything, [[# %x, VAR5]] is equivalent to [[#:%x, VAR5]] or ([[#:%x = VAR5]] with your proposal). My other suggestion would be to use == rather than = since = could be confused with assignment.
> >> >>
> >> >> Note that I'll stick to only implementing = for now as supporting <, <=, > or >= requires a different logic than what I'm doing now.
> >> >> > >> >> By the way FYI, I have already started working on the new syntax, still a fair amount to do as I was busy on other tasks but I'm progressing. > >> >> > >> >> Best regards, > >> >> > >> >> Thomas > >> >> > >> >> On Fri, 17 Aug 2018 at 10:39, James Henderson <[hidden email]> wrote: > >> >>> > >> >>> Hi, > >> >>> > >> >>> I had some more thoughts. Summary of my proposal is at the bottom, but basically I wonder if we need to look again at the syntax a little. > >> >>> > >> >>> +++Details+++ > >> >>> > >> >>> In https://reviews.llvm.org/D49964, the proposed test at the time of writing has the size of a compressed and a decompressed version of a section hard-coded in. This feels a little fragile to me, and really what's interesting is whether the compressed section is smaller than the decompressed section. That then led me to think that the current test harness we use for some of our tools allows us to capture an integer and then compare it against another integer to report success or failure. Sometimes, the value in the comparison is computed from a captured number too. I don't have a fully-thought out syntax for this, but I think it should be complementary to the variable expression syntax. > >> >>> > >> >>> Example proposal: > >> >>> > >> >>> [[# %x < VAR - 10]] > >> >>> [[# < VAR - 10]] > >> >>> > >> >>> The first would match a hex number that is strictly 10 or more less than the value of VAR, and the second would match whatever the default pattern is. Thus the format specifier still works as before. The only difference is that we replace the ',' with a comparison operator (equally valid would be '<=', '>' etc). That then led me to wonder, why not use '==' (or just '=') to indicate equality, instead of ',' i.e: > >> >>> > >> >>> [[# %x == VAR - 10]] > >> >>> > >> >>> Related to this, it occurred to me that sometimes, we might want to capture the variable for reuse later, but also verify that it is based on some other variable (e.g. 
END is 4 higher than BEGIN, and written in hex). So maybe both can live alongside each other: > >> >>> > >> >>> [[# %x, VAR1 < VAR2 - 10]] > >> >>> > >> >>> This would capture a hex number, store it in VAR1, but fail if that number is not more than ten less than VAR2. Maybe we might want to use a colon to delineate the capture side from the verification side. > >> >>> > >> >>> +++Summary+++ > >> >>> > >> >>> I think the following would be my proposal: > >> >>> > >> >>> [[# %x, VAR1 : < VAR2 - 10]] // Capture VAR1 from hex, fail if it doesn't meet the right-hand expression. > >> >>> [[# %x, VAR3 :]] // Capture VAR3 from hex, always succeeding. > >> >>> [[# %x = VAR4 + 10]] // Capture a hex string that must equal VAR4 + 10. > >> >>> > >> >>> Thus if nothing is after the colon, just capture a variable (which I think is what we agreed on before). Anything after a colon is used as a variable expression that must match the captured expression. > >> >>> > >> >>> That would suggest that the following would not want to be valid syntax, but maybe ',' could be treated as synonymous with '=', or maybe the colon can be omitted in cases where no verification is needed (and thus the following becomes an assignment)? > >> >>> [[# %x, VAR5]] > >> >>> > >> >>> Thoughts? > >> >>> > >> >>> James > >> >>> > >> >>> > >> >>> On 31 July 2018 at 17:36, <[hidden email]> wrote: > >> >>>> > >> >>>> I can certainly envision a use case for a [BASE + LENGTH + 4] computation to verify the address of a next-thingy. Comes up in DWARF dumps all the time. 
> >> >>>> > >> >>>> --paulr > >> >>>> > >> >>>> > >> >>>> > >> >>>> From: llvm-dev [mailto:[hidden email]] On Behalf Of James Henderson via llvm-dev > >> >>>> Sent: Tuesday, July 31, 2018 11:53 AM > >> >>>> To: Thomas Preudhomme > >> >>>> Cc: llvm-dev > >> >>>> > >> >>>> > >> >>>> Subject: Re: [llvm-dev] Syntax for FileCheck numeric variables and expressions > >> >>>> > >> >>>> > >> >>>> > >> >>>> This looks like a reasonable subset of features to me. My only question is related to this one: > >> >>>> > >> >>>> > >> >>>> > >> >>>> > - arithmetic expression involving several variables > >> >>>> > >> >>>> > >> >>>> > >> >>>> Is it actually harder to write FileCheck to handle this case than to not handle it? I'm (naively) assuming that the variables will be in some form of container, and are just substituted in. If it is harder, that's fine. Otherwise, I just say do it. > >> >>>> > >> >>>> > >> >>>> > >> >>>> James > >> >>>> > >> >>>> > >> >>>> > >> >>>> On 31 July 2018 at 11:51, Thomas Preudhomme via llvm-dev <[hidden email]> wrote: > >> >>>> > >> >>>> Hi Alex, > >> >>>> > >> >>>> On Fri, 27 Jul 2018 at 11:53, Alexander Richardson > >> >>>> <[hidden email]> wrote: > >> >>>> > > >> >>>> > On Thu, 26 Jul 2018 at 10:28 Thomas Preudhomme <[hidden email]> wrote: > >> >>>> >> > >> >>>> >> Hi Alexander, > >> >>>> >> > >> >>>> >> Please forgive me if I'm missing the obvious but I do not see how the > >> >>>> >> order helps allowing a comma in the expression. It seems to me that > >> >>>> >> what would allow it is to make FMTSPEC mandatory or at least the comma > >> >>>> >> to separate it (ie. [[#,EXPR]] for the default format specifier). In > >> >>>> >> any case comma in a function-call like expression can be distinguished > >> >>>> >> from comma for the format specifier since one is always inside a > >> >>>> >> parenthesized expression. 
> >> >>>> >> > >> >>>> > Hi Thomas, > >> >>>> > > >> >>>> > I though that FMTSPEC first might be easier to implement because you can just check if the first non-whitespace character after # is a %. If it is parse a fmtspec followed by a comma and if not treat everything else as the expression. But you are right a function-like syntax would always contain parentheses so there is no ambiguity. > >> >>>> > I think [[#,EXPR]] looks a bit strange and I think we can determine default format vs format specifier based on the first character after the # being a % or not. I.e. [[#EXPR]] means default format and [[#%x,EXPR]] is hex. Does that sound reasonable? > >> >>>> > >> >>>> Yes it does. I've started reworking the changes I made to > >> >>>> FileCheck.rst to document the agreed upon syntax. At the moment I'm > >> >>>> thinking about supporting %u, %d, %x and %X as input and output format > >> >>>> specifier, the optionality of format specifier (defaulting to %u) and > >> >>>> basic numeric variable definition and numeric expression use involving > >> >>>> a variable and an immediate. In particular, I do *not* plan to > >> >>>> implement the following: > >> >>>> - defining a numeric variable from a numeric expression > >> >>>> - arithmetic operations other than - and + > >> >>>> - arithmetic expression involving several variables > >> >>>> > >> >>>> I'll make sure that this can easily be added later and will mention in > >> >>>> the doc that the syntax for these feature has already been agreed as > >> >>>> well. > >> >>>> > >> >>>> Feel free to give me feedback on the set of features I intend to > >> >>>> implement in this initial patch. > >> >>>> > >> >>>> Best regards, > >> >>>> > >> >>>> Thomas > >> >>>> > >> >>>> > >> >>>> > > >> >>>> > > >> >>>> >> > >> >>>> >> That said I don't have a strong opinion about the ordering of the > >> >>>> >> expression wrt. the format specifier. 
I find EXPR, FMTSPEC more > >> >>>> >> natural but at 2 persons (James and you) expressed preference for the > >> >>>> >> reverse order so I'll assume that's the general preference. > >> >>>> >> > >> >>>> > > >> >>>> > I don't have a strong preference whether it should come before or after and agree with James that whatever is easiest to implement should be done. > >> >>>> > > >> >>>> > Thanks, > >> >>>> > Alex > >> >>>> > > >> >>>> > > >> >>>> >> Best regards, > >> >>>> >> > >> >>>> >> Thomas > >> >>>> >> > >> >>>> >> P.S.: My apologies for only asking now but how do you prefer to be > >> >>>> >> called? Alexander Vs Alex Vs something else? > >> >>>> > > >> >>>> > Most people call me Alex but if you prefer Alexander is also fine. > >> >>>> > > >> >>>> >> > >> >>>> >> > >> >>>> >> On Sun, 22 Jul 2018 at 20:23, Alexander Richardson > >> >>>> >> <[hidden email]> wrote: > >> >>>> >> > > >> >>>> >> > On Wed, 18 Jul 2018 at 13:50 Thomas Preudhomme <[hidden email]> wrote: > >> >>>> >> >> > >> >>>> >> >> Hi Alex, > >> >>>> >> >> > >> >>>> >> >> Thanks for the feedback. My first thought was that introducing the new > >> >>>> >> >> pseudo var @EXPR is a nice way to generalize that syntax beyond @LINE > >> >>>> >> >> since it would also evaluate to an arithmetic value. On the other hand > >> >>>> >> >> there is a small inconsistency because @LINE evaluates to a value > >> >>>> >> >> which can be part of an expression while @EXPR is an expression, and > >> >>>> >> >> so the @ syntax as a whole becomes defined as introducing something > >> >>>> >> >> which is not a regular variable, ie. a negative definition. > >> >>>> >> >> > >> >>>> >> >> I'll stick with the # syntax because # is usually associated with > >> >>>> >> >> numbers and can be defined as introducing an integer > >> >>>> >> >> expression/variable. The one question I wonder is if the # should be > >> >>>> >> >> next to the variable name or next to the [[ as proposed by James. 
> >> >>>> >> >> I like the former better *but* I think the latter makes more sense since
> >> >>>> >> >> [[#VAR + 1]] would suggest that the [[<something>]] syntax already
> >> >>>> >> >> allows numeric expression without numeric variable which is not the
> >> >>>> >> >> case. Having the # right at the start also clearly indicates that the
> >> >>>> >> >> whole expression might have a conversion specifier. Finally, the #
> >> >>>> >> >> syntax can allow defining a variable with the result of an arithmetic
> >> >>>> >> >> expression:
> >> >>>> >> >> [[#BAR, %x:]]
> >> >>>> >> >> [[# FOO:BAR+12]]
> >> >>>> >> >>
> >> >>>> >> >> So BAR takes a hex value in lower case syntax, value gets added 12
> >> >>>> >> >> (in decimal) and the result is put into FOO. In which case there
> >> >>>> >> >> should be no format specifier when defining FOO. ie. format specifier
> >> >>>> >> >> for definition is only when there's nothing after the colon. Of course
> >> >>>> >> >> we could allow hex immediate with 0x syntax if needed. Again, I'm not
> >> >>>> >> >> advocating for implementing all this from the start, but make sure
> >> >>>> >> >> that the syntax would allow it if we realize we need this later and I
> >> >>>> >> >> think James's proposal does.
> >> >>>> >> >>
> >> >>>> >> >> It seems this syntax would suit all your current uses (albeit the
> >> >>>> >> >> rewriting necessary), did I miss something?
> >> >>>> >> >>
> >> >>>> >> >
> >> >>>> >> > Hi Thomas,
> >> >>>> >> >
> >> >>>> >> > That would indeed work fine for me and it would be easy to update our tests with a few regex replaces.
> >> >>>> >> >
> >> >>>> >> > I think I prefer the [[# %FMTSPEC, EXPR]] syntax since that would also make it possible to have commas in the expression part. This might be useful if we allow function-call like expressions such as [[# %X, pow(10, FOO) + 20]].
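As a rough illustration of why putting the format specifier first keeps commas in the expression unambiguous, here is a minimal Python sketch. It is not FileCheck's implementation; the function name `split_numeric_block` is hypothetical, and it assumes the text between `[[#` and `]]` has already been extracted.

```python
def split_numeric_block(body: str):
    """Split the text found between '[[#' and ']]' into (format_spec, expression).

    Hypothetical helper: a leading '%' marks a format specifier, which ends at
    the FIRST comma. Everything after that comma is the expression, so the
    expression itself may contain further commas.
    """
    text = body.strip()
    if text.startswith("%"):
        spec, sep, expr = text.partition(",")
        if not sep:
            raise ValueError("format specifier must be followed by a comma")
        return spec.strip(), expr.strip()
    # No leading '%': the whole body is the expression, default format applies.
    return None, text


spec, expr = split_numeric_block(" %X, pow(10, FOO) + 20")
default_spec, default_expr = split_numeric_block("FOO + 1")
```

With the reverse order (EXPR, FMTSPEC) the parser would instead have to find the last top-level comma, which is harder once function-call arguments are allowed.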
> >> >>>> >> >
> >> >>>> >> >
> >> >>>> >> > Alex
> >> >>>> >> >
> >> >>>> >> >
> >> >>>> >> >
> >> >>>> >> >>
> >> >>>> >> >> Best regards,
> >> >>>> >> >>
> >> >>>> >> >> Thomas
> >> >>>> >> >>
> >> >>>> >> >> On Tue, 17 Jul 2018 at 21:59, Alexander Richardson
> >> >>>> >> >> <[hidden email]> wrote:
> >> >>>> >> >> >
> >> >>>> >> >> >
> >> >>>> >> >> >
> >> >>>> >> >> > On Tue, 17 Jul 2018 at 10:02 Thomas Preudhomme via llvm-dev <[hidden email]> wrote:
> >> >>>> >> >> >>
> >> >>>> >> >> >> To be clear, I do not intend to add support for hex specifier in the
> >> >>>> >> >> >> current patch, I just want to make sure the syntax we choose is going
> >> >>>> >> >> >> to allow it later. My immediate use case is decimal integer and I
> >> >>>> >> >> >> intend to write the code so that it's easy to extend to more types of
> >> >>>> >> >> >> numeric variables and expressions later. This way we'll only add
> >> >>>> >> >> >> specifiers that are actually required by actual testcases.
> >> >>>> >> >> >>
> >> >>>> >> >> >
> >> >>>> >> >> > I also added FileCheck expressions to our fork of LLVM in order to allow testing both a 128-bit and a 256-bit version of our CHERI ISA in a single test case [1].
> >> >>>> >> >> > I used [[@EXPR foo * 2 + 1]] for FileCheck expressions [2]. I'm not particularly happy with this syntax since it is quite verbose (but then again we don't need it that often so it doesn't really matter). It also doesn't allow saving the expression result so it needs to be repeated everywhere. I could probably use [[@EXPR:OUTVAR INVAR + 42]] but I haven't really had the need for that yet.
> >> >>>> >> >> >
> >> >>>> >> >> > We currently need the following two features:
> >> >>>> >> >> >
> >> >>>> >> >> > - Simple arithmetic with multiple operations. Example:
> >> >>>> >> >> > `cld $gp, $zero, [[@EXPR 2 * $CAP_SIZE - 8]]($c11)`
> >> >>>> >> >> >
> >> >>>> >> >> > - Conversion to hex (upper and lower case since not all tools are consistent here) and to decimal.
> >> >>>> >> >> > Example: // READOBJ-NEXT: 0x50 R_MIPS_64/R_MIPS_NONE/R_MIPS_NONE .data 0x[[@EXPR hex($CAP_SIZE * 2)]]
> >> >>>> >> >> >
> >> >>>> >> >> > Alex
> >> >>>> >> >> >
> >> >>>> >> >> > [1] For most test cases the simple -DVAR=value flag in FileCheck is good enough: we have a %cheri_FileCheck lit substitution that expands to `FileCheck '-D$CAP_SIZE=16/32'` . This works for most IR level tests since usually the only thing that is different is "align 16" vs "align 32". However, when checking the assembly output or linker addresses we often need something more complex.
> >> >>>> >> >> >
> >> >>>> >> >> > [2] A test case showing all the currently supported expressions can be found here: <https://github.com/CTSRD-CHERI/llvm/blob/master/test/FileCheck/expressions.txt>
> >> >>>> >> >> >
> >> >>>> >> >> >
> >> >>>> >> >> >>
> >> >>>> >> >> >> On Mon, 16 Jul 2018 at 18:39, <[hidden email]> wrote:
> >> >>>> >> >> >> >
> >> >>>> >> >> >> >
> >> >>>> >> >> >> >
> >> >>>> >> >> >> > > -----Original Message-----
> >> >>>> >> >> >> > > From: llvm-dev [mailto:[hidden email]] On Behalf Of
> >> >>>> >> >> >> > > Thomas Preudhomme via llvm-dev
> >> >>>> >> >> >> > > Sent: Monday, July 16, 2018 6:24 AM
> >> >>>> >> >> >> > > To: [hidden email]
> >> >>>> >> >> >> > > Cc: [hidden email]
> >> >>>> >> >> >> > > Subject: Re: [llvm-dev] Syntax for FileCheck numeric variables and
> >> >>>> >> >> >> > > expressions
> >> >>>> >> >> >> > >
> >> >>>> >> >> >> > > Hi James,
> >> >>>> >> >> >> > >
> >> >>>> >> >> >> > > I like that suggestion very much but I think keeping the order of the
> >> >>>> >> >> >> > > two sides as initially proposed makes more sense. In printf/scanf the
> >> >>>> >> >> >> > > string is first because the primary use of these functions is to do
> >> >>>> >> >> >> > > I/O and so you first specify what you are going to output/input and
> >> >>>> >> >> >> > > then where to capture variables.
> >> >>>> >> >> >> > > The primary objective of FileCheck
> >> >>>> >> >> >> > > variables and expressions is to capture/print them, the specifier is
> >> >>>> >> >> >> > > an addon to allow some conversion. Does it make sense?
> >> >>>> >> >> >> >
> >> >>>> >> >> >> > My immediate reaction is that I'd rather not have FileCheck get into
> >> >>>> >> >> >> > the business of handling printf specifiers. OTOH, while LLVM tools
> >> >>>> >> >> >> > do typically print lowercase hex, that's not guaranteed, and looking
> >> >>>> >> >> >> > at the output of other tools can be useful too. So, a way to specify
> >> >>>> >> >> >> > the case for a hex conversion seems worthwhile.
> >> >>>> >> >> >> >
> >> >>>> >> >> >> > I had also been thinking in terms of the trailing colon to distinguish
> >> >>>> >> >> >> > definition from use, as James suggested, that's sort-of consistent
> >> >>>> >> >> >> > with the current syntax.
> >> >>>> >> >> >> >
> >> >>>> >> >> >> > This is starting to make parsing the insides of [[]] much more involved,
> >> >>>> >> >> >> > so you'll want to pay attention to making that code well-structured and
> >> >>>> >> >> >> > readable.
> >> >>>> >> >> >> > --paulr
> >> >>>> >> >> >> >
> >> >>>> >> >> >> > >
> >> >>>> >> >> >> > > In the interest of speeding things up I plan to start implementing
> >> >>>> >> >> >> > > this proposal starting tomorrow unless someone gives some more
> >> >>>> >> >> >> > > feedback.
> >> >>>> >> >> >> > >
> >> >>>> >> >> >> > > Best regards,
> >> >>>> >> >> >> > >
> >> >>>> >> >> >> > > Thomas
> >> >>>> >> >> >> > >
> >> >>>> >> >> >> > > On Fri, 13 Jul 2018 at 15:51, James Henderson
> >> >>>> >> >> >> > > <[hidden email]> wrote:
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > Hi Thomas,
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > In general, I think this is a good proposal.
> >> >>>> >> >> >> > > > However, I don't think that using ">" or "<" to specify base (at least alone) is a good idea, as it might clash with future ideas to do comparisons etc. I also think it would be nice to have the syntax consistent between definition and use. My first thought on a reasonable alternative was to use commas to separate the two parts, so something like:
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > [[# VAR, 16:]] to capture a hexadecimal number (where the spaces are optional). [[# VAR, 16]] to use a variable, converted to a hexadecimal string. In both cases, the base component is optional, and defaults to decimal.
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > This led me to think that it might be better to use something similar to printf style for the latter half, so to capture a hexadecimal number with a leading "0x" would be: "0x[[# VAR, %x:]]" and to use it would be "0x[[# VAR, %x]]". Indeed, that would allow straightforward conversions between formats, so say you defined it by capturing a decimal integer and using it to match a hexadecimal in upper case, with leading 0x and 8 digits following the 0x:
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > CHECK: [[# VAR, %d:]] # Defines
> >> >>>> >> >> >> > > > CHECK: 0x[[# VAR + 1, %8X]] # Uses
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > Of course, if we go down that route, it would probably make more sense to reverse the two sides (e.g. to become "[[# %d, VAR:]]" to capture a decimal and "[[# %8X, VAR + 1]]" to use it).
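To make the decimal-to-hex conversion in the quoted CHECK lines concrete, here is a small Python sketch of the arithmetic only. It is not FileCheck code; `expected_hex` is a hypothetical name, and it assumes "%8X" is meant as eight zero-padded upper-case hex digits, as the quoted text describes (in C printf, "%8X" would pad with spaces and "%08X" with zeros).

```python
import re


def expected_hex(captured: str, offset: int) -> str:
    """Return the text a use such as 0x[[# VAR + offset, %8X]] would have to match.

    Hypothetical helper: 'captured' is the decimal string matched by the
    defining [[# VAR, %d:]] block.
    """
    value = int(captured, 10)  # the definition captured a decimal number
    # Assumption: 8 upper-case hex digits, zero padded, after a literal "0x".
    return "0x" + format(value + offset, "08X")


# Simulate the defining line capturing "41", then build the expected use text.
definition = re.search(r"\d+", "size: 41")
use_text = expected_hex(definition.group(0), 1)
```

Here `use_text` is the literal string the second CHECK line would then look for in the tool output.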
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > Regards,
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > James
> >> >>>> >> >> >> > > >
> >> >>>> >> >> >> > > > On 12 July 2018 at 15:34, Thomas Preudhomme via llvm-dev <llvm-[hidden email]> wrote:
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> Hi all,
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> I've written a patch to extend FileCheck to support matching
> >> >>>> >> >> >> > > >> arithmetic expressions involving variable [1] (eg. to match REG+1
> >> >>>> >> >> >> > > >> where REG is a variable with a numeric value). It was suggested to me
> >> >>>> >> >> >> > > >> in the review to introduce the concept of numeric variable and to
> >> >>>> >> >> >> > > >> allow for specifying the base the values are written in.
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> [1] https://reviews.llvm.org/D49084
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> I think the syntax should satisfy the below requirements:
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> * based off the [[]] construct since anything else might overload an
> >> >>>> >> >> >> > > >> existing valid syntax (eg. $$ is supposed to match literally now)
> >> >>>> >> >> >> > > >> * consistent with syntax for expressions using @LINE
> >> >>>> >> >> >> > > >> * consistent with using ':' to define regular variable
> >> >>>> >> >> >> > > >> * allows to specify base of the number a numeric variable is being set to
> >> >>>> >> >> >> > > >> * allows to specify base of the result of the numeric expression
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> I've come up with the following syntax for which I'd like feedback:
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> Numeric variable definition: [[#X<base:]] (eg.
> >> >>>> >> >> >> > > >> [[#ADDR<16:]]) where X
> >> >>>> >> >> >> > > >> is the numeric variable being defined and <base is optional in which
> >> >>>> >> >> >> > > >> case base defaults to 10
> >> >>>> >> >> >> > > >> Numeric variable use: [[#X>base]] (eg. [[#ADDR>2]]) where >base is
> >> >>>> >> >> >> > > >> optional in which case base defaults to 10
> >> >>>> >> >> >> > > >> Numeric expression: [[exp>base]] (eg. [[#ADDR+2>16]]) where expression
> >> >>>> >> >> >> > > >> must contain at least one numeric variable
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> I'm not a big fan of the > for the output base being inside the
> >> >>>> >> >> >> > > >> expression but [[exp]]>base would match >base literally.
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> Any suggestions / opinions?
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> Best regards,
> >> >>>> >> >> >> > > >>
> >> >>>> >> >> >> > > >> Thomas