[llvm-dev] MASM & RIP-relative addressing

Eric Astor via llvm-dev
Hi all,

I'm continuing work on llvm-ml (a MASM assembler), and my latest obstacle is implementing MASM's convention that, unless otherwise specified, all memory references are RIP-relative. Without it, we emit the wrong instructions for "call", "jmp", etc., and anything we build fails at the linking stage.

My best attempt at this so far is a small patch to X86AsmParser.cpp - just taking any Intel expression with no specified base register and switching it to use RIP - and this works alright. There's at least one exception: it breaks the "jcc" instructions, at least "jcc <label>". The issue seems to be that the "jcc" family exclusively takes a relative offset, never an absolute reference... so adding a base register causes the operand not to match. ("jcc" is always RIP-relative anyway.)

I'm not very familiar with the operand-matching logic, and am still pretty new to LLVM as a whole. Are there more X86 instructions this will interact badly with? Any thoughts on how this could be handled better?

If this is mostly a valid approach, might there be a way to change the operand type of "jcc" to accept offset(base) operands, as long as base == X86::RIP, then ignore the RIP bit?

Thanks,
- Eric

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] MASM & RIP-relative addressing

Eli Friedman via llvm-dev

All immediate jump instructions on x86 (call/jmp/jcc) have a relative offset operand.  The destination is, in some sense, “rip-relative”, but we don’t represent it like that in LLVM.  If you look at the TableGen descriptions, jumps use brtarget32, and calls use i32imm_pcrel.  In both Microsoft and GNU assembly syntax, this is something like “call baz”.
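To illustrate the relative-offset operand, here is a minimal sketch (not taken from LLVM) of how the displacement in a direct call/jmp/jcc is derived: it is the target address minus the address of the next instruction. The function name `rel32`, its parameters, and the addresses are made up for this example; the instruction lengths used (5 bytes for a call with a 32-bit displacement, 6 bytes for a jcc with a 32-bit displacement) are the usual encodings.

```cpp
#include <cassert>
#include <cstdint>

// Displacement stored in a direct near call/jmp/jcc: the target address
// minus the address of the instruction that follows the branch.
// `insnAddr`, `insnLen` and `target` are illustrative parameters.
constexpr std::int32_t rel32(std::uint64_t insnAddr, unsigned insnLen,
                             std::uint64_t target) {
  return static_cast<std::int32_t>(
      static_cast<std::int64_t>(target) -
      static_cast<std::int64_t>(insnAddr + insnLen));
}

// Compile-time checks with made-up addresses.
static_assert(rel32(0x1000, 5, 0x1010) == 0x0B, "forward call");
static_assert(rel32(0x1000, 6, 0x0FF0) == -0x16, "backward jcc");
```

Because the operand is already an offset from the next instruction, there is no separate base register to attach to it.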

 

“call”/“jmp” also have a register/memory form, for indirect calls. In 64-bit, this allows rip-relative references, to call a function pointer stored in a global variable. In Microsoft assembly syntax, this is “call QWORD PTR baz”. In GNU assembly syntax, this is “call *baz(%rip)”.

For 64-bit x86, any reference to a global has to be a rip-relative address (since all 64-bit programs are position-independent), but on 32-bit x86, it’s also possible to refer to the address of a variable using something like “add eax, OFFSET baz”.

For globals which are explicitly labeled “PTR” or “OFFSET”, the correct representation should be unambiguous, and it should be easy to print appropriate error messages. For other cases, I’m not sure what the inference rules are. It might vary depending on the opcode.

-Eli

From: llvm-dev <[hidden email]> On Behalf Of Eric Astor via llvm-dev
Sent: Monday, January 20, 2020 6:26 PM
To: LLVM-dev <[hidden email]>
Subject: [EXT] [llvm-dev] MASM & RIP-relative addressing

 

Re: [llvm-dev] MASM & RIP-relative addressing

Eric Astor via llvm-dev
Apologies - I apparently remembered part of the issue incorrectly, so this ended up quite confusing. The problem comes when referencing labels in a different section of the binary. To clarify, if I assemble the code:

.data
foo BYTE 5
.code
mov eax, foo

with Microsoft's ml64.exe, it emits an object file disassembling to:

       0:       8b 05 00 00 00 00       mov     eax, dword ptr [rip]
                000000000000000b:  IMAGE_REL_AMD64_REL32        foo

On the other hand, if I use my current local draft of llvm-ml, I get a different result. I actually get the same result as I do for llvm-mc, using the corresponding code:

.data
foo:
.byte 5
.text
.intel_syntax
mov eax, foo

Either way, LLVM emits an object file with disassembly (and relocation) as follows:

       0:       8b 04 25 00 00 00 00    mov     eax, dword ptr [0]
                0000000000000003:  IMAGE_REL_AMD64_ADDR32       foo


To replicate the results from ml64.exe with LLVM, I instead need to use

mov eax, [foo + rip]

in place of mov eax, foo. At least when building with llvm-ml, we need to mimic ml64.exe's approach; a reference to a symbol in another section should use the relative addressing mode.

My first attempt to fix this was very clumsy - when in MASM mode, I forced all expressions without a base register to presume RIP. Unfortunately, that breaks any attempt to use "jcc", since it turns label references into memory references with a base register (and the "jcc" family only accepts a plain relative target, not a memory operand with a base register). Any suggestions for how I can fix the issue described here without breaking "jcc"?

On Tue, Jan 21, 2020 at 3:43 PM Eli Friedman <[hidden email]> wrote:

Re: [llvm-dev] MASM & RIP-relative addressing

Eric Astor via llvm-dev
Clarifying a minor copy/paste error, ml64.exe actually outputs:

       0:       8b 05 00 00 00 00       mov     eax, dword ptr [rip]
                0000000000000002:  IMAGE_REL_AMD64_REL32        foo

In other words, the relocation info is the same... but the instruction uses RIP-relative addressing, not absolute.
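For readers comparing the two byte sequences, the difference sits in the ModRM byte. The sketch below is a deliberately simplified decoder, assuming 64-bit mode, a one-byte opcode and no prefixes; the names `classify`, `AddrMode` and `Decoded` are invented for this example. It classifies `8b 05 ...` as RIP-relative and `8b 04 25 ...` as absolute disp32, and reports where the 32-bit displacement starts, which lines up with the relocation offsets 0x2 and 0x3 in the listings.

```cpp
#include <cassert>
#include <cstdint>

enum class AddrMode { RipRelative, AbsoluteDisp32, Other };

struct Decoded {
  AddrMode mode;
  unsigned dispOffset; // byte offset of the disp32 field in the instruction
};

// Classify the memory operand of a one-byte-opcode instruction in 64-bit
// mode (no prefixes), e.g. 8b /r = mov r32, r/m32. Simplified: only the
// two "disp32 without a general-purpose base register" forms are recognised.
inline Decoded classify(const std::uint8_t *bytes) {
  assert(bytes != nullptr);
  const std::uint8_t modrm = bytes[1];
  const unsigned mod = modrm >> 6;
  const unsigned rm = modrm & 7;
  if (mod == 0 && rm == 5)
    return {AddrMode::RipRelative, 2}; // opcode, ModRM, then disp32
  if (mod == 0 && rm == 4) {           // a SIB byte follows ModRM
    const std::uint8_t sib = bytes[2];
    const unsigned index = (sib >> 3) & 7;
    const unsigned base = sib & 7;
    if (index == 4 && base == 5)       // no index, no base: plain disp32
      return {AddrMode::AbsoluteDisp32, 3};
  }
  return {AddrMode::Other, 0};
}
```

Passing the six bytes from the ml64.exe listing yields `RipRelative` with the displacement at offset 2; the seven bytes from the LLVM listing yield `AbsoluteDisp32` with the displacement at offset 3.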

On Tue, Jan 21, 2020 at 5:41 PM Eric Astor <[hidden email]> wrote:

Re: [llvm-dev] MASM & RIP-relative addressing

Eli Friedman via llvm-dev

Are you asking what the parsing rules are, or how you should modify the LLVM code to achieve that result?

If the latter, you haven’t really given enough detail here. What code, exactly, have you tried modifying? Do you have any ideas for how it could work?

-Eli

From: Eric Astor <[hidden email]>
Sent: Tuesday, January 21, 2020 2:44 PM
To: Eli Friedman <[hidden email]>
Cc: llvm-dev <[hidden email]>
Subject: [EXT] Re: [llvm-dev] MASM & RIP-relative addressing

 

Re: [llvm-dev] MASM & RIP-relative addressing

Eric Astor via llvm-dev
I'd been meaning to ask if anyone had any ideas for the LLVM changes. On the other hand, while I was continuing to try to figure out the rules for MASM's implicit RIP-relative addressing logic, I stumbled on (what seems to be) the right thing to modify in LLVM to make this work.

For the record: the issue is that when referencing a memory address, if the instruction can take a relative address and the operand has no base register specified, x64 MASM assumes it should be RIP-relative. My previous attempt to replicate this in LLVM essentially added X86::RIP as the BaseReg for any X86Operand generated from an IntelExpr that didn't already have one... but some operands are used in contexts (such as "jcc" instructions) that don't accept operands with a BaseReg.

My new approach is to fall back to a "DefaultBaseReg" on the X86Operand (set to X86::RIP for operands parsed from x64 MASM, and otherwise to X86::NoRegister) whenever the operand is rendered via addMemOperands... while leaving the BaseReg itself unmodified. This means that instructions (like jcc) that take only AbsMem operands can match the operand without interference, and it will be rendered correctly in the final output, while instructions that take relative addresses will default to RIP-relative addressing when possible.
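The idea can be sketched with a toy model. This is not the actual X86Operand code: the struct `MemOperand`, the helper `parseBareSymbol`, the accessor `renderedBaseReg` and the three-value register enum are all simplified stand-ins invented for illustration. What it shows is that matching consults only BaseReg, while rendering falls back to DefaultBaseReg.

```cpp
#include <cassert>

// Toy stand-ins for LLVM's register enum; not the real X86:: values.
enum Reg { NoRegister, RIP, RAX };

// Simplified model of a parsed memory operand. The real X86Operand carries
// much more state; only the two fields discussed above are modelled.
struct MemOperand {
  Reg BaseReg = NoRegister;        // base register written in the source
  Reg DefaultBaseReg = NoRegister; // fallback consulted only when rendering

  // Matching looks at BaseReg alone, so a bare label still qualifies as an
  // absolute-memory operand (the class that jcc-style instructions accept).
  bool isAbsMem() const { return BaseReg == NoRegister; }

  // Rendering as a general memory operand falls back to DefaultBaseReg.
  Reg renderedBaseReg() const {
    return BaseReg != NoRegister ? BaseReg : DefaultBaseReg;
  }
};

// Hypothetical helper: build the operand for a bare symbol reference.
inline MemOperand parseBareSymbol(bool isMasm64) {
  MemOperand Op;
  Op.DefaultBaseReg = isMasm64 ? RIP : NoRegister;
  return Op;
}

// Usage sketch: the same operand serves "jne label" and "mov eax, foo".
inline void demo() {
  const MemOperand Op = parseBareSymbol(/*isMasm64=*/true);
  assert(Op.isAbsMem());               // still matches a branch target
  assert(Op.renderedBaseReg() == RIP); // renders RIP-relative for mov
}
```

Outside x64 MASM mode the fallback stays at NoRegister, so existing behaviour is unchanged in this model.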

Best,
- Eric

On Tue, Jan 21, 2020 at 6:16 PM Eli Friedman <[hidden email]> wrote:

Re: [llvm-dev] MASM & RIP-relative addressing

Eli Friedman via llvm-dev

That makes sense: let matching figure out whether the chosen instruction treats the operand as a branch target or a memory operand, and translate the operand based on that.

-Eli

From: Eric Astor <[hidden email]>
Sent: Wednesday, January 22, 2020 10:44 AM
To: Eli Friedman <[hidden email]>
Cc: llvm-dev <[hidden email]>
Subject: [EXT] Re: [llvm-dev] MASM & RIP-relative addressing

 
