[llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
Dear All,

Currently I am trying to inject custom x86-64 assembly into a
function's entry basic block. More specifically, I am trying to build
assembly in a machine function pass from scratch.

While the dumped machine function instruction info shows that %gs
will be used, when I run objdump -d on my executable I see that
%gs has been replaced by %ebp. Why is this happening?

I know it probably has something to do with me not specifying operands
properly, but I cannot find enough documentation on this besides
looking through code comments such as X86BaseInfo.cpp. I feel there
isn't enough for me to be able to connect the dots.

Below I have sample code: %gs holds a base address to a memory
location where I am trying to store information. I am trying to update
the %gs register pointer location before saving more values, etc.

LLVM C++ MachineFunctionPass code:
MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
TII->get(X86::SUB32ri),X86::GS)
                    .addReg(X86::GS)
                    .addImm(0x8);

machine function pass dump:
 %gs = SUB32ri %gs, 8, implicit-def %eflags

Objdump -d assembly from executable
  400510:   81 ed 04 00 00 00       sub    $0x8,%ebp


TLDR: I am trying to create custom assembly via BuildMI() and manipulate segment
registers via a MachineFunctionPass.

I have looked at LLVM's SafeStack implementation, but it takes a
fairly complicated hybrid approach, combining an IR function pass with
backend support. I would like to stay within a single
MachineFunctionPass.

Believe me, I would do this at the IR level if I didn't need to
specifically use the segment registers.

Thanks for the help in advance!

Sincerely,

Christopher Jelesnianski
Graduate Research Assistant
Virginia Tech
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
The SUB32ri instruction can't operate on segment registers. It operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded, only 3 or 4 bits of the register value make it into the binary encoding. Objdump just extracts those 3 or 4 bits back out and prints whichever of the EAX/EBX/EDX/ECX/EBP registers those bits correspond to.

~Craig
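
A minimal sketch of the effect described above, written as standalone C++ rather than LLVM code. It assumes the standard x86 register numbering, in which %gs is segment register number 5 and %ebp is general-purpose register number 5; the table and function names are made up for illustration. SUB32ri corresponds to opcode 0x81 with /5 in the ModRM reg field, so the destination register can only go into the 3-bit r/m field (in 64-bit mode a REX prefix can supply a fourth bit). The byte 0xed from the objdump line decodes like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit general-purpose registers, indexed by their 3-bit encoding.
const std::array<const char *, 8> kGpr32 = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};

// Segment registers, indexed by their 3-bit encoding.
const std::array<const char *, 6> kSegment = {
    "es", "cs", "ss", "ds", "fs", "gs"};

// Low three bits of a ModRM byte: the r/m field.
unsigned rmField(std::uint8_t modrm) { return modrm & 0x7u; }

// Name a disassembler would print for the r/m field of opcode 0x81.
// That opcode only ever names a general-purpose register here.
std::string decodeGroup1Rm(std::uint8_t modrm) {
  assert((modrm >> 6) == 3 && "only the register-direct form is modelled");
  return kGpr32[rmField(modrm)];
}

// Encoding number of a segment register, or -1 if the name is unknown.
int segmentEncoding(const std::string &name) {
  for (std::size_t i = 0; i < kSegment.size(); ++i)
    if (name == kSegment[i])
      return static_cast<int>(i);
  return -1;
}
```

Here `decodeGroup1Rm(0xED)` yields "ebp" and `segmentEncoding("gs")` yields 5: the same number, looked up in a different table.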


On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev <[hidden email]> wrote:

Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
More specifically, there is no instruction that can add to or subtract from a segment register. They can only be updated by the mov segment register instructions, opcodes 0x8c and 0x8e in x86 assembly (0x8e writes a segment register, 0x8c reads one).

I suggest you write the text version of the assembly you want to generate and assemble it with llvm-mc. This will tell you whether it's even valid. After that you can use -show-inst to print the names of the X86 instructions, which you can then give to BuildMI.

~Craig
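
As an illustration of those two opcodes, here is a small standalone C++ sketch (not LLVM code; all names are hypothetical) that builds the ModRM byte for a register-direct move between a segment register and a general-purpose register. It assumes the usual x86 layout, in which the segment register number sits in the ModRM reg field for both 0x8c and 0x8e.

```cpp
#include <cassert>
#include <cstdint>

// Segment register numbers as used in the ModRM reg field.
enum SegReg : std::uint8_t { ES = 0, CS = 1, SS = 2, DS = 3, FS = 4, GS = 5 };

// Pack the three ModRM fields into one byte: mod (2 bits), reg and r/m (3 bits each).
std::uint8_t modRM(std::uint8_t mod, std::uint8_t reg, std::uint8_t rm) {
  assert(mod < 4 && reg < 8 && rm < 8);
  return static_cast<std::uint8_t>((mod << 6) | (reg << 3) | rm);
}

// ModRM byte for "mov between segment register `seg` and the
// general-purpose register numbered `gpr`", register-direct form (mod == 3).
std::uint8_t segMovModRM(SegReg seg, std::uint8_t gpr) {
  return modRM(3, seg, gpr);
}
```

Under these assumptions `segMovModRM(GS, 0)` is 0xe8, so, ignoring any prefixes, the bytes 8e e8 would write general-purpose register 0 into %gs and 8c e8 would read %gs back out.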


On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <[hidden email]> wrote:

Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
Dear Craig,

Thank you very much for the quick reply! Yes, I'm still new to working
on the back-end, and that sounds great. I already have the raw assembly
for what I want to accomplish, so this is perfect. I just tried it and
yes, I will have to break my assembly down further into simpler
operations. You're right about my assembly dealing with
segment registers, as I'm getting the following error:
"error: unknown use of instruction mnemonic without a size suffix"

Just curious, what does it mean by size suffix?

It's super cool to see the equivalent with "-show-inst"! Thank you
so much for this help!

Last note: I know that the definitions (e.g. def SUB32ri) of the
various instructions can be found in the various *.td files, but is
there documentation or a quick reference giving the meaning of every
X86::XXXXXX LLVM instruction name, so that I can quickly pick
the one I need and "work forwards" rather
than working backwards by writing the assembly first and using llvm-mc
-show-inst?

Thanks super much again.

Sincerely,

Chris Jelesnianski
Graduate Research Assistant
Virginia Tech

On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <[hidden email]> wrote:


Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
The size suffix thing is a weird quirk in our assembler that I should look into fixing. Instructions in AT&T syntax usually have a size suffix, which is often optional.

For example:
  add %ax, %bx
and
  addw %ax, %bx

are equivalent because the register name indicates the size.

But for an instruction like this
  addw $1, (%rax)

there is nothing to infer the size from, so an explicit suffix is required.

So for an instruction like "add %ax, %bx" from above, we try to guess the size suffix from the register. In your case, you used a segment register, which we couldn't guess the size from, and then we printed a bad error message.
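
The guessing step can be pictured with a toy lookup like the one below. This is only a sketch in standalone C++, not how llvm-mc actually implements it; the function names and the tiny register table are assumptions made for illustration. A register that is missing from the table (such as a segment register) gives no suffix, which is the situation that produced the error.

```cpp
#include <cassert>
#include <map>
#include <string>

// Size suffix implied by a register name, or '\0' if none can be inferred.
// The table is deliberately tiny; a real assembler knows every register.
char suffixFor(const std::string &reg) {
  assert(!reg.empty() && reg[0] == '%');
  static const std::map<std::string, char> kSuffix = {
      {"%al", 'b'}, {"%ax", 'w'}, {"%bx", 'w'}, {"%eax", 'l'}, {"%rax", 'q'}};
  const auto it = kSuffix.find(reg);
  return it == kSuffix.end() ? '\0' : it->second;
}

// Mnemonic with the inferred suffix appended, or an empty string when the
// register gives no size information and an explicit suffix is needed.
std::string withSuffix(const std::string &mnemonic, const std::string &reg) {
  const char s = suffixFor(reg);
  return s == '\0' ? std::string() : mnemonic + s;
}
```

With this table, `withSuffix("add", "%ax")` gives "addw", while `withSuffix("add", "%gs")` comes back empty.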

There's no quick reference as such for the meaning of the various X86::XXXXXX names, but the complete list of them is in lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are meant to be fairly straightforward to understand. The first part of the name should almost always be the instruction name from the Intel/AMD manuals.

The lower case letters at the end sort of convey operand types, but often not the number of operands, even though it looks that way. The most common letters are 'r' for register, 'm' for memory and 'i' for immediate. Numbers after 'i' specify the size of the immediate if it's important to distinguish it from other sizes or if it differs from the size of the instruction. The lower case letters are most useful for distinguishing different instructions from each other. For example, if two instructions differ only in the lower case letters and one says "rr" and one says "rm", the first is the register form and the second is the memory form of the same instruction.

~Craig
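
The naming convention described above can be sketched as a small standalone C++ helper. This is a heuristic written for illustration only (the type and function names are hypothetical, and it is not part of LLVM): it splits a name at the first lower case letter and maps the three most common letters to operand kinds. As noted above, the number of letters should not be read as the number of operands.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// An opcode name split into its upper case head (mnemonic plus size)
// and its lower case operand-form suffix.
struct OpcodeName {
  std::string head;
  std::string form;
};

// Split at the first lower case letter, e.g. "SUB32ri" -> {"SUB32", "ri"}.
OpcodeName splitOpcodeName(const std::string &name) {
  assert(!name.empty());
  std::size_t i = 0;
  while (i < name.size() && !std::islower(static_cast<unsigned char>(name[i])))
    ++i;
  return {name.substr(0, i), name.substr(i)};
}

// Translate the common form letters; digits (immediate sizes) and any
// other letters are skipped.
std::vector<std::string> describeForm(const std::string &form) {
  std::vector<std::string> kinds;
  for (char c : form) {
    if (c == 'r')
      kinds.push_back("register");
    else if (c == 'm')
      kinds.push_back("memory");
    else if (c == 'i')
      kinds.push_back("immediate");
  }
  return kinds;
}
```

For instance, `splitOpcodeName("MOV64rm")` gives the head "MOV64" and the form "rm", which `describeForm` turns into "register" followed by "memory".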


On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <[hidden email]> wrote:

Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
Dear Craig,

Thanks for the help so far. I have rewritten my assembly to account
for user-land code not being able to modify the segment registers
%gs/%fs directly. I used llvm-mc with -show-inst to get the equivalent
LLVM instructions and operands. Now I am working backwards to code
this assembly into my MachineFunctionPass. I got the easy assembly
implemented, but I am still struggling with the more complicated asm:
I either still see 0x0(%rbp) instead of (%gs), or I get errors.
The core question is: how do I properly create BuildMI statements
for assembly that deals with offsets?
-------------------------------------------------------------------------------------------------
Assembly I want to translate:
mov   (%gs), %r14                  //get value off %GS base address
mov %r15, %gs:0x0(%r14)     //put value in R15 into R14:(%GS)  [ (%GS) + R14 ]
--------------------------------------------------------------------------------------------------
llvm-mc -show-inst gives:
movq    (%gs), %r14          # <MCInst #1810 MOV64rm
                                        #  <MCOperand Reg:117>
                                        #  <MCOperand Reg:33>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:0>
                                        #  <MCOperand Reg:0>>
movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
                                        #  <MCOperand Reg:117>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:0>
                                        #  <MCOperand Reg:33>
                                        #  <MCOperand Reg:118>>
-------------------------------------------------------------------------------------------------------
I'll be honest and say I don't really know how to add the operands
properly to BuildMI. I have figured out the following so far:
MachineInstrBuilder thing = BuildMI(MachineBB, position in MBB,
DebugLoc (not sure what this accomplishes), TII->get(X86 instruction I
want), where the instruction result goes)

this has .add(MachineOperand)
            .addReg(X86::a reg macro)
            .addImm(a constant like 0x8)
            and a few more that I don't think apply to me.

But I am not sure whether I must follow a specific order. I am assuming
yes, and that it has something to do with the X86InstrInfo.td
definitions, but I am not sure.
--------------------------------------------------------------------------------------------------------
LLVM C++ code I tried to translate this to:
/* 1 mov   (%gs), %r14 */
    MachineInstrBuilder e1 =
BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
       .addReg(X86::GS);
/* 2 mov %r15, %gs:0x0(%r14) */
    MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
    MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
    MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
    MachineOperand disp = MachineOperand::CreateImm(0x0);

    BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
      .add(baseReg)
      .add(scaleAmt)
      .add(indexReg);

/* both instructions give the following error

clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::operator[](llvm::SmallVectorTemplateCommon<T,
<template-parameter-1-2> >::size_type) const [with T =
llvm::MCOperand; <template-parameter-1-2> = void;
llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::const_reference = const llvm::MCOperand&;
llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::size_type = long unsigned int]: Assertion `idx < size()' failed.

I saw this function in the code base, but I am not sure what it does:
"addDirectMem(MachineInstructionBuilder_thing, register you want to
use);"


This should be the last bit of information I need to finish up
this implementation. Thanks again for your help!

Sincerely,

Chris Jelesnianski

On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <[hidden email]> wrote:


Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
This shouldn't have parsed.

movq    (%gs), %r14

That's trying to use %gs as a base register, which isn't valid. The GNU assembler rejects it, and coincidentally llvm-mc started rejecting it on trunk late last week. That's probably why it printed as %ebp.

I don't know if there is an instruction to read the base of %gs directly. Maybe rdgsbase, but that's only available on Ivy Bridge and later CPUs. But using %gs as part of the memory address for any other instruction is automatically relative to the base of %gs.

 
~Craig
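
To connect this with the operand question, here is a standalone C++ sketch (a toy model, not the LLVM API) of the operand layout that the -show-inst dump earlier in the thread lists for `movq %r15, %gs:(%r14)`. The `Operand` type and the helper names are hypothetical; the register numbers 33, 117 and 118 are simply copied from that dump. The point is that one memory reference occupies five consecutive operands (base, scale, index, displacement, segment), and that %gs belongs in the segment slot rather than the base slot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an MCOperand: a register or an immediate.
struct Operand {
  enum Kind { Reg, Imm } kind;
  std::int64_t value;
};

// Register numbers as printed by the -show-inst dump; 0 means "no register".
constexpr std::int64_t NoReg = 0, GS = 33, R14 = 117, R15 = 118;

// The five operands of one x86 memory reference, in dump order.
std::vector<Operand> memRef(std::int64_t base, std::int64_t scale,
                            std::int64_t index, std::int64_t disp,
                            std::int64_t segment) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return {{Operand::Reg, base},
          {Operand::Imm, scale},
          {Operand::Reg, index},
          {Operand::Imm, disp},
          {Operand::Reg, segment}};
}

// Operand list for the store: movq %r15, %gs:(%r14)
std::vector<Operand> buildStore() {
  std::vector<Operand> ops = memRef(R14, 1, NoReg, 0, GS);
  ops.push_back({Operand::Reg, R15}); // the value being stored comes last
  return ops;
}
```

`buildStore()` produces six operands, matching the six MCOperand lines in the dump. An instruction built with fewer operands than its definition expects is a plausible cause of the `idx < size()` assertion, since the encoder would then index past the end of the operand list. In a MachineFunctionPass the corresponding chain would presumably be `.addReg(X86::R14).addImm(1).addReg(0).addImm(0).addReg(X86::GS).addReg(X86::R15)`, though that is untested here.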


On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <[hidden email]> wrote:
Dear Craig,

Thanks for the help so far. I have rewritten my assembly to comply
with user-land not being able to directly modify the segment registers
%GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM
instruction + operands. Now I am working backwards to actually code
this assembly into my MachineFunctionPass and got the easy assembly
implemented, however my more complicated asm is still struggling as I
am still seeing 0x0(%rbp) instead of (%gs) or errors.
Core question here being: how do I properly create BuildMI statements
for assembly dealing with offsets?
-------------------------------------------------------------------------------------------------
Assembly I want to translate:
mov   (%gs), %r14                  //get value off %GS base addresss
mov %r15, %gs:0x0(%r14)     //put value in R15 into R14:(%GS)  [ (%GS) + R14 ]
--------------------------------------------------------------------------------------------------
LLVM-MC -show inst gives:
movq    (%gs), %r14          # <MCInst #1810 MOV64rm
                                        #  <MCOperand Reg:117>
                                        #  <MCOperand Reg:33>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:0>
                                        #  <MCOperand Reg:0>>
movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
                                        #  <MCOperand Reg:117>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:0>
                                        #  <MCOperand Reg:33>
                                        #  <MCOperand Reg:118>>
-------------------------------------------------------------------------------------------------------
I'll be honest and say I don't really know how to add the operands
properly to BuildMI. I have figured out the following so far:
MachineInstrBuilder thing = BuildMI(MachineBB, position in MBB,
DebugLoc (not sure what this accomplishes), TII->get(X86 instruction I
want), where the instruction result goes)

This has .add(MachineOperand)
            .addReg(X86::a reg macro)
            .addImm(a constant like 0x8)
            and a few more that I don't think apply to me.

But I am not sure whether I must follow a specific order. I am assuming yes,
and that it has something to do with the X86InstrInfo.td definitions, but I
am not sure.
--------------------------------------------------------------------------------------------------------
LLVM C++ code I tried to translate this to:
/* 1 mov   (%gs), %r14 */
    MachineInstrBuilder e1 =
BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
       .addReg(X86::GS);
/* 2 mov %r15, %gs:0x0(%r14) */
    MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
    MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
    MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
    MachineOperand disp = MachineOperand::CreateImm(0x0);

    BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
      .add(baseReg)
      .add(scaleAmt)
      .add(indexReg);

Both instructions give the following error:

clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::operator[](llvm::SmallVectorTemplateCommon<T,
<template-parameter-1-2> >::size_type) const [with T =
llvm::MCOperand; <template-parameter-1-2> = void;
llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::const_reference = const llvm::MCOperand&;
llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>::size_type = long unsigned int]: Assertion `idx < size()' failed.

I saw this function in the code base but am not sure what it does:
"addDirectMem(MachineInstructionBuilder_thing, register you want to
use);"


This should be the last bit of information I need to finish up
this implementation. Thanks again for your help!

Sincerely,

Chris Jelesnianski
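
A note on operand counts: each llvm-mc dump above lists six operands, that is one register plus a five-part memory reference (base, scale, index, displacement, segment). The two BuildMI calls above supply only two and three operands, which is at least consistent with the `idx < size()` assertion, though the thread does not confirm that diagnosis. The sketch below is a toy model of that operand order in plain C++; `Operand`, `MemRef`, `buildLoad` and `buildStore` are hypothetical names and not LLVM classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an instruction operand; this is not llvm::MachineOperand.
struct Operand {
  bool isReg;
  std::string reg;   // register name; "" models "no register" (Reg:0 in the dump)
  std::int64_t imm;  // immediate value, meaningful when isReg is false
};

Operand reg(const std::string &name) { return {true, name, 0}; }
Operand imm(std::int64_t value) { return {false, "", value}; }

// The five parts of an x86 memory reference, in the order the dump prints them.
struct MemRef {
  std::string base;
  std::int64_t scale;
  std::string index;
  std::int64_t disp;
  std::string segment;
};

// Append all five parts; leaving any of them out shifts every later operand.
void appendMem(std::vector<Operand> &ops, const MemRef &m) {
  assert((m.scale == 1 || m.scale == 2 || m.scale == 4 || m.scale == 8) &&
         "x86 only encodes scales 1, 2, 4 and 8");
  ops.push_back(reg(m.base));
  ops.push_back(imm(m.scale));
  ops.push_back(reg(m.index));
  ops.push_back(imm(m.disp));
  ops.push_back(reg(m.segment));
}

// Load form (MOV64rm in the dump): destination first, then the memory reference.
std::vector<Operand> buildLoad(const std::string &dst, const MemRef &m) {
  std::vector<Operand> ops{reg(dst)};
  appendMem(ops, m);
  return ops;
}

// Store form (MOV64mr in the dump): memory reference first, then the source.
std::vector<Operand> buildStore(const MemRef &m, const std::string &src) {
  std::vector<Operand> ops;
  appendMem(ops, m);
  ops.push_back(reg(src));
  return ops;
}
```

For example, `buildStore(MemRef{"r14", 1, "", 0, "gs"}, "r15")` yields six operands with the segment in the fifth slot, matching the MOV64mr dump. With the real API, the equivalent would be a chain of `.addReg` and `.addImm` calls in the same order, with register 0 standing for "no register".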

On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <[hidden email]> wrote:
> The size suffix thing is a weird quirk in our assembler I should look into
> fixing. Instructions in AT&T syntax usually have a size suffix that is often
> optional.
>
> For example:
>   add %ax, %bx
> and
>   addw %ax, %bx
>
> Are equivalent because the register name indicates the size.
>
> but for an instruction like this
>   addw $1, (%ax)
>
> There is nothing to infer the size from so an explicit suffix is required.
>
> So for an instruction like "add %ax, %bx" from above, we try to guess the
> size suffix from the register. In your case, you used a segment register
> which we couldn't guess the size from. And then we printed a bad error
> message.
>
> There's no quick reference as such for the meaning of the various
> X86::XXXXXX names. But the complete list of them is in
> lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are meant
> to be fairly straightforward to understand. The first part of the name
> should almost always be the instruction name from the Intel/AMD manuals. The
> lower case letters at the end sort of convey operand types, but often not
> the number of operands even though it looks that way. The most common
> letters are 'r' for register, 'm' for memory and 'i' for immediate. Numbers
> after 'i' specify the size of the immediate if it's important to distinguish
> from other sizes or different than the size of the instruction. The lower
> case letters are most useful to distinguish different instructions from each
> other. So for example, if two instructions only differ in the lower case
> letters and one says "rr" and one says "rm", the first is the register form
> and the second is the memory form of the same instruction.
>
> ~Craig
>
>
> On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <[hidden email]> wrote:
>>
>> Dear Craig,
>>
>> Thank you super much for the quick reply! Yea I'm still new to working
>> on the back-end and that sounds great. I already have the raw assembly
>> of what I want to accomplish so this is perfect. I just tried it and
>> yea, I will have to break down my assembly even further into
>> simpler operations. You're right about my assembly dealing with
>> segment registers as I'm getting the following error:
>> "error: unknown use of instruction mnemonic without a size suffix"
>>
>> Just curious, what does it mean by size suffix??
>>
>> It's super cool to see the equivalent with "-show-inst"!!! Thank you
>> so much for this help!
>>
>> Last note, I know that the definitions (e.g. def SUB32ri) of the
>> various instructions can be found in the various ****.td, but is there
>> documentation where the meaning or quick reference of every
>> X86::XXXXXX llvm instruction macro can be found, so I can quickly pick
>> and choose which actual macro I need to use, to "work forwards" rather
>> than working backwards by writing the assembly first and using llvm-mc
>> -show-inst  ??
>>
>> Thanks super much again.
>>
>> Sincerely,
>>
>> Chris Jelesnianski
>> Graduate Research Assistant
>> Virginia Tech
>>
>> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <[hidden email]>
>> wrote:
>> > More specifically there is no instruction that can add/subtract segment
>> > registers. They can only be updated by the mov segment register
>> > instructions, opcodes 0x8c and 0x8e in x86 assembly.
>> >
>> > I suggest you write the text version of the assembly you want to
>> > generate
>> > and assemble it with llvm-mc. This will tell you if it's even valid.
>> > After
>> > that you can use -show-inst to print the names of the instructions that
>> > X86
>> > uses that you can give to BuildMI.
>> >
>> > ~Craig
>> >
>> >
>> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <[hidden email]>
>> > wrote:
>> >>
>> >> The SUB32ri instruction can't operate on segment registers. It
>> >> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded only 3 or 4
>> >> bits
>> >> of the register value make it into the binary encoding. Objdump just
>> >> extracts those 3 or 4 bits back out and prints one of the
>> >> EAX/EBX/EDX/ECX/EBP registers that those bits correspond to.
>> >>
>> >> ~Craig
>> >>
>> >>
>> >> On Sat, Jun 23, 2018 at 5:28 PM K Jelesnianski via llvm-dev
>> >> <[hidden email]> wrote:
>> >>>
>> >>> Dear All,
>> >>>
>> >>> Currently I am trying to inject custom x86-64 assembly into a
>> >>> function's entry basic block. More specifically, I am trying to build
>> >>> assembly in a machine function pass from scratch.
>> >>>
>> >>> While the dumped machine function instruction info displays that %gs
>> >>> will be used, when I perform objdump -d on my executable I see that
>> >>> %gs is replaced by %ebp. Why is this happening?
>> >>>
>> >>> I know it probably has something to do with me not specifying operands
>> >>> properly, but I cannot find enough documentation on this besides
>> >>> looking through code comments such as X86BaseInfo.cpp. I feel there
>> >>> isn't enough for me to be able to connect the dots.
>> >>>
>> >>> Below I have sample code: %gs holds a base address to a memory
>> >>> location where I am trying to store information. I am trying to update
>> >>> the %gs register pointer location before saving more values, etc.
>> >>>
>> >>> LLVM C++ MachineFunction pass code:
>> >>> MachineInstrBuilder sss = BuildMI(MBB, MBB.begin(), DL,
>> >>> TII->get(X86::SUB32ri),X86::GS)
>> >>>                     .addReg(X86::GS)
>> >>>                     .addImm(0x8);
>> >>>
>> >>> machine function pass dump:
>> >>>  %gs = SUB32ri %gs, 8, implicit-def %eflags
>> >>>
>> >>> Objdump -d assembly from executable
>> >>>   400510:   81 ed 04 00 00 00       sub    $0x8,%ebp
>> >>>
>> >>>
>> >>> TLDR: I am trying to create custom assembly via BuildMI() and
>> >>> manipulate
>> >>> segment
>> >>> registers via a MachineFunctionPass.
>> >>>
>> >>> I have looked at LLVM's safestack implementation, but they are taking a
>> >>> fairly complicated hybrid approach between an IR Function pass with
>> >>> Backend support. I would like to stay as a single MachineFunction
>> >>> pass.
>> >>>
>> >>> Believe me I would do this at the IR level if I didn't need to
>> >>> specifically use the segment registers.
>> >>>
>> >>> Thanks for the help in advance!
>> >>>
>> >>> Sincerely,
>> >>>
>> >>> Christopher Jelesnianski
>> >>> Graduate Research Assistant
>> >>> Virginia Tech
>> >>> _______________________________________________
>> >>> LLVM Developers mailing list
>> >>> [hidden email]
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
BTW: If you work at the MI level, I recommend using a debug build of LLVM and passing -verify-machineinstrs to llc; it should catch uses of registers that are not part of the instruction's register classes.

- Matthias
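
A note on what "not part of the instruction's register classes" means in practice: every register operand slot of an instruction accepts only members of one register class, and a segment register is not a member of the 32-bit GPR class that SUB32ri expects. The sketch below is only a toy illustration of that membership test in plain C++. It is not the machine verifier, the class contents are cut down to the eight legacy registers, and `GR32Model`, `SegmentRegModel` and `operandIsLegal` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy register classes. The real ones are generated from the X86 TableGen
// files and contain more registers than the eight legacy names listed here.
const std::set<std::string> GR32Model = {"eax", "ecx", "edx", "ebx",
                                         "esp", "ebp", "esi", "edi"};
const std::set<std::string> SegmentRegModel = {"es", "cs", "ss",
                                               "ds", "fs", "gs"};

// True when `reg` belongs to the class that an operand slot requires.
bool operandIsLegal(const std::set<std::string> &requiredClass,
                    const std::string &reg) {
  return requiredClass.count(reg) != 0;
}
```

Here `operandIsLegal(GR32Model, "gs")` is false while `operandIsLegal(GR32Model, "ebp")` is true. This is the kind of mismatch the verifier flag is meant to report before the instruction reaches the encoder.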


Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
Dear Craig,

Whoops, you're right. That's still what I theoretically want to do,
though. I replaced it with the following:

movq %gs:0x0, %r14   - this doesn't get any complaints from gnu-as or llvm-mc

I got the following from llvm-mc -show-inst:
movq    %gs:0, %r14             # <MCInst #1810 MOV64rm
                                        #  <MCOperand Reg:117>    // Destination
                                        #  <MCOperand Reg:0>      // Base Reg
                                        #  <MCOperand Imm:1>      // Scale
                                        #  <MCOperand Reg:0>      // Index Reg
                                        #  <MCOperand Imm:0>      // Displacement
                                        #  <MCOperand Reg:33>>    // Segment Reg

This looks better: 33 (%gs) is now in the Segment slot of the
MCOperands instead of the BaseReg slot, according to
http://llvm.org/doxygen/X86BaseInfo_8h_source.html

The only weird behavior I could not figure out was for an XOR instruction.
llvm-mc -show-inst:
xorq    %r15, %r15              # <MCInst #15401 XOR64rr
                                        #  <MCOperand Reg:118>
                                        #  <MCOperand Reg:118>
                                        #  <MCOperand Reg:118>>

I had to use .addDef instead of .addReg
My C++ code:
    BuildMI(MBB,MBB.end(),DL,TII->get(X86::XOR64rr),X86::R15)
      .addDef(X86::R15)
      .addReg(X86::R15);

I got the same error until I replaced the first instance of .addReg with
.addDef. Why do I need to use .addDef here?
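
A note on the three operands in the XOR64rr dump: this is a two-address instruction, so it carries one destination plus two sources, with the first source tied to (required to be the same register as) the destination. The sketch below models that rule in plain C++; `InstrDesc`, `XOR64rrModel` and `checkOperands` are hypothetical names, not LLVM's descriptor types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy descriptor for a two-address instruction; not LLVM's MCInstrDesc.
struct InstrDesc {
  std::string name;
  std::size_t numOperands;  // total explicit operands, the def included
  int tiedUse;              // index of the source tied to operand 0, or -1
};

// XOR64rr as the dump shows it: def, tied source, second source.
const InstrDesc XOR64rrModel = {"XOR64rr", 3, 1};

// Returns "" when the operand list fits the descriptor, otherwise a short
// description of the first problem found.
std::string checkOperands(const InstrDesc &desc,
                          const std::vector<std::string> &regs) {
  if (regs.size() != desc.numOperands)
    return "expected " + std::to_string(desc.numOperands) + " operands, got " +
           std::to_string(regs.size());
  if (desc.tiedUse >= 0 &&
      regs[static_cast<std::size_t>(desc.tiedUse)] != regs[0])
    return "tied operand must use the same register as the def";
  return "";
}
```

As far as the real API goes, the BuildMI overload that takes a destination register already appends the def, so the usual pattern is to follow it with two `.addReg` calls for the sources. Using `.addDef` makes the operand count come out right, but it marks the tied source as a second definition, which is probably not what is intended.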

Nonetheless, with this edit my MachineFunctionPass now compiles with no errors.
I'm only taking the llvm-mc -show-inst information (from above) as a
cue to what my C++ code should look like. I finally got LLVM to compile my
BuildMI instructions. Thanks again for the help!
------------------------------------
Dear Matthias,

Thanks for the tip! Both of your responses helped in my debugging.

Sincerely,

Chris Jelesnianski

On Tue, Jun 26, 2018 at 4:37 PM, Matthias Braun <[hidden email]> wrote:

> BTW: If you work on the MI level, then I recommend to use a debug build of
> llvm and to pass -verify-machineinstrs to llc and it should catch you using
> registers that are not part of the instructions register classes.
>
> - Matthias
>
>
> On Jun 26, 2018, at 1:13 PM, Craig Topper via llvm-dev
> <[hidden email]> wrote:
>
> This shouldn't have parsed.
>
> movq    (%gs), %r14
>
> That's trying to use%gs as a base register which isn't valid. GNU assembler
> rejects it. And coincidentally llvm-mc started rejecting it on trunk late
> last week.  That's probably why it printed as %ebp.
>
> I don't know if there is an instruction to read the base of %gs directly.
> Maybe rdgsbase, but that's only available on Ivy Bridge and later CPUs.. But
> ussing %gs as part of the memory address for any other instruction is
> automatically relative to the base of %gs.
>
>
> ~Craig
>
>
> On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <[hidden email]> wrote:
>>
>> Dear Craig,
>>
>> Thanks for the help so far. I have rewritten my assembly to comply
>> with user-land not being able to directly modify the segment registers
>> %GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM
>> instruction + operands. Now I am working backwards to actually code
>> this assembly into my MachineFunctionPass and got the easy assembly
>> implemented, however my more complicated asm is still struggling as I
>> am still seeing 0x0(%rbp) instead of (%gs) or errors.
>> Core question here being: how do I properly create BuildMI statements
>> for assembly dealing with offsets?
>>
>> -------------------------------------------------------------------------------------------------
>> Assembly I want to translate:
>> mov   (%gs), %r14                  //get value off %GS base addresss
>> mov %r15, %gs:0x0(%r14)     //put value in R15 into R14:(%GS)  [ (%GS) +
>> R14 ]
>>
>> --------------------------------------------------------------------------------------------------
>> LLVM-MC -show inst gives:
>> movq    (%gs), %r14          # <MCInst #1810 MOV64rm
>>                                         #  <MCOperand Reg:117>
>>                                         #  <MCOperand Reg:33>
>>                                         #  <MCOperand Imm:1>
>>                                         #  <MCOperand Reg:0>
>>                                         #  <MCOperand Imm:0>
>>                                         #  <MCOperand Reg:0>>
>> movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
>>                                         #  <MCOperand Reg:117>
>>                                         #  <MCOperand Imm:1>
>>                                         #  <MCOperand Reg:0>
>>                                         #  <MCOperand Imm:0>
>>                                         #  <MCOperand Reg:33>
>>                                         #  <MCOperand Reg:118>>
>>
>> -------------------------------------------------------------------------------------------------------
>> I'll be honest and say I don't really know how to add the operands
>> properly to BuildMI. I figured out the following so far
>> MachineInstrBuilder thing = BuildMI(MachineBB, Position in MBB ,
>> DebugLoc(not sure what this accomplishes), TII->get( X86 instruction I
>> want), where instruction result goes)
>>
>> this has .add(MachineOperand)
>>             .addReg(X86::a reg macro)
>>             .addIMM(a constant like 0x8)
>>             and a few more I dont think apply to me.
>>
>> but I am not sure I must follow a specific order? I am assuming yes
>> and it has something to do with X86InstrInfo.td definitions, but not
>> sure.
>>
>> --------------------------------------------------------------------------------------------------------
>> LLVM C++ code I tried to translate this to:
>> /* 1 mov   (%gs), %r14 */
>>     MachineInstrBuilder e1 =
>> BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
>>        .addReg(X86::GS);
>> /* 2 mov %r15, %gs:0x0(%r14) */
>>     MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
>>     MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
>>     MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
>>     MachineOperand disp = MachineOperand::CreateImm(0x0);
>>
>>     BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
>>       .add(baseReg)
>>       .add(scaleAmt)
>>       .add(indexReg);
>>
>> /* both instructions give the following error
>>
>> clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
>> T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::operator[](llvm::SmallVectorTemplateCommon<T,
>> <template-parameter-1-2> >::size_type) const [with T =
>> llvm::MCOperand; <template-parameter-1-2> = void;
>> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::const_reference = const llvm::MCOperand&;
>> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::size_type = long unsigned int]: Assertion `idx < size()' failed.
>>
>> I saw this function in the code base but not sure what it does
>> "addDirectMem(MachineInstructionBuilder_thing, register you want to
>> use);"
>>
>>
>> This is be the last bit of information I think I need to finish up
>> this implementation. Thanks again for your help!
>>
>> Sincerely,
>>
>> Chris Jelesnianski
>>
>> On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <[hidden email]>
>> wrote:
>> > The size suffix thing is a weird quirk in our assembler I should look
>> > into
>> > fixing. Instructions in at&t syntax usually have a size suffix that is
>> > often
>> > optional
>> >
>> > For example:
>> >   add %ax, %bx
>> > and
>> >   addw %ax, %bx
>> >
>> > Are equivalent because the register name indicates the size.
>> >
>> > but for an instruction like this
>> >   addw $1, (%ax)
>> >
>> > There is nothing to infer the size from so an explicit suffix is
>> > required.
>> >
>> > So for an instruction like "add %ax, %bx" from above, we try to guess
>> > the
>> > size suffix from the register. In your case, you used a segment register
>> > which we couldn't guess the size from. And then we printed a bad error
>> > message.
>> >
>> > There's no quick reference as such for the meaning of the various
>> > X86::XXXXXX names. But the complete list of them is in
>> > lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are
>> > meant
>> > to be fairly straight forward to understand. The first part of the name
>> > should almost always be the instruction name from the Intel/AMD manuals.
>> > The
>> > lower case letters at the end sort of convey operand types, but often
>> > not
>> > the number of operands even though it looks that way. The most common
>> > letters are 'r' for register, 'm' for memory and 'i' for immediate.
>> > Numbers
>> > after 'i' specify the size of the immediate if its important to
>> > distinguish
>> > from other sizes or different than the size of the instruction. The
>> > lower
>> > case letters are most useful to distinguish different instructions from
>> > each
>> > other. So for example, if two instructions only differ in the lower case
>> > letters and one says "rr" and one says "rm", the first is the register
>> > form
>> > and the second is the memory form of the same instruction.
>> >
>> > ~Craig
>> >
>> >
>> > On Sat, Jun 23, 2018 at 7:55 PM K Jelesnianski <[hidden email]> wrote:
>> >>
>> >> Dear Craig,
>> >>
>> >> Thank you super much for the quick reply! Yea I'm still new to working
>> >> on the back-end and that sounds great. I already have the raw assembly
>> >> of what I want to accomplish so this is perfect. I just tried it and
>> >> yea, I will have to break down my assembly even further into
>> >> simpler operations. You're right about my assembly dealing with
>> >> segment registers as I'm getting the following error:
>> >> "error: unknown use of instruction mnemonic without a size suffix"
>> >>
>> >> Just curious, what does it mean by size suffix?
>> >>
>> >> It's super cool to see the equivalent with "-show-inst"! Thank you
>> >> so much for this help!
>> >>
>> >> Last note, I know that the definitions (e.g. def SUB32ri) of the
>> >> various instructions can be found in the various ****.td files, but is
>> >> there documentation where the meaning or a quick reference of every
>> >> X86::XXXXXX llvm instruction macro can be found, so I can quickly pick
>> >> and choose which actual macro I need to use, to "work forwards" rather
>> >> than working backwards by writing the assembly first and using llvm-mc
>> >> -show-inst?
>> >>
>> >> Thanks super much again.
>> >>
>> >> Sincerely,
>> >>
>> >> Chris Jelesnianski
>> >> Graduate Research Assistant
>> >> Virginia Tech
>> >>
>> >> On Sat, Jun 23, 2018 at 8:45 PM, Craig Topper <[hidden email]>
>> >> wrote:
>> >> > More specifically, there is no instruction that can add to or subtract
>> >> > from segment registers. They can only be updated by the mov segment
>> >> > register instructions, opcodes 0x8c and 0x8e in x86 assembly.
>> >> >
>> >> > I suggest you write the text version of the assembly you want to
>> >> > generate and assemble it with llvm-mc. This will tell you if it's even
>> >> > valid. After that you can use -show-inst to print the names of the
>> >> > instructions that X86 uses, which you can give to BuildMI.
>> >> >
>> >> > ~Craig
>> >> >
>> >> >
>> >> > On Sat, Jun 23, 2018 at 5:36 PM Craig Topper <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> The SUB32ri instruction can't operate on segment registers. It
>> >> >> operates on EAX/EBX/EDX/ECX/EBP, etc. When it gets encoded, only 3 or
>> >> >> 4 bits of the register value make it into the binary encoding. Objdump
>> >> >> just extracts those 3 or 4 bits back out and prints whichever of the
>> >> >> EAX/EBX/EDX/ECX/EBP registers those bits correspond to.
>> >> >>
>> >> >> ~Craig
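
To make the encoding point above concrete, here is a minimal standalone sketch. It is not LLVM code: the helper names modRM and gpr32Name are made up for illustration, and the register numbers are assumed from the x86 hardware numbering (EBP is number 5 among the 32-bit general-purpose registers, GS is number 5 among the segment registers). SUB r/m32, imm32 is opcode 0x81 with /5 in the reg field of the ModRM byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hardware encoding numbers (assumed from the x86 manuals, not LLVM enums).
constexpr unsigned kEncEBP = 5; // 32-bit GPRs: eax=0, ecx=1, ..., ebp=5
constexpr unsigned kEncGS = 5;  // segment registers: es=0, cs=1, ..., gs=5

// ModRM byte layout: mod (2 bits) | reg (3 bits) | rm (3 bits).
constexpr std::uint8_t modRM(unsigned mod, unsigned reg, unsigned rm) {
  return static_cast<std::uint8_t>(((mod & 3u) << 6) | ((reg & 7u) << 3) |
                                   (rm & 7u));
}

// For opcode 0x81 a disassembler always reads the rm field as a
// general-purpose register; it cannot know a segment register was intended.
inline std::string gpr32Name(unsigned rm) {
  static const char *const names[8] = {"eax", "ecx", "edx", "ebx",
                                       "esp", "ebp", "esi", "edi"};
  return names[rm & 7u];
}
```

With mod=3 (register direct), reg=5 (the /5 opcode extension for SUB) and rm set to the number of GS, modRM yields 0xED, which matches the "81 ed" bytes in the objdump output, and gpr32Name reads the same three bits back as "ebp".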
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] MachineFunction Instructions Pass using Segment Registers

Muhui Jiang via llvm-dev
It’s possible that BuildMI knows that the first input register has to be the same register as the destination and automatically did the addReg. That information is certainly known to LLVM, but I’d have to check the implementation. Unfortunately I’m not at a computer now to check.

On Tue, Jun 26, 2018 at 5:49 PM K Jelesnianski <[hidden email]> wrote:
Dear Craig,

Whoops, you're right. That's still what I theoretically want to do
though. I replaced it with the following:

movq %gs:0x0, %r14   - this doesn't get any complaints from gnu-as or llvm-mc

Got the following LLVM-MC -show-inst
movq    %gs:0, %r14     # <MCInst #1810 MOV64rm
                        #  <MCOperand Reg:117>   // Destination
                        #  <MCOperand Reg:0>     // Base Reg
                        #  <MCOperand Imm:1>     // Scale
                        #  <MCOperand Reg:0>     // Index Reg
                        #  <MCOperand Imm:0>     // Displacement
                        #  <MCOperand Reg:33>>   // Segment Reg

This looks better as 33 (%gs) is in the right spot now in the Segment
spot of the MCOperands instead of the BaseReg spot, according to
http://llvm.org/doxygen/X86BaseInfo_8h_source.html
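
The operand order in that dump can be sketched as a small standalone model. This is not LLVM code: the index names copy the AddrBaseReg..AddrSegmentReg values from X86BaseInfo.h, while MemRef, segmentRelative and the register ids (0 for "no register", 33 for %gs, as printed by -show-inst above) are placeholders for illustration. The real ids are generated and may differ between LLVM versions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Positions of the five parts of an x86 memory reference, in the order
// they must be added after any explicit destination operand.
enum MemOperandIndex {
  AddrBaseReg = 0,
  AddrScaleAmt = 1,
  AddrIndexReg = 2,
  AddrDisp = 3,
  AddrSegmentReg = 4,
  AddrNumOperands = 5
};

// Placeholder ids copied from the -show-inst dump (assumptions, see above).
constexpr std::int64_t NoRegister = 0;
constexpr std::int64_t RegGS = 33;

using MemRef = std::array<std::int64_t, AddrNumOperands>;

// Builds the operands for "%seg:disp" with no base and no index register.
inline MemRef segmentRelative(std::int64_t seg, std::int64_t disp) {
  MemRef ops{};
  ops[AddrBaseReg] = NoRegister;
  ops[AddrScaleAmt] = 1; // scale is 1 even when there is no index
  ops[AddrIndexReg] = NoRegister;
  ops[AddrDisp] = disp;
  ops[AddrSegmentReg] = seg;
  return ops;
}

// In a MachineFunctionPass the same order would correspond roughly to
// (untested sketch):
//   BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64rm), X86::R14)
//       .addReg(0).addImm(1).addReg(0).addImm(0).addReg(X86::GS);
```

segmentRelative(RegGS, 0) gives the same sequence 0, 1, 0, 0, 33 that follows the destination register in the MOV64rm dump.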

The only weird behavior I could not figure out was for an XOR instruction
LLVM-MC -show-inst:
xorq    %r15, %r15              # <MCInst #15401 XOR64rr
                                        #  <MCOperand Reg:118>
                                        #  <MCOperand Reg:118>
                                        #  <MCOperand Reg:118>>

I had to use .addDef instead of .addReg
My C++ code:
    BuildMI(MBB,MBB.end(),DL,TII->get(X86::XOR64rr),X86::R15)
      .addDef(X86::R15)
      .addReg(X86::R15);

I got the same error until I replaced the first instance of .addReg with
.addDef. Why do I need to use .addDef here?

Nonetheless, my MachineFunctionPass now compiles with no errors with this edit.
I'm only taking the llvm-mc -show-inst information (from above) as a
cue for what my C++ code should look like. I finally got LLVM to compile my
BuildMI instructions. Thanks again for the help!
------------------------------------
Dear Matthias,

Thanks for the tip! Both of your responses helped in my debugging.

Sincerely,

Chris Jelesnianski

On Tue, Jun 26, 2018 at 4:37 PM, Matthias Braun <[hidden email]> wrote:
> BTW: If you work on the MI level, then I recommend using a debug build of
> llvm and passing -verify-machineinstrs to llc; it should catch uses of
> registers that are not part of the instruction's register classes.
>
> - Matthias
>
>
> On Jun 26, 2018, at 1:13 PM, Craig Topper via llvm-dev
> <[hidden email]> wrote:
>
> This shouldn't have parsed.
>
> movq    (%gs), %r14
>
> That's trying to use %gs as a base register, which isn't valid. GNU assembler
> rejects it. And coincidentally llvm-mc started rejecting it on trunk late
> last week.  That's probably why it printed as %ebp.
>
> I don't know if there is an instruction to read the base of %gs directly.
> Maybe rdgsbase, but that's only available on Ivy Bridge and later CPUs. But
> using %gs as part of the memory address for any other instruction is
> automatically relative to the base of %gs.
>
>
> ~Craig
>
>
> On Tue, Jun 26, 2018 at 12:57 PM K Jelesnianski <[hidden email]> wrote:
>>
>> Dear Craig,
>>
>> Thanks for the help so far. I have rewritten my assembly to comply
>> with user-land not being able to directly modify the segment registers
>> %GS/%FS. I used llvm-mc with -show-inst to get the equivalent LLVM
>> instruction + operands. Now I am working backwards to actually code
>> this assembly into my MachineFunctionPass and got the easy assembly
>> implemented, however my more complicated asm is still struggling as I
>> am still seeing 0x0(%rbp) instead of (%gs) or errors.
>> Core question here being: how do I properly create BuildMI statements
>> for assembly dealing with offsets?
>>
>> -------------------------------------------------------------------------------------------------
>> Assembly I want to translate:
>> mov   (%gs), %r14                  //get value off %GS base address
>> mov %r15, %gs:0x0(%r14)     //put value in R15 into R14:(%GS)  [ (%GS) + R14 ]
>>
>> --------------------------------------------------------------------------------------------------
>> llvm-mc -show-inst gives:
>> movq    (%gs), %r14          # <MCInst #1810 MOV64rm
>>                                         #  <MCOperand Reg:117>
>>                                         #  <MCOperand Reg:33>
>>                                         #  <MCOperand Imm:1>
>>                                         #  <MCOperand Reg:0>
>>                                         #  <MCOperand Imm:0>
>>                                         #  <MCOperand Reg:0>>
>> movq    %r15, %gs:(%r14)        # <MCInst #1803 MOV64mr
>>                                         #  <MCOperand Reg:117>
>>                                         #  <MCOperand Imm:1>
>>                                         #  <MCOperand Reg:0>
>>                                         #  <MCOperand Imm:0>
>>                                         #  <MCOperand Reg:33>
>>                                         #  <MCOperand Reg:118>>
>>
>> -------------------------------------------------------------------------------------------------------
>> I'll be honest and say I don't really know how to add the operands
>> properly to BuildMI. I have figured out the following so far:
>> MachineInstrBuilder thing = BuildMI(MachineBB, position in MBB,
>> DebugLoc (not sure what this accomplishes), TII->get(X86 instruction I
>> want), where the instruction result goes)
>>
>> This has .add(MachineOperand)
>>             .addReg(X86::a reg macro)
>>             .addImm(a constant like 0x8)
>>             and a few more I don't think apply to me.
>>
>> But I am not sure whether I must follow a specific order. I am assuming
>> yes, and that it has something to do with the X86InstrInfo.td definitions,
>> but I am not sure.
>>
>> --------------------------------------------------------------------------------------------------------
>> LLVM C++ code I tried to translate this to:
>> /* 1 mov   (%gs), %r14 */
>>     MachineInstrBuilder e1 =
>> BuildMI(MBB,MBB.end(),DL,TII->get(X86::MOV64rm),X86::R14)
>>        .addReg(X86::GS);
>> /* 2 mov %r15, %gs:0x0(%r14) */
>>     MachineOperand baseReg = MachineOperand::CreateReg(X86::GS,false);
>>     MachineOperand scaleAmt = MachineOperand::CreateImm(0x1);
>>     MachineOperand indexReg = MachineOperand::CreateReg(X86::R14,false);
>>     MachineOperand disp = MachineOperand::CreateImm(0x0);
>>
>>     BuildMI(MBB, MBB.end(), DL, TII->get(X86::MOV64mr))
>>       .add(baseReg)
>>       .add(scaleAmt)
>>       .add(indexReg);
>>
>> Both instructions give the following error:
>>
>> clang-6.0: /LLVM6.0.0/llvm/include/llvm/ADT/SmallVector.h:154: const
>> T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::operator[](llvm::SmallVectorTemplateCommon<T,
>> <template-parameter-1-2> >::size_type) const [with T =
>> llvm::MCOperand; <template-parameter-1-2> = void;
>> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::const_reference = const llvm::MCOperand&;
>> llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2>
>> >::size_type = long unsigned int]: Assertion `idx < size()' failed.
>>
>> I saw this function in the code base but I am not sure what it does:
>> "addDirectMem(MachineInstructionBuilder_thing, register you want to
>> use);"
>>
>>
>> This should be the last bit of information I need to finish up
>> this implementation. Thanks again for your help!
>>
>> Sincerely,
>>
>> Chris Jelesnianski
>>
>> On Sat, Jun 23, 2018 at 11:32 PM, Craig Topper <[hidden email]>
>> wrote:
>> > The size suffix thing is a weird quirk in our assembler that I should look
>> > into fixing. Instructions in AT&T syntax usually have a size suffix that
>> > is often optional.
>> >
>> > For example:
>> >   add %ax, %bx
>> > and
>> >   addw %ax, %bx
>> >
>> > are equivalent because the register name indicates the size.
>> >
>> > But for an instruction like this
>> >   addw $1, (%ax)
>> >
>> > there is nothing to infer the size from, so an explicit suffix is
>> > required.
>> >
>> > So for an instruction like "add %ax, %bx" from above, we try to guess the
>> > size suffix from the register. In your case, you used a segment register
>> > which we couldn't guess the size from. And then we printed a bad error
>> > message.
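
The suffix guessing described above can be illustrated with a toy standalone function. It is only a sketch and not how llvm-mc is implemented: impliedSuffix and isWordReg are hypothetical names, and only the classic register names are covered (no r8-r15, no high-byte registers).

```cpp
#include <cassert>
#include <string>

// True for the classic 16-bit general-purpose register names.
inline bool isWordReg(const std::string &name) {
  static const char *const regs[] = {"ax", "bx", "cx", "dx",
                                     "si", "di", "bp", "sp"};
  for (const char *r : regs)
    if (name == r)
      return true;
  return false;
}

// Returns the AT&T size suffix implied by a register name, or '\0' when
// the name gives no usable size (for example a segment register).
inline char impliedSuffix(std::string reg) {
  if (!reg.empty() && reg[0] == '%')
    reg.erase(0, 1); // drop the AT&T register prefix
  if (reg == "al" || reg == "bl" || reg == "cl" || reg == "dl")
    return 'b';
  if (isWordReg(reg))
    return 'w';
  if (reg.size() == 3 && reg[0] == 'e' && isWordReg(reg.substr(1)))
    return 'l';
  if (reg.size() == 3 && reg[0] == 'r' && isWordReg(reg.substr(1)))
    return 'q';
  return '\0';
}
```

For "%ax" the function returns 'w', so "add %ax, %bx" can be read as "addw"; for "%gs" it returns '\0', which is the situation that led to the "without a size suffix" error.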
>> >
>> > There's no quick reference as such for the meaning of the various
>> > X86::XXXXXX names. But the complete list of them is in
>> > lib/Target/X86/X86GenInstrInfo.inc in your build area. The names are
>> > meant to be fairly straightforward to understand. The first part of the
>> > name should almost always be the instruction name from the Intel/AMD
>> > manuals. The lower-case letters at the end sort of convey operand types,
>> > but often not the number of operands, even though it looks that way. The
>> > most common letters are 'r' for register, 'm' for memory and 'i' for
>> > immediate. Numbers after 'i' specify the size of the immediate if it's
>> > important to distinguish it from other sizes or if it differs from the
>> > size of the instruction. The lower-case letters are most useful for
>> > telling different instructions apart. For example, if two instructions
>> > differ only in the lower-case letters and one says "rr" and the other
>> > says "rm", the first is the register form and the second is the memory
>> > form of the same instruction.
>> >
>> > ~Craig
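
As a rough illustration of the naming scheme described above, the following standalone sketch splits a name such as SUB32ri into its parts. It is a heuristic written for this thread, not an LLVM facility: OpcodeNameParts and splitOpcodeName are hypothetical names, and names that do not follow the simple "MNEMONIC + width + lower-case form" pattern will not split cleanly.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct OpcodeNameParts {
  std::string mnemonic; // leading upper-case letters, e.g. "SUB"
  std::string width;    // digits that follow, e.g. "32" (may be empty)
  std::string form;     // the rest, e.g. "ri", "rm", "rr" or "ri8"
};

// Splits an X86::XXXXXX style name into mnemonic, width and operand form.
inline OpcodeNameParts splitOpcodeName(const std::string &name) {
  OpcodeNameParts parts;
  std::size_t i = 0;
  while (i < name.size() && std::isupper(static_cast<unsigned char>(name[i])))
    parts.mnemonic += name[i++];
  while (i < name.size() && std::isdigit(static_cast<unsigned char>(name[i])))
    parts.width += name[i++];
  parts.form = name.substr(i); // whatever remains after the width digits
  return parts;
}
```

Applied to the names from this thread, SUB32ri splits into "SUB", "32", "ri" (register and immediate), and MOV64rm into "MOV", "64", "rm" (register and memory).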
--
~Craig
