X86 disassembler is quite broken on handling REX

5 messages

X86 disassembler is quite broken on handling REX

Jun Koi
Hi,

I think the current X86 disassembler is quite broken: it fails badly at handling the REX prefix in x86_64 code.

Below are some examples:

$ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
    .text
    por    %mm3, %mm0

$ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
    .text
    por    %mm3, %mm0

$ echo "0x41,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
    .text
<stdin>:1:1: warning: invalid instruction encoding
0x41,0x0f,0xeb,0xc3
^


The last example should also produce "por %mm3, %mm0", but the disassembler rejects it as an invalid encoding.

The problem comes from this line in X86DisassemblerDecoder.cpp:

    rm  |= bFromREX(insn->rexPrefix) << 3;

We can see that REX.B is folded into the rm field here, but for "por" (0F EB) with MMX operands it should be ignored, since there are only eight MMX registers.

Quite a lot of other instructions take REX into account like this, even though according to the manual REX should be ignored for them.

I don't see any clean solution for this issue without significant changes to the way we decode ModRM, and without providing more information to the .td files.

Any ideas?

Thanks,
Jun

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: X86 disassembler is quite broken on handling REX

Craig Topper
I believe this particular error is caused by the following code. It seems easy enough to just drop the bit there. Do you have any other non-MMX examples?

    case TYPE_MM:                                         \
      if (index > 7)                                      \
        *valid = 0;                                       \
      return prefix##_MM0 + index;

On Tue, Dec 23, 2014 at 10:17 PM, Jun Koi <[hidden email]> wrote:

--
~Craig


Re: X86 disassembler is quite broken on handling REX

Jun Koi


On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <[hidden email]> wrote:
I believe this particular error is caused by this. That seems easy enough to just drop the bit. Do you have other non-mmx examples?

    case TYPE_MM:                                         \
      if (index > 7)                                      \
        *valid = 0;                                       \
      return prefix##_MM0 + index;

Yes, exactly this place. But the question is: how do we know when to drop REX.B?


I don't know of any non-MMX examples; it seems only MMX-related instructions have this issue.

thanks,
Jun


Re: X86 disassembler is quite broken on handling REX

Craig Topper
Wouldn't changing

    case TYPE_MM:                                         \
      if (index > 7)                                      \
        *valid = 0;                                       \
      return prefix##_MM0 + index;


to

    case TYPE_MM:                                         \
      return prefix##_MM0 + (index & 0x7);


fix the issue for both REX.B and REX.R?

On Tue, Dec 23, 2014 at 10:54 PM, Jun Koi <[hidden email]> wrote:

--
~Craig


Re: X86 disassembler is quite broken on handling REX

Jun Koi


On Wed, Dec 24, 2014 at 2:59 PM, Craig Topper <[hidden email]> wrote:
Wouldn't changing

    case TYPE_MM:                                         \
      if (index > 7)                                      \
        *valid = 0;                                       \
      return prefix##_MM0 + index;


to

    case TYPE_MM:                                         \
      return prefix##_MM0 + (index & 0x7);


Fix the issue for both rex.b and rex.r?

This sounds OK, but then there is no longer any check for (index > 7). Is there any case where that could be an issue?

thanks,
Jun