[llvm-dev] Distinguish between ARM and Thumb

Muhui Jiang via llvm-dev
Hi,

I am currently using LLVM for ARM binary analysis, and I was wondering whether LLVM can provide debugging information about the ARM instruction mode (ARM or Thumb).

For example, llvm-dwarfdump can dump some instruction information for debugging. Can it tell which mode each instruction is in? Or could we write an LLVM pass to determine the instruction mode? Any suggestions are welcome. Many thanks.

Regards
Muhui

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Distinguish between ARM and Thumb

Peter Smith via llvm-dev
Hello Muhui,

If you are disassembling a non-stripped ELF binary you can find out
the Arm/Thumb state by looking at the mapping symbols $a (Arm) and $t
(Thumb). Alternatively, each ELF symbol of type STT_FUNC has bit 0 of
its value set to 0 for Arm state and to 1 for Thumb state. Hence with
the symbol table you can reconstruct the state at each address by
finding the nearest preceding mapping symbol.
More information is available in ELF for the Arm Architecture [1].
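
As a rough illustration of the lookup described above, here is a minimal
Python sketch. It assumes the mapping symbols of one section have already
been extracted as (address, name) pairs; the function names are
hypothetical and not part of LLVM or any other tool. It also treats the $d
mapping symbol (data) from the same specification as a third state.

```python
from bisect import bisect_right

# Mapping symbol prefixes from "ELF for the Arm Architecture":
# $a = Arm code, $t = Thumb code, $d = data (for example literal pools).
STATE_BY_PREFIX = {"$a": "arm", "$t": "thumb", "$d": "data"}


def mapping_state(name):
    """Return 'arm', 'thumb' or 'data' for a mapping symbol, else None."""
    prefix = name[:2]
    # Mapping symbols may carry a suffix such as "$t.12"; only the prefix matters.
    if name == prefix or name[2:3] == ".":
        return STATE_BY_PREFIX.get(prefix)
    return None


def build_state_map(symbols):
    """Turn (address, name) pairs from ONE section into sorted (address, state) pairs."""
    return sorted(
        (address, mapping_state(name))
        for address, name in symbols
        if mapping_state(name) is not None
    )


def state_at(state_map, address):
    """The state at an address is set by the nearest mapping symbol at or before it."""
    addresses = [entry_address for entry_address, _ in state_map]
    index = bisect_right(addresses, address) - 1
    return state_map[index][1] if index >= 0 else None


def decode_func_symbol(st_value):
    """Split an STT_FUNC value into (real address, state) using bit 0."""
    return st_value & ~1, "thumb" if st_value & 1 else "arm"
```

For example, decode_func_symbol(0x8001) gives the address 0x8000 in Thumb
state, while state_at() answers the same question for an arbitrary address
inside the section.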

If you have got a stripped binary without any symbolic information
then life gets a lot more difficult. There are some encoding rules [2]
that can help you find out whether a Thumb instruction is 2 or 4 bytes
long but in general you'll at least need to know whether you are
starting on an Arm or Thumb instruction and will need to trace control
flow instructions to track state changes and to avoid interpreting
literal data as instructions.
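
To make the 2-versus-4-byte rule concrete, here is a small Python sketch.
It assumes Thumb-2 encodings as described in [2], where a first halfword
whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit
instruction, and it assumes little-endian instruction bytes. The function
names are hypothetical. It only helps once you already know that the bytes
are Thumb code starting on an instruction boundary; it does nothing to
detect literal data or state changes.

```python
import struct


def thumb_instruction_length(code, offset=0):
    """Return 2 or 4: the size in bytes of the Thumb instruction at offset."""
    (halfword,) = struct.unpack_from("<H", code, offset)
    top5 = halfword >> 11
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def thumb_instruction_offsets(code):
    """Walk a buffer that is assumed to hold only Thumb code; list instruction starts."""
    offsets = []
    offset = 0
    while offset + 2 <= len(code):
        offsets.append(offset)
        offset += thumb_instruction_length(code, offset)
    return offsets
```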

For the non-stripped case I don't think you need to do much beyond
reading the symbol table. I don't think LLVM has passes to reconstruct
binaries; that logic would usually live in a tool like objdump.
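
For a sense of how little is involved in reading the symbol table, here is
a hedged Python sketch that walks .symtab by hand. It assumes a
little-endian ELF32 file with the standard header layout, and the function
name is hypothetical. In practice an existing ELF reader or a symbol-dumping
tool would be the more robust choice.

```python
import struct

SHT_SYMTAB = 2  # section type of the static symbol table
STT_FUNC = 2    # symbol type for functions


def read_elf32_symbols(data):
    """Yield (name, st_value, st_type) for every .symtab entry of an ELF32 LE image."""
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("expected a little-endian ELF32 file")
    # e_shoff is at offset 32; e_shentsize and e_shnum are at offsets 46 and 48.
    (e_shoff,) = struct.unpack_from("<I", data, 32)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 46)
    # Each ELF32 section header is ten 32-bit words.
    sections = [
        struct.unpack_from("<10I", data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    for header in sections:
        sh_type, sh_offset, sh_size = header[1], header[4], header[5]
        sh_link, sh_entsize = header[6], header[9]
        if sh_type != SHT_SYMTAB:
            continue
        # sh_link of a symbol table is the index of its string table section.
        strtab_offset = sections[sh_link][4]
        for pos in range(sh_offset, sh_offset + sh_size, sh_entsize or 16):
            st_name, st_value, _size, st_info, _other, _shndx = struct.unpack_from(
                "<IIIBBH", data, pos
            )
            start = strtab_offset + st_name
            end = data.index(b"\0", start)
            name = data[start:end].decode("ascii", "replace")
            # The low four bits of st_info hold the symbol type.
            yield name, st_value, st_info & 0xF
```

Symbols named $a, $t or $d are the mapping symbols, and entries whose type
equals STT_FUNC carry the Arm/Thumb bit in bit 0 of st_value.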

Hope this helps

Peter

[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf
(search for mapping symbols)
[2] https://developer.arm.com/products/architecture/a-profile/docs/ddi0406/latest/arm-architecture-reference-manual-armv7-a-and-armv7-r-edition
(search for Thumb instruction encoding)

On 28 June 2018 at 13:32, Muhui Jiang via llvm-dev
<[hidden email]> wrote:

> Hi
>
> Nowadays I am using LLVM to do ARM binary analysis. I was wondering is llvm
> available to provide some debugging information on the mode of ARM.
>
> For example, llvm-dwarfdump could dump some instructions information for
> debugging. Is it able to know the mode for each instruction?  Or we may
> write some llvm pass to help us to know the instruction mode? Any
> suggestions are welcomed. Many Thanks
>
> Regards
> Muhui

Re: [llvm-dev] Distinguish between ARM and Thumb

Muhui Jiang via llvm-dev
Hi Peter,

Thank you so much for your detailed and quick reply.

I think I now know how to do it for a non-stripped binary.

Regards
Muhui

2018-06-28 9:07 GMT-04:00 Peter Smith <[hidden email]>: