Disassembly arbitrary machine-code byte arrays


Aidan Steele-2
Hi,

My apologies if this appears to be a very trivial question -- I have
tried to solve this on my own and I am stuck. Any assistance that
could be provided would be immensely appreciated.

What is the absolute bare minimum that I need to do to disassemble an
array of, say, ARM machine code bytes? Or an array of Thumb machine
code bytes? For example, I might have an array of unsigned chars --
how could I go about decoding these into MCInst objects? Does such a
decoding process take place in one fell swoop or do I parse the stream
one instruction at a time? Can I ask it to "decode the next 10 bytes"?
What follows is my (feeble) attempt at getting started. It probably
doesn't help that I am only familiar with C and Objective-C and find
C++ syntax absolutely bewildering.

Kind regards,
Aidan Steele

int main (int argc, const char *argv[])
{
 LLVMInitializeARMTargetInfo();
 LLVMInitializeARMTargetMC();
 LLVMInitializeARMAsmParser();
 LLVMInitializeARMDisassembler();

 const llvm::Target Target;

 llvm::OwningPtr<const llvm::MCSubtargetInfo>
     STI(Target.createMCSubtargetInfo("", "", ""));
 llvm::OwningPtr<const llvm::MCDisassembler>
     disassembler(Target.createMCDisassembler(*STI));

 llvm::OwningPtr<llvm::MemoryBuffer> Buffer;
 llvm::MemoryBuffer::getFile(llvm::StringRef("/path/to/file.bin"), Buffer);
 llvm::MCInst Inst;
 uint64_t Size = 0;

 disassembler->getInstruction(Inst, Size, *Buffer.take(), 0,
                              llvm::nulls(), llvm::nulls());

//  llvm::StringRef TheArchString("arm-apple-darwin");
//  std::string normalized = llvm::Triple::normalize(TheArchString);
//
//  llvm::Triple TheTriple;
//  TheTriple.setArch(llvm::Triple::arm);
//  TheTriple.setOS(llvm::Triple::Darwin);
//  TheTriple.setVendor(llvm::Triple::Apple);
//  llvm::Target *TheTarget = NULL;

 return 0;
}
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Disassembly arbitrary machine-code byte arrays

James Molloy-3
Hi Aidan,

The easiest thing I can do is to point you to the source of the "llvm-mc" tool, which does exactly what you ask in its "-disassemble" mode. The code is rather small, so it should be easy to work out.

tools/llvm-mc

Cheers,

James


Re: Disassembly arbitrary machine-code byte arrays

Kevin Enderby
Hi Aidan,

The C-based interface you could use is in llvm/include/llvm-c/Disassembler.h, which contains:

/**
 * Disassemble a single instruction using the disassembler context specified in
 * the parameter DC.  The bytes of the instruction are specified in the
 * parameter Bytes, and contains at least BytesSize number of bytes.  The
 * instruction is at the address specified by the PC parameter.  If a valid
 * instruction can be disassembled, its string is returned indirectly in
 * OutString whose size is specified in the parameter OutStringSize.  This
 * function returns the number of bytes in the instruction or zero if there was
 * no valid instruction.
 */
size_t LLVMDisasmInstruction(LLVMDisasmContextRef DC, uint8_t *Bytes,
                             uint64_t BytesSize, uint64_t PC,
                             char *OutString, size_t OutStringSize);

This is used in Darwin's otool(1), which is an objdump(1)-like tool.  The interface ends up in the libLTO shared library.

Kev
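
The contract in that header comment (one instruction per call, returning the number of bytes consumed, or zero when nothing valid could be decoded) suggests walking a byte array in a loop, one instruction at a time. What follows is only a sketch of that loop, not code from LLVM: `decode_fn`, `stub_decode` and `disassemble_all` are hypothetical names, and `stub_decode` is a stand-in that prints each 4-byte little-endian word so the example builds without LLVM. The callback has the same parameter shape as LLVMDisasmInstruction, with the context reduced to an opaque pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type mirroring the shape of LLVMDisasmInstruction:
 * returns the number of bytes consumed, or 0 if no valid instruction. */
typedef size_t (*decode_fn)(void *ctx, uint8_t *bytes, uint64_t bytes_size,
                            uint64_t pc, char *out, size_t out_size);

/* Stand-in decoder (NOT a real disassembler): treats every 4 bytes as one
 * little-endian word, the way fixed-width ARM-mode code is laid out. */
size_t stub_decode(void *ctx, uint8_t *bytes, uint64_t bytes_size,
                   uint64_t pc, char *out, size_t out_size)
{
    (void)ctx;
    (void)pc;
    if (bytes_size < 4)
        return 0; /* not enough bytes left for a whole instruction */
    uint32_t word = (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
                    ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
    snprintf(out, out_size, ".word 0x%08lx", (unsigned long)word);
    return 4;
}

/* Decode instructions one at a time until the buffer is exhausted or the
 * decoder reports failure. Returns how many instructions were decoded. */
size_t disassemble_all(decode_fn decode, void *ctx, uint8_t *bytes,
                       uint64_t size, uint64_t base_pc)
{
    assert(decode != NULL);
    uint64_t offset = 0;
    size_t count = 0;
    char text[128];

    while (offset < size) {
        size_t n = decode(ctx, bytes + offset, size - offset,
                          base_pc + offset, text, sizeof text);
        if (n == 0)
            break; /* invalid or truncated instruction: stop here */
        printf("0x%08llx: %s\n", (unsigned long long)(base_pc + offset), text);
        offset += n; /* advance by however many bytes were consumed */
        count++;
    }
    return count;
}
```

So "decode the next 10 bytes" becomes a call like `disassemble_all(stub_decode, NULL, code, 10, 0x1000)`: the loop stops once fewer bytes remain than a whole instruction needs. With LLVM linked in, the stub would presumably be replaced by a thin wrapper that forwards to LLVMDisasmInstruction, passing a context obtained from LLVMCreateDisasm in the same header. Because the loop advances by the returned size rather than a fixed stride, it should not care whether the instructions are 2 or 4 bytes wide.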


Re: Disassembly arbitrary machine-code byte arrays

Aidan Steele-2
Hi Kev and James,

Thanks to both of you for responding. I had looked at the otool
release published for 10.7.2 (cctools-800), but it seems that this
interface only snuck in after that, by the cctools-809 release!

In any case, both that and llvm-mc should be more than adequate! A
follow-up question: is the C interface to LLVM a second-class citizen
or should I reasonably be able to expect to do everything with it that
I could do as a consumer of the C++ API?

Regards,
Aidan


Re: Disassembly arbitrary machine-code byte arrays

Chris Lattner-2

On Dec 19, 2011, at 4:24 PM, Aidan Steele wrote:

>
> In any case, both that and llvm-mc should be more than adequate! A
> follow-up question: is the C interface to LLVM a second-class citizen
> or should I reasonably be able to expect to do everything with it that
> I could do as a consumer of the C++ API?

It is a second-class citizen in some ways: you can't do everything with the C API that you can do with the C++ API.  On the other hand, the C API is stable (we don't change the API), whereas the C++ API changes all the time.

-Chris

Re: Disassembly arbitrary machine-code byte arrays

Tom Prince
On Tue, 20 Dec 2011 11:24:27 +1100, Aidan Steele <[hidden email]> wrote:
> A follow-up question: is the C interface to LLVM a second-class
> citizen or should I reasonably be able to expect to do everything with
> it that I could do as a consumer of the C++ API?

The other thing to note about the C API is that things are added to it
mostly on an as-needed basis. So if something is missing, you can ask to
get it added, which I guess should happen unless it might be difficult
to support long term (i.e. it exposes something unstable).

  Tom