.debug_info section size in arm executable

.debug_info section size in arm executable

Seung-yeon Choe
Dear all

I wonder why the debug information in a clang-built executable is larger than the DWARF debug info that gcc produces.

When I compiled an ARM hello-world program with the -g option, the ARM assembly and executable files were significantly larger than gcc's results (approx. 3x).
(The source file was only 3 KB.)

I think that both clang and gcc use the same DWARF version 2 format for debugging, but I don't know why the clang executable is bigger than the gcc one.

Could anyone explain that?



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: .debug_info section size in arm executable

Nick Lewycky
Seung-yeon Choe wrote:

> Dear all
>
> I wonder why the debug information in a clang-built executable is larger than the DWARF debug info that gcc produces.
>
> When I compiled an ARM hello-world program with the -g option, the ARM assembly and executable files were significantly larger than gcc's results (approx. 3x).
> (The source file was only 3 KB.)
>
> I think that both clang and gcc use the same DWARF version 2 format for debugging, but I don't know why the clang executable is bigger than the gcc one.
>
> Could anyone explain that?

For one example, for enums clang will emit the names for all of the enum
cases. GCC only emits the ones that are used.

If you're on linux, I suggest running "readelf -w file.o" to examine
what is actually in the debug info in your files. Figuring out how to
shrink our debug info size is something we're actively working on.

A couple bugs I'm looking at right now are llvm.org/PR11323 and
llvm.org/PR11345 .

Nick
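
A minimal sketch for trying this out, assuming a Linux box with clang and readelf installed. It is not from the thread: the file name enum_demo.c, the enum Color and the function is_red are made up for illustration. The enum declares five enumerators but only one is referenced, so the DWARF dump of the resulting object file can be compared between compilers.

```c
/* enum_demo.c: hypothetical test file, not from the original thread.
 *
 * Build an object file with debug info, then dump its DWARF, e.g.:
 *   clang -g -c enum_demo.c -o enum_demo.o
 *   readelf -w enum_demo.o
 * and look for DW_TAG_enumerator entries under the enumeration type.
 */

/* Five enumerators are declared, but only RED is referenced below. */
enum Color { RED, GREEN, BLUE, CYAN, MAGENTA };

/* Returns 1 when c is RED, 0 otherwise. */
int is_red(enum Color c) {
    return c == RED;
}
```

Repeating the same two steps with gcc on the same file gives a second dump to compare against.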

Re: .debug_info section size in arm executable

Chris Lattner-2

On Nov 9, 2011, at 2:58 AM, Nick Lewycky wrote:


> For one example, for enums clang will emit the names for all of the enum
> cases. GCC only emits the ones that are used.

Is this a clang bug, or a feature?

-Chris


Re: .debug_info section size in arm executable

Chandler Carruth-2
On Wed, Nov 9, 2011 at 10:35 AM, Chris Lattner <[hidden email]> wrote:

> On Nov 9, 2011, at 2:58 AM, Nick Lewycky wrote:
>
>> For one example, for enums clang will emit the names for all of the enum
>> cases. GCC only emits the ones that are used.
>
> Is this a clang bug, or a feature?

I've argued for enums, it's a feature.

Nick is tracking other cases that are much more likely bugs. 


Re: .debug_info section size in arm executable

Eric Christopher-2
In reply to this post by Chris Lattner-2

On Nov 9, 2011, at 10:35 AM, Chris Lattner wrote:


> On Nov 9, 2011, at 2:58 AM, Nick Lewycky wrote:
>
>> For one example, for enums clang will emit the names for all of the enum
>> cases. GCC only emits the ones that are used.
>
> Is this a clang bug, or a feature?

IMO this is probably a feature. Thinking about it like this:

int foo(enum Bar x) {
  switch (x)
    ...
}

int baz(int a)
{
  foo(a);
}

It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)

-eric
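
To make the "& and |" point concrete, here is a small sketch in C. Everything in it is hypothetical (the enum Perm, its enumerators, and both functions are invented for illustration). It shows an enum-typed parameter receiving a value that was assembled from plain ints, so the value reaching the switch need not be one that the caller ever spelled out by name.

```c
#include <stddef.h>

/* Hypothetical flag-style enum; the names are made up for illustration. */
enum Perm { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 };

/* Returns the enumerator name for v, or NULL if v matches none exactly. */
const char *perm_name(enum Perm v) {
    switch (v) {
    case PERM_READ:  return "PERM_READ";
    case PERM_WRITE: return "PERM_WRITE";
    case PERM_EXEC:  return "PERM_EXEC";
    default:         return NULL;  /* e.g. a value built with | */
    }
}

/* Takes a plain int and passes it on as the enum type; C converts implicitly. */
const char *perm_name_from_int(int raw) {
    return perm_name(raw);
}
```

A call such as perm_name_from_int(4) reaches the PERM_EXEC case without the caller mentioning PERM_EXEC, and perm_name_from_int(PERM_READ | PERM_WRITE) ends up in the default case.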


Re: .debug_info section size in arm executable

Jim Grosbach

On Nov 9, 2011, at 10:43 AM, Eric Christopher wrote:

>
> On Nov 9, 2011, at 10:35 AM, Chris Lattner wrote:
>
>>
>> On Nov 9, 2011, at 2:58 AM, Nick Lewycky wrote:
>>
>>>
>>> For one example, for enums clang will emit the names for all of the enum
>>> cases. GCC only emits the ones that are used.
>>
>> Is this a clang bug, or a feature?
>
> IMO this is probably a feature. Thinking about it like this:
>
> int foo(enum Bar x) {
> switch(x)
>           ...
> }
>
> int baz (int a)
> {
>    foo(a);
> }
>
> It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)
>

Agreed. LLVM itself does this sort of thing pretty frequently, actually, and having the enum names and values available in the debugger is very nice.

-Jim

Re: .debug_info section size in arm executable

Devang Patel

On Nov 9, 2011, at 10:49 AM, Jim Grosbach wrote:

>
> On Nov 9, 2011, at 10:43 AM, Eric Christopher wrote:
>
>>
>> On Nov 9, 2011, at 10:35 AM, Chris Lattner wrote:
>>
>>>
>>> On Nov 9, 2011, at 2:58 AM, Nick Lewycky wrote:
>>>
>>>>
>>>> For one example, for enums clang will emit the names for all of the enum
>>>> cases. GCC only emits the ones that are used.
>>>
>>> Is this a clang bug, or a feature?
>>
>> IMO this is probably a feature. Thinking about it like this:
>>
>> int foo(enum Bar x) {
>> switch(x)
>>          ...
>> }
>>
>> int baz (int a)
>> {
>>   foo(a);
>> }
>>
>> It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)
>>
>
> Agreed. LLVM itself does this sort of thing pretty frequently, actually, and having the enum names and values available in the debugger is very nice.


+1

In addition, what's the expected size reduction in such cases to offset this inconvenience? Besides, is this an intentional gcc feature or accidental gcc behavior?

-
Devang


Re: .debug_info section size in arm executable

Chris Lattner-2
In reply to this post by Jim Grosbach

On Nov 9, 2011, at 10:49 AM, Jim Grosbach wrote:
>>
>> It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)
>>
>
> Agreed. LLVM itself does this sort of thing pretty frequently, actually, and having the enum names and values available in the debugger is very nice.

Wouldn't it be better to emit the enumerators that are actually used in a translation unit, instead of emitting all of them?  If we're not emitting them when used, that would be a problem, but I don't see a reason to care about enumerators that are never used anywhere in a program.

-Chris

Re: .debug_info section size in arm executable

Jim Grosbach

On Nov 9, 2011, at 12:52 PM, Chris Lattner wrote:

>
> On Nov 9, 2011, at 10:49 AM, Jim Grosbach wrote:
>>>
>>> It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)
>>>
>>
>> Agreed. LLVM itself does this sort of thing pretty frequently, actually, and having the enum names and values available in the debugger is very nice.
>
> Wouldn't it be better to emit the enumerators that are actually used in a translation unit, instead of emitting all of them?  If we're not emitting them when used, that would be a problem, but I don't see a reason to care about enumerators that are never used anywhere in a program.


Fleshing out Eric's example a bit:
enum bar {A, B, C, D, E, F, ...};

void foo(enum bar a) {
  switch (a) {
  default:
    // stuff
  case B:
  case C:
    // other stuff
  }
  // yet more stuff
}

void bar(int b) {
  //...
  foo(b);
  //...
}

Say something is going wrong for value E of the enum because the default case doesn't work for it. I want to put a conditional breakpoint on the switch statement for 'a == E', but E isn't explicitly used anywhere in the program since the call to foo passes the value as an int, constructed in some arbitrary manner.

Also consider the case where the caller of foo() (which does explicitly use E) is in a dylib we haven't loaded yet, and thus haven't looked at the debug info for. I want to set that same breakpoint.

Without the full range of enum values in the debug info for the translation unit in which foo resides, I can't reference the values.

Now, I can definitely see an argument for omitting the information (probably the entire type) if the enum itself is never referenced in the translation unit. I believe that's the current behaviour already(?)

-Jim
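
For anyone who wants to reproduce the scenario, here is a compilable variant of the example, offered as a sketch only. The six-enumerator list, the return values, and the caller name call_foo are assumptions added so the file builds; the original elides them. The caller never names an enumerator, so E does not appear in any expression in this translation unit even though foo can receive it. In gdb a condition of the form `break foo if a == E` can only be resolved if the name E is present in the debug info.

```c
/* Hypothetical, compilable variant of the example above. */
enum bar { A, B, C, D, E, F };

/* Returns 1 for B and C, 0 for everything else (the default path). */
int foo(enum bar a) {
    switch (a) {
    case B:
    case C:
        return 1;
    default:
        return 0;  /* E lands here, although E is never named in this file */
    }
}

/* The caller never names an enumerator: the value is computed as an int.
 * Assumes b is non-negative so that b % 6 falls in the range A..F. */
int call_foo(int b) {
    return foo((enum bar)(b % 6));
}
```

Here call_foo(4) passes the value of E into foo and takes the default path, while call_foo(1) and call_foo(8) hit the B and C cases.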

Re: .debug_info section size in arm executable

Chris Lattner-2
On Nov 9, 2011, at 1:08 PM, Jim Grosbach wrote:
>> On Nov 9, 2011, at 10:49 AM, Jim Grosbach wrote:
>>>>
>>>> It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)
>>>>
>>>
>>> Agreed. LLVM itself does this sort of thing pretty frequently, actually, and having the enum names and values available in the debugger is very nice.
>>
>> Wouldn't it be better to emit the enumerators that are actually used in a translation unit, instead of emitting all of them?  If we're not emitting them when used, that would be a problem, but I don't see a reason to care about enumerators that are never used anywhere in a program.

> Say something is going wrong for value E of the enum because the default case doesn't work for it. I want to put a conditional breakpoint on the switch statement for 'a == E', but E isn't explicitly used anywhere in the program since the call to foo passes the value as an int, constructed in some arbitrary manner.

So long as E is used in another file, you'll be able to do this.  If E isn't used anywhere in your app, then it's dead and doesn't seem interesting.

> Also consider the case where the caller of foo() (which does explicitly use E) is in a dylib we haven't loaded yet, and thus haven't looked at the debug info for. I want to set that same breakpoint.

-flimit-debug-info=0.  The entire idea of -gused is that you're building (almost) all of your app with debug info.  It's the default because this matches the common scenario.

In short, I still don't understand why we'd want to emit all the enumerators just because an enum type is used. :)

-Chris


Re: .debug_info section size in arm executable

Eric Christopher-2

On Nov 9, 2011, at 2:12 PM, Chris Lattner wrote:

> On Nov 9, 2011, at 1:08 PM, Jim Grosbach wrote:
>>> On Nov 9, 2011, at 10:49 AM, Jim Grosbach wrote:
>>>>>
>>>>> It's not good, but people do it. Also constructing enums via & and | etc. It'd be nice to be able to get the name of whatever it is that the code generator actually produced :)
>>>>>
>>>>
>>>> Agreed. LLVM itself does this sort of thing pretty frequently, actually, and having the enum names and values available in the debugger is very nice.
>>>
>>> Wouldn't it be better to emit the enumerators that are actually used in a translation unit, instead of emitting all of them?  If we're not emitting them when used, that would be a problem, but I don't see a reason to care about enumerators that are never used anywhere in a program.
>
>> Say something is going wrong for value E of the enum because the default case doesn't work for it. I want to put a conditional breakpoint on the switch statement for 'a == E', but E isn't explicitly used anywhere in the program since the call to foo passes the value as an int, constructed in some arbitrary manner.
>
> So long as E is used in another file, you'll be able to do this.  If E isn't used anywhere in your app, then it's dead and doesn't seem interesting.
>

In this case though it's not dead as long as there's a switch on the enum with a default case :) i.e. "We should never get here, but yet we are, what type that I'm not expecting did we get here with?"

>> Also consider the case where the caller of foo() (which does explicitly use E) is in a dylib we haven't loaded yet, and thus haven't looked at the debug info for. I want to set that same breakpoint.
>
> -flimit-debug-info=0.  The entire idea of -gused is that you're building (almost) all of your app with debug info.  It's the default because this matches the common scenario.
>
> In short, I still don't understand why we'd want to emit all the enumerators just because an enum type is used. :)

The above :)

That said if we want to conditionalize this on flimit-debug-info that's fine, however, I think that since the type is used all elements of the type are used. Also, I doubt that (except for pathological cases) enum switch values are our greatest worry for size of the string table :)

-eric

SU aliasing

Sergei Larin
Sorry if this is described somewhere, but is there any hidden danger in doing
the following (see the function below) during DAG lowering?

The problem is that at some point I seem to get stale data from
MMOa->getValue(), which produces a bogus alias result.
It also seems that there should be a more elegant way to implement the
function.

Thanks.

// Returns true only if alias analysis reports that the first memory operands
// of the two scheduling units do not alias. Returns false in every other case
// (missing units or nodes, non-memory nodes, or a possible alias).
bool SUsDoNotAliasEachOther (SUnit *SU1, SUnit *SU2, AliasAnalysis *AA) {
  if (!SU1 || !SU2)  return false;
  // Only SDNode-based units are handled here.
  if (SU1->isInstr() || SU2->isInstr())  return false;
  SDNode *SDN1 = SU1->getNode();
  SDNode *SDN2 = SU2->getNode();

  if (!SDN1 || !SDN2)  return false;

  if (SDN1->isMachineOpcode() && SDN2->isMachineOpcode()) {
    MachineMemOperand *MMOa = NULL;
    MachineMemOperand *MMOb = NULL;
    const MachineSDNode *MNb = dyn_cast<MachineSDNode>(SDN2);
    const MachineSDNode *MNa = dyn_cast<MachineSDNode>(SDN1);
    // Check that they are LD/ST
    // (only the first memory operand of each node is examined)
    if (MNa) MMOa = !MNa->memoperands_empty() ?
                    (*MNa->memoperands_begin()) : NULL;
    if (MNb) MMOb = !MNb->memoperands_empty() ?
                    (*MNb->memoperands_begin()) : NULL;
    // Both operands need an underlying IR value to query alias analysis.
    if (MMOa && MMOa->getValue() && MMOb && MMOb->getValue()) {

      if (!AA->alias (MMOa->getValue(), MMOa->getSize(),
                      MMOb->getValue(), MMOb->getSize())) {
        return true;
      }
    }
  }
  return false;
}

Sergei
