[llvm-dev] DW_OP_implicit_pointer design/implementation in general


[llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
Hey folks,

Would you all mind having a bit of a design discussion around the feature both at the DWARF level and the LLVM implementation? It seems like what's currently being proposed/reviewed (based on the DWARF feature as spec'd) is a pretty big change & I'm not sure I understand the motivation, exactly.

The core point of my confusion: Why does describing the thing a pointer points to require describing a named variable that it points to? What if it doesn't point to a named variable? 

Seems like there should be a way to describe that situation - and that doing so would be a more general solution than one limited to only describing pointers that point to named variables. And would be a simpler implementation in LLVM - without having to deconstruct variables during optimizations, etc, to track one variable's value being concretely related to another variable's value.

- David

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev


> On Nov 14, 2019, at 1:21 PM, David Blaikie <[hidden email]> wrote:
>
> Hey folks,
>
> Would you all mind having a bit of a design discussion around the feature both at the DWARF level and the LLVM implementation? It seems like what's currently being proposed/reviewed (based on the DWARF feature as spec'd) is a pretty big change & I'm not sure I understand the motivation, exactly.
>
> The core point of my confusion: Why does describing the thing a pointer points to require describing a named variable that it points to? What if it doesn't point to a named variable?

Without having looked at the motivational text when the feature was proposed to DWARF, my assumption was that this is similar to how bounds for variable-length arrays are implemented, where a (potentially) artificial variable is created by the compiler in order to have something to refer to. In retrospect I find the entire specification of DW_OP_implicit_pointer to be strangely specific/limited (why one hard-coded offset instead of an arbitrary expression?), but that ship has sailed for DWARF 5 and I'm to blame for not voicing that concern earlier.
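
To make the "one hard-coded offset" point concrete, here is a rough C++ sketch (an illustration, not LLVM or DWARF-library code). In DWARF 5 the operation takes two operands: a reference to a DIE and a signed constant byte offset. The ImplicitPointer struct, the Foo type and both helper functions are hypothetical names invented for this sketch, and the array helper assumes int elements. A pointer to a member has a constant offset and fits the encoding; a pointer to an array element with a runtime index has no single constant to encode.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

struct Foo { int x; int y; };

// Hypothetical model of the two operands of DW_OP_implicit_pointer.
struct ImplicitPointer {
    std::uint64_t target_die;   // offset of the DIE that describes the pointee
    std::int64_t  byte_offset;  // constant byte offset into the pointee's value
};

// &obj.y is expressible: the offset is a compile-time constant.
constexpr ImplicitPointer pointer_to_y(std::uint64_t obj_die) {
    return {obj_die, static_cast<std::int64_t>(offsetof(Foo, y))};
}

// &arr[i] is expressible only when i is known at compile time; with a
// runtime index there is no constant to put in the second operand.
std::optional<ImplicitPointer> pointer_to_element(
        std::uint64_t arr_die, std::optional<std::size_t> known_index) {
    if (!known_index) return std::nullopt;
    return ImplicitPointer{
        arr_die, static_cast<std::int64_t>(*known_index * sizeof(int))};
}
```

An arbitrary expression in place of the constant would cover the second case as well, which is the limitation being questioned here.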


-- adrian

>
> Seems like there should be a way to describe that situation - and that doing so would be a more general solution than one limited to only describing pointers that point to named variables. And would be a simpler implementation in LLVM - without having to deconstruct variables during optimizations, etc, to track one variable's value being concretely related to another variable's value.
>
> - David


Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev


On Thu, Nov 14, 2019 at 1:27 PM Adrian Prantl <[hidden email]> wrote:



> Without having looked at the motivational text when the feature was proposed to DWARF, my assumption was that this is similar to how bounds for variable-length arrays are implemented, where a (potentially) artificial variable is created by the compiler in order to have something to refer to.

I /sort/ of see that case as a bit different, because the array type needs to refer back into the function potentially (to use frame-relative, etc). I could think of other ways to do that in hindsight (like putting the array type definition inside the function to begin with & having the count describe the location directly, for instance).
 
> In retrospect I find the entire specification of DW_OP_implicit_pointer to be strangely specific/limited (why one hard-coded offset instead of an arbitrary expression?), but that ship has sailed for DWARF 5 and I'm to blame for not voicing that concern earlier.

Sure, but we don't have to implement it if we don't find it to be super useful/worthwhile, right? (if something else would be particularly more general/useful we could instead implement that as an extension, though of course there's cost to that in terms of consumer support, etc)
 





Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev


On Thu, Nov 14, 2019 at 1:33 PM David Blaikie <[hidden email]> wrote:



> I /sort/ of see that case as a bit different, because the array type needs to refer back into the function potentially (to use frame-relative, etc). I could think of other ways to do that in hindsight (like putting the array type definition inside the function to begin with & having the count describe the location directly, for instance).

Hey, what'd you know, GCC actually produces what I described:

  DW_TAG_array_type [6] *
    DW_AT_type [DW_FORM_ref4]       (cu + 0x0069 => {0x00000069} "int")
    DW_AT_sibling [DW_FORM_ref4]    (cu + 0x0083 => {0x00000083})

    DW_TAG_subrange_type [7]  
      DW_AT_type [DW_FORM_ref4]     (cu + 0x0083 => {0x00000083} "long unsigned int")
      DW_AT_upper_bound [DW_FORM_exprloc]   (DW_OP_fbreg -40, DW_OP_deref)


No artificial variable the way Clang does it:

  DW_TAG_subprogram
    DW_TAG_variable [4]
      DW_AT_location [DW_FORM_exprloc]      (DW_OP_fbreg -24)
      DW_AT_name [DW_FORM_strp]     ( .debug_str[0x000000a6] = "__vla_expr0")
      DW_AT_type [DW_FORM_ref4]     (cu + 0x0074 => {0x00000074} "long unsigned int")
      DW_AT_artificial [DW_FORM_flag_present]       (true)

    DW_TAG_variable
      ...
      DW_AT_type [DW_FORM_ref4]     (cu + 0x007b => {0x0000007b} "int[]")

  DW_TAG_array_type [7] *
    DW_AT_type [DW_FORM_ref4]       (cu + 0x006d => {0x0000006d} "int")

    DW_TAG_subrange_type [8]
      DW_AT_type [DW_FORM_ref4]     (cu + 0x008b => {0x0000008b} "__ARRAY_SIZE_TYPE__")
      DW_AT_lower_bound [DW_FORM_data1]     (0x00)
      DW_AT_count [DW_FORM_ref4]    (cu + 0x0051 => {0x00000051})

 
Might be nice to tidy that up some time. GCC's been doing this even in DWARF-2 mode (where it just uses FORM_block for the upper bound), and as far back as GCC 6.0 at least.

 



Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev

My reading of the DWARF issue is that it was fairly specifically designed to handle the case of a function taking parameters by pointer/reference, which is then inlined, and the caller is passing local objects rather than other pointers/references.  So:

 

void inline_me(foo *ptr) {
  // does something with ptr->x or *ptr
}

void caller() {
  foo actual_obj;
  inline_me(&actual_obj);
}

 

After inlining, maintaining a pointer to actual_obj might be sub-optimal, but after a “step in” to inline_me, the user wants to look at an expression spelled *ptr even though the actual_obj might not have a memory address (because fields are SROA’d into registers, or whatever).  This is where DW_OP_implicit_pointer saves the day; *ptr and ptr->x are still evaluatable expressions, which expressions are secretly indirecting through the DIE for actual_obj.

 

I think it is not widely applicable outside of that kind of scenario.
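
A compilable version of this scenario, as a minimal sketch: the members of foo and the body of inline_me are made up for illustration. Built with optimization and debug info (for example -O2 -g), a compiler may inline inline_me and split actual_obj into scalars. Whether the resulting DWARF actually uses DW_OP_implicit_pointer for ptr depends on the compiler and its version.

```cpp
// Hypothetical aggregate; the field names are assumptions for illustration.
struct foo {
    int x;
    int y;
};

// Small enough that an optimizer will typically inline it into its caller.
inline int inline_me(const foo *ptr) {
    // A user stepping in here may want to print *ptr or ptr->x.
    return ptr->x + ptr->y;
}

int caller() {
    // Candidate for SROA: after optimization the fields may live only in
    // registers (or be folded to constants), so the object has no address.
    foo actual_obj{1, 2};
    // After inlining, no real pointer to actual_obj needs to exist.
    return inline_me(&actual_obj);
}
```

With the object gone from memory, a debugger can only evaluate *ptr if the description of ptr refers back to the description of actual_obj.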

--paulr

 




Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev


On Thu, Nov 14, 2019 at 1:53 PM Robinson, Paul <[hidden email]> wrote:


 

> I think it is not widely applicable outside of that kind of scenario.


Any ideas why it wouldn't be more general to handle cases where the variable isn't named? Such as:

foo source();
void f(foo);
inline void sink(foo* p) {
  f(*p);
}
int main() {
  sink(&source());
}
 




Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev

| Any ideas why it wouldn't be more general to handle cases where the variable isn't named?

 

Couldn’t there be a DIE (flagged as artificial) to describe the return-value temp?  You’d need such a DIE if you wanted the debugger to be able to look at the return value from source() anyway, in the context of main() and in the absence of inlining.  And given that DIE, implicit_pointer within sink() can refer to it.

 




Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev


On Fri, Nov 15, 2019 at 8:07 AM Robinson, Paul <[hidden email]> wrote:

> | Any ideas why it wouldn't be more general to handle cases where the variable isn't named?
>
> Couldn’t there be a DIE (flagged as artificial) to describe the return-value temp?

There could be, though it lacks precedent so far as I can tell: there are very few cases of artificial variables being generated in Clang/LLVM. The array bound example Adrian gave is the only one I know of, and even that seems unnecessary, since GCC uses a different (& I think better/clearer/simpler) representation.

> You’d need such a DIE if you wanted the debugger to be able to look at the return value from source() anyway,


Not so far as I know - with GDB (& I assume LLDB) when you call a function and return from it (eg: "finish" or "step" that steps across the end of a function) the debugger prints out the return value (using the DW_AT_type of the DW_TAG_subprogram that was executing & its knowledge of the ABI to know where/how that value would be stored during the return) & you can actually then query it and do other things using the artificial variable name GDB provides

My example was slightly bogus: you can't take the address of a temporary in C++ like that, but you can take a reference to it. So, updating & fleshing out the test:

__attribute__((optnone)) int source() {
  return 3;
}
__attribute__((optnone)) void f(int) {
}
inline void sink(const int& p) {
  f(p);
}
int main() {
  sink(source());
}


& then playing that through GDB:

(gdb) start
Temporary breakpoint 1 at 0x401131: file var.cpp, line 10.
Starting program: /usr/local/google/home/blaikie/dev/scratch/a.out

Temporary breakpoint 1, main () at var.cpp:10
10        sink(source());
(gdb) s
source () at var.cpp:2
2         return 3;
(gdb) fin
Run till exit from #0  source () at var.cpp:2
main () at var.cpp:10
10        sink(source());
Value returned is $1 = 3
(gdb) s
sink (p=<optimized out>) at var.cpp:7
7         f(p);


It'd be nice if the value of 'p' could be printed there, but it seems that, without introducing artificial variables, implicit_pointer doesn't provide a way to do that. That seems to me like an unnecessary limitation & complication, both in the DWARF and in LLVM's intermediate representation, compared to having 'p's DW_AT_location describe the value being pointed to directly, without the need for another variable.

- Dave
 




Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev

Hi Dave,

 

Let me explain my point of view (apologies for the long mail).

 

The bigger goal is to enable the debugger to display the values of as many variables as possible (in theory ALL of them; in practice, all the important ones).

 

This can be achieved in multiple ways.

  1. Keeping the variable in memory.
  2. If the variable is optimized out, its value can be represented with DWARF expressions (DW_OP_stack_value, DW_OP_lit0, DW_OP_piece).
  3. If the variable is optimized out but its value can be seen in another variable (the statement ‘ptr=&obj’ implies ‘*ptr=obj’, so the value ‘ptr’ points to can be seen in ‘obj’), this is done using DW_OP_implicit_pointer. DW_OP_implicit_pointer only says that the value can be looked up implicitly in the OTHER variable, which is present. Its responsibility ends there: the value will be displayed ONLY if the OTHER variable has a value (possibly via DWARF expressions, as in way 2 above).
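
The three ways above can be modelled roughly as follows. This is a toy C++ sketch, not how LLVM or any debugger represents locations: every type and function name in it (InMemory, ConstantValue, ImplicitPtr, OptimizedOut, read_value, deref) is a hypothetical stand-in, and values are limited to int for brevity. It shows the point of way 3: dereferencing an implicit pointer yields a value only when the OTHER variable itself has one.

```cpp
#include <map>
#include <optional>
#include <string>
#include <variant>

// Hypothetical, simplified descriptions of where a variable's value lives.
struct InMemory      { int stored; };          // way 1: value kept in memory
struct ConstantValue { int value; };           // way 2: value given by an expression
struct ImplicitPtr   { std::string target; };  // way 3: points at another variable
struct OptimizedOut  {};                       // no description at all

using Description = std::variant<InMemory, ConstantValue, ImplicitPtr, OptimizedOut>;
using Scope = std::map<std::string, Description>;

// Value of a non-pointer variable, if its description provides one.
std::optional<int> read_value(const Scope &scope, const std::string &name) {
    auto it = scope.find(name);
    if (it == scope.end()) return std::nullopt;
    if (auto *m = std::get_if<InMemory>(&it->second)) return m->stored;
    if (auto *c = std::get_if<ConstantValue>(&it->second)) return c->value;
    return std::nullopt;  // optimized out, or not a plain value
}

// Evaluate '*name' for a pointer that is described implicitly.
std::optional<int> deref(const Scope &scope, const std::string &name) {
    auto it = scope.find(name);
    if (it == scope.end()) return std::nullopt;
    if (auto *p = std::get_if<ImplicitPtr>(&it->second))
        return read_value(scope, p->target);  // redirect to the other variable
    return std::nullopt;
}
```

For example, with ‘obj’ described as ConstantValue{3} and ‘ptr’ as ImplicitPtr{"obj"}, deref(scope, "ptr") yields 3; if ‘obj’ is OptimizedOut, the same call yields no value.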

 

Coming back to your test case (thanks for sharing it).

Notice that, for the same test case, the value is propagated to function ‘f’.

---------------

__attribute__((optnone)) int source() {
  return 3;
}
__attribute__((optnone)) void f(int i) {
}
inline void sink(const int& p) {
  f(p);
}
int main() {
  sink(source());
}

------------

With gdb

(gdb) bt
#0  f (i=3) at test1.cc:5
#1  0x00000000004004b0 in sink (p=<optimized out>) at test1.cc:7
#2  main () at test1.cc:10

 

In this case we are not able to inspect the variable in ‘sink’, but we can inspect it in ‘f’, because the compiler treats it as unimportant in ‘sink’ while still propagating it to ‘f’.

Let's change the test case a little.

----------------------

__attribute__((optnone)) int source() {
  return 3;
}
__attribute__((optnone)) void f(const int& i) {
}
inline void sink(const int& p) {
  f(p);
}
int main() {
  sink(source());
}

-----------------------

#0  f (i=@0x7fffffffe3e4: 3) at test2.cc:5
#1  0x00000000004004b8 in sink (p=@0x7fffffffe3e4: 3) at test2.cc:7
#2  main () at test2.cc:10

-----------------------

In this case we can get the value of ‘p’ in ‘sink’ as well. That happens because the compiler considers it important and decides to keep the temporary (it stores to it).

 

IMO the variable value not seen is corner case (it doesn’t apply to pointers, it applies to references only when it refers to a temporary and compiler does optimize that temporary). But even if we decide to get this case displayed, it comes under bigger goal of displaying a variable value. We can solve this problem in many ways.

  • Way-1 above) Keep the temporary (and the store to it). This is a performance penalty, so the compiler decides whether the variable is that important.
  • Way-3 above) DW_OP_implicit_pointer only implies/redirects a variable to something that is available in DWARF, so ideally this case doesn’t fall under it. But yes, to extend it we can generate an artificial DIE for the temporary variable, as suggested by Paul.
  • Way-2 above) We can use any existing/new DWARF expression to keep the value in the variable itself.

 

IMO, since DW_OP_implicit_pointer is now part of the standard, it would be good to comply. Moreover, the popular tool GNU gdb will work with it, which would be an added advantage. We can start with the existing DW_OP_implicit_pointer and incrementally widen the scope toward the bigger goal of displaying all the important variables (with or without the help of implicit pointers). Thanks.

 

Regards,

Alok

 

 

From: David Blaikie <[hidden email]>
Sent: Friday, November 15, 2019 11:24 PM
To: Robinson, Paul <[hidden email]>
Cc: Adrian Prantl <[hidden email]>; Sharma, Alok Kumar <[hidden email]>; Jonas Devlieghere <[hidden email]>; llvm-dev <[hidden email]>
Subject: Re: DW_OP_implicit_pointer design/implementation in general

 


 

 

On Fri, Nov 15, 2019 at 8:07 AM Robinson, Paul <[hidden email]> wrote:

| Any ideas why it wouldn't be more general to handle cases where the variable isn't named?

 

Couldn’t there be a DIE (flagged as artificial) to describe the return-value temp?


There could be - though there are very few (the array bound example Adrian gave is the only one I know of - and even that seems unnecessary/GCC uses a different (& I think better/clearer/simpler) representation) cases of artificial variables being generated in Clang/LLVM - it lacks precedent so far as I can tell.
 

  You’d need such a DIE if you wanted the debugger to be able to look at the return value from source() anyway,


Not so far as I know - with GDB (& I assume LLDB) when you call a function and return from it (eg: "finish" or "step" that steps across the end of a function) the debugger prints out the return value (using the DW_AT_type of the DW_TAG_subprogram that was executing & its knowledge of the ABI to know where/how that value would be stored during the return) & you can actually then query it and do other things using the artificial variable name GDB provides

My example was slightly bogus - you can't take the address of a temporary in C++ like that, but you can take a reference to it, so updating & fleshing out the test:

__attribute__((optnone)) int source() {
  return 3;
}
__attribute__((optnone)) void f(int) {
}
inline void sink(const int& p) {
  f(p);
}
int main() {
  sink(source());
}


& then playing that through GDB:

(gdb) start
Temporary breakpoint 1 at 0x401131: file var.cpp, line 10.
Starting program: /usr/local/google/home/blaikie/dev/scratch/a.out

Temporary breakpoint 1, main () at var.cpp:10
10        sink(source());
(gdb) s
source () at var.cpp:2
2         return 3;
(gdb) fin
Run till exit from #0  source () at var.cpp:2
main () at var.cpp:10
10        sink(source());
Value returned is $1 = 3
(gdb) s
sink (p=<optimized out>) at var.cpp:7
7         f(p);


It'd be nice if the value of 'p' could be printed there, but it seems without introducing artificial variables, the implicit_pointer doesn't provide a way to do that & that seems to me like an unnecessary limitation & complication in the DWARF and in LLVM's intermediate representation compared to having 'p's DW_AT_location describe the value being pointed to directly without the need for another variable?

- Dave
 

in the context of main() and in the absence of inlining.  And given that DIE, implicit_pointer within sink() can refer to it.

 

From: David Blaikie <[hidden email]>
Sent: Thursday, November 14, 2019 5:32 PM
To: Robinson, Paul <[hidden email]>
Cc: Adrian Prantl <[hidden email]>; [hidden email]; Jonas Devlieghere <[hidden email]>; llvm-dev <[hidden email]>
Subject: Re: DW_OP_implicit_pointer design/implementation in general

 

 

 

On Thu, Nov 14, 2019 at 1:53 PM Robinson, Paul <[hidden email]> wrote:

My reading of the DWARF issue is that it was fairly specifically designed to handle the case of a function taking parameters by pointer/reference, which is then inlined, and the caller is passing local objects rather than other pointers/references.  So:

 

void inline_me(foo *ptr) {

 does something with ptr->x or *ptr;

}

void caller() {

  foo actual_obj;

  inline_me(&actual_obj);

}

 

After inlining, maintaining a pointer to actual_obj might be sub-optimal, but after a “step in” to inline_me, the user wants to look at an expression spelled *ptr even though the actual_obj might not have a memory address (because fields are SROA’d into registers, or whatever).  This is where DW_OP_implicit_pointer saves the day; *ptr and ptr->x are still evaluatable expressions, which expressions are secretly indirecting through the DIE for actual_obj.
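
A compilable sketch of the scenario above, for concreteness. The fields of foo and the arithmetic in inline_me are assumptions made up for illustration; the original only says the function "does something" with ptr->x or *ptr.

```cpp
#include <cassert>

struct foo {
  int x;
  int y;
};

// Small enough that an optimizer will normally inline it into caller().
// A user stepping in here may still ask the debugger for *ptr or ptr->x.
inline int inline_me(foo* ptr) {
  return ptr->x + ptr->y;
}

int caller() {
  foo actual_obj{1, 2};           // after SROA, x and y may exist only as registers or constants
  return inline_me(&actual_obj);  // after inlining, no real pointer to actual_obj has to exist
}
```

At -O2 a compiler is free to fold caller() down to returning a constant, at which point ‘ptr’ has no runtime value at all; that is the situation where an implicit pointer to the DIE for actual_obj keeps *ptr evaluatable.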

 

I think it is not widely applicable outside of that kind of scenario.

 

Any ideas why it wouldn't be more general to handle cases where the variable isn't named? Such as:

foo source();
void f(foo);
inline void sink(foo* p) {
  f(*p);
}
int main() {
  sink(&source());

}

 

--paulr

 

From: David Blaikie <[hidden email]>
Sent: Thursday, November 14, 2019 4:34 PM
To: Adrian Prantl <[hidden email]>
Cc: [hidden email]; Robinson, Paul <[hidden email]>; Jonas Devlieghere <[hidden email]>; llvm-dev <[hidden email]>
Subject: Re: DW_OP_implicit_pointer design/implementation in general

 

 

 

On Thu, Nov 14, 2019 at 1:27 PM Adrian Prantl <[hidden email]> wrote:



> On Nov 14, 2019, at 1:21 PM, David Blaikie <[hidden email]> wrote:
>
> Hey folks,
>
> Would you all mind having a bit of a design discussion around the feature both at the DWARF level and the LLVM implementation? It seems like what's currently being proposed/reviewed (based on the DWARF feature as spec'd) is a pretty big change & I'm not sure I understand the motivation, exactly.
>
> The core point of my confusion: Why does describing the thing a pointer points to require describing a named variable that it points to? What if it doesn't point to a named variable?

Without having looked at the motivational text when the feature was proposed to DWARF, my assumption was that this is similar to how bounds for variable-length arrays are implemented, where a (potentially) artificial variable is created by the compiler in order to have something to refer to.


I /sort/ of see that case as a bit different, because the array type needs to refer back into the function potentially (to use frame-relative, etc). I could think of other ways to do that in hindsight (like putting the array type definition inside the function to begin with & having the count describe the location directly, for instance).
 

In retrospect I find the entire specification of DW_OP_implicit_pointer to be strangely specific/limited (why one hard-coded offset instead of an arbitrary expression?), but that ship has sailed for DWARF 5 and I'm to blame for not voicing that concern earlier.

 

Sure, but we don't have to implement it if we don't find it to be super useful/worthwhile, right? (if something else would be particularly more general/useful we could instead implement that as an extension, though of course there's cost to that in terms of consumer support, etc)

 



-- adrian

>
> Seems like there should be a way to describe that situation - and that doing so would be a more general solution than one limited to only describing pointers that point to named variables. And would be a simpler implementation in LLVM - without having to deconstruct variables during optimizations, etc, to track one variable's value being concretely related to another variable's value.
>
> - David



Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
In reply to this post by Doerfert, Johannes via llvm-dev
On 15/11/2019 18:54, David Blaikie via llvm-dev wrote:

>        You’d need such a DIE if you wanted the debugger to be able to
>     look at the return value from source() anyway,
>
>
> Not so far as I know - with GDB (& I assume LLDB) when you call a
> function and return from it (eg: "finish" or "step" that steps across
> the end of a function) the debugger prints out the return value (using
> the DW_AT_type of the DW_TAG_subprogram that was executing & its
> knowledge of the ABI to know where/how that value would be stored during
> the return) & you can actually then query it and do other things using
> the artificial variable name GDB provides

[Not really related to DW_OP_implicit_pointer, but I thought this was
worth mentioning.]

I'm not sure how gdb does this (though I don't know how it could do
anything different), but the way this is implemented in lldb is a bit
dodgy, and often does not work for non-trivial return types (== types
that cannot be returned "by value" in registers).

The reason for that is that in these cases the ABI usually specifies
that the address to store these return values is passed to the callee
via some register. This register is usually also volatile, and the
callee is free to reuse it for something else. That means that at the
end of the "finish", in general, we're unable to know what the value of
that register was at the *entry* to that function.

What lldb does right now is read the value of this register *after* the
function returns, and hopes that it has not been modified. This works
for simple leaf functions, but it can fail easily in more complex
scenarios, particularly when optimizations are enabled.
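
To make the "returned in memory" case concrete, here is a small sketch. It assumes C++17 (guaranteed copy elision) and an Itanium-style C++ ABI, where a type with a non-trivial copy constructor is returned through storage whose address the caller passes in; the type Tracked and the helper names are made up for illustration.

```cpp
#include <cassert>

// The user-provided copy constructor makes this type non-trivial for the
// purposes of calls, so it is not returned "by value" in registers.
struct Tracked {
  const void* self;  // address at which this object was constructed
  Tracked() : self(this) {}
  Tracked(const Tracked&) : self(this) {}
};

// The callee constructs the result directly in the storage chosen by the caller.
Tracked make() { return Tracked(); }

// True when the object seen by the caller is the very object the callee built,
// i.e. the callee was handed the address of the caller's result slot.
bool constructed_in_callers_storage() {
  Tracked t = make();
  return t.self == &t;
}
```

The address the callee received is exactly the piece of information a debugger needs after "finish", and nothing in the DWARF for make() records where that address can still be found once the function has returned.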

Anyway, what I'm trying to say is that having a more trustworthy method
of specifying the value/location of the function result would not be a
completely bad idea. Or maybe there already is one and we're not using it?

pl

Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
In reply to this post by Doerfert, Johannes via llvm-dev
Hi llvm-dev@,

Switching focus to the LLVM implementation, the significant change is
using dbg.value's first operand to refer to a DILocalVariable, rather
than a Value. There's some impedance mismatch here, because all the
documentation (for example in the DbgVariableIntrinsic class)
expresses everything in terms of the variable's location, whereas
implicit pointers don't have a location as they represent an extra
level of indirection. This is best demonstrated by the change to
IntrinsicInst.cpp in this patch [0] -- calling getVariableLocation on
any normal dbg.value will return the location's Value, but if it's an
implicit pointer then you'll get the meaningless MetadataAsValue
wrapper back instead. This isn't the variable location, might surprise
existing handlers of dbg.values, and just seems a little off.
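
A toy model of that mismatch, for illustration only. None of these types are LLVM classes: SsaValue stands in for an IR Value, VariableRef for a DILocalVariable wrapped as metadata, and variable_location for the accessor discussed above.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct SsaValue { int id; };               // stand-in for an IR Value
struct VariableRef { std::string name; };  // stand-in for a wrapped DILocalVariable

// What the first operand of the intrinsic may hold once implicit pointers are allowed.
using FirstOperand = std::variant<SsaValue, VariableRef>;

// Returns the SSA value holding the variable's location, or nullptr when the
// operand refers to another variable (the implicit-pointer case).
const SsaValue* variable_location(const FirstOperand& op) {
  return std::get_if<SsaValue>(&op);
}

// Existing handlers that assume "there is always a location" need a check like this.
bool has_location(const FirstOperand& op) {
  return variable_location(op) != nullptr;
}
```

Every consumer that previously used the result unconditionally would now need the has_location check, which is the kind of surprise for existing handlers described above.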

I can see why this route has been taken, but putting a non-Value in
dbg.values really changes what dbg.values represent: a variable
location in the IR. Is there any appetite out there for using a
different intrinsic, something like 'dbg.loc.implicit', instead of
using dbg.value? IMO it would be worthwhile to separate:
 * Debug intrinsics where their position in the IR is important, from
 * Debug intrinsics where both their position in the IR, _and_ a Value
in the IR, are important.
Of which (I think) implicit pointers are the former, and current [2]
dbg.values are the latter. This would also avoid putting
DW_OP_implicit_pointer into expressions in the IR, pre-isel at least.

There's also Vedant's suggestion [1] for linking implicit pointer
locations with the dbg.values of the underlying DILocalVariable. I
suspect the presence of control flow might make it difficult (there's
no dbg.phi instruction), but I like the idea of having more explicit
links in the IR, it would be much clearer to interpret what's going
on.

[0] https://reviews.llvm.org/D69999?id=229790
[1] https://reviews.llvm.org/D69886#1736182
[2] Technically dbg.value(undef,...) is the former too, I guess.

--
Thanks,
Jeremy

Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
In reply to this post by Doerfert, Johannes via llvm-dev


> -----Original Message-----
> From: Pavel Labath <[hidden email]>
> Sent: Monday, November 18, 2019 5:08 AM
> To: David Blaikie <[hidden email]>; Robinson, Paul
> <[hidden email]>
> Cc: llvm-dev <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in
> general
>
> On 15/11/2019 18:54, David Blaikie via llvm-dev wrote:
> >        You’d need such a DIE if you wanted the debugger to be able to
> >     look at the return value from source() anyway,
> >
> >
> > Not so far as I know - with GDB (& I assume LLDB) when you call a
> > function and return from it (eg: "finish" or "step" that steps across
> > the end of a function) the debugger prints out the return value (using
> > the DW_AT_type of the DW_TAG_subprogram that was executing & its
> > knowledge of the ABI to know where/how that value would be stored during
> > the return) & you can actually then query it and do other things using
> > the artificial variable name GDB provides
>
> [Not really related to DW_OP_implicit_pointer, but I thought this was
> worth mentioning.]
>
> I'm not sure how gdb does this (though I don't know how it could do
> anything different), but the way this is implemented in lldb is a bit
> dodgy, and often does not work for non-trivial return types (== types
> that cannot be returned "by value" in registers).
>
> The reason for that is that in these cases the ABI usually specifies
> that the address to store these return values is passed to the callee
> via some register. This register is usually also volatile, and the
> callee is free to reuse it for something else. That means that at the
> end of the "finish", in general, we're unable to know what the value of
> that register was at the *entry* to that function.
>
> What lldb does right now is read the value of this register *after* the
> function returns, and hopes that it has not been modified. This works
> for simple leaf functions, but it can fail easily in more complex
> scenarios, particularly when optimizations are enabled.
>
> Anyway, what I'm trying to say is that having a more trustworthy method
> of specifying the value/location of the function result would not be a
> completely bad idea. Or maybe there already is one and we're not using it?

I was imagining a more trustworthy method, although it's something that
DWARF does not currently specify.  Location of the return *address*, yes;
location of the return *value*, no (only its type).

Currently DWARF assumes the debugger is aware of the ABI used by the
platform and language; this was easy enough back in the days when return
values had simple types and usually one canonical register was enough.
Larger and more complicated return values probably deserve their own DIE,
with location info of some kind.
--paulr

>
> pl

Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
In reply to this post by Doerfert, Johannes via llvm-dev

I’ve been reminded of PR37682, where a function with a reference parameter might spend all its time computing the “referenced” value in a temp, and only move the final value back to the referenced object at the end.  This is clearly a situation that could benefit from DW_OP_implicit_pointer, and there is really no other-object DIE for it to refer to.  Given the current spec, the compiler would need to produce a DW_TAG_dwarf_procedure for the parameter DIE to refer to.  Appendix D (Figure D.61) has an example of this construction, although it’s a more contrived source example.

 

Does it have to be spec’d this way?  I think the spec as given is general enough to support DW_OP_implicit_pointer to an aggregate, with different locations for each member.  You could probably come up with a way to specify simpler cases more simply, although you’d need a new DW_OP to do that—there’s no explicit FORM describing the operand of a DW_OP, so we can’t just mess with how the operands are interpreted.

--paulr

 



Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
In reply to this post by Doerfert, Johannes via llvm-dev


> On Nov 18, 2019, at 8:33 AM, Jeremy Morse <[hidden email]> wrote:
>
> Hi llvm-dev@,
>
> Switching focus to the LLVM implementation, the significant change is
> using dbg.value's first operand to refer to a DILocalVariable, rather
> than a Value. There's some impedance mismatch here, because all the
> documentation (for example in the DbgVariableIntrinsic class)
> expresses everything in terms of the variable's location, whereas
> implicit pointers don't have a location as they represent an extra
> level of indirection. This is best demonstrated by the change to
> IntrinsicInst.cpp in this patch [0] -- calling getVariableLocation on
> any normal dbg.value will return the location's Value, but if it's an
> implicit pointer then you'll get the meaningless MetadataAsValue
> wrapper back instead. This isn't the variable location, might surprise
> existing handlers of dbg.values, and just seems a little off.
>
> I can see why this route has been taken, but by putting a non-Value in
> dbg.value's, it really changes what dbg.values represent, a variable
> location in the IR. Is there any appetite out there for using a
> different intrinsic, something like 'dbg.loc.implicit', instead of
> using dbg.value? IMO it would be worthwhile to separate:
> * Debug intrinsics where their position in the IR is important, from
> * Debug intrinsics where both their position in the IR, _and_ a Value
> in the IR, are important.
> Of which (I think) implicit pointers are the former, and current [2]
> dbg.values are the latter. This would also avoid putting
> DW_OP_implicit_pointer into expressions in the IR, pre-isel at least.
>


On that particular point, what I would like to see is a generalization of dbg.value: currently llvm.dbg.value binds an SSA value (including constants and undef) and a DIExpression to a DILocalVariable at a position in the instruction stream. That first SSA value argument is an implicit first element in the DIExpression.

A more general form would be a more printf-like signature:

llvm.dbg.value(DILocalVariable, DIExpression, ...)

for example

llvm.dbg.value_new(DILocalVariable("x"), DIExpression(DW_OP_LLVM_arg0), %x)
llvm.dbg.value_new(DILocalVariable("y"), DIExpression(DW_OP_LLVM_arg0, DW_OP_LLVM_arg1, DW_OP_plus),
                   %ptr, %ofs)
llvm.dbg.value_new(DILocalVariable("z"), DIExpression(DW_OP_implicit_pointer, DW_OP_LLVM_arg0, 32),
                   DILocalVariable("base"))
llvm.dbg.value_new(DILocalVariable("c"), DIExpression(DW_OP_constu, 1))

The mandatory arguments would be the variable and the expression, and an arbitrary number of SSA values and potentially other variables.
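
A minimal sketch of how such an expression could be evaluated against a list of arguments, as a toy stack machine. OpKind::Arg models the proposed DW_OP_LLVM_argN placeholder; the types and the evaluate function are hypothetical and are not LLVM's DIExpression API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

enum class OpKind { Arg, ConstU, Plus };  // Arg models the proposed DW_OP_LLVM_argN

struct Op {
  OpKind kind;
  std::uint64_t operand;  // argument index for Arg, literal for ConstU, unused for Plus
};

// Evaluate `expr` with the trailing intrinsic arguments supplied in `args`.
// Returns nothing for a malformed expression (bad index, stack underflow, empty result).
std::optional<std::uint64_t> evaluate(const std::vector<Op>& expr,
                                      const std::vector<std::uint64_t>& args) {
  std::vector<std::uint64_t> stack;
  for (const Op& op : expr) {
    switch (op.kind) {
      case OpKind::Arg:
        if (op.operand >= args.size()) return std::nullopt;
        stack.push_back(args[op.operand]);
        break;
      case OpKind::ConstU:
        stack.push_back(op.operand);
        break;
      case OpKind::Plus: {
        if (stack.size() < 2) return std::nullopt;
        std::uint64_t rhs = stack.back();
        stack.pop_back();
        std::uint64_t lhs = stack.back();
        stack.pop_back();
        stack.push_back(lhs + rhs);
        break;
      }
    }
  }
  if (stack.empty()) return std::nullopt;
  return stack.back();
}
```

With this shape, the "y" example above is the expression {Arg 0, Arg 1, Plus} evaluated with the arguments (%ptr, %ofs), and the "c" example is {ConstU 1} with no arguments at all.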


As far as DW_OP_implicit_pointer in particular is concerned, we could also treat the peculiarities of DW_OP_implicit_pointer as a DWARF implementation detail, introduce DW_OP_LLVM_implicit_pointer which transforms the top-of-stack into an implicit pointer (similar to DW_OP_stack_value) and have the DWARF backend insert an artificial variable on the fly.

LLVM IR:

llvm.dbg.value(%base, DILocalVariable("z"), DIExpression(DW_OP_LLVM_implicit_pointer))

AsmPrinter would expand this into two DW_TAG_variable tags with one location (list) entry each.
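
A rough sketch of that expansion, using a made-up ToyDie record rather than LLVM's DIE classes. The textual location strings are placeholders for illustration and are not real DWARF encodings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a DW_TAG_variable entry (hypothetical type).
struct ToyDie {
  std::string name;      // empty for the compiler-generated variable
  bool artificial;       // models DW_AT_artificial
  std::string location;  // textual location description, illustration only
};

// Expand one IR-level "implicit pointer to <value>" record into two DIEs:
// an artificial variable describing the pointee, and the user's pointer
// variable whose location is an implicit pointer to that artificial DIE.
std::vector<ToyDie> lower_implicit_pointer(const std::string& var,
                                           const std::string& pointee_location) {
  ToyDie pointee{"", true, pointee_location};
  ToyDie pointer{var, false, "DW_OP_implicit_pointer(<artificial DIE>, 0)"};
  return {pointee, pointer};
}
```

For the "z" example, lower_implicit_pointer("z", "<location of %base>") yields the unnamed artificial variable first and then ‘z’ itself, each with a single location entry.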

-- adrian

> There's also Vedant's suggestion [1] for linking implicit pointer
> locations with the dbg.values of the underlying DILocalVariable. I
> suspect the presence of control flow might make it difficult (there's
> no dbg.phi instruction), but I like the idea of having more explicit
> links in the IR, it would be much clearer to interpret what's going
> on.
>
> [0] https://reviews.llvm.org/D69999?id=229790
> [1] https://reviews.llvm.org/D69886#1736182
> [2] Technically dbg.value(undef,...) is the former too, I guess.
>
> --
> Thanks,
> Jeremy


Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev


> On Nov 19, 2019, at 9:41 AM, Adrian Prantl via llvm-dev <[hidden email]> wrote:
>
>
>
>> On Nov 18, 2019, at 8:33 AM, Jeremy Morse <[hidden email]> wrote:
>>
>> Hi llvm-dev@,
>>
>> Switching focus to the LLVM implementation, the significant change is
>> using dbg.value's first operand to refer to a DILocalVariable, rather
>> than a Value. There's some impedance mismatch here, because all the
>> documentation (for example in the DbgVariableIntrinsic class)
>> expresses everything in terms of the variable's location, whereas
>> implicit pointers don't have a location as they represent an extra
>> level of indirection. This is best demonstrated by the change to
>> IntrinsicInst.cpp in this patch [0] -- calling getVariableLocation on
>> any normal dbg.value will return the location's Value, but if it's an
>> implicit pointer then you'll get the meaningless MetadataAsValue
>> wrapper back instead. This isn't the variable location, might surprise
>> existing handlers of dbg.values, and just seems a little off.
>>
>> I can see why this route has been taken, but by putting a non-Value in
>> dbg.value's, it really changes what dbg.values represent, a variable
>> location in the IR. Is there any appetite out there for using a
>> different intrinsic, something like 'dbg.loc.implicit', instead of
>> using dbg.value? IMO it would be worthwhile to separate:
>> * Debug intrinsics where their position in the IR is important, from
>> * Debug intrinsics where both their position in the IR, _and_ a Value
>> in the IR, are important.
>> Of which (I think) implicit pointers are the former, and current [2]
>> dbg.values are the latter. This would also avoid putting
>> DW_OP_implicit_pointer into expressions in the IR, pre-isel at least.
>>
>
>
> On that particular point, I would like to see is a generalization of dbg.value: Currently llvm.dbg.value binds an SSA value (including constants and undef) and a DIExpression to a DILocalVariable at a position in the instruction stream. That first SSA value argument is an implicit first element in the DIExpression.
>
> A more general form would be a more printf-like signature:
>
> llvm.dbg.value(DILocalVariable, DIExpression, ...)
>
> for example
>
> llvm.dbg.value_new(DILocalVariable("x"), DIExpression(DW_OP_LLVM_arg0), %x)
> llvm.dbg.value_new(DILocalVariable("y"), DIExpression(DW_OP_LLVM_arg0, DW_OP_LLVM_arg1, DW_OP_plus),
>                   %ptr, %ofs)
> llvm.dbg.value_new(DILocalVariable("z"), DIExpression(DW_OP_implicit_pointer, DW_OP_LLVM_arg0, 32),
>                   DILocalVariable("base"))
> llvm.dbg.value_new(DILocalVariable("c"), DIExpression(DW_OP_constu, 1))
>
> The mandatory arguments would be the variable and the expression, and an arbitrary number of SSA values and potentially other variables.

I don't have a strong opinion on representation. I can see how having a dedicated instruction to model implicit pointers would aid readability & be simpler to document/grok, but perhaps in the future we'll want to support other operations that refer to variable DIEs. In the short term migrating to an extended dbg.value representation might take more work. Alok, wdyt?

vedant

>
>
> As far as DW_OP_LLVM_implicit_pointer in particular is concerned, we could also treat the peculiarities of DW_OP_implicit_pointer as a DWARF implementation detail, introduce DW_OP_LLVM_implicit_pointer which transforms the top-of-stack into an implicit pointer (similar to DW_OP_stack_value) and have the DWARF backend insert an artificial variable on the fly.
>
> LLVM IR:
>
> llvm.dbg.value(%base, DILocalVariable("z"), DIExpression(DW_OP_LLVM_implicit_pointer))
>
> AsmPrinter would expand this into two DW_TAG_variable tags with one location (list) entry each.
>
> -- adrian
>
>> There's also Vedant's suggestion [1] for linking implicit pointer
>> locations with the dbg.values of the underlying DILocalVariable. I
>> suspect the presence of control flow might make it difficult (there's
>> no dbg.phi instruction), but I like the idea of having more explicit
>> links in the IR, it would be much clearer to interpret what's going
>> on.
>>
>> [0] https://reviews.llvm.org/D69999?id=229790
>> [1] https://reviews.llvm.org/D69886#1736182
>> [2] Technically dbg.value(undef,...) is the former too, I guess.
>>
>> --
>> Thanks,
>> Jeremy
>


Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
> I don't have a strong opinion on representation. I can see how having a dedicated instruction to model implicit pointers would aid readability & be simpler to document/grok, but perhaps in the future we'll want to support other operations that refer to variable DIEs. In the short term migrating to an extended dbg.value representation might take more work. Alok, wdyt?

Here is what I think about each suggestion.

DW_OP_LLVM_implicit_pointer
  * It is a good suggestion to include this in LLVM IR, because its representation and specification (types of operands) differ slightly from the actual DWARF operation DW_OP_implicit_pointer. When the actual DWARF info is created, it is converted to DW_OP_implicit_pointer. This is implemented, and the patch has been updated for it.

DW_OP_LLVM_arg0
  * This is a good suggestion and will help readability. It is also implemented and is available in the updated patch.

Splitting dbg.value
  * This is also a good idea from a readability point of view. It also opens up the possibility of extension, as explained below.
Since dbg.value currently represents (VAR=VALUE), the new intrinsic dbg.deref_value will represent the de-referenced value (*VAR = VAL)
    - Below represents ptr=null
      call void @llvm.dbg.value(metadata i32* null, metadata !21, metadata !DIExpression())
    - And below represents *ptr=var
      call void @llvm.dbg.deref.value(metadata !16, metadata !21, metadata !DIExpression(DW_OP_LLVM_implicit_pointer, DW_OP_LLVM_arg0, 0))
    - And below represents *ptr=arr[1]
      call void @llvm.dbg.deref.value(metadata !16, metadata !21, metadata !DIExpression(DW_OP_LLVM_implicit_pointer, DW_OP_LLVM_arg0, 4))
With this new representation we should also be able to represent, in LLVM IR, the case mentioned by David, where a variable points to a temporary (initialized by a constant) and the temporary is optimized out. (We would still need some DWARF operator that LLDB understands.)
      tmp=[CONST]; ptr=&tmp;
      call void @llvm.dbg.deref.value(metadata [const], metadata !21, metadata !DIExpression(DW_OP_LLVM_arg0))
I shall update my patch with introduction of dbg.deref_value. Please do review.
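
For readers following along, here is a rough source-level picture of the three cases listed above. It is only a sketch under stated assumptions: the function names are hypothetical, the variable names ptr, var and arr simply mirror the mail, and the offset of 4 for arr[1] assumes a 4-byte int. It is not taken from the patch or its tests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the patch): byte offset of arr[index] from the
// start of a 2-element int array, i.e. the trailing operand in the
// DW_OP_LLVM_implicit_pointer expressions shown above.
inline std::size_t byteOffsetOfElement(std::size_t index) {
  assert(index < 2 && "arr has only two elements in this sketch");
  return index * sizeof(int);
}

// Hypothetical source. After optimization var and arr may live only in
// registers or constants, so ptr has no address to hold, yet a debugger could
// still show *ptr if ptr is described relative to var or arr.
inline int sumThroughPointer() {
  int var = 4;
  int arr[2] = {10, 20};
  int *ptr = nullptr; // ptr = null      (plain dbg.value)
  ptr = &var;         // *ptr = var      (offset 0 into var)
  int total = *ptr;
  ptr = &arr[1];      // *ptr = arr[1]   (offset sizeof(int) into arr)
  total += *ptr;
  return total;
}
```

With a 4-byte int, byteOffsetOfElement(1) gives the 4 that appears as the last operand in the arr[1] case.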

Variadic dbg.value
   This is also a good idea. But since implicit pointers do not appear to gain any immediate benefit from it, it can be done independently.

Regards,
Alok



Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
Hi,

For a new way of representing things,

Adrian wrote:
> llvm.dbg.value_new(DILocalVariable("y"), DIExpression(DW_OP_LLVM_arg0, DW_OP_LLVM_arg1, DW_OP_plus),
>                    %ptr, %ofs)

I think this would be great -- there're definitely some constructs
created by the induction-variables pass and similar where one could
recover an implicit variable value, if you could for example subtract
one pointer from another.

With the current model of storing DIExpressions as a vector of
opcodes, it might become a pain to salvage a Value that gets optimised
out -- in the example, if %ofs were salvaged, presumably
DW_OP_LLVM_arg1 would have to be replaced with several extra
operations. This isn't insurmountable, but I've repeatedly shied away
from scanning through DIExpressions to patch them up. A vector of
opcodes is the final output of the compiler, IMHO richer metadata
should be used in the meantime.
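
To make the salvaging concern concrete, here is a toy sketch. It uses a made-up opcode enum and a plain vector, not LLVM's DIExpression API or real DWARF encodings, and substituteArg is a hypothetical helper. It shows that patching a flat opcode vector means walking it with knowledge of each opcode's operand count:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy opcodes for illustration only; these are NOT LLVM's or DWARF's values.
enum Op : std::uint64_t { Arg, Plus, Mul, ConstU };

using Expr = std::vector<std::uint64_t>;

// Replace every `Arg, ArgNo` pair in E with the opcode sequence Replacement.
// The scan has to know which opcodes carry an operand (Arg and ConstU here)
// so that operands are never mistaken for opcodes.
inline Expr substituteArg(const Expr &E, std::uint64_t ArgNo,
                          const Expr &Replacement) {
  Expr Out;
  for (std::size_t I = 0; I < E.size();) {
    switch (E[I]) {
    case Arg:
      assert(I + 1 < E.size() && "Arg needs an index operand");
      if (E[I + 1] == ArgNo) {
        Out.insert(Out.end(), Replacement.begin(), Replacement.end());
      } else {
        Out.push_back(Arg);
        Out.push_back(E[I + 1]);
      }
      I += 2;
      break;
    case ConstU:
      assert(I + 1 < E.size() && "ConstU needs a value operand");
      Out.push_back(ConstU);
      Out.push_back(E[I + 1]);
      I += 2;
      break;
    default: // operand-free opcodes such as Plus and Mul
      Out.push_back(E[I]);
      I += 1;
      break;
    }
  }
  return Out;
}
```

For instance, if %ofs had been computed as some index times 4 and is salvaged, the expression {Arg, 0, Arg, 1, Plus} would be rewritten by substituting {Arg, 1, ConstU, 4, Mul} for argument 1, with argument 1 now standing for the index.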

IMHO the implicit pointer work doesn't need to block on this. As said
my mild preference would be for a new intrinsic for this form of
variable location.

~

In re PR37682,

> I’ve been reminded of PR37682, where a function with a reference parameter might spend all its time computing the “referenced” value in a temp, and only move the final value back to the referenced object at the end.  This is clearly a situation that could benefit from DW_OP_implicit_pointer, and there is really no other-object DIE for it to refer to.  Given the current spec, the compiler would need to produce a DW_TAG_dwarf_procedure for the parameter DIE to refer to.  Appendix D (Figure D.61) has an example of this construction, although it’s a more contrived source example.

This has been working through my mind too, and I think it's slightly
different to what implicit_pointer is trying to achieve. In the case
implicit_pointer is designed for, it's a strict improvement in debug
experience because you're recovering information that couldn't be
expressed. However for PR37682 it's a trade-off between whether the
user might want to examine the pointer, or the pointed-at integer:
AFAIUI, we can only express one of the two, not both. Whereas for
mem2reg'd variables referred to by DIE, there is never a pointer to be
lost.

I think my preference would always be to see temporarily-promoted
values as there's no other way of observing them, but others might
disagree.

--
Thanks,
Jeremy

Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
Hi folks,

I have posted a PoC patch https://reviews.llvm.org/D70833 for review; it covers the case where a temporary is promoted.

For such cases it generates the following IR:

  call void @llvm.dbg.derefval(metadata i32 3, metadata !25, metadata !DIExpression(DW_OP_LLVM_explicit_pointer, DW_OP_LLVM_arg0)), !dbg !32

And the llvm-dwarfdump output looks like this:

-------------
0x0000007b:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000004f "_Z4sinkRKi")
                  DW_AT_low_pc  (0x00000000004004c6)
                  DW_AT_high_pc (0x00000000004004d0)
                  DW_AT_call_file       ("/home/alok/openllvm/llvm-project_derefval/build.d/david.cc")
                  DW_AT_call_line       (10)
                  DW_AT_call_column     (0x03)

0x00000088:       DW_TAG_formal_parameter
                    DW_AT_location      (indexed (0x0) loclist = 0x00000010:
                       [0x00000000004004c6, 0x00000000004004d4): DW_OP_explicit_pointer, DW_OP_lit3)
                    DW_AT_abstract_origin       (0x00000055 "p")
------------

Please note that DW_OP_explicit_pointer denotes that the value following it is the de-referenced value of an optimized-out pointer. With the necessary changes in the LLDB debugger, this DWARF info can be used to show the explicit de-referenced value of 'p'.
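
For context, source of roughly the following shape would lead to such a description. This is a hypothetical reconstruction, not the actual david.cc: only the mangled name _Z4sinkRKi (that is, sink(int const&)), the parameter name "p" and the constant 3 come from the output above; the body of sink, the observed variable and the caller function are assumptions.

```cpp
#include <cassert>

// Assumed side effect so that the call is not trivially dead.
static int observed = 0;

// The reference parameter p is implemented as a pointer by the compiler.
inline void sink(const int &p) {
  observed = p;
}

// Binding p to the literal 3 creates a temporary. Once sink is inlined and
// the temporary is promoted, p has no address left to point at, but its
// referenced value (3) is still known.
inline int caller() {
  sink(3);
  assert(observed == 3 && "sink saw the temporary's value");
  return observed;
}
```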

Hi David,

Should we keep working on the above case separately and resume the review of implicit pointer independently now? That patch has been updated with many suggestions from this discussion.

Regards,
Alok



Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
Sorry I haven't been more engaged with this thread. I have been reading it, so hopefully my reply isn't completely out of line/irrelevant. I still feel that a custom DWARF expression operator (& no new intrinsics) would make the most sense, like the one or two other DW_OP_LLVM_* operators we already have. Those aren't actually generated into the DWARF; this one perhaps could be, in some/all cases, as an extension, or a synthesized variable could be created for compatibility with the current DWARF standard.

Some thought experiments that I think are relevant:
* Does the proposed IR format scale to pointers that don't point to existing variables? (I think that has already been touched on in this thread.)
* Does the proposed IR format support multiple layers of dereference (eg: int ** where we know it ultimately points to the value 3 but can't describe either the first or second level pointers that get to that value)? It sounds like any intrinsic that's special-cased to deref (like llvm.dbg.derefval) wouldn't be able to capture that, which seems overly narrow/special case, then?
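
A small source-level illustration of the second thought experiment (a sketch only; the function and variable names are hypothetical):

```cpp
#include <cassert>

// After optimization, value, inner and outer may all disappear: the function
// folds to "return 3". A debugger user stopped inside it would still like
// **outer to print 3, even though neither pointer level has a location.
inline int readThroughTwoLevels() {
  int value = 3;
  int *inner = &value;  // first level: points at value
  int **outer = &inner; // second level: points at inner
  assert(*outer == inner);
  return **outer;
}
```

An intrinsic that describes exactly one level of dereference has no obvious way to express "outer points to something that points to 3".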


Re: [llvm-dev] DW_OP_implicit_pointer design/implementation in general

Doerfert, Johannes via llvm-dev
Let me try to summarize the implementation first.

At the moment, there are two branches.

1. An existing variable is optimized out, and that variable is used to get the de-referenced value pointed to by another pointer/reference variable.
  Such cases are addressed using the DWARF expression DW_OP_implicit_pointer, since the de-referenced value of the pointer can be seen implicitly (through another variable). In LLVM IR, before the DWARF is emitted, we represent it using dbg.derefval (which denotes the de-referenced value of a pointer or reference) and the DW_OP_LLVM_implicit_pointer operation.

2. A temporary variable is optimized out, and that variable is used to get the de-referenced value of another reference variable (AFAIK this cannot be reproduced with pointers).
  Such cases are addressed using a new DWARF expression, DW_OP_explicit_pointer, since the de-referenced value can be displayed explicitly (in place). In LLVM IR, we represent it using dbg.derefval and the DW_OP_LLVM_explicit_pointer operation.

Both branches share some common implementation that defines the new operations (DWARF and IR): D70642, D70643, D69999, D69886.
The first branch has additional patches (D70260, D70384, D70385, D70419).
The second branch has one additional patch (D70833).

Let me try to comment on the points you raised.
- Branch 2 (patch D70833) handles cases where temporaries (not existing variables) are optimized out.
- In patch D70385, I have included test points to show that multi-layered pointers work (llvm/test/DebugInfo/dwarfdump-implicit_pointer_mem2reg.c).

I feel that the review of branch 1 (implicit pointer), which was halted due to the current discussion, can be resumed, while we continue to discuss branch 2 (explicit pointers, D70833) if you want. David, what do you think?

Regards,
Alok
