[llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Michael Clark via llvm-dev
Hi Folks,

I have a question regarding LLD support for the linker-synthesised symbols that ld64 provides for Mach-O. A quick search of the LLD source did not turn up any support for them, so I thought I would ask before I start trying to use LLD.

I have found a couple of cases where they are essential, i.e. where there is no other way to get the required information. One is getting the address of the Mach-O headers of the current process when ASLR is enabled and the process is not dyld, because exec on macOS only provides the mach header address to dyld (*1). They are used inside dyld, and I am now using them in “x86_64-xnu-musl”.

It’s possible to resolve a mach-o segment offset or a mach-o section offset using these special ld64 linker synthesised symbols. See resolveUndefines:


There are 4 special symbol prefixes for the mach-o linker synthesised symbols:

- segment$start$__SEGMENT
- segment$end$__SEGMENT
- section$start$__SEGMENT$__section
- section$end$__SEGMENT$__section
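
As a rough illustration of the naming scheme, the sketch below splits such a name into its kind, segment and section. It is not how ld64 or LLD implements this; the type and function names are hypothetical, and the only facts it relies on are the four prefixes listed above and the 16-character limit on Mach-O segment and section names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical types for illustration; not taken from ld64 or LLD. */
enum boundary_kind { SEGMENT_START, SEGMENT_END, SECTION_START, SECTION_END };

struct boundary_symbol {
    enum boundary_kind kind;
    char segment[17];   /* Mach-O segment names hold at most 16 characters */
    char section[17];   /* empty string for the segment$... forms */
};

/* Copy len characters into a 17-byte buffer; fails if empty or too long. */
static bool copy_name(char dst[17], const char *src, size_t len)
{
    if (len == 0 || len > 16)
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}

/* Returns true and fills *out if name uses one of the four forms.
   On failure the contents of *out are unspecified. */
static bool parse_boundary_symbol(const char *name, struct boundary_symbol *out)
{
    static const struct {
        const char *prefix;
        enum boundary_kind kind;
        bool has_section;
    } forms[] = {
        { "segment$start$", SEGMENT_START, false },
        { "segment$end$",   SEGMENT_END,   false },
        { "section$start$", SECTION_START, true  },
        { "section$end$",   SECTION_END,   true  },
    };
    assert(name != NULL && out != NULL);

    for (size_t i = 0; i < sizeof forms / sizeof forms[0]; ++i) {
        size_t plen = strlen(forms[i].prefix);
        if (strncmp(name, forms[i].prefix, plen) != 0)
            continue;
        const char *rest = name + plen;        /* "__SEGMENT" or "__SEGMENT$__section" */
        const char *sep = strchr(rest, '$');   /* separator before the section name */
        out->kind = forms[i].kind;
        out->section[0] = '\0';
        if (!forms[i].has_section)
            return sep == NULL && copy_name(out->segment, rest, strlen(rest));
        if (sep == NULL)
            return false;                      /* section form without a section name */
        return copy_name(out->segment, rest, (size_t)(sep - rest))
            && copy_name(out->section, sep + 1, strlen(sep + 1));
    }
    return false;                              /* an ordinary symbol */
}
```

A linker would presumably run a check like this on each still-undefined symbol and, on a match, define the symbol at the start or end address of the named segment or section.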

In asm:

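/* get imagebase and slide for static PIE and ASLR support in x86_64-xnu-musl */

.align 3
__image_base:
.quad segment$start$__TEXT                /* address of __TEXT as recorded at link time */
__start_static:
.quad start                               /* address of start as recorded at link time */
.text
.align 3
.global start
start:
       xor %rbp,%rbp                      /* clear the frame pointer for the outermost frame */
       mov %rsp,%rdi                      /* arg 1: initial stack pointer */
       andq $-16,%rsp                     /* align the stack to 16 bytes */
       movq __image_base(%rip), %rsi      /* arg 2: recorded image base */
       leaq start(%rip), %rdx             /* run-time address of start */
       subq __start_static(%rip), %rdx    /* arg 3: slide = run-time minus recorded address */
       call __start_c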

In C:

/* run C++ constructors in __libc_start_main for x86_64-xnu-musl */

typedef void (*__init_fn)(int, char **, char **, char **);
extern __init_fn  __init_start  __asm("section$start$__DATA$__mod_init_func");
extern __init_fn  __init_end    __asm("section$end$__DATA$__mod_init_func");

static void __init_mod(int argc, char **argv, char **envp, char **applep)
{
        for (__init_fn *p = &__init_start; p < &__init_end; ++p) {
                (*p)(argc, argv, envp, applep);
        }
}
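
The same loop can be tried on any platform by swapping the linker-provided bounds for an ordinary array. The sketch below does that; `run_initializers`, the sample initializers and `sample_inits` are hypothetical stand-ins for the contents of __DATA,__mod_init_func, not code from x86_64-xnu-musl.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(int, char **, char **, char **);

/* Call every function pointer in the half-open range [start, end)
   and report how many were called. */
static size_t run_initializers(init_fn *start, init_fn *end,
                               int argc, char **argv, char **envp, char **applep)
{
    size_t called = 0;
    assert(start <= end);
    for (init_fn *p = start; p < end; ++p) {
        if (*p == NULL)          /* defensive: skip an empty slot */
            continue;
        (*p)(argc, argv, envp, applep);
        ++called;
    }
    return called;
}

/* Sample initializers so the effect of the loop can be observed. */
static int init_calls;

static void init_a(int argc, char **argv, char **envp, char **applep)
{
    (void)argc; (void)argv; (void)envp; (void)applep;
    init_calls += 1;
}

static void init_b(int argc, char **argv, char **envp, char **applep)
{
    (void)argc; (void)argv; (void)envp; (void)applep;
    init_calls += 10;
}

/* Stand-in for the section contents; on macOS the bounds would come from
   the section$start$ and section$end$ symbols instead. */
static init_fn sample_inits[] = { init_a, init_b };
```

Calling `run_initializers(sample_inits, sample_inits + 2, argc, argv, envp, applep)` runs both sample initializers in order, which is the behaviour the linker-bounded loop relies on.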

Michael.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Rui Ueyama via llvm-dev
Hi Michael,

The Mach-O version of LLD is not being developed actively, and if some feature is missing, it is likely that it's just not implemented. What is your motivation to use LLD instead of ld64?

On Tue, Jun 6, 2017 at 4:08 PM, Michael Clark via llvm-dev <[hidden email]> wrote:
Hi Folks,

I have a question regarding LLD support for the linker-synthesised symbols that ld64 provides for Mach-O. A quick search of the LLD source did not turn up any support for them, so I thought I would ask before I start trying to use LLD.


Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Michael Clark via llvm-dev
Hi Rui,

The motivation would be primarily that LLVM/Clang/LLD are community projects, so if I or someone else in the community added support for e.g. symbol aliases, the change could be reviewed and potentially merged. ld64, on the other hand, does not have a community process for patch submission and code review that I am aware of, so it's unlikely that a community patch to support aliases would be merged.

In that case I might check out the LLD code and try linking “x86_64-xnu-musl” with it. My requirements are likely simpler than Apple’s; however, I do need symbol aliases, and these are not supported by ld64. The linker synthesised symbols are likely not too difficult to add if they are not present… now on my to do list…

Michael.

On 7 Jun 2017, at 11:30 AM, Rui Ueyama <[hidden email]> wrote:

Hi Michael,

The Mach-O version of LLD is not being developed actively, and if some feature is missing, it is likely that it's just not implemented. What is your motivation to use LLD instead of ld64?


Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Michael Clark via llvm-dev
From the top of ld64’s ld.cpp:

// start temp HACK for cross builds
extern "C" double log2 ( double );
//#define __MATH__
// end temp HACK for cross builds

and a bit further down

//fprintf(stderr, "FinalSection(%16s, %16s) _segmentOrder=%3d, _sectionOrder=0x%08X\n",
// this->segmentName(), this->sectionName(), _segmentOrder, _sectionOrder);

and a bit further down again ld64 uses qsort instead of std::sort

//fprintf(stderr, "UNSORTED final sections:\n");
//for (std::vector<ld::Internal::FinalSection*>::iterator it = sections.begin(); it != sections.end(); ++it) {
// fprintf(stderr, "final section %p %s/%s\n", (*it), (*it)->segmentName(), (*it)->sectionName());
//}
qsort(&sections[0], sections.size(), sizeof(FinalSection*), &InternalState::FinalSection::sectionComparer);
//fprintf(stderr, "SORTED final sections:\n");
//for (std::vector<ld::Internal::FinalSection*>::iterator it = sections.begin(); it != sections.end(); ++it) {
// fprintf(stderr, "final section %p %s/%s\n", (*it), (*it)->segmentName(), (*it)->sectionName());
//}
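
For readers who have not used qsort on a container of pointers, here is a small C sketch of the pattern. `struct final_section` is a hypothetical stand-in with just the two order keys that appear in the debug print above, and ordering by segment and then section is an assumption, not ld64's actual sectionComparer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ld64's FinalSection: only the two order keys. */
struct final_section {
    int segment_order;
    unsigned section_order;
};

/* qsort hands the comparator pointers to the array elements, and the
   elements are themselves pointers, so each argument is really a
   `struct final_section *const *`. */
static int section_compare(const void *l, const void *r)
{
    const struct final_section *a = *(const struct final_section *const *)l;
    const struct final_section *b = *(const struct final_section *const *)r;
    if (a->segment_order != b->segment_order)
        return a->segment_order < b->segment_order ? -1 : 1;
    if (a->section_order != b->section_order)
        return a->section_order < b->section_order ? -1 : 1;
    return 0;
}

/* Sort an array of section pointers in place. */
static void sort_sections(struct final_section **sections, size_t count)
{
    assert(sections != NULL || count == 0);
    if (count > 0)
        qsort(sections, count, sizeof sections[0], section_compare);
}
```

The casts through `const void *` are where this style loses the type checking that std::sort with a typed comparator would keep.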

I doubt that would pass the LLVM project’s code review. If the Mach-O LLD were supported and tracked in the LLVM bugzilla, I could also raise an issue for, or fix, this ld64 bug: “Invalid zero page virtual address when linking with -static -image_base 0x7ffe00000000”.

I’m not using Objective-C so LLD may well fit my purposes. Now determined to try out LLD on macOS :-D


Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Michael Clark via llvm-dev
OK. I see that the Mach-O linker is not even built when LLD is enabled in Release_40, only the PE/COFF and ELF linkers are built.

From looking at reviews it appears that Clang was able to be linked with LLD on Darwin about 2 years ago, so Mach-O support seems to have regressed.

Curious as to pointers to primordial branches with whatever needs to be resurrected. I couldn’t find any Mach-O cmake flags to enable its build. A pointer to a branch or tag that might have a working Mach-O LLD would be a start.

On 7 Jun 2017, at 11:38 AM, Michael Clark <[hidden email]> wrote:

Hi Rui,

The motivation would be primarily that LLVM/Clang/LLD are community projects, so if I or someone else in the community added support for e.g. symbol aliases, the change could be reviewed and potentially merged. ld64, on the other hand, does not have a community process for patch submission and code review that I am aware of, so it's unlikely that a community patch to support aliases would be merged.

In that case I might check out the LLD code and try linking “x86_64-xnu-musl” with it. My requirements are likely simpler than Apple’s; however, I do need symbol aliases, and these are not supported by ld64. The linker synthesised symbols are likely not too difficult to add if they are not present… now on my to do list…

Michael.


Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Rui Ueyama via llvm-dev
On Tue, Jun 6, 2017 at 11:14 PM, Michael Clark via llvm-dev <[hidden email]> wrote:
OK. I see that the Mach-O linker is not even built when LLD is enabled in Release_40, only the PE/COFF and ELF linkers are built.

From looking at reviews it appears that Clang was able to be linked with LLD on Darwin about 2 years ago, so Mach-O support seems to have regressed.

Only a few changes have been made to the Mach-O port in the last two years, so I'd doubt if it has regressed. It could be the case that clang's output has changed in such a way that the linker is not able to handle it.
 
Curious as to pointers to primordial branches with whatever needs to be resurrected. I couldn’t find any Mach-O cmake flags to enable its build. A pointer to a branch or tag that might have a working Mach-O LLD would be a start.



Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Michael Clark via llvm-dev

On 8 Jun 2017, at 4:53 AM, Rui Ueyama <[hidden email]> wrote:

On Tue, Jun 6, 2017 at 11:14 PM, Michael Clark via llvm-dev <[hidden email]> wrote:
OK. I see that the Mach-O linker is not even built when LLD is enabled in Release_40, only the PE/COFF and ELF linkers are built.

From looking at reviews it appears that Clang was able to be linked with LLD on Darwin about 2 years ago, so Mach-O support seems to have regressed.

Only a few changes have been made to the Mach-O port in the last two years, so I'd doubt if it has regressed. It could be the case that clang's output has changed in such a way that the linker is not able to handle it.

That’s actually good news!

If there is a Mach-O linker that is able to self host Clang builds on macOS, then this is a really good starting point.

From reading a tiny bit about the history, and the LLVM pages on the design of the various linkers, it seems there is a difference of opinion about the Atom-based design of the Mach-O LLD, and about whether there was to be an abstract design that supports ELF, PE/COFF and Mach-O. It seems not. One would also assume that LTO and/or -ffunction-sections -fdata-sections would obviate the need for Atoms, and that the Atom model may in fact increase the complexity of the linker.

From my cursory examination of the source it seems that lld/lib should perhaps be renamed lld/MachO and become the MachO linker besides the ELF and COFF directproes as the common code is not being used by the ELF and the PE/COFF linkers.

I just need to figure out how to build and invoke the Mach-O linker. Contrary to what one would be led to believe, there is no ‘ld’ in the llvm bin directory. I’ll dig into the CMakeLists.txt. I guess lld/lib/Driver/DarwinLdDriver.cpp is the entry point. lld/lib/Driver/CMakeLists.txt, however, only appears to define a library rather than an executable, and there is no top-level MachO directory like there is for the other 2 linkers.

$ lld
lld is a generic driver.
Invoke ld.lld (Unix), ld (Mac) or lld-link (Windows) instead.

$ ld.lld --version
LLD 4.0.0

$ lld-link --version
ignoring unknown argument: --version
error: no input files

If I know which CMakeLists.txt defines the binary that hosts the main function and installs it, then I can take it from there.


Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Michael Clark via llvm-dev

On 8 Jun 2017, at 9:46 AM, Michael Clark <[hidden email]> wrote:


From my cursory examination of the source it seems that lld/lib should perhaps be renamed lld/MachO and become the MachO linker besides the ELF and COFF directproes as the common code is not being used by the ELF and the PE/COFF linkers.

besides the ELF and COFF directories (eyesight and spell checker fail).

I read the source. It seems the ‘ld’ symlink for the Mach-O linker is not being created for some reason. The ld.lld and lld-link symlinks are created correctly for the ELF and PE/COFF linkers.

In any case, I have the Mach-O linker invoking now. I will do some testing…
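
The symlink workaround below works because the generic driver picks its behaviour from the name it is invoked under. A minimal C sketch of that idea follows; the enum and function names are hypothetical, the three names come from the driver's own message ("ld.lld (Unix), ld (Mac) or lld-link (Windows)"), and the real lld driver logic is more involved and is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flavor names for illustration. */
enum flavor { FLAVOR_INVALID, FLAVOR_GNU, FLAVOR_DARWIN, FLAVOR_WINLINK };

/* Strip any directory part: "./ld" becomes "ld". */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Map the invoked program name to a linker flavor. */
static enum flavor flavor_from_argv0(const char *argv0)
{
    assert(argv0 != NULL);
    const char *name = base_name(argv0);
    if (strcmp(name, "ld.lld") == 0)
        return FLAVOR_GNU;       /* ELF linker */
    if (strcmp(name, "ld") == 0)
        return FLAVOR_DARWIN;    /* Mach-O linker */
    if (strcmp(name, "lld-link") == 0)
        return FLAVOR_WINLINK;   /* PE/COFF linker */
    return FLAVOR_INVALID;       /* e.g. plain "lld", the generic driver */
}
```

Under this scheme, invoking the binary through a symlink named ld selects the Mach-O linker without any change to the binary itself.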

$ ln -s lld ld
$ ./ld 
OVERVIEW: LLVM Linker

USAGE: ./ld [options] <inputs>

BUNDLE EXECUTABLE OPTIONS:
  -bundle_loader <path> The executable that will be loading this Mach-O bundle

DYLIB EXECUTABLE OPTIONS:
  -compatibility_version <version>
                       The dylib's compatibility version
  -current_version <version>
                       The dylib's current version
  -install_name <path> The dylib's install name
  -mark_dead_strippable_dylib
                       Marks the dylib as having no side effects during initialization

LIBRARY OPTIONS:
  -all_load         Forces all members of all static libraries to be loaded
  -force_load <library-path>
                    Forces all members of specified static libraries to be loaded
  -F <dir>          Add directory to framework search path
  -L <dir>          Add directory to library search path
  -syslibroot <dir> Add path to SDK to all absolute library search paths

MAIN EXECUTABLE OPTIONS:
  -export_dynamic     Preserves all global symbols in main executables during LTO
  -e <entry-name>     entry symbol name
  -no_pie             Do not create Position Independent Executable
  -pie                Create Position Independent Executable (for ASLR)
  -stack_size <value> Specifies the maximum stack size for the main thread in a program. Must be a page-size multiple. (default=8Mb)

OBSOLETE OPTIONS:
  -multi_module       Unsupported way to build dylibs
  -objc_gc_compaction Unsupported ObjC GC option
  -objc_gc_only       Unsupported ObjC GC option
  -objc_gc            Unsupported ObjC GC option
  -single_module      Default for dylibs

OPTIMIZATIONS:
  -data_in_code_info      Force generation of a data in code load command
  -dead_strip             Remove unreference code and data
  -exported_symbols_list <file-path>
                          Restricts which symbols will be exported
  -exported_symbol <symbol>
                          Restricts which symbols will be exported
  -flat_namespace         Resolves symbols in any (transitively) linked dynamic libraries. Source libraries are not recorded: dyld will re-search all images at runtime and use the first definition found.
  -function_starts        Force generation of a function starts load command
  -ios_simulator_version_min <version>
                          Minimum iOS simulator version
  -ios_version_min <version>
                          Minimum iOS version
  -keep_private_externs   Private extern (hidden) symbols should not be transformed into local symbols
  -macosx_version_min <version>
                          Minimum Mac OS X version
  -mllvm <option>         Options to pass to LLVM during LTO
  -no_data_in_code_info   Disable generation of a data in code load command
  -no_function_starts     Disable generation of a function starts load command
  -no_objc_category_merging
                          Disables the optimisation which merges Objective-C categories on a class in to the class itself.
  -no_version_load_command
                          Disable generation of a version load command
  -order_file <file-path> re-order and move specified symbols to start of their section
  -sdk_version <version>  SDK version
  -source_version <version>
                          Source version
  -twolevel_namespace     Resolves symbols in listed libraries only. Source libraries are recorded in the symbol table.
  -undefined <undefined>  Determines how undefined symbols are handled.
  -unexported_symbols_list <file-path>
                          Lists symbols that should not be exported
  -unexported_symbol <symbol>
                          A symbol which should not be exported
  -version_load_command   Force generation of a version load command

OPTIONS:
  -arch <arch-name>       Architecture to link
  -demangle               Demangles symbol names in errors and warnings
  -dependency_info <file> Write binary list of files used during link
  -filelist <path>        file containing paths to input files
  -framework <name>       Base name of framework searched for in -F directories
  -l<libname>             Base name of library searched for in -L directories
  -o <path>               Output file path
  -path_exists <path>     Used with -test_file_usage to declare a path
  -print_atoms            Emit output as yaml atoms
  -rpath <path>           Add path to the runpath search path list for image being created
  -sectalign <segname> <sectname> <alignment>
                          Alignment for segment/section
  -sectcreate <segname> <sectname> <file>
                          Create section <segname>/<sectname> from contents of <file>
  -S                      Remove debug information (STABS or DWARF) from the output file
  -test_file_usage        Only files specified by -file_exists are considered to exist. Print which files would be used
  -t                      Print the names of the input files as ld processes them
  -upward-l<libname>      Base name of upward library searched for in -L directories
  -upward_framework <name>
                          Base name of upward framework searched for in -F directories
  -upward_library <path>  path to upward dylib to link with
  -v                      Print linker information
  -Z                      Do not search standard directories for libraries or frameworks

OUTPUT KIND:
  -bundle  Create dynamic bundle
  -dylib   Create dynamic library
  -dynamic Create dynamic executable (default)
  -execute Create main executable (default)
  -preload Create binary for use with embedded systems
  -r       Create relocatable object file
  -static  Create static executable


If I know which CMakeLists.txt defines the binary that hosts the main function and installs it, then I can take it from there.

Curious as to pointers to primordial branches with whatever needs to be resurrected. I couldn’t find any Mach-O cmake flags to enable its build. A pointer to a branch or tag that might have a working Mach-O LLD would be a start.


On 7 Jun 2017, at 11:38 AM, Michael Clark <[hidden email]> wrote:

Hi Rui,

The motivation would be primarily that LLVM/Clang/LLD are community projects, so if I or someone in the community added support for e.g. symbol aliases, it could be reviewed and potentially merged. ld64, on the other hand, does not have a community process for patch submission and code review that I am aware of, so it's unlikely that a community patch to support aliases would be merged.

In that case I might check out the LLD code and try linking “x86_64-xnu-musl” with it. My requirements are likely simpler than Apple’s; however, I do need symbol aliases and these are not supported by ld64. The linker synthesised symbols are likely not too difficult to add if they are not present… now on my to do list…

Michael.

On 7 Jun 2017, at 11:30 AM, Rui Ueyama <[hidden email]> wrote:

Hi Michael,

The Mach-O version of LLD is not being developed actively, and if some feature is missing, it is likely that it's just not implemented. What is your motivation to use LLD instead of ld64?

On Tue, Jun 6, 2017 at 4:08 PM, Michael Clark via llvm-dev <[hidden email]> wrote:
Hi Folks,

I have a question regarding LLD support for ld64 mach-o linker synthesised symbols. I did a quick search of the LLD source and I can not find support for them so before I start trying to use lld I thought I would ask.

I have found a couple of cases where they are essential. i.e. where there is no other way to get the required information, such as getting the address of the mach-o headers of the current process, with ASLR enabled, if the process is not dyld as exec on macOS only provides the mach header address to dyld (*1). They are used inside of dyld and I am now using them in “x86_64-xnu-musl”.

It’s possible to resolve a mach-o segment offset or a mach-o section offset using these special ld64 linker synthesised symbols. See resolveUndefines:


There are 4 special symbol prefixes for the mach-o linker synthesised symbols:

- segment$start$__SEGMENT
- segment$end$__SEGMENT
- section$start$__SEGMENT$__section
- section$end$__SEGMENT$__section

In asm:

/* get imagebase and slide for static PIE and ASLR support in x86_64-xnu-musl */

.align 3
__image_base:
.quad segment$start$__TEXT
__start_static:
.quad start
.text
.align 3
.global start
start:
       xor %rbp,%rbp
       mov %rsp,%rdi
       andq $-16,%rsp
       movq __image_base(%rip), %rsi
       leaq start(%rip), %rdx
       subq __start_static(%rip), %rdx
       call __start_c

In C:

/* run C++ constructors in __libc_start_main for x86_64-xnu-musl */

typedef void (*__init_fn)(int, char **, char **, char **);
extern __init_fn  __init_start  __asm("section$start$__DATA$__mod_init_func");
extern __init_fn  __init_end    __asm("section$end$__DATA$__mod_init_func");

static void __init_mod(int argc, char **argv, char **envp, char **applep)
{
        for (__init_fn *p = &__init_start; p < &__init_end; ++p) {
                (*p)(argc, argv, envp, applep);
        }
}

Michael.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev






Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Hal Finkel via llvm-dev

On 8 Jun 2017, at 10:39 AM, Michael Clark <[hidden email]> wrote:

In any case, I have the Mach-O linker invoking now. I will do some testing…

So the linker synthesised symbols don’t exist in the current Mach-O LLD linker. They are used in Apple code, so they’ll eventually be necessary. They are also critical for solving problems such as finding the mach headers from early CRT code if you are not dyld, due to the kernel start protocol, i.e. they are necessary to support static PIE + ASLR, which I have got working btw.

- segment$start$__SEGMENT
- segment$end$__SEGMENT
- section$start$__SEGMENT$__section
- section$end$__SEGMENT$__section
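
As a rough illustration of what recognising these four prefixes could look like, here is a minimal C sketch. It is not taken from ld64 or LLD; the names synth_kind, synth_name and parse_synth_name are hypothetical, and the only format fact it relies on is that Mach-O segment and section names are at most 16 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the four synthesised name forms. */
enum synth_kind {
    SYNTH_NONE,            /* not a synthesised name */
    SYNTH_SEGMENT_START,
    SYNTH_SEGMENT_END,
    SYNTH_SECTION_START,
    SYNTH_SECTION_END
};

struct synth_name {
    enum synth_kind kind;
    char segment[17];      /* Mach-O names are at most 16 bytes, plus NUL */
    char section[17];      /* stays empty for segment$... names */
};

/* Copy len bytes of s into dst; fail if empty or longer than 16 bytes. */
static int copy_name(char dst[17], const char *s, size_t len)
{
    if (len == 0 || len > 16)
        return 0;
    memcpy(dst, s, len);
    dst[len] = '\0';
    return 1;
}

/* Returns 1 and fills *out if name has one of the four forms, else 0. */
int parse_synth_name(const char *name, struct synth_name *out)
{
    static const struct {
        const char *prefix;
        enum synth_kind kind;
        int has_section;
    } table[] = {
        { "segment$start$", SYNTH_SEGMENT_START, 0 },
        { "segment$end$",   SYNTH_SEGMENT_END,   0 },
        { "section$start$", SYNTH_SECTION_START, 1 },
        { "section$end$",   SYNTH_SECTION_END,   1 },
    };

    memset(out, 0, sizeof *out);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        size_t plen = strlen(table[i].prefix);
        if (strncmp(name, table[i].prefix, plen) != 0)
            continue;

        const char *rest = name + plen;
        const char *sep = strchr(rest, '$');
        if (!table[i].has_section) {
            /* segment form: exactly one name, no further '$' */
            if (sep || !copy_name(out->segment, rest, strlen(rest)))
                return 0;
        } else {
            /* section form: SEGMENT '$' section, and nothing after that */
            if (!sep || strchr(sep + 1, '$')
                || !copy_name(out->segment, rest, (size_t)(sep - rest))
                || !copy_name(out->section, sep + 1, strlen(sep + 1)))
                return 0;
        }
        out->kind = table[i].kind;
        return 1;
    }
    return 0;
}
```

A resolver could call parse_synth_name on each undefined symbol and, on success, look up the named segment or section in the output layout to produce the address.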

I can see a problem with the design of the Mach-O linker.

Based on the design, it appears that the Resolver is supposedly generic code, but in practice it is just the Mach-O resolver.

We would need the context of the Mach-O linked file to synthesise the Mach-O specific symbols inside of Resolver. In a generic or abstract design we could create an interface, implemented by the Mach-O driver, that lets it register with the Resolver so that the “generic” core can resolve these synthesised symbols. However, if this is really just the Mach-O resolver, then it would be substantially simpler to give the resolver the context of the Mach-O file so that Mach-O’isms could be used directly. It seems this was the direction of the ELF and PE/COFF linkers. It might make things much simpler to remove generic layers that hinder object file specific quirks. From reading about the ELF and PE/COFF linkers, it seems this allows them to optimise more easily for their target. Generic abstractions likely get in the way.
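
To make the first option concrete, here is a small C sketch of a registration interface. It is only an illustration of the idea and does not reflect LLD's actual Resolver classes; resolver, synth_hook, resolver_set_hook, macho_ctx and macho_synth_hook are all hypothetical names, and only the __TEXT segment is modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback a format-specific driver registers with the generic core.
 * Returns 1 and stores the address if it can synthesise `name`. */
typedef int (*synth_hook)(void *ctx, const char *name, uint64_t *addr);

/* Stand-in for the generic resolver: it knows nothing about Mach-O. */
struct resolver {
    synth_hook hook;
    void *hook_ctx;
};

void resolver_set_hook(struct resolver *r, synth_hook hook, void *ctx)
{
    r->hook = hook;
    r->hook_ctx = ctx;
}

/* Called for a symbol that no input file defines. */
int resolver_resolve_undefined(const struct resolver *r, const char *name,
                               uint64_t *addr)
{
    if (r->hook && r->hook(r->hook_ctx, name, addr))
        return 1;
    return 0;   /* still undefined */
}

/* Mach-O side: the context the generic core lacks (illustrative fields). */
struct macho_ctx {
    uint64_t text_start;
    uint64_t text_end;
};

int macho_synth_hook(void *ctx, const char *name, uint64_t *addr)
{
    const struct macho_ctx *m = ctx;
    if (strcmp(name, "segment$start$__TEXT") == 0) {
        *addr = m->text_start;
        return 1;
    }
    if (strcmp(name, "segment$end$__TEXT") == 0) {
        *addr = m->text_end;
        return 1;
    }
    return 0;
}
```

The alternative described above would drop the callback entirely and let the resolver read a struct like macho_ctx directly, which is less code but ties the resolver to one file format.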

These are the issues I found:

- unimplemented pagezero option argument e.g. -pagezero_size 0x1000
- unimplemented segaddr option argument e.g. -segaddr __TEXT 0x7ffe00000000
- looking for dyld_stub_binder when static linking
- unimplemented synthesised symbols segment$(start|end)$__SEGMENT
- unimplemented synthesised symbols section$(start|end)$__SEGMENT$__section

I can work around the inability to set explicit segment addresses by using -image_base. I had used explicit segment addresses due to a bug with -image_base in ld64, where the -pagezero_size value is erroneously added to the image base, i.e. -pagezero_size 0x1000 -image_base 0x7ffe00000000 would give 0x7ffe00000000 to __PAGEZERO and 0x7ffe00001000 to __TEXT. This seems to be related to the issue where the zero page is incorrectly assigned the virtual address of the image base instead of 0, which my post-link tool addresses. I would have tried to fix ld64, but the upstream sources would not build; they likely depend on some internal Apple build setup. So this is how I solved it:


In any case, a working linker is a good starting point…
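
For reference, the address arithmetic behind the -image_base report above can be written out as a small C sketch. It only models the numbers quoted in this message; the struct and function names are hypothetical, and "intended" reflects the layout the message says was wanted (zero page at 0, __TEXT at the image base), not documented ld64 behaviour.

```c
#include <stdint.h>

/* Virtual addresses assigned to the first two segments. */
struct layout {
    uint64_t pagezero_addr;
    uint64_t text_addr;
};

/* Layout as reported: __PAGEZERO lands on the image base and
 * __TEXT is pushed up by the page zero size. */
struct layout reported_layout(uint64_t image_base, uint64_t pagezero_size)
{
    struct layout l = { image_base, image_base + pagezero_size };
    return l;
}

/* Layout as wanted: __PAGEZERO at 0, __TEXT at the image base. */
struct layout intended_layout(uint64_t image_base, uint64_t pagezero_size)
{
    struct layout l = { 0, image_base };
    (void)pagezero_size;   /* size affects the segment's extent, not these addresses */
    return l;
}
```

With image_base 0x7ffe00000000 and pagezero_size 0x1000, reported_layout gives __TEXT at 0x7ffe00001000, while intended_layout keeps it at 0x7ffe00000000.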


Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Hal Finkel via llvm-dev
It seems I can find the static offset of the Mach-O header before initialisation in the crt without using the special linker synthesised symbols, by using instead a statically synthesised symbol that I was previously unaware of, “__mh_execute_header”. I later add the slide to find the dynamic offset of the Mach-O headers.

.align 3
__image_base:
.quad __mh_execute_header

I find the slide by subtracting a static pointer to a well known symbol from an RIP-relative access to the same symbol. 

__start_static:
.quad start

leaq start(%rip), %rdx
subq __start_static(%rip), %rdx

The crt then gets the stack pointer, static image base and slide, so it can relocate the image and call constructors.

void _start_c(long *p, uintptr_t image_base, uintptr_t slide)
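
The arithmetic the crt performs with those two values can be sketched in C as follows. This is a minimal illustration of the subtraction and addition described above, not the actual x86_64-xnu-musl code; compute_slide and runtime_header are hypothetical helper names.

```c
#include <stdint.h>

/* Slide = where a symbol really is at run time minus where the static
 * link placed it. In the asm above, the run-time address comes from
 * `leaq start(%rip)` and the static one from the `.quad start` word. */
uintptr_t compute_slide(uintptr_t runtime_addr, uintptr_t static_addr)
{
    return runtime_addr - static_addr;
}

/* Run-time address of the Mach-O header: static image base plus slide. */
uintptr_t runtime_header(uintptr_t image_base, uintptr_t slide)
{
    return image_base + slide;
}
```

For example, if start was linked at 0x01000f30 but is observed at 0x05a00f30, the slide is 0x04a00000, and a header linked at 0x01000000 is found at 0x05a00000.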

I’m not sure about the second use case for the start and end of the “__mod_init_func” section, which would likely be required for linking dyld.

On 7 Jun 2017, at 11:08 AM, Michael Clark <[hidden email]> wrote:

In asm:

/* get imagebase and slide for static PIE and ASLR support in x86_64-xnu-musl */

.align 3
__image_base:
.quad segment$start$__TEXT
__start_static:
.quad start
.text
.align 3
.global start
start:
       xor %rbp,%rbp
       mov %rsp,%rdi
       andq $-16,%rsp
       movq __image_base(%rip), %rsi
       leaq start(%rip), %rdx
       subq __start_static(%rip), %rdx
       call __start_c

In C:

/* run C++ constructors in __libc_start_main for x86_64-xnu-musl */

typedef void (*__init_fn)(int, char **, char **, char **);
extern __init_fn  __init_start  __asm("section$start$__DATA$__mod_init_func");
extern __init_fn  __init_end    __asm("section$end$__DATA$__mod_init_func");

static void __init_mod(int argc, char **argv, char **envp, char **applep)
{
        for (__init_fn *p = &__init_start; p < &__init_end; ++p) {
                (*p)(argc, argv, envp, applep);
        }
}




Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Hal Finkel via llvm-dev


On Wed, Jun 7, 2017 at 2:46 PM, Michael Clark via llvm-dev <[hidden email]> wrote:

On 8 Jun 2017, at 4:53 AM, Rui Ueyama <[hidden email]> wrote:

On Tue, Jun 6, 2017 at 11:14 PM, Michael Clark via llvm-dev <[hidden email]> wrote:
OK. I see that the Mach-O linker is not even built when LLD is enabled in Release_40, only the PE/COFF and ELF linkers are built.

From looking at reviews it appears that Clang was able to be linked with LLD on Darwin about 2 years ago, so Mach-O support seems to have regressed.

Only a few changes have been made to the Mach-O port in the last two years, so I'd doubt if it has regressed. It could be the case that clang's output has changed in such a way that the linker is not able to handle it.

That’s actually good news!

If there is a Mach-O linker that is able to self host Clang builds on macOS, then this is a really good starting point.

From reading a tiny bit about the history, and the LLVM pages on the design of the various linkers, it seems like there is a difference in opinion with respect to the Atom based design of the Mach-O LLD, and whether or not there was to be an abstract design that supports ELF, PE/COFF and Mach-O. It seems not. One would also assume that LTO and/or -ffunction-sections -fdata-sections would obviate the need for Atoms, and that it may in fact increase the complexity of the linker.

InputSection (ELF) and Chunk (COFF) are basically "atoms". The main technical obstacle to using atoms for ELF and COFF is that the atom model used in the original linker design assumes a 1:1 mapping of symbols to "atoms" (and the symbol points to the start of the atom). In ELF and COFF symbols and InputSection/Chunk's are decoupled because you can have multiple symbols pointing anywhere in the InputSection/Chunk.
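
The difference between the two models can be sketched with a few C structs. This is an illustration of the distinction drawn in the paragraph above only; the struct and function names are hypothetical and are not the types used by either linker.

```c
#include <stddef.h>
#include <stdint.h>

/* Atom-style model: one symbol per atom, and the symbol names the
 * first byte of the atom. */
struct atom {
    const char *name;
    const uint8_t *bytes;
    size_t size;
};

/* ELF/COFF-style model: a chunk of bytes carries no name of its own... */
struct chunk {
    const uint8_t *bytes;
    size_t size;
};

/* ...and any number of symbols may point anywhere inside it. */
struct symbol {
    const char *name;
    const struct chunk *chunk;
    size_t offset;          /* byte offset within the chunk */
};

const uint8_t *atom_symbol_address(const struct atom *a)
{
    return a->bytes;        /* always the start of the atom */
}

const uint8_t *symbol_address(const struct symbol *s)
{
    return s->chunk->bytes + s->offset;
}
```

Two symbols that share one chunk at offsets 0 and 4 are easy to express with struct symbol, but have no direct equivalent when each symbol must own its own struct atom.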

 

From my cursory examination of the source it seems that lld/lib should perhaps be renamed lld/MachO and become the MachO linker beside the ELF and COFF directories, as the common code is not being used by the ELF and the PE/COFF linkers.

The original linker design used by MachO had greater ambitions than the current ELF and COFF designs. It was more aiming for the sort of linker model explained in Paul Bowen-Huggett's talk at the 2016 LLVM developer meeting https://youtu.be/-pL94rqyQ6c?t=20m29s

One way to think about this is that ld64 is already a fast linker that is controlled by the people that work on LLD at Apple. So there isn't much incentive to do what ELF and COFF have done, which at this point is getting to a production quality linker program. The original hope for LLD was to go beyond ld64's capabilities to enable new and interesting use cases (see the talk I linked above for some examples). However, for ELF and COFF there wasn't a linker controlled by the LLVM community for those platforms, and so merely reimplementing existing linker programs (with some extra attention to QoI and being modern) was an interesting enough goal in and of itself to push their development. The LLVM community did not want to wait to get ELF and COFF working pending the materialization of next generation linker use cases; simply meeting the requirements of existing linker use cases was sufficient.

(Note: we still aren't aware of any concrete analysis or experiment demonstrating real benefit to these "next generation linker use cases"; many of them seem quite interesting, but under closer inspection there are a lot of issues that haven't been fully explored)

Anyway, that was a very long way of saying that the MachO linker is actually a very different design and even source organization (it was intended to be factored along certain library boundaries, but we haven't seen any use cases that would use that), so moving it to lld/MachO doesn't really make much sense.

 

I just need to figure out how to build and invoke the Mach-O linker. There is no ‘ld’ in the llvm bin directory as one would be led to believe. I’ll dig into the CMakeLists.txt. I guess lld/lib/Driver/DarwinLdDriver.cpp is the entry point. lld/lib/Driver/CMakeLists.txt, however, only appears to define a library rather than an executable, and there is no top level MachO directory like there is for the other 2 linkers.

$ lld
lld is a generic driver.
Invoke ld.lld (Unix), ld (Mac) or lld-link (Windows) instead.

$ ld.lld --version
LLD 4.0.0

$ lld-link --version
ignoring unknown argument: --version
error: no input files

If I know which CMakeLists.txt defines the binary that hosts the main function and installs it, then I can take it from there.

You can see the logic that it uses in lld/tools/lld/lld.cpp

To access the MachO linker, you will want to either run `lld -flavor darwin ...` or invoke lld through a symlink such that argv[0] is `ld` (this is only enabled when LLD is compiled to run on an Apple host machine (#if __APPLE__)).

I guess we could install an `ld64` symlink to access the MachO linker, but the actual system linker on macOS is never actually invoked via the name `ld64` (that's just a name for the linker itself; not the binary; the binary is always `ld`).
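
The selection rule described above can be sketched in C. This is not the code in lld/tools/lld/lld.cpp; it only models the names that appear in this thread (ld.lld, ld, lld-link, and `-flavor darwin`). pick_flavor and the host_is_apple parameter (standing in for the `#if __APPLE__` check) are hypothetical, and it is an assumption that `-flavor` is checked before argv[0] and works on any host.

```c
#include <string.h>

enum flavor {
    FLAVOR_UNKNOWN,   /* not modelled here, or the generic "lld" name */
    FLAVOR_GNU,       /* ELF linker, invoked as ld.lld */
    FLAVOR_DARWIN,    /* Mach-O linker */
    FLAVOR_WINLINK    /* PE/COFF linker, invoked as lld-link */
};

/* Strip any leading directories from a program path. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

enum flavor pick_flavor(int argc, const char *const *argv, int host_is_apple)
{
    if (argc < 1)
        return FLAVOR_UNKNOWN;

    /* An explicit "-flavor <name>" as the first argument takes priority.
     * Only the "darwin" value mentioned in the thread is modelled. */
    if (argc >= 3 && strcmp(argv[1], "-flavor") == 0)
        return strcmp(argv[2], "darwin") == 0 ? FLAVOR_DARWIN : FLAVOR_UNKNOWN;

    /* Otherwise decide from the name the binary was invoked as. */
    const char *prog = base_name(argv[0]);
    if (strcmp(prog, "ld.lld") == 0)
        return FLAVOR_GNU;
    if (strcmp(prog, "lld-link") == 0)
        return FLAVOR_WINLINK;
    if (strcmp(prog, "ld") == 0 && host_is_apple)
        return FLAVOR_DARWIN;
    return FLAVOR_UNKNOWN;
}
```

This is why `ln -s lld ld` followed by `./ld` reaches the Mach-O driver on a Mac: the binary is the same, and only argv[0] differs.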

-- Sean Silva
 

Curious as to pointers to primordial branches with whatever needs to be resurrected. I couldn’t find any Mach-O cmake flags to enable its build. A pointer to a branch or tag that might have a working Mach-O LLD would be a start.





Re: [llvm-dev] LLD support for ld64 mach-o linker synthesised symbols

Hal Finkel via llvm-dev

On 8 Jun 2017, at 3:30 PM, Sean Silva <[hidden email]> wrote:



On Wed, Jun 7, 2017 at 2:46 PM, Michael Clark via llvm-dev <[hidden email]> wrote:

On 8 Jun 2017, at 4:53 AM, Rui Ueyama <[hidden email]> wrote:

On Tue, Jun 6, 2017 at 11:14 PM, Michael Clark via llvm-dev <[hidden email]> wrote:
OK. I see that the Mach-O linker is not even built when LLD is enabled in Release_40, only the PE/COFF and ELF linkers are built.

From looking at reviews it appears that Clang was able to be linked with LLD on Darwin about 2 years ago, so Mach-O support seems to have regressed.

Only a few changes have been made to the Mach-O port in the last two years, so I'd doubt if it has regressed. It could be the case that clang's output has changed in such a way that the linker is not able to handle it.

That’s actually good news!

If there is a Mach-O linker that is able to self host Clang builds on macOS, then this is a really good starting point.

From reading a tiny bit about the history, and the LLVM pages on the design of the various linkers, it seems like there is a difference in opinion with respect to the Atom based design of the Mach-O LLD, and whether or not there was to be an abstract design that supports ELF, PE/COFF and Mach-O. It seems not. One would also assume that LTO and/or -ffunction-sections -fdata-sections would obviate the need for Atoms, and that it may in fact increase the complexity of the linker.

InputSection (ELF) and Chunk (COFF) are basically "atoms". The main technical obstacle to using atoms for ELF and COFF is that the atom model used in the original linker design assumes a 1:1 mapping of symbols to "atoms" (and the symbol points to the start of the atom). In ELF and COFF symbols and InputSection/Chunk’s are decoupled because you can have multiple symbols pointing anywhere in the InputSection/Chunk.

Interesting.

I will take a closer look at the Atom model.

So a symbol has to be able to map to Atom + Addend or indirectly (N_INDR) to another symbol name, with strong exports taking precedence over weak. I’m assuming the Atom model supports indirect symbol references. I will see if I can generate some.
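
A minimal C sketch of that requirement: each symbol either names a definition (atom plus addend) or is an alias for another symbol name in the manner of N_INDR, and a strong definition beats a weak one. The table layout and the functions lookup and resolve are hypothetical, and the atom field is just an illustrative index.

```c
#include <stddef.h>
#include <string.h>

struct sym {
    const char *name;
    const char *alias_of;  /* non-NULL: indirect, refers to another name */
    int atom;              /* illustrative index of the defining atom */
    long addend;           /* offset from the start of that atom */
    int weak;              /* 1 = weak definition, 0 = strong */
};

/* Best entry for `name`: strong beats weak, first seen wins a tie. */
const struct sym *lookup(const struct sym *tab, size_t n, const char *name)
{
    const struct sym *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(tab[i].name, name) != 0)
            continue;
        if (!best || (best->weak && !tab[i].weak))
            best = &tab[i];
    }
    return best;
}

/* Follow aliases until a direct definition is reached. Returns NULL if
 * the name is undefined or the alias chain loops (more than n hops). */
const struct sym *resolve(const struct sym *tab, size_t n, const char *name)
{
    const struct sym *s = lookup(tab, n, name);
    for (size_t hops = 0; s && s->alias_of; ++hops) {
        if (hops >= n)
            return NULL;
        s = lookup(tab, n, s->alias_of);
    }
    return s;
}
```

With a weak and a strong definition of foo and an alias bar pointing at foo, resolve(tab, n, "bar") returns the strong foo entry, so bar ends up at that atom plus its addend.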

From my cursory examination of the source it seems that lld/lib should perhaps be renamed lld/MachO and become the MachO linker beside the ELF and COFF directories, as the common code is not being used by the ELF and the PE/COFF linkers.

The original linker design used by MachO had greater ambitions than the current ELF and COFF designs. It was more aiming for the sort of linker model explained in Paul Bowen-Huggett's talk at the 2016 LLVM developer meeting https://youtu.be/-pL94rqyQ6c?t=20m29s

Thanks for the link.

The idea of just regenerating changed fragments within source is an interesting if somewhat lofty goal. Reminds me of Merkle hash trees.

However, as you point out, there is raw asm, and branch labels in asm (presumably inside of Atoms) are also symbols. So while the link of a distinct Atom might still be consistent, an Atom + addend referenced externally from other (inline) asm, as an expression in terms of labels, may no longer be valid, and so other fragments (with inline asm) may become invalid. The dependency graph would need to account for values that are expressed in asm as expressions between symbols, e.g.

atom1:
..
.L1:
..

atom2:
..
.L2:
..

atom3:
.quad .L2 - .L1

One way to think about this is that ld64 is already a fast linker that is controlled by the people that work on LLD at Apple. So there isn't much incentive to do what ELF and COFF have done, which at this point is getting to a production quality linker program. The original hope for LLD was to go beyond ld64's capabilities to enable new and interesting use cases (see the talk I linked above for some examples). However, for ELF and COFF there wasn't a linker controlled by the LLVM community for those platforms, and so merely reimplementing existing linker programs (with some extra attention to QoI and being modern) was an interesting enough goal in and of itself to push their development. The LLVM community did not want to wait to get ELF and COFF working pending the materialization of next generation linker use cases; simply meeting the requirements of existing linker use cases was sufficient.

(Note: we still aren't aware of any concrete analysis or experiment demonstrating real benefit to these "next generation linker use cases"; many of them seem quite interesting, but under closer inspection there are a lot of issues that haven't been fully explored)

Anyway, that was a very long way of saying that the MachO linker is actually a very different design and even source organization (it was intended to be factored along certain library boundaries, but we haven’t seen any use cases that would use that), so moving it to lld/MachO doesn't really make much sense.


So it probably does make sense for the ELF and PE/COFF linkers to use the Atom model some time in the future, provided the Atom model is updated to correctly model indirect references, addends, etc. I will take a look…


I just need to figure out how to build and invoke the Mach-O linker. There is no ‘ld’ in the llvm bin directory as one would be led to believe. I’ll dig into the CMakeLists.txt. I guess lld/lib/Driver/DarwinLdDriver.cpp is the entry point. lld/lib/Driver/CMakeLists.txt, however, only appears to define a library rather than an executable, and there is no top level MachO directory like there is for the other 2 linkers.

$ lld
lld is a generic driver.
Invoke ld.lld (Unix), ld (Mac) or lld-link (Windows) instead.

$ ld.lld --version
LLD 4.0.0

$ lld-link --version
ignoring unknown argument: --version
error: no input files

If I know which CMakeLists.txt defines the binary that hosts the main function and installs it, then I can take it from there.

You can see the logic that it uses in lld/tools/lld/lld.cpp

To access the MachO linker, you will want to either run `lld -flavor darwin ...` or invoke lld through a symlink such that argv[0] is `ld` (this is only enabled when LLD is compiled to run on an Apple host machine (#if __APPLE__)).

I guess we could install an `ld64` symlink to access the MachO linker, but the actual system linker on macOS is never actually invoked via the name `ld64` (that's just a name for the linker itself; not the binary; the binary is always `ld`).

-- Sean Silva
 

Curious as to pointers to primordial branches with whatever needs to be resurrected. I couldn’t find any Mach-O cmake flags to enable its build. A pointer to a branch or tag that might have a working Mach-O LLD would be a start.



