[llvm-dev] [RFC] Pagerando: Page-granularity code randomization


[llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Stephen Crane via llvm-dev
This RFC describes pagerando, an improvement upon ASLR for shared
libraries. We're planning to submit this work for upstreaming and
would appreciate feedback before we get to the patch submission stage.

Pagerando randomizes the location of individual memory pages, whereas
ASLR only randomizes the library base address. This increases security
against code-reuse attacks (such as ROP) because it tolerates pointer
leaks: a single leaked code pointer no longer reveals the layout of the
rest of the library. Pagerando splits libraries into page-aligned bins
at compile time. At load time, each bin is mapped to a random address.
The code in each bin is immutable and can therefore still be shared
between processes.

To implement pagerando, the compiler and linker need to build shared
libraries with text segments split into page-aligned (and ideally
page-sized) bins. All inter-bin references are indirected through a
table initialized by the dynamic loader that holds the absolute
address of each bin. At load time the loader randomly chooses an
address for each bin and maps the bin pages from disk into memory.

We're focusing on ARM and AArch64 initially, although there is nothing
particularly target-specific about the approach that precludes
supporting other LLVM backends.

## Design Goals

1. Improve security over ASLR. The randomization granularity
determines how much information a single code pointer leaks. A pointer
to a page reveals less about the location of other code than a pointer
into a contiguous library would.
2. Avoid randomizing files on disk. Modern operating systems provide
verified boot techniques to detect tampering with files. Randomizing
the on-disk layout of system libraries would interfere with the
trusted boot process. Randomizing libraries at compile or link time
would also needlessly complicate deployment and provisioning.
3. Preserve code page sharing. The OS reduces memory usage by mapping
shared file pages to the same physical memory in each process, while
ASLR places those pages at different virtual addresses in each process.
To preserve sharing of code pages, we cannot modify the contents of
file-mapped pages at load time; we are restricted to changing their
ordering and placement in the virtual address space.
4. Backwards compatibility. Randomized code must interoperate
transparently with existing, unmodified executables and shared
libraries. Calls into randomized code must work as-is according to the
normal ABI.
5. Compatibility with other mitigations. Enabling randomization must
not preclude deploying other mitigations such as control-flow
integrity.

## Pagerando Design

Pagerando requires a platform-specific extension to the dynamic
loading ABI that compatible libraries opt in to. To decouple the
address of each code bin (segment) from that of other bins and global
data, we must disallow relative addressing between different bin
segments as well as between legacy segments and bin segments.

To prepare a library for pagerando, the compiler must first allocate
functions into page-aligned bins corresponding to segments in the
final ELF file. Since these bins will be independently positioned, the
compiler must redirect all inter-bin references through an indirection
table – the Page Offset Table (POT) – which stores the virtual address
of each bin in the library. Indices of POT entries and bin offsets are
statically determined at link time so code will not require any
dynamic relocations to reference functions in another bin or globals
outside of bins. We reserve a register in pagerando-compatible code to
hold the address of the POT. This register is initialized on entry to
the shared library. At load time the dynamic loader maps the code bins
at independent, random addresses and resolves the dynamic relocations
that fill in the POT.
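
To make the loader's side concrete, here is a rough C++ sketch of that
load-time step. All of the names and data structures here (BinSegment,
mapAtRandomAddress, the POT layout) are assumptions for illustration,
not a real loader API: each bin segment is mapped at an independently
chosen random address, and the corresponding POT slot is filled in,
which is what the dynamic relocations in the POT accomplish.

```cpp
// Hypothetical sketch of the load-time step; BinSegment, mapAtRandomAddress,
// and the POT layout are illustrative assumptions, not a real loader API.
#include <cstddef>
#include <cstdint>
#include <vector>

struct BinSegment {
  std::uint64_t fileOffset;  // offset of the bin's pages in the library file
  std::size_t   size;        // page-aligned size of the bin
  std::size_t   potIndex;    // POT slot that must receive the bin's base address
};

// `pot` points at this library's Page Offset Table in its data segment;
// `mapAtRandomAddress` stands in for mmap'ing the bin at a fresh random address.
void mapAndRelocateBins(const std::vector<BinSegment> &bins, std::uintptr_t *pot,
                        std::uintptr_t (*mapAtRandomAddress)(std::uint64_t, std::size_t)) {
  for (const BinSegment &bin : bins) {
    std::uintptr_t base = mapAtRandomAddress(bin.fileOffset, bin.size);
    pot[bin.potIndex] = base;  // equivalent to resolving the bin's POT relocation
  }
}
```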

Reserving a register to hold the POT address changes the internal ABI
calling convention and requires that the POT register be correctly
initialized when entering a library from external code. To initialize
the register, the compiler emits entry wrappers which save the old
contents of the POT register if necessary, initialize the POT
register, and call the target function. Each externally visible
function (conservatively including all address-taken functions) needs
an entry wrapper, which replaces the function for all external uses.
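
As a rough illustration (not the actual emitted code), the following
C++ models what an entry wrapper does, treating the reserved POT
register (r9 on ARM, r18/x18 on AArch64) as a global variable so the
save/init/forward/restore sequence is visible. All names here are
hypothetical.

```cpp
// Conceptual model only: the reserved POT register is modeled as a global
// pointer; foo/foo_impl and the save/restore policy are assumptions.
#include <cstdint>

static std::uintptr_t pot_table[4];        // this library's Page Offset Table
static std::uintptr_t *pot_reg = nullptr;  // stands in for the reserved register

// The real implementation, which lives in a randomized bin and assumes the
// POT register is already valid.
static int foo_impl(int x) { return x + 1; }

// Exported entry wrapper that replaces `foo_impl` for all external uses.
extern "C" int foo(int x) {
  std::uintptr_t *saved = pot_reg;  // save the caller's value if necessary
  pot_reg = pot_table;              // initialize the POT register
  int result = foo_impl(x);         // call the target function
  pot_reg = saved;                  // restore before returning to external code
  return result;
}

int main() { return foo(41) == 42 ? 0 : 1; }
```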

To pack functions into bins optimally and to avoid introducing new
static relocation types, we propose using (traditional) LTO. With new
static relocations (i.e., with linker cooperation), LTO would not be
necessary, but it would still be desirable for more efficient bin
packing.

The design of pagerando is based on the mitigations proposed by Backes
and Nürnberger [1], with improvements for compatibility and
deployability. The present design is a refinement of our first
pagerando prototype [2].

## LLVM Changes

To implement pagerando, we propose the following LLVM changes:

New module pass to create entry wrapper functions. This pass will
create entry wrappers as described above and replace exported function
names and all address-taken uses with the wrapper. This pass will only
be run when pagerando is enabled.

Instruction Lowering. Pagerando-compatible code must access all global
values (including functions) through the POT, since PC-relative memory
addressing is not allowed between a bin and another segment. We propose
that, when pagerando is enabled, all global variable accesses from
functions marked as pagerando-compatible be lowered to GOT-relative
offsets added to the GOT address, which is itself loaded from the POT
(currently stored in the first POT entry). Lowering of direct function
calls targeting pagerando-compatible code is slightly more complicated
because we need to determine the POT index of the bin containing the
target function if the target is not in the same bin. However, we
can't properly allocate functions to bins until they have been lowered
and an approximate size is available. Therefore, during lowering we
assume that all function calls must be made indirectly through the POT
and postpone the computation of the target's POT index and bin offset
until assembly printing.
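
To show the addressing scheme in source form, here is a minimal C++
sketch of an inter-bin call going through the POT. The table layout
(GOT address in slot 0, bin base addresses afterwards) follows the
description above; the concrete index and offset values are made up
for the example, whereas in reality they are link-time constants.

```cpp
// Sketch of the POT-indirect call sequence, written in C++ rather than the
// machine code the backend would actually emit.
#include <cstddef>
#include <cstdint>

using Callee = int (*)(int);

static int target_in_bin3(int x) { return x * 2; }  // pretend: offset 0 of bin 3

// POT layout as described above: slot 0 holds the GOT address, later slots
// hold bin base addresses. The dynamic loader fills these in at load time;
// here one slot is filled by hand so the example runs.
static std::uintptr_t POT[] = {
    0,                                                  // [0] GOT address
    0, 0,                                               // [1], [2] other bins
    reinterpret_cast<std::uintptr_t>(&target_in_bin3),  // [3] base of bin 3
};

int call_through_pot(int x) {
  constexpr std::size_t PotIndex = 3;      // link-time constant: callee's bin
  constexpr std::uintptr_t BinOffset = 0;  // link-time constant: offset in the bin
  auto fn = reinterpret_cast<Callee>(POT[PotIndex] + BinOffset);
  return fn(x);
}

int main() { return call_through_pot(21) == 42 ? 0 : 1; }
```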

New machine module LTO pass to allocate functions into bins. This pass
relies on targets implementing TargetInstrInfo::getInstSizeInBytes() so
that it knows (approximately) how large the final function code will
be. Functions can also be packed so that the number of inter-bin calls
is minimized, by taking the function call graph and/or execution
profiles into account while packing. This pass only needs to run when
pagerando is enabled.
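
As an illustration of the kind of packing this pass could perform, here
is a small, self-contained C++ sketch of first-fit packing by
approximate size. The data types and the 4 KiB bin size are
assumptions; a real pass would work on MachineFunctions (with sizes
summed from getInstSizeInBytes) and could order candidates by
call-graph affinity or profile weight before packing.

```cpp
// Hypothetical packing step: first-fit by approximate code size into
// page-sized bins.
#include <cstddef>
#include <string>
#include <vector>

struct FuncInfo {
  std::string name;
  std::size_t approxSize;  // approximate emitted size in bytes
};

std::vector<std::vector<FuncInfo>>
packIntoBins(const std::vector<FuncInfo> &funcs, std::size_t binSize = 4096) {
  std::vector<std::vector<FuncInfo>> bins;
  std::vector<std::size_t> used;
  for (const FuncInfo &f : funcs) {
    bool placed = false;
    for (std::size_t i = 0; i < bins.size(); ++i) {
      if (used[i] + f.approxSize <= binSize) {  // fits in an existing bin
        bins[i].push_back(f);
        used[i] += f.approxSize;
        placed = true;
        break;
      }
    }
    if (!placed) {            // open a new bin; oversized functions
      bins.push_back({f});    // get a bin to themselves
      used.push_back(f.approxSize);
    }
  }
  return bins;
}
```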

Code Emission. After functions are assigned to bins, we create an
individual MCSection for each bin. These MCSections will map to
independent segments during linking. The AsmPrinter is responsible for
emitting the POT entries during code emission. We cannot easily
represent the POT as a standard IR object because it needs to contain
bin (MCSection) addresses. Instead, the AsmPrinter can query the
MCContext for the list of bin symbols and emit these symbols directly
into a global POT array.

Gold Plugin Interface. If using LTO to build the module, LLVM can
generate the complete POT for the module and instrument all references
that need to use the POT. However, we must still ensure that bin
sections are each placed into an independent segment so that the
dynamic loader can map each bin separately. The gold plugin interface
currently provides support for assigning sections to unique output
segments. However, it does not yet give plugins an opportunity to call
this interface for new, plugin-created input files: to assign a section
to a unique segment, gold requires the plugin to provide the handle of
the input file containing that section. We will need to upstream a
small patch for gold that provides a new callback to the LTO plugin
when gold receives a new, plugin-generated input file. This would allow
the plugin to obtain the new file's handle and map its sections to
unique segments. The linker must also mark pagerando bin segments so
that the dynamic loader knows it can randomize each bin segment
independently. We propose a new ELF segment flag, PF_RAND_ADDR, to
communicate this for each compatible segment. The compiler and/or
linker must add this flag to compatible segments for the loader to
recognize and randomize them.
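
As a sketch of what such a flag could look like, together with the
corresponding loader-side check: the exact bit value below is an
assumption (ELF reserves the PF_MASKOS range, 0x0ff00000, for
OS-specific p_flags bits) and would ultimately be assigned by the
platform ABI.

```cpp
// Hypothetical flag definition and loader-side check; the bit value is
// illustrative only.
#include <elf.h>  // Elf64_Phdr, PT_LOAD (Linux/glibc header)

#ifndef PF_RAND_ADDR
#define PF_RAND_ADDR 0x00100000  // assumed bit inside the OS-specific PF_MASKOS range
#endif

// Only PT_LOAD segments carrying the flag are mapped at independent random
// addresses; everything else is loaded as usual relative to the library base.
bool shouldRandomizeIndependently(const Elf64_Phdr &phdr) {
  return phdr.p_type == PT_LOAD && (phdr.p_flags & PF_RAND_ADDR) != 0;
}
```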

## Target-Specific Details

We will initially support pagerando for ARM and AArch64, so several
details are worth considering on those targets. For ARM/AArch64, the
r9 register is a platform-specific register that can be used as the
static base register, which is similar in many ways to pagerando. When
not specified by the platform, r9 is a callee-saved general-purpose
register. Thus, using r9 as the POT register will be backwards
compatible when calling out of pagerando code into either legacy code
or a different module; the callee will preserve r9 for use after
returning to pagerando code. In AArch64, r18 is designated as a
platform-specific register, however, it is not specified as
callee-saved when not reserved by the target platform. Thus, to
interoperate with unmodified legacy AArch64 software, we would need to
save r18 in pagerando code before calling into any external code. When
using LTO, the compiler will see the entire module and therefore be
able to identify calls into external vs internal code. Without LTO, it
will likely be more efficient to use a callee-saved register to avoid
the need to save the POT register before each call. We will experiment
with both caller- and callee-saved registers to determine which is
most efficient.


[1] M. Backes and S. Nürnberger. Oxymoron: Making fine-grained memory
randomization practical by allowing code sharing. In USENIX Security
Symposium, 2014. https://www.usenix.org/node/184466

[2] S. Crane, A. Homescu, and P. Larsen. Code randomization: Haven't
we solved this problem yet? In IEEE Cybersecurity Development
Conference (SecDev), 2016.
http://www.ics.uci.edu/~perl/sd16_pagerando.pdf
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Davide Italiano via llvm-dev
On Tue, Jun 6, 2017 at 10:55 AM, Stephen Crane via llvm-dev
<[hidden email]> wrote:

> [original RFC quoted in full; trimmed]

Out of curiosity, did you measure the performance impact on the
generated executable? We tried something akin to your proposal in the
past (i.e., randomizing the ELF section layout) and it turned out to be
a sledgehammer for performance (in some cases, e.g. when
-ffunction-sections/-fdata-sections was specified, runtime performance
dropped by > 10%) [cc:ing Michael as he did the measurements].

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Davide Italiano via llvm-dev
On Sat, Jun 10, 2017 at 4:09 PM, Davide Italiano <[hidden email]> wrote:

> On Tue, Jun 6, 2017 at 10:55 AM, Stephen Crane via llvm-dev
> <[hidden email]> wrote:
>> [original RFC quoted in full; trimmed]
>
> Out of curiosity, did you measure the performance impact on the
> generated executable? We tried something akin to your proposal in the
> past (i.e., randomizing the ELF section layout) and it turned out to be
> a sledgehammer for performance (in some cases, e.g. when
> -ffunction-sections/-fdata-sections was specified, runtime performance
> dropped by > 10%) [cc:ing Michael as he did the measurements].
>

To clarify: I read your paper, and I see that some benchmarks show
substantial degradation (6.5%), but in your "future work" section you
describe techniques to mitigate the drop, and I wonder whether you ever
got around to implementing them and took new measurements.

Thanks,

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Stephen Crane via llvm-dev
I don't have performance measurements for the new LTO version of
pagerando yet. I'll definitely be thoroughly measuring performance
once the current prototype is finished before moving forward, and will
post results when I have them.

I'm definitely curious about your work and its performance impact.
Were you randomizing the layout of functions during linking by
reordering function sections? Or did just enabling -ffunction-sections
tank performance?

Thanks,
Stephen

On Sat, Jun 10, 2017 at 8:39 PM, Davide Italiano <[hidden email]> wrote:

> [earlier messages quoted in full; trimmed]

Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Michael Spencer via llvm-dev
On Mon, Jun 12, 2017 at 1:03 PM, Stephen Crane <[hidden email]> wrote:
> I don't have performance measurements for the new LTO version of
> pagerando yet. I'll definitely be thoroughly measuring performance
> once the current prototype is finished before moving forward, and will
> post results when I have them.
>
> I'm definitely curious about your work and its performance impact.
> Were you randomizing the layout of functions during linking by
> reordering function sections? Or did just enabling -ffunction-sections
> tank performance?
>
> Thanks,
> Stephen

-ffunction-sections plus randomization of text section order in the linker was a huge performance hit. It may well be different with only randomizing 4k groupings of sections instead.

- Michael Spencer
 

On Sat, Jun 10, 2017 at 8:39 PM, Davide Italiano <[hidden email]> wrote:
> On Sat, Jun 10, 2017 at 4:09 PM, Davide Italiano <[hidden email]> wrote:
>> On Tue, Jun 6, 2017 at 10:55 AM, Stephen Crane via llvm-dev
>> <[hidden email]> wrote:
>>> This RFC describes pagerando, an improvement upon ASLR for shared
>>> libraries. We're planning to submit this work for upstreaming and
>>> would appreciate feedback before we get to the patch submission stage.
>>>
>>> Pagerando randomizes the location of individual memory pages (ASLR
>>> only randomizes the library base address). This increases security
>>> against code-reuse attacks (such as ROP) by tolerating pointer leaks.
>>> Pagerando splits libraries into page-aligned bins at compile time. At
>>> load time, each bin is mapped to a random address. The code in each
>>> bin is immutable and thus shared between processes.
>>>
>>> To implement pagerando, the compiler and linker need to build shared
>>> libraries with text segments split into page-aligned (and ideally
>>> page-sized) bins. All inter-bin references are indirected through a
>>> table initialized by the dynamic loader that holds the absolute
>>> address of each bin. At load time the loader randomly chooses an
>>> address for each bin and maps the bin pages from disk into memory.
>>>
>>> We're focusing on ARM and AArch64 initially, although there is nothing
>>> particularly target specific that precludes support for other LLVM
>>> backends.
>>>
>>> ## Design Goals
>>>
>>> 1. Improve security over ASLR. The randomization granularity
>>> determines how much information a single code pointer leaks. A pointer
>>> to a page reveals less about the location of other code than a pointer
>>> into a contiguous library would.
>>> 2. Avoid randomizing files on disk. Modern operating systems provide
>>> verified boot techniques to detect tampering with files. Randomizing
>>> the on-disk layout of system libraries would interfere with the
>>> trusted boot process. Randomizing libraries at compile or link time
>>> would also needlessly complicate deployment and provisioning.
>>> 3. Preserve code page sharing. The OS reduces memory usage by mapping
>>> shared file pages to the same physical memory in each process and
>>> locates these pages at different virtual addresses with ASLR. To
>>> preserve sharing of code pages, we cannot modify the contents of
>>> file-mapped pages at load time and are restricted to changing their
>>> ordering and placement in the virtual address space.
>>> 4. Backwards compatibility. Randomized code must interoperate
>>> transparently with existing, unmodified executables and shared
>>> libraries. Calls into randomized code must work as-is according to the
>>> normal ABI.
>>> 5. Compatibility with other mitigations. Enabling randomization must
>>> not preclude deploying other mitigations such as control-flow
>>> integrity as well.
>>>
>>> ## Pagerando Design
>>>
>>> Pagerando requires a platform-specific extension to the dynamic
>>> loading ABI for compatible libraries to opt-in to. In order to
>>> decouple the address of each code bin (segment) from that of other
>>> bins and global data, we must disallow relative addressing between
>>> different bin segments as well as between legacy segments and bin
>>> segments.
>>>
>>> To prepare a library for pagerando, the compiler must first allocate
>>> functions into page-aligned bins corresponding to segments in the
>>> final ELF file. Since these bins will be independently positioned, the
>>> compiler must redirect all inter-bin references through an indirection
>>> table – the Page Offset Table (POT) – which stores the virtual address
>>> of each bin in the library. Indices of POT entries and bin offsets are
>>> statically determined at link time so code will not require any
>>> dynamic relocations to reference functions in another bin or globals
>>> outside of bins. We reserve a register in pagerando-compatible code to
>>> hold the address of the POT. This register is initialized on entry to
>>> the shared library. At load time the dynamic loader maps code bins at
>>> independent, random addresses and updates the dynamic relocations in
>>> the POT.
>>>
>>> Reserving a register to hold the POT address changes the internal ABI
>>> calling convention and requires that the POT register be correctly
>>> initialized when entering a library from external code. To initialize
>>> the register, the compiler emits entry wrappers which save the old
>>> contents of the POT register if necessary, initialize the POT
>>> register, and call the target function. Each externally visible
>>> function (conservatively including all address taken functions) needs
>>> an entry wrapper which replaces the function for all external uses.
>>>
>>> To optimally pack functions into bins and avoid new static
>>> relocations, we propose using (traditional) LTO. With new static
>>> relocations (i.e. linker cooperation), LTO would not be necessary, but
>>> it is still desirable for more efficient bin packing.
>>>
>>> The design of pagerando is based on the mitigations proposed by Backes
>>> and Nürnberger [1], with improvements for compatibility and
>>> deployability. The present design is a refinement of our first
>>> pagerando prototype [2].
>>>
>>> ## LLVM Changes
>>>
>>> To implement pagerando, we propose the following LLVM changes:
>>>
>>> New module pass to create entry wrapper functions. This pass will
>>> create entry wrappers as described above and replace exported function
>>> names and all address taken uses with the wrapper. This pass will only
>>> be run when pagerando is enabled.
>>>
>>> Instruction Lowering. Pagerando-compatible code must access all global
>>> values (including functions) through the POT since PC-relative memory
>>> addressing is not allowed between a bin and another segment. We
>>> propose that when pagerando is enabled, all global variable accesses
>>> from functions marked as pagerando-compatible must be lowered into
>>> GOT-relative accesses and added to the GOT address loaded from the POT
>>> (currently stored in the first POT entry). Lowering of direct function
>>> calls targeting pagerando-compatible code is slightly more complicated
>>> because we need to determine the POT index of the bin containing the
>>> target function if the target is not in the same bin. However, we
>>> can't properly allocate functions to bins before they are lowered and
>>> an approximate size is available. Therefore, during lowering we should
>>> assume that all function calls must be made indirectly through the POT
>>> with the computation of the POT index and bin offset of the target
>>> function postponed until assembly printing.
>>>
>>> New machine module LTO pass to allocate functions into bins. This pass
>>> relies on targets implementing TargetInstrInfo::getInstSizeInBytes
>>> (MachineInstr) so that it knows (approximately) how large the final
>>> function code will be. Functions can also be packed in such a way that
>>> the number of inter-bin calls are minimized by taking the function
>>> call graph and/or execution profiles into account while packing. This
>>> pass only needs to run when pagerando is enabled.
>>>
>>> Code Emission. After functions are assigned to bins, we create an
>>> individual MCSection for each bin. These MCSections will map to
>>> independent segments during linking. The AsmPrinter is responsible for
>>> emitting the POT entries during code emission. We cannot easily
>>> represent the POT as a standard IR object because it needs to contain
>>> bin (MCSection) addresses. The AsmPrinter instead can query the
>>> MCContext for the list of bin symbols and emit these symbols directly
>>> into a global POT array.
>>>
>>> Gold Plugin Interface. If using LTO to build the module, LLVM can
>>> generate the complete POT for the module and instrument all references
>>> that need to use the POT. However, we must still ensure that bin
>>> sections are each placed into an independent segment so that the
>>> dynamic loader can map each bin separately. The gold plugin interface
>>> currently provides support to assign sections to unique output
>>> segments. However, it does not yet provide plugins an opportunity to
>>> call this interface for new, plugin-created input files. Gold requires
>>> that the plugin provide the file handle of the input section to assign
>>> a section to a unique segment. We will need to upstream a small patch
>>> for gold that provides a new callback to the LTO plugin when gold
>>> receives a new, plugin-generated input file. This would allow the
>>> plugin to obtain the new file’s handle and map its sections to unique
>>> segments. The linker must mark pagerando bin segments in such a way
>>> that the dynamic loader knows that it can randomize each bin segment
>>> independently. We propose a new ELF segment flag PF_RAND_ADDR that can
>>> communicate this for each compatible segment. The compiler and/or
>>> linker must add this flag to compatible segments for the loader to
>>> recognize and randomize the relevant segments.
>>>
>>> ## Target-Specific Details
>>>
>>> We will initially support pagerando for ARM and AArch64, so several
>>> details are worth considering on those targets. For ARM/AArch64, the
>>> r9 register is a platform-specific register that can be used as the
>>> static base register, which is similar in many ways to pagerando. When
>>> not specified by the platform, r9 is a callee-saved general-purpose
>>> register. Thus, using r9 as the POT register will be backwards
>>> compatible when calling out of pagerando code into either legacy code
>>> or a different module; the callee will preserve r9 for use after
>>> returning to pagerando code. In AArch64, r18 is designated as a
>>> platform-specific register, however, it is not specified as
>>> callee-saved when not reserved by the target platform. Thus, to
>>> interoperate with unmodified legacy AArch64 software, we would need to
>>> save r18 in pagerando code before calling into any external code. When
>>> using LTO, the compiler will see the entire module and therefore be
>>> able to identify calls into external vs internal code. Without LTO, it
>>> will likely be more efficient to use a callee-saved register to avoid
>>> the need to save the POT register before each call. We will experiment
>>> with both caller- and callee-saved registers to determine which is
>>> most efficient.
>>>
>>>
>>> [1] M. Backes and S. Nürnberger. Oxymoron - making fine-grained memory
>>> randomization practical by allowing code sharing. In USENIX Security
>>> Symposium, 2014. https://www.usenix.org/node/184466
>>>
>>> [2] S. Crane, A. Homescu, and P. Larsen. Code randomization: Haven’t
>>> we solved this problem yet? In IEEE Cybersecurity Development
>>> Conference (SecDev), 2016.
>>> http://www.ics.uci.edu/~perl/sd16_pagerando.pdf
>>
>> Out of curiosity, did you measure the performance impact on the
>> generated executable? We tried something akin to your proposal in the
>> past (i.e. randomizing the ELF section layout) and it turned out to be a
>> sledgehammer for performance (in some cases, e.g. when
>> -ffunction-sections/-fdata-sections was specified, the performance of
>> the runtime executable dropped by > 10% [cc:ing Michael as he did the
>> measurements]).
>>
>
> To clarify, I read your paper and I see that some benchmarks show
> substantial degradations (6.5%), but in your "future work" section you
> describe techniques to mitigate the drop, and I wonder if you ever got
> to implement them and obtained new measurements.
>
> Thanks,
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare



Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Gerolf Hoflehner via llvm-dev


On Mon, Jun 12, 2017 at 2:13 PM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Mon, Jun 12, 2017 at 1:03 PM, Stephen Crane <[hidden email]> wrote:
I don't have performance measurements for the new LTO version of
pagerando yet. I'll definitely be thoroughly measuring performance
once the current prototype is finished before moving forward, and will
post results when I have them.

I'm definitely curious about your work and its performance impact.
Were you randomizing the layout of functions during linking by
reordering function sections? Or did just enabling -ffunction-sections
tank performance?

Thanks,
Stephen

-ffunction-sections plus randomization of text section order in the linker was a huge performance hit. It may well be different with only randomizing 4k groupings of sections instead.

What overhead would there be from randomizing layout other than iTLB cost? And by definition if you are remapping at 4k boundaries then that has no effect on iTLB cost (besides the initial need to split at 4k boundaries).

-- Sean Silva
 

- Michael Spencer
 


Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Gerolf Hoflehner via llvm-dev
I could understand a TLB hit if functions that originally happened to
be on the same page were spread across many pages, raising the iTLB
footprint for a given loop, etc. (reduced spatial locality). For
pagerando, since we're splitting on 4k page boundaries and can keep
spatial locality (or attempt to improve it), I'm not sure that TLB
misses will be a large factor. I expect that the runtime overhead of
inter-page indirection will dominate any TLB impact.
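
To make that concrete, a rough C-style sketch of what the inter-page
indirection amounts to under the RFC's POT scheme; the names, the extern
standing in for the reserved POT register, and the call sequence are
assumptions of the sketch rather than the actual codegen:

  #include <cstdint>

  // Illustrative only: an inter-bin call is one extra load of the callee
  // bin's base address out of the POT plus an indirect branch.
  extern std::uintptr_t *pot;  // stand-in for the POT base kept in a reserved register

  inline void call_inter_bin(unsigned pot_index, std::uintptr_t bin_offset) {
    std::uintptr_t bin_base = pot[pot_index];                      // load from the POT
    auto callee = reinterpret_cast<void (*)()>(bin_base + bin_offset);
    callee();                                                      // indirect call
  }

So the per-call cost being weighed here is essentially one (likely
cache-resident) load plus one indirect branch.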


Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Gerolf Hoflehner via llvm-dev
If the linker normally doesn't reorder sections with -ffunction-sections
but you introduce a random order, then I'd guess you have taken adjacent
functions that may have been originally from the same source and so
(perhaps) more tightly coupled, and scattered them randomly.  This could
take what was reasonably adjacent L2/L3 cache behavior and introduce a
bunch of non-locality as the calls start bouncing around memory more
widely.  Going from reasonable to unreasonable cache behavior can have a
pretty serious effect.
All speculation of course, and highly dependent on the application.
--paulr


Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Gerolf Hoflehner via llvm-dev


On Mon, Jun 12, 2017 at 3:41 PM, Stephen Crane <[hidden email]> wrote:
I could understand a TLB hit if functions that originally happened to
be on the same page were spread across many pages, raising the iTLB
footprint for a given loop, etc. (reduced spatial locality). For
pagerando, since we're splitting on 4k page boundaries and can keep
spatial locality (or attempt to improve it), I'm not sure that TLB
misses will be a large factor.

As long as your randomization doesn't cause misses at higher-level page tables (unlikely unless you are really spreading the pages out through memory, like multiple MBs apart) it shouldn't have any effect at all. And certainly no effect in inner loops where everything will be serviced from the L1 iTLB.
 
I expect that the runtime overhead of
inter-page indirection will dominate any TLB impact.

That shouldn't be too high a cost either: just a load from the GOT and an indirect branch (which will be monomorphic, so perfectly predicted by modern processors in a sufficiently hot loop). So we're talking about an extra load-queue entry, and for a hot loop that load should end up in L1 and so only be a 3 or 4 cycle bubble max (plus a parallel, independent delay of a couple cycles for the indirect branch prediction compared to a fall-through).

Overall, this page randomization idea is a really good design.

As long as the DSO is under some fixed size (say 1GB or 4GB or whatever) then with dynamic linker collaboration you can find the GOT by rounding down the current instruction pointer, eliminating the need for the POT. This should save the need for the internal ABI stuff. As long as you are shuffling sections and not spewing them all over memory you can implement the randomization as an in-place shuffling of the pages and thus not increase the maximal distance to the GOT.
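
A minimal sketch of that round-down idea, assuming the loader keeps all of a
DSO's page groups inside one naturally aligned window and parks the GOT's
address at the base of that window; the 1 GB figure, the alignment guarantee
and the first-word convention are assumptions of the sketch, not part of any
proposed ABI:

  #include <cstdint>

  // Illustrative only: recover the GOT from the PC with no POT and no
  // reserved register, given the loader cooperation described above.
  constexpr std::uintptr_t kDsoWindow = std::uintptr_t(1) << 30;  // assumed 1 GB, aligned

  inline std::uintptr_t got_from_pc(std::uintptr_t pc) {
    std::uintptr_t window_base = pc & ~(kDsoWindow - 1);  // round-down-PC
    return *reinterpret_cast<const std::uintptr_t *>(window_base);
  }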

So in the end the needed changes would be:
1. compiler change to have it break up sections into 4K (or whatever) chunks, inserting appropriate round-down-PC sequences for GOT access and possibly a new relocation type for such GOT accesses. Add a new section flag to indicate that sections should be placed in output sections of at most 4K (or whatever is appropriate for the target). For -ffunction-sections -fdata-sections this should only require splitting a small number of sections (i.e. sections larger than 4K). There is no binning in the compiler.
2. linker change to respect the section flag and split output sections containing input sections with such flags into multiple 4K output sections. Also, set the PF_RAND_ADDR flag on such 4K output sections for communicating to the dynamic linker. (extra credit: linker optimization to relax GOT accesses within pages of output sections that will be split)
3. runtime loader change to collect the set of PT_LOAD's marked with PF_RAND_ADDR and perform an in-place shuffle of their load addresses (or some other randomization that doesn't massively expand the VA footprint so that round-down-PC GOT accesses will work) and also any glue needed for round-down-PC GOT accesses to work.
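
As a rough illustration of step 3, a loader-side sketch of such an in-place
shuffle over the page groups marked PF_RAND_ADDR; the struct and field names
are invented for the example, and it assumes all groups are equal, page-sized
chunks so their slots are interchangeable:

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <random>
  #include <vector>

  struct PageGroup {
    std::uintptr_t file_offset;  // where the group's pages live in the file
    std::uintptr_t load_offset;  // offset within the DSO's VA window
  };

  // Permute the load offsets in place so the DSO's overall VA window (and
  // hence any round-down-PC GOT access) is unchanged.
  void shuffleInPlace(std::vector<PageGroup> &groups, std::mt19937_64 &rng) {
    std::vector<std::uintptr_t> slots;
    slots.reserve(groups.size());
    for (const PageGroup &g : groups)
      slots.push_back(g.load_offset);
    std::shuffle(slots.begin(), slots.end(), rng);  // Fisher-Yates shuffle
    for (std::size_t i = 0; i < groups.size(); ++i)
      groups[i].load_offset = slots[i];
  }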

Asking the linker to split an output section into multiple smaller ones seems like reasonably general functionality, so it should be reasonable to build it right into gold (and hopefully LLD! In fact you may find LLD easier to hack on at first). This should also interoperate fairly transparently with any profile-guided or other section-ordering heuristics the linker uses as it constructs the initial output sections, eliminating the need for custom LTO binning passes or custom LTO integration.
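
The core of that splitting step (item 2 above) could be as small as the
following sketch -- greedy next-fit packing of flagged input sections into
page-sized output chunks; real gold/LLD data structures are of course richer
than this, so treat it purely as an outline:

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  constexpr std::uint64_t kChunkSize = 4096;

  // Returns, for each output chunk, the indices of the input sections it
  // holds. A section larger than a page still gets a chunk of its own and
  // would need the compiler-side splitting from item 1.
  std::vector<std::vector<std::size_t>>
  splitIntoChunks(const std::vector<std::uint64_t> &sectionSizes) {
    std::vector<std::vector<std::size_t>> chunks;
    std::uint64_t used = 0;
    for (std::size_t i = 0; i < sectionSizes.size(); ++i) {
      if (chunks.empty() || used + sectionSizes[i] > kChunkSize) {
        chunks.emplace_back();  // current chunk is full: start a new one
        used = 0;
      }
      chunks.back().push_back(i);
      used += sectionSizes[i];
    }
    return chunks;
  }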

Overall, avoiding the need for the POT and custom ABI for indirecting through it might be worth it (mostly for implementation simplicity, but also probably performance).

-- Sean Silva
 

On Mon, Jun 12, 2017 at 3:31 PM, Sean Silva <[hidden email]> wrote:
>
>
> On Mon, Jun 12, 2017 at 2:13 PM, Michael Spencer via llvm-dev
> <[hidden email]> wrote:
>>
>> On Mon, Jun 12, 2017 at 1:03 PM, Stephen Crane <[hidden email]> wrote:
>>>
>>> I don't have performance measurements for the new LTO version of
>>> pagerando yet. I'll definitely be thoroughly measuring performance
>>> once the current prototype is finished before moving forward, and will
>>> post results when I have them.
>>>
>>> I'm definitely curious about your work and its performance impact.
>>> Were you randomizing the layout of functions during linking by
>>> reordering function sections? Or did just enabling -ffunction-sections
>>> tank performance?
>>>
>>> Thanks,
>>> Stephen
>>
>>
>> -ffunction-sections plus randomization of text section order in the linker
>> was a huge performance hit. It may well be different with only randomizing
>> 4k groupings of sections instead.
>
>
> What overhead would there be from randomizing layout other than iTLB cost?
> And by definition if you are remapping at 4k boundaries then that has no
> effect on iTLB cost (besides the initial need to split at 4k boundaries).
>
> -- Sean Silva
>
>>
>>
>> - Michael Spencer
>>
>>>
>>>
>>> On Sat, Jun 10, 2017 at 8:39 PM, Davide Italiano <[hidden email]>
>>> wrote:
>>> > On Sat, Jun 10, 2017 at 4:09 PM, Davide Italiano <[hidden email]>
>>> > wrote:
>>> >> Out of curiosity, did you measure the performance impact on the
>>> >> generated executables? We tried something akin to your proposal
>>> >> in the past (i.e. randomizing the layout of ELF sections) and it
>>> >> turned out to be a sledgehammer for performance (in some cases,
>>> >> e.g. when -ffunction-sections/-fdata-sections was specified, the
>>> >> performance of the resulting executable dropped by > 10% [cc:ing
>>> >> Michael as he did the measurements]).
>>> >>
>>> >
>>> > To clarify: I read your paper and I see that some benchmarks show
>>> > substantial degradations (6.5%), but your "future work" section
>>> > describes techniques to mitigate the drop, and I wonder whether you
>>> > ever got to implement them and took new measurements.
>>> >
>>> > Thanks,
>>> >
>>> > --
>>> > Davide
>>> >
>>> > "There are no solved problems; there are only problems that are more
>>> > or less solved" -- Henri Poincare



Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Stephen Crane via llvm-dev
Thanks for the ideas. I particularly like the GOT access via masking.
However, I do have some security concerns about completely eliminating
the POT.

On Mon, Jun 12, 2017 at 5:48 PM, Sean Silva <[hidden email]> wrote:
> As long as the DSO is under some fixed size (say 1GB or 4GB or whatever)
> then with dynamic linker collaboration you can find the GOT by rounding down
> the current instruction pointer, eliminating the need for the POT. This
> should save the need for the internal ABI stuff. As long as you are
> shuffling sections and not spewing them all over memory you can implement
> the randomization as an in-place shuffling of the pages and thus not
> increase the maximal distance to the GOT.

I think this is a great idea for referencing the GOT and global data.
We should be careful that keeping the DSO in a fixed range and placing
.rodata at a fixed alignment still allows sufficient entropy to
mitigate guessing and disclosure attacks. Shuffling in place is
problematic without execute-only (non-readable) code page permissions,
since an attacker could simply do a linear scan of the DSO's code,
disassemble and reuse code in that DSO. On platforms that support
execute-only permissions, I think an in-place shuffle is fine.

I'm not sure we can keep code page pointers in the GOT/global segment
and still keep them hidden from an attacker with a read primitive. An
attacker who has any global data pointer can trivially find the GOT
and thus code page addresses if we keep them in the GOT. Even if we
were to decouple the GOT from other address-taken global data but
still place the GOT at a predictable location (masking off low bits),
then it should still be fairly easy for an attacker to locate it.

Even if we have to keep the POT, eliminating the extra load from the
POT for global access by masking the PC address should be a
significant performance optimization.
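
Concretely, a minimal sketch of the two GOT address computations being
compared. This is illustrative only: PotBase, kPotGotIndex, and
kDsoSpan are invented names, and the 1 GiB aligned window is just an
assumed loader policy (only the choice of the first POT entry for the
GOT is taken from the proposal itself):

#include <cstdint>

extern std::uintptr_t *PotBase;                  // stand-in for the reserved POT register
constexpr unsigned kPotGotIndex = 0;             // the RFC keeps the GOT address in POT[0]
constexpr std::uintptr_t kDsoSpan = 1ull << 30;  // assumed: whole DSO inside one aligned 1 GiB window

// Current scheme: one extra, serially dependent load through the POT.
std::uintptr_t GotViaPot() { return PotBase[kPotGotIndex]; }

// Suggested scheme: round the PC down to the window base where the loader
// put the GOT; pure ALU work, no load.
std::uintptr_t GotViaPcMask(std::uintptr_t Pc) { return Pc & ~(kDsoSpan - 1); }

The second form needs no load at all, which is where the savings
would come from.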

> So in the end the needed changes would be:
> 1. compiler change to have it break up sections into 4K (or whatever)
> chunks, inserting appropriate round-down-PC sequences for GOT access and
> possibly a new relocation type for such GOT accesses. Add a new section flag
> to indicate that sections should be placed in output sections of at most 4K
> (or whatever is appropriate for the target). For -ffunction-sections
> -fdata-sections this should only require splitting a small number of
> sections (i.e. larger than 4K sections). There is no binning in the
> compiler.
> 2. linker change to respect the section flag and split output sections
> containing input sections with such flags into multiple 4K output sections.
> Also, set the PF_RAND_ADDR flag on such 4K output sections for communicating
> to the dynamic linker. (extra credit: linker optimization to relax GOT
> accesses within pages of output sections that will be split)
> 3. runtime loader change to collect the set of PT_LOAD's marked with
> PF_RAND_ADDR and perform an in-place shuffle of their load addresses (or
> some other randomization that doesn't massively expand the VA footprint so
> that round-down-PC GOT accesses will work) and also any glue needed for
> round-down-PC GOT accesses to work.
>
> This asking the linker to split an output section into multiple smaller ones
> seems like reasonably general functionality, so it should be reasonable to
> build it right into gold (and hopefully LLD! in fact you may find LLD easier
> to hack on at first). This also should interoperate fairly transparently
> with any profile-guided or other section ordering heuristics the linker is
> using as it constructs the initial output sections, eliminating the need for
> custom LTO binning passes or custom LTO integration.

I originally prototyped pagerando kind of similar to this. The linker
took individual function sections and binned them into pages,
inserting the POT indirection at call sites by appending small stubs
that looked up the function address and jumped to it. These stubs
added too much overhead (code size and runtime), so I wanted to insert
the inter-page indirection at code generation time.

As you suggest, the compiler could certainly add the indirection for
every global access and call and leave final binning up to the linker
itself. However, if the compiler does not know which functions will be
binned together, it must indirect every function call, even for
callees that will be in the same bin as the caller. Binning in the
compiler allows us to optimize function calls inside the same bin to
direct, PC-relative calls, which I think is a critical optimization
for hot call sites.
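
As a rough illustration of that difference (in C++ for brevity; Pot,
BinIndex, and BinOffset are invented names, and the real lowering
would of course happen in the backend rather than in source code):

#include <cstdint>

extern std::uintptr_t Pot[];  // per-bin base addresses, filled in by the dynamic loader
void Callee();                // some function elsewhere in this library

// Caller and callee known at compile time to share a bin: the call can stay
// a plain PC-relative `bl`, because the whole bin moves as one unit.
void CallWithinBin() { Callee(); }

// Callee in a different (or unknown) bin: go through the POT, using the
// link-time-constant bin index and offset of the callee.
void CallAcrossBins(unsigned BinIndex, std::uintptr_t BinOffset) {
  auto Target = reinterpret_cast<void (*)()>(Pot[BinIndex] + BinOffset);
  Target();  // one extra load plus an indirect branch at every call site
}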

If we could somehow teach the linker how to rewrite indirect
inter-page calls to direct intra-page calls, binning in the linker
would be perfectly viable. However, I'm concerned that we can't do
that safely in general because doing so would require correct
disassembly and rewriting of the call site. The computation of the
callee address may be spread across the function or stored in a
register (e.g. for repeated calls to the same function). To me,
rewriting these calls needs to be done at code-generation time,
although of course I'm open to alternatives.
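
Extending the sketch above, a small example of why a call-site-only
rewrite would not be enough (again purely illustrative):

#include <cstdint>

extern std::uintptr_t Pot[];

void RepeatedCalls(unsigned BinIndex, std::uintptr_t BinOffset, int N) {
  // The target address is computed once, far away from the branches that
  // use it; the only thing left at each "call site" is an indirect branch
  // on a register, so a linker relaxation would need real data-flow
  // analysis (or full disassembly) to turn these into direct calls.
  auto Target = reinterpret_cast<void (*)()>(Pot[BinIndex] + BinOffset);
  for (int I = 0; I < N; ++I)
    Target();
}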

Thanks,
Stephen

Re: [llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Sean Silva via llvm-dev


On Wed, Jun 14, 2017 at 4:03 PM, Stephen Crane <[hidden email]> wrote:
> Thanks for the ideas. I particularly like the GOT access via masking.
> However I do have some security concerns over completely eliminating
> the POT.

IANA security person, so take all my advice with a grain of salt!
 

> On Mon, Jun 12, 2017 at 5:48 PM, Sean Silva <[hidden email]> wrote:
>> As long as the DSO is under some fixed size (say 1GB or 4GB or whatever)
>> then with dynamic linker collaboration you can find the GOT by rounding down
>> the current instruction pointer, eliminating the need for the POT. This
>> should save the need for the internal ABI stuff. As long as you are
>> shuffling sections and not spewing them all over memory you can implement
>> the randomization as an in-place shuffling of the pages and thus not
>> increase the maximal distance to the GOT.
>
> I think this is a great idea for referencing the GOT and global data.
> We should be careful that keeping the DSO in a fixed range and placing
> .rodata at a fixed alignment still allows sufficient entropy to
> mitigate guessing and disclosure attacks. Shuffling in place is
> problematic without execute-only (non-readable) code page permissions,
> since an attacker could simply do a linear scan of the DSO's code,
> disassemble and reuse code in that DSO. On platforms that support
> execute-only permissions, I think an in-place shuffle is fine.

The PF_RAND_ADDR flag could be defined to semantically permit the loader to change the addresses. It wouldn't have to be exactly in place (I meant that more as a very simple, concrete thing the loader could do). In place is an extreme case where the VA footprint is not increased at all. At a performance cost, you could insert unmapped pages or whatever you need for security. Just be careful that expanding the VA footprint will cause a performance degradation (how much remains to be measured). The cost is basically that the hardware page table walker has to do an extra, serially dependent memory access during its page table lookups. As long as the working set fits in the iTLB it won't have any effect, but beyond that you will suffer some performance hit.

If you already have a prototype working, can you try taking some measurements by just spacing out the (assumed 4K) pages 2M apart vs. some much smaller separation? That should give a reasonable measurement of the overhead. Also, make sure to measure on a program with a substantial icache footprint, like clang or some other large, complex program.
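
A throwaway microbenchmark along those lines could look roughly like the sketch below. This is only an illustration, not code from the pagerando prototype: the constants and names are invented, and it assumes a 64-bit POSIX system that allows writable+executable anonymous mappings and a Stub small and position-independent enough to copy.

#include <sys/mman.h>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Tiny leaf function whose machine code we copy around: on common targets it
// compiles to position-independent code (touch the argument register, return).
static int Stub(int X) { return X + 1; }

static std::vector<int (*)(int)> MapCopies(std::size_t Count, std::size_t Stride) {
  void *Base = mmap(nullptr, Count * Stride + 4096,
                    PROT_READ | PROT_WRITE | PROT_EXEC,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (Base == MAP_FAILED) { std::perror("mmap"); std::exit(1); }
  std::vector<int (*)(int)> Copies;
  for (std::size_t I = 0; I < Count; ++I) {
    char *Dst = static_cast<char *>(Base) + I * Stride;
    std::memcpy(Dst, reinterpret_cast<void *>(&Stub), 64);  // assumes Stub fits in 64 bytes
    __builtin___clear_cache(Dst, Dst + 64);                 // needed on ARM/AArch64
    Copies.push_back(reinterpret_cast<int (*)(int)>(Dst));
  }
  return Copies;
}

static void Measure(const char *Label, std::size_t Stride) {
  auto Copies = MapCopies(/*Count=*/512, Stride);
  volatile int Sum = 0;
  auto Start = std::chrono::steady_clock::now();
  for (int Round = 0; Round < 2000; ++Round)
    for (auto Fn : Copies)  // round-robin calls keep the instruction working set large
      Sum = Sum + Fn(Sum);
  auto Ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - Start).count();
  std::printf("%-20s %lld ms\n", Label, (long long)Ms);
}

int main() {
  Measure("4 KiB apart:", 4096);             // packed, like a contiguous .text
  Measure("2 MiB apart:", 2 * 1024 * 1024);  // spread out, one page per 2 MiB region
}

Comparing the two runs (ideally together with hardware iTLB-miss counters) would give a rough upper bound on the cost of spreading bins far apart.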
 

> I'm not sure we can keep code page pointers in the GOT/global segment
> and still keep them hidden from an attacker with a read primitive. An
> attacker who has any global data pointer can trivially find the GOT
> and thus code page addresses if we keep them in the GOT. Even if we
> were to decouple the GOT from other address-taken global data but
> still place the GOT at a predictable location (masking off low bits),
> then it should still be fairly easy for an attacker to locate it.



This would affect POTs too, which would be easy to find from the text, no? At the end of the day, something has to be at a predictable offset from the text (unless you are using a TLS register or something to hold it, but that has other, larger issues, as pointed out by one of your references).

E.g., the externally facing ABI in your proposal would be all stubs loading the internal-ABI POT register; the attacker would just look at what the stub does to access it.
(Or would the stubs use TLS or something expensive to get access to the POT address?)

I'm just saying this because adding a new "GOT-like" thing is actually pretty annoying (e.g. MIPS multi-GOT). If the changes can be limited to putting .got.plt in a separate PT_LOAD, that would be comparatively easy and non-invasive.


 
> Even if we have to keep the POT, eliminating the extra load from the
> POT for global access by masking the PC address should be a
> significant performance optimization.

> I originally prototyped pagerando kind of similar to this. The linker
> took individual function sections and binned them into pages,
> inserting the POT indirection at call sites by appending small stubs
> that looked up the function address and jumped to it. These stubs
> added too much overhead (code size and runtime), so I wanted to insert
> the inter-page indirection at code generation time.
>
> As you suggest, the compiler could certainly add the indirection for
> every global access and call and leave final binning up to the linker
> itself. However, if the compiler does not know which functions will be
> binned together, it must indirect every function call, even for
> callees that will be in the same bin as the caller. Binning in the
> compiler allows us to optimize function calls inside the same bin to
> direct, PC-relative calls, which I think is a critical optimization
> for hot call sites.
>
> If we could somehow teach the linker how to rewrite indirect
> inter-page calls to direct intra-page calls, binning in the linker
> would be perfectly viable. However, I'm concerned that we can't do
> that safely in general because doing so would require correct
> disassembly and rewriting of the call site. The computation of the
> callee address may be spread across the function or stored in a
> register (e.g. for repeated calls to the same function). To me,
> rewriting these calls needs to be done at code-generation time,
> although of course I'm open to alternatives.


Eliminating GOT access is a standard linker optimization these days. Look at e.g. R_X86_64_GOTPCRELX
(the psABI doc has examples; see "B.2 Optimize GOTPCRELX Relocations" in https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r252.pdf).
The linker does not need to disassemble anything because the compiler has emitted a special relocation.
The new relocations for pagerando would indicate to the linker the relaxation semantics for pagerando call sites.

-- Sean Silva
 
> Thanks,
> Stephen

