[llvm-dev] Success: Bring-up of LLVM/clang-built Linux ARM(32-bit) kernel for Android - Nexus 5


Tim Northover via llvm-dev
Hello,

I would like to share my successful bring-up of an LLVM/clang-built Linux ARM(32-bit) hammerhead kernel for Android, running on my Nexus 5 smartphone. Having already brought up LLVM/clang-built Linux kernels (from v4.15.7 up to the most recent v4.17) on x86_64, I was interested in accomplishing the same on the ARM platform of my Nexus 5 Android smartphone. So, here is the complete report for anyone interested.

The main advantage of the clang-built Android ARM(32-bit) hammerhead kernel on my Nexus 5 has been better battery usage compared with the gcc-built kernel, with the same kernel config and the same hardware (my Nexus 5 Android smartphone). Details can be found below.

NOTE : By the way, I came across some reports of ARM64 clang-kernels for some Android smartphones, but the information there did *not* help in my ARM32 clang-kernel case for the Nexus 5(hammerhead). So, I started this project from *scratch*, and it has taken a lot of *entirely my own original work*, first to successfully build the ARM32 clang-kernel for Nexus 5(hammerhead) and second to make it *actually work* on the real hardware - Nexus 5.

For easy reading with formatting, etc : https://ubuntuforums.org/showthread.php?t=2394035

Cheers.


Android ARM(32-bit) clang-kernel bring-up for Nexus 5(hammerhead)

[Android Version Information] & [Battery Usage of a clang-built kernel ~ better than that of gcc-built kernel (shows one of the instances)]

[1] Android NDK r13b [LLVM/clang + binutils(as, ld, etc)]
[2] Android NDK r17 [LLVM/clang + binutils(as, ld, etc)]
[3] Main LLVM/clang + Android NDK r13b binutils(as, ld, etc)
[4] Main LLVM/clang + Android NDK r17 binutils(as, ld, etc)
[5] Snapdragon Qualcomm LLVM/clang + NDK r13b binutils(as, etc)
[6] Snapdragon Qualcomm LLVM/clang + NDK r17 binutils(as, etc)

[Average Battery Usage]


BUILD SYSTEM INFORMATION


Code:
#### Build system information ####

exp@exp:~$ sudo dmidecode -t system | grep "Manufacturer:\|Version:"
    Manufacturer: LENOVO
    Version: Lenovo Y50-70 Touch

exp@exp:~$ sudo dmidecode -t processor | grep "Version\|Family:"
    Family: Core i7
    Version: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz

exp@exp:~$ cat /proc/meminfo | grep MemTotal
MemTotal:       16332968 kB

exp@exp:~$ lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 17.10
Release:    17.10
Codename:    artful

BUILD SUMMARY

Code:
#### Total build time ####


38m6.816s

#### Build times ####


[GCC NDK r13b] : 4m15.596s
[GCC NDK r17] : 4m13.983s
[Android NDK r13b : LLVM/clang + binutils(ld and as)] : 4m3.665s
[Android NDK r17 : LLVM/clang + binutils(ld and as)] : 4m4.683s
[Main LLVM/clang + Android NDK r13b binutils(ld and as)] : 6m8.064s
[Main LLVM/clang + Android NDK r17 binutils(ld and as)] : 6m3.457s
[Qualcomm Snapdragon LLVM/clang + Android NDK r13b binutils(ld and as)] : 4m32.581s
[Qualcomm Snapdragon LLVM/clang + Android NDK r17 binutils(ld and as)] : 4m44.779s
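
As a cross-check of the total against the individual build times, here is a minimal sketch that adds up `time`-style durations. The function names `to_seconds` and `sum_durations` are made up for illustration and are not part of the author's automation scripts.

```shell
# Convert a `time`-style duration such as "4m15.596s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Sum any number of such durations and print the total as XmY.YYYs.
sum_durations() {
    for d in "$@"; do to_seconds "$d"; done |
        awk '{ total += $1 }
             END { m = int(total / 60); printf "%dm%.3fs\n", m, total - m * 60 }'
}

# The eight per-toolchain build times listed above.
sum_durations 4m15.596s 4m13.983s 4m3.665s 4m4.683s \
              6m8.064s 6m3.457s 4m32.581s 4m44.779s
```

The eight builds add up to 38m6.808s, about 8 ms less than the reported total of 38m6.816s. The small gap is presumably overhead outside the individual builds, though the post does not say how the total was measured.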

LONGEST AND SHORTEST BUILD

Code:
##### Longest build ####
Name : Main LLVM/clang + Android NDK r13b binutils(ld and as)
Time : 6m8.064s
boot.img : boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img
zImage-dtb : zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as


##### Shortest build ####
Name : Android NDK r13b : LLVM/clang + binutils(ld and as)
Time : 4m3.665s
boot.img : boot-ndk-r13b-clang-llvm-binutils-ld-as.img
zImage-dtb : zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as

LARGEST AND SMALLEST IMAGES

Code:
     ◈ Largest boot img : boot-ndk-r17-gcc.img ❏ Size : 13M (12984320 bytes)
     ◈ Smallest boot img 1 : boot-main-llvm-clang-ndk-r17-binutils-ld-as.img ❏ Size : 11M (11272192 bytes)
     ◈ Smallest boot img 2 : boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img ❏ Size : 11M (11272192 bytes)


     ◈ Largest zImage-dtb : zImage-dtb-ndk-r17-gcc ❏ Size : 12M (11844904 bytes)
     ◈ Smallest zImage-dtb : zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as ❏ Size : 9.7M (10132600 bytes)

BOOT IMAGES SUMMARY

Code:
     ◈ boot-ndk-r17-gcc.img ❏ Size ~ 13M (12984320 bytes)
     ◈ boot-ndk-r13b-gcc.img ❏ Size ~ 13M (12978176 bytes)
     ◈ boot-ndk-r13b-clang-llvm-binutils-ld-as.img ❏ Size ~ 13M (12625920 bytes)
     ◈ boot-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as.img ❏ Size ~ 12M (11610112 bytes)
     ◈ boot-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as.img ❏ Size ~ 12M (11610112 bytes)
     ◈ boot-ndk-r17-clang-llvm-binutils-ld-as.img ❏ Size ~ 11M (11476992 bytes)
     ◈ boot-main-llvm-clang-ndk-r13b-binutils-ld-as.img ❏ Size ~ 11M (11272192 bytes)
     ◈ boot-main-llvm-clang-ndk-r17-binutils-ld-as.img ❏ Size ~ 11M (11272192 bytes)

ZIMAGE-DTB SUMMARY

Code:
     ◈ zImage-dtb-ndk-r17-gcc ❏ Size ~ 12M (11844904 bytes)
     ◈ zImage-dtb-ndk-r13b-gcc ❏ Size ~ 12M (11837640 bytes)
     ◈ zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as ❏ Size ~ 11M (11487176 bytes)
     ◈ zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as ❏ Size ~ 10M (10469728 bytes)
     ◈ zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as ❏ Size ~ 10M (10469680 bytes)
     ◈ zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as ❏ Size ~ 10M (10336624 bytes)
     ◈ zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as ❏ Size ~ 10M (10132608 bytes)
     ◈ zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as ❏ Size ~ 10M (10132600 bytes)
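
To put the size differences in proportion, the following sketch computes how much smaller one image is than another. `size_reduction` is a hypothetical helper, and the byte counts are the ones reported in the summaries above.

```shell
# Print how much smaller `new` is than `base`, in bytes and as a percentage.
size_reduction() {
    base=$1
    new=$2
    awk -v b="$base" -v n="$new" \
        'BEGIN { printf "%d bytes (%.1f%%)\n", b - n, (b - n) * 100 / b }'
}

# Largest (NDK r17 gcc) versus smallest (main LLVM/clang + r13b binutils) zImage-dtb.
size_reduction 11844904 10132600

# Largest (NDK r17 gcc) versus smallest (main LLVM/clang) boot.img.
size_reduction 12984320 11272192
```

With these figures the smallest clang-built zImage-dtb is 1712304 bytes (14.5%) smaller than the largest gcc-built one, and the smallest boot.img is 1712128 bytes (13.2%) smaller.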

RAMDISK INFORMATION

Code:
Ramdisk(pre-built - from RR) : 


     ◈ boot.img-ramdisk.gz ❏ Size ~ 1M (1136400 bytes)

Clang-KERNEL INFORMATION(from each of the zImage-dtb images)

Code:
exp@exp:~$ 
exp@exp:~$ ../show_kernel_compiler_all.sh 
#### Kernel compiler information ####

NOTE : Analyzing the images by decompressing them based on lz4 magic (\x02\x21\x4c\x18) . . .

Image metadata(zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as) : 
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18' zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5381
++ dd if=zImage-dtb-main-llvm-clang-ndk-r13b-binutils-ld-as bs=5381 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Flash clang version 7.0.332826 (https://git.llvm.org/git/clang 4029c7ddda99ecbfa144f0afec44a192c442b6e5) (https://git.llvm.org/git/llvm 1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) - android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:33:21 PDT 2018
++ set +x

Image metadata(zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as) : 
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18' zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as
+++ cut -d: -f 1
+++ tail -1
++ pos1=5373
++ dd if=zImage-dtb-main-llvm-clang-ndk-r17-binutils-ld-as bs=5373 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Flash clang version 7.0.332826 (https://git.llvm.org/git/clang 4029c7ddda99ecbfa144f0afec44a192c442b6e5) (https://git.llvm.org/git/llvm 1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) - android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:39:25 PDT 2018
++ set +x

Image metadata(zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as) : 
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18' zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5509
++ dd if=zImage-dtb-ndk-r13b-clang-llvm-binutils-ld-as bs=5509 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ grep 'clang version'
+++ strings -a
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Android clang version 3.8.256229 (based on LLVM 3.8.256229) - android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:23:08 PDT 2018
++ set +x

Image metadata(zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as) : 
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18' zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5449
++ dd if=zImage-dtb-ndk-r17-clang-llvm-binutils-ld-as bs=5449 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Android (4691093 based on r316199) clang version 6.0.2 (https://android.googlesource.com/toolchain/clang 183abd29fc496f55536e7d904e0abae47888fc7f) (https://android.googlesource.com/toolchain/llvm 34361f192e41ed6e4e8f9aca80a4ea7e9856f327) (based on LLVM 6.0.2svn) - android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:27:14 PDT 2018
++ set +x

Image metadata(zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as) : 
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18' zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5417
++ dd if=zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r13b-binutils-ld-as bs=5417 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) - clang version 4.0.2 for Android NDK - android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:43:58 PDT 2018
++ set +x

Image metadata(zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as) : 
+++ grep -P -a -b --only-matching '\x02\x21\x4c\x18' zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as
+++ tail -1
+++ cut -d: -f 1
++ pos1=5409
++ dd if=zImage-dtb-qualcomm-snapdragon-llvm-clang-ndk-r17-binutils-ld-as bs=5409 skip=1
++ lz4 -d
++ eclang
++ head -1
+++ strings -a
+++ grep 'clang version'
Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) - clang version 4.0.2 for Android NDK - android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:48:42 PDT 2018
++ set +x

exp@exp:~$ 
exp@exp:~$
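
The trace above locates the compressed kernel inside each zImage-dtb by searching for the lz4 magic bytes and then skipping everything before them. `dd bs=<offset> skip=1` works because one block of exactly `<offset>` bytes is dropped. Here is a minimal sketch of the same idea on a synthetic file. The helper names are made up, `tail -c` is used in place of `dd`, and GNU grep (for `-P`) is assumed.

```shell
# Byte offset of the last lz4 magic (02 21 4c 18) in a file; prints nothing if absent.
lz4_payload_offset() {
    LC_ALL=C grep -P -a -b --only-matching '\x02\x21\x4c\x18' "$1" |
        tail -1 | cut -d: -f1
}

# Emit the file from that offset onwards; `tail -c +N` is 1-based, hence the +1.
carve_from_offset() {
    tail -c "+$(( $2 + 1 ))" "$1"
}

# Demo on a synthetic image: 5 bytes of padding, the magic, then a fake payload.
img=$(mktemp)
printf 'AAAAA\002\041\114\030payload' > "$img"
pos=$(lz4_payload_offset "$img")
echo "magic at byte offset $pos"
carve_from_offset "$img" "$pos" | wc -c
rm -f "$img"
```

On a real image, the carved stream would then be piped through `lz4 -d`, `strings -a` and `grep 'clang version'`, as the traced commands suggest. The actual show_kernel_compiler_all.sh script is not included in the post, so that pipeline is an inference from the trace.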


Clang-KERNEL INFORMATION(from dmesg extracted from each of the boot instances)

Code:
exp@exp:~$ 
exp@exp:~$ cat android1/android1_dmesg.txt | grep "clang\|Machine"
[    0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Android clang version 3.8.256229 (based on LLVM 3.8.256229) - android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:23:08 PDT 2018
[    0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ 
exp@exp:~$ 
exp@exp:~$ cat android2/android2_dmesg.txt | grep "clang\|Machine"
[    0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Android (4691093 based on r316199) clang version 6.0.2 (https://android.googlesource.com/toolchain/clang 183abd29fc496f55536e7d904e0abae47888fc7f) (https://android.googlesource.com/toolchain/llvm 34361f192e41ed6e4e8f9aca80a4ea7e9856f327) (based on LLVM 6.0.2svn) - android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:27:14 PDT 2018
[    0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ 
exp@exp:~$ 
exp@exp:~$ cat main1/main1_dmesg.txt | grep "clang\|Machine"
[    0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Flash clang version 7.0.332826 (https://git.llvm.org/git/clang 4029c7ddda99ecbfa144f0afec44a192c442b6e5) (https://git.llvm.org/git/llvm 1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) - android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:33:21 PDT 2018
[    0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ 
exp@exp:~$ 
exp@exp:~$ cat main2/main2_dmesg.txt | grep "clang\|Machine"
[    0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Flash clang version 7.0.332826 (https://git.llvm.org/git/clang 4029c7ddda99ecbfa144f0afec44a192c442b6e5) (https://git.llvm.org/git/llvm 1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826) - android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:39:25 PDT 2018
[    0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ 
exp@exp:~$ 
exp@exp:~$ 
exp@exp:~$ cat qualcomm1/qualcomm1_dmesg.txt | grep "clang\|Machine"
[    0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) - clang version 4.0.2 for Android NDK - android-ndk-r13b) #1 SMP PREEMPT Mon Jun 4 00:43:58 PDT 2018
[    0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$ 
exp@exp:~$ 
exp@exp:~$ cat qualcomm2/qualcomm2_dmesg.txt | grep "clang\|Machine"
[    0.000000] Linux version 3.4.113-unicornblood-hammerhead-o+ (exp@exp) (Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) - clang version 4.0.2 for Android NDK - android-ndk-r17) #1 SMP PREEMPT Mon Jun 4 00:48:42 PDT 2018
[    0.000000] Machine: Qualcomm MSM 8974 HAMMERHEAD (Flattened Device Tree), model: LGE MSM 8974 HAMMERHEAD
exp@exp:~$

[ANDROID ARM LLVM/CLANG-KERNEL ~ RESEARCH PROJECT OVERVIEW]


  1. Finding the right kernel source and kernel config that works with the Android version I had on my Nexus 5.

    • Android(version) on my Nexus 5 is Resurrection Remix(RR) oreo with 3.4.113 hammerhead kernel(unicornblood config)
    • Experimenting with different kernel source and hammerhead kernel config including that of AOSP

  2. Finding the most compact and quickest way to just build the kernel out of tree but using the tree's build tools

    • Syncing the Resurrection Remix's source from its repo onto my machine
    • Finding the make targets for building just the ramdisk instead of building the whole RR ROM which was not the goal
    • Finding the tool and the arguments for that to build the boot.img that can be fastboot-ed

  3. Building a working kernel from source for my RR on Nexus 5, first with Android NDK r13b gcc

    • Finding the DirtyUnicorn kernel source, building it with gcc in the first place, generating the boot.img
    • Finding the right kernel config for the hammerhead kernel - unicornblood that's used by DirtyUnicorn repo
    • Using the pre-built ramdisk from RR zip instead of self-built one
    • Disabling SELinux for allowing the kernel to be fastboot-ed
    • Debugging the ADB-over-USB not working with the DirtyUnicorn kernel image built
    • Discussing on the Resurrection Remix forum and with the kernel developer(uname:voidz) to know the actual kernel source used in RR.

  4. Successfully booting the gcc-built kernel for RR on my Nexus 5

    • Working kernel confirmed in the first place for my ultimate Android ARM clang-kernel goal

  5. Setting up the clang-kernel build

    • Finding the right LLVM/clang toolchain to begin with - Android NDK r13b LLVM/clang
    • Using the binutils - assembler, linker, etc that's in Android NDK r13b

  6. Launching the clang-kernel build

    • Disabling the options that clang doesn't recognize
    • Examining the initial compilation errors - invalid instruction error(thrown actually by x86_64 GNU as)
    • Fixing the assembler path to the ARM assembler
    • Making sure it uses the right external assembler(Android EABI GNU assembler)

  7. Fixing the subsequent build errors

    • RCU header had a static code check which had to be disabled, since it was an error only with clang and not with gcc
    • VLAIS in various kernel components had to be changed to non-VLAIS to work with clang

  8. Updating VLAIS to non-VLAIS in various kernel components

    • Disk encryption
    • USB Gadget/Function Filesystem
    • CRC32
    • Netfilter

  9. Fixing linker errors for duplicate exception sections that clang generates

    • Instructing the linker to leave the exceptions out while generating the final kernel image

  10. Fixing more linker errors

    • Added missing ARM EABI memory manipulation implementations that some of the kernel code needed that the toolchain didn't provide
    • Building successfully the kernel image with clang for the first time

  11. Booting the clang-built kernel image for the first time

    • Booting gets stuck at "Google" logo
    • Checking whether there was any kernel panic by booting to recovery mode and checking /proc/last_kmsg
    • Not finding anything relevant to clang-kernel boot in /proc/last_kmsg

  12. Running over a plethora of possibilities for the stuck-at-google-logo case

    • Kernel might have not been loaded at all by bootloader for some reason
    • Generated boot.img might have incorrect offsets for kernel, ramdisk, etc
    • Generated kernel image might be corrupt
    • Bootloader might be expecting gcc-specific compiler metadata instead of LLVM/clang, in kernel image header
    • Fastboot might have some way of logging the overall boot sequence which might have given some hint

  13. Attempting to boot the clang-kernel and ramdisk with QEMU/ARM to debug like I did for x86_64 clang-kernel

    • Booting on QEMU/ARM with both kernel and ramdisk, next with just the kernel, with GUI and without GUI
    • Booting the kernel and ramdisk built with Android emulator, specifically QEMU/ARMel that's part of the SDK
    • Cross-compiling Android LittleKernel bootloader for ARM and using that to boot the clang-kernel on QEMU/ARM
    • Not seeing anything happening at all in any of the above scenarios

  14. Carrying out more debugging

    • Turning off the Nexus 5 and doing a fastboot to get fresh /proc/last_kmsg if possible
    • Adding various debug parameters on kernel command-line
    • Trying out different options that fastboot has for specifying offsets, etc
    • Researching on external hardware-based debugging like adding UART interface to get early boot logs

  15. Researching on the very first code that runs when the kernel is loaded - kernel entry point

    • Finding start_kernel() inside init/main.c
    • Adding some debug statements over there and not seeing them for obvious console-not-yet-initialized reason
    • Then finding the actual kernel entry point in head.S assembly source
    • Researching on ways to print anything at all in the ARM assembly code within head.S
    • Finding printascii that's used for the above case but realizing it's for a serial console(UART)

  16. Understanding the ARM assembly code within head.S and its siblings and the inline documentation therein

    • Getting to know the prerequisites for entering the kernel entry point in head.S

  17. Exploring the methods to confirm whether the control is indeed reaching head.S or not

    • Checking if the LED on the bottom of Nexus 5 can be turned on/off with different colors as an indication
    • Researching on doing something with ARM CPU like raising an exception, or a reset event as an indication
    • Researching on different ways of restarting Nexus 5 - checking how a "reboot" command works at kernel level
    • Looking into machine restart logic - translating that to a reset logic to be used within head.S

  18. Using reset logic for more fine-grained debugging, instruction by instruction, within head.S and its siblings

    • Noticing PC write was problematic
    • Finding ways to branch off to destination instead of modifying PC which is not recommended as per the docs
    • Understanding the end-to-end control flow since the kernel entry point till start_kernel() of init/main.c
    • Following the inter-working of head.S and processor-specific assembly code during the setup
    • Locating some control register access being problematic like that with PC

  19. Researching on other available clang/LLVM toolchains to see if they work

    • Using Android NDK r17's LLVM/clang - not helpful - same outcome - stuck-at-google-logo case
    • Finding main LLVM/clang source and building it to use with kernel source - not helpful - same outcome
    • Finding Snapdragon Qualcomm LLVM/clang toolchain and using it - not helpful - same outcome

  20. Using different diff-tools to compare two binaries : clang-built kernel and gcc-built kernel

    • hexdiff - saw some differences
    • vimdiff - some other differences

  21. Using different binary analysis tools to examine the differences between the gcc-built and clang-built kernels

    • Android EABI readelf - saw some ELF header information differences
    • Android EABI objdump - compared disassembly, symbols, sections and their flags, etc

  22. Adding mechanism to use the same assembly code settings as that of gcc

    • Using same assembler options that gcc uses while invoking the assembler for intermediate assembly code
    • Using same assembly code setup as that of gcc for data, target architecture, etc for intermediates
    • Automating the above so that it works for every intermediate assembly file that clang generates

  23. Reducing the optimization level of kernel build

    • Keeping the oversmart optimization aside - O1 and Os - didn't change the stuck-at-google-logo case
    • Disabling optimization completely - O0 - didn't work - the kernel doesn't support it, based on what I read online

  24. Disabling the caches in kernel config as needed by head.S

    • Updating kernel config to disable I-cache and D-cache - didn't help

  25. Trying out different assembler options

    • Experimenting with different SP sizes, EABI versions - didn't help

  26. Trying out different clang options

    • Using different possible stack alignments to address any incorrect assumptions around that - didn't help

  27. Correcting the SP access

    • Updating access to SP in one of the thread access kernel code as per one of the online notes

  28. Researching further what clang does that gcc doesn't

    • Disabling all the optimizations and clang-only features if any
    • Disabling all the intrinsic features that clang uses internally - device rebooted after a failed boot!
    • No more stuck-at-google-logo case with the above change
    • Witnessing the device auto-reboot with the above change - must be a kernel panic!
    • Checking /proc/last_kmsg within recovery mode - yes, it was the clang-kernel that panic'ed - good sign!
    • Finally, the clang-kernel has started executing after an exhaustive set of attempts - breakthrough!

  29. Locating the source of kernel panic

    • Looking at the stacktrace revealed one of the audio codec had a buffer overrun - fixed it
    • Rebuilding the kernel with the fix and retrying
    • Seeing some more kernel panics - another audio codec source which had similar issue - fixed it

  30. Booting to the Android GUI with clang-kernel for the first time

    • Fixing the kernel panics mentioned above allowed boot to move on
    • Seeing Android animation(RR logo in my case) for the first time!
    • Getting to the Android home screen after few seconds of wait - mission accomplished!

  31. Verifying all the system information

    • Checking kernel version to be showing LLVM/clang toolchain version, etc
    • Examining kernel dmesg for clang specific information
    • Checking /proc/version for the same
    • Checking Settings/About for the same

  32. Verifying all the features work

    • Checking Camera, Bluetooth, ADB over USB, etc
    • Checking WiFi - didn't work - "connected, no internet"

  33. Noticing WiFi symbol had a cross(x) symbol on it

    • Browsing failed as expected due to no internet availability
    • Disabling WiFi to check if cellular(LTE) network works for internet - didn't

  34. Noticing mobile network also had a cross(x) symbol on it after disabling WiFi as above

    • Browsing with mobile network as well failed as expected due to no internet availability
    • Verifying phone calls work - yes, they worked!

  35. Checking logcat, dmesg for any network error

    • Noticing SELinux denials for some of the network related actions
    • Locating the error stating bandwidth module not loaded
    • Narrowing down to the kernel code where the possible issue is present

  36. Fixing the netfilter code for the above issue

    • Updating one of the netfilter code with the latest code from that of the mainline kernel
    • Rebuilt the kernel - WiFi and mobile network - both worked!

  37. Realizing all the features are now working with a clang-built ARM kernel for Android!

    • Planning to repeat the same with all the remaining LLVM/clang toolchains

  38. Using Android NDK r13b's LLVM/clang in place of main LLVM/clang used so far for building the kernel

    • Noticing Kernel panic
    • Tracking down the kernel panic to one of the Camera MSM driver code

  39. Fixing the Camera MSM driver code in terms of device id specification

    • Comparing with other Camera MSM driver source code and finding the difference if any
    • Completing the device/driver id specification with the missing item - fixed the panic
    • Booting to Android home screen this time with even the Android NDK r13b's LLVM/clang-built kernel!

  40. Picking the next remaining LLVM/clang toolchains

    • Android NDK r17's LLVM/clang - no issues booting the kernel as updated so far
    • Qualcomm Snapdragon LLVM/clang - no issues booting the kernel as updated so far

  41. Performing round up of all the toolchains and combinations of NDK binutils

    • Verifying all the combinations(total 8) :

      • Android NDK r13b [gcc + binutils(as, ld, etc)]
      • Android NDK r17 [gcc + binutils(as, ld, etc)]
      • Android NDK r13b [LLVM/clang + binutils(as, ld, etc)]
      • Android NDK r17 [LLVM/clang + binutils(as, ld, etc)]
      • Main LLVM/clang + Android NDK r13b binutils(as, ld, etc)
      • Main LLVM/clang + Android NDK r17 binutils(as, ld, etc)
      • Snapdragon Qualcomm LLVM/clang + Android NDK r13b binutils(as, ld, etc)
      • Snapdragon Qualcomm LLVM/clang + Android NDK r17 binutils(as, ld, etc)

    • Confirming all of the above work - yes, worked!

  42. Automating all of the above builds and the testing of the images

    • Facilitating automation of all the 8 combination builds
    • Collecting the statistics - build time, image sizes, etc
    • Summarizing the longest/shortest builds, largest/smallest zImage-dtb, largest/smallest boot.img
    • Testing all of the images one by one for the final time

  43. Consolidating the data from the above automation

    • Collecting the complete kernel boot(dmesg) log from each build
    • Taking snapshots of the kernel version, Android version, build number from each build - About/System info

  44. Wrapping my Android clang-kernel research project!

    • Done and dusted. Period.



[ANDROID ARM LLVM/CLANG-KERNEL ~ RESEARCH WALK-THROUGH]

Stage 1: First of all, finding the right kernel source for the hammerhead kernel installed as part of Resurrection Remix(RR) on my Nexus 5 was a challenge. Initially, based on the kernel version information, I took the dirtyunicorn kernel source from its repo and built it with the unicornblood hammerhead kernel config using the Android NDK r13b gcc toolchain. I also built the required ramdisk from the Resurrection Remix Android source base. With these I created a boot.img and tried it with fastboot - it didn't boot - the usual Google logo appeared and the device looped back to the bootloader. After examining the logs in recovery mode, I noticed SELinux-specific denials. So, I tried to fastboot again with the SELinux mode set to permissive. This also didn't boot. After some research, I found that my boot.img was smaller than the one that came with the Resurrection Remix zip, and that the ramdisk in it was much smaller than the one in the zip. The ramdisk needs other rootfs utilities to be part of it, which would have required more building from the RR source. Since I was only keen on the kernel part, I took the ramdisk from the zip and generated the boot.img with that along with my kernel. This booted. Before arriving at this, I had also experimented with the AOSP kernel source and other kernel sources for hammerhead - some of them didn't even boot, so I had to go back to dirtyunicorn.


Stage 2: The booted kernel had almost everything working except for the ADB over USB feature. Though ADB over network worked, when I enabled ADB over USB, dmesg on my host PC showed a "USB disconnect" event, and the device would reconnect if I disabled ADB over USB. I tried everything else I could think of to get this working, including USB driver debugging, etc - nothing consistently gave ADB over USB access. I noticed there were some missing sysfs entries for the Android USB function filesystem. So, I debugged around that code and added extra logic to make sure the USB wouldn't get disconnected when ADB over USB was enabled - none of it helped. After a lot more researching and experimenting, I noticed that the kernel that came with the RR zip was built under the username voidz. Based on that, I learned from the RR forum that this person had an updated dirtyunicorn kernel branch, so I took that and built it the same way as above. The kernel built with the same unicornblood hammerhead kernel config. I generated the boot.img with this kernel and the ramdisk from the RR zip file and fastboot-ed it. The image booted, and ADB over USB worked as needed.


Stage 3: With the above prerequisite of a working kernel(source) for my Nexus 5 running RR(oreo), I was motivated to build the Android kernel with LLVM/clang, as I had already been successful in building and bringing up the x86_64 Linux kernel(4.15.7 onwards - latest 4.17) for my host system running Ubuntu 17.10 x86_64, whose reports are available at the end of this post for the interested people. To begin with, I chose the LLVM/clang toolchain that was present in Android NDK r13b and started to build the same kernel I had built with Android NDK r13b gcc above. As with the x86_64 kernel clang-build, there were some build/compilation errors involving unknown options, etc.


Stage 4: To get the Android kernel clang-build further, I disabled the integrated assembler and removed the unsupported compiler options/flags across kernel components. This allowed the clang-build to proceed further. But there were still some issues in recognizing the inline ARM assembly code correctly, since the assembler being picked up was still wrong. Setting CLANG_TRIPLE to arm-linux-gnu and the like in separate attempts didn't help in any way. After much research, I found out that I could use the "-target" option to specify the androideabi target. With that, clang searched for an assembler with that target as its prefix, which was not present under /usr/bin by default. Hence, it fell back to the host x86_64 GNU assembler, which complained about unknown assembly instructions. So, I created the necessary symlinks there pointing to the Android NDK r13b binutils binaries - this fixed the issue of picking the right assembler. By the way, I saw some reports of successful ARM64/AARCH64 clang-kernels for Google Pixel* but the steps therein didn't work for my ARM(32) clang-kernel case, so I had to continue with my independent efforts.
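
The symlink step can be sketched as follows. This is only an illustration under stated assumptions. The helper `link_target_binutils`, the `arm-linux-androideabi` prefix, and the choice of `as` and `ld` are not taken from the post, which does not list the exact paths or tools. The demo runs against scratch directories rather than a real NDK and /usr/bin.

```shell
# Symlink each NDK binutils tool into dest_dir under the target-prefixed name
# that clang looks for when given -target. All arguments are illustrative.
link_target_binutils() {
    ndk_bin=$1      # directory holding e.g. <prefix>-as (assumed NDK layout)
    dest_dir=$2     # a directory on PATH (the post used /usr/bin)
    prefix=$3       # target prefix matching the value passed to -target
    for tool in as ld; do
        src="$ndk_bin/$prefix-$tool"
        if [ -x "$src" ]; then
            ln -sf "$src" "$dest_dir/$prefix-$tool"
        else
            echo "missing: $src" >&2
            return 1
        fi
    done
}

# Dry run: fake tools in a scratch tree stand in for the NDK binaries.
scratch=$(mktemp -d)
mkdir -p "$scratch/ndk/bin" "$scratch/bin"
for tool in as ld; do
    printf '#!/bin/sh\n' > "$scratch/ndk/bin/arm-linux-androideabi-$tool"
    chmod +x "$scratch/ndk/bin/arm-linux-androideabi-$tool"
done
link_target_binutils "$scratch/ndk/bin" "$scratch/bin" arm-linux-androideabi
ls -l "$scratch/bin"
rm -rf "$scratch"
```

Failing loudly when a tool is missing is deliberate here: if the prefixed assembler cannot be found, the build can end up using the host assembler, which is the failure mode described above.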


Stage 5: Progressing further, there was an RCU-specific static code check which failed - after looking around, I disabled it as it was harmless. The build progressed further. Next, the build failed due to VLAIS used in several of the kernel components, including disk encryption, USB Gadget/Function Filesystem, CRC32, netfilter, etc. Changing these to non-VLAIS code addressed the compilation errors.


Stage 6: Next was to address the linker errors - clang generated duplicate exception sections across kernel components, which caused a failure when the linker tried to link all of them together. To address this, I researched a lot on the internet to understand what these sections are, why clang adds them but gcc does not, how to get rid of them if they are not needed, and so on. I experimented with options to disable exceptions, unwind-tables, etc - none helped. There were also warnings that unwinding was not expected to work for some of the kernel components. Enabling stack unwinding in the kernel config addressed the linker error, but I wanted to avoid an additional config change that only the clang build required, as gcc didn't need it.


Stage 7: To address the duplicate exception sections linker error mentioned above, after realizing that gcc doesn't hit this error because it doesn't generate those sections in the first place, I researched more to find ways to exclude them. I found the way to tell the linker to leave these exception sections out when generating the final kernel image - that solved the issue.


Stage 8: After fixing the above linker error of duplicate exception sections, there were some more linker complaints about missing implementations of the ARM EABI versions of the memory manipulation functions. I researched these and found that they are usually available as part of the toolchain but, if absent, they need to be implemented by oneself. So, I wrote my own implementations for them and restarted the build - didn't see those linker errors, as expected.
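
The ARM run-time ABI defines helper names such as __aeabi_memcpy, __aeabi_memmove, __aeabi_memset and __aeabi_memclr, which a compiler targeting ARM EABI may call instead of the C library functions. The implementations written for this project are not shown in the post, so the following is only a sketch of plain byte-wise versions. Note that __aeabi_memset takes the size before the fill value, unlike memset().

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise sketches of the ARM EABI memory helpers. They favour clarity
 * over speed. When building with clang it may be necessary to disable
 * builtin recognition (for example with -fno-builtin) so that these loops
 * are not turned back into calls to the very helpers being defined. */

void __aeabi_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
}

void __aeabi_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)
            *d++ = *s++;      /* copy forwards */
    } else {
        while (n--)
            d[n] = s[n];      /* copy backwards so overlap is safe */
    }
}

/* Unlike memset(), the size comes before the fill value. */
void __aeabi_memset(void *dest, size_t n, int c)
{
    unsigned char *d = dest;

    while (n--)
        *d++ = (unsigned char)c;
}

void __aeabi_memclr(void *dest, size_t n)
{
    __aeabi_memset(dest, n, 0);
}
```

The ABI also defines variants with a 4 or 8 suffix (for example __aeabi_memcpy4) for suitably aligned pointers; in a sketch like this they can simply forward to the unaligned versions.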


Stage 9: With the changes so far, the kernel image got successfully built with clang for the first time. The zImage-dtb was ready to be combined with the pre-built ramdisk used above to generate the quintessential boot.img that I could "fastboot" on my Nexus 5 for the first time.


Stage 10: Tried to fastboot with the boot.img thus built - the image appeared to boot but got stuck at the "Google" logo with the unlocked symbol underneath. Having seen this many times even with gcc, I expected the boot to proceed to the Android logo animation(RR logo in this case) after some time, since I thought the booting time might differ between gcc- and clang-built kernels. But, it didn't proceed. Having seen a similar thing with the x86_64 clang-built kernel for my host system running Ubuntu 17.10 x86_64, my immediate interpretation was that the kernel "panic-ed" somewhere along the boot, assuming that the kernel at least got loaded and started to boot to some extent. So, I rebooted to recovery mode to retrieve the kernel logs of this unsuccessful boot. But, /proc/last_kmsg didn't have any logs pertaining to the clang-built Android ARM kernel I fastboot-ed. Instead, it had some logs from a previous gcc-built kernel boot.


Stage 11: The absence of any clang-kernel logs in /proc/last_kmsg gave rise to numerous questions :

  • Is the kernel getting loaded in the very first place?
  • Is the kernel offset, ramdisk offset wrong or off by some bytes for the clang-built kernel and not the gcc-built kernel?
  • Is the mkbootimg command having the base address right, does it have to be relative or absolute?
  • Is there some incorrect zImage-dtb header(wrong version magic, etc) for some reason, that Android bootloader was considering erroneous, and hence bailing out and sitting idle after displaying the "Google" logo?
  • Is there a way to get the fastboot logs, bootloader logs, where's the bootloader code of Android?
  • How does the Android bootloader(found out that LittleKernel was the bootloader) source locate and load the kernel with offsets, etc?
  • Is there some boot signature that's wrong and hence bootloader isn't able to load kernel?
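
One way to approach the offset questions above is to dump the boot.img header fields for the gcc and clang images and compare them. The sketch below assumes the original Android boot image header layout as I recall it from AOSP's mkbootimg (an 8-byte "ANDROID!" magic followed by little-endian 32-bit size/address fields); verify it against the bootloader source before relying on it. The names demo_boot_hdr and demo_parse_boot_hdr are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First fields of the (assumed) original boot image header. */
struct demo_boot_hdr {
    uint32_t kernel_size, kernel_addr;
    uint32_t ramdisk_size, ramdisk_addr;
    uint32_t second_size, second_addr;
    uint32_t tags_addr, page_size;
};

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t demo_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int demo_parse_boot_hdr(const uint8_t *buf, size_t len, struct demo_boot_hdr *out)
{
    if (len < 40 || memcmp(buf, "ANDROID!", 8) != 0)
        return -1;
    out->kernel_size  = demo_le32(buf + 8);
    out->kernel_addr  = demo_le32(buf + 12);
    out->ramdisk_size = demo_le32(buf + 16);
    out->ramdisk_addr = demo_le32(buf + 20);
    out->second_size  = demo_le32(buf + 24);
    out->second_addr  = demo_le32(buf + 28);
    out->tags_addr    = demo_le32(buf + 32);
    out->page_size    = demo_le32(buf + 36);
    return 0;
}
```

Feeding the first 40 bytes of each boot.img to demo_parse_boot_hdr() and printing the fields side by side shows whether the load addresses and page size really are identical between the two builds.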



Stage 12: With no clue whether the kernel was getting loaded at all, as explained above, I wanted to see if it was possible to boot the clang-built Android ARM kernel and the ramdisk using QEMU/ARM - just like the debugging method I had employed for x86_64 when the Linux kernel boot had stopped at "Loading ..." on my host system with Ubuntu 17.10 x86_64. However, none of the attempts to boot the Android ARM kernel and ramdisk succeeded - nothing was seen at all either on the QEMU GUI or on the console in no-GUI mode. Tried different ARM machine types in QEMU options, tried to boot even just the kernel to see if it panics on not finding the rootfs/ramdisk as expected - nothing helped.


Stage 13: As an additional effort in bringing up the Android ARM clang-kernel on QEMU/ARM, I thought if I could cross-compile the LittleKernel bootloader source for ARM and use it as the bootloader with QEMU/ARM, I could see some action - didn't work. Also, I tried to use a UEFI firmware cross-compiled for ARM to boot the same - didn't help. Then I tried to use the standard Android emulator(via Android Studio, AVD, etc) to see if I could specify my Android ARM clang-kernel as the kernel to boot the emulator with the necessary arguments - didn't help, though it took considerable effort to work out how to load a custom kernel from the command-line of the Android emulator, as the "emulator" script/binary eventually launches QEMU/ARMel, whose command-line invocation I finally figured out and used for my debugging in this stage.


Stage 14: In order to confirm whether the kernel is at least starting to execute, I retried fastboot-ing some more times in different modes :

  • Powering off the Nexus 5 and powering it on to bootloader mode, so that old /proc/last_kmsg won't be present and hence no confusion of whether the kernel logs therein is of my clang-kernel boot or of some other previous gcc-kernel boot.
  • Booting into gcc-kernel after failed clang-kernel boot to see if I can get the /proc/last_kmsg from this valid Android boot mode instead of in recovery mode.
  • Enabled earlyprintk and UART debugging in kernel config for the clang-kernel being built and added those parameters in the kernel cmdline.
  • Increased kernel log buffer size to 4M.
  • Increased the kernel log-level to maximum.
  • Researched on who actually renders the "Google" logo if kernel is not even starting to execute - is it then the bootloader which renders that?
  • Why is it that with some kernel versions, booting fails and it actually loops back to bootloader, but not in this clang-kernel case?
  • Is the DTB appended correctly and TAGS offset correct for the kernel to find it - if DTB is not present, fastboot itself fails saying the DTB can't be found in the given boot.img.
  • Tried to give the kernel(zImage-dtb) and ramdisk separately in fastboot command with offsets, etc - fastboot creates the boot.img on the fly and tries to boot it.
  • Checked if fastboot has some useful options to debug more - like some logging over USB back to host(my PC).
  • Researched around to see if I can add some UART module over USB to get the bootloader/kernel logs before the console(tty) even gets initialized - I actually found an article by someone who had added an external UART module to some Android smartphone - but after some time, I thought it was overkill for my project - so, dropped the idea of buying something like that and got back to debugging with whatever I had - just my Nexus 5!
  • Reattempted to boot with QEMU/ARM with different TTY*s, etc


None of the above worked, though they helped me understand better how Android booting works and its internals.

Stage 15: Then came the point of finding what actually runs first when the kernel is made to run - like some low-level kernel code that runs to get the actual high-level code up and running to start dumping some kernel logs? At first I found the C kernel entry-point to be start_kernel() in init/main.c. Tried to put some extra printk() statements at the beginning of it, only to find that the console gets initialized somewhere much later, after a whole bunch of other initializations. So, all the printk output would get buffered until then - no use. So, is this the very function that gets called from the Android bootloader or is there some other low-level code that gets entered first?


Stage 16: After some research, found the architecture specific head.S(the ARM specific kernel entry point from the bootloader!). So, time to dive into the assembly world - one step closer to the hardware - excited! I spent some time understanding the instructions and the inline comments there to see what actually happens where - in doing so I also looked around in head-common.S, etc. Finally, realized that this *is* the code to debug in the first place, before start_kernel() even gets called. But, what to debug, where to print some logs, how to debug - no UART/serial console, no way of finding whether this entry point is even getting hit - all I was seeing was the "Google" logo(which I found out is not rendered by the kernel but by the bootloader/some-other-early-code) and nothing else.


Stage 17: Upon reading the documentation over there in head.S, I doubted whether the hardware is set up properly before entering head.S : it says certain things about MMU, I-cache, D-cache, etc - are these set up correctly? At the same time, I also thought, if there was something wrong in that, how did the gcc-built kernel boot up without any issue - same bootloader, so same hardware pre-initializations before head.S is entered, same head*.S code and the same kernel except the compiler being gcc instead of clang - so, is clang screwing something up in generating the final binary so that this head.S is not even entered from the bootloader?


Stage 18: Next was to determine a way to check if the very first instruction of head.S was executed or not. With no external debugging aid, this took a lot of brainstorming and researching on the internet - can I place some code in head.S to turn on the small LED on the Android Smartphone in some way to indicate that head.S was indeed getting hit or is there some other way to know the same? All of the available debugging methods involved some external hardware-aid which I didn't have.


Stage 19: There came a thought of doing it with the ARM CPU itself - can it be made to raise an exception, interrupt, etc through some ARM instruction? Can it be reset? Can it be made to jump to the reset vector? Yes, that was the "bang on" moment! I researched the method, or rather the instruction, to reset the ARM CPU - some articles said some registers need to be written to enable/disable something so that the ARM CPU gets reset, but none of them worked for the ARM environment(Cortex-A15 of Nexus 5) I had.


Stage 20: After a fair amount of research into how to reset the ARM CPU in bare-metal/assembly mode, I realized that there must be a piece of low-level code which resets/reboots the ARM CPU when the user issues a "reboot" command on the console/terminal - where is it? After tracing down reboot=>..=>sys_reboot()=>...some low-level code to restart the machine, I located the logic which could restart the system. Using that as a reference for my implementation of ARM CPU reset, I wrote a reset logic in ARM assembly and placed it in head.S. After experimenting substantially, came up with a reset logic that seemed to work and called it from the kernel entry point within head.S. With this, rebuilt the clang-kernel, regenerated the boot.img, tried to fastboot it as usual on my Nexus 5 - the "Google" logo appeared as anticipated, then after a few seconds, as I hoped, the boot process looped back to bootloader! Delighted! My reset logic worked, which meant the kernel entry point in head.S was indeed getting hit! Great find after a lot of struggle and what not!


Stage 21: With the above confirmation of the control coming to the kernel entry point in head.S from the bootloader, I moved the reset logic a few instructions down and repeated the above process of rebuilding-regenerating-fastbooting and again, I could see the Nexus 5 getting reset to bootloader - which proved again that the control flow was fine so far within head.S. Eventually, there was a place where the program counter was getting updated directly to branch off to a different assembly procedure - I placed my reset logic just after that one - control didn't reach it this time - there was no reset - the booting process got stuck at the "Google" logo like many of the earlier times. This made me think what's wrong with updating the program counter - is it not accessible/modifiable directly? How was it not an issue for the gcc-kernel?
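
The post moves the reset stub forward step by step. As a possible refinement (not what was done here), the same idea can be run as a binary search over a list of candidate checkpoints, which cuts down the number of rebuild-and-fastboot cycles. The sketch below assumes the checkpoints lie on one straight-line path, so every checkpoint before the hang is reached and none after it. All names are hypothetical, and demo_probe_threshold only simulates a device.

```c
#include <stddef.h>

/* Probe callback: returns non-zero if the device reset, i.e. the reset
 * stub placed at checkpoint `index` was reached. On real hardware each
 * call stands for one rebuild + boot.img + fastboot cycle. */
typedef int (*demo_probe_fn)(size_t index, void *ctx);

/* Returns the index of the first checkpoint that is NOT reached, or
 * `count` if all of them are reached. */
size_t demo_first_unreached(size_t count, demo_probe_fn probe, void *ctx)
{
    size_t lo = 0, hi = count;   /* all < lo reached, all >= hi unreached */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (probe(mid, ctx))
            lo = mid + 1;        /* stub at mid was hit: hang is later */
        else
            hi = mid;            /* stub at mid was not hit: hang is at or before mid */
    }
    return lo;
}

/* Simulated device for trying the search out: checkpoints below the
 * threshold pointed to by ctx are reached, the rest are not. */
int demo_probe_threshold(size_t index, void *ctx)
{
    return index < *(const size_t *)ctx;
}
```

With ten checkpoints this needs at most four probes, compared with up to ten when moving the stub one checkpoint at a time.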


Stage 22: After finding the issue with the direct update of the program counter above, I tried to branch off to the destination using the branch instruction - to my surprise it worked, as I moved the reset logic to the destination and got the Nexus 5 reset to bootloader this time! So, there was some issue in the binary generated by clang with the direct update of PC, as opposed to branching like I did. This made me actually think, since there were some unsupported ABI specific options for clang that were removed, is there something different in the way the program counter is interpreted? Hence, I tried experimenting with different program counter sizes, stack-alignment, and all that, passed to the assembler in addition to the arguments given by clang for generating the object code from the intermediate assembly code generated by clang - none helped, same behavior. Did more of watching the arguments that are being given by clang to the assembler and compared them with those of gcc - what were the differences? Whatever was missing, like EABI version, DWARF2 debugging information, little endian flag, etc, I added manually along with the arguments given by clang so as to run the assembler with the same set of options that gcc was using - no change.


Stage 23: Since I had found a way to move the control forward using branch instruction, I studied the control flow and replaced all the program counter updates to corresponding branch instructions - entry point => processor initialization => getting MMU ready => turning MMU on => enabling MMU => starting the kernel where the C code in init/main.c starts executing. This took a lot of time since the above control transfers were across assembly files and carefully going through them and understanding them demanded certain level of effort as I was dealing with entirely ARM assembly code for the first time on my own with no one to ask any questions I had - ARM reference manuals, online documentation, open source discussions were handy in understanding which instructions(in different *.S) do what. Also, I had to combine several of the assembly code spread across files together to be able to branch off by name instead of addresses of the procedures stored in registers.


Stage 24: With the replaced assembly code as above, there were still some issues with the assembly instructions modifying the control registers - control wouldn't go beyond them for some reason. Just to see if they were really required, I commented one out, and the control would move on, but the same issue would occur in another place with some other access to the control registers. I was not sure whether it's okay to leave out all those control register reads/writes - so reverted them to their originals.


Stage 25: With additional corrected changes to branching, etc, control eventually reached the start_kernel() of init/main.c - for confirming this I implemented the reset logic I wrote in ARM assembly for use in head*.S and other assembly code, in C and called it within main.c - though the same kind of reset to bootloader didn't happen right away, with some more corrections to my C implementation of reset logic, I saw the same reset behavior - start_kernel() is getting hit for sure as I checked it multiple times with and without my C-reset logic(all of the earlier reset calls were disabled, of course).


Stage 26: After confirming that start_kernel() was being entered, I tried moving the reset logic down one function call at a time - reset didn't happen after a few function calls - went inside the function after which the reset was not happening - it turned out to be accessing CPUID(again some sort of control register access, which had been an issue earlier) and the control seemed not to return from it to start_kernel().


Stage 27: Tried to change the linkage of the first few functions being called to be ASM linkage to see if it makes any difference, as start_kernel() was being called(rather branched into) without any issue but not the subsequent functions - didn't make any difference. The goal was to find the underlying issue in any possible way - so even when a method did not seem very relevant, I tried it anyway so that I could confidently rule it out from further consideration if it didn't help.


Stage 28: Tried to move the console initializations over other initializations to be able to see printk messages as early as possible - didn't work anyway.


Stage 29: After all the above, somewhere I thought what if there are some fixes in the newer Android NDK LLVM/clang that were absent in the one I was using(NDK r13b) - downloaded the latest NDK r17 and rebuilt the kernel using the LLVM/clang that was in NDK r17, regenerated boot.img as usual, fastboot-ed it - didn't boot - same result as earlier - got stuck in "Google" logo.


Stage 30: Looked around on the internet for any reports on how to use clang for Android kernel - found ARM64/AARCH64 report which mentioned using main LLVM for building the LLVM/clang on our own. So, followed the steps there, got the LLVM source, built it as needed, got the toolchain binaries ready : clang, clang++, etc. Used it for rebuilding the kernel, repeated the rest of process - didn't boot - same result as earlier.


Stage 31: Researched some more to see if there's any other LLVM toolchain for Android that people have used. Found the Qualcomm Snapdragon LLVM toolchain. Downloaded it from the Qualcomm site. Used the clang within it to rebuild the kernel and repeated the rest of the process - did *not* boot - same result as earlier. So, decided to continue the rest of the effort with the main LLVM/clang downloaded and built in the above stage instead of Android NDK r13b's LLVM/clang.


Stage 32: As the next resort, tried to see differences between the kernel images built by clang and gcc using hexdiff, vimdiff, etc - saw some differences but couldn't find out anything useful.


Stage 33: Similarly did a hexdiff between boot.img(s) with clang and gcc kernel - is the kernel offset same in both the cases and so on?


Stage 34: Used Android NDK's readelf to examine the kernel images from both clang and gcc builds(vmlinux - both compressed and uncompressed) in terms of the ELF header. Saw some metadata differences - analyzed them to understand if they affected the kernel executions in any way.
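
To make the kind of ELF header comparison described above concrete, here is a small sketch that reads a few fields from a little-endian ELF32 header and reports which of them differ between two images. The field offsets follow the standard ELF32 layout; treating the top byte of e_flags as the EABI version is the ARM convention. The demo_ names are hypothetical, and readelf of course reports far more than this.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_EM_ARM       40u   /* e_machine value for ARM */
#define DEMO_DIFF_MACHINE 0x1u
#define DEMO_DIFF_ENTRY   0x2u
#define DEMO_DIFF_FLAGS   0x4u

struct demo_elf32_info {
    uint16_t machine;       /* e_machine */
    uint32_t entry;         /* e_entry */
    uint32_t flags;         /* e_flags */
    unsigned eabi_version;  /* top byte of e_flags on ARM */
};

/* Read an nbytes-wide little-endian value (nbytes <= 4). */
static uint32_t demo_le(const uint8_t *p, unsigned nbytes)
{
    uint32_t v = 0;

    while (nbytes--)
        v = (v << 8) | p[nbytes];
    return v;
}

/* Returns 0 on success, -1 if buf is not a little-endian ELF32 header. */
int demo_read_elf32(const uint8_t *buf, size_t len, struct demo_elf32_info *out)
{
    if (len < 52 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[4] != 1 || buf[5] != 1)   /* ELFCLASS32, ELFDATA2LSB */
        return -1;
    out->machine = (uint16_t)demo_le(buf + 18, 2);
    out->entry = demo_le(buf + 24, 4);
    out->flags = demo_le(buf + 36, 4);
    out->eabi_version = out->flags >> 24;
    return 0;
}

/* Returns a bitmask of DEMO_DIFF_* values for the fields that differ. */
unsigned demo_elf32_diff(const struct demo_elf32_info *a, const struct demo_elf32_info *b)
{
    unsigned mask = 0;

    if (a->machine != b->machine)
        mask |= DEMO_DIFF_MACHINE;
    if (a->entry != b->entry)
        mask |= DEMO_DIFF_ENTRY;
    if (a->flags != b->flags)
        mask |= DEMO_DIFF_FLAGS;
    return mask;
}
```

Running demo_read_elf32() on the first 52 bytes of the gcc-built and clang-built vmlinux and passing both results to demo_elf32_diff() gives a quick yes/no answer on whether the entry point or the ABI flags differ.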


Stage 35: Used Android NDK's objdump to determine differences in the sections within the kernel images built using clang and gcc - compared the disassembled code from both cases - saw the differences - went back to the assembler invocation - checked the arguments being passed - added manually whatever was missing from the clang invocation of the assembler when compared to the gcc invocation of the same assembler. Even then there were differences in terms of the number of sections(text, etc) and their flags, symbols, and so on - spent a considerable amount of time to understand if they matter with respect to the reason behind the clang-kernel not booting.


Stage 36: Going one step further, added a mechanism to save the intermediate assembly file that clang generated for each source(*.S and *.c), and compared that with that of gcc - there were some differences in terms of optimizations, instruction uses, etc - this made me think there could be some over-optimization that results in the unexpected behavior I was seeing when booting the clang-kernel.


Stage 37: Based on the above, added a mechanism to replace the assembler directives that were placed by clang with those of gcc, for all of the intermediate assembler code generated - this was to make sure that the ELF generated by clang for the kernel has exactly the same ELF header specifics as that of the kernel generated by gcc - automating the whole thing took a lot of work and rework. There was also some unnamed intermediate assembly file with architecture set to ARMv5 instead of ARMv7 - so did that replacement as well to make sure I get the exact assembly code specifics as with gcc.


Stage 38: To keep the differences between the code generated by clang and gcc at minimum, reduced the optimization level of kernel from O2 to O1 - rebuilt the kernel and redid the rest of the process - didn't help in booting the kernel - same outcome as earlier


Stage 39: Totally disabled optimization with O0 - this didn't even compile - failed with some errors - researched on the internet and came to know that the Linux kernel code is meant to be built with optimization at least of O1.


Stage 40: Changed the optimization criteria to be for size(Os) - repeated the whole process of building, etc - didn't help


Stage 41: Based on what I read in the documentation at the entry point in head.S, disabled the D-cache, I-cache, etc in kernel config to see if they bring any change - didn't make any difference.


Stage 42: Revisited the changes I had made including the memory manipulation functions I had implemented - Are they right? Are there any alignment considerations I need to take care of? What does a standard implementation look like? Researched more along these lines, found the actual signatures of these functions, corrected my implementations accordingly - rebuilt the kernel, repeated the rest of the process - didn't change anything.


Stage 43: Looked around on the internet again for why these memory manipulation functions aren't present by default - can this be because of the optimization level chosen? Is there a way to enable them instead of implementing them myself? - didn't find anything of that sort.


Stage 44: Meanwhile, came across some report on the stack pointer access for thread local storage needing some explicit assembly-level access. Included that as well, redid the whole thing - no change in the kernel booting behavior though.


Stage 45: Eventually, I came to the thought that there is something extra being done by clang that's resulting in some unneeded optimization, leading to some misinterpretation of the given assembly and/or C code - so I disabled every other optimization, including the intrinsic ones that are built into the compiler's code generation, after exploring the relevant discussions and clang documentation. Rebuilt the kernel and repeated the rest of the process - to my great satisfaction, this time it didn't get stuck at the "Google" logo but rebooted into the existing kernel(installed on my Nexus 5) - I knew this happens only if there was some kernel panic for a legitimate reason and, as I had already disabled all the "reset logic" everywhere, I was pretty sure that I could get the dump of this kernel panic or oops in the recovery mode via /proc/last_kmsg.

Stage 46: With the above first milestone of finally loading and running the clang-built kernel on the Nexus 5, even though it looped back to reboot with the installed kernel because it encountered some kernel panic, I was immensely happy and the overall project gained incredible momentum, since I had to go through tons of research, attempts, experiments, documentation and articles day and night just to get the kernel started and see some logs from it. And yes, I went to recovery mode and had a look at /proc/last_kmsg, which showed the kernel panic. Having started from zero in this project and having seen my Nexus 5 stuck at the "Google" logo hundreds of times across numerous attempts, this was a great indication that I would eventually have a working clang-built Android kernel on my Nexus 5, since I at least saw my clang-built Android kernel alive for some time for the first time ever!

Stage 47: Just to confirm, I reverted the most recent changes mentioned above to see if they were "actually" the reason to get the clang-kernel up and running - they were indeed the reason - confirmed it couple of times - great!


Stage 48: Focusing on the kernel panic, it was from one of the audio codecs with respect to a buffer overrun - fixed it, rebuilt the kernel, re-fastboot-ed it - there was another audio codec with the same failure - fixed that as well, repeated the process and there were some more of these and fixed all of them - the kernel moved forward, saw the RR logo animation(Android logo in case of AOSP)! Android is finally booting!


Stage 49: After a plethora of attempts above, finally I got the Android successfully booted up on the clang-built ARM kernel on my Nexus 5 - confirmed the system information from "adb shell - dmesg - proc-version, etc" and also from "About/System" section of Android settings which showed the clang-built kernel version information in terms of the clang compiler version used to build the kernel, etc. Immensely happy!


Stage 50: Noticed the WiFi got connected but had a cross(x) symbol on it - "connected, no internet" was the status. Checked if data over LTE works if not WiFi - didn't - same cross mark if I disabled WiFi and tried to use Cellular data(LTE) - phone calls would work but no internet - checked logcat/dmesg - showed some issue with SELinux denials - spent some time on whether it's the kernel or ramdisk issue, etc.


Stage 51: Tracked down the above WiFi issue to be some bandwidth module being not loaded - checked for different entries for WiFi/wlan under sysfs and their status and so on - found out a particular piece of netfilter code had some VLAIS in use which I had fixed as per clang requirements - later on figured out that the latest mainline kernel had an updated code for that piece - incorporated that, rebuilt and retried - WiFi worked - no more cross symbol - same case with cellular data. Also, verified that ADB over USB, etc work - so far so good.


Stage 52: With everything working so far, I had used main LLVM/clang so far as the clang compiler, so wanted to try the same with all the other clang toolchains I had earlier experimented with, for the sake of universality. So, at first, picked Android NDK r13b's clang/LLVM - rebuilt the kernel and repeated the rest of the process - didn't boot - looped back to boot with the existing kernel - sign of some kernel panic.


Stage 53: Examined /proc/last_kmsg as usual to debug the above issue - saw some kernel panic in the Camera MSM driver code - tracked it down to the actual function as per the stack-trace - after substantial debugging without finding the actual reason for the panic, I decided to move on to the other remaining LLVM/clang toolchains as this only seemed to happen with Android NDK r13b's clang(which is not the latest either) - so why waste any more time?


Stage 54: Took Android NDK r17's LLVM/clang for rebuilding the kernel and tried bringing it up - worked without any issue(no panic, etc) - Android came up - WiFi, etc all were working - verified the kernel version information, compiler version, etc - all good.


Stage 55: Next was to use Qualcomm Snapdragon's LLVM/clang to rebuild the kernel and verify everything works - did that - everything worked.


Stage 56: With all verified except with the Android NDK r13b's LLVM/clang with which there was a kernel panic from Camera MSM driver code, I decided to go back to it and fix it once for all as I didn't have anything else pending before wrapping this project up!


Stage 57: Examined the Camera MSM driver code across different files and compared them with each other, added printk-logs. Surprisingly, I noticed that an incorrect driver's code was being called back for the registered device id table - there was some string manipulation within these functions - suspected them of overrunning the stack or so and corrupting something, causing such a behavior - dumped various flags, data, numbers, names, etc for the driver, device and its parameters to find out who is calling the wrong driver - found the init code was calling this as a callback after the driver got registered.


Stage 58: For one moment, had to check the device tree source for this device for which wrong driver was called back - didn't find anything incorrect as the same works for every other clang-toolchain and even gcc.


Stage 59: Tried to debug the init code which was calling this based on the device id matching the driver, went through different device driver bare-bone code to understand how a device-and-driver gets mapped onto and the driver's file operations are called - found a difference between the suspected driver and the actual driver which should have been called for the wrongly mapped device, in terms of device id specification.


Stage 60: Added the missing item in the suspected driver to have the device id specification as expected by the device driver bare-bone framework - rebuilt the kernel, repeated the rest of the process - worked! Android booted even with Android NDK r13b's LLVM/clang!


Stage 61: With that, everything was done and dusted. For the sake of completeness, I redid the whole thing with different combinations of LLVM/clang and NDK's binutils.


Stage 62: Repeated the process with Android NDK r13b's gcc - all good - no regressions because of changes done for clang.


Stage 63: Repeated the process with Android NDK r17's gcc - all good - no regressions because of changes done for clang.


Stage 64: Repeated the process with Android NDK r13b's (LLVM/clang + binutils(as, ld, etc)) - no issues


Stage 65: Repeated the process with Android NDK r17's (LLVM/clang + binutils(as, ld, etc)) - no issues


Stage 66: Repeated the process with (Main LLVM/clang) + (Android NDK r13b's binutils(as, ld, etc)) - no issues


Stage 67: Repeated the process with (Main LLVM/clang) + (Android NDK r17's binutils(as, ld, etc)) - no issues


Stage 68: Repeated the process with (Snapdragon Qualcomm LLVM/clang) + (Android NDK r13b's binutils(as, ld, etc)) - no issues


Stage 69: For the sake of completeness and compactness, silenced the various noisy warnings for the clang case. Despite that, the Snapdragon Qualcomm LLVM/clang instance was still warning that vectorization was unavailable because the options needed to enable it were missing - tried to add the missing ones - didn't silence the warning - researched on that - came to know that there was little benefit in it - so ignored those warnings and left them as they were.


Stage 70: Repeated the process with (Snapdragon Qualcomm LLVM/clang) + (Android NDK r17's binutils(as, ld, etc)) - no issues


Stage 71: Added clang-specific build-level changes under clang-only category to not mess with gcc-builds


Stage 72: Reverified that everything is fine with all the toolchains with the set of changes thus far


Stage 73: Automated the whole of the above combinations, generated the boot.imgs, compared the build-times, zImage-sizes, boot.img's size - listed out the shortest/longest builds, smallest/largest zImage, smallest/largest boot.img.


Stage 74: Ran the overall automation several times, both without any other activity on my system and with activity, to see the difference in build times, etc. Noted down the results.


Stage 75: Wrote test automation to fastboot all of the images one by one and checked the sanity of all the images - all good - all done. Period.



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Success: Bring-up of LLVM/clang-built Linux ARM(32-bit) kernel for Android - Nexus 5

Tim Northover via llvm-dev
Looks like you didn't read the whole article. Well, for more complete comparison between GCC and LLVM/clang, I have used four different LLVM/clang
versions (old to the new), from within Google's Android NDK and from outside of the NDK, i.e., from other sources, in my project as under :
  1. NDK r13b LLVM/clang : Android clang version 3.8.256229  (based on LLVM 3.8.256229)

  2. Qualcomm Snapdragon LLVM/clang for Android : Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) - clang version 4.0.2 for Android NDK

  3. NDK r17 LLVM/clang : Android (4691093 based on r316199) clang version 6.0.2 (https://android.googlesource.com/toolchain/clang 183abd29fc496f55536e7d904e0abae47888fc7f) (https://android.googlesource.com/toolchain/llvm 34361f192e41ed6e4e8f9aca80a4ea7e9856f327) (based on LLVM 6.0.2svn)

  4. Main LLVM/clang : Flash clang version 7.0.332826 (https://git.llvm.org/git/clang 4029c7ddda99ecbfa144f0afec44a192c442b6e5) (https://git.llvm.org/git/llvm 1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826)
And I saw improved battery usage in all these cases, as published in my article. So, read the entire article before making assumptions... :p

On Thu, Jun 14, 2018 at 3:16 PM, Jean-Michaël Celerier <[hidden email]> wrote:
> The main advantage of the clang-built Android ARM(32-bit) hammerhead kernel for my Nexus 5 has been the better battery usage when compared to that of gcc-built kernel, with the same kernel config and hardware(my Nexus 5 Android Smartphone). Details of the same can be found below.

To be fair, the GCC version which comes with the android ndk has not been updated for four years, while the clang version is kept up-to-date. It would be interesting to compare clang and GCC latest releases instead... that's where the future lies :p

-------
Jean-Michaël Celerier







Re: [llvm-dev] Success: Bring-up of LLVM/clang-built Linux ARM(32-bit) kernel for Android - Nexus 5

Tim Northover via llvm-dev
> To be fair, the GCC version which comes with the android ndk has not been updated for four years, while the clang version is kept up-to-date. It would be interesting to compare clang and GCC latest releases instead... that's where the future lies :p 

Looks like you didn't read the whole article.

Well, for a more complete comparison between GCC and LLVM/clang, I have used four different LLVM/clang versions (old to new), from within Google's Android NDK and
from outside the NDK, i.e., from other sources, in my project, as listed below. And I saw improved battery usage in all these cases, as published in my article.

So, read the entire article before making assumptions... :p
  1. NDK r13b LLVM/clang : Android clang version 3.8.256229  (based on LLVM 3.8.256229)

  2. Qualcomm Snapdragon LLVM/clang for Android : Snapdragon LLVM ARM Compiler 4.0.2 for Android NDK (based on llvm.org 4.0+) - clang version 4.0.2 for Android NDK

  3. NDK r17 LLVM/clang : Android (4691093 based on r316199) clang version 6.0.2 (https://android.googlesource.com/toolchain/clang 183abd29fc496f55536e7d904e0abae47888fc7f) (https://android.googlesource.com/toolchain/llvm 34361f192e41ed6e4e8f9aca80a4ea7e9856f327) (based on LLVM 6.0.2svn)

  4. Main LLVM/clang : Flash clang version 7.0.332826 (https://git.llvm.org/git/clang 4029c7ddda99ecbfa144f0afec44a192c442b6e5) (https://git.llvm.org/git/llvm 1181c40e0e24e0cca32e2609686db1f14151fc1a) (based on LLVM 7.0.332826)






Re: [llvm-dev] Success: Bring-up of LLVM/clang-built Linux ARM(32-bit) kernel for Android - Nexus 5

Tim Northover via llvm-dev
For easier reading with formatting, rather than the plain-text format of the mailing list, I also included the *URL* of the thread where I have posted the same content.

On Thu, Jun 14, 2018 at 10:59 PM, Jean-Michaël Celerier <[hidden email]> wrote:
Sorry, I'll be very honest : all the colors and backgrounds in your mail made it *very* hard to read; here's how it looks for me (in attachment).

good job in any case !





-------
Jean-Michaël Celerier







Re: [llvm-dev] Success: Bring-up of LLVM/clang-built Linux ARM(32-bit) kernel for Android - Nexus 5

Tim Northover via llvm-dev
All the Android builds I used before my clang-built kernel had gcc-built kernels from different ROMs (custom and AOSP), and none of them gave a good experience
in terms of battery usage. AFAIK, Google has announced that it has switched from GCC to LLVM/clang for userspace; for the kernel it has done so only to
some extent, and not yet for ARM (32-bit) as far as I know, which is why I was interested in accomplishing it.

Having said that, I tried to use the GCC 8 cross toolchain for gnueabi to build the Android kernel, but it produced some assembler error messages. So I looked into building the androideabi
version from the gcc-8.1 source: the Android NDK toolchain source has some build scripts that would need work to pick up an out-of-tree GCC source, so it is not a straightforward process.
I also saw a Linaro GCC toolchain, but only the older 7.* version and not the latest 8.* one. I have seen some people using Linaro GCC to build the kernel as well, but I didn't see any
positive notes about battery usage with it.

Anyway, I am satisfied with my clang-built Android kernel, as it has been more battery efficient than the earlier gcc-built kernels, and that is with the rest of
Android (ramdisk, system image, etc.) and the hardware (Nexus 5) being the *same*.

On Thu, Jun 14, 2018 at 11:01 PM, Jean-Michaël Celerier <[hidden email]> wrote:
But... I still don't find any different GCC version? AFAIK they stopped updating gcc around ndk10. Did you test at some point with a custom built gcc / g++ 8 for instance ?



-------
Jean-Michaël Celerier





