[llvm-dev] __parityti2(), __paritydi2() and __paritysi2() vs. __builtin_parity


Alberto Barbaro via llvm-dev
Hi @ll,

compiler-rt/lib/builtins/parityti2.c
compiler-rt/lib/builtins/paritydi2.c
compiler-rt/lib/builtins/paritysi2.c

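implement the parity function as a matryoshka (nested functions, each one XOR-ing the two halves of its argument and calling the next narrower one):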

si_int
__paritysi2(si_int a)
{
    su_int x = (su_int)a;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (0x6996 >> (x & 0xF)) & 1; // see optimisation below!
}

si_int
__paritydi2(di_int a)
{
    dwords x;
    x.all = a;
    return __paritysi2(x.s.high ^ x.s.low);
}

si_int
__parityti2(ti_int a)
{
    twords x;
    x.all = a;
    return __paritydi2(x.s.high ^ x.s.low);
}
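The magic constant 0x6996 in __paritysi2 is a 16-entry table with one bit per entry: bit i holds the parity of the 4-bit value i. After the XOR folding has reduced the 32-bit parity to the parity of the low nibble of x, a single shift and mask reads the answer out of that table. The sketch below rebuilds the constant from a naive bit loop and spot-checks the folded version against it. It assumes <stdint.h> types as stand-ins for compiler-rt's si_int/su_int, and the names naive_parity32, build_nibble_parity_lut, fold_parity32 and check_fold_parity32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive reference (hypothetical helper): XOR all bits together one at a time. */
static int naive_parity32(uint32_t v)
{
    int p = 0;
    while (v) {
        p ^= (int)(v & 1u);
        v >>= 1;
    }
    return p;
}

/* Build the 1-bit-per-entry lookup table: bit i is the parity of the
 * 4-bit value i. For i = 0..15 this reproduces 0x6996. */
static uint16_t build_nibble_parity_lut(void)
{
    uint16_t lut = 0;
    for (uint32_t i = 0; i < 16; ++i)
        lut |= (uint16_t)(naive_parity32(i) << i);
    return lut;
}

/* Same folding scheme as __paritysi2, using uint32_t instead of su_int. */
static int fold_parity32(uint32_t x)
{
    x ^= x >> 16;   /* low 16 bits now carry the parity of all 32 bits */
    x ^= x >> 8;    /* ... of the low 8 bits */
    x ^= x >> 4;    /* ... of the low 4 bits */
    return (0x6996 >> (x & 0xF)) & 1;  /* table lookup for the low nibble */
}

/* Spot-check the folded version against the naive reference. */
static void check_fold_parity32(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0xFFFFFFFFu,
                                 0x12345678u, 0xDEADBEEFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(fold_parity32(samples[i]) == naive_parity32(samples[i]));
}
```

The wider variants rely on the same table: once __paritydi2 and __parityti2 have XOR-ed their halves together, the remaining work is exactly this 32-bit fold and lookup.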

Questions:
~~~~~~~~~~

1. Are these functions still needed, given that __builtin_parity is available?

2. Will the optimiser inline the internal function calls (as part of LTO)?

   If NOT, they should be inlined manually!

   JFTR: if the 3 functions are part of a single source file (compilation
         unit), the compiler does inline them!

   Yes, parity is seldom used, so this optimisation may not seem necessary;
   here are manually inlined versions anyway:

si_int
__paritydi2(di_int a)
{
    su_int x = (su_int)a;
    x ^= (du_int)a >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (0x69966996 >> x) & 1;
}

si_int
__parityti2(ti_int a)
{
    du_int x = (du_int)a;
    x ^= (tu_int)a >> 64;
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (0x69966996 >> x) & 1;
}

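CAVEAT: the last right shift is undefined behaviour in C whenever x >= 32
        (the shift count must be less than the width of the 32-bit left
        operand 0x69966996), so the optimisation shown here only works on
        CPUs which perform shifts modulo the word size!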
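A sketch of a well-defined variant of the optimisation, assuming a GCC- or Clang-compatible compiler: because 0x69966996 holds 0x6996 in both 16-bit halves, masking the shift count with 31 yields exactly the bit that a CPU reducing shift counts modulo 32 would select, without relying on undefined behaviour. The function names are hypothetical, the 128-bit variant is compiled only where __SIZEOF_INT128__ is defined, and the cross-check uses GCC/Clang's __builtin_parityll, which also bears on question 1.

```c
#include <assert.h>
#include <stdint.h>

/* Bit (x & 31) of 0x69966996 equals bit (x & 15) of 0x6996, i.e. the parity
 * of the low nibble of x. The mask keeps the shift count in range. */
static int parity_from_folded(uint32_t x)
{
    return (int)((0x69966996u >> (x & 31u)) & 1u);
}

/* Hypothetical portable counterpart of the inlined __paritydi2. */
int parity64_portable(uint64_t a)
{
    uint32_t x = (uint32_t)a ^ (uint32_t)(a >> 32);  /* fold 64 -> 32 bits */
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return parity_from_folded(x);
}

#ifdef __SIZEOF_INT128__
/* Hypothetical portable counterpart of the inlined __parityti2. */
int parity128_portable(unsigned __int128 a)
{
    uint64_t x = (uint64_t)a ^ (uint64_t)(a >> 64);  /* fold 128 -> 64 bits */
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return parity_from_folded((uint32_t)x);  /* only the low nibble matters */
}
#endif

/* Cross-check against the GCC/Clang builtin over pseudo-random inputs. */
void check_parity64_portable(void)
{
    uint64_t v = 0x9E3779B97F4A7C15ull;  /* arbitrary seed */
    for (int i = 0; i < 1000; ++i) {
        assert(parity64_portable(v) == __builtin_parityll(v));
        /* LCG step with Knuth's MMIX constants (wraps modulo 2^64) */
        v = v * 6364136223846793005ull + 1442695040888963407ull;
    }
}
```

Since x86 shift instructions already mask a 32-bit shift count to 5 bits, a compiler may be able to drop the extra `& 31`; whether it does depends on the compiler and target, so check the generated assembly rather than assuming it.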

stay tuned
Stefan Kanthak
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] __parityti2(), __paritydi2() and __paritysi2() vs. __builtin_parity

Alberto Barbaro via llvm-dev
I don't believe clang/LLVM ever emits a call to the parity library routines today. But they might be needed so that we can say we match the libgcc interface. As far as I know, gcc doesn't use them for x86 either, and I don't know whether they are even in x86 libgcc. But it might be easier for compiler-rt to be a superset of libgcc than to try to track exactly what libgcc provides on each target.

Not sure about the inlining question.

~Craig


On Tue, Dec 4, 2018 at 11:51 AM Stefan Kanthak via llvm-dev <[hidden email]> wrote: