Changelog:
============
* speed up unaligned copies by always using word shifts (in combination
with the builtin 64-bit byte swap when available) when bit-endianness
and machine byte order are opposite
* add 'HAVE_BUILTIN_BSWAP64' to header
* avoid misaligned pointers when casting to '(uint64_t *)'
* add tests
* simplify 'copy_n()' (remove special cases), see #d2d6fd53
* add [word shift example C program](../examples/shift_r8.c),
and simplify 'shift_r8()'
* improve documentation and testing
* add [roadmap](https://github.com/ilanschnell/bitarray#roadmap)
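
To illustrate the word-shift and alignment entries above, here is a minimal
sketch, not the upstream implementation. It shifts 8 bytes right by n bits in
big-endian bit order, byte-swapping the word on little-endian machines so that
a single 64-bit shift carries bits across byte boundaries. The names
`swap64()`, `host_is_little_endian()`, `shift_r8_word()` and `shift_r8_bytes()`
are hypothetical, and using `memcpy()` in place of a `(uint64_t *)` cast is an
assumption about one way to avoid misaligned pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte-swap a 64-bit word, using the compiler
   builtin when available and a portable fallback otherwise. */
static uint64_t swap64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(x);
#else
    x = ((x & 0x00ff00ff00ff00ffULL) << 8) | ((x >> 8) & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
#endif
}

/* Return 1 on a little-endian machine, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Shift 8 bytes right by n bits (0 < n < 8) in big-endian bit order,
   i.e. bit 0 is the most significant bit of buf[0]. */
static void shift_r8_word(unsigned char *buf, int n)
{
    uint64_t w;

    memcpy(&w, buf, 8);        /* no (uint64_t *) cast, so alignment does not matter */
    if (host_is_little_endian())
        w = swap64(w);         /* make buf[0] the most significant byte */
    w >>= n;                   /* one shift carries bits across all 8 bytes */
    if (host_is_little_endian())
        w = swap64(w);         /* restore the in-memory byte order */
    memcpy(buf, &w, 8);
}

/* Byte-wise reference used only to cross-check the word version. */
static void shift_r8_bytes(unsigned char *buf, int n)
{
    int i;

    for (i = 7; i >= 0; i--) {
        buf[i] = (unsigned char) (buf[i] >> n);
        if (i > 0)             /* carry the low bits of the previous byte */
            buf[i] |= (unsigned char) (buf[i - 1] << (8 - n));
    }
}
```

On a machine whose byte order already matches the bit-endianness, no swap is
needed and the plain word shift is enough, which is why the swap is
conditional in this sketch.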

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>