linux-xiaomi-chiron/lib
Jiong Wang 06ae48269d lib: reciprocal_div: implement the improved algorithm on the paper mentioned
The newly added "reciprocal_value_adv" implements the advanced version of
the algorithm described in Figure 4.2 of the paper, except when
"divisor > (1U << 31)". In that case ceil(log2(d)) is 32, which would
require a u128 divide on the host. This exception case can easily be
handled before calling "reciprocal_value_adv".
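
The exception case can be illustrated with a small userspace C sketch.
The names ceil_log2_u32 and div_by_large_divisor are hypothetical, and the
kernel's fls() is emulated with the GCC/Clang builtin __builtin_clz. The
sketch shows that ceil(log2(d)) first reaches 32 at d = (1U << 31) + 1, and
that for such large divisors the quotient reduces to one comparison.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(d)) for d >= 1, written as fls(d - 1). fls() is a kernel
 * helper, so it is emulated here with the GCC/Clang __builtin_clz. */
static unsigned int ceil_log2_u32(uint32_t d)
{
	uint32_t x = d - 1;

	return x ? 32 - (unsigned int)__builtin_clz(x) : 0;
}

/* For d >= 2^31 every u32 n is below 2 * d, so n / d is either 0 or 1
 * and no multiplier is needed at all. */
static uint32_t div_by_large_divisor(uint32_t n, uint32_t d)
{
	assert(d >= (1U << 31));
	return n >= d;
}
```

Handling d >= (1U << 31) this way, as the pseudo code below does, also
covers d == (1U << 31) itself, which is harmless because the comparison is
exact for every divisor in that range.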

The advanced version needs a more complex calculation to obtain the
reciprocal multiplier and the other control variables, but in return it
reduces the number of operations needed to emulate the divide.
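
A minimal userspace sketch of that calculation follows. It assumes the
mlow/mhigh bound-halving search of Figure 4.2, with a precision argument
prec matching the reciprocal_value_adv(d, prec) calls in the pseudo code
below. struct rv_adv, fls32 and rv_adv_compute are hypothetical stand-ins,
not the kernel's definitions; only the field names m, sh, exp and
is_wide_m are taken from the pseudo code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct reciprocal_value_adv. */
struct rv_adv {
	uint32_t m;     /* low 32 bits of the multiplier */
	uint8_t sh;     /* post-shift */
	uint8_t exp;    /* ceil(log2(d)) */
	bool is_wide_m; /* true if the multiplier needs 33 bits */
};

/* Kernel-style fls(): 1-based index of the most significant set bit. */
static unsigned int fls32(uint32_t x)
{
	return x ? 32 - (unsigned int)__builtin_clz(x) : 0;
}

/* Sketch of the multiplier search for n / d, where n has prec bits. */
static struct rv_adv rv_adv_compute(uint32_t d, uint8_t prec)
{
	struct rv_adv r;
	unsigned int l = fls32(d - 1); /* ceil(log2(d)) for d >= 1 */
	unsigned int post_shift = l;
	uint64_t mlow, mhigh;

	/* l == 32 (d > 2^31) would need a u128 divide: handled by the caller. */
	assert(d >= 1 && l < 32 && prec >= 1 && prec <= 32);

	/* mlow = 2^(32+l) / d, mhigh = (2^(32+l) + 2^(32+l-prec)) / d */
	mlow = (1ULL << (32 + l)) / d;
	mhigh = ((1ULL << (32 + l)) + (1ULL << (32 + l - prec))) / d;

	/* Halve both bounds while they stay distinct, shrinking the shift. */
	for (; post_shift > 0; post_shift--) {
		uint64_t lo = mlow >> 1, hi = mhigh >> 1;

		if (lo >= hi)
			break;
		mlow = lo;
		mhigh = hi;
	}

	r.m = (uint32_t)mhigh;
	r.sh = (uint8_t)post_shift;
	r.exp = (uint8_t)l;
	r.is_wide_m = mhigh > UINT32_MAX;
	return r;
}
```

With this sketch, d = 3 gives m = 0xAAAAAAAB and sh = 1, while d = 7 needs
a 33-bit multiplier: is_wide_m is set and m keeps only its low 32 bits,
0x24924925.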

It makes no sense to use this advanced version for host divide emulation:
the extra cost of calculating the multiplier etc. could completely cancel
out the savings on emulation operations.

However, it does make sense for JIT divide code generation (for example in
eBPF JIT backends), where we are willing to spend more time on the host in
exchange for faster JITed code. As the following pseudo code shows, the
number of emulation operations can go down from 6 (basic version) to 3
or 4.

To use the result of "reciprocal_value_adv" to calculate n/d, the C-style
pseudo code is as follows; it can easily be turned into real code
generation for other JIT targets.

  struct reciprocal_value_adv rvalue;
  u8 pre_shift, exp;

  // handle exception case.
  if (d >= (1U << 31)) {
    result = n >= d;
    return;
  }
  rvalue = reciprocal_value_adv(d, 32);
  exp = rvalue.exp;
  if (rvalue.is_wide_m && !(d & 1)) {
    // floor(log2(d & (2^32 -d)))
    pre_shift = fls(d & -d) - 1;
    rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
  } else {
    pre_shift = 0;
  }

  // code generation starts.
  if (d == 1U << exp) {
    result = n >> exp;
  } else if (rvalue.is_wide_m) {
    // pre_shift must be zero when reached here.
    t = ((u64)n * rvalue.m) >> 32;
    result = n - t;
    result >>= 1;
    result += t;
    result >>= rvalue.sh - 1;
  } else {
    result = n;
    if (pre_shift)
      result >>= pre_shift;
    result = ((u64)result * rvalue.m) >> 32;
    result >>= rvalue.sh;
  }
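
To sanity-check the pseudo code on the host, the instruction sequence it
emits can be emulated in C. In this sketch, emit_div_emulated and struct
rv_adv are hypothetical; the function takes exp from the first
reciprocal_value_adv(d, 32) call, pre_shift, and the (possibly recomputed)
multiplier fields as plain inputs. The 32x32->64 multiply stands in for
whatever high-multiply instruction a real JIT target provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the reciprocal_value_adv fields the pseudo code
 * reads; not the kernel's struct definition. */
struct rv_adv {
	uint32_t m;     /* multiplier (low 32 bits if is_wide_m) */
	uint8_t sh;     /* post-shift */
	uint8_t exp;    /* ceil(log2(d)) */
	bool is_wide_m; /* multiplier needs 33 bits */
};

/*
 * Host-side emulation of the sequence the pseudo code would emit for n / d.
 * exp comes from the first reciprocal_value_adv(d, 32) call; rv may come
 * from the recomputation for d >> pre_shift.
 */
static uint32_t emit_div_emulated(uint32_t n, uint32_t d, uint8_t exp,
				  uint8_t pre_shift, struct rv_adv rv)
{
	uint32_t result, t;

	/* Exception case: the quotient is 0 or 1. */
	if (d >= (1U << 31))
		return n >= d;

	/* Power of two: a single right shift. */
	if (d == 1U << exp)
		return n >> exp;

	if (rv.is_wide_m) {
		/* 33-bit multiplier 2^32 + m: computes (n + t) >> sh without
		 * overflowing 32 bits, as (((n - t) >> 1) + t) >> (sh - 1). */
		assert(pre_shift == 0);
		t = (uint32_t)(((uint64_t)n * rv.m) >> 32);
		result = ((n - t) >> 1) + t;
		return result >> (rv.sh - 1);
	}

	/* Narrow multiplier: optional pre-shift, high multiply, post-shift. */
	result = n >> pre_shift;
	result = (uint32_t)(((uint64_t)result * rv.m) >> 32);
	return result >> rv.sh;
}
```

For instance, for d = 14 the first call yields a wide multiplier, so the
divisor is pre-shifted by 1 and the multiplier is recomputed for 7 with 31
bits of precision, giving m = 0x92492493 and sh = 2; n / 14 then becomes
((n >> 1) * m) >> 34, with no wide-multiplier fixup.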

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-07 01:45:31 +02:00
842
fonts
lz4
lzo
mpi treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
raid6
reed_solomon treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
xz
zlib_deflate
zlib_inflate
zstd
.gitignore
argv_split.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
ashldi3.c
ashrdi3.c
asn1_decoder.c
assoc_array.c
atomic64.c
atomic64_test.c
audit.c
bcd.c
bch.c
bitmap.c lib/bitmap.c: micro-optimization for __bitmap_complement() 2018-06-07 17:34:39 -07:00
bitrev.c
bsearch.c
btree.c
bucket_locks.c mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags 2018-06-07 17:34:38 -07:00
bug.c
build_OID_registry
bust_spinlocks.c
chacha20.c
check_signature.c
checksum.c
clz_ctz.c
clz_tab.c
cmdline.c
cmpdi2.c
compat_audit.c
cordic.c
cpu_rmap.c
cpumask.c
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c
dec_and_lock.c atomic: Add irqsave variant of atomic_dec_and_lock() 2018-06-12 23:33:24 +02:00
decompress.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
devres.c
digsig.c
div64.c
dump_stack.c
dynamic_debug.c
dynamic_queue_limits.c
earlycpio.c
error-inject.c
errseq.c
extable.c
fault-inject.c
fdt.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
find_bit.c
find_bit_benchmark.c lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit() 2018-05-11 17:28:45 -07:00
flex_array.c
flex_proportions.c
gcd.c
gen_crc32table.c
genalloc.c
glob.c
globtest.c
hexdump.c
hweight.c
idr.c lib/idr.c: remove simple_ida_lock 2018-06-07 17:34:39 -07:00
inflate.c
int_sqrt.c
interval_tree.c
interval_tree_test.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
iomap.c
iomap_copy.c
iommu-helper.c iommu-helper: mark iommu_is_span_boundary as inline 2018-05-09 06:55:44 +02:00
ioremap.c
iov_iter.c Merge branch 'for-4.18/mcsafe' into libnvdimm-for-next 2018-06-08 15:16:44 -07:00
irq_poll.c
irq_regs.c
is_single_threaded.c
jedec_ddr_data.c
kasprintf.c
Kconfig Move all the dma-mapping code to kernel/dma 2018-06-20 16:30:01 +09:00
Kconfig.debug fault-injection: reorder config entries 2018-06-15 07:55:24 +09:00
Kconfig.kasan kasan: depend on CONFIG_SLUB_DEBUG 2018-06-28 11:16:44 -07:00
Kconfig.kgdb
Kconfig.ubsan
kfifo.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
klist.c
kobject.c
kobject_uevent.c netns: restrict uevents 2018-05-01 10:22:41 -04:00
kstrtox.c
kstrtox.h
lcm.c
libcrc32c.c
list_debug.c
list_sort.c
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lockref.c
logic_pio.c
lru_cache.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
lshrdi3.c
Makefile Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-24 19:36:16 +08:00
memory-notifier-error-inject.c
memweight.c
muldi3.c
net_utils.c
netdev-notifier-error-inject.c
nlattr.c netlink: Return extack message if attribute validation fails 2018-06-28 16:18:04 +09:00
nmi_backtrace.c
nodemask.c
notifier-error-inject.c
notifier-error-inject.h
of-reconfig-notifier-error-inject.c
oid_registry.c
once.c
parman.c
parser.c
pci_iomap.c
percpu-refcount.c
percpu_counter.c
percpu_ida.c lib/percpu_ida.c: don't do alloc from per-CPU list if there is none 2018-06-28 11:16:44 -07:00
percpu_test.c
plist.c
pm-notifier-error-inject.c
prime_numbers.c
radix-tree.c idr: fix invalid ptr dereference on item delete 2018-05-25 18:12:10 -07:00
random32.c
ratelimit.c
rational.c
rbtree.c
rbtree_test.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
reciprocal_div.c lib: reciprocal_div: implement the improved algorithm on the paper mentioned 2018-07-07 01:45:31 +02:00
refcount.c locking/refcounts: Implement refcount_dec_and_lock_irqsave() 2018-06-12 23:33:25 +02:00
rhashtable.c rhashtable: clean up dereference of ->future_tbl. 2018-06-22 13:43:28 +09:00
sbitmap.c treewide: kzalloc_node() -> kcalloc_node() 2018-06-12 16:19:22 -07:00
scatterlist.c for-linus-20180629 2018-06-30 10:47:46 -07:00
seq_buf.c
sg_pool.c
sg_split.c
sha1.c
sha256.c
show_mem.c
siphash.c
smp_processor_id.c
sort.c
stackdepot.c
stmp_device.c
string.c
string_helpers.c
strncpy_from_user.c
strnlen_user.c
syscall.c
test-kstrtox.c
test-string_helpers.c
test_bitmap.c lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly 2018-05-18 17:17:12 -07:00
test_bpf.c test_bpf: flag tests that cannot be jited on s390 2018-06-28 23:58:39 +02:00
test_debug_virtual.c
test_firmware.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
test_hash.c
test_hexdump.c
test_kasan.c
test_kmod.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
test_list_sort.c
test_module.c
test_overflow.c test_overflow: fix an IS_ERR() vs NULL bug 2018-06-12 16:19:22 -07:00
test_parman.c
test_printf.c Revert "lib/test_printf.c: call wait_for_random_bytes() before plain %p tests" 2018-06-25 13:44:20 +02:00
test_rhashtable.c rhashtable: remove nulls_base and related code. 2018-06-22 13:43:27 +09:00
test_siphash.c
test_sort.c
test_static_key_base.c
test_static_keys.c
test_string.c
test_sysctl.c
test_ubsan.c
test_user_copy.c
test_uuid.c
textsearch.c
timerqueue.c
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c
ubsan.h
ucmpdi2.c
ucs2_string.c lib/ucs2_string.c: add MODULE_LICENSE() 2018-06-07 17:34:39 -07:00
usercopy.c
uuid.c
vsprintf.c Printk changes for 4.18 2018-06-06 16:04:55 -07:00
win_minmax.c
xxhash.c