python - Numba function slower than C++ and loop re-order further slows down x10 -


The following code simulates extracting binary words from different locations within a set of images.

The Numba-wrapped function, wordcalc in the code below, has 2 problems:

  1. It is 3 times slower than a similar implementation in C++.
  2. Most strangely, if you switch the order of the "ibase" and "ibit" for-loops, speed drops by a factor of 10 (!). This does not happen in the C++ implementation, which remains unaffected.

I'm using Numba 0.18.2 with WinPython 2.7.

What could be causing this?

    import numpy as np
    import numba as nb

    imdim = 80
    numinsts = 10**4
    numinstssub = 10**4/4
    bitsnum = 13

    xs = np.random.rand(numinsts, imdim**2)
    iinstinds = np.array(range(numinsts)[::4])
    baseinds = np.arange(imdim**2 - imdim*20 + 1)
    ofst1 = np.random.randint(0, imdim*20, bitsnum)
    ofst2 = np.random.randint(0, imdim*20, bitsnum)

    @nb.jit(nopython=True)
    def wordcalc(xs, iinstinds, baseinds, ofst, bitsnum, newxz):
        count = 0
        for i in iinstinds:
            xi = xs[i]
            for ibit in range(bitsnum):
                for ibase in range(baseinds.shape[0]):
                    # compare two pixels offset from the base index; the result becomes bit number ibit of the word
                    u = xi[baseinds[ibase] + ofst[0, ibit]] > xi[baseinds[ibase] + ofst[1, ibit]]
                    newxz[count, ibase] = newxz[count, ibase] | np.uint16(u * (2**ibit))
            count += 1
        return newxz

    ret = wordcalc(xs, iinstinds, baseinds, np.array([ofst1, ofst2]), bitsnum, np.zeros((iinstinds.size, baseinds.size), dtype=np.uint16))

I get a 4x speed-up by changing np.uint16(u * (2**ibit)) to np.uint16(u << ibit); i.e. replacing the power of 2 with a bitshift, which should be equivalent (for integers).
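As a quick sanity check of that equivalence, here is a minimal sketch in plain Python and NumPy (no Numba involved). The helper names pack_bits_power and pack_bits_shift are made up for illustration and are not part of the question's code; they pack a list of booleans into a 16-bit word both ways so the results can be compared.

```python
import numpy as np

def pack_bits_power(bits):
    """Pack booleans into a uint16 word by multiplying each bit by 2**ibit."""
    word = np.uint16(0)
    for ibit, u in enumerate(bits):
        word = word | np.uint16(u * (2 ** ibit))
    return word

def pack_bits_shift(bits):
    """Pack booleans into a uint16 word by shifting each bit left by ibit."""
    word = np.uint16(0)
    for ibit, u in enumerate(bits):
        word = word | np.uint16(int(u) << ibit)
    return word

# Bits 0, 2 and 3 are set, so both helpers should give 1 + 4 + 8 = 13.
bits = [True, False, True, True]
print(pack_bits_power(bits), pack_bits_shift(bits))
```

This only shows that the two expressions produce the same word; it says nothing about speed, which depends on what the JIT generates for each form.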

It seems reasonably likely that your C++ compiler might be making this substitution itself.

Swapping the order of the two loops makes only a small difference for me, for both the original version (5%) and my optimized version (15%), so I don't think I can make a useful comment on that.
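Before timing the two loop orders against each other, it is worth confirming that they compute the same thing. The following is a rough pure-Python reference, not the Numba version: wordcalc_ref and its bit_outer flag are hypothetical names, and the sizes are shrunk (imdim = 8 with an offset span of imdim*2, instead of 80 and imdim*20) so that it runs quickly without a JIT.

```python
import numpy as np

def wordcalc_ref(xs, iinstinds, baseinds, ofst, bitsnum, bit_outer=True):
    """Reference word extraction; bit_outer selects which loop is outermost."""
    newxz = np.zeros((iinstinds.size, baseinds.size), dtype=np.uint16)
    nbase = baseinds.shape[0]
    if bit_outer:
        # original order: ibit outer, ibase inner
        pairs = [(ibit, ibase) for ibit in range(bitsnum) for ibase in range(nbase)]
    else:
        # swapped order: ibase outer, ibit inner
        pairs = [(ibit, ibase) for ibase in range(nbase) for ibit in range(bitsnum)]
    for count, i in enumerate(iinstinds):
        xi = xs[i]
        for ibit, ibase in pairs:
            u = xi[baseinds[ibase] + ofst[0, ibit]] > xi[baseinds[ibase] + ofst[1, ibit]]
            newxz[count, ibase] |= np.uint16(int(u) << ibit)
    return newxz

# Small, seeded inputs (assumed sizes, chosen only to keep the run short).
rng = np.random.RandomState(0)
imdim, numinsts, bitsnum = 8, 12, 13
span = imdim * 2
xs = rng.rand(numinsts, imdim ** 2)
iinstinds = np.arange(0, numinsts, 4)
baseinds = np.arange(imdim ** 2 - span + 1)
ofst = rng.randint(0, span, size=(2, bitsnum))

a = wordcalc_ref(xs, iinstinds, baseinds, ofst, bitsnum, bit_outer=True)
b = wordcalc_ref(xs, iinstinds, baseinds, ofst, bitsnum, bit_outer=False)
print(np.array_equal(a, b))
```

Since each bit is OR-ed into its word independently, the order of the two loops should not change the result, so any timing difference between the two orders comes from how the loops are compiled and how memory is accessed, not from the arithmetic.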

If you really wanted to compare the Numba and C++ versions, you can look at the compiled Numba function by setting os.environ['NUMBA_DUMP_ASSEMBLY']='1' before import numba. (That's quite involved though.)

For reference, I'm using Numba 0.19.1.

