Random number generation in Cython

Reading time ~1 minute

Problem

In one of my programs, I had to perform (a lot of) random sampling from Python lists. So much that it ended up being my bottleneck.

Without going in too much details, the function was mostly generating random numbers and accessing elements in lists. I gave it a simple Cython try with something along these lines:

import random
import cython

@cython.boundscheck(False)
def sample(int n_sampling, l):
    a = []
    for _ in range(n_sampling):
        a.append(l[0])
    return a

@cython.boundscheck(False)
def rando(int n_sampling, l):
    a = []
    for _ in range(n_sampling):
        a.append(random.randrange(3))
    return a

Cython usually needs the following setup code:

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("fast_sampler.pyx")
)

And the following command will build the code:

python setup.py build_ext --inplace

The timing results:

python -mtimeit -s"import fast_sampler" "fast_sampler.sample(10000,[1,2,3])"
5000 loops, best of 5: 61.8 usec per loop

Accessing elements in a loop seems quick enough.

python -mtimeit -s"import fast_sampler" "fast_sampler.rando(10000,[1,2,3])"
50 loops, best of 5: 5.77 msec per loop

However, the calls to random.randrange() seem to be the bottleneck.

If we add this cimport statement, we can directly call rand()

from libc.stdlib cimport rand

@cython.boundscheck(False)
def rando_c(int n_sampling, l):

    a = []
    for _ in range(n_sampling):
        a.append(rand() % 3)
    return a

And finally:

python -mtimeit -s"import fast_sampler" "fast_sampler.rando_c(10000,[1,2,3])"
2000 loops, best of 5: 104 usec per loop

Which brings a 50x speedup!

What about the seed ?

Usually, it is a good practice to add these lines in a Python code when dealing with random number generation (to ensure reproducibility):

random.seed(0)
np.random.seed(0)

By default, rand() always returns the same numbers, as long as you do not call srand() before, so you do not have to worry about them any more ! (At least, not in this part of your code).

Vim for datascience

There are plenty of tutorials here and there to have Python and vim interact beautifully. The point of this one is to provide some simple...… Continue reading

Random Greedy Forest tutorial

Published on January 09, 2022

Does gradient boosting overfit

Published on December 12, 2021