pybloomfiltermmap3: a fast Bloom filter implementation for Python

pybloomfiltermmap3 is a Python 3 fork of pybloomfiltermmap by Michael Axiak (@axiak).

A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. It may report false positives, but never false negatives. The Wikipedia page on Bloom filters has further information on how they work.
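
To make the idea concrete, here is a minimal pure-Python sketch of a Bloom filter. It is only an illustration and not how this module is implemented: the class name ToyBloomFilter, the salted SHA-256 hashing, and the sizes used are all assumptions chosen for brevity.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; not the implementation used by this module."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # The item may be present only if all of its bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


toy = ToyBloomFilter(num_bits=1024, num_hashes=3)
toy.add("apple")
print("apple" in toy)  # an added item is always reported as present
```

An item that was never added can still be reported as present if other items happened to set all of its bits. That is the false-positive case, and it becomes more likely as the filter fills up.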

This module implements a Bloom filter in Python that’s fast and uses mmap files for better scalability.

Here’s a quick example:

>>> from pybloomfilter import BloomFilter

>>> bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> with open("/usr/share/dict/words") as f:
...     for word in f:
...         bf.add(word.rstrip())
...
>>> print('apple' in bf)

That wasn’t so hard, was it? Now, there are a lot of other things we can do. For instance, let’s say we want to create a similar filter with just a few pieces of fruit:

>>> fruitbf = bf.copy_template("fruit.bloom")
>>> fruitbf.update(("apple", "banana", "orange", "pear"))

>>> print(fruitbf.to_base64())
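
The first two arguments passed to BloomFilter in the quick example are the capacity and the target error rate. As a rough guide to what those numbers imply, the sketch below applies the textbook Bloom filter sizing formulas. The helper name textbook_bloom_size is hypothetical, and the module may round or size its filter differently, so treat the result as an estimate only.

```python
import math


def textbook_bloom_size(capacity, error_rate):
    """Return (num_bits, num_hashes) from the standard Bloom filter formulas."""
    # m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole hash function
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


num_bits, num_hashes = textbook_bloom_size(10_000_000, 0.01)
print(num_bits / 8 / 2**20, "MiB,", num_hashes, "hash functions")
```

Under these formulas, ten million elements at a 1% error rate need a little under 10 bits per element, or about 11.4 MiB in total.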

Why pybloomfilter?

There are a few reasons to use this module:

  • It natively uses mmapped files.

  • It natively supports the set operations you would expect from a Bloom filter, such as union and intersection.

  • It is fast (see benchmarks).
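
To illustrate what backing a bit array with a memory-mapped file means, here is a small sketch that uses only Python's standard mmap module. It does not reflect this module's internal file format: the file name bits.bin, the 1024-bit size, and the choice of bit 42 are assumptions made for the example.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "bits.bin")

# Create a zero-filled file large enough for 1024 bits.
with open(path, "wb") as f:
    f.write(bytes(128))

# Map the file into memory and set bit 42; the write goes through to the file.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        mm[42 // 8] |= 1 << (42 % 8)
        mm.flush()

# Read the file back without mmap: the bit is still set.
with open(path, "rb") as f:
    data = f.read()
bit_is_set = bool(data[42 // 8] & (1 << (42 % 8)))
print(bit_is_set)
```

Because the bits live in a file rather than only in process memory, a filter stored this way can persist between runs, and the operating system decides which pages to keep in RAM.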


Please note that this version requires Python 3.5 or later. If you are using Python 2, please see pybloomfiltermmap instead.

To build and install:

$ pip install pybloomfiltermmap3


To develop the module, you will need Cython. The build script should automatically build from the Cython source if the Cython module is available.