pybloomfiltermmap3: a fast implementation of Bloom filter for Python¶
pybloomfiltermmap3 is a Python 3 fork of pybloomfiltermmap by Michael Axiak (@axiak).
Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. The wikipedia page has further information on their nature.
This module implements a Bloom filter in Python that’s fast and uses mmap files for better scalability.
Here’s a quick example:
>>> from pybloomfilter import BloomFilter
>>> bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> with open("/usr/share/dict/words") as f:
>>> for word in f:
>>> bf.add(word.rstrip())
>>> print 'apple' in bf
True
That wasn’t so hard, was it? Now, there are a lot of other things we can do. For instance, let’s say we want to create a similar filter with just a few pieces of fruit:
>>> fruitbf = bf.copy_template("fruit.bloom")
>>> fruitbf.update(("apple", "banana", "orange", "pear"))
>>> print(fruitbf.to_base64())
"eJzt2k13ojAUBuA9f8WFyofF5TWChlTHaPzqrlqFCtj6gQi/frqZM2N7aq3Gis59d2ye85KTRbhk"
"0lyu1NRmsQrgRda0I+wZCfXIaxuWv+jqDxA8vdaf21HIOSn1u6LRE0VL9Z/qghfbBmxZoHsqM3k8"
"N5XyPAxH2p22TJJoqwU9Q0y0dNDYrOHBIa3BwuznapG+KZZq69JUG0zu1tqI5weJKdpGq7PNJ6tB"
"GKmzcGWWy8o0FeNNYNZAQpSdJwajt7eRhJ2YM2NOkTnSsBOCGGKIIYbY2TA663GgWWyWfUwn3oIc"
"fyLYxeQwiF07RqBg9NgHrG5ba3jba5yl4zS2LtEMMcQQQwwxmRiBhPGOJOywIPafYhUwqnTvZOfY"
"Zu40HH/YxDexZojJwsx6ObDcT7D8vVOtJBxiAhD/AjMmjeF2Wnqd+5RrHdo4azPEzoANabiUhh0b"
"xBBDDDHEENsf8twlrizswEjDhnTbzWazbGKpQ5k07E9Ox2iFvXBZ2D9B7DawyqLFu5lshhhiiGUK"
"a4nUloa9yxkwR7XhgPPXYdhRIa77uDtnyvqaIXalGK02ufv3J36GmsnG4lquPnN9gJo1VNxqgYbt"
"ji/EC8s1PWG5fuVizW4Jox6/3o9XxBBDDLFbwcg9v/AwjrPHtTRsX34O01mxLw37bhCTjJk0+PLK"
"08HYd4MYYojdKmYnBfjsktEpySY2tGGZzWaIIfYDGB271Yaieaat/AaOkNKb"
Why pybloomfilter?¶
As already mentioned, there are a couple reasons to use this module:
It natively uses mmaped files.
It natively does the set things you want a Bloom filter to do.
It is fast (see benchmarks).
Install¶
Please note that this version is for Python 3.5 and over. In case you are using Python 2, please see pybloomfiltermmap.
To build and install:
$ pip install pybloomfiltermmap3
Develop¶
To develop you will need Cython. The setup.py script should automatically build from Cython source if the Cython module is available.