pybloomfiltermmap3: a fast implementation of Bloom filter for Python
pybloomfiltermmap3 is a Python 3 fork of pybloomfiltermmap by Michael Axiak (@axiak).
A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. A lookup can return a false positive, but never a false negative. The Wikipedia page on Bloom filters has further information on their nature.
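To make the idea concrete, here is a minimal pure-Python sketch of a Bloom filter. It is not how this module is implemented: the `TinyBloom` class, its parameters, and the SHA-256 based double hashing are all assumptions chosen for illustration only.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter: a bit array plus k hash-derived positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from a single SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


toy = TinyBloom(num_bits=1024, num_hashes=3)
for fruit in ("apple", "banana", "orange"):
    toy.add(fruit)

print("apple" in toy)   # items that were added are always reported as present
print("durian" in toy)  # usually False, but a false positive is possible
```

The trade-off is visible in `__contains__`: an item that was added always has all of its bits set, while an item that was never added can still collide with bits set by others.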
This module implements a Bloom filter in Python that’s fast and uses mmap files for better scalability.
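As a rough illustration of what backing a bit array with an mmap file means, the sketch below uses only the standard library `mmap` module. The scratch file name, its size, and the bit index are hypothetical, and the module's real on-disk layout is not shown here.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file standing in for a filter file on disk.
path = os.path.join(tempfile.mkdtemp(), "bits.bin")
num_bytes = 1024
bit_index = 4242

# Pre-size the file first: an empty file cannot be memory-mapped.
with open(path, "wb") as f:
    f.write(b"\x00" * num_bytes)

# Set one bit through the mapping; the OS writes the page back to the file.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), num_bytes) as mm:
        mm[bit_index // 8] |= 1 << (bit_index % 8)
        mm.flush()

# Map the file again, as a later run of the program might, and read the bit.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), num_bytes) as mm:
        bit_is_set = bool(mm[bit_index // 8] & (1 << (bit_index % 8)))

print(bit_is_set)
```

Because the bits live in a file rather than in a Python object, the data persists between runs and the operating system decides which pages to keep in memory.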
Here’s a quick example:
>>> from pybloomfilter import BloomFilter
>>> bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> with open("/usr/share/dict/words") as f:
...     for word in f:
...         bf.add(word.rstrip())
...
>>> print('apple' in bf)
True
That wasn’t so hard, was it? Now, there are a lot of other things we can do. For instance, let’s say we want to create a similar filter with just a few pieces of fruit:
>>> fruitbf = bf.copy_template("fruit.bloom")
>>> fruitbf.update(("apple", "banana", "orange", "pear"))
>>> print(fruitbf.to_base64())
"eJzt2k13ojAUBuA9f8WFyofF5TWChlTHaPzqrlqFCtj6gQi/frqZM2N7aq3Gis59d2ye85KTRbhk"
"0lyu1NRmsQrgRda0I+wZCfXIaxuWv+jqDxA8vdaf21HIOSn1u6LRE0VL9Z/qghfbBmxZoHsqM3k8"
"N5XyPAxH2p22TJJoqwU9Q0y0dNDYrOHBIa3BwuznapG+KZZq69JUG0zu1tqI5weJKdpGq7PNJ6tB"
"GKmzcGWWy8o0FeNNYNZAQpSdJwajt7eRhJ2YM2NOkTnSsBOCGGKIIYbY2TA663GgWWyWfUwn3oIc"
"fyLYxeQwiF07RqBg9NgHrG5ba3jba5yl4zS2LtEMMcQQQwwxmRiBhPGOJOywIPafYhUwqnTvZOfY"
"Zu40HH/YxDexZojJwsx6ObDcT7D8vVOtJBxiAhD/AjMmjeF2Wnqd+5RrHdo4azPEzoANabiUhh0b"
"xBBDDDHEENsf8twlrizswEjDhnTbzWazbGKpQ5k07E9Ox2iFvXBZ2D9B7DawyqLFu5lshhhiiGUK"
"a4nUloa9yxkwR7XhgPPXYdhRIa77uDtnyvqaIXalGK02ufv3J36GmsnG4lquPnN9gJo1VNxqgYbt"
"ji/EC8s1PWG5fuVizW4Jox6/3o9XxBBDDLFbwcg9v/AwjrPHtTRsX34O01mxLw37bhCTjJk0+PLK"
"08HYd4MYYojdKmYnBfjsktEpySY2tGGZzWaIIfYDGB271Yaieaat/AaOkNKb"
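For a sense of scale, the first example passes a capacity of 10000000 and an error rate of 0.01 to `BloomFilter`. The sketch below applies the textbook Bloom filter sizing formulas to those two numbers. The `estimate_size` helper is hypothetical, and the module may round or size its files differently, so treat the result as an estimate only.

```python
import math


def estimate_size(capacity, error_rate):
    """Textbook optimum: bits m = -n ln p / (ln 2)^2, hashes k = (m / n) ln 2."""
    num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
    num_hashes = round(num_bits / capacity * math.log(2))
    return num_bits, num_hashes


# Same capacity and error rate as the first example above.
num_bits, num_hashes = estimate_size(10000000, 0.01)
print(num_bits // 8, "bytes,", num_hashes, "hash functions")
```

Under these formulas the bit array scales linearly with capacity and only logarithmically with the inverse of the error rate, which is why even a ten-million-item filter stays in the low tens of megabytes.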
As already mentioned, there are a couple of reasons to use this module: it is fast, and it keeps the filter in mmap files for better scalability.
Please note that this version is for Python 3.5 and later. If you are using Python 2, please see pybloomfiltermmap.
To build and install:
$ pip install pybloomfiltermmap3