Affordable Access

Publisher Website

fimpera: drastic improvement of Approximate Membership Query data-structures with counts

  • Robidou, Lucas
  • Peterlongo, Pierre
Publication Date
Dec 26, 2022
DOI: 10.1101/2022.06.27.497694
OAI: oai:HAL:hal-03912993v1
External links


Motivations: Approximate membership query data structures (AMQ) such as Cuckoo filters or Bloom filters are widely used for representing and indexing large sets of elements. AMQ can be generalized for additionally counting indexed elements, they are then called "counting AMQ". This is for instance the case of the "counting Bloom filters". However, counting AMQs suffer from false positive and overestimated calls. Results: In this work we propose a novel computation method, called fimpera, consisting of a simple strategy for reducing the false-positive rate of any AMQ indexing all k-mers (words of length k) from a set of sequences, along with their abundance information. This method decreases the false-positive rate of a counting Bloom filter by an order of magnitude while reducing the number of overestimated calls, as well as lowering the average difference between the overestimated calls and the ground truth. In addition, it slightly decreases the query run time. fimpera does not require any modification of the original counting Bloom filter, it does not generate false-negative calls, and it causes no memory overhead. The unique drawback is that fimpera yields a new kind of false positives and overestimated calls. However their amount is negligible. fimpera requires a unique parameter, and its results are only little impacted when using this parameter within recommended values. As a side note, for the algorithmic needs of the method, we also propose a novel generic algorithm for finding minimal values of a sliding window over a vector of x integers in O(x) time with zero memory allocation.

Report this publication


Seen <100 times