Máté Nagy, János Tapolcai and Gábor Rétvári
R3D3: A Doubly Opportunistic Data Structure for Compressing and Indexing Massive Data
Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in iformationtheoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.
Reference:
DOI: 10.36244/ICJ.2019.2.7
Please cite this paper the following way:
Máté Nagy, János Tapolcai and Gábor Rétvári, "R3D3: A Doubly Opportunistic Data Structure for Compressing and Indexing Massive Data", Infocommunications Journal, Vol. XI, No 2, June 2019, pp. 58-66. DOI: 10.36244/ICJ.2019.2.7