Affordable Access

Publisher Website

MUD: Mapping-based query processing for high-dimensional uncertain data

Information Sciences
Publication Date
DOI: 10.1016/j.ins.2012.02.023
  • High-Dimensional Uncertain Data
  • Probabilistic Threshold Range Query
  • Computer Science
  • Mathematics


Abstract Many real-world applications require management of uncertain data that are modeled as objects in high-dimensional space with imprecise values. In such applications, data objects are typically associated with probability density functions. A fundamental operation on such uncertain data is the probabilistic-threshold range query (PTRQ), which retrieves the objects appearing in the query region with probabilities no less than a specified value. In this paper, we propose a novel framework called MUD for efficient processing of PTRQs on high-dimensional uncertain data. We first propose a cost-effective pruning technique based on a very simple form of probabilistic pruning information (PPI), namely the probabilistic quantiles. Then we map high-dimensional uncertain objects to a single-dimensional space, where the quantiles of uncertain objects can be indexed using the existing single-dimensional indices such as the B+-tree. Each PTRQ in the high-dimensional space is transformed into multiple range queries on the single-dimensional space and evaluated there. We also discuss a method to optimize the indexing scheme for MUD. Specifically, we formulate a mathematical model for measuring the “pruning power” of quantiles, and propose a dynamic programming algorithm which selects the “best” quantiles for mapping and indexing. We perform extensive experiments on both synthetic and real data sets. Our experimental results reveal that the MUD framework is both effective and efficient for processing PTRQs on high-dimensional uncertain data, and it can significantly outperform state-of-the-art schemes.

There are no comments yet on this publication. Be the first to share your thoughts.