Top-k best probability queries and semantics ranking properties on probabilistic databases

Data & Knowledge Engineering
DOI: 10.1016/j.datak.2013.04.005
  • Top-k query
  • Ranking Query
  • Probabilistic Data
  • Query Processing
  • Uncertain Data
  • Computer Science
  • Linguistics


Abstract: There has been much interest in answering top-k queries on probabilistic data in applications such as market analysis, personalized services, and decision making. In probabilistic relational databases, the most common problem in answering top-k queries (ranking queries) is selecting the top-k results based on both scores and top-k probabilities. In this paper, we first propose novel answers to top-k best probability queries by selecting the probabilistic tuples that have not only the best top-k scores but also the best top-k probabilities. We introduce an efficient algorithm for top-k best probability queries that does not require users to define a threshold. The top-k best probability approach is more efficient and effective than the probability threshold approach (PT-k) [1,2]. Second, we add the “k-best ranking score” property to the set of semantic properties for ranking queries on uncertain data proposed in [3,4]. We then analyze our proposed method and show that it meets the semantic ranking properties on uncertain data. In addition, we show that the answers to top-k best probability queries overcome drawbacks of previous definitions of top-k queries on probabilistic data in terms of these semantic ranking properties. Lastly, we conduct an extensive experimental study, using both real and synthetic data sets, that verifies the effectiveness of the answers to top-k best probability queries compared to PT-k queries on uncertain data, and the efficiency of our algorithm against the state-of-the-art implementation of the PT-k algorithm.
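To make the terms "top-k probability" and "probability threshold" concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a simplified model that the abstract does not specify: every tuple exists independently with its own probability, and all scores are distinct. Under those assumptions, a tuple's top-k probability is the probability that it exists and that fewer than k higher-scored tuples exist. The function names `top_k_probabilities` and `pt_k`, and the sample data, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Row = Tuple[float, float]            # (score, existence probability)
Ranked = Tuple[float, float, float]  # (score, probability, top-k probability)


def top_k_probabilities(rows: List[Row], k: int) -> List[Ranked]:
    """Top-k probability of each tuple, assuming independent tuples."""
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    # dist[j] = probability that exactly j of the tuples processed so far
    # exist, tracked only for j < k (larger counts cannot matter).
    dist = [1.0] + [0.0] * (k - 1)
    result = []
    for score, p in ranked:
        # In the top-k iff the tuple exists and fewer than k
        # higher-scored tuples exist.
        result.append((score, p, p * sum(dist)))
        # Fold this tuple into the count distribution.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = dist[j] * (1.0 - p)
            if j > 0:
                updated[j] += dist[j - 1] * p
        dist = updated
    return result


def pt_k(rows: List[Row], k: int, threshold: float) -> List[Ranked]:
    """Threshold-style answer: keep tuples whose top-k probability
    is at least the user-supplied threshold."""
    return [r for r in top_k_probabilities(rows, k) if r[2] >= threshold]


# Hypothetical sample data: (score, existence probability).
sample = [(100.0, 0.5), (90.0, 0.8), (80.0, 0.4)]
loose = pt_k(sample, k=2, threshold=0.5)
strict = pt_k(sample, k=2, threshold=0.6)
```

In this sketch the threshold-style answer changes from two tuples to one when the threshold moves from 0.5 to 0.6, which illustrates why an approach that does not ask the user for a threshold can be attractive.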
