Interval Estimation for Aggregate Queries on Incomplete Data

Authors
  • Zhang, An-Zhen1
  • Li, Jian-Zhong1
  • Gao, Hong1
  • 1 Harbin Institute of Technology, Harbin, 150001, China
Type
Published Article
Journal
Journal of Computer Science and Technology
Publisher
Springer-Verlag
Publication Date
Nov 22, 2019
Volume
34
Issue
6
Pages
1203–1216
Identifiers
DOI: 10.1007/s11390-019-1970-4
Source
Springer Nature
License
Yellow

Abstract

Incomplete data has been a longstanding issue in the database community, and it is still poorly handled in both theory and practice. One common way to cope with missing values is to impute them (fill them in) as a preprocessing step before analysis. Unfortunately, no single imputation method can impute all missing values correctly in all cases. Users can hardly trust query results on such completed data without any confidence guarantee. In this paper, we propose to estimate the aggregate query result directly on the incomplete data, rather than to impute the missing values. An interval estimate, composed of the upper and lower bounds of the aggregate query result over all possible interpretations of the missing values, is presented to end users. The ground-truth aggregate result is guaranteed to lie within the interval. We believe that decision support applications could benefit significantly from the estimation, since they can tolerate inexact answers as long as the results come with clearly defined semantics and guarantees. Our main techniques are parameter-free and assume no prior knowledge of the data distribution or the missingness mechanism. Experimental results are consistent with the theoretical results and suggest that the estimation is invaluable for assessing the results of aggregate queries on incomplete data.
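
To make the idea of an interval over all possible interpretations concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes, purely for illustration, that every missing value is known to lie in a fixed domain [lo, hi], and the function names `sum_interval` and `avg_interval` are hypothetical. Under that assumption, the bounds on SUM are reached by setting every missing value to lo or to hi.

```python
from typing import Optional, Sequence, Tuple


def sum_interval(
    values: Sequence[Optional[float]], lo: float, hi: float
) -> Tuple[float, float]:
    """Bounds on SUM over all completions of the missing (None) entries.

    Assumption (not from the paper): each missing value lies in [lo, hi].
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    known_total = sum(v for v in values if v is not None)
    missing_count = sum(1 for v in values if v is None)
    # The smallest sum fills every gap with lo, the largest with hi.
    return known_total + missing_count * lo, known_total + missing_count * hi


def avg_interval(
    values: Sequence[Optional[float]], lo: float, hi: float
) -> Tuple[float, float]:
    """Bounds on AVG over all rows (no selection predicate, so the row count is fixed)."""
    if not values:
        raise ValueError("values must be non-empty")
    sum_lo, sum_hi = sum_interval(values, lo, hi)
    return sum_lo / len(values), sum_hi / len(values)


column = [10, None, 30, None]  # None marks a missing value
print(sum_interval(column, lo=0, hi=50))  # (40, 140)
print(avg_interval(column, lo=0, hi=50))  # (10.0, 35.0)
```

Whatever the two missing values actually are, the true SUM lies in [40, 140] as long as the domain assumption holds. That is the kind of guarantee the abstract describes, although the paper itself states that its techniques require no prior knowledge of the distribution or missingness mechanism.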
