Kernel density estimation from complex surveys in the presence of complete auxiliary information

Authors
  • Mostafa, Sayed A. (1, 2)
  • Ahmad, Ibrahim A. (3)
Affiliations
  • 1 Indiana University, Department of Statistics, Bloomington, IN, USA
  • 2 North Carolina A&T State University, Department of Mathematics, Greensboro, NC, USA
  • 3 Oklahoma State University, Department of Statistics, Stillwater, OK, USA
Type
Published Article
Journal
Metrika
Publisher
Springer Berlin Heidelberg
Publication Date
Jan 01, 2019
Volume
82
Issue
3
Pages
295–338
Identifiers
DOI: 10.1007/s00184-018-0703-y
Source
Springer Nature

Abstract

Auxiliary information is widely used in survey sampling to improve the precision of estimators of finite population parameters, such as the finite population mean, percentiles, and the distribution function. In the context of complex surveys, we show how auxiliary information can be used effectively in kernel estimation of the superpopulation density function of a given study variable. We propose two classes of “model-assisted” kernel density estimators that make efficient use of auxiliary information. For one class, we assume that the functional relationship between the study variable Y and the auxiliary variable X is known; for the other, the relationship is assumed unknown and is estimated using kernel smoothing techniques. Under the first class, we show that if the functional relationship can be written as a simple linear regression model with constant error variance, the mean of the proposed density estimator is identical to the well-known regression estimator of the finite population mean. If we drop the intercept from the linear model and allow the error variance to be proportional to the auxiliary variable, the mean of the proposed density estimator matches the ratio estimator of the finite population mean. The properties of the new density estimators are studied under a combined design-model-based inference framework, which accounts for both the underlying superpopulation model and the randomization distribution induced by the sampling design. Moreover, when the sampling design is simple random sampling without replacement, the asymptotic normality of each estimator is derived under both the design-based and the combined inference frameworks. For the practical implementation of these estimators, we discuss how data-driven bandwidth estimators can be obtained. The finite sample properties of the proposed estimators are examined through simulations and an example that mimics a real survey. These simulations show that the new estimators perform very well compared with standard kernel estimators that do not use the auxiliary information.
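The abstract states that, when Y and X follow a simple linear regression model with constant error variance, the mean of the proposed density estimator coincides with the regression estimator of the finite population mean. The sketch below illustrates how such an identity can arise, using one plausible construction that is an assumption here and not necessarily the authors' estimator: fit the line by OLS on a simple random sample drawn without replacement, then average Gaussian kernels centred at every population fitted value plus every sample residual. It also assumes X is known for every population unit, a Gaussian kernel, and a fixed bandwidth `h`; the function names (`model_assisted_kde`, `regression_estimator`, `ratio_estimator`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def regression_estimator(y_s, x_s, x_pop_mean):
    """Textbook regression estimator of the population mean of Y:
    ybar + b * (Xbar - xbar), where b is the sample OLS slope of y on x."""
    b = np.cov(x_s, y_s, ddof=1)[0, 1] / np.var(x_s, ddof=1)
    return y_s.mean() + b * (x_pop_mean - x_s.mean())


def ratio_estimator(y_s, x_s, x_pop_mean):
    """Textbook ratio estimator of the population mean of Y: (ybar / xbar) * Xbar."""
    return y_s.mean() / x_s.mean() * x_pop_mean


def model_assisted_kde(grid, y_s, x_s, x_pop, h):
    """HYPOTHETICAL model-assisted KDE under y = a + b*x + e with constant variance.

    Fits a and b by OLS on the sample, then averages Gaussian kernels centred at
    a + b*x_i + e_j over all population units i and all sample residuals e_j.
    Returns the density on `grid` and the exact mean of that kernel mixture.
    """
    b, a = np.polyfit(x_s, y_s, deg=1)          # polyfit returns [slope, intercept]
    resid = y_s - (a + b * x_s)                 # OLS residuals; they sum to ~0
    centers = ((a + b * x_pop)[:, None] + resid[None, :]).ravel()  # N*n centres
    u = (grid[:, None] - centers[None, :]) / h  # memory is O(len(grid) * N * n)
    density = gaussian_kernel(u).mean(axis=1) / h
    return density, centers.mean()              # each symmetric kernel has mean = its centre


# Usage: a synthetic population with X known for every unit, plus an SRSWOR sample.
rng = np.random.default_rng(0)
N, n = 500, 50
x_pop = rng.uniform(1.0, 10.0, size=N)
y_pop = 2.0 + 1.5 * x_pop + rng.normal(0.0, 1.0, size=N)
idx = rng.choice(N, size=n, replace=False)  # simple random sampling without replacement
y_s, x_s = y_pop[idx], x_pop[idx]

grid = np.linspace(y_pop.min() - 5.0, y_pop.max() + 5.0, 300)
density, density_mean = model_assisted_kde(grid, y_s, x_s, x_pop, h=0.8)
reg_mean = regression_estimator(y_s, x_s, x_pop.mean())
print(density_mean, reg_mean)  # equal up to floating-point rounding
```

The identity follows because the mixture mean is a + b·X̄ plus the average residual, which is zero under OLS with an intercept, and a = ȳ − b·x̄. The ratio estimator is included only as the textbook formula; this sketch does not implement the no-intercept, variance-proportional-to-X model under which the abstract says the ratio estimator is recovered. The bandwidth 0.8 is an arbitrary illustrative choice, not one of the data-driven bandwidth estimators the paper discusses.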
