
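Biometric Analysis of Imputation Methods for Missing Values in Gaussian Mixture Distributions (German title: Biometrische Analyse von Imputationsmethoden für Fehlwerte in Gauß-Mischverteilungen)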

Keywords
  • Missing Values
  • Multivariate Finite Gaussian Mixture Models
  • Multiple Imputation
  • Data Augmentation
  • EM Algorithm
Disciplines
  • Computer Science

Abstract

In clinical research, often only incomplete data are available for evaluation, which rules out many statistical methods. The missing values are therefore usually replaced by plausible imputation values. Furthermore, the normal distribution is most commonly used to approximate the distribution of the data, although it rarely reflects reality. More often, the drawn sample consists of several unknown subpopulations, which can be modeled with finite Gaussian mixture distributions whose parameters are most often estimated by the EM algorithm. This work combines the two topics of missing values and Gaussian mixture models. It presents known multiple imputation methods for Gaussian mixture distributions as well as new methods with different imputation strategies. One of the newly developed strategies combines the idea of data augmentation with the imputation of observed values. Another new strategy is a variant of the expectation-maximization algorithm for Gaussian mixture models and offers an alternative to current Bayesian approaches. A comparison with imputation methods under a normal distribution assumption demonstrates better imputation quality when Gaussian mixture distributions are taken into account. The imputation methods were analyzed in a comprehensive simulation study covering various combinations of missing-data mechanisms and missing-value rates, evaluated on the estimated parameters of the mixture model. For this analysis, three criteria not previously used for Gaussian mixture models are defined. The first is the Mahalanobis distance, which includes all distribution parameters. The second is a combination of relative RMSE and BIAS values that analyzes imputation quality for separate parameter groups. Finally, a graphical quality check for imputation values in Gaussian mixture models is presented. The results of the simulation study reveal the clear superiority of multiple imputation strategies that incorporate the mixture distribution assumption.
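The abstract does not spell out the imputation formulas. As a rough illustration of why the mixture assumption matters, the Python sketch below imputes a missing second coordinate of a bivariate record from its observed first coordinate, assuming the mixture parameters are already known (for example from an EM fit). The `Component` class and all function names are hypothetical; the sketch shows only the standard conditional-normal calculation, not the thesis's own EM or data-augmentation variants. `expected_imputation` returns the responsibility-weighted conditional mean (a single deterministic imputation), while `draw_imputations` produces `m` stochastic imputations in the spirit of multiple imputation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    """One bivariate Gaussian component (hypothetical parameterization)."""
    weight: float  # mixing proportion pi_k
    mu1: float     # mean of the always-observed variable X1
    mu2: float     # mean of the variable X2 that may be missing
    s11: float     # Var(X1)
    s22: float     # Var(X2)
    s12: float     # Cov(X1, X2)


def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x1, comps):
    """Posterior component probabilities given only the observed X1 (E-step style)."""
    dens = [c.weight * normal_pdf(x1, c.mu1, c.s11) for c in comps]
    total = sum(dens)
    return [d / total for d in dens]


def conditional_moments(x1, c):
    """Mean and variance of X2 given X1 = x1 inside one Gaussian component."""
    mean = c.mu2 + c.s12 / c.s11 * (x1 - c.mu1)
    var = c.s22 - c.s12 ** 2 / c.s11
    return mean, var


def expected_imputation(x1, comps):
    """Single imputation: E[X2 | X1 = x1] under the whole mixture."""
    r = responsibilities(x1, comps)
    return sum(rk * conditional_moments(x1, c)[0] for rk, c in zip(r, comps))


def draw_imputations(x1, comps, m, rng):
    """m stochastic imputations: sample a component, then draw from its conditional normal.

    Note: parameters are treated as fixed here; a proper multiple imputation
    would also reflect parameter uncertainty (e.g. via data augmentation).
    """
    r = responsibilities(x1, comps)
    draws = []
    for _ in range(m):
        c = rng.choices(comps, weights=r, k=1)[0]
        mean, var = conditional_moments(x1, c)
        draws.append(rng.gauss(mean, math.sqrt(var)))
    return draws


if __name__ == "__main__":
    # Two hypothetical subpopulations with equal weights.
    comps = [
        Component(0.5, 0.0, 0.0, 1.0, 1.0, 0.5),
        Component(0.5, 5.0, 10.0, 1.0, 2.0, 0.0),
    ]
    print(expected_imputation(2.5, comps))  # 5.625
    print(draw_imputations(2.5, comps, m=5, rng=random.Random(1)))
```

Under a single normal distribution, the conditional mean of X2 would be one linear regression on X1; the mixture version lets each subpopulation contribute its own regression, weighted by how plausible that subpopulation is given the observed value.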
Multiple imputation methods based on the mixture distribution assumption are also recommended for mixture models with small components. Compared with known imputation methods, the new combined imputation strategy produced more appropriate imputation values, whereas EM imputation computed imputation values more efficiently. The results also confirm that the proposed analysis criteria are needed to distinguish differences in imputation quality in Gaussian mixture distributions.
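The abstract names a Mahalanobis distance and relative RMSE/BIAS values as evaluation criteria but gives neither their exact form nor how the RMSE and BIAS values are combined. The Python sketch below shows only the common textbook definitions these criteria build on: relative bias and relative RMSE of a scalar parameter over simulation replications, and the Mahalanobis distance for a 2-D vector. The function names are hypothetical, and the way the thesis extends the distance to "all distribution parameters" is not reproduced here.

```python
import math


def relative_bias(estimates, true_value):
    """Relative bias: (mean of estimates - true value) / |true value|."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - true_value) / abs(true_value)


def relative_rmse(estimates, true_value):
    """Relative RMSE: sqrt(mean squared error) / |true value|."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / abs(true_value)


def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance sqrt((x - mu)^T cov^-1 (x - mu)) for 2-D vectors.

    Assumes cov is a symmetric 2x2 matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


if __name__ == "__main__":
    # Hypothetical estimates of one parameter (true value 2.0) from 3 simulation runs.
    runs = [1.9, 2.1, 2.3]
    print(relative_bias(runs, 2.0), relative_rmse(runs, 2.0))
    print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0))))  # 1.0
```

Dividing by the absolute true value puts bias and RMSE on a common scale across parameters of different magnitude (means, variances, mixing proportions); it requires the true value to be nonzero.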
