For a small number of clusters, it is difficult to estimate the between-cluster variance of the random effects. In applications with systematically missing data, there are no observed values in the cluster, so the cluster location cannot be estimated. The variation of the random slopes can be large, so the method used to deal with the missing data should account for this. The model contains aggregates of the level-1 variables, such as cluster means, which need to be taken into account during imputation.
The multilevel model may be so complex that it cannot be fitted, or fitting may run into convergence problems. There is no single super-method that addresses all such issues. In practice, we may need to emphasize certain issues at the expense of others. To gauge the complexity of the imputation task for a particular dataset and model, ask yourself the questions listed in Table 7. Missing values in the level-1 or level-2 predictors have long been treated by listwise deletion. This is easy to do, but may have severe adverse effects, especially for missing values in level-2 predictors.
For example, we may not know whether a school is public or private. Ignoring all records pertaining to that school is not only wasteful, but may also lead to selection effects at cluster level.
Another ad-hoc solution is to ignore the clustering and impute the data by a single-level method. It is known that this will underestimate the intra-class correlation (ICC) (Taljaard, Donner, and Klar; Van Buuren; Enders, Mistler, and Keller). The amount of underestimation grows with the ICC and with the missing data rate. Increasing the cluster size hardly aids in reducing this bias. In addition, the regression weights for the fixed effects will be biased.
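The underestimation of the ICC can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: balanced clusters, a random-intercept model with a true ICC of 0.5, values missing completely at random, and a deliberately naive single-level method that draws each imputation from the pooled observed values. The function names (anova_icc, simulate) and all parameter values are hypothetical and do not come from any package.

```python
import random
from statistics import fmean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for balanced clusters."""
    n_clusters = len(clusters)
    size = len(clusters[0])
    grand = fmean([y for c in clusters for y in c])
    means = [fmean(c) for c in clusters]
    # Mean square between clusters and mean square within clusters.
    msb = size * sum((m - grand) ** 2 for m in means) / (n_clusters - 1)
    msw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    msw /= n_clusters * (size - 1)
    return (msb - msw) / (msb + (size - 1) * msw)


def simulate(n_clusters=200, size=20, tau=1.0, sigma=1.0, p_miss=0.4, seed=1):
    """Return the ICC of the complete data and of single-level imputed data."""
    rng = random.Random(seed)
    complete = []
    for _ in range(n_clusters):
        u = rng.gauss(0, tau)  # random intercept of this cluster
        complete.append([u + rng.gauss(0, sigma) for _ in range(size)])
    # Delete values completely at random.
    observed = [[y if rng.random() >= p_miss else None for y in c]
                for c in complete]
    # Single-level imputation: draw from all observed values, ignoring clusters.
    pool = [y for c in observed for y in c if y is not None]
    imputed = [[y if y is not None else rng.choice(pool) for y in c]
               for c in observed]
    return anova_icc(complete), anova_icc(imputed)


icc_complete, icc_single_level = simulate()
print(round(icc_complete, 2), round(icc_single_level, 2))
```

Because the imputed values carry no cluster information, two values from the same cluster are correlated only when both were observed, which pulls the estimated ICC towards zero.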
Conducting multiple imputation with the wrong model e. Another ad-hoc technique is to add a dummy variable for each cluster, so that the model estimates a separate coefficient for each cluster. The coefficients are estimated by ordinary least squares, and the parameters are drawn from their posteriors.
If the missing values are restricted to the outcome, this method will estimate the fixed effects quite well, but it also artificially inflates the true variation between groups, and thus biases the ICC upwards (Andridge; Van Buuren; Graham). If there are also missing values in the predictors, the level-1 regression weights will be unbiased, but the level-2 weights are biased, in particular for small clusters and low ICC.
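For the simplest case, missing values in the outcome only and no other predictors, a model with one dummy variable per cluster reduces to a separate mean per cluster. The following sketch illustrates that special case under simplifying assumptions: it imputes around the observed cluster mean with a pooled residual standard deviation, and it skips the step of drawing the parameters from their posteriors, so it is not a proper imputation method. The function name impute_with_cluster_dummies and the toy data are hypothetical.

```python
import random
from statistics import fmean


def impute_with_cluster_dummies(y, cluster, rng):
    """Impute missing outcomes (None) from a model with one mean per cluster.

    Simplification: parameters are fixed at their least-squares estimates
    instead of being drawn from their posterior distribution.
    """
    observed = {}
    for value, g in zip(y, cluster):
        if value is not None:
            observed.setdefault(g, []).append(value)
    means = {g: fmean(values) for g, values in observed.items()}

    # Pooled residual SD; one degree of freedom is spent per cluster mean.
    n_obs = sum(len(values) for values in observed.values())
    ss = sum((v - means[g]) ** 2 for g, values in observed.items() for v in values)
    df = n_obs - len(observed)
    sd = (ss / df) ** 0.5 if df > 0 else 0.0

    completed = []
    for value, g in zip(y, cluster):
        if value is not None:
            completed.append(value)
        elif g in means:
            completed.append(means[g] + rng.gauss(0, sd))
        else:
            # A cluster without any observed outcome has no estimable dummy.
            raise ValueError(f"cluster {g!r} has no observed values")
    return completed


y = [1.0, 2.0, None, 10.0, None, 12.0]
cluster = ["a", "a", "a", "b", "b", "b"]
completed = impute_with_cluster_dummies(y, cluster, random.Random(0))
```

Each cluster mean is estimated from that cluster alone, without shrinkage towards the overall mean. For small clusters these means are noisy, which is one way to see why the between-cluster variation, and hence the ICC, tends to be overstated.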
Since the bias in random slopes and variance components can be substantial, one should turn to multilevel imputation to obtain proper estimates of those parts of the multilevel model (Speidel, Drechsler, and Sakshaug). Vink, Lazendic, and Van Buuren described an application of Australian school data with over 2. Given the size and complexity of the imputation problem, this application would have been computationally infeasible with full multilevel imputation.
Thus, for large databases, adding a dummy variable per cluster is a practical and useful technique for estimating the fixed effects. There is an extensive literature, especially for longitudinal data (Verbeke and Molenberghs; Molenberghs and Verbeke; Daniels and Hogan). For more details, see the encyclopaedic overview in Fitzmaurice et al. Multilevel models have the ability to handle models with varying time points, which is an advance over traditional repeated-measures ANOVA, where the usual treatment is to remove the entire case if one of the outcomes is missing.
Multilevel models do not assume an equal number of occasions or fixed time points, so all cases can be used for analysis. Missing outcome data are easily handled in modern likelihood-based methods. Mixed-effects models can be fit with maximum-likelihood methods, which take care of missing data in the dependent variable.
Missing values in the measured variables of the multilevel model can occur in the outcome variable, the level-1 predictors, the level-2 predictors, or the class variable. Some of these are as follows: For small clusters the within-cluster mean and variance are unreliable estimates, so the choice of the prior distribution becomes critical.

An integer that is used as argument by the set. Default is to leave the random number generator alone.
A data frame of the same size and type as data, without missing data, used to initialize imputations before the start of the iterative process. The default NULL implies that starting imputations are created by a simple random draw from the data. Note that specification of data.

Generates multiple imputations for incomplete multivariate data by Gibbs sampling. Missing data can occur anywhere in the data. The algorithm imputes an incomplete column (the target column) by generating 'plausible' synthetic values given other columns in the data. Each incomplete column must act as a target column, and has its own specific set of predictors.
The default set of predictors for a given target consists of all other columns in the data. For predictors that are incomplete themselves, the most recently generated imputations are used to complete the predictors prior to imputation of the target column. A separate univariate imputation model can be specified for each column.
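The iteration described above can be sketched in a few lines. The code below is a minimal illustration in Python, not the mice implementation: it assumes exactly two numeric columns, uses a simple linear regression as the univariate model for each target, starts from a random draw of the observed values, and omits the draw of the regression parameters, so the imputations are not proper. The names fit_line and chained_imputation and the toy data are hypothetical.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept, slope and residual SD of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, (rss / max(len(x) - 2, 1)) ** 0.5


def chained_imputation(data, n_iter=5, seed=0):
    """Impute a two-column dataset (dict of lists, None = missing) in turn."""
    rng = random.Random(seed)
    missing = {k: [v is None for v in col] for k, col in data.items()}
    # Starting imputations: a simple random draw from the observed values.
    filled = {}
    for k, col in data.items():
        pool = [v for v in col if v is not None]
        filled[k] = [v if v is not None else rng.choice(pool) for v in col]
    columns = list(data)
    for _ in range(n_iter):
        for target in columns:
            if not any(missing[target]):
                continue  # complete columns only serve as predictors
            predictor = next(c for c in columns if c != target)
            # Fit on rows where the target is observed; the predictor is
            # completed with its most recently generated imputations.
            rows = [i for i, m in enumerate(missing[target]) if not m]
            b0, b1, sd = fit_line([filled[predictor][i] for i in rows],
                                  [filled[target][i] for i in rows])
            for i, m in enumerate(missing[target]):
                if m:
                    filled[target][i] = (b0 + b1 * filled[predictor][i]
                                         + rng.gauss(0, sd))
    return filled


x = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0]
y = [2.1, None, 6.2, 8.1, None, 12.3, 13.8, 16.0]
completed = chained_imputation({"x": x, "y": y})
```

Running the function several times with different seeds would give several completed datasets, which is the sense in which the imputations are multiple.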
The default imputation method depends on the measurement level of the target column. In addition to these, several other methods are provided. Users can also write their own imputation functions, and call these from within the algorithm. The data may contain categorical variables that are used in regressions on other variables. The algorithm creates dummy variables for the categories of these variables, and imputes these from the corresponding categorical variable.
These corresponding functions are coded in the mice library under names mice. The method argument specifies the methods to be used. For the j'th column, mice calls the first occurrence of paste 'mice.
The mechanism allows users to write customized imputation functions, mice. Passive imputation: mice supports a special built-in method, called passive imputation. This method can be used to ensure that a data transform always depends on the most recently generated imputations. In some cases, an imputation model may need transformed data in addition to the original data e.
Passive imputation maintains consistency among different transformations of the same data. This provides a simple mechanism for specifying deterministic dependencies among the columns. You should make sure that the combined observed and imputed parts of the target column make sense. An easy way to create consistency is by coding all entries in the target as NA, but for large data sets, this could be inefficient.
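The idea behind passive imputation can be illustrated outside mice with a few lines of Python. In this sketch, derived columns are never imputed by a model; they are recomputed from their source columns every time a source value changes. The column names (hgt in cm, wgt in kg, bmi) and the helper apply_passive are hypothetical and serve only as an example of a deterministic dependency.

```python
def apply_passive(row, passive):
    """Return a copy of row with every derived column recomputed."""
    out = dict(row)
    for name, transform in passive.items():
        out[name] = transform(out)
    return out


# Hypothetical deterministic relation: body mass index from height and weight.
passive = {"bmi": lambda r: r["wgt"] / (r["hgt"] / 100) ** 2}

row = {"hgt": 180.0, "wgt": 81.0, "bmi": None}
first = apply_passive(row, passive)      # bmi is about 25

# A later iteration generates a new imputation for wgt; bmi is refreshed
# so that it stays consistent with the most recent values.
row = dict(first, wgt=64.8)
second = apply_passive(row, passive)     # bmi is about 20
```

Recomputing the derived column after every update of its sources is what keeps the transformation synchronized with the latest imputations.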
Note that you may also need to adapt the default predictorMatrix to evade linear dependencies among the predictors that could cause errors like Error in solve. In that way, deterministic relations between columns will always be synchronized. Argument ls. Auxiliary predictors in formulas specification: For a given block, the formulas specification takes precedence over the corresponding row in the predictorMatrix argument.
This precedence is, however, restricted to the subset of variables specified in the terms of the block formula. Any variables not specified by formulas are imputed according to the predictorMatrix specification. Variables with non-zero type values in the predictorMatrix will be added as main effects to the formulas, which will act as supplementary covariates in the imputation model.

Returns an S3 object of class mids (multiply imputed data set).

There is a detailed series of six online vignettes that walk you through solving realistic inference problems with mice.
The six vignettes are: Ad hoc methods and the MICE algorithm; Convergence and pooling; Inspecting how the observed data and missingness are related; Passive imputation and post-processing; Imputing multilevel data; Sensitivity analysis with mice.

See also the book Flexible Imputation of Missing Data, Second Edition (Boca Raton, FL).

The first application of the method concerned missing blood pressure data (Van Buuren et al.). The term Fully Conditional Specification was introduced to describe a general class of methods that specify imputation models for multivariate data as a set of conditional distributions (Van Buuren et al.).
Further details on mixes of variables and applications can be found in the book Flexible Imputation of Missing Data.

Statistics in Medicine, 18.
Journal of Statistical Computation and Simulation, 76, 12.