Notes for Missing Data in Longitudinal Studies by Mike Daniels
MTM (Marginalized Transition Model)
- $\mathrm{logit}(\mu_{ij}) = x_{ij}\beta$
- $\log(\phi) = \delta + y\gamma_{j}$
Misc
- the missingness assumption cannot be verified from the observed data
- when the posterior is not available in closed form, sample from it (MCMC)
- use diffuse priors
- the prior can be improper, but the posterior must be proper (or use "just proper" priors)
- with complete data, the variance parameter is ancillary but important for efficiency; with missing data, it is important for consistency
- for missing data, the Bayesian approach allows assumptions about nonidentified parameters and uncertainty about those assumptions, while the frequentist approach uses fixed values or constraints
Some Recommendations
- $\sigma$: truncated uniform or folded normal
- $\tau$ : uniform shrinkage
- $\Sigma$ : flat prior, reference prior, or just-proper Wishart; improper priors cannot be used in WinBUGS, so use just-proper priors there
Posterior Consistency
$p(\theta \mid y)$ converges to a point mass at the true value $\theta_{0}$ as the sample size $n \to \infty$.
Sampling
Gibbs
- each full conditional can be conjugate (see the sketch after this list)
- for non-random-walk (e.g., independence) proposals, the optimal acceptance rate can approach 100%
- a normal or Laplace approximation can be used as the proposal for a full conditional distribution
Recommendation:
- random walk: use for a single parameter when evaluating the target is not expensive
- when evaluation is not expensive, it can be applied to the full (joint) distribution
- works well on heavy-tailed targets
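A minimal sketch of the conjugate case, assuming a toy normal model with unknown mean and variance and illustrative normal / inverse-gamma hyperparameters (all names and values are made up, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=100)           # toy data
n, ybar = len(y), y.mean()

# illustrative hyperparameters: mu ~ N(m0, t0^2), sigma^2 ~ InvGamma(a0, b0)
m0, t02, a0, b0 = 0.0, 100.0, 0.01, 0.01

mu, sig2 = 0.0, 1.0
draws = []
for _ in range(5000):
    # full conditional of mu | sigma^2, y is normal (conjugacy)
    prec = n / sig2 + 1.0 / t02
    mean = (n * ybar / sig2 + m0 / t02) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # full conditional of sigma^2 | mu, y is inverse-gamma (conjugacy)
    a = a0 + n / 2.0
    b = b0 + 0.5 * np.sum((y - mu) ** 2)
    sig2 = 1.0 / rng.gamma(a, 1.0 / b)
    draws.append((mu, sig2))

draws = np.array(draws)[1000:]               # discard burn-in
print(draws.mean(axis=0))
```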
Data Augmentation
What we want is $p(\theta \mid y)$; introduce an auxiliary (latent) variable $z$ and alternate between:
- $p(z \mid \theta, y)$
- $p(\theta \mid z, y)$
- well suited when there is a latent-variable representation (probit models, random effects); see the sketch below
- often used for the multivariate t distribution (as a scale mixture of normals)
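A sketch of data augmentation for probit regression in the Albert-Chib style, assuming a flat prior on $\beta$; the latent $z$ has a truncated-normal full conditional and $\beta \mid z, y$ is normal (data and settings are illustrative):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

beta = np.zeros(p)
XtX_inv = np.linalg.inv(X.T @ X)
samples = []
for _ in range(2000):
    # p(z | beta, y): normal truncated at 0 according to y
    m = X @ beta
    lo = np.where(y == 1, -m, -np.inf)       # z > 0 when y = 1
    hi = np.where(y == 1, np.inf, -m)        # z < 0 when y = 0
    z = m + truncnorm.rvs(lo, hi, random_state=rng)
    # p(beta | z, y): normal under a flat prior, centered at the OLS fit of z on X
    bhat = XtX_inv @ X.T @ z
    beta = rng.multivariate_normal(bhat, XtX_inv)
    samples.append(beta.copy())

print(np.mean(samples[500:], axis=0))
```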
Inference
After obtaining samples, the question is how to use them for inference:
- discard the first K draws (burn-in)
- run multiple chains
- draws are autocorrelated; check the lag-k autocorrelation (e.g., near zero by lag 10 is fine)
- block Gibbs or non-random-walk proposals speed mixing by reducing autocorrelation
- thinning works but is statistically inefficient
- batching (batch means) is more efficient (see the sketch after this list)
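A sketch of the batch-means Monte Carlo standard error versus the naive i.i.d. standard error, on a toy AR(1) chain standing in for autocorrelated MCMC output (all settings illustrative):

```python
import numpy as np

def batch_means_se(chain, n_batches=50):
    """Monte Carlo SE via batch means: split the chain into batches and
    use the variability of the batch means to account for autocorrelation."""
    chain = np.asarray(chain)
    batch_size = len(chain) // n_batches
    means = chain[: n_batches * batch_size].reshape(n_batches, batch_size).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

# toy AR(1) chain mimicking an autocorrelated sampler
rng = np.random.default_rng(2)
x = np.zeros(10000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()

print("naive SE :", x.std(ddof=1) / np.sqrt(len(x)))   # ignores autocorrelation
print("batch SE :", batch_means_se(x))                 # larger, more honest
```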
Model Selection
DIC
Drawbacks:
- likelihood based
- not invariant to reparameterization
- no closed-form likelihood in many models (see the computation sketch below)
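As a reminder of the standard computation, a sketch of DIC from posterior draws for a toy normal model, using $DIC = \bar{D} + p_{D}$ with $p_{D} = \bar{D} - D(\bar{\theta})$; it presumes a closed-form likelihood, which is exactly the drawback noted above (draws and model are illustrative stand-ins):

```python
import numpy as np
from scipy.stats import norm

def dic(y, mu_draws, sigma_draws):
    # deviance D(theta) = -2 log p(y | theta), evaluated at each posterior draw
    dev = np.array([-2 * norm.logpdf(y, m, s).sum()
                    for m, s in zip(mu_draws, sigma_draws)])
    dbar = dev.mean()                                   # posterior mean deviance
    dhat = -2 * norm.logpdf(y, mu_draws.mean(), sigma_draws.mean()).sum()
    p_d = dbar - dhat                                   # effective number of parameters
    return dbar + p_d                                   # DIC = Dbar + pD

rng = np.random.default_rng(3)
y = rng.normal(1.0, 2.0, size=50)
# stand-in posterior draws (would come from an MCMC sampler)
mu_draws = rng.normal(y.mean(), 2.0 / np.sqrt(len(y)), size=1000)
sigma_draws = np.full(1000, 2.0)
print(dic(y, mu_draws, sigma_draws))
```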
PPL (Posterior Predictive Loss)
Nonparametric
- $p(y \mid \theta)$
- $\theta \sim G$
- $G \sim DP(G_{0}, \alpha)$
$\alpha$ controls how close the nonparametric $G$ is to $G_{0}$: as $\alpha \to \infty$, $G \to G_{0}$ (stick-breaking sketch below).
- can be used to relax or check the random-effects distribution
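A sketch of a truncated stick-breaking draw from $DP(G_{0}, \alpha)$ with $G_{0} = N(0, 1)$, illustrating that draws concentrate around $G_{0}$ as $\alpha$ grows (truncation level and seed are arbitrary):

```python
import numpy as np

def dp_stick_breaking(alpha, n_atoms=500, rng=None):
    """Truncated stick-breaking construction of G ~ DP(G0, alpha), G0 = N(0, 1)."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])   # stick-breaking weights
    atoms = rng.normal(0.0, 1.0, size=n_atoms)                  # atoms drawn from G0
    return atoms, w

rng = np.random.default_rng(4)
for alpha in (1.0, 10.0, 1000.0):
    atoms, w = dp_stick_breaking(alpha, rng=rng)
    mean_G = np.sum(w * atoms)          # mean of the random measure G
    print(alpha, round(mean_G, 3))      # approaches the G0 mean (0) as alpha grows
```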
Missing Data
- full-data model: $p(y, r \mid \theta, \omega)$
- full-data response model: $p(y \mid x, \theta)$
MAR
- for monotone dropout, MAR is equivalent to $P(U = j \mid Y) = P(U = j \mid \bar{Y}_{j})$, i.e., the dropout hazard depends only on the previously observed responses
- MAR together with ignorability leads to simplification
- since $p(y_{2} \mid y_{1}, r = 0, \theta) = p(y_{2} \mid y_{1}, r = 1, \theta)$, it is easy to impute the missing data and then fit the regression on the completed data
- MTM is not suitable here
- Example 5.9: MAR does not ensure ignorability
MNAR (Nonignorable)
- use unverified parametric assumptions
- use informative priors
Selection Model
$p(y, r) = p(y)\,p(r \mid y)$
Pros:
- $p(y \mid x)$
- $p(r \mid y)$
- with monotone dropout, $p(r \mid y)$ is a hazard function
Cons:
- sensitive to model specification
- identification problems can arise
Example:
$g(\pi_{i}) = \phi_{0} + \phi_{1}y_{1} + \phi_{2}y_{2}$
if $\phi_{2} = 0$, the mechanism reduces to MAR (a simulation sketch follows the cons below)
cons:
- assumptions cannot be verified
- potential sensitivity problems
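A simulation sketch of the selection-model example above, assuming a logit link and an illustrative bivariate outcome: when $\phi_{2} = 0$ a MAR-style estimate (regress $y_{2}$ on $y_{1}$ among observed cases, then average over everyone) recovers the mean of $y_{2}$; with $\phi_{2} \ne 0$ it does not.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100000
y1 = rng.normal(0.0, 1.0, size=n)
y2 = 0.5 * y1 + rng.normal(0.0, 1.0, size=n)

def mar_style_mean(phi0, phi1, phi2):
    # selection model: logit P(r = 1 | y) = phi0 + phi1*y1 + phi2*y2
    eta = phi0 + phi1 * y1 + phi2 * y2
    r = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))
    # regress y2 on y1 among observed cases, then average predictions over all y1
    b1, b0 = np.polyfit(y1[r], y2[r], 1)
    return np.mean(b0 + b1 * y1)

print("full-data mean of y2        :", y2.mean())
print("MAR-style estimate, phi2 = 0:", mar_style_mean(0.5, 1.0, 0.0))  # ~unbiased
print("MAR-style estimate, phi2 = 2:", mar_style_mean(0.5, 1.0, 2.0))  # biased (MNAR)
```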
Mixture Model
$p(y, r) = p(y \mid r, x, \omega)\,p(r \mid x, \omega)$
cons:
- $p(y \mid r)$ is sensible
$p(y_{2}, y_{1} \mid r, \alpha) = p(y_{2} \mid y_{1}, r, \alpha_{E})\,p(y_{1} \mid r, \alpha_{O})$
In general, $\alpha_{E}$ cannot be fully identified from the data alone. $\alpha_{E}$ and $\alpha_{O}$ may overlap, so assume
$\alpha_{E} = (\alpha_{EI}, \alpha_{NI})$,
where $\alpha_{NI}$ (the nonidentified part) can be used for sensitivity analysis or given an informative prior.
Example: bivariate normal
cons:
- as the dimension of the observations grows, the number of sensitivity parameters $\alpha_{NI}$ grows as well
Shared Parameter Model
Ignorability
- draw from $p(y_{mis} \mid y_{obs}, \theta)$ using data augmentation
- draw from $p(\theta \mid y)$ using the Gibbs sampler
This relies strongly on the specification of $p(y \mid \theta)$; misspecification leads to inconsistency.
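A sketch of this two-step scheme for a bivariate normal with $y_{2}$ MAR, assuming a known covariance and a flat prior on the mean so both full conditionals are available in closed form (purely illustrative, not the book's example):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])   # treated as known to keep the sketch short
mu_true = np.array([1.0, 2.0])
y = rng.multivariate_normal(mu_true, Sigma, size=n)

# make y2 MAR: missingness depends only on the always-observed y1
miss = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(y[:, 0] - 1.0)))
y2 = np.where(miss, np.nan, y[:, 1])

mu = np.zeros(2)
draws = []
for _ in range(3000):
    # step 1: p(y_mis | y_obs, theta) -- conditional normal given y1
    cond_mean = mu[1] + Sigma[0, 1] / Sigma[0, 0] * (y[:, 0] - mu[0])
    cond_sd = np.sqrt(Sigma[1, 1] - Sigma[0, 1] ** 2 / Sigma[0, 0])
    y2_full = np.where(miss, rng.normal(cond_mean, cond_sd), y2)
    # step 2: p(theta | y) -- posterior of mu given completed data (flat prior)
    ybar = np.array([y[:, 0].mean(), y2_full.mean()])
    mu = rng.multivariate_normal(ybar, Sigma / n)
    draws.append(mu.copy())

print(np.mean(draws[500:], axis=0))   # should be close to mu_true
```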
GARP/IV
- for ignorable missingness, only $p(y \mid \omega)$ needs to be specified
- for nonignorable missingness, specifying the joint $p(y, r \mid \omega)$ is mandatory
NONIGNORABLE
- sensitivity parameter and sensitivity analysis
- informative prior
- sensitivity parameters are not identifiable from the observed data, but once they are fixed, the remaining parameters are identified
Selection Model
parametric selection models are not well suited to sensitivity analysis
Semiparametric Selection Model
- parametric form for $p(r \mid y)$
- non- or semiparametric form for $p(y \mid \omega)$ or $p(y, r \mid \omega)$
- informative prior on $\phi_{2}$
pros:
- $p(r \mid y)$, $p(y \mid \omega)$, and $\beta$ are specified explicitly
cons:
- sensitivity analysis is difficult
Mixture Model
$p(y \mid \omega, x) = \sum_{r} p(y, r \mid \omega, x)$
Identification strategy:
Mixture models are usually used to handle MNAR.
For monotone dropout, MAR is equivalent to ACMV constraints.
Table here.
Interior Family Constraints (IFC)
- complete case missing value (CCMV)
- nearest-neighbour (NN)
- available case missing value (ACMV); for monotone dropout, MAR is equivalent to ACMV
- non future dependence missing value restrictions
- identification via extrapolation: e.g., assume $\Sigma$ is common across patterns; then the model is identified
Mixture Model with discrete time dropout
Pattern Mixture Model with Covariates
- linear link: $\beta = \sum_{s} \phi_{s}\beta^{s}$
- nonlinear link: $\beta(x) = \sum_{s} \frac{\partial}{\partial x} g^{-1}(x\beta^{s})\,\phi_{s}(x)$, which simplifies when $\phi_{s}(x)$ does not depend on $x$
So when dealing with mixture models with covariates, be sure to check whether (a numeric sketch follows this list):
- the mean is linear in the covariates (identity link or similar)
- missingness depends on covariates: $p(r \mid x)$
- covariate effects are time varying
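A small numeric sketch of the mixing formulas above with made-up pattern probabilities and pattern-specific coefficients: under an identity link the marginal coefficient is the $\phi_{s}$-weighted average of the $\beta^{s}$, while under a logit link the marginal covariate effect mixes the pattern-specific derivatives instead.

```python
import numpy as np

# pattern-specific (intercept, slope) and pattern probabilities (illustrative numbers)
beta_s = np.array([[0.0, 1.0],    # pattern 1
                   [1.0, 2.0]])   # pattern 2
phi_s = np.array([0.7, 0.3])

# identity link: the marginal coefficient is the phi-weighted average of coefficients
beta_marginal = phi_s @ beta_s
print("marginal (intercept, slope):", beta_marginal)

# logit link: the marginal covariate effect at x averages the pattern-specific
# derivatives d/dx g^{-1}(x beta^s), not the coefficients themselves
def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

x = 0.5
deriv = sum(p * b[1] * logistic(b[0] + b[1] * x) * (1 - logistic(b[0] + b[1] * x))
            for p, b in zip(phi_s, beta_s))
print("marginal effect at x = 0.5 under logit link:", deriv)
```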
Shared Parameter Model
$y$ and $r$ are conditionally independent given the shared parameter $b$
- $y \mid x, z \sim N(x\beta + z, \sigma^{2})$
- $h_{u}(t \mid z) = h_{0}(t)\exp(z\gamma)$
if $\gamma = 0$, the mechanism is MAR; otherwise it is MNAR
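A simulation sketch of the shared-parameter structure, assuming a discrete-time version of the dropout hazard (rather than the Cox form above) and illustrative parameter values; the shared random intercept $b$ drives both the responses and dropout, so with $\gamma \ne 0$ the observed-case mean at the last visit is biased.

```python
import numpy as np

rng = np.random.default_rng(7)
n, J = 2000, 4
beta, sigma_b, sigma_e, gamma = 1.0, 1.0, 0.5, 1.5

b = rng.normal(0.0, sigma_b, size=n)                      # shared random intercept
t = np.arange(J)
y = beta * t + b[:, None] + rng.normal(0.0, sigma_e, (n, J))

# discrete-time dropout hazard depends on y only through b (shared parameter)
hazard = 1.0 / (1.0 + np.exp(-(-2.0 + gamma * b)))        # per-visit dropout probability
dropout = rng.uniform(size=(n, J)) < hazard[:, None]
U = np.where(dropout.any(axis=1), dropout.argmax(axis=1), J)  # first dropout visit (J = completer)

observed = np.arange(J)[None, :] < U[:, None]
print("full-data mean at last visit    :", y[:, -1].mean())
print("observed-case mean at last visit:", y[observed[:, -1], -1].mean())  # biased when gamma != 0
```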
pros:
- handles hazards (event-time data) well
- handles complex data structures
- accommodates latent variables
- good at jointly analyzing repeated measures and event times
cons:
- the relation between $y$ and $u$ is not explicit (neither $p(u \mid y)$ nor $p(y \mid u)$ is specified directly)
- $h_{u}$ can depend on future observations
- hard to separate $p(y_{mis} \mid y_{obs}, u)$, i.e., hard to embed MAR
- relies on the assumed distribution of $b$
Model Selection in Nonignorable Models
Model selection is based on $p(y, r \mid \omega)$ rather than $p(y \mid \theta)$; for ignorable missingness it is based on $p(y \mid \theta)$ alone.
- DIC: can be obtained in WinBUGS
- PPL: posterior predictive loss / posterior predictive checks
Ch9: Informative Priors and Sensitivity Analysis
- sensitivity analysis for unverifiable missingness assumptions
- incorporate subjective beliefs
Pattern Mixture Model
$p(y, r \mid \alpha) = p(y \mid r)\,p(r) = p(y_{mis} \mid y_{obs}, r)\,p(y_{obs} \mid r)\,p(r)$
Sensitivity Analysis
- fix the sensitivity parameter at a constant value
- examine inference across a range of values
- assign an appropriate prior
- prefer mixture models to selection models
- frequentist: fix $\delta$ and see how inference changes with $\delta$
- Bayesian: specify a prior on $\delta$ based on beliefs about the missingness mechanism (see the sketch after this list)
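A sketch of both styles for a simple mean under a delta-shift assumption, $E(y \mid r = 0) = E(y \mid r = 1) + \delta$ (data, shift parameterization, and prior are all illustrative, and only the uncertainty in $\delta$ is propagated):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400
y = rng.normal(2.0, 1.0, size=n)
r = rng.uniform(size=n) < 0.7           # ~70% observed
y_obs = y[r]
p_mis = 1.0 - r.mean()

# delta-shift assumption: mean of the missing responses = observed-case mean + delta
def adjusted_mean(delta):
    return (1 - p_mis) * y_obs.mean() + p_mis * (y_obs.mean() + delta)

# frequentist-style: examine the estimate across a grid of fixed deltas
for delta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"delta = {delta:+.1f} -> mean = {adjusted_mean(delta):.3f}")

# Bayesian-style: put a prior on delta reflecting beliefs about the mechanism
delta_draws = rng.normal(0.0, 0.5, size=5000)            # illustrative prior
post = np.array([adjusted_mean(d) for d in delta_draws])
print("mean and 95% interval under the delta prior:",
      post.mean().round(3), np.quantile(post, [0.025, 0.975]).round(3))
```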
Examples
Specify Priors
- MAR with no uncertainty
- MAR with uncertainty
- MNAR with no uncertainty
- MNAR with uncertainty
Examples