mix2normal {VGAM} | R Documentation |
Estimates the five parameters of a mixture of two univariate normal distributions by maximum likelihood estimation.
mix2normal(lphi = "logitlink", lmu = "identitylink", lsd = "loglink", iphi = 0.5, imu1 = NULL, imu2 = NULL, isd1 = NULL, isd2 = NULL, qmu = c(0.2, 0.8), eq.sd = TRUE, nsimEIM = 100, zero = "phi")
lphi,lmu,lsd |
Link functions for the parameters phi,
mu, and
sd.
See |
iphi |
Initial value for phi, whose value must lie between 0 and 1. |
imu1, imu2 |
Optional initial value for mu1 and
mu2. The default is to compute initial values
internally using the argument |
isd1, isd2 |
Optional initial value for sd1 and
sd2. The default is to compute initial values
internally based on the argument |
qmu |
Vector with two values giving the probabilities relating
to the sample quantiles for obtaining initial values for
mu1 and mu2. The two values are fed in
as the |
eq.sd |
Logical indicating whether the two standard deviations should
be constrained to be equal. If |
nsimEIM |
|
zero |
May be an integer vector
specifying which linear/additive predictors are modelled as
intercept-only. If given, the value or values can be from the
set 1,2,...,5.
The default is the first one only, meaning phi
is a single parameter even when there are explanatory variables.
Set |
The probability density function can be loosely written as
f(y) = phi * N(mu1, sd1) + (1-phi) * N(mu2, sd2)
where phi is the probability an observation belongs
to the first group.
The parameters mu1 and mu2 are the
means, and sd1 and sd2 are the
standard deviations. The parameter phi satisfies
0 < phi < 1.
The mean of Y is
phi*mu1 + (1-phi)*mu2
and this is returned as the fitted values.
By default, the five linear/additive predictors are
(logit(phi), mu1, log(sd1), mu2, log(sd2))^T.
If eq.sd = TRUE
then sd1=sd2
is enforced.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions
such as vglm
,
and vgam
.
Numerical problems can occur and
half-stepping is not uncommon.
If failure to converge occurs, try inputting better initial
values,
e.g., by using iphi
,
qmu
,
imu1
,
imu2
,
isd1
,
isd2
,
etc.
This VGAM family function is experimental and should be used with care.
Fitting this model successfully to data can be difficult due to numerical problems and ill-conditioned data. It pays to fit the model several times with different initial values and check that the best fit looks reasonable. Plotting the results is recommended. This function works better as mu1 and mu2 become more different.
Convergence can be slow, especially when the two component
distributions are not well separated.
The default control argument trace = TRUE
is to encourage
monitoring convergence.
Having eq.sd = TRUE
often makes the overall optimization
problem easier.
T. W. Yee
McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: Wiley.
Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. London: Chapman & Hall.
uninormal
,
Normal
,
mix2poisson
.
## Not run: mu1 <- 99; mu2 <- 150; nn <- 1000 sd1 <- sd2 <- exp(3) (phi <- logitlink(-1, inverse = TRUE)) rrn <- runif(nn) mdata <- data.frame(y = ifelse(rrn < phi, rnorm(nn, mu1, sd1), rnorm(nn, mu2, sd2))) fit <- vglm(y ~ 1, mix2normal(eq.sd = TRUE), data = mdata) # Compare the results cfit <- coef(fit) round(rbind('Estimated' = c(logitlink(cfit[1], inverse = TRUE), cfit[2], exp(cfit[3]), cfit[4]), 'Truth' = c(phi, mu1, sd1, mu2)), digits = 2) # Plot the results xx <- with(mdata, seq(min(y), max(y), len = 200)) plot(xx, (1-phi) * dnorm(xx, mu2, sd2), type = "l", xlab = "y", main = "red = estimate, blue = truth", col = "blue", ylab = "Density") phi.est <- logitlink(coef(fit)[1], inverse = TRUE) sd.est <- exp(coef(fit)[3]) lines(xx, phi*dnorm(xx, mu1, sd1), col = "blue") lines(xx, phi.est * dnorm(xx, Coef(fit)[2], sd.est), col = "red") lines(xx, (1-phi.est)*dnorm(xx, Coef(fit)[4], sd.est), col="red") abline(v = Coef(fit)[c(2,4)], lty = 2, col = "red") abline(v = c(mu1, mu2), lty = 2, col = "blue") ## End(Not run)