17.7. Expectation Maximization
The Expectation-Maximization (EM) method [26] is a maximum-likelihood-based estimation paradigm. It requires an explicit probabilistic model of the mixed dataset. The algorithm estimates the model parameters and the segmentation of the data in the Maximum-Likelihood (ML) sense.
We assume that the dataset $Y = \{ y_1, \dots, y_S \}$ consists of $S$ i.i.d. samples drawn from a mixture of $K$ component distributions. We assume that a latent (hidden) discrete variable $z_s \in \{1, \dots, K\}$ indicates which component generated the sample $y_s$, with mixing probabilities $\pi_k = p(z = k)$ satisfying $\sum_{k=1}^K \pi_k = 1$. Each component distribution can then be modeled as a conditional (continuous) distribution $p(y \mid z = k; \theta_k)$ with parameters $\theta_k$. With $\theta = \{ \pi_k, \theta_k \}_{k=1}^K$ collecting all model parameters, the mixture density is

$$
p(y; \theta) = \sum_{k=1}^K \pi_k \, p(y \mid z = k; \theta_k).
$$
The log-likelihood function for the dataset $Y$ is given by

$$
l(\theta) = \sum_{s=1}^S \ln p(y_s; \theta)
= \sum_{s=1}^S \ln \left( \sum_{k=1}^K \pi_k \, p(y_s \mid z_s = k; \theta_k) \right).
$$

An ML estimate of the parameters, namely $\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} l(\theta)$, is difficult to obtain directly because of the sum inside the logarithm.
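As a concrete illustration, the mixture log-likelihood above can be evaluated numerically for a Gaussian mixture. The sketch below is our own (the names `gmm_log_likelihood`, `pis`, `mus`, `covs` are not from the text); it sums the per-sample log of the mixture density, using `logsumexp` for numerical stability.

```python
import numpy as np
from scipy.special import logsumexp

def gmm_log_likelihood(Y, pis, mus, covs):
    """l(theta) = sum_s ln sum_k pi_k N(y_s; mu_k, Sigma_k) for data Y (S x d).

    A sketch assuming Gaussian components; pis, mus, covs hold the
    parameters of the K components.
    """
    S, d = Y.shape
    K = len(pis)
    log_probs = np.zeros((S, K))
    for k in range(K):
        diff = Y - mus[k]
        _, logdet = np.linalg.slogdet(covs[k])
        # quadratic form (y - mu)^T Sigma^{-1} (y - mu) for every sample
        quad = np.einsum('si,ij,sj->s', diff, np.linalg.inv(covs[k]), diff)
        log_probs[:, k] = np.log(pis[k]) - 0.5 * (d * np.log(2 * np.pi)
                                                  + logdet + quad)
    # log of the inner sum over components, then sum over samples
    return logsumexp(log_probs, axis=1).sum()
```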
We start by noting that for any value of the latent variable,

$$
\ln p(y; \theta) = \ln p(y, z; \theta) - \ln p(z \mid y; \theta).
$$

Thus, taking the expectation of both sides with respect to the posterior $p(z \mid y; \theta)$,

$$
l(\theta) = \sum_{s=1}^S \mathbb{E}_{z_s \mid y_s} \left[ \ln p(y_s, z_s; \theta) \right]
- \sum_{s=1}^S \mathbb{E}_{z_s \mid y_s} \left[ \ln p(z_s \mid y_s; \theta) \right].
$$

The first term is the expected complete log-likelihood function and the second term (with its sign) is the conditional entropy of $z_s$ given $y_s$, since $-\mathbb{E}[\ln p(z \mid y)] = H(z \mid y)$.
Let us introduce auxiliary variables $w_{sk}$, one per sample–component pair, representing the (unknown) posterior probabilities $p(z_s = k \mid y_s; \theta)$. Then, we have

$$
l(\theta, w) = \sum_{s=1}^S \sum_{k=1}^K w_{sk} \ln \left( \pi_k \, p(y_s \mid z_s = k; \theta_k) \right)
- \sum_{s=1}^S \sum_{k=1}^K w_{sk} \ln w_{sk},
$$

where we have written $l$ as a function of both the model parameters $\theta$ and the auxiliary variables $w$.
An iterative maximization approach can be introduced as follows:

1. Maximize $l(\theta, w)$ w.r.t. $w$ keeping $\theta$ constant.
2. Maximize $l(\theta, w)$ w.r.t. $\theta$ keeping $w$ constant.
3. Repeat the previous two steps till convergence.
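The alternation above can be sketched on a toy one-dimensional example with two scalar Gaussian components (the synthetic data and all variable names here are our own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D data: two well-separated clusters
y = np.concatenate([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)])

# initial guesses for mixing weights, means, and variances
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # Step 1: maximize over w with theta fixed (responsibilities)
    dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    w = pi * dens
    w /= w.sum(axis=1, keepdims=True)
    # Step 2: maximize over theta with w fixed (weighted ML updates)
    Nk = w.sum(axis=0)
    pi = Nk / len(y)
    mu = (w * y[:, None]).sum(axis=0) / Nk
    var = (w * (y[:, None] - mu) ** 2).sum(axis=0) / Nk
```

With well-separated clusters, the estimated means quickly settle near the true cluster centers.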
This is essentially the EM algorithm. Step 1 is known as the E-step and step 2 is known as the M-step. In the E-step, we are estimating the expected membership of each sample being drawn from each component distribution. In the M-step, we are maximizing the expected complete log-likelihood function, as the conditional entropy term $-\sum_{s,k} w_{sk} \ln w_{sk}$ doesn't depend on $\theta$.
Using a Lagrange multiplier to enforce the constraint $\sum_{k=1}^K w_{sk} = 1$, we can show that the optimal auxiliary variables in the E-step are the posterior probabilities

$$
w_{sk} = \frac{\pi_k \, p(y_s \mid z_s = k; \theta_k)}{\sum_{j=1}^K \pi_j \, p(y_s \mid z_s = j; \theta_j)}.
$$
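For Gaussian components, the posterior responsibilities can be computed as below; this is a minimal sketch (the function name `e_step` and the parameter lists are our own) that normalizes in log space to avoid underflow when the component densities are tiny.

```python
import numpy as np

def e_step(Y, pis, mus, covs):
    """Responsibilities w[s, k] = p(z_s = k | y_s; theta) for a Gaussian mixture."""
    S, d = Y.shape
    K = len(pis)
    log_w = np.zeros((S, K))
    for k in range(K):
        diff = Y - mus[k]
        _, logdet = np.linalg.slogdet(covs[k])
        quad = np.einsum('si,ij,sj->s', diff, np.linalg.inv(covs[k]), diff)
        # unnormalized log posterior: ln pi_k + ln N(y_s; mu_k, Sigma_k)
        log_w[:, k] = np.log(pis[k]) - 0.5 * (d * np.log(2 * np.pi)
                                              + logdet + quad)
    # normalize each row so that sum_k w[s, k] = 1 (stable in log space)
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```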
A closed form solution for the M-step exists when the components are Gaussian. In a Gaussian Mixture Model with $\theta_k = (\mu_k, \Sigma_k)$, the updates are

$$
\pi_k = \frac{1}{S} \sum_{s=1}^S w_{sk}, \qquad
\mu_k = \frac{\sum_{s=1}^S w_{sk} \, y_s}{\sum_{s=1}^S w_{sk}}, \qquad
\Sigma_k = \frac{\sum_{s=1}^S w_{sk} \, (y_s - \mu_k)(y_s - \mu_k)^T}{\sum_{s=1}^S w_{sk}}.
$$
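These closed-form updates translate directly into code; the sketch below (the name `m_step` and the return convention are our own) computes the weighted counts, means, and covariances from the responsibility matrix.

```python
import numpy as np

def m_step(Y, w):
    """Closed-form GMM parameter updates from responsibilities w (S x K)."""
    S, d = Y.shape
    Nk = w.sum(axis=0)                    # effective sample count per component
    pis = Nk / S                          # mixing weights
    mus = (w.T @ Y) / Nk[:, None]         # responsibility-weighted means
    covs = []
    for k in range(w.shape[1]):
        diff = Y - mus[k]
        # responsibility-weighted scatter matrix, normalized by Nk
        covs.append((w[:, k, None] * diff).T @ diff / Nk[k])
    return pis, mus, covs
```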
The EM method is well suited to a hybrid dataset consisting of a mixture of component distributions. Yet, its applicability is limited. We need a good idea of the number of components beforehand. Further, for a Gaussian Mixture Model (GMM), it fails to work if the variance in some of the directions is arbitrarily small [82]. For example, a subspace-like distribution is one where the data have large variance within a subspace but almost zero variance orthogonal to it. The EM method tends to fail with such subspace-like distributions.
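A common practical remedy for near-singular covariances (not discussed in the text) is to add a small ridge to each covariance estimate in the M-step; scikit-learn's `GaussianMixture` exposes this as its `reg_covar` parameter. A minimal sketch, with our own function name:

```python
import numpy as np

def regularize_cov(cov, eps=1e-6):
    """Add a small ridge eps*I so the covariance stays invertible even when
    the data have almost zero variance orthogonal to a subspace."""
    return cov + eps * np.eye(cov.shape[0])

# a rank-deficient covariance: all variance along the first axis
cov = np.array([[1.0, 0.0],
                [0.0, 0.0]])     # singular; its inverse is undefined
reg = regularize_cov(cov)        # full rank; inversion now succeeds
```

This does not make EM recover the subspace structure, but it keeps the iterations numerically stable.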