Machine Learning Math Science

Partially fixing numerical underflow for mixture models

In a mixture model the probability of an event $$x$$ is written

$$P(x=X) =\sum_{i}\pi_{i}P_{i}(x=X) $$,

where $$\pi_{i}$$ is the probability of the point belonging to mixture $$i$$ and $$P_{i}(x=X) $$ is the probability of the event $$x=X$$ in the $$i$$-th mixture. The problem is usually that the $$P_i$$ are small which makes underflow happen when you multiply them with big numbers such as $$\pi_i$$ or summing up the values. Underflow is simply your computer not having enough precision to handle all those tiny numbers and so rounding errors happen. For example $$1+\epsilon = 1$$ if $$\epsilon$$ is really small.

To fix underflow one usually operates in the log domain i.e.

$$\log P(x=X) = \log [ \sum_{i}\pi_{i}P_{i}(x=X) ]$$.

The problem with this is that the log cannot decompose sums and we still get underflow. To fix this(somewhat) we can write:

$$\log P(x) = \log [ \sum_{i}\pi_{i}P_{i}(x=X) ] = \log [ \sum_{i} \exp\big\{ \log\pi_{i} + \log P_{i}(x=X) \big\} ]$$

Now by finding the max value of the different sums $$\log(\pi_{i}) + \log ( P_{i}(x=X) )$$ and deducting it we can move out most of the probability mass in the equation. It is simple if one looks at the following calculations for a mixture model with $$2$$ mixture components.

$$\log p = \log [p_1 + p_2] = \log [ \exp(\log p_1) + \exp(\log p_2) ]$$
$$pMax = \max [\log p_1 ,\log p_2 ]$$
$$\log p = \log[ \exp(pMax) * ( \exp\big\{\log p_1 -pMax\big\} +\exp\big\{log p_2-pMax\big\}) ]$$
$$\log p = pMax + \log [ \exp(\log p_1 – pMax) + \exp( \log p_2 – pMax) ]$$

Now if we for example assume $$\log p_1 > \log p_2$$ then
$$\log p = pMax + \log [ \exp(0) + \exp\big\{\log p_2 -pMax\big\} ] = \\ pMax + \log [ 1 + \exp\big\{\log p_2 – pMax\big\} ]$$

This means that we gotten out most of the probability mass from the sum and we have made the exponent in the exponential closer to zero and thus less small. This of course doesn’t mean numerical issues will be a concern anymore but I believe we are more in the clear than before.

Below is an implementation in Matlab. Instructions in the file.


Leave a Reply

Your email address will not be published. Required fields are marked *