Maybe you have seen something like this when reading log-likelihood derivations for multivariate Gaussians:

$$ \ln p(X|\mu, \Sigma) = -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}X^{T}\Sigma^{-1}X + const = -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}Tr(\Sigma^{-1}XX^{T}) + const $$

and you wondered where that $$Tr$$ came from. Here you can find a great explanation, but I thought I would write it down for myself as well.

The quadratic form in the exponent of the multivariate Gaussian can be rewritten as a trace, i.e.

$$x^{T}\Sigma^{-1}x = tr(\Sigma^{-1} x x^{T})$$

This comes from two neat properties of the trace

- $$tr(AB) = tr(BA)$$ whenever both products are defined, i.e. $$A$$ is $$m \times n$$ and $$B$$ is $$n \times m$$.
- $$tr(c) = c$$ when $$c$$ is a scalar, i.e. a $$1\times1$$ matrix.
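To make the two properties concrete, here is a small numerical sanity check in plain Python. The matrices `A` and `B` and the helpers `matmul` and `trace` are made up for illustration; the point is only that $$AB$$ and $$BA$$ can have different shapes and still share the same trace.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


# Illustrative matrices (not from any dataset): A is 2x3, B is 3x2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

trace_AB = trace(matmul(A, B))  # AB is a 2x2 matrix
trace_BA = trace(matmul(B, A))  # BA is a 3x3 matrix

print(trace_AB, trace_BA)  # both traces are 212
print(trace([[5]]))        # the trace of a 1x1 matrix is its single entry: 5
```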

Looking at the quadratic form in the Gaussian's exponent, we realize that it is just a $$1 \times 1$$ matrix, so it equals its own trace. Applying the first property with $$A = x^{T}$$ and $$B = \Sigma^{-1}x$$, we can write

$$ x^{T}\Sigma^{-1}x = tr(x^{T}\Sigma^{-1}x) = tr(\Sigma^{-1} x x^{T})$$
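If you want to convince yourself numerically, the identity can be checked in a few lines of plain Python. This is only a sketch: `precision` is a made-up symmetric matrix standing in for $$\Sigma^{-1}$$, `x` is an arbitrary column vector, and the helper functions are written here just for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


# Hypothetical values: `precision` plays the role of Sigma^{-1}.
precision = [[2, -1],
             [-1, 2]]
x = [[3],
     [1]]  # 2x1 column vector

# Left-hand side: x^T Sigma^{-1} x, a 1x1 matrix whose single entry we read out.
quad_form = matmul(matmul(transpose(x), precision), x)[0][0]

# Right-hand side: tr(Sigma^{-1} x x^T), where x x^T is the 2x2 outer product.
trace_form = trace(matmul(precision, matmul(x, transpose(x))))

print(quad_form, trace_form)  # both are 14
```

Integer entries are used so the two sides match exactly; with floating-point data you would compare them up to a small tolerance instead.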