Maybe you have seen something like this when reading log likelihood derivations for multivariate Gaussians (with \(X\) already centered, so \(X\) stands for \(X - \mu\)):

\( \ln p(X|\mu, \Sigma) = -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}X^{T}\Sigma^{-1}X + const = -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}Tr(\Sigma^{-1}XX^{T}) + const \)

and you wondered where that \(Tr\) came from. Here you can find a great explanation, but I thought I would write it down for myself as well.

The quadratic form in the exponent of the multivariate Gaussian can be rewritten as a trace, i.e.

\(x^{T}\Sigma^{-1}x = tr(\Sigma^{-1} x x^{T})\)

This comes from two neat properties of the trace:

- \(tr(AB) = tr(BA)\) whenever both products are defined, i.e. \(A\) is \(m \times n\) and \(B\) is \(n \times m\).
- \(tr(c) = c\) when \(c\) is a scalar, i.e. a \(1\times1\) matrix.
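
If you want to convince yourself of these two properties numerically, here is a minimal NumPy sketch. The matrix shapes, the seed and the variable names are arbitrary choices of mine for illustration; note that \(AB\) and \(BA\) do not even have the same size here, yet their traces agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# A is 3x5 and B is 5x3, so AB is 3x3 while BA is 5x5.
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

trace_ab = np.trace(A @ B)  # trace of the 3x3 product
trace_ba = np.trace(B @ A)  # trace of the 5x5 product

# A scalar viewed as a 1x1 matrix: its trace is the scalar itself.
c = np.array([[2.5]])
trace_c = np.trace(c)

print(np.isclose(trace_ab, trace_ba))  # the two traces agree up to rounding
print(trace_c)
```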

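Looking at the quadratic form in the exponent of the Gaussian, we realize that it is just a \(1 \times 1\) matrix, so the second property lets us wrap it in a trace, and the first property (with \(A = x^{T}\) and \(B = \Sigma^{-1}x\)) lets us move \(x^{T}\) to the end: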

\( x^{T}\Sigma^{-1}x = tr(x^{T}\Sigma^{-1}x) = tr(\Sigma^{-1} x x^{T})\)
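
The identity is also easy to check numerically. The following is a small sketch, not a proof: the dimension `d`, the seed, and the way I build a positive definite `Sigma` are my own assumptions, chosen only so that the inverse exists.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # arbitrary dimension, chosen for illustration

# Build a symmetric positive definite covariance matrix so it is invertible.
M = rng.standard_normal((d, d))
Sigma = M @ M.T + d * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)

# x is a column vector (d x 1), assumed to be already centered.
x = rng.standard_normal((d, 1))

# Left-hand side: the quadratic form, a 1x1 matrix converted to a float.
quad_form = (x.T @ Sigma_inv @ x).item()

# Right-hand side: trace of Sigma^{-1} times the outer product x x^T.
trace_form = np.trace(Sigma_inv @ x @ x.T)

print(np.isclose(quad_form, trace_form))  # both sides agree up to rounding
```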