The Rayleigh Quotient and the Norm Constraint

This post tries to explain why, when optimizing the Rayleigh quotient, the norm of \(x\) is constrained to \(\|x\| = 1\).

Let's start with the definition. Given a symmetric matrix \(A \in \mathbb{R}^{d \times d}\), the Rayleigh quotient is the function \(R\) from \(\mathbb{R}^{d} \setminus \{0\}\) to \(\mathbb{R}\) defined by

\begin{equation}
R(x) = \frac{x^T A x}{x^Tx}
\end{equation}

Maximizing the quotient is equivalent to the following constrained problem,

\begin{equation}
\arg \max_{x} \frac{x^T A x}{x^Tx} = \arg \max_{x} \; x^T A x \; \textrm{subject to} \; \|x\|=1
\end{equation}
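To make the equivalence concrete, here is a minimal NumPy sketch (the helper name `rayleigh` is mine, not from any library): for any nonzero \(x\), \(R(x)\) coincides with the constrained objective \(x^T A x\) evaluated at the unit vector \(x/\|x\|\), so nothing is lost by restricting the search to the unit sphere.

```python
# Minimal sketch (assumptions: NumPy, a hypothetical helper `rayleigh`):
# R(x) equals the constrained objective u^T A u at u = x / ||x||.
import numpy as np

def rayleigh(A, x):
    """Rayleigh quotient R(x) = x^T A x / (x^T x) for nonzero x."""
    return (x @ A @ x) / (x @ x)

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                    # random symmetric matrix
x = rng.standard_normal(5)           # arbitrary nonzero vector

u = x / np.linalg.norm(x)            # its unit-norm version
print(np.isclose(rayleigh(A, x), u @ A @ u))   # True: R(x) = u^T A u with ||u|| = 1
```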

To untangle the norm constraint we can begin by remembering that a real symmetric matrix is orthogonally diagonalizable, so we can write

\begin{equation}
A = V \Lambda V^T = \sum_{i=1}^{d} \lambda_i v_i v_i^T,
\end{equation}

where \(\lambda_i\) are the eigenvalues of \(A\) and \(v_i\) the corresponding orthonormal eigenvectors (the columns of \(V\)). We can express \(x\) in this eigenbasis, that is, \(x = \sum_{i=1}^{d} \alpha_i v_i\) with \(\alpha_i = v_i^T x\). This allows us to rewrite both the numerator and the denominator of the quotient,

\begin{equation}
\begin{split}
x^Tx
& = \Big( \sum_{i=1}^{d} \alpha_i v_i \Big)^{T} \Big( \sum_{i=1}^{d} \alpha_i v_i \Big) \\
& = \{ \textrm{orthogonality + unit length} \} = \sum_{i=1}^{d} \alpha_i^2
\end{split}
\end{equation}

\begin{equation}
\begin{split}
x^T A x
& = \Big( \sum_{i=1}^{d} \alpha_i v_i \Big)^{T} \Big( \sum_{i=1}^{d} \alpha_i A v_i \Big)
= \Big( \sum_{i=1}^{d} \alpha_i v_i \Big)^{T} \Big( \sum_{i=1}^{d} \alpha_i \lambda_i v_i \Big) \\
& = \{ \textrm{orthogonality + unit length} \} = \sum_{i=1}^{d} \alpha_i^2 \lambda_i
\end{split}
\end{equation}
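Both identities are easy to verify numerically. The sketch below (NumPy only; none of it comes from the original post) uses `np.linalg.eigh` to obtain the eigenpairs of a symmetric matrix and then checks the decomposition, the change of basis, and the two sums.

```python
# A short numerical check (NumPy assumed): np.linalg.eigh returns the
# eigenvalues lam and an orthonormal eigenvector matrix V of a symmetric A,
# and with alpha = V^T x the two expansions above hold:
# x^T x = sum_i alpha_i^2 and x^T A x = sum_i lambda_i alpha_i^2.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                              # random symmetric matrix

lam, V = np.linalg.eigh(A)                     # columns of V are the orthonormal v_i
print(np.allclose(A, V @ np.diag(lam) @ V.T))  # A = V Lambda V^T

x = rng.standard_normal(6)
alpha = V.T @ x                                # coordinates of x in the eigenbasis
print(np.allclose(x, V @ alpha))               # x = sum_i alpha_i v_i

print(np.isclose(x @ x, np.sum(alpha**2)))            # x^T x   = sum alpha_i^2
print(np.isclose(x @ A @ x, np.sum(lam * alpha**2)))  # x^T A x = sum lambda_i alpha_i^2
```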

With these two identities in hand, the Rayleigh quotient can be written as

\begin{equation}
R(x) = \frac{x^T A x}{x^Tx} = \frac{ \sum_{i=1}^{d} \alpha_i^2 \lambda_i}{\sum_{i=1}^{d} \alpha_i^2}.
\end{equation}

We can see from this expression that the length of \(x\) does not matter: replacing \(x\) with \(k \cdot x\) for some \(k \neq 0\) multiplies every \(\alpha_i\) by \(k\), so both the numerator and the denominator are scaled by \(k^2\) and the quotient is unchanged. The maximization is therefore over the direction of \(x\) and not its length; if some \(x\) maximizes the quotient, then so does any nonzero multiple \(k \cdot x\). We are thus free to fix the scale by requiring \(\sum_{i=1}^{d} \alpha_i^2 = \|x\|^2 = 1\), which reduces the problem to maximizing \(x^T A x\) over unit vectors.
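A quick numeric illustration of both points, again a sketch under the same NumPy assumptions: the quotient does not change when \(x\) is rescaled, and no vector beats the eigenvector of the largest eigenvalue, whose quotient is exactly that eigenvalue.

```python
# Sketch (NumPy assumed): R(k x) = R(x) for any k != 0, and no x beats the
# top eigenvector, whose quotient is the largest eigenvalue of A.
import numpy as np

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

lam, V = np.linalg.eigh(A)           # eigenvalues in ascending order
x = rng.standard_normal(5)

print(np.isclose(rayleigh(A, x), rayleigh(A, -7.3 * x)))  # scale invariance
print(np.isclose(rayleigh(A, V[:, -1]), lam[-1]))         # R(top eigenvector) = lambda_max
print(rayleigh(A, x) <= lam[-1] + 1e-12)                  # a random x never does better
```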

The same thing can be shown for the generalized Rayleigh quotient,

\begin{equation}
R(x) = \frac{x^T A x}{x^T B x},
\end{equation}

where \(B\) is a symmetric positive definite matrix, by proceeding in the same manner.
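As one way to sketch this (the Cholesky substitution below is my choice, not spelled out in the original post): writing \(B = LL^T\) and \(y = L^T x\) turns the generalized quotient into an ordinary Rayleigh quotient of \(C = L^{-1} A L^{-T}\), so the argument above applies unchanged.

```python
# Sketch (assumes B symmetric positive definite): with B = L L^T and y = L^T x,
# x^T A x / x^T B x = y^T C y / y^T y where C = L^{-1} A L^{-T},
# i.e. an ordinary Rayleigh quotient to which the argument above applies.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                               # symmetric
N = rng.standard_normal((5, 5))
B = N @ N.T + 5 * np.eye(5)                     # symmetric positive definite

L = np.linalg.cholesky(B)                       # B = L L^T, L lower triangular
C = np.linalg.solve(L, np.linalg.solve(L, A).T).T   # C = L^{-1} A L^{-T}

x = rng.standard_normal(5)
y = L.T @ x

generalized = (x @ A @ x) / (x @ B @ x)         # generalized quotient in x
ordinary = (y @ C @ y) / (y @ y)                # ordinary quotient in y
print(np.isclose(generalized, ordinary))        # True: the two agree
```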
