Why irreducible representations are orthogonal
Group representation theory is broadly used across the physical sciences, e.g., the classification of elementary particles in quantum field theory, the prediction of spectroscopic lines in chemistry, and the analysis of semiconductor band gap structures in solid state physics.
A central theorem in group representation theory is the orthogonality of matrix elements of irreducible representations (irreps)1:
\[\sum_{g\in G\;\;\;} D^{\mu}(g)^{\dag}_{ki} D^{\nu}(g)_{jl} = C \delta_{\mu \nu} \delta_{ij} \delta_{kl},\]where \(G\) is a group, \(\mu\) and \(\nu\) are labels for irreps, the \(\dag\) symbol denotes taking the adjoint, \(D^{\mu}(g)\) and \(D^{\nu}(g)\) are matrices for irreps, and \(C\) is a positive, nonzero constant that depends on the order of \(G\) and the dimension of the irrep.
The following is an intuitive explanation of this theorem, focusing on just the essence to make it easier to keep in your head all at once. In particular, we only focus on the case where \(\mu\) and \(\nu\) are two different, inequivalent irreps, which means that the sum above is zero.
Why?▶
If \(\mu\neq\nu\), then \(\delta_{\mu \nu}\) evaluates to zero.
As always, the goal is to keep things sufficiently simple so that we are able to recall the essence of the proof while taking a walk.
Isolating the matrix elements
The left side of the expression for the theorem above is very suggestive, as it’s a sum of terms over all group elements. And each term is a product of something related to the inverse of a \(D^\mu\) irrep with something related to a \(D^\nu\) irrep. As the theorem is currently stated, each of the \(D\)’s is a matrix element from the irreps. However, if each of the \(D\)’s were a group element, then multiplication on the left with an arbitrarily chosen group element together with multiplication on the right with the inverse of that element would be equivalent to a permutation of the terms in the sum, which leaves the sum invariant. And that would enable us to use Schur’s Lemmas.
So the first task is to convert the \(D\)’s from matrix elements to full irrep matrices.
Consider \(M = A X B\), where \(M\) is a \(m\)x\(n\) matrix, \(A\) is a \(m\)x\(m\) matrix, \(X\) is a \(m\)x\(n\) matrix, and \(B\) is a \(n\)x\(n\) matrix. This can also be written as \(m_{kl} = a_{ki}x_{ij}b_{jl}\).
If we choose the matrix \(X\) such that \(x_{i_0 j_0}=1\) and zero otherwise, then we have \(m_{kl} = a_{ki_0}b_{j_0 l}\). So we see that \(m_{kl}\) is the product of an element of the \(A\) matrix with an element of the \(B\) matrix. And so the matrix \(M = A X B\) with this type of \(X\) gives us the product of arbitrary elements of \(A\) and \(B\). For example, if we want the product \(a_{52}b_{79}\), we simply choose \(X\) to have a \(1\) in the \((2,7)\)-th element and zero otherwise, and then \(m_{59}\) will be the desired product \(a_{52}b_{79}\).
And if there is a sum \(\sum_{g\in G} A(g)XB(g)\), then the \((k,l)\)-th element of this sum would be the sum of the \((k,l)\)-th elements of \(A(g)XB(g)\).
Now we see that the left side of the theorem is actually a version of “the sum of the \((k,l)\)-th elements of \(A(g)XB(g)\)”. And therefore it is equal to the \((k,l)\)-th element of \(\sum_{g\in G} A(g)XB(g)\)).
Why?▶
Set \(D^{\mu\dag} = A\), \(D^\nu=B\), and \(x_{ij} = \delta_{i, i_0}\delta_{j, j_0}\); then \(D^{\mu\dag}_{ki_0}D^\nu_{j_0l}=(AXB)_{kl}.\)
So if we let \(S(i_0,j_0) := \sum_{g\in G} D^{\mu\dag}(g) X D^{\nu}(g)\), with \(X\) set to 1 in the \((i_0,j_0)\)-th element and zero otherwise, then we just need to show that \(S(i_0,j_0)\) is zero for all choices of \((i_0,j_0)\).
Back to Schur’s
So now we see that we have exactly what we were looking for, which is a sum over group elements where each term is a matrix that has a group element on the left and its inverse group element on the right. If the \(\mu\) irrep has dimension \(m\) and the \(\nu\) irrep has dimension \(n\), then \(S\), a linear transformation from a \(n\)-dimensional space to a \(m\)-dimensional space, is also a FG-homomorphism from the \(\nu\) module to the \(\mu\) module, because:
\[D^\mu(h) S = S D^\nu(h), \forall h \in G\]Why?▶
This is the definition of a FG-homomorphism from one module to another module. It means that we are insensitive to the order of applying a group transformation \(h\) and applying the mapping \(S\).
So by Schur’s Lemmas, either \(S=0\), or \(S\) is an invertible square matrix. But if \(S\) is invertible, then
\[D^\mu(h) = S D^\nu(h) S^{-1}, \forall h \in G,\]which would mean that \(\mu\) and \(\nu\) are equivalent irreps, which is a contradiction. So we must have \(S=0\), which proves our theorem for the case of inequivalent \(\mu\) and \(\nu\).
-
This writeup follows along the lines of the proof from Section 3.5 of Group Theory for Physics by Wu-Ki Tung. Simplified to only emphasize the orthogonality between inequivalent irreps. ↩