A simple formula for powers of two-by-two matrices

19 Apr 2024matrices special functions optics

Summary of this article. A power of a two-by-two matrix M is a linear combination of M and the identity, whose coefficients are Chebyshev functions.

Introduction #

In this post we will find a simple expression for integer powers of any two-by-two matrix $\mathbf{M}$ . It turns out that $\mathbf{M}^n$ is always just a linear combination of $\mathbf{M}$ and $\mathbf{I}$ , where the coefficients are a type of special function called Chebyshev polynomial.

This journey started for me with a formula given in Chapter 5 of Orazio Svelto's Principles of lasers [S10] (though you don't have to follow this motivation to understand the result). Optical fields moving through dielectric interfaces or optical components can be modelled by two-by-two matrices with unit determinant, for example scattering matrices, transfer matrices, and ABCD matrices. The $n$ th power of this matrix then describes the interaction of light with a repeating structure, such as a distributed Bragg reflector. Let $\mathbf{M}$ be a two-by-two matrix with determinant 1, whose elements are
$\mathbf{M}=\left(\begin{array}{cc} A & B \\ C & D\end{array}\right).$
We define an angle $\theta$ as
$\cos\theta=\frac{\mathrm{tr}\{M\}}{2}=\frac{A+D}{2},$
where $\theta$ will be real only if $-1\le (A+D)/2\le 1$ . Then for integer $n$ we have
$\mathbf{M}^n=\frac{1}{\sin\theta}\left(\begin{array}{cc}A\sin n\theta-\sin(n-1)\theta & B\sin n\theta \\ C\sin n\theta & D\sin n\theta-\sin (n-1)\theta\end{array}\right).$
This formula is referred to as "Sylvester's theorem" in [S10], but I couldn't find any reference to it on the Wikipedia page of things named after Sylvester, so I don't know how common the name is.

A Google search lead me to this stackexchange answer by user Jean Marie giving a proof of the theorem, which was the inspiration for this post. He points out that the formula can be re-written as (remember this is only for unit determinant)
[eqSylvestersTheorem]: $\mathbf{M}^n=\frac{\sin n\theta}{\sin\theta}\mathbf{M}-\frac{\sin (n-1)\theta}{\sin \theta}\mathbf{I},$
where $\mathbf{I}$ is the identity matrix. In other words, powers $\mathbf{M}^n$ are always a linear combination of $\mathbf{M}$ and the identity.

In this article we will derive a generalised version of Sylvester's theorem, which applies to all two-by-two matrices (rather than just matrices with determinant 1). The key tool we'll use is Cayley's Theorem, as well as special functions called Chebyshev polynomials.

Background #

Cayley-Hamilton theorem #

The Cayley-Hamilton theorem states that every matrix satisfies its own characteristic equation. This is a powerful result with many applications, and we will see that it imposes strong constraints on the form of $\mathbf{M}^n$ . I will just state the theorem, as there are many proofs avaiable online.

The characteristic equation of a matrix $\mathbf{M}$ is the polynomial that determines its eigenvalues. A matrix has eigenvalue $\lambda$ if
$\mathrm{det}(\mathbf{M}-\lambda\mathbf{I})=0.$
We define the characteristic equation to be the function
$p(\lambda)=\mathrm{det}(\mathbf{M}-\lambda\mathbf{I}).$
For two-by-two matrices this is
$\begin{aligned} p(\lambda) &= \mathrm{det}\left[\left(\begin{array}{cc}A-\lambda & B \\ C & D-\lambda\end{array}\right)\right], \\ &=(A-\lambda)(D-\lambda)-BC, \\ &= \lambda^2-\lambda(A+D)+(AD-BC), \\ &=\lambda^2-\lambda\,\mathrm{tr}(\mathbf{M})+\mathrm{det}(\mathbf{M}). \end{aligned}$
The Cayley-Hamilton theorem says that
$p(\mathbf{M})=0,$
which gives us:
[eqCayleyHamilton]: $\mathbf{M}^2-\mathbf{M}\,\mathrm{tr}(\mathbf{M})+\mathrm{det}(\mathbf{M})\mathbf{I}=0.$
Note that each term has the same 'units', i.e. is proportional to the square of matrix elements. Don't let anyone tell you that physics has a monopoloy on unit checks!

There are many ways of proving this. A rough sketch of the argument I like is

Convince yourself that [eqCayleyHamilton] is true for any diagonal matrix, and then any diagonalisable matrix.
Diagonal matrices are dense in the set of matrices, that is to say any matrix can be approximated arbitrarily well with a diagonal matrix.
The characteristic polynomial, being a polynomial, is continuous. Thus if $\mathbf{d}_n$ are a sequence of diagonal matrices which converge to $\mathbf{M}$ , we have
$p(\mathbf{M})=p\left(\lim_{n\rightarrow\infty}\mathbf{d}_n\right)=\lim_{n\rightarrow\infty}p(\mathbf{d}_n)=0.$

In two dimensions, Cayley-Hamilton lets us replace squares of a matrix $\mathbf{M}^2$ with a sum of linear terms $\mathbf{M},\mathbf{I}$ . In $n$ dimensions it lets us replace $\mathbf{M}^n$ with a sum of terms of order $\mathbf{I},\mathbf{M},\ldots,\mathbf{M}^{n-1}$ .

Chebyshev polynomials #

For the general version of Sylvester's theorem, the coefficients turn out to be a type of special function called Chebyshev polynomials, which we will introduce here.

Special functions often come across as intimidating, which I think is partly because people don't quite know how to approach them. I recommend that you look at a special function the same way you look at conventional 'special functions' like sine and cosine. You probably don't know how to find the numeric value of $\sin(1.4)$ , and I doubt you lose any sleep over it. And while sine does have a polynomial representation as an infinite sum, that's something you only pull out in emergencies. To use trigonometric functions all you need is a rough feel for how they behave, and to know simple relations like their derivatives, and formulae such as $\sin^2\theta+\cos^2\theta=1$ . Aim for the same sort of understanding with special functions.

Like sine and cosine, special functions often come in pairs. For Chebyshev polynomials, these are straightforwardly named Chebyshev polynomials of the first kind and Chebyshev polynomials of the second kind. We are only interested in the second kind, which are typically denoted $U_n(x)$ . Here $n$ is an integer, which is there because there are an infinite family of Chebyshev polynomials. $U_n(x)$ is an $n$ -th order polynomial.

The first few Chebyshev polynomials of the second kind are
$U_0(x) = 1,$
$U_1(x) = 2x,$
$U_2(x) = 4x^2-1,$
but don't spend too much on the particular polynomial form. What will be useful to us is their recurrence relation
[eqChebyshevRecurrence]: $U_{n+1}(x)=2xU_n(x)-U_{n-1}(x).$

An explicit representation for Chebyshev polynomials can be given using trigonometric functions. If we write
$x=\cos\theta,$
then the Chebyshev polynomials are given by
[eqChebyshevTrigForm]: $U_n(\cos\theta)=\frac{\sin(n+1)\theta}{\sin\theta}.$
For $n=1$ the numerator is $\sin(2\theta)=2\sin\theta\,\cos\theta$ , and the right-hand side is indeed a function of $\cos\theta$ . This is true for all $n$ . If you peek back at [eqSylvestersTheorem], you may have a hint at where this article is going.

Armed with Chebyshev polynomials, we can now state the generalised Sylvester's theorem. I found this in [BR20], though no proof was given. Suppose $\mathbf{M}$ is any two-by-two non-singular matrix, with determinant $d$ and trace $t$ . Then for $n$ an integer:
[eqGeneralSylvester]: $\mathbf{M}^n=d^{(n-1)/2}U_{n-1}\left(\frac{t}{2\sqrt{d}}\right)\mathbf{M}-d^{n/2}U_{n-2}\left(\frac{t}{2\sqrt{d}}\right)I,$
where $U_n$ is the $n$ th Chebyshev polynomial. We'll see later that for singular matrices there is an even simpler formula.

In the case where $\mathbf{M}$ has unit determiant $d=1$ , this reduces to
$\mathbf{M}^n=U_{n-1}\left(\frac{t}{2}\right)\mathbf{M}-U_{n-2}\left(\frac{t}{2}\right)\mathbf{I}.$
You can verify yourself that if we use the explicit form [eqChebyshevTrigForm], this will reproduce [eqSylvestersTheorem].

Proving the generalised Sylvester's theorem #

Suppose $\mathbf{M}$ is a two-by-two matrix, with trace and derivative:
$t = \mathrm{tr}\{\mathbf{M}\},\; d = \mathrm{det}(\mathbf{M}).$
Let's make our lives simpler, and get rid of the determinant by defining
$\mathbf{\tilde{M}}=\frac{1}{\sqrt{d}}\mathbf{M},$
assuming the determinant is nonzero. Then we will have
$\mathrm{det}(\tilde{\mathbf{M}}) = 1,\; \mathrm{tr}\{\tilde{\mathbf{M}}\} = \tilde{t}=\frac{t}{\sqrt{d}}.$
With this, Cayley-Hamilton becomes
$\mathbf{\tilde{M}}^2=\tilde{t}\mathbf{\tilde{M}}-\mathbf{I}.$

We can now derive a formula for $\mathbf{\tilde{M}}^n$ , and then work out
$\mathbf{M}^n=\left(\sqrt{d}\tilde{\mathbf{M}}\right)^n=d^{n/2}\tilde{\mathbf{M}}^n.$

Zero determinant #

Before we continue, let's quickly see what happens when $\mathbf{M}$ has zero determinant. Cayley-Hamilton gives us
$\mathbf{M}^2=t\,\mathbf{M},$
from which we find
$\mathbf{M}^3=\mathbf{M}\left(\,t\mathbf{M}\right)=t\mathbf{M}^2=t^2\mathbf{M}.$
Continuing in this way we find
$\mathbf{M}^{n}=t^{n-1}\mathbf{M}.$
Thus powers of two-by-two singular matrices take a very simple form!

Unit determinant #

Let's now try and find a formula for $\mathbf{\tilde{M}}$ . The first power is trivial
$\mathbf{\tilde{M}}^1=\mathbf{\tilde{M}},$
while Cayley-Hamilton gives us the second power.
$\mathbf{\tilde{M}}^2=\tilde{t}\mathbf{\tilde{M}}-\mathbf{I}.$

Now let's look at the third power. We begin by substituting the equation for the second power:
$\begin{aligned}\mathbf{\tilde{M}}^3 &=\mathbf{\tilde{M}}\left(\mathbf{\tilde{M}}^2\right), \\ &= \mathbf{\tilde{M}}\left(t\mathbf{\tilde{M}}-\mathbf{I}\right), \\ &= \tilde{t}\mathbf{\tilde{M}}^2-\mathbf{\tilde{M}} \end{aligned}$
Substituting again the equation for the second power gives
$\mathbf{\tilde{M}}^3=(\tilde{t}^2-1)\mathbf{\tilde{M}}-\tilde{t}\mathbf{I}.$
Now we start to get an inkling of what's going on underneath Sylvester's theorem. Continuing in this way, we can see that Cayley-Hamilton guarantees that we will always have
$\mathbf{\tilde{M}}^n=a_n(\tilde{t})\mathbf{M}+b_n(\tilde{t})\mathbf{I},$
for as-yet undetermined coefficients $a_n(\tilde{t}),b_n(\tilde{t})$ .

We want a formula that gives us $a_n(\tilde{t}),b_n(\tilde{t})$ . As a first step, let's find how these depend on $a_{n-1}(\tilde{t}),b_{n-1}(\tilde{t})$ . We have
$\begin{aligned}\mathbf{\tilde{M}}^n &= \mathbf{\tilde{M}}\left(\mathbf{\tilde{M}}^{n-1}\right), \\ &= \mathbf{\tilde{M}}\left(a_{n-1}(\tilde{t})\mathbf{\tilde{M}}+b_{n-1}(\tilde{t})\mathbf{I}\right), \\ &= a_{n-1}(\tilde{t})\mathbf{\tilde{M}}^2+b_{n-1}(\tilde{t})\mathbf{\tilde{M}}, \end{aligned}$
and using Cayley-Hamilton gives
$\mathbf{\tilde{M}}^n=\left(b_{n-1}(\tilde{t})+\tilde{t}\,a_{n-1}(\tilde{t})\right)\mathbf{\tilde{M}}-a_{n-1}(\tilde{t})\mathbf{I}.$
At the same time we know
$\mathbf{\tilde{M}}^n=a_{n}(\tilde{t})\mathbf{\tilde{M}}+b_{n}(\tilde{t})\mathbf{I}.$
Equating these gives
$a_{n}(\tilde{t})=\tilde{t}a_{n-1}(\tilde{t})+b_{n-1}(\tilde{t}).$
$b_{n}(\tilde{t})=-a_{n-1}(\tilde{t}).$
We can substitute the second equation into the first to get rid of $b_n$ .

From the above we have
$\mathbf{\tilde{M}}^n = a_n(\tilde{t})\mathbf{\tilde{M}}-a_{n-1}(\tilde{t})\mathbf{I},$
where $a_n(\tilde{t})$ satisfies the recurrence relation
$a_n(\tilde{t})=\tilde{t}a_{n-1}(\tilde{t})-a_{n-2}(\tilde{t}).$
To prove Sylvester's theorem, we need to solve the relation for the coefficients $a_n$ .

Solving recurrence relations is as broad a topic as solving differential equations. I'm not very familiar with this subject, if someone knows any nice solutions methods please let me know. One way is to see if the recurrence relation matches a well-known form. Let's compare with [eqChebyshevRecurrence], which we can write as
$U_n(x)=2xU_{n-1}(x)-U_{n-2}(x).$
This is very close, apart from the factor of two. We want $2x=\tilde{t}$ , which can be solved by taking $x=\tilde{t}/2$ . Thus the recurrence relation is solved by
$a_n(\tilde{t})=U_{n+n_0}\left(\frac{\tilde{t}}{2}\right),$
where $n_0$ is any integer. To find the value of $n_0$ , we have to use initial conditions. We know
$\mathbf{\tilde{M}}^2=\tilde{t}\mathbf{\tilde{M}}-\mathbf{I}=a_2(\tilde{t})\mathbf{M}-a_{1}(\tilde{t})\mathbf{I},$
which gives
$a_2(\tilde{t})=\tilde{t},\,a_1(\tilde{t})=1.$
This is satisfied by taking $n_0=-1$ .

This is our solution! You can verify that this re-produces [eqGeneralSylvester]

Alternatively we could use a computational package. Mathematica's RSolve[] is able to find the solution
$a_n(\tilde{t}) =\frac{\left(\tilde{t}+\sqrt{\tilde{t}^2-4}\right)^n-\left(\tilde{t}-\sqrt{\tilde{t}^2-4}\right)^n}{2^n\sqrt{\tilde{t}^2-4}}.$
This is in fact another explicit representation of the Chebyshev polynomials.

For an alternate proof by induction in the $d=1$ case, see Jean Marie's Stackexchange answer.

Discussion #

We've found simple formulae for the powers of two-by-two matrices. Let $\mathrm{tr}(\mathbf{M})=t$ and $\mathrm{det}(\mathbf{M})=d$ . If $d\ne 0$ we have
$\mathbf{M}^n=d^{n/2}\left[U_{n-1}\left(\frac{t}{2\sqrt{d}}\right)\frac{1}{\sqrt{d}}\mathbf{M}-U_{n-2}\left(\frac{t}{2\sqrt{d}}\right)\mathbf{I}\right],$
where $U_n$ is a generalised Chebyshev polynomial of the second kind. If the determinant of $\mathbf{M}$ is zero then the formula is even easier:
$\mathbf{M}^n=t^{n-1}\mathbf{M}.$
This seems incredible to me! Since the matrix elements can be complex, a two-by-two matrix has eight degrees of freedom. You would think that very little could be said about the powers of a matrix in general, without knowing more about the elements. However, we have these simple formulae that $\mathbf{M}^n$ is always a linear combination of $\mathbf{M}$ and $\mathbf{I}$ , with the coefficients depending only on the trace and determinant. Fundamentally this follows from the Cayley-Hamilton theorem, which places strong restrictions on how complicated powers of matrices can be.

I mentioned that there are many proofs of Cayley-Hamilton, from both algebraic and analytic viewpoints. I recommend you pick your favourite proof, find the 'essential idea' of it, and then try and intuitively see how that idea constrains powers of matrices. This sort of thing can help you appreciate the amazing connections that lie between the many different areas of mathematics.

I think this gives a neat way to analytically solve recurrence relations, by comparing them with the recurrence relations of special functions. Hopefully this will come in useful in the future.

Here are some miscellaneous things I'm curious about, if you know anything about them please let me know:

It would be interesting to look at generalisations beyond two-dimensions, and to non-integer powers.
Can we better understand why Chebyshev polynomials, and not some other function, show up in this problem? The direct reason is the recurrence relation. But can we get a more intuitive picture?

If you have any comments or questions feel free to drop me a line at me@ruvi.blog, and you can follow me @ruvi_l.

Also, the correct pronunciation is apparently "Chebyshov".

References #

[S10] Svelto, Orazio (2010) Principles of lasers (5th edition) Springer.
[BR20] Brandi, P., & Ricci, P. E. (2020) Composition identities of Chebyshev polynomials, via 2× 2 matrix powers Symmetry, 12(5), 746.

Next: Euler-Lagrange equations for a rigid body with quaternions
Previous: Race restarter math problem