Summary of this article. A power of a two-by-two matrix $\mathbf{M}$ is a linear combination of $\mathbf{M}$ and the identity, whose coefficients are Chebyshev polynomials.

Introduction #

In this post we will find a simple expression for integer powers of any two-by-two matrix $\mathbf{M}$. It turns out that $\mathbf{M}^n$ is always just a linear combination of $\mathbf{M}$ and $\mathbf{I}$, where the coefficients are a type of special function called Chebyshev polynomials.

This journey started for me with a formula given in Chapter 5 of Orazio Svelto's Principles of lasers [S10] (though you don't have to follow this motivation to understand the result). Optical fields moving through dielectric interfaces or optical components can be modelled by two-by-two matrices with unit determinant, for example scattering matrices, transfer matrices, and ABCD matrices. The $n$th power of this matrix then describes the interaction of light with a repeating structure, such as a distributed Bragg reflector. Let $\mathbf{M}$ be a two-by-two matrix with determinant 1, whose elements are
$$\mathbf{M}=\left(\begin{array}{cc} A & B \\ C & D\end{array}\right).$$
We define an angle $\theta$ as
$$\cos\theta=\frac{\mathrm{tr}\{\mathbf{M}\}}{2}=\frac{A+D}{2},$$
where $\theta$ will be real only if $-1\le (A+D)/2\le 1$. Then for integer $n$ we have
$$\mathbf{M}^n=\frac{1}{\sin\theta}\left(\begin{array}{cc}A\sin n\theta-\sin(n-1)\theta & B\sin n\theta \\ C\sin n\theta & D\sin n\theta-\sin (n-1)\theta\end{array}\right).$$
This formula is referred to as "Sylvester's theorem" in [S10], but I couldn't find any reference to it on the Wikipedia page of things named after Sylvester, so I don't know how common the name is.

A Google search led me to this Stack Exchange answer by user Jean Marie giving a proof of the theorem, which was the inspiration for this post. He points out that the formula can be rewritten as (remember, this is only for unit determinant)
[eqSylvestersTheorem]: $$\mathbf{M}^n=\frac{\sin n\theta}{\sin\theta}\mathbf{M}-\frac{\sin (n-1)\theta}{\sin \theta}\mathbf{I},$$
where $\mathbf{I}$ is the identity matrix. In other words, powers $\mathbf{M}^n$ are always a linear combination of $\mathbf{M}$ and the identity.
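
As a quick sanity check (my addition, not from the original post), here's a minimal numerical sketch of [eqSylvestersTheorem], assuming numpy is available; the sample matrix is constructed by hand to have determinant 1.

```python
import numpy as np

# A hand-picked matrix with det(M) = a*((1 + b*c)/a) - b*c = 1.
a, b, c = 1.3, 0.4, -0.7
M = np.array([[a, b], [c, (1 + b * c) / a]])

t = np.trace(M)
# theta can be complex when |t/2| > 1, so use complex arithmetic throughout.
theta = np.arccos(complex(t / 2))

n = 7
lhs = np.linalg.matrix_power(M, n)
rhs = (np.sin(n * theta) / np.sin(theta)) * M \
    - (np.sin((n - 1) * theta) / np.sin(theta)) * np.eye(2)
print(np.allclose(lhs, rhs))  # expect: True
```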

In this article we will derive a generalised version of Sylvester's theorem, which applies to all two-by-two matrices (rather than just matrices with determinant 1). The key tools we'll use are the Cayley-Hamilton theorem and special functions called Chebyshev polynomials.

Background #

Cayley-Hamilton theorem #

The Cayley-Hamilton theorem states that every matrix satisfies its own characteristic equation. This is a powerful result with many applications, and we will see that it imposes strong constraints on the form of $\mathbf{M}^n$. I will just state the theorem, as there are many proofs available online.

The characteristic equation of a matrix $\mathbf{M}$ is the polynomial equation that determines its eigenvalues. A matrix has eigenvalue $\lambda$ if
$$\mathrm{det}(\mathbf{M}-\lambda\mathbf{I})=0.$$
We define the characteristic polynomial to be the function
$$p(\lambda)=\mathrm{det}(\mathbf{M}-\lambda\mathbf{I}).$$
For two-by-two matrices this is
$$\begin{aligned} p(\lambda) &= \mathrm{det}\left[\left(\begin{array}{cc}A-\lambda & B \\ C & D-\lambda\end{array}\right)\right] \\ &=(A-\lambda)(D-\lambda)-BC \\ &= \lambda^2-\lambda(A+D)+(AD-BC) \\ &=\lambda^2-\lambda\,\mathrm{tr}(\mathbf{M})+\mathrm{det}(\mathbf{M}). \end{aligned}$$
The Cayley-Hamilton theorem says that
$$p(\mathbf{M})=0,$$
which gives us:
[eqCayleyHamilton]: $$\mathbf{M}^2-\mathbf{M}\,\mathrm{tr}(\mathbf{M})+\mathrm{det}(\mathbf{M})\,\mathbf{I}=0.$$
Note that each term has the same 'units', i.e. is proportional to the square of the matrix elements. Don't let anyone tell you that physics has a monopoly on unit checks!
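
To make the statement concrete, here's a minimal numerical check of [eqCayleyHamilton] (my addition, assuming numpy); the residual should vanish up to floating-point error for any two-by-two matrix.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
M = rng.standard_normal((2, 2))  # an arbitrary 2x2 matrix

# M^2 - tr(M) M + det(M) I should vanish by Cayley-Hamilton.
residual = M @ M - np.trace(M) * M + np.linalg.det(M) * np.eye(2)
print(np.allclose(residual, 0))  # expect: True
```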

There are many ways of proving this. A rough sketch of the argument I like is

  1. Convince yourself that [eqCayleyHamilton] is true for any diagonal matrix, and then any diagonalisable matrix.
  2. Diagonalisable matrices are dense in the set of matrices, that is to say any matrix can be approximated arbitrarily well by a diagonalisable matrix.
  3. The characteristic polynomial, being a polynomial in the matrix entries, is continuous. Thus if $\mathbf{d}_n$ is a sequence of diagonalisable matrices which converges to $\mathbf{M}$, we have
    $$p(\mathbf{M})=p\left(\lim_{n\rightarrow\infty}\mathbf{d}_n\right)=\lim_{n\rightarrow\infty}p(\mathbf{d}_n)=0.$$

In two dimensions, Cayley-Hamilton lets us replace the square of a matrix $\mathbf{M}^2$ with a sum of the linear terms $\mathbf{M},\mathbf{I}$. In $n$ dimensions it lets us replace $\mathbf{M}^n$ with a sum of terms of order $\mathbf{I},\mathbf{M},\ldots,\mathbf{M}^{n-1}$.

Chebyshev polynomials #

For the general version of Sylvester's theorem, the coefficients turn out to be a type of special function called Chebyshev polynomials, which we will introduce here.

Special functions often come across as intimidating, which I think is partly because people don't quite know how to approach them. I recommend that you look at a special function the same way you look at conventional 'special functions' like sine and cosine. You probably don't know how to find the numeric value of $\sin(1.4)$, and I doubt you lose any sleep over it. And while sine does have a polynomial representation as an infinite sum, that's something you only pull out in emergencies. To use trigonometric functions all you need is a rough feel for how they behave, and to know simple relations like their derivatives, and formulae such as $\sin^2\theta+\cos^2\theta=1$. Aim for the same sort of understanding with special functions.

Like sine and cosine, special functions often come in pairs. For Chebyshev polynomials, these are straightforwardly named Chebyshev polynomials of the first kind and Chebyshev polynomials of the second kind. We are only interested in the second kind, which are typically denoted $U_n(x)$. Here $n$ is an integer, which is there because there is an infinite family of Chebyshev polynomials; $U_n(x)$ is an $n$th-order polynomial.

The first few Chebyshev polynomials of the second kind are
$$U_0(x) = 1,$$
$$U_1(x) = 2x,$$
$$U_2(x) = 4x^2-1,$$
but don't spend too much time on the particular polynomial form. What will be useful to us is their recurrence relation
[eqChebyshevRecurrence]: $$U_{n+1}(x)=2xU_n(x)-U_{n-1}(x).$$
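
If you want to see the recurrence in action, here's a short sketch (my addition, assuming numpy) that builds the first few $U_n$ directly from [eqChebyshevRecurrence]:

```python
from numpy.polynomial import Polynomial

# Seed the recurrence with U_0(x) = 1 and U_1(x) = 2x,
# then apply U_{n+1}(x) = 2x U_n(x) - U_{n-1}(x).
two_x = Polynomial([0, 2])
U = [Polynomial([1]), two_x]
for n in range(1, 5):
    U.append(two_x * U[n] - U[n - 1])

for n, p in enumerate(U):
    print(f"U_{n}: coefficients (constant term first) = {p.coef}")
# U_2 prints [-1., 0., 4.], i.e. 4x^2 - 1, matching the list above.
```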

An explicit representation for Chebyshev polynomials can be given using trigonometric functions. If we write
$$x=\cos\theta,$$
then the Chebyshev polynomials are given by
[eqChebyshevTrigForm]: $$U_n(\cos\theta)=\frac{\sin(n+1)\theta}{\sin\theta}.$$
For $n=1$ the numerator is $\sin(2\theta)=2\sin\theta\,\cos\theta$, and the right-hand side is indeed a function of $\cos\theta$. This is true for all $n$. If you peek back at [eqSylvestersTheorem], you may have a hint at where this article is going.
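
We can also check [eqChebyshevTrigForm] numerically (my addition; scipy's eval_chebyu evaluates Chebyshev polynomials of the second kind):

```python
import numpy as np
from scipy.special import eval_chebyu

theta = 0.7  # any angle away from multiples of pi, so sin(theta) != 0
for n in range(6):
    trig = np.sin((n + 1) * theta) / np.sin(theta)
    poly = eval_chebyu(n, np.cos(theta))
    print(n, trig, poly)  # the two columns should agree
```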

Armed with Chebyshev polynomials, we can now state the generalised Sylvester's theorem. I found this in [BR20], though no proof was given. Suppose $\mathbf{M}$ is any two-by-two non-singular matrix, with determinant $d$ and trace $t$. Then for $n$ an integer:
[eqGeneralSylvester]: $$\mathbf{M}^n=d^{(n-1)/2}\,U_{n-1}\left(\frac{t}{2\sqrt{d}}\right)\mathbf{M}-d^{n/2}\,U_{n-2}\left(\frac{t}{2\sqrt{d}}\right)\mathbf{I},$$
where $U_n$ is the $n$th Chebyshev polynomial of the second kind. We'll see later that for singular matrices there is an even simpler formula.

In the case where $\mathbf{M}$ has unit determinant $d=1$, this reduces to
$$\mathbf{M}^n=U_{n-1}\left(\frac{t}{2}\right)\mathbf{M}-U_{n-2}\left(\frac{t}{2}\right)\mathbf{I}.$$
You can verify for yourself that if we use the explicit form [eqChebyshevTrigForm], this reproduces [eqSylvestersTheorem].
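
Here's a numerical sketch of [eqGeneralSylvester] (my addition, assuming numpy and scipy; the sample matrix is real with positive determinant so that all the square roots stay real):

```python
import numpy as np
from scipy.special import eval_chebyu

M = np.array([[2.0, 1.0], [0.5, 1.5]])  # det = 2.5, trace = 3.5
t, d = np.trace(M), np.linalg.det(M)

n = 6
x = t / (2 * np.sqrt(d))
rhs = d ** ((n - 1) / 2) * eval_chebyu(n - 1, x) * M \
    - d ** (n / 2) * eval_chebyu(n - 2, x) * np.eye(2)
print(np.allclose(np.linalg.matrix_power(M, n), rhs))  # expect: True
```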

Proving the generalised Sylvester's theorem #

Suppose $\mathbf{M}$ is a two-by-two matrix, with trace and determinant
$$t = \mathrm{tr}\{\mathbf{M}\},\quad d = \mathrm{det}(\mathbf{M}).$$
Let's make our lives simpler and get rid of the determinant by defining
$$\mathbf{\tilde{M}}=\frac{1}{\sqrt{d}}\mathbf{M},$$
assuming the determinant is nonzero. Then we will have
$$\mathrm{det}(\tilde{\mathbf{M}}) = 1,\quad \mathrm{tr}\{\tilde{\mathbf{M}}\} = \tilde{t}=\frac{t}{\sqrt{d}}.$$
With this, Cayley-Hamilton becomes
$$\mathbf{\tilde{M}}^2=\tilde{t}\,\mathbf{\tilde{M}}-\mathbf{I}.$$

We can now derive a formula for $\mathbf{\tilde{M}}^n$, and then work out
$$\mathbf{M}^n=\left(\sqrt{d}\,\tilde{\mathbf{M}}\right)^n=d^{n/2}\,\tilde{\mathbf{M}}^n.$$

Zero determinant #

Before we continue, let's quickly see what happens when $\mathbf{M}$ has zero determinant. Cayley-Hamilton gives us
$$\mathbf{M}^2=t\,\mathbf{M},$$
from which we find
$$\mathbf{M}^3=\mathbf{M}\left(t\,\mathbf{M}\right)=t\,\mathbf{M}^2=t^2\,\mathbf{M}.$$
Continuing in this way, we find for $n\ge 1$
$$\mathbf{M}^{n}=t^{n-1}\,\mathbf{M}.$$
Thus powers of two-by-two singular matrices take a very simple form!
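
A quick check of the singular case (my addition, assuming numpy); an outer product of two vectors is always rank one, hence has zero determinant:

```python
import numpy as np

u, v = np.array([1.0, 2.0]), np.array([3.0, 1.0])
M = np.outer(u, v)  # rank-one, so det(M) = 0
t, n = np.trace(M), 5

print(np.allclose(np.linalg.matrix_power(M, n), t ** (n - 1) * M))  # expect: True
```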

Unit determinant #

Let's now try to find a formula for $\mathbf{\tilde{M}}^n$. The first power is trivial,
$$\mathbf{\tilde{M}}^1=\mathbf{\tilde{M}},$$
while Cayley-Hamilton gives us the second power:
$$\mathbf{\tilde{M}}^2=\tilde{t}\,\mathbf{\tilde{M}}-\mathbf{I}.$$

Now let's look at the third power. We begin by substituting the equation for the second power:
$$\begin{aligned}\mathbf{\tilde{M}}^3 &=\mathbf{\tilde{M}}\left(\mathbf{\tilde{M}}^2\right) \\ &= \mathbf{\tilde{M}}\left(\tilde{t}\,\mathbf{\tilde{M}}-\mathbf{I}\right) \\ &= \tilde{t}\,\mathbf{\tilde{M}}^2-\mathbf{\tilde{M}}. \end{aligned}$$
Substituting again the equation for the second power gives
$$\mathbf{\tilde{M}}^3=(\tilde{t}^2-1)\mathbf{\tilde{M}}-\tilde{t}\,\mathbf{I}.$$
Now we start to get an inkling of what's going on underneath Sylvester's theorem. Continuing in this way, we can see that Cayley-Hamilton guarantees that we will always have
$$\mathbf{\tilde{M}}^n=a_n(\tilde{t})\,\mathbf{\tilde{M}}+b_n(\tilde{t})\,\mathbf{I},$$
for as-yet undetermined coefficients $a_n(\tilde{t}),b_n(\tilde{t})$.

We want a formula that gives us $a_n(\tilde{t}),b_n(\tilde{t})$. As a first step, let's find how these depend on $a_{n-1}(\tilde{t}),b_{n-1}(\tilde{t})$. We have
$$\begin{aligned}\mathbf{\tilde{M}}^n &= \mathbf{\tilde{M}}\left(\mathbf{\tilde{M}}^{n-1}\right) \\ &= \mathbf{\tilde{M}}\left(a_{n-1}(\tilde{t})\,\mathbf{\tilde{M}}+b_{n-1}(\tilde{t})\,\mathbf{I}\right) \\ &= a_{n-1}(\tilde{t})\,\mathbf{\tilde{M}}^2+b_{n-1}(\tilde{t})\,\mathbf{\tilde{M}}, \end{aligned}$$
and using Cayley-Hamilton gives
$$\mathbf{\tilde{M}}^n=\left(b_{n-1}(\tilde{t})+\tilde{t}\,a_{n-1}(\tilde{t})\right)\mathbf{\tilde{M}}-a_{n-1}(\tilde{t})\,\mathbf{I}.$$
At the same time we know
$$\mathbf{\tilde{M}}^n=a_{n}(\tilde{t})\,\mathbf{\tilde{M}}+b_{n}(\tilde{t})\,\mathbf{I}.$$
Equating these gives
$$a_{n}(\tilde{t})=\tilde{t}\,a_{n-1}(\tilde{t})+b_{n-1}(\tilde{t}),$$
$$b_{n}(\tilde{t})=-a_{n-1}(\tilde{t}).$$
We can substitute the second equation into the first to get rid of bnb_n.

From the above we have
$$\mathbf{\tilde{M}}^n = a_n(\tilde{t})\,\mathbf{\tilde{M}}-a_{n-1}(\tilde{t})\,\mathbf{I},$$
where $a_n(\tilde{t})$ satisfies the recurrence relation
$$a_n(\tilde{t})=\tilde{t}\,a_{n-1}(\tilde{t})-a_{n-2}(\tilde{t}).$$
To prove Sylvester's theorem, we need to solve this relation for the coefficients $a_n$.
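
Before solving the recurrence analytically, we can at least confirm it numerically (my addition, assuming numpy): iterate $a_n=\tilde{t}\,a_{n-1}-a_{n-2}$ from $a_1=1$, $a_2=\tilde{t}$ and compare against a directly computed matrix power.

```python
import numpy as np

M = np.array([[1.1, 0.3], [0.2, 0.8]])      # det = 0.82 > 0
Mt = M / np.sqrt(np.linalg.det(M))          # rescaled to unit determinant
tt = np.trace(Mt)

# Iterate a_n = tt*a_{n-1} - a_{n-2}, starting from a_1 = 1, a_2 = tt.
a_prev, a = 1.0, tt
for n in range(3, 9):
    a_prev, a = a, tt * a - a_prev

# After the loop, a = a_8 and a_prev = a_7.
lhs = np.linalg.matrix_power(Mt, 8)
print(np.allclose(lhs, a * Mt - a_prev * np.eye(2)))  # expect: True
```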

Solving recurrence relations is as broad a topic as solving differential equations. I'm not very familiar with this subject, so if anyone knows any nice solution methods please let me know. One way is to see if the recurrence relation matches a well-known form. Let's compare with [eqChebyshevRecurrence], which we can write as
$$U_n(x)=2xU_{n-1}(x)-U_{n-2}(x).$$
This is very close, apart from the factor of two. We want $2x=\tilde{t}$, which can be solved by taking $x=\tilde{t}/2$. Thus the recurrence relation is solved by
$$a_n(\tilde{t})=U_{n+n_0}\left(\frac{\tilde{t}}{2}\right),$$
where $n_0$ is any integer. To find the value of $n_0$, we have to use initial conditions. We know
$$\mathbf{\tilde{M}}^2=\tilde{t}\,\mathbf{\tilde{M}}-\mathbf{I}=a_2(\tilde{t})\,\mathbf{\tilde{M}}-a_{1}(\tilde{t})\,\mathbf{I},$$
which gives
$$a_2(\tilde{t})=\tilde{t},\quad a_1(\tilde{t})=1.$$
This is satisfied by taking $n_0=-1$.

This is our solution! You can verify that this reproduces [eqGeneralSylvester].

Alternatively, we could use a computational package. Mathematica's RSolve[] is able to find the solution
$$a_n(\tilde{t}) =\frac{\left(\tilde{t}+\sqrt{\tilde{t}^2-4}\right)^n-\left(\tilde{t}-\sqrt{\tilde{t}^2-4}\right)^n}{2^n\sqrt{\tilde{t}^2-4}}.$$
This is in fact another explicit representation of the Chebyshev polynomials.
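
You can confirm this numerically too (my addition, assuming numpy and scipy); the complex square root keeps the formula valid even when $\tilde{t}^2<4$:

```python
import numpy as np
from scipy.special import eval_chebyu

tt = 1.3
s = np.sqrt(complex(tt ** 2 - 4))  # imaginary here, since tt^2 < 4
for n in range(1, 7):
    closed = ((tt + s) ** n - (tt - s) ** n) / (2 ** n * s)
    print(n, closed.real, eval_chebyu(n - 1, tt / 2))  # columns should agree
```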

For an alternative proof by induction in the $d=1$ case, see Jean Marie's Stack Exchange answer.

Discussion #

We've found simple formulae for the powers of two-by-two matrices. Let $\mathrm{tr}(\mathbf{M})=t$ and $\mathrm{det}(\mathbf{M})=d$. If $d\ne 0$ we have
$$\mathbf{M}^n=d^{n/2}\left[U_{n-1}\left(\frac{t}{2\sqrt{d}}\right)\frac{1}{\sqrt{d}}\mathbf{M}-U_{n-2}\left(\frac{t}{2\sqrt{d}}\right)\mathbf{I}\right],$$
where $U_n$ is a Chebyshev polynomial of the second kind. If the determinant of $\mathbf{M}$ is zero then the formula is even simpler:
$$\mathbf{M}^n=t^{n-1}\,\mathbf{M}.$$
This seems incredible to me! Since the matrix elements can be complex, a two-by-two matrix has eight degrees of freedom. You would think that very little could be said about the powers of a matrix in general, without knowing more about the elements. However, we have these simple formulae showing that $\mathbf{M}^n$ is always a linear combination of $\mathbf{M}$ and $\mathbf{I}$, with the coefficients depending only on the trace and determinant. Fundamentally this follows from the Cayley-Hamilton theorem, which places strong restrictions on how complicated powers of matrices can be.
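
To wrap up, here's a minimal sketch (my addition, not from the original post) putting both cases into one hypothetical helper, matrix_power_2x2, assuming numpy and scipy and restricting to real matrices with non-negative determinant, since scipy's Chebyshev evaluator takes real arguments:

```python
import numpy as np
from scipy.special import eval_chebyu

def matrix_power_2x2(M, n):
    """n-th power (n >= 2) of a real 2x2 matrix with det >= 0, via the
    trace/determinant formulae above. A sketch, not a general routine:
    complex or negative-determinant matrices would need a Chebyshev
    evaluator that accepts complex arguments."""
    t, d = np.trace(M), np.linalg.det(M)
    if np.isclose(d, 0.0):
        return t ** (n - 1) * M                 # singular case: M^n = t^(n-1) M
    sd = np.sqrt(d)
    x = t / (2 * sd)
    return sd ** (n - 1) * eval_chebyu(n - 1, x) * M \
        - sd ** n * eval_chebyu(n - 2, x) * np.eye(2)

M = np.array([[0.9, 0.5], [-0.3, 1.2]])
print(np.allclose(matrix_power_2x2(M, 5), np.linalg.matrix_power(M, 5)))  # True
```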

I mentioned that there are many proofs of Cayley-Hamilton, from both algebraic and analytic viewpoints. I recommend you pick your favourite proof, find the 'essential idea' of it, and then try and intuitively see how that idea constrains powers of matrices. This sort of thing can help you appreciate the amazing connections that lie between the many different areas of mathematics.

I think this gives a neat way to analytically solve recurrence relations, by comparing them with the recurrence relations of special functions. Hopefully this will come in useful in the future.

Here are some miscellaneous things I'm curious about, if you know anything about them please let me know:

  • It would be interesting to look at generalisations beyond two dimensions, and to non-integer powers.
  • Can we better understand why Chebyshev polynomials, and not some other function, show up in this problem? The direct reason is the recurrence relation. But can we get a more intuitive picture?

If you have any comments or questions feel free to drop me a line at me@ruvi.blog, and you can follow me @ruvi_l.

Also, the correct pronunciation is apparently "Chebyshov".

References #

[S10] Svelto, Orazio (2010) Principles of lasers (5th edition). Springer.
[BR20] Brandi, P., & Ricci, P. E. (2020) Composition identities of Chebyshev polynomials, via 2×2 matrix powers. Symmetry, 12(5), 746.