Mathematics, Tricks

Feynman’s Vector Calculus Trick

1. Introduction

Many people are familiar with the so-called `Feynman’s trick’ of differentiating under the integral. Buried in chapter 27-3 of Volume II of the Feynman Lectures on Physics [1], though, there lies another trick, one which can simplify problems in vector calculus by letting you treat the derivative operator {\nabla} as any other vector, without having to worry about commutativity. I don’t know if Feynman invented this himself, but I have never stumbled across it anywhere else.

Note: u/bolbteppa on Reddit has pointed out that this idea can be found in the very first book on vector calculus, written based on lectures given by Josiah Willard Gibbs.

What this trick will allow you to do is to treat the {\nabla} operator as if it were any other vector. This means that if you know a vector identity, you can immediately derive the corresponding vector calculus identity. Furthermore even if you do not have (or don’t want to look up) the identity, you can apply the usual rules of vectors assuming that everything is commutative, which is a nice simplification.

The trick appears during the derivation of the Poynting vector. We wish to simplify

\displaystyle \nabla\cdot(B\times E), \ \ \ \ \ (1)

where {B} and {E} are the magnetic and electric field respectively, though for our purposes they can just be any vector fields.

2. The trick

The problem we want to solve is that we cannot apply the usual rules of vectors to the derivative operator. For example, we have

\displaystyle A\times B=-B\times A,\;\;A\cdot B=B\cdot A \ \ \ \ \ (2)

but it is certainly not true that

\displaystyle \nabla\times A=-A\times\nabla,\;\;\nabla\cdot A=A\cdot\nabla. \ \ \ \ \ (3)

This means that when you want to break up an expression like {\nabla\cdot(B\times E)}, you can’t immediately reach for a vector identity {A\cdot(B\times C)=B\cdot(C\times A)} and expect the result to hold. Even if you aren’t using a table of identities, it would certainly make your life easier if you could find a way to treat {\nabla} like any other vector and bash out algebra like (3).

Let’s first restrict ourselves to two scalar functions {f} and {g}. We introduce the notation

\displaystyle \frac{\partial}{\partial x_f} \ \ \ \ \ (4)

to mean a derivative operator which only acts on {f}, not {g}. Moreover, it doesn’t matter where in the expression the derivative is, it is always interpreted as acting on {f}. In our notation the following are all equivalent:

\displaystyle \frac{\partial f}{\partial x}g=\frac{\partial}{\partial x_f}fg=f\frac{\partial}{\partial x_f}g=fg\frac{\partial}{\partial x_f}. \ \ \ \ \ (5)

Why did we do this? Well now the derivative {\frac{\partial}{\partial x_f}} behaves just like any other number! We can write our terms in any order we want, and still know what we mean.

Now let’s suppose we want to differentiate a product of terms:

\displaystyle \frac{\partial}{\partial x}(fg)=\frac{\partial f}{\partial x}g+f\frac{\partial g}{\partial x}. \ \ \ \ \ (6)

We can see that whenever we have such a product, we can write:

\displaystyle \begin{aligned} \frac{\partial}{\partial x}(fg) &= \left(\frac{\partial}{\partial x_f}+\frac{\partial}{\partial x_g}\right)fg, \\ &= \frac{\partial}{\partial x_f}fg+\frac{\partial}{\partial x_g}fg. \end{aligned} \ \ \ \ \ (7)

We want to generalise this to things like {\nabla\cdot(A\times B)}. Remembering that the derivative operator is interpreted as {\nabla=\left(\frac{\partial}{\partial x},\frac{\partial}{\partial y},\frac{\partial}{\partial z}\right)}, we define

\displaystyle \nabla_A=\left(\frac{\partial}{\partial x_A},\frac{\partial}{\partial y_A},\frac{\partial}{\partial z_A}\right). \ \ \ \ \ (8)

Here {\frac{\partial}{\partial x_A}} is interpreted as acting on any of the components {A_x}, {A_y}, {A_z} of {A}.

With this notation, keeping in mind the commutativity (5) of the derivative operator, we can see that

\displaystyle \nabla_A\cdot A=A\cdot\nabla_A, \ \ \ \ \ (9)

\displaystyle \nabla_A\times A=-A\times\nabla_A. \ \ \ \ \ (10)

Work out the components and see for yourself!
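As a sketch of that component check for (10), take the {x}-component, remembering from (5) that {\frac{\partial}{\partial y_A}} may be written on either side of the component it acts on:

```latex
\begin{aligned}
(\nabla_A\times A)_x &= \frac{\partial}{\partial y_A}A_z-\frac{\partial}{\partial z_A}A_y, \\
-(A\times\nabla_A)_x &= -\left(A_y\frac{\partial}{\partial z_A}-A_z\frac{\partial}{\partial y_A}\right)
  = A_z\frac{\partial}{\partial y_A}-A_y\frac{\partial}{\partial z_A}.
\end{aligned}
```

By (5) the two lines are the same thing, and the {y}- and {z}-components follow by cyclically relabelling {x\rightarrow y\rightarrow z\rightarrow x}.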

In the next section we will apply this trick to derive some common vector calculus identities. The idea is to take an expression such as {\nabla\cdot(E\times B)}, write it as {(\nabla_E+\nabla_B)\cdot(E\times B)}, and then expand this using our normal vector rules until we end up with {\nabla_E} acting only on {E} and {\nabla_B} on {B}, in which case we can replace them with the original {\nabla}.

3. Some examples

Here we will see how various vector identities can be generalised to include {\nabla} using the ideas from the previous section. All the identities I am using come from the Wikipedia page [2].

You may want to try and do each of these yourself before reading the solution. Have a look at the title of the section, check the Wikipedia page [2] for the corresponding vector identity, and have a play. If you get stuck, read just enough of the solution to find out what concept you were missing, and then go back to it. As they say, mathematics is not a spectator sport!

3.1. {\nabla\cdot(A\times B)}

The corresponding vector identity is

\displaystyle A\cdot (B\times C)=B\cdot(C\times A)=C\cdot(A\times B). \ \ \ \ \ (11)

We can look at this as saying that the product {A\cdot(B\times C)} is invariant under cyclic permutations, i.e. if you shift {A\rightarrow B\rightarrow C\rightarrow A}. If we look at {A\cdot(B\times C)} as something with three slots: {\_\cdot(\_\times\_)}, this is saying that you can move everything one slot to the right (and the rightmost one `cycles’ to the left), or you can move everything one slot to the left (and the leftmost one `cycles’ to the right). This pattern comes up all the time in mathematics and physics, so it’s good to keep it in mind.

Let’s experiment and see where we go. Since every term will be a product of terms from {A} and terms from {B}, we may expand

\displaystyle \nabla\cdot(A\times B) = \nabla_A\cdot(A\times B)+\nabla_B\cdot(A\times B). \ \ \ \ \ (12)

We want to change this so that {\nabla_A} is acting on {A} and {\nabla_B} on {B}, since then we can replace them with the original {\nabla}. So let’s cyclically permute the first term to the right, and the second to the left:

\displaystyle =B\cdot(\nabla_A\times A)+A\cdot(B\times\nabla_B). \ \ \ \ \ (13)

Finally, we use {A\times B=-B\times A} to re-write the last term:

\displaystyle \begin{aligned} &= B\cdot(\nabla_A\times A)-A\cdot(\nabla_B\times B), \\ &= B\cdot(\nabla\times A)-A\cdot(\nabla\times B). \end{aligned} \ \ \ \ \ (14)

We have thus derived

\displaystyle \nabla\cdot(A\times B)=B\cdot(\nabla\times A)-A\cdot(\nabla\times B). \ \ \ \ \ (15)
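Identity (15) is easy to sanity-check numerically. The following is a minimal sketch in plain Python, assuming two arbitrary smooth test fields and approximating derivatives by central differences. The particular fields `A` and `B`, the sample point `p` and the step `H` are illustrative choices, not anything from Feynman’s text.

```python
import math

H = 1e-5  # central-difference step (illustrative choice)

def A(x, y, z):
    # arbitrary smooth test field, chosen only for illustration
    return (math.sin(y) * z, x * x * y, math.cos(x) + z * y)

def B(x, y, z):
    # second arbitrary smooth test field
    return (x * y * z, math.exp(0.3 * x) * z, y * y - x)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def partial(F, i, p):
    """Central-difference derivative of the vector field F along axis i at p."""
    hi, lo = list(p), list(p)
    hi[i] += H
    lo[i] -= H
    return tuple((a - b) / (2 * H) for a, b in zip(F(*hi), F(*lo)))

def div(F, p):
    return sum(partial(F, i, p)[i] for i in range(3))

def curl(F, p):
    d = [partial(F, i, p) for i in range(3)]  # d[i][j] = dF_j / dx_i
    return (d[1][2] - d[2][1], d[2][0] - d[0][2], d[0][1] - d[1][0])

def AxB(x, y, z):
    return cross(A(x, y, z), B(x, y, z))

p = (0.7, -0.4, 1.3)
lhs = div(AxB, p)                                       # left side of (15)
rhs = dot(B(*p), curl(A, p)) - dot(A(*p), curl(B, p))   # right side of (15)
print("difference:", abs(lhs - rhs))
```

The two sides should agree up to the finite-difference error, which for this step size is far below the size of the individual terms.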

Better yet, now we have an idea of where that strange minus sign came from. The first two terms have the same cyclic order in their slots {\nabla\rightarrow A\rightarrow B\rightarrow\nabla}, and breaking this in the third term comes at the expense of a minus sign.

3.2. {\nabla\times(A\times B)}

The corresponding vector identity is

\displaystyle A\times(B\times C)=(A\cdot C)B-(A\cdot B)C. \ \ \ \ \ (16)

We thus have

\displaystyle (\nabla_A+\nabla_B)\times(A\times B)=\nabla_A\times (A\times B)+\nabla_B\times(A\times B). \ \ \ \ \ (17)

Let’s look at the first term, the second will be analogous.

\displaystyle \nabla_A\times(A\times B) = (\nabla_A\cdot B)A-(\nabla_A\cdot A)B. \ \ \ \ \ (18)

Note that the product {\nabla_A\cdot B} is not zero, as {\nabla_A} is a derivative operator which still acts on {A} anywhere in the equation (see (5)). We rearrange the above using the commutativity of the dot product to write

\displaystyle \begin{aligned} \nabla_A\times(A\times B) &= (B\cdot\nabla_A)A-(\nabla_A\cdot A)B, \\ &= (B\cdot\nabla)A-(\nabla\cdot A)B. \end{aligned} \ \ \ \ \ (19)

Swapping {A\leftrightarrow B} we obtain

\displaystyle \nabla_B\times(B\times A) = (A\cdot\nabla)B-(\nabla\cdot B)A, \ \ \ \ \ (20)


\displaystyle \nabla_B\times(A\times B) = -(A\cdot\nabla)B+(\nabla\cdot B)A. \ \ \ \ \ (21)

Putting the two together finally gives

\displaystyle \nabla\times(A\times B)=(B\cdot\nabla)A-(A\cdot\nabla)B+(\nabla\cdot B)A-(\nabla\cdot A)B. \ \ \ \ \ (22)
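As with (15), the four-term identity (22) can be checked numerically. This is a sketch under the same assumptions as before: the test fields `A` and `B`, the point `p` and the step `H` are arbitrary illustrative choices, and `directional` is a helper name introduced here for {(V\cdot\nabla)F}.

```python
import math

H = 1e-5  # central-difference step (illustrative choice)

def A(x, y, z):
    # arbitrary smooth test field
    return (math.sin(y) * z, x * x * y, math.cos(x) + z * y)

def B(x, y, z):
    # second arbitrary smooth test field
    return (x * y * z, math.exp(0.3 * x) * z, y * y - x)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def jacobian(F, p):
    """d[i][j] = dF_j / dx_i at p, by central differences."""
    d = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += H
        lo[i] -= H
        d.append([(a - b) / (2 * H) for a, b in zip(F(*hi), F(*lo))])
    return d

def div(F, p):
    d = jacobian(F, p)
    return d[0][0] + d[1][1] + d[2][2]

def curl(F, p):
    d = jacobian(F, p)
    return (d[1][2] - d[2][1], d[2][0] - d[0][2], d[0][1] - d[1][0])

def directional(V, F, p):
    """(V . nabla) F at p, with V a fixed vector."""
    d = jacobian(F, p)
    return tuple(sum(V[i] * d[i][j] for i in range(3)) for j in range(3))

def AxB(x, y, z):
    return cross(A(x, y, z), B(x, y, z))

p = (0.7, -0.4, 1.3)
a, b = A(*p), B(*p)
lhs = curl(AxB, p)
t1 = directional(b, A, p)   # (B . nabla) A
t2 = directional(a, B, p)   # (A . nabla) B
div_a, div_b = div(A, p), div(B, p)
rhs = tuple(t1[j] - t2[j] + div_b * a[j] - div_a * b[j] for j in range(3))
err = max(abs(l - r) for l, r in zip(lhs, rhs))
print("largest component difference:", err)
```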

3.3. {\nabla\cdot(\psi A)}

Here {\psi} is just an ordinary scalar function, and {A} a vector. The difference makes this one a little bit tricky, but on the plus side we won’t have to look up any identities. Let’s begin by expanding as usual (since everything will be a product of {\psi} and terms from {A}):

\displaystyle \begin{aligned} \nabla\cdot(\psi A) &= \nabla_{\psi}\cdot(\psi A)+\nabla_A\cdot(\psi A). \end{aligned} \ \ \ \ \ (23)

For the second term we can pull the scalar {\psi} through {\nabla_A} to get {\psi(\nabla_A\cdot A)}. Let’s have a think about what we mean by the first term. The derivative operator is a vector

\displaystyle \nabla_{\psi}=\left(\frac{\partial}{\partial x_{\psi}},\frac{\partial}{\partial y_{\psi}},\frac{\partial}{\partial z_{\psi}}\right), \ \ \ \ \ (24)

and the quantity inside the brackets is a vector

\displaystyle (\psi A)=\left(\psi A_x,\psi A_y,\psi A_z\right), \ \ \ \ \ (25)

where {A_x} is the {x}-component of {A}, and so on. Taking the dot product of (24) and (25), we can see that this will give us

\displaystyle \begin{aligned} \nabla_{\psi}\cdot(\psi A) &= \frac{\partial}{\partial x_{\psi}}(\psi A_x)+\frac{\partial}{\partial y_{\psi}}(\psi A_y)+\frac{\partial}{\partial z_{\psi}}(\psi A_z), \\ &= A_x\frac{\partial \psi}{\partial x_{\psi}}+A_y\frac{\partial \psi}{\partial y_{\psi}}+A_z\frac{\partial \psi}{\partial z_{\psi}}, \\ &=A\cdot\nabla_{\psi}\psi. \end{aligned} \ \ \ \ \ (26)

Putting all this together we arrive at

\displaystyle \nabla\cdot(\psi A)=A\cdot\nabla\psi+\psi\nabla\cdot A. \ \ \ \ \ (27)
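A quick numerical check of (27), again as a hedged sketch: the scalar function `psi`, the field `A`, the point `p` and the step `H` are arbitrary choices made for illustration, and derivatives are approximated by central differences.

```python
import math

H = 1e-5  # central-difference step (illustrative choice)

def psi(x, y, z):
    # arbitrary smooth scalar test function
    return math.sin(x * y) + z * z

def A(x, y, z):
    # arbitrary smooth vector test field with non-zero divergence
    return (x * y, math.cos(x) * y * z, x + z * y * y)

def psiA(x, y, z):
    s = psi(x, y, z)
    return tuple(s * a for a in A(x, y, z))

def d(f, i, p):
    """Central-difference derivative of the scalar function f along axis i at p."""
    hi, lo = list(p), list(p)
    hi[i] += H
    lo[i] -= H
    return (f(*hi) - f(*lo)) / (2 * H)

def component(F, j):
    """The j-th component of the vector field F, as a scalar function."""
    return lambda x, y, z: F(x, y, z)[j]

def div(F, p):
    return sum(d(component(F, i), i, p) for i in range(3))

p = (0.3, 1.1, -0.8)
grad_psi = [d(psi, i, p) for i in range(3)]
lhs = div(psiA, p)                                                    # left side of (27)
rhs = sum(a * g for a, g in zip(A(*p), grad_psi)) + psi(*p) * div(A, p)  # right side of (27)
print("difference:", abs(lhs - rhs))
```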

4. Conclusion

We’ve learned a neat trick to treat the derivative operator just like any other vector. This is a cool and useful idea, which I hadn’t seen anywhere before I came across it in chapter 27-3 of [1]. Leave a comment or a tweet if you find other cool applications, or have ideas for further investigation. I notably did not touch on any of the second derivatives, such as {\nabla\cdot(\nabla\times A)} or {\nabla\times(\nabla\times A)}, and I’m sure that this trick would also simplify a lot of these. I also had a look at {\nabla(A\cdot B)}, and while you could use the trick there it turned out to be a bit complicated and involved some thinking to `guess’ terms which would fit what you wanted. Let me know if you find a nice simple way of doing this.

As a final application, u/Muphrid15 mentioned that this idea can be used to generalise the derivative operator to geometric algebra (also known as Clifford algebras). This is a sort of algebra for vector spaces, allowing you to do things like add one vector space to another or adjoin and subtract dimensions, and many calculations in vector algebra can be simplified immensely when put in this language.


5. References

[1] Feynman, R., Leighton, R., & Sands, M. (1963). The Feynman Lectures on Physics, Volume II: Mainly Electromagnetism and Matter.

[2] Wikipedia contributors. (2019, February 20). Vector calculus identities. In Wikipedia, The Free Encyclopedia. Retrieved 23:01, February 22, 2019.

[3] The LaTeX was written using the excellent tool LaTeX to WordPress.

quantum algorithms, quantum information

Superdense coding

1. Introduction

In this article we will introduce superdense coding, a scheme which lets Alice send two bits of (classical) information to Bob by transmitting a single entangled qubit. This article will be mathematically rigorous, while hopefully also providing an intuitive explanation of what is really going on. We will assume an undergraduate understanding of quantum mechanics, including familiarity with Dirac notation and entanglement.

Suppose Alice has a qubit, whose state may be written as

\displaystyle a|0\rangle+b|1\rangle, \ \ \ \ \ (1)

where {a} and {b} are complex numbers such that {|a|^2+|b|^2=1}. It would seem from (1) that if Alice wished to encode some information in her state and then send it to Bob, she has a lot of freedom in her choice of {a} and {b}. In comparison to a classical bit, which can only take discrete values of {0} or {1}, it seems like a qubit is infinitely more powerful! However, there’s a big catch.

To access this information Bob needs to measure the qubit, and (assuming he measures in the {\{|0\rangle,|1\rangle\}} basis) his result will be either {0} or {1}, with probability {|a|^2} and {|b|^2} respectively. Once he does this the state is lost, and he can gain no more information. Thus the only way that Alice can deterministically transfer information is to send either the {|0\rangle} state or the {|1\rangle} state, in which case Bob can measure it to receive one bit of information. If Alice sends anything else, Bob won’t be able to draw a conclusion from a single measurement, after which the original state will be lost. Despite all the extra freedom we have in a qubit, the probabilistic nature of quantum measurement seems to imply we can’t do any better than with a classical bit.

It turns out, however, that if Alice and Bob start off by sharing an entangled state, Alice can deterministically transfer two bits of information with a single qubit, by using a scheme called ‘superdense coding’. We can think of this as them sharing one bit of entanglement, which together with the transfer of one qubit leads to two bits of information. This idea was introduced in 1992 by Charles Bennett and Stephen Wiesner (see References below for the paper).

2. Some quantum gates

We will begin by defining four operators which Alice and Bob will use. Firstly there is the Pauli {\sigma_x}, which flips a qubit:

\displaystyle \sigma_x|0\rangle=|1\rangle, \ \ \ \ \ (2)

\displaystyle \sigma_x|1\rangle=|0\rangle. \ \ \ \ \ (3)

Next there is the Pauli {\sigma_z} operator, which flips the phase of the {|1\rangle} state:

\displaystyle \sigma_z|0\rangle = |0\rangle, \ \ \ \ \ (4)

\displaystyle \sigma_z|1\rangle = -|1\rangle. \ \ \ \ \ (5)

The Hadamard operator sends the two basis states to two orthogonal superpositions:

\displaystyle H|0\rangle=\frac{1}{\sqrt{2}}\left(|0\rangle+|1\rangle\right), \ \ \ \ \ (6)

\displaystyle H|1\rangle=\frac{1}{\sqrt{2}}\left(|0\rangle-|1\rangle\right). \ \ \ \ \ (7)

We can see that this also reverses itself:

\displaystyle \begin{aligned} H\frac{1}{\sqrt{2}}\left(|0\rangle+|1\rangle\right)&=\frac{1}{\sqrt{2}}\left(H|0\rangle+H|1\rangle\right), \\ &= \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}\left(|0\rangle+|1\rangle\right)+\frac{1}{\sqrt{2}}\left(|0\rangle-|1\rangle\right)\right), \\ &=\frac{1}{2}\left(2|0\rangle\right), \\ &= |0\rangle. \end{aligned} \ \ \ \ \ (8)


\displaystyle H\frac{1}{\sqrt{2}}\left(|0\rangle-|1\rangle\right)=|1\rangle. \ \ \ \ \ (9)

Finally there is the only two-qubit gate we will need, the controlled not (CNOT) gate. This takes two qubits; if the first (the control) is {|0\rangle}, it leaves the whole state unchanged:

\displaystyle CNOT\left(|0\rangle |0\rangle\right)=|0\rangle |0\rangle, \ \ \ \ \ (10)

\displaystyle CNOT\left(|0\rangle |1\rangle\right)=|0\rangle |1\rangle. \ \ \ \ \ (11)

If the control qubit is {|1\rangle} however then CNOT flips the target:

\displaystyle CNOT\left(|1\rangle |0\rangle\right)=|1\rangle |1\rangle, \ \ \ \ \ (12)

\displaystyle CNOT\left(|1\rangle |1\rangle\right)=|1\rangle |0\rangle. \ \ \ \ \ (13)
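The four gates can be written down as small matrices acting on lists of amplitudes. The following is a minimal sketch in plain Python. The names `X`, `Z`, `HAD`, `CNOT` and `apply` are chosen here for illustration, and two-qubit amplitudes are assumed to be ordered as {|00\rangle,|01\rangle,|10\rangle,|11\rangle} with the control qubit first.

```python
import math

S = 1 / math.sqrt(2)

X = [[0, 1], [1, 0]]        # sigma_x: flips |0> and |1>, equations (2)-(3)
Z = [[1, 0], [0, -1]]       # sigma_z: flips the sign of |1>, equations (4)-(5)
HAD = [[S, S], [S, -S]]     # Hadamard, equations (6)-(7)
CNOT = [[1, 0, 0, 0],       # control is the first qubit, equations (10)-(13)
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def apply(gate, state):
    """Multiply a gate (list of rows) into a state (list of amplitudes)."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]

ket0, ket1 = [1, 0], [0, 1]

plus = apply(HAD, ket0)     # (|0> + |1>)/sqrt(2)
minus = apply(HAD, ket1)    # (|0> - |1>)/sqrt(2)

# Applying the Hadamard a second time undoes it, as in (8) and (9),
# up to floating-point rounding.
back0 = apply(HAD, plus)
back1 = apply(HAD, minus)
print(back0, back1)

# CNOT flips the target only when the control is |1>.
print(apply(CNOT, [0, 0, 1, 0]))  # input is |10>
```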

3. The superdense coding protocol

Let’s see how we can encode two bits of information in a single qubit. This time, Alice and Bob start off with a pair of entangled qubits:

\displaystyle |\Psi\rangle_{AB}=\frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B+|1\rangle_A|1\rangle_B\right). \ \ \ \ \ (14)

In the equation above, {|0\rangle_A} represents Alice’s qubit being {|0\rangle}. Because this system is entangled, Alice’s and Bob’s states are intrinsically linked. This is best thought of as a single bipartite system rather than two individual qubits, and so local operations on Alice’s state will affect the state {|\Psi\rangle_{AB}} of the system as a whole.

Suppose Alice has two classical bits to encode, {\alpha} and {\beta}, each of which takes value either {0} or {1}. She encodes the first bit in the parity of her and Bob’s states, i.e. whether they are the same or different. If {\alpha} is {0} she does nothing, and so from (14) Alice’s and Bob’s qubits will be the same. If {\alpha} is {1} she applies a {\sigma_x} gate to her state, flipping it and resulting in the state

\displaystyle \sigma_{x,A}|\Psi\rangle_{AB}=\frac{1}{\sqrt{2}}\left(|1\rangle_A|0\rangle_B+|0\rangle_A|1\rangle_B\right). \ \ \ \ \ (15)

Thus her and Bob’s qubits will always be measured to be opposite.

Alice encodes her second bit {\beta} in the phase between the two states in the superposition. If {\beta} is {0} she again does nothing, however if {\beta} is {1} she applies the {\sigma_z} gate to her state, which will result in a minus sign between the two states.

As we mentioned before, even though Alice is applying these operators locally to her state, the system is an entangled bipartite state, and so we can think of her as applying global operators {\left(\sigma_{i,A}\otimes I_B\right)}, Pauli operators tensored with the identity, to the whole system. After Alice’s operations, if {\alpha=0} the global state will be

\displaystyle |\Psi\rangle_{AB}=\frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B\pm|1\rangle_A|1\rangle_B\right), \ \ \ \ \ (16)

and if {\alpha=1} the global state will be

\displaystyle |\Psi\rangle_{AB}=\frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B\pm |1\rangle_A|0\rangle_B\right), \ \ \ \ \ (17)

where in both cases the sign is positive if {\beta=0}, and negative if {\beta=1}. Again we note that {\alpha} is encoded in the parity, i.e. whether Alice’s and Bob’s qubits are the same or different, and {\beta} in the phase between the two terms of the superposition. This phase is the new degree of freedom which we get from entanglement.
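The reason Bob will be able to tell the four messages apart is that the four states in (16) and (17) are mutually orthogonal. Here is a small sketch that builds them and checks this. The helper names `on_alice`, `encode` and `inner` are introduced here for illustration, and amplitudes are assumed to be ordered {|00\rangle,|01\rangle,|10\rangle,|11\rangle} with Alice’s qubit first.

```python
import math

S = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def on_alice(gate, state):
    """Apply a single-qubit gate to Alice's qubit only (gate tensor identity)."""
    return [sum(gate[a][k] * state[2 * k + b] for k in range(2))
            for a in range(2) for b in range(2)]

def encode(alpha, beta):
    """Alice's encoding: sigma_x if alpha is 1, then sigma_z if beta is 1."""
    state = [S, 0, 0, S]            # the shared state (14)
    if alpha:
        state = on_alice(X, state)
    if beta:
        state = on_alice(Z, state)
    return state

def inner(u, v):
    # all amplitudes here are real, so no complex conjugation is needed
    return sum(x * y for x, y in zip(u, v))

messages = [(a, b) for a in (0, 1) for b in (0, 1)]
states = {m: encode(*m) for m in messages}

for m in messages:
    print(m, states[m])

# Gram matrix of overlaps: ones on the diagonal, zeros elsewhere (up to rounding).
gram = [[inner(states[m], states[n]) for n in messages] for m in messages]
```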

Alice then sends her single qubit to Bob, who now possesses both qubits of the bipartite system. Even though Alice has only transmitted a single qubit, because their states were entangled Bob may recover both of the operations that Alice performed. To do this Bob performs the following steps:

  1. To measure the parity Bob applies the CNOT gate on the system, using Alice’s qubit as the control. If {\alpha=0}, this will send (16) to

    \displaystyle \begin{aligned} CNOT_A|\Psi\rangle_{AB} &=\frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B\pm|1\rangle_A|0\rangle_B\right), \\ &=\frac{1}{\sqrt{2}}\left(|0\rangle_A\pm|1\rangle_A\right)|0\rangle_B, \end{aligned} \ \ \ \ \ (18)

    and if {\alpha=1} this will send (17) to

    \displaystyle CNOT_A|\Psi\rangle_{AB}=\frac{1}{\sqrt{2}}\left(|0\rangle_A\pm|1\rangle_A\right)|1\rangle_B. \ \ \ \ \ (19)

    Bob could now deterministically read out the value of {\alpha} simply by performing a measurement on his qubit!

  2. To measure the phase, Bob applies the Hadamard gate to Alice’s qubit. Looking at the two equations above, we see that regardless of Bob’s qubit, Alice’s is in the superposition

    \displaystyle \frac{1}{\sqrt{2}}\left(|0\rangle_A\pm|1\rangle_A\right), \ \ \ \ \ (20)

    where the sign is positive if {\beta=0} and negative if {\beta=1}. In the former case the Hadamard gate will send this to {|0\rangle_A}, and in the latter to {|1\rangle_A}.

We can see then that after this protocol, Bob holds both qubits in the state

\displaystyle |\beta\rangle_A|\alpha\rangle_B. \ \ \ \ \ (21)

He may therefore perform a single measurement on the two qubits he possesses, and in doing so learn the value of both bits {\alpha} and {\beta}! Alice thus used one qubit, and one bit of entanglement, to transmit two bits of information to Bob.
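The whole protocol can be simulated in a few lines. This is a sketch rather than a reference implementation: the function names are invented for illustration, amplitudes are ordered {|00\rangle,|01\rangle,|10\rangle,|11\rangle} with Alice’s qubit first, and the final measurement is modelled simply by picking out the one basis state with probability one.

```python
import math

S = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
HAD = [[S, S], [S, -S]]

def on_alice(gate, state):
    """Apply a single-qubit gate to Alice's qubit only."""
    return [sum(gate[a][k] * state[2 * k + b] for k in range(2))
            for a in range(2) for b in range(2)]

def cnot_alice_control(state):
    """CNOT with Alice's qubit as control: swaps the |10> and |11> amplitudes."""
    s00, s01, s10, s11 = state
    return [s00, s01, s11, s10]

def encode(alpha, beta):
    """Alice's step: sigma_x if alpha is 1, then sigma_z if beta is 1."""
    state = [S, 0, 0, S]            # the shared entangled state (14)
    if alpha:
        state = on_alice(X, state)
    if beta:
        state = on_alice(Z, state)
    return state

def decode(state):
    """Bob's step: CNOT, then Hadamard on Alice's qubit, then read both qubits."""
    state = cnot_alice_control(state)
    state = on_alice(HAD, state)
    probs = [abs(amp) ** 2 for amp in state]
    idx = max(range(4), key=lambda i: probs[i])
    # The outcome is deterministic: one basis state carries all the probability.
    assert math.isclose(probs[idx], 1.0, abs_tol=1e-9)
    beta, alpha = divmod(idx, 2)    # Alice's qubit holds beta, Bob's holds alpha
    return alpha, beta

for alpha in (0, 1):
    for beta in (0, 1):
        print((alpha, beta), "->", decode(encode(alpha, beta)))
```

Each of the four messages comes back unchanged, which is the content of (18) to (20).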

4. Discussion

Questions, comments or criticisms? Feel free to leave a comment, either here or on the Reddit thread.

u/RRumpleTeazzer pointed out that this protocol still involves the transmission of two qubits. We could imagine this as Alice first preparing the entangled state {|\Psi\rangle_{AB}}, sending one of the qubits to Bob, and then performing the superdense coding protocol on her remaining qubit before sending this to him as well. So really, this is Alice sending two classical bits via two qubits.

What I think still makes this process surprising from a classical point of view is that all of Alice’s encoding happens after Bob already has the first qubit. They begin by sharing the resource of an entangled state, Alice encodes two classical bits on her qubit, and then sends this to Bob who can decode them both. Of course from the quantum point of view this is perfectly natural; since this is a bipartite entangled state, it is better to think of Alice performing operations on the global state {|\Psi\rangle_{AB}}, rather than on ‘her qubit’. As u/RRumpleTeazzer says, ‘delayed choice coding’ is perhaps an equally good name.

u/NidStyles and u/gabeff asked about experimental implementations of superdense coding. The first implementation was in 1996 (see References) and used photons as qubits, where {|0\rangle} and {|1\rangle} were the horizontal and vertical polarisation states {|H\rangle} and {|V\rangle}. The initial superposition was created using a process called ‘spontaneous parametric downconversion’, where a nonlinear crystal creates pairs of photons whose polarisations are entangled with each other:

\displaystyle |\Psi\rangle=\frac{1}{\sqrt{2}}\left(|H\rangle|H\rangle+|V\rangle|V\rangle\right). \ \ \ \ \ (22)

The problem with this experiment however was that Bob could only distinguish three outcomes among Alice’s four possible messages. These four messages were:

\displaystyle |\Psi^+\rangle=\frac{1}{\sqrt{2}}\left(|H\rangle|V\rangle+|V\rangle|H\rangle\right), \ \ \ \ \ (23)

\displaystyle |\Psi^-\rangle=\frac{1}{\sqrt{2}}\left(|H\rangle|V\rangle-|V\rangle|H\rangle\right), \ \ \ \ \ (24)

\displaystyle |\Phi^+\rangle=\frac{1}{\sqrt{2}}\left(|H\rangle|H\rangle+|V\rangle|V\rangle\right), \ \ \ \ \ (25)

\displaystyle |\Phi^-\rangle=\frac{1}{\sqrt{2}}\left(|H\rangle|H\rangle-|V\rangle|V\rangle\right). \ \ \ \ \ (26)

The experimenters interfered these in such a way that you could distinguish states which were symmetric in interchanging the photons from states which were anti-symmetric. We can see above that {|\Psi^-\rangle} is the only anti-symmetric state (if you swap the two photons this is the only one which picks up a minus sign), and so this one could be immediately read out. For the other three, they passed them through a scheme which could determine if the photons had the same or different polarisations. If they were different, this corresponded to {|\Psi^+\rangle}. If they were the same however it could be either of {|\Phi^+\rangle} or {|\Phi^-\rangle}, with no way of distinguishing them further.
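The symmetry claim is quick to verify. In this sketch each of the four normalised states is stored as a list of amplitudes ordered {|HH\rangle,|HV\rangle,|VH\rangle,|VV\rangle}, and swapping the photons exchanges the middle two entries. The names `bell`, `swap` and `symmetry` are illustrative.

```python
import math

S = 1 / math.sqrt(2)

# Amplitudes ordered as |HH>, |HV>, |VH>, |VV>
bell = {
    "Psi+": [0, S, S, 0],
    "Psi-": [0, S, -S, 0],
    "Phi+": [S, 0, 0, S],
    "Phi-": [S, 0, 0, -S],
}

def swap(state):
    """Exchange the two photons: |HV> <-> |VH>; |HH> and |VV> are unchanged."""
    hh, hv, vh, vv = state
    return [hh, vh, hv, vv]

def symmetry(state):
    swapped = swap(state)
    if swapped == state:
        return "symmetric"
    if swapped == [-amp for amp in state]:
        return "anti-symmetric"
    return "neither"

for name, state in bell.items():
    print(name, symmetry(state))
```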

These difficulties were resolved in a later experiment in 2008 (again see References). In this, the photon pairs were entangled in two degrees of freedom at once, polarisation and orbital angular momentum. This extra degree of freedom allowed the experimenters to distinguish the four possible messages.

Because of the intricacies of the setups, both of these should be seen as more ‘proof of principle’ than scalable methods for quantum communication.

5. References

John Watrous’s Lecture Notes ‘Introduction to Quantum Computing (Winter 2006)’.
See Lecture 3: ‘Superdense coding; quantum circuits; and partial measurements’.

The Wikipedia page on ‘Superdense coding’.

Also check out the original paper:
Bennett, C. H., & Wiesner, S. J. (1992). Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states. Physical Review Letters, 69(20), 2881–2884.

The first experimental implementation was in 1996 using photons as qubits, however in this one Bob could only recover three out of the four possible messages:
Mattle, K., Weinfurter, H., Kwiat, P. G., & Zeilinger, A. (1996). Dense coding in experimental quantum communication. Physical Review Letters, 76(25), 4656–4659.

A newer implementation in 2008 allowed Bob to decode all four messages. This was done using photon pairs entangled in both polarisation and orbital angular momentum:
Barreiro, J. T., Wei, T. C., & Kwiat, P. G. (2008). Beating the channel capacity limit for linear photonic superdense coding. Nature Physics, 4(4), 282–286.

LaTeX and document formatting was done via the amazing tool LaTeX to WordPress.