*by Michael Barr*

The theorem of the title has been known for centuries, perhaps longer, but I believe that Lagrange gave the first proof. When I was a student, I saw a very different (and, in my opinion, harder) proof from the one given here. I don't know who discovered this proof, but I saw it in Alan Baker's little book on number theory.

There is yet another proof that proves and uses unique factorization in the non-commutative ring of quaternionic integers. Warning: the quaternionic integers include (1+i+j+k)/2 and in fact is the ring of all polynomials in that number.

All proofs that I have seen use the identity

(`a`² + `b`² + `c`² + `d`²) (`A`² + `B`² + `C`² + `D`²) =
(`a``A`+`b``B`+`c``C`+`d``D`)² + (`a``B`−`b``A`+`c``D`−`d``C`)² + (`a``C`−`b``D`−`c``A`+`d``B`)² + (`a``D`−`d``A`+`b``C`−`c``B`)²

which implies that the product of two numbers, each of which is a sum
of four squares, is also a sum of four squares. Although this identity
can be verified directly it is worth pointing out that by taking
`u`=`a`−`b`i−`c`j−`d`k and `U`=`A`+`B`i+`C`j+`D`k, this is simply the square of the
identity that says that |`u``U`|=|`u`||`U`|. That is why I often refer to it
as the quaternionic product identity. Anyway the conclusion is that it
suffices to prove for primes. Since every power of 2 is obviously
either a square or a sum of two squares, it suffices to prove that every
odd prime is a sum of four squares.

For later use, we also mention what might be called the complex product identity:

(`a`² + `b`²) (`A`² + `B`²) =
(`a``A`+`b``B`)² + (`a``B`−`b``A`)²

We give here a proof that has the added advantage not only of being
constructive, but actually quite easily implemented as an algorithm. It
is not deterministic in the sense that there are trials, but it is
arranged so that each trial has at least a 50% chance of success. I
assume that the reader is familiar with modular arithmetic, although I
may add an introduction to that in the future. If `a` and `n` are
numbers, we say that the remainder `r` when `a` is divided by `n` is the
*residue* of `a` mod `n` (residue was the old word for what we now
call remainder, but the old usage has hung on in number theory).
Clearly, `a` = `r` (mod `p`) and `r` is the smallest non-negative number
with that property. Closely related is the *absolutely least
residue* defined as whichever of `r` and `r`−`n` is smaller in
absolute value. If `n` happens to be even and `r`=`n`/2, then |`r`|=|`r`−`n`| and you
can take either one, but in order for the phrase to have an unambiguous
meaning, we will take `r`.

So we wish to show that every odd prime is a sum of four squares. Actually, we give two separate arguments, the first that every prime congruent to 1 mod 4 is a sum of two squares and then that every prime congruent to 3 mod 4 is a sum of four squares.

We begin by showing that if `p` is a prime and `p` = 1 (mod 4), then
there is a number `a` such that `a`² = −1 (mod `p`). We say that −1
is a quadratic residue mod `p`. One way to see this is the following.
If `a`, `b`, and `c` are three integers and `p` does not divide `a`,
then `a``b` = `a``c` (mod `p`) implies that `b` = `c` (mod `p`). The reason is that the
conditions imply that `p` divides `a``b`−`a``c` and `p` does not divide `a` so that `p` divides `b`−`c`,
which is the conclusion. Now for any `a` such that 0 < `a` < `p` the numbers
`a`, 2`a`, 3`a`, ..., (`p`−1)`a` are all non-congruent mod `p` and
therefore must be congruent to 1, 2, 3, ..., `p`−1 in some
order. By taking the product of all the integers in the two sets, we
see that `a`^{p−1}(`p`−1)! = (`p`−1)! (mod `p`). Since `p` divides no factor
of (`p`−1)!, it can be canceled to conclude that `a`^{p−1} = 1 (mod `p`)
(Fermat's little theorem).

Next we note that for any `a` not divisible by `p`, if we
let `b`=`a`^{(p-1)/2}, then `b`² = 1 (mod `p`). This means that
`p` divides (`b`²−1) = (`b`−1)(`b`+1) which means that `a`^{(p−1)/2} = `b` = ±1 (mod `p`).

Can `a`^{(p−1)/2}=1 for all `a`? No, it cannot. I will not give all
the details here, but the underlying reason is that arithmetic mod a
prime is really like ordinary arithmetic in nearly all respects. In
particular, the usual argument that a polynomial of degree `d` has at
most `d` distinct roots is valid. If `f`(`x`) is a polynomial, then
`f`(`a`) = 0 (mod `p`) if and only if `x`−`a` divides `f`(`x`), with division taking
place mod `p`. This is not true for non-primes, by the way. For
example, `x`²−1 = (`x`−1)(`x`+1) has four distinct roots mod 8 because 2 can divide one of the factors and 4 can divide the other. Getting back to the main argument, the
polynomial `x`^{(p-1)/2}−1 can have at most (`p`−1)/2 roots as can the
other factor of `x`^{p−1}−1 = (`x`^{(p−1)/2}−1) (`x`^{(p−1)/2}+1). Since the
polynomial does have `p`−1 roots (namely, 1, 2, ..., `p`−1), it
follows that exactly (`p`−1)/2 of the factors must satisfy the equation
`x`^{(p−1)/2} = −1 (mod p). Up to now, this argument is valid for all
odd primes. The next step is limited to primes `p`, for which 4 divides `p`−1.

If `a`^{(p−1)/2} = −1 (mod `p`) and we let `b`=`a`^{(p−1)/4}, then
`b`² = −1 (mod `p`), as desired.

So the first step of the process is to find a square root of −1 mod
`p`. We have shown, in fact, that exactly half of the numbers
1, ..., `p`−1 have the desired property. I do not know a deterministic
process for finding such a number, but they appear to be distributed
randomly so you can just try numbers at random until you find one and
each trial has a 50% chance of success. It is not hard to show that
the smallest solution must be a prime so that you can just try primes
2, 3, 5, 7, 11,... until you succeed. Even if you are trying 1000 digit
numbers it is unlikely that you will not find a solution before the
2000th prime, so for computational purposes, it is probably worth
keeping a list of the first 2000 or so primes.

Actually, you can do better than that if a higher power of 2 divides
`p`−1. If 2^{k} divides `p`−1 and we let `m`=(`p`−1)/2^{k}, then as soon as
`a`^{m} ≠ 1 (mod `p`), it follows that one among the numbers
`a`, `a`², `a`⁴ will be a square root of −1 mod `p`. The probability of a
number chosen at random satisfying `a`^{m} = 1 (mod `p`) is only 1 in
2^{k−1}.

Now `a`²+1² is a multiple of `p`, say `a`²+1²=kp. Since `a` <= `p`−1,
we see that `a`²+1 < `p`² so that `k` < `p`. What we will now describe
is a procedure that continues to produce solutions of `a`²+`b`²=`k``p`, but
with smaller values of `k` until `k`=1 and we are through. Assuming we
have such a pair `a` and `b`, let `a'` and `b'` be the absolutely least
residues `a` mod `k` and `b` mod `k`, respectively. This means that
`a'` and `b'` are integers in the range (−`k`/2,`k`/2]. It follows that
`a'`²+`b'`² <= (`k`/2)²+(`k`/2)² = `k`²/2. Also, `a'`²+`b'`² =
`a`²+`b`² = 0 (mod `k`) since `a` = `a'` (mod `k`) and `b` = `b'` (mod `k`).

Thus `a'`²+`b'`² = `m``k` with `m` < `k`. Multiplying, we find that
(`a`²+`b`²)(`a'`²+`b'`²) = `m``k`²`p`. Now we have the identity,

(`a`²+`b`²)(`a'`²+`b'`²) =
(`a``a'`+`b``b'`)²+(`a``b'`−`b``a'`)² (*)

Since `a``a'`+`b``b'` = `a`²+`b`² = 0 (mod `k`) and `a``b'`−`b``a'` = `a``b`−`b``a` = 0 (mod `k`)
it follows that both terms on the right hand side
of (*) are divisible by `k` and so

((`a``a'`+`b``b'`)/`k`)² + ((`a``b'`−`b``a'`)/`k`)² = `m``p`

and `m` < `k`. It can be shown that the number of reduction steps is
bounded above by the base-2 logarithm of `p`.

In this case, we begin by finding numbers `a` and `b` such that
`a`²+`b`²+1 = 0 (mod `p`). This is the same as `b`² = −1−`a`² (mod `p`).
So we can look at −1−4, −1−9, ..., until one of them has a square
root mod `p`. I don't know if there is a theorem on the subject, but
empirically it seems that for each value of `a` there is about a 50%
chance that −1−`a`² is a quadratic residue. It is not hard to prove
that there is at least one solution to `y`² = −1−`x`² (mod `p`). To see
this, first observe that the squares 0², 1², ..., ((`p`−1)/2)² have
all distinct residues mod `p` since `a`² = `b`² (mod `p`) implies that
`p` divides (`a`²−`b`²) = (`a`+`b`)(`a`−`b`) and hence that `a` = `b` (mod `p`) or `a` =
−`b` (mod `p`), which is impossible for two distinct numbers in the range
[0,(`p`−1)/2]. Thus those squares have (`p`+1)/2 distinct residues.
For similar reasons, the numbers −1−0², −1−1², ..., −1−((`p`−1)/2)²
have (`p`+1)/2 distinct residues. Since there are at most `p`, the two
sets of residues overlap and there is at least one solution to
`x`²+`y`²+1 = 0 (mod `p`). This proves that there is at least one
solution, but if there were only one, finding it would be like looking
for a very small needle in a very large haystack (assuming `p` is very
large). In fact, trials convince me that −1−`x`² is congruent to a
square about half the time and there are very many solutions. In fact,
it can be shown that −2=−1−1² is a quadratic residue whenever
`p` = 3 (mod 8), which already covers half the cases of primes that
are congruent to 3 mod 4. Similar criteria exist for −5=−1−2²,
−10=−1−3², etc. So it looks as if you can simply try successively
−2, −5, −10, ... until you find one that works. (Incidentally, −1
can never be a quadratic residue for these primes, a very easy exercise.)

For our purposes, however, it is not enough to find `x`, you also need
`y`. Fortunately, there is for primes that are congruent to 3 mod 4,
there is a very simple way of constructing a square root of any number
that is a quadratic residue. There is, to my knowledge, no
correspondingly easy way of doing this for primes congruent to 1 mod 4.
Consider a number `a`, not divisible by `p`. Suppose that `a` = `b`² (mod `p`). Then

`a`^{(p+1)/2} = `b`^{p+1} = `b`²`b`^{p−1} = `b`² = `a` (mod `p`)

from which we see that (`a`^{(p+1)/4})² = `a` (mod `p`) and hence
`a`^{(p+1)/4} is a square root of `a` mod `p` if it has one. (Otherwise
it is a square root of −`a`, as a matter of interest.)

One we have found `x` and `y` so that `x`²+`y`²+1 = 0 (mod `p`), we can
replace them by absolutely least residues and suppose that |`x`| < (`p`−1)/2 and |`y`| < (`p`−1)/2 from which one sees that `x`²+`y`²+1 < `p`²/2. Thus we get initial values of `a`, `b`, `c`, `d` for which
`a`²+`b`²+`c`²+`d`² = `k``p` with `k` < `p`.

We will show how, if `k` > 1, to find a new solution with a smaller
value of `k`. The argument is similar to, but harder than, the
previous case of two squares. We consider the case of `k` odd and even
separately, for reasons we make clear. Suppose `k` is odd. In that
case, let `a'`, `b'`, `c'`, and `d'` be the
absolutely least residues of `a`, `b`, `c`, and `d` mod `k`.
Then each of |`a'`|, |`b'`|, |`c'`| and
|`d'`| is <= (`k`−1)/2 so that

`a'`²+`b'`²+`c'`²+`d'`² <= (`k`−1)²

while

`a'`²+`b'`²+`c'`²+`d'`² = `a`²+`b`²+`c`²+`d`² = 0 (mod `k`)

Hence `a'`²+`b'`²+`c'`²+`d'`² = `m``k`
with `m` < `k`. Now we have that

`m``k`²`p` = (`a`²+`b`²+`c`²+`d`²) (`a'`²+`b'`²+`c'`²+`d'`²) = (`a``a'`+`b``b'`+`c``c'`+`d``d'`)²
+ (`a``b'`−`b``a'`+`d``c'`−`c``d'`)² + (`a``c'`−`c``a'`+`b``d'`−`d``b'`)² + (`a``d'`−`a'``d`+`c``b'`−`b``c'`)²
and, in a way similar to the first case, each term is divisible by `k`,
from which we conclude that

`m``p` = ((`a``a'`+`b``b'`+`c``c'`+`d``d'`)/`k`)² + ((`a``b'`−`b``a'`+`d``c'`−`c``d'`)/`k`))²
+ ((`a``c'`−`c``a'`+`b``d'`−`d``b'`)/`k`)² + ((`a``d'`−`a'``d`+`c``b'`−`b``c'`)/`k`)²

which completes the construction.

When `k` is even, this argument fails specifically in the case that
`a'` = `b'` = `c'` = `d'` = `k`/2 since then `a'`²+`b'`²+`c'`²+`d'`² = `k`². Although we
could give a proof that works just in that case, it is just as easy and
computationally better to give an argument that halves `k`
even. We observe that `a`²+`b`²+`c`²+`d`² = `k``p` is even so that
none or two or all four of `a`, `b`, `c` and `d` are even. In any case
we can arrange them so that `a`±`b` and `c`±`d` are even and then

((`a`+`b`)/2)² + ((`a`−`b`)/2)² + ((`c`+`d`)/2)² + ((`c`−`d`)/2)² = (`k`/2)`p`

which completes the proof.

Reproduced here with permission of the author.

Click here to open an applet that finds the decomposition of positive integer numbers as a sum of up to four squares using this method.

*by Dario Alejandro Alpern*

The methods shown above work only when the complete prime factorization of the number is known. If this is not the case a modification is required.

Since it is much easier to test whether a number is prime than factoring it, the following method suggests itself:

If the number does not have the form 4^{r} (8`k`+7) it can be expressed as a sum of three squares. In this case subtract a square to the original number such that the difference is a prime of the form 4`k`+1.
Then, using the method explained above we find the decomposition of the prime in a sum of two perfect squares.
If the number cannot be expressed as a sum of three squares subtract two squares.

Unluckily this simple method does not work: when a number is multiple of 4, we cannot find a difference of the form 4`k`+1 by subtracting one or two squares.
Also, when the number has the form 8`k`+3 it can be expressed as a sum of three squares, but subtracting a square we cannot obtain a number of the form 4`k`+1.
It is not difficult to change the idea so it works.

The method can be stated as follows:

- Express the original number in the form
`n`= 4^{r}`s`is not multiple of 4. - Compute the integer square root of
`s`. If this number squared equals`s`, the original number is a perfect square, so skip the following two steps. - If
`s`< 7 (mod 8) the number can be expressed as a sum of three squares. Subtract a square and check whether the result has the form 2^{m}(4`k`+1) where`m`is a non-negative integer. If not, select another square and repeat. Finally check if the number 4`k`+1 is prime. If not, select another square and repeat. The resulting difference, 2^{m}(4`k`+1), can be expressed as a sum of two squares as explained above. In the sequence of squares, the first attempt should be zero in order to minimize the number of summands in the decomposition. - Otherwise, the number must be expressed as a sum of four squares. Subtract two perfect squares from
`s`such that the difference has the form 2^{m}(4`k`+1) and proceed as in the previous step. - At this moment
`s`=`a`² +`b`² +`c`² +`d`², where some variables can be equal to zero. Multiply by 4^{r}to get`n`= (2^{r}`a`)² + (2^{r}`b`)² + (2^{r}`c`)² + (2^{r}`d`)².

It is not necessary to use a certified-prime test in the loop used in the method. This is because we are satisfied if we find a number `q` such that `q`² = −1 (mod (4`k`+1)).
This number `q` is used in the algorithm that computes the sum of two squares.

A disadvantage of this method is that a number that is a product of big primes of the form 4k+1 will be expressed as a sum of three squares instead of two.

Click here to open an applet that finds the decomposition of positive integer numbers as a sum of up to four squares using this method.

Last updated February 5th, 2020.