# Six Impossible Things: Issue 1

**January 17, 2024.** *In the first issue of 6IT: the mathematics of supervising children; unreliably clairvoyant demons; why the sky is blue
(and why it isn't); a Poisson process for primes; synchronizing
fireflies; and quantum error correction from Penrose tilings.*

*Through the Looking Glass*

Inspired by one of my heroes, John Baez, I'm planning to blog about "finds" in the hard sciences on a semi-regular basis. These may be papers, favourite theorems, new discoveries, or shower thoughts; I'm hoping to be brief and nerdily enthusiastic rather than thorough and technically deep. Either way, I hope it's fun to read. Let's see how it goes!

## Contents

- The obstructed supervision theorem
- Zermelo's demon and the axiom of choice
- Why the sky is blue
- A game of primes
- The firefly effect
- Quantum error correction from Penrose tilings

## 1. The obstructed supervision theorem

Suppose you have to guard an art gallery with $n$ straight walls. How many guards do you need to ensure the artworks are visible at all times? There is a beautiful proof, due to Steve Fisk and presented in *Proofs from THE BOOK* (a book collecting demonstrations of which God themself might approve; whether an omniscient being needs proofs is another question!), that $\lfloor n/3\rfloor$ guards are sufficient, and examples show this number is in general necessary. We'll start with the examples. Take $m$ triangles and connect them with a corridor, forming a gallery of $n=3m$ walls. For $m = 3$, this gives:

Each triangle requires a guard to see the recessed tip, but one of these guards can survey the whole corridor from the bottom left. Thus, $m = n/3$ guards are needed. For $n$ not a multiple of $3$, we can simply remove one or two walls from the above construction. Now let's turn to Fisk's argument for sufficiency.

##
*Proof.*$\,$ (In short: triangulate, 3-color, and place
guards at one color.)

Consider the graph formed by the $n$ walls of the gallery, with walls as edges and the points where they meet as vertices. Triangulate this graph, i.e. introduce $n - 3$ new straight edges that split the interior into $n - 2$ triangles. We will show that we can color the vertices of this graph with three colors so that no two adjacent vertices share a color.

Note that, since there are no internal vertices, we can assign $0$ and $1$ in an alternating fashion to the triangular faces. If we could not assign bits this way, it would mean an odd number of triangular faces meet at a point, which is only possible with internal vertices. We color the faces assigned $0$ with three colors (G, R, B) in clockwise order, and the faces assigned $1$ with the colors in counterclockwise order, with the first face determining the coloring of the second face, and so on. The coloring clearly exists and is unique up to the choice of triangulation and the ordering on the first face. Running around the polygon, we have $n$ vertices of three colors, the rarest of which is assigned to at most $\lfloor n/3 \rfloor$ vertices; if it were assigned to more, there would be more than $n$ vertices! We choose this rare color to be the location of our guards. Since each triangle has a vertex of this color, each triangle has a guard watching it, and we are done. The required triangulation doesn't obviously exist for non-convex polygons. To show it does, we first need the existence of a convex vertex, one with interior angle less than $\pi$, like the leftmost corner above. Without such a vertex, the polygon could never close! So, let's draw this vertex: we can either triangulate by drawing an edge between its two adjacent vertices, or, if a vertex pokes inside that triangle, by drawing an edge to the offending poker. Then we split off the triangle and use induction on the rest! $\blacksquare$

I became interested in the art gallery theorem because of a real-world
problem: supervision in out-of-school-hours (OSH) care. Instead of
guards watching over paintings, there are educators watching over
children, who, in contrast to paintings, are mobile and
unpredictable. The requirement of full visibility makes more sense!
My sister runs an OSH facility with an almost comically elaborate
layout.
The standard art gallery theorem does not apply due to the presence of
*internal polygons*.
Let us call these internal walls or polygons *holes*.
With a little thought, one can extend the art gallery theorem to
the following: *a gallery with $n$ walls and $g$ internal holes can
always be guarded by $\lfloor (n+2g)/3 \rfloor$ guards.*

##
*Proof.*$\,$ (In short: connect a hole to the outside with two walls.)

For each hole, draw an edge from one of its vertices to a vertex on the outer polygon.

If each such edge is viewed as *two* walls, we now have a (non-convex) polygon of $n + 2g$ walls, where $g$ is the number of holes, and we can apply the original art gallery theorem. $\blacksquare$

This was first proved by O'Rourke (see Chapter 5 of his book). Examples of necessity are much harder to come by. We can modify our earlier examples by adding internal walls, which we view as degenerate polygons of $2$ walls. For instance, the following layout, with $n = 7$ and $g = 1$, requires $(n + 2g)/3 = 3$ educators:

Adding more slits and recesses, we can extend this to higher values of $n$ and $g$.
But these degenerate polygons seem like cheating; after all, they can
only hide a child of measure zero from a well-positioned supervisor.
If we require genuine polygons, it becomes much harder to saturate the bound; for
general $n$ and $g$, the problem is unsolved. There is a full
characterization of such polygons for a *single* hole (due to Shermer)
that is very fiddly, and gives rise to constructions like the following
($n=40$ walls and $14$ guards):

This would not look out of place in a basilica or medieval fortress. It is remarkable that a few walls make the task so much harder; then again, perhaps we shouldn't be surprised that trying to manage groups of children gives rise to unsolvable problems.
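Fisk's proof is constructive enough to run. Here is a minimal Python sketch (the function name and input format are my own inventions): given a triangulation of a simple polygon as vertex triples, it spreads a 3-coloring from one triangle to its neighbours across shared edges, then places guards at the rarest color.

```python
from collections import defaultdict, deque

def fisk_guards(n, triangles):
    """3-color a triangulated n-gon (Fisk's argument) and return the
    vertices of the rarest color class, which has at most n//3 members."""
    # Map each shared edge to the triangles containing it (the dual tree).
    edge_to_tris = defaultdict(list)
    for t, tri in enumerate(triangles):
        for i in range(3):
            edge_to_tris[frozenset((tri[i], tri[(i + 1) % 3]))].append(t)

    # Color the first triangle arbitrarily, then spread by BFS: a
    # neighbouring triangle shares two colored vertices, so its third
    # vertex gets the unique remaining color (colors 0+1+2 sum to 3).
    color = dict(zip(triangles[0], range(3)))
    queue, seen = deque([0]), {0}
    while queue:
        t = queue.popleft()
        for i in range(3):
            edge = frozenset((triangles[t][i], triangles[t][(i + 1) % 3]))
            for u in edge_to_tris[edge]:
                if u not in seen:
                    seen.add(u)
                    missing = set(triangles[u]) - color.keys()
                    if missing:
                        (v,) = missing
                        color[v] = 3 - sum(color[w] for w in triangles[u] if w != v)
                    queue.append(u)

    # Guards go at the rarest color, which watches every triangle.
    classes = defaultdict(list)
    for v, c in color.items():
        classes[c].append(v)
    return min(classes.values(), key=len)

# A hexagon, fan-triangulated from vertex 0.
hexagon = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]
guards = fisk_guards(6, hexagon)
```

On the fan-triangulated hexagon, this places a single guard at the fan's apex, comfortably under the $\lfloor 6/3 \rfloor = 2$ bound; every triangle contains a guard vertex by construction.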

### Further reading

- *Proofs from THE BOOK* (1998), Martin Aigner and Günter Ziegler.
- *Art Gallery Theorems and Algorithms* (1987), Joseph O'Rourke.

## 2. Zermelo's demon and the axiom of choice

A recent paper by Baumeler, Dakić, and Del Santo introduced me to a curious consequence of the axiom of choice (AC). Recall that the AC states that, for any collection of nonempty sets $\{S_i: i\in I\}$, we can choose a set of representatives $\{s_i: i\in I\}$ where each $s_i \in S_i$, i.e. one element from each set. Though the AC is very intuitive (we can choose elements from nonempty sets), it has very counterintuitive consequences, such as the Banach-Tarski paradox.

This splits a perfect sphere into five
wildly complicated pieces, and reassembles them to form *two* spheres of
the same size!
In his autobiography, Richard Feynman refutes Banach-Tarski (or, as he
mockingly calls it, "So-and-so's theorem of immeasurable measure") on
the grounds that no *physical* sphere, composed of atoms, can be split
up this way.
The atoms prevent the arbitrarily fine divisions needed to make the
pieces non-measurable!
One could view this as an argument against the AC, but I think the
point is subtler: since we cannot carry out the Banach-Tarski
"experiment" in reality, we cannot use physics to reason about the axioms of set
theory. The AC could be true, it could be false; physics probably has
nothing to do with it.

Baumeler, Dakić, and Del Santo disagree. They argue that the AC may have physical implications, using a bizarre corollary discovered by Gabay and O'Connor in 2004:

*Infinitely many prisoners stand on the number line, each wearing a red or blue hat. Every prisoner sees the hats in front of them but not their own, and knows their own position. If all prisoners simultaneously guess their own hat color, they have a strategy ensuring that only a finite number are wrong*.

##
*Proof.*$\,$ (Prisoners see eventually identical
sequences, and guess backwards with AC.)

Consider the set $S = \{\text{R}, \text{B}\}^{\mathbb{N}}$ of infinite sequences of red and blue hats. Two sequences $s, s' \in S$ are eventually equal if they have the form $$ s = s_0 t, \quad s' = s'_0 t $$ for some finite sequences $s_0, s_0'$ of equal length, and $t \in S$. We let $s(n)$ denote the $n$th element of sequence $s$. It's easy to see that being eventually equal is an equivalence relation, with equivalence classes $S_i$ for some indexing set $i \in I$. By the AC, the prisoners can select a representative $s_i$ from each class.

Once placed on the number line and assigned hats, any two prisoners will see sequences that are eventually equal, with $t$ the sequence seen by the prisoner further along (see above). All prisoners can thus determine the equivalence class $S_i$ of the actual sequence of hats, choose the corresponding representative $s_i$, and output the guess $s_i(n)$ for their position $n$. Since the actual sequence and the guess $s_i$ are eventually equal, at most finitely many guesses are wrong. $\blacksquare$

There are already a number of Feynman-like objections we could make
here; for instance, the prisoners need an infinitely large memory to store
the representatives $s_i$.
But even if we grant them these powers, what can they *do* with them?
Infinite memory is reminiscent of Maxwell's demon and its relatives, so let us
replace the prisoners with *Zermelo's demon*, named after Ernst
Zermelo, originator of the AC.
Prior to his work on set theory, Zermelo invented a demon to sit around
and wait for Poincaré recurrences.
These recurrences reset a system to its initial state, thereby spontaneously reducing entropy and violating
the second law of thermodynamics. We suppose Zermelo's demon got
bored (a recurrence takes much longer than the age of the universe)
and decided, like Zermelo, to turn its efforts to mathematics.

The demon uses AC and has a set of lucky coins; any time it
flips one, it learns the infinitely long history of prior coin
outcomes (analogous to prisoners looking backwards) and the number of
flips until the coin disintegrates (analogous to a prisoner learning their
number). Like the prisoners, the demon can determine the equivalence
class $S_i$ of the history of flips, and choose the representative
$s_i$ with the AC. But since *any* sequence of remaining flips gives
rise to the same $s_i$, it's clear that the demon learns nothing about them!

The fact that Zermelo's demon cannot predict future coin flips makes
the AC less physically disturbing. But there is another variant, due to
Hardin and Taylor, which shows the demon *can* look into the future:
shown the values of a function $f$ on $(-\infty, t)$, the demon can
guess an extension, and for all but countably many $t$, the guess
agrees with $f$ on some interval beyond $t$. I follow
this nice exposition by Eric Moorhouse.

##
*Proof.*$\,$ (The demon uses the prisoner's trick
with functions rather than sequences.)

To prove this, the demon first constructs a well-ordering $(\prec, \mathbb{R}^{\mathbb{R}})$ on the set of real functions using the AC. (A well-ordering is a linear order $(\prec, A)$ such that every nonempty subset of $A$ has a least element; we use the AC to pick the least element, then its successor, and so on.) Consider the equivalence relation on functions $f: \mathbb{R}\to \mathbb{R}$ given by $$ f \sim_t g \quad \Longleftrightarrow \quad f(x) = g(x) \text{ for all } x < t. $$ The well-ordering allows the demon to define the representative $ \hat{f}_t = \min_{\prec} \{g : f \sim_t g\}. $ The demon then guesses that $\hat{f}_t$ is the extension of $f$ beyond $t$.

This extrapolation will fail on some set of "unpredictable" times, $$ U = \{t \in \mathbb{R} : \hat{f}_t \neq \hat{f}_{t'} \text{ for all } t' > t\}. $$ These are times at which the guess $\hat{f}_t$ fails to agree with $f$ beyond $t$, or equivalently, with any later guess $\hat{f}_{t'}$. Our goal now is to show the set $U$ is countable. Observe that $U$ cannot contain an infinite descending sequence of times, or equivalently a descending sequence of functions, $$t_1> t_2 > \cdots \quad \Longleftrightarrow \quad \hat{f}_{t_1} \succ \hat{f}_{t_2} \succ \cdots, $$ since $\hat{f}_{t_{i}} \sim_{t_{i+1}} f$, but $\hat{f}_{t_{i+1}}$ is defined as the minimal function satisfying this, so we must have $ \hat{f}_{t_{i}} \succ \hat{f}_{t_{i+1}} $. Such an infinite descending sequence of functions is impossible, since $\prec$ is a well-order. Since $U$ has no infinite descending sequence of times, for any $t \in U$, the set $U_t = U \cap (t, \infty)$ either has a least element $\min U_t$, or is empty, in which case $t = \max U$. In the first case, we can find a rational number $\iota(t)$ between $t$ and $\min U_t$; in the second, we just choose some rational $\iota(t) > t$. The map $\iota: U \to \mathbb{Q}$ is an injection, and since $\mathbb{Q}$ is countable, $U$ is countable. $\blacksquare$

The demon is clairvoyant on almost all intervals $(-\infty, t)$, but with no
guarantees about how long that clairvoyance can last. It would be
unwise to follow its investment advice! As with the lucky coins, the
demon's powers are useless or unreliable. In fact, all the variants discussed in Hardin and
Taylor's monograph
have similar fine print; this illustrates the fundamentally
nonconstructive nature of the AC, where we can build something, but we
can't say when, where, or for how long.
You might ask *what* it's building. I think the simplest answer
is a sort of error-correcting code, robust against any finite (or countable,
or "scattered", as per Definition 7.1.1 of
Hardin and Taylor)
number of errors. It is not useful, since what it is good at
transmitting or predicting is the equivalence class
representative chosen by the AC, not the other points in the
equivalence class that we care about for our application.
I think physics remains safe from Zermelo's demon.

### Further reading

- "The Axiom of Choice and the No-Signaling Principle" (2023), Ämin Baumeler, Borivoje Dakić, and Flavio Del Santo.
- â€śA Peculiar Connection between the Axiom of Choice and Predicting the Futureâ€ť (2008), Christopher Hardin and Alan Taylor.
- *The Mathematics of Coordinated Inference: A Study of Generalized Hat Problems* (2013), Christopher Hardin and Alan Taylor.

## 3. Why the sky is blue

Why is the sky blue? The conventional answer is that light bounces off air molecules, and high frequencies bounce more. Blue has a high frequency, so voilà! But actually, violet has a higher frequency, so why isn't the sky violet? It turns out the sun emits more blue light, so surely that does it? Do the calculation and you find it doesn't: the sky should be green. So what gives? We are missing physiology. It is the eye, and not the sky, that is blue.

To carefully determine the perceived colour, we need to go and look up luminous efficiency functions and do some integrals. This is hard work, and I am lazy. Luckily, there is a quick hack to get the right answer! (Thanks to Jude Arnold for collaborating on an earlier version of this calculation.) The idea is to assume the eye adapted to receive sunlight, so we can take its sensitivity curve $V$ to match the spectral intensity $I_{\odot}$ of the sun. Direct sunlight makes up about $85\%$ of the total power at midday, with the rest being the scattered or diffuse light itself, of intensity $I_{\text{diff}}$. So a better approximation is \(V = \alpha I_{\odot} + (1-\alpha) I_{\text{diff}}\) for $\alpha=0.85$. Let's see what color this predicts!

##
*Calculation.*$\,$ (Take a product of spectra, differentiate, and set to zero.)

For frequency $\nu$ and temperature $T$, the blackbody spectrum is proportional to \[ I_{\odot}(\nu) \propto \frac{\nu^{3}}{e^{h\nu/k_\text{B}T_{\odot}} - 1} \propto \frac{x^3}{e^x-1} \] for a dimensionless variable $x = h\nu/k_\text{B}T_{\odot}$. Rayleigh scattering increases with frequency as $\nu^{4}$, so $I_{\text{diff}}(\nu) \propto \nu^4 I_{\odot}(\nu)$. Thus, for weight $\alpha$, the sensitivity curve goes as \[ V(\nu) = \alpha I_{\odot}(\nu) + (1- \alpha)I_{\text{diff}}(\nu) \propto \left[\alpha + (1-\alpha) x^4\right] \frac{x^3}{e^x - 1}, \] since using the dimensionless variable ensures the constants of proportionality are equal. The perceived spectrum of the sky is $V\cdot I_{\text{diff}}$. To find the peak of this spectrum, we differentiate with respect to $x$ and set to zero, yielding the equation \[ (x - \beta)e^{x} +\beta = 0 \] for $\beta = 7 - 2\alpha$. This is transcendental, but has approximate solution $x \approx \beta(1- e^{-\beta})$. The corresponding frequency and wavelength peaks are then \[ h\nu^* = \beta (1- e^{-\beta}) k_\text{B}T_{\odot} \quad \Longrightarrow \quad \lambda^* = \frac{c}{\nu^*} = \frac{hc}{\beta (1- e^{-\beta}) k_\text{B}T_{\odot}}. \] If we now plug in $hc/k_\text{B} = 14.39 \text{ mm K}$, $\alpha = 0.85$, and $T_{\odot} = 5772 \text{ K}$ for the surface temperature of the sun, we get \[ \lambda^* = \frac{14.39 \text{ mm}}{\beta (1- e^{-\beta}) \cdot 5772} \approx 473 \text{ nm}, \] as claimed. $\blacksquare$
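As a sanity check on the arithmetic, here is a short Python sketch (function name my own) that solves the transcendental equation by bisection, rather than via the approximate closed form, and converts the peak back to a wavelength using the constants from the text:

```python
import math

def peak_wavelength(alpha=0.85, T=5772.0):
    """Solve (x - beta)e^x + beta = 0 for its nontrivial root, then
    convert the dimensionless peak x = h*nu/(kB*T) to a wavelength."""
    beta = 7 - 2 * alpha
    f = lambda x: (x - beta) * math.exp(x) + beta
    # x = 0 is always a (trivial) root; bracket [1, 10] isolates the
    # physical one, where f changes sign from negative to positive.
    lo, hi = 1.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    hc_over_kB = 14.39e6  # hc/kB in nm*K
    return hc_over_kB / (x * T)

lam = peak_wavelength()
```

The bisection lands on $x \approx 5.27$, in agreement with the approximation $x \approx \beta(1 - e^{-\beta})$, and gives a peak wavelength of roughly $473$ nm.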

For reference, the real answer is $\lambda = 475 \text{ nm}$. Although the adaptive hypothesis is wrong (the photopic luminous efficiency function falls off much faster than a blackbody distribution; the evolutionary hypothesis isn't bad, but ignores the physiological constraints on the eye, which favor a narrowband design), it produces an estimate astoundingly close to the real color of the sky: the mark of a good hack! To get a sense of the discrepancy, we can map these frequencies to color space using the CIE 1931 model, which implements all that ocular physiology we ignore. It's close, but a few nanometers can still make a perceptible difference.

This concludes our discussion of daylight. But the sky isn't only blue during the day; it is sometimes blue as twilight falls, a fact I noticed a few weeks ago on an evening drive. The blue lies below the pure black of the night sky, and above the orange and yellow of sunset. The black makes sense because, far enough from the horizon, all the light has been scattered. (Go further still, and the sky is dark simply because it is in the earth's shadow.) Sunset is yellow and orange because most of the blue light has been scattered out by the time it arrives at the eye. But the band in between is a puzzle!

The explanation turns out to involve new physics: the absorption of red and yellow light by atmospheric ozone ($\text{O}_3$), a process discovered by James Chappuis in 1880 and applied to twilight by Edward Hulburt in the 1950s. Ozone molecules absorb photons, get excited and start wobbling, then split into oxygen molecules ($\text{O}_2$) and/or atoms ($\text{O}$). Wobbling is a continuous phenomenon, leading to a band of frequencies that can be absorbed. However, there are two absorption peaks, at $\lambda = 575 \text{ nm}$ and $\lambda = 603 \text{ nm}$:

No wonder the sunset colors disappear so quickly! They're eaten up by the ozone, and what's left over is the blue light not yet scattered. You might wonder if the hole in the ozone layer leads to less blue in the antarctic twilight. As Lee, Meyer and Hoeppe show from painstaking observation at the south pole (in a paper with a wonderful potted history of the optics of twilight), it does not, which raises yet more questions… They conclude that, although ozone is the main factor, aerosol levels, haze, and high-altitude meteorology are also relevant, particularly for the color of the zenith. With atmospheric optics, something is always up in the air.

### References

- *Atmospheric Optics* (2003), Craig Bohren.
- "Atmospheric ozone and colors of the Antarctic twilight sky" (2011), Raymond Lee Jr., Wolfgang Meyer, and Götz Hoeppe.

## 4. A game of primes

I'm working on a research problem where, the more prime factors a number has, the harder the problem becomes. So I began to wonder: how many prime factors does a large number have, on average? Suppose $n$ can be factorized as

\[n = p_1^{m_1} p_2^{m_2} \cdots p_{T}^{m_{T}},\]where each $p_j$ is prime, $m_j > 0$, and $T$ is the number of distinct prime factors. Define

\[\omega(n) = T, \quad \Omega(n) = \sum_{j=1}^T m_j.\]We'll focus on the second function, and try to understand its behaviour. A good place to start is numerics. Let's plot a histogram of the distribution of $\Omega(n)$ in a window (of width $\min\{n, 10^3\}$) around $n_a = 10^{2a}$, for integers $1 \leq a \leq 5$:

We've marked the mode on the $x$ axis, and although the distribution spreads out to the right, it does so very, very slowly. As Carl Pomerance said of a related function, it "has been proved to go to infinity but has never been observed to do so". The distribution itself looks well-behaved; after inspecting a few skewed possibilities, the best visual match is the Poisson distribution:

\[X \sim \text{Poiss}(\lambda), \quad \mathbb{P}(X=k) = e^{-\lambda}\frac{\lambda^k}{k!}.\]This distribution counts the number of times an event occurs, where each occurrence is independent and the average number of occurrences is $\lambda$. It's tempting to view a prime $p$ as being Poisson distributed $\sim \text{Poiss}(\lambda_p)$ over each $n$, with $k$ counting how many times $p$ divides $n$. Since a prime $p$ divides every $p$th number, we expect that $\lambda_p = 1/p$. If these different prime factors are independent of each other, then the total number of factors, on average, is

\[\mathbb{E}_{m\leq n}[\Omega(m)] = \sum_{p\leq n} \lambda_p = \sum_{p\leq n} \frac{1}{p}.\]We can do this sum assuming a few results from analytic number theory.

##
*Proof.*$\,$ (Integrate by parts and use the prime
number theorem.)

For a continuously differentiable function $f(x)$ on $[\alpha, \beta]$, and a sequence $c_n$, we can integrate by parts to obtain Abel's Partial Summation Formula \[ \sum_{\alpha \leq n \leq \beta} c_n f(n) = C(\beta)f(\beta) - \int_\alpha^\beta \text{d}x \, C(x)f'(x), \tag{1}\label{abel} \] where $C(x) = \sum_{\alpha \leq n\leq x} c_n$. The Prime Number Theorem (PNT) states that the number of primes less than $n$, $\pi(n)$, has asymptotic approximation \[ \pi(x) = \sum_{2 \leq n \leq x} \delta_{\mathbb{P}}(n) \sim \frac{x}{\log x} \tag{2}\label{PNT} \] where $\delta_{\mathbb{P}}(n)$ is the indicator function on primes. Setting $c_n = \delta_{\mathbb{P}}(n)$, $f(n) = 1/n$, and noting that $C(x) = \pi(x)$, Abel summation applied to $(\ref{PNT})$ gives \begin{align*} \sum_{p \leq n} \frac{1}{p} & = \frac{\pi(n)}{n} + \int_2^n \text{d}x \, \frac{\pi(x)}{x^2} \sim \frac{1}{\log(n)} + \int_2^n \,\frac{\text{d}x}{x\log x} \sim \log \log n, \end{align*} discarding $1/\log n - \log\log 2 = o(\log\log n)$. $\blacksquare$
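The asymptotic is easy to check numerically with a quick sieve. (The sum in fact tracks $\log\log n$ up to an additive constant, the Mertens constant $M \approx 0.2615$, which the asymptotic argument above discards; the sketch below, with hypothetical function names, measures that gap.)

```python
import math

def prime_reciprocal_sum(n):
    """Sum 1/p over primes p <= n, using a sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n**0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return sum(1 / p for p in range(2, n + 1) if is_prime[p])

n = 10**6
s = prime_reciprocal_sum(n)
# Mertens' theorem: s = log log n + M + o(1), with M = 0.2615...
gap = s - math.log(math.log(n))
```

At $n = 10^6$ the sum is about $2.887$, sitting a Mertens constant above $\log\log n \approx 2.626$.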

So, we expect that if we average $\Omega(m)$ over a large enough interval $[2, n]$, we should obtain $\log \log n$. In fact, a sum of independent Poisson-distributed variables is itself Poisson: each variable counts events with a given expected number, and both the counts and the expected numbers add. This means that

\[\Omega(n) \sim \sum_{p \leq n}\text{Poiss}\left(\frac{1}{p}\right) = \text{Poiss}(\log \log n).\]Plotting the probability density over the top of the empirical curves, we see excellent agreement, even at relatively small numbers! It looks as if the primes are Poisson random.
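Here is a minimal empirical check in Python (my own throwaway code, not from the post's plotting scripts): average $\Omega(m)$ over a window around $n = 10^6$ and compare with $\log\log n$. An $O(1)$ additive offset is expected, since the heuristic ignores correction terms.

```python
import math

def big_omega(n):
    """Count prime factors of n with multiplicity, by trial division."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            count += 1
            n //= d
        d += 1
    return count + (1 if n > 1 else 0)

n = 10**6
window = range(n - 500, n + 500)
mean = sum(big_omega(m) for m in window) / len(window)
# Poisson heuristic: mean should be near log log n, about 2.63 here,
# plus an O(1) correction analogous to the Mertens constant.
target = math.log(math.log(n))
```

The empirical mean comes out a bit above $\log\log 10^6$, consistent with the slow, bounded drift the heuristic predicts.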

If true, this has a remarkable consequence. As $n$ becomes large, $\text{Poiss}(nx)$ can be viewed as a sum of $n$ i.i.d. variables $\text{Poiss}(x)$, since, as we said, a sum of independent Poisson variables is Poisson. By the Central Limit Theorem (CLT), this will tend to a normal distribution $\mathcal{N}(\mu, \sigma^2)$ of mean and variance $\mu=\sigma^2=nx$. For our prime divisor-counting function $\Omega(n)$, this suggests that, as $n \to \infty$,

\[\Omega(m) \sim \mathcal{N}(\log\log n, \log\log n),\]when $m \leq n$. The number of factors is Gaussian!

Let's turn now to what can be proven using cold, hard math. The PNT $(\ref{PNT})$ was rigorously derived in 1896 (by Hadamard and de la Vallée Poussin independently), but in 1903, Edmund Landau gave a simpler proof and made the following generalization: if $\pi_k(n)$ denotes the number of $m \leq n$ with $\Omega(m) = k$, then \[ \pi_{k}(n) \sim \frac{n}{\log n} \frac{(\log\log n)^{k-1}}{(k-1)!}. \tag{3}\label{PNT+} \]

##
*Proof.*$\,$ (Remove a prime from the set, use
induction, and do a horrible integral.)

For a prime $p \leq n$, the number of multisets of $k$ primes which multiply to less than $n$ and contain $p$ is given by the number of multisets of $k-1$ primes which multiply to less than $n/p$: $$ \pi_{k}(n) = \sum_{p \leq n} \pi_{k-1}\left(\frac{n}{p}\right). $$ I'll give an original but not entirely rigorous argument by induction; for a more careful proof, see Hardy and Wright (Theorem 437). The base case $k=1$ is the PNT $(\ref{PNT})$. Assume the result is true for $k$; we will use this to show it for $k+1$. Our goal is to count all multisets of $k+1$ primes whose product is less than $n$. If $p$ is in such a multiset, the remaining primes must multiply to less than $n/p$. Hence, \begin{align*} \pi_{k+1}(n) = \sum_{p \leq n} \pi_{k}\left(\frac{n}{p}\right) = \sum_{p \leq n} \frac{1}{(k-1)!}\frac{n/p}{\log(n/p)}[\log\log (n/p)]^{k-1}. \end{align*} Using Abel summation $(\ref{abel})$ in a similar fashion to earlier, we can replace the sum over primes with an integral weighted by $1/\log x$: \begin{align*} \pi_{k+1}(n) & \sim \frac{1}{(k-1)!}\int_2^{n/2} \text{d}x\, \frac{n/x}{\log x\log(n/x)}[\log\log (n/x)]^{k-1}\\ & = -\frac{1}{(k-1)!}\frac{n}{\log n}\int_{\log\log 2}^{\log\log (n/2)} \text{d}t\, \frac{t^{k-1}}{e^t/\log n - 1}, \end{align*} where we integrate only up to $n/2$ in the first integral (since if $x > n/2$, then $n/x < 2$ and hence $\pi_k(n/x) = 0$) and make the substitution $t = \log \log (n/x)$ in the second. This can be evaluated, using computer algebra or by hand, in terms of the polylogarithm $\text{Li}_j(x)$. Slogging through, we get \begin{align*} \pi_{k+1}(n) & \sim \frac{n}{\log n} \sum_{j =1}^{k}\frac{1}{(k-j)!} \left[\text{Li}_{j}\left(\frac{\log n}{\log (n/2)}\right) - \text{Li}_{j}\left(\frac{\log n}{\log 2}\right)\right]. \end{align*} The second term, at highest $j$, dominates, with \[ \pi_{k+1}(n) \sim -\frac{n}{\log n} \text{Li}_{k}\left(\frac{\log n}{\log 2}\right) \sim \frac{n}{\log n} \frac{(\log\log n)^k}{k!} \] from the large-argument asymptotics of $\text{Li}_k$.
This is what we wanted to show! $\blacksquare$
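Landau's asymptotic converges glacially, but we can still see it at work. The sketch below (helper names my own) builds $\Omega$ for every number up to $10^5$ via a smallest-prime-factor sieve, then compares the count of $m$ with $\Omega(m) = 2$ against the $k=2$ asymptotic; at this size, expect agreement only to within roughly ten percent.

```python
import math

def omega_counts(n):
    """Return counts[k] = #{2 <= m <= n : Omega(m) = k}, built from a
    smallest-prime-factor sieve."""
    spf = list(range(n + 1))              # smallest prime factor of m
    for p in range(2, int(n**0.5) + 1):
        if spf[p] == p:                   # p is prime
            for m in range(p * p, n + 1, p):
                if spf[m] == m:
                    spf[m] = p
    counts = {}
    omega = [0] * (n + 1)                 # Omega(m), built bottom-up
    for m in range(2, n + 1):
        omega[m] = omega[m // spf[m]] + 1
        counts[omega[m]] = counts.get(omega[m], 0) + 1
    return counts

n = 10**5
pi2 = omega_counts(n)[2]                               # exact count
landau = (n / math.log(n)) * math.log(math.log(n))     # k = 2 asymptotic
ratio = pi2 / landau
```

The exact count overshoots the asymptotic by about ten percent here, a good illustration of Pomerance's quip about quantities proved, but never observed, to converge.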

Now that we have an asymptotic Poisson distribution, we can show that at large $n$, the values of $\Omega(m)$ for $m \leq n$ are normally distributed. The rigorous proof of this result is called the Erdős–Kac Theorem, and its genesis is legendary. In Mark Kac's words:

*I first stated the conjecture during a lecture in Princeton in March 1939. Fortunately for me and possibly for mathematics, Erdős was in the audience and he immediately perked up. Before the lecture was over he had completed the proof.*

Erdős and Kac's derivation uses complicated sieve methods, and although simpler proofs now exist, we won't bother. Instead, I'll finish with the heuristic argument (heuristic because we haven't kept track of the errors in Landau's formula for $\pi_k(n)$) that Kac may have presented at Princeton in 1939, since it uses Landau's Poisson primes result. This is one of the many gems in Kac's book on probabilistic methods.

##
*Proof.*$\,$ (Apply the CLT to Landau's Poisson distribution.)

Let's start by stating the CLT more precisely for a Poisson variable $X \sim \text{Poiss}(x)$. As $x \to \infty$, the CLT tells us that $Y = (X - x)/\sqrt{x} \sim \mathcal{N}(0, 1)$, and hence \[ \lim_{x\to \infty} \mathbb{P}_{X \sim \text{Poiss}}(x + \alpha\sqrt{x} < X <x + \beta\sqrt{x} ) =\mathbb{P}_{Y \sim \mathcal{N}}(\alpha < Y < \beta), \] or more explicitly, \[ \lim_{x\to\infty} \sum_{x + \alpha\sqrt{x} < k < x+ \beta\sqrt{x}} e^{-x}\frac{x^k}{k!} =\frac{1}{\sqrt{2\pi}}\int_\alpha^\beta \text{d}t\, e^{-t^2/2}. \label{CLT} \tag{5} \] We can split the count of $\Omega(m)$ into the values it assumes, i.e. the number of $m \leq n$ with $\Omega(m) = k$, which is given by $\pi_k(n)$. Let $K_n(\alpha, \beta)$ denote the number of $m \leq n$ with $x + \alpha\sqrt{x} < \Omega(m) < x + \beta\sqrt{x}$, where $x=\log\log n$. Applying $(\ref{CLT})$ to Landau's formula $(\ref{PNT+})$, we deduce \begin{align*} \lim_{n\to \infty}\frac{1}{n}K_n(\alpha, \beta) & = \lim_{n\to \infty}\sum_{x + \alpha \sqrt{x} < k < x + \beta \sqrt{x}} \frac{\pi_k(n)}{n} \\ & = \lim_{x\to\infty} \sum_{x + \alpha \sqrt{x} < k < x + \beta \sqrt{x}} e^{-x}\frac{x^{k-1}}{(k-1)!} \\ & = \frac{1}{\sqrt{2\pi}}\int_\alpha^\beta \text{d}t\, e^{-t^2/2}. \end{align*} Thus, $K_n \sim \mathcal{N}(x, x)$, and the values of $\Omega(m)$ for $m \leq n$ are normally distributed. $\blacksquare$

Kac said, rather poetically, that the primes play a game of chance. Landau's formula implies that they play the same game as gum on the sidewalk, which is less poetic, but no less true.

### References

- *An Introduction to the Theory of Numbers* (1938), G. H. Hardy and E. M. Wright.
- *Statistical Independence in Probability, Analysis and Number Theory* (1959), Mark Kac.

## 5. The firefly effect

For a long time, I've been itching to make a firefly simulator, and
finally got round to it! More specifically, I wanted to simulate the
famous phase synchronization
of lightning bugs (*Photinus carolinus*) in the Great Smoky Mountains of North Carolina
and Tennessee. It happens during mating season, once a year, and is so
popular that there is a lottery for park entrance permits.

The mathematics is also very beautiful.
If we think of the fireflies as oscillators, with brightness varying
sinusoidally, then some sort of interaction or *coupling* between
oscillators must cause them to synchronize.
(The fireflies actually do more than oscillate at a single frequency, with
a "bright" period of $5$ to $8$ flashes followed by roughly $8$
seconds of darkness; this isn't so hard to add, but I don't do
it, for simplicity.)
Kuramoto (1975) introduced a cute model for coupled oscillators which
does the trick.
Say we have $N$ fireflies, $i=1, 2, \ldots, N$, with brightness
controlled by a phase $\theta_i$.
If firefly $i$ oscillates with natural frequency $\omega_i$, its phase
changes as

\[\dot{\theta}_i = \omega_i.\]

To measure how synchronized our fireflies are, we define the average phasor

\[re^{i\psi} = \frac{1}{N}\sum_{j=1}^N e^{i\theta_j}\]with *coherence* $r$ and *mean phase* $\psi$.
Phases add constructively if $\theta_j = \theta$ is the
same for all $j$, giving $r = 1$; otherwise there is some
destructive interference, with $r = 0$ in the extreme case. We draw some cartoon phasors below:

Letâ€™s now think about how to couple the oscillating bugs. If firefly $i$ aligns its phase with firefly $j$, then $\theta_i$ will slow down if it is ahead of $\theta_j$ ($\theta_j - \theta_i < 0$), and speed up if it lags behind ($\theta_j - \theta_i > 0$). We can implement such a coupling by adding a term to $\dot{\theta}_i$ proportional to the difference of phases:

\[\dot{\theta}_i = \omega_i + k (\theta_j - \theta_i),\]for a coupling constant $k > 0$. This has the same mathematical form as friction. Unfortunately, it doesn't respect the $2\pi$ periodicity of the phase. But when $\theta_j \approx \theta_i$, $\sin(\theta_j - \theta_i)$ approximates a linear coupling, and this does respect periodicity. Making this replacement, and summing over all fireflies, we obtain the Kuramoto model:

\[\dot{\theta}_i = \omega_i + \frac{K}{N}\sum_{j=1}^N\sin(\theta_j - \theta_i)\]for a normalized coupling constant $K = Nk$. Below, we provide a firefly simulator, with natural frequencies $\omega_i$ uniformly randomized over $[-0.01, 0.01]$ and initial phases $\theta_i$ over $[-\pi, \pi)$. Fireflies also wander randomly in space, but this is uncorrelated with flashing. You can control the normalized coupling $K$ by clicking and dragging the mouse up and down, and the number of fireflies $N$ with the left and right arrow keys.

At zero coupling, fireflies flash in an uncorrelated way and the
coherence vanishes. Above some critical threshold, we begin to see *partial
synchronization*, and eventually $r = 1$, with all fireflies blinking in unison.
You can see this in the simulation. If you're interested, I've included details about the analytic solution in the large-$N$ limit below, following Acebrón et al.
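If you'd rather poke at the model offline, here is a minimal Python sketch of the same dynamics (my own code, not the interactive simulator): forward-Euler integration of the Kuramoto model in its mean-field form, which is mathematically equivalent to the pairwise sine sum, with the same parameter ranges as above.

```python
import math
import random

def simulate(N=100, K=1.0, dt=0.05, steps=2000, seed=0):
    """Euler-integrate the Kuramoto model; return the final coherence r."""
    rng = random.Random(seed)
    omegas = [rng.uniform(-0.01, 0.01) for _ in range(N)]      # natural frequencies
    thetas = [rng.uniform(-math.pi, math.pi) for _ in range(N)]  # initial phases
    for _ in range(steps):
        # mean-field phasor r e^{i psi}
        re = sum(math.cos(t) for t in thetas) / N
        im = sum(math.sin(t) for t in thetas) / N
        r, psi = math.hypot(re, im), math.atan2(im, re)
        # dtheta_i/dt = omega_i + K r sin(psi - theta_i)
        thetas = [t + dt * (w + K * r * math.sin(psi - t))
                  for t, w in zip(thetas, omegas)]
    re = sum(math.cos(t) for t in thetas) / N
    im = sum(math.sin(t) for t in thetas) / N
    return math.hypot(re, im)

r_strong = simulate(K=1.0)   # well above threshold: near-unanimous flashing
r_zero = simulate(K=0.0)     # no coupling: coherence stays small
```

With $K$ well above threshold, the final coherence sits near $1$; with $K = 0$, it stays at the $O(1/\sqrt{N})$ level of random phases.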

##
*Calculation.*$\,$ (Couple to the mean field, turn
sums to integrals, solve Fokker-Planck.)

So far, we've only used the amplitude of the phasor $r e^{i\psi}$, but the mean phase $\psi$ also has an
important role to play. In fact, it provides a mean-field
description of the oscillators. To see this, multiply the phasor by $e^{-i\theta_i}$ and take the imaginary
part:
\begin{align}
\Im[re^{i(\psi - \theta_i)}] & = r\sin(\psi - \theta_i) = \frac{1}{N}\sum_{j=1}^N \sin(\theta_j - \theta_i) =
K^{-1}(\dot{\theta}_i - \omega_i) \\ \Longrightarrow \quad
\dot{\theta}_i & = \omega_i + Kr\sin(\psi - \theta_i). \tag{5.1} \label{kuramoto-mf}
\end{align}
Thus, each oscillator tries to align itself to the mean phase
$\psi$, with a coupling determined not only by $K$ but the coherence
$r$.
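The step from the pairwise sum to the mean-field form $(5.1)$ is an exact identity, which is easy to check numerically for random phases:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(-np.pi, np.pi, 50)  # arbitrary phases

z = np.mean(np.exp(1j * theta))  # phasor r e^{i psi}
r, psi = np.abs(z), np.angle(z)

mean_field = r * np.sin(psi - theta)  # r sin(psi - theta_i)
# (1/N) sum_j sin(theta_j - theta_i), via broadcasting
pairwise = np.mean(np.sin(theta[None, :] - theta[:, None]), axis=1)
max_err = np.max(np.abs(mean_field - pairwise))
```

The two sides agree to machine precision, as they must.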
Since the phasor is an average, we can explicitly view it as an
expectation value with respect to a discrete pdf $\rho$:
\[
r e^{i\psi} = \mathbb{E}_{\rho}\left[e^{i\theta}\right], \quad \rho(\theta, t) =
\int_{-\infty}^\infty \mathrm{d}\omega\,\rho(\theta, t | \omega)
\sum_{j=1}^N \frac{1}{N} \delta(\omega - \omega_j),
\]
where $\rho(\theta, t | \omega)$ governs the probability
distribution of oscillators with natural frequency $\omega$; the
initial phase remains random.
It's clear that if $\theta_\omega(t)$ is the solution to
$(\ref{kuramoto-mf})$ with initial phase $\theta = 0$, then for
some function $f$,
\[
\rho(\theta, t | \omega) = f(\theta - \theta_\omega(t)) \quad \Longrightarrow
\quad \frac{\partial \rho}{\partial t} = -\frac{\partial}{\partial\theta} \big\{\big[\omega + Kr \sin(\psi-\theta)\big]\rho\big\}. \tag{5.2}
\label{kuramoto-fp}
\]
Taking the $N\to\infty$ limit, we can replace the sum over delta
functions by a continuous distribution $g(\omega)$, but everything
else remains the same.
Then $(\ref{kuramoto-fp})$ is a diffusion-free Fokker-Planck
equation. It has two solutions. The first is coherent, with
\[
\rho(\theta, t | \omega) = \delta\left(\theta - \psi - \sin^{-1}[\omega/K r]\right)
\]
for $|\omega| \leq Kr$ and hence $|\theta -\psi| \leq \pi/2$.
These individual oscillators are phase locked, with
$\dot{\theta}=0$.
The second solution to $(\ref{kuramoto-fp})$ is obtained from
assuming a stationary state,
\[
0 = \frac{\partial}{\partial\theta} \big\{\big[\omega + Kr \sin(\psi-\theta)\big]\rho\big\}
\quad \Longrightarrow \quad \rho(\theta, t | \omega) = \frac{C}{|\omega + Kr \sin(\psi-\theta)|}
\]
for some constant $C$ determined by normalization at fixed
$\omega$.
This proves our first claim.

For the second, assume that $g(\omega)=g(-\omega)$ is even.
We can calculate $r$ in the large-$N$ limit, performing a separate
sum over wanderers and followers:
\begin{align}
r = \mathbb{E}_\rho\left[e^{i(\theta-\psi)}\right] & = I_1 + I_2 \\
I_1 & = \int_{-\pi/2}^{\pi/2} \mathrm{d}\theta \int_{-\infty}^\infty
\mathrm{d}\omega \,\, e^{i(\theta-\psi)}\delta\left(\theta - \psi -
\sin^{-1}[\omega/K r]\right) g(\omega) \\
I_2 & = \int_{-\pi}^{\pi} \mathrm{d}\theta \int_{|\omega| > Kr}
\mathrm{d}\omega \,\, e^{i(\theta-\psi)}\frac{Cg(\omega)}{|\omega + Kr \sin(\psi-\theta)|}.
\end{align}
From evenness, the integral $I_2$ vanishes, while the first reduces to
\begin{align}
r & = \int_{|\omega| \leq Kr} \mathrm{d}\omega \,\,
e^{i\sin^{-1}[\omega/Kr]} g(\omega) \\
& = \int_{|\omega| \leq Kr} \mathrm{d}\omega \,\,
\cos\left(\sin^{-1}[\omega/Kr]\right) g(\omega) \\
& = Kr \int_{-\pi/2}^{\pi/2} \mathrm{d} \phi \,\, \cos^2\phi \, g(Kr
\sin\phi),
\end{align}
where on the last line we made the change of variable $\omega = Kr
\sin\phi$. This has the trivial, incoherent solution $r = 0$, and
for $r > 0$, the nontrivial solution
\[
1 = K \int_{-\pi/2}^{\pi/2} \mathrm{d}
\phi \,\, \cos^2\phi \, g(Kr \sin\phi).
\]
Taking $r \to 0$ in the above, the integral gives $\pi/2$, so the
minimal value of coupling at which the partial synchronization
becomes possible is $2/[\pi g(0)]$, as claimed. $\blacksquare$

In our simulation, frequencies are uniformly distributed according to $g \sim \mathcal{U}(-0.01, 0.01)$. Thus, $g(0) = 1/0.02$, and for large $N$, the onset of partial synchronization should occur at $K_c = 0.04/\pi \sim 13 \times 10^{-3}$. This is reflected in the simulation! For a general uniform interval $\mathcal{U}(-L, L)$, $K_c = 4L/\pi$. We can also calculate $r$. Setting $J = Kr$ and $M = \min\{L, J\}$, we have

\[r = \int_{|\omega| \leq M} \frac{1}{2L}\mathrm{d}\omega \,\, \sqrt{1 - (\omega/J)^2} = \frac{M}{2L}\sqrt{1 - \left(\frac{M}{J}\right)^2} + \frac{J}{2L} \, \sin^{-1}\left(\frac{M}{J}\right). \tag{5.3} \label{kuramoto-r}\]When $M = J$ this gives us no information about $r$, but at large coupling, $J \gg L = M$, keeping only first-order terms in $L/J$ gives

\[r \approx \frac{1}{2} + \frac{J}{2L} \sin^{-1} \left(\frac{L}{J}\right) \approx \frac{1}{2} + \frac{1}{2} = 1.\]This is what we expect! In general, when $M = L$ we can numerically solve $(\ref{kuramoto-r})$ for $r$. For instance, when $K = 20 \times 10^{-3}$, $r \approx 0.95$. This doesn't quite match the simulation because of finite $N$ corrections. You can read more about these in Strogatz and Mirollo (1991). They also add noise to the model and do a stability analysis, but I won't go into that here.
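To double-check these numbers, we can treat $(5.3)$ as a fixed-point equation $r \mapsto \mathrm{RHS}(r)$ and iterate (a quick sketch; the starting point and iteration count are arbitrary choices):

```python
import numpy as np

L = 0.01  # natural frequencies uniform on [-L, L]

def rhs(r, K):
    """Right-hand side of (5.3), with J = Kr and M = min(L, J)."""
    J = K * r
    if J == 0:
        return 0.0
    M = min(L, J)
    return (M / (2 * L)) * np.sqrt(1 - (M / J) ** 2) \
         + (J / (2 * L)) * np.arcsin(M / J)

def solve_r(K, r0=0.5, iters=500):
    r = r0
    for _ in range(iters):
        r = rhs(r, K)
    return r

K_c = 4 * L / np.pi        # critical coupling 2/[pi g(0)]
r_above = solve_r(0.02)    # K > K_c: partially synchronized, r ~ 0.95
r_below = solve_r(0.012)   # K < K_c: iteration collapses to r = 0
```

Below threshold the map is linear with slope $\pi K/(4L) < 1$ near the origin, so the iteration decays geometrically to the incoherent solution.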

The usual image of chaos is the *butterfly effect*, where systems become
unpredictable due to sensitivity to initial conditions. Put
differently, we start in a small patch of phase space, and time
evolution spreads this out to a large patch (when
suitably coarse grained). The "firefly effect" is a very different aspect of chaos: predictable
behaviour emerges spontaneously, and the emergent behaviour is largely *insensitive* to
initial conditions. This takes us from a large patch to a small patch,
so it's like the butterfly effect in reverse!

### References

- "The Kuramoto model" (2005), Juan Acebrón, L. Bonilla, Conrad Pérez Vicente, Félix Ritort and Renato Spigler.
- "Stability of Incoherence in a Population of Coupled Oscillators" (1991), Steven Strogatz and Renato Mirollo.

## 6. Quantum error correction from Penrose tilings

Penrose tilings are deliciously weird tessellations that cover the
plane, but *aperiodically*: they never repeat themselves.
Each tiling is built out of two rhombi called a "kite" and a "dart" (purple and
blue below), with thin angles $2\pi/5$ and $\pi/5$ respectively.
To build a tiling, we join kites and darts according to a matching
rule for edges, indicated by a single or double arrowhead below:

There are uncountably many ways to perform the matching, since most of the time either a kite or a dart can be joined to a given edge. Despite this, Penrose tilings are indistinguishable unless we look at an infinitely large area! We can also recover any finite "holes". Formally:

- (*Indistinguishability.*) Any finite tiling patch $T$ is contained in *every* Penrose tiling. This result can be strengthened to the statement that the relative spatial frequency of a patch $T$ is also independent of the tiling.
- (*Recoverability.*) The pattern in a finite region $K$ is uniquely determined by the pattern in $K^c$.

Both results are straightforward to prove using the toolkit of aperiodic tilings.

##
*Proof.*$\,$ (Dualize matching edges to bars, and iteratively
group/ungroup rhombi.)

We will first show *recoverability*, i.e. that the pattern in
a finite region $K$ can be recovered from the pattern in $K^c$.
The first step is to replace the edge matching with a "dual"
matching of bars, as depicted below right. It's easy to see that contiguity of bars is equivalent to matching
edges. However, since the bars "reflect" off edges, when joined they
form straight lines through the tiling called *Ammann lines*. Given the pattern on $K^c$, we can extend these lines through the finite region $K$; since the Ammann lines determine the tiling, this recovers the pattern in $K$.

Next, we show *local indistinguishability*, namely that a finite patch of tiling $T$ is contained in every full Penrose tiling. The basic idea is to use *inflation* and *deflation*. Inflation is a way of splitting the kite and dart into smaller half-kites and half-darts. We then complete these half-rhombi into full rhombi, as below, and rescale by $(1+\sqrt{5})/2$ to obtain a new, and unique, Penrose tiling. You can check from the edge matching rules that adjacent rhombi are inflated in a consistent way. Although we won't prove it, this process is reversible, so any tiling may be uniquely *deflated* by running the dissection backwards.

This gives an easy proof of indistinguishability. Given any finite patch of tiling $T$, we can deflate it $d$ times until it is contained in a single vertex configuration $v$, meaning a vertex and the rhombi that come together at that vertex. There are only a finite number of such vertex arrangements, and each appears in any valid Penrose tiling $T'$. In particular, if we start with the tiling $T'$ and deflate it $d$ times, the result contains $v$; inflating $d$ times, we conclude that $T'$ contains the finite patch $T$. This argument can be extended to show that the relative spatial frequency of any patch $T$ is independent of the full Penrose tiling, a fact we will use below. $\blacksquare$

For further details I've omitted, including a proof that deflation works and the analysis of vertex arrangements, check out this site.
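Inflation also controls the tile statistics. Assuming the standard substitution counts for the two rhombi (each inflation step turns a kite into two kites and one dart, and a dart into one of each — standard bookkeeping I'm importing here rather than proving), a few lines of Python show the kite-to-dart ratio converging to the golden ratio $(1+\sqrt{5})/2$:

```python
import math

# Substitution counts per inflation step (assumed; see above):
#   kite -> 2 kites + 1 dart,   dart -> 1 kite + 1 dart
kites, darts = 1, 0
for _ in range(40):
    kites, darts = 2 * kites + darts, kites + darts

ratio = kites / darts
phi = (1 + math.sqrt(5)) / 2  # golden ratio, also the rescaling factor above
```

Since this limiting ratio is irrational, the relative frequencies of the two tiles could never arise from a periodic tiling, giving another quick way to see aperiodicity.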

These properties tell us that information about the tiling is encoded
globally, but in a redundant way.
Li and Boyle (2023)
observe that this looks a lot like a quantum error-correcting
code (QECC), and take the further step of turning it into one!
If it QECCs like a duck…
A QECC is a way of redundantly encoding quantum information in a
larger Hilbert space so that some ways of corrupting the state can be corrected.
More precisely, if $\mathcal{H}_0$ is a Hilbert space of *logical
states*, we embed it as a *code subspace* $\mathcal{C}$ inside a
larger â€śphysicalâ€ť Hilbert space $\mathcal{H}$.

To turn logical states into physical states we need an *encoding unitary* $\mathcal{U}: \mathcal{H}_0 \to
\mathcal{C}$, some *error
channels* $\mathcal{E}_i: \mathcal{H} \to \mathcal{H}$ which corrupt the
states, and a *recovery channel* $\mathcal{R}: \mathcal{H} \to
\mathcal{H}$ which undoes errors.
A simpler way to say this is that logical states remain
distinguishable after errors. In quantum mechanics,
distinguishability is measured by the inner product, so for any code
states $|u\rangle, |v\rangle \in \mathcal{C}$ and any correctable
error channels $\mathcal{E}_i, \mathcal{E}_j$, we must have
\[
\langle u|\mathcal{E}_i^\dagger \mathcal{E}_j|v\rangle = C_{ij}\,\langle u|v\rangle,
\]
where the constant $C_{ij}$ depends only on the errors and not the states. (More generally, one takes a trace here, since errors may make pure states mixed.) With all this setup, we can finally show that Penrose tilings naturally give rise to an unusual QECC.

##
*Proof.*$\,$ (Take an overlap of states in the code space and
use recoverability/indistinguishability.)

Consider a state in the code subspace, decomposed over $K^c \otimes K$:
\[
|\overline{T}\rangle = \int \mathrm{d}g\, |gT\rangle_{K^c}|gT\rangle_K, \quad \mathcal{E}|\overline{T}\rangle = \int \mathrm{d}g\, |gT\rangle_{K^c}\,\mathcal{E}|gT\rangle_K.
\]
The overlap of two states corrupted on $K$ is
\[
\langle\overline{T}|\mathcal{E}^\dagger \mathcal{E}'|\overline{T}'\rangle = \int \mathrm{d}g' \, \mathrm{d}g\, \langle g T| g'T'\rangle_{K^c} \langle gT|_K \mathcal{E}^\dagger \mathcal{E}' |g'T'\rangle_K.
\]
By recoverability, $gT$ and $g'T'$ are determined by patterns on $K^c$, so the first overlap becomes an indicator function $\langle g T| g'T'\rangle_{K^c} = \delta_{gT, g'T'} = \delta_{T, g^{-1}g'T'}$. By indistinguishability, the states $|gT\rangle_K$ in the second overlap simply index patterns on $K$, with relative frequency independent of $T$ and $T'$. Thus, setting $g'' = g^{-1}g'$, we get
\[
\langle\overline{T}|\mathcal{E}^\dagger \mathcal{E}'|\overline{T}'\rangle = \int \mathrm{d}g'' \, \delta_{T, g''T'} \int \mathrm{d}g \, \langle gT|_K \mathcal{E}^\dagger \mathcal{E}' |gT'\rangle_K = \langle \overline{T}|\overline{T}'\rangle\, C(\mathcal{E}, \mathcal{E}'),
\]
where $C(\mathcal{E}, \mathcal{E}')$ is a constant depending on the errors but not the states. Thus, the Penrose tiling is a QECC. $\blacksquare$

This is very cool! But it raises the question: can we go backwards, and obtain a tiling from an erasure code? To make this more concrete, consider the smallest quantum erasure code, introduced by Grassl, Beth and Pellizzari (1996). This encodes two logical qubits using four qubits. We can write

\[\begin{align*} \mathcal{U}|00\rangle &= \frac{1}{\sqrt{2}}(|0000\rangle + |1111\rangle) \\ \mathcal{U}|01\rangle &= \frac{1}{\sqrt{2}}(|0110\rangle + |1001\rangle) \\ \mathcal{U}|10\rangle &= \frac{1}{\sqrt{2}}(|1010\rangle + |0101\rangle) \\ \mathcal{U}|11\rangle & = \frac{1}{\sqrt{2}}(|1100\rangle + |0011\rangle). \end{align*}\]This can correct an arbitrary error on a single qubit at a known location. In this case, there is a natural geometric interpretation. Notice that each code state consists of a $4$-qubit basis state, like $|0000\rangle$, and its "antipodal" opposite, $|1111\rangle$. These are opposite corners of a tesseract. The corners appearing in the code are always at least two edges away from each other, since their coordinates all have even parity. Erasing a qubit corresponds to erasing one dimension of the tesseract, but the missing information can always be recovered by an antipodal flip, provided we know which part was erased. I suspect this is related to tilings by Minkowski's conjecture.
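We can check the correctability claim directly. The sketch below builds the four code words and verifies the distinguishability condition, $\langle u|\mathcal{E}_i^\dagger \mathcal{E}_j|v\rangle \propto \langle u|v\rangle$, for Pauli errors on the first qubit; the other (known) locations work the same way by symmetry:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def basis(bits):
    """Computational basis state of 4 qubits, e.g. '0110'."""
    v = np.zeros(16, dtype=complex)
    v[int(bits, 2)] = 1.0
    return v

codewords = [(basis(a) + basis(b)) / np.sqrt(2) for a, b in
             [("0000", "1111"), ("0110", "1001"),
              ("1010", "0101"), ("1100", "0011")]]

def on_first_qubit(P):
    return np.kron(P, np.eye(8))  # P acts on the first qubit only

# Check <u| E_i^dag E_j |v> = C_ij <u|v> for all Pauli errors on qubit 1,
# i.e. the 4x4 matrix of overlaps is proportional to the identity.
kl_holds = True
for P in (I2, X, Y, Z):
    for Q in (I2, X, Y, Z):
        E = on_first_qubit(P).conj().T @ on_first_qubit(Q)
        M = np.array([[u.conj() @ E @ v for v in codewords] for u in codewords])
        kl_holds = kl_holds and np.allclose(M, M[0, 0] * np.eye(4))
```

Parity does the work here: bit-flip errors move code states to odd-parity corners of the tesseract, so all overlaps vanish, while phase errors average to zero across each antipodal pair.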

This construction also connects to an approach to quantum
gravity called â€śholographyâ€ť or the
AdS/CFT correspondence.
It has been known for some time that, in many ways,
AdS/CFT behaves like a QECC, with physical information
encoded redundantly on the *asymptotic boundary* of spacetime.
In addition, the physical objects in gravity are *diffeomorphism
invariant*, so $|T\rangle$ should be treated the same way as
$|gT\rangle$.
This is implemented directly by our code space.

For Penrose tilings, we don't even need the full complement $K^c$ to
determine $K$; any *boundary neighbourhood* $\partial K$ of $K$ (red above) will do!
We can extend the Ammann lines and infer the tiling, as before.
It's cute that this theory is "holographic" in the sense that
what happens inside $K$ is determined by the boundary.
Moreover, inflation and deflation look a lot like *renormalization*, also
expected to play a role in quantum gravity. From recreational
mathematics to quasicrystals to quantum gravity, Penrose tilings continue
to surprise!

### References

- "Codes for the quantum erasure channel" (1996), M. Grassl, Th. Beth and T. Pellizzari.
- "The Penrose tiling is a quantum error-correcting code" (2023), Zhi Li and Latham Boyle.