Tiny Dog: A Game of Extremes

June 13, 2026. The game of spotting maximally tiny dogs turns into a probability tutorial.

Introduction
Minimal values
Poisson processes
Strategy
Bayesian decision theory
Shape parameters
Putting it all together

1. Introduction

My wife and I play a game called Tiny Dog. When you spot a small dog that you think is the smallest you will see that day, you declare “tiny dog!”, with precisely one guess per day. The winner is of course the person who spots the smallest dog, as determined by subjective, variable-precision and often highly contested measurements. The strategy is subtle. If the other person (“first spotter”) has made their guess and you haven’t (“second spotter”), then you simply observe as many dogs as possible and hope for a smaller one. But if they haven’t, going first is a bet involving the distribution of dog sizes and the amount of time left in the day. Together, these suggest an instructive exercise in extreme values, point processes and Bayesian decision theory!

2. Minimal values

Consider a set of real-valued observations $X_1, X_2, \ldots, X_n \in \mathbb{R}$ of iid random variables $X_i \sim \mathcal{D}$. The minimum value $m_n = \min_i X_i$ is the smallest observed value in the batch. Recall that the cumulative distribution function (cdf) of a random variable $X \sim \mathcal{D}$ is the probability $X \leq x$:

\[F_{\mathcal{D}}(x) = \mathbb{P}_{X \sim \mathcal{D}}[X \leq x] = \int_{-\infty}^x f_{\mathcal{D}}(t) \, \mathrm{d}t,\]

where $f_{\mathcal{D}}(t)$ is the probability density function (pdf) associated with $\mathcal{D}$. Note that

\[\mathbb{P}_{X \sim \mathcal{D}}[X > x] = 1 - \mathbb{P}_{X \sim \mathcal{D}}[X \leq x] = 1 - F_{\mathcal{D}}(x).\]

The probability $m_n > m$ is the probability that every $X_i > m$, which by independence is $(1 - F_{\mathcal{D}}(m))^n$. Hence the cdf (denoted $M_{n,\mathcal{D}}$) of $m_n$ obeys

\[M_{n,\mathcal{D}}(m) = \mathbb{P}_{X \sim \mathcal{D}}[m_n \leq m] = 1 - (1 - F_{\mathcal{D}}(m))^n.\]

It’s tempting to go to the asymptotic limit in $n$ here, which leads to extreme value theory. But the number of dogs observed is small and variable in important ways, so we will keep $n$ finite.

Concretely, dog size is not normal but closer to log-normal $\log\mathcal{N}$, i.e., the logarithm of size is normally distributed. This has cdf

\[\Phi(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{\ln x - \mu}{\sqrt{2}\sigma}\right)\right], \quad \text{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2} \, \mathrm{d}t,\]

for Gaussian mean $\mu$ and variance $\sigma^2$. Hence,

\[M_{n}(m) = 1 - (1 - \Phi(m))^n.\]

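Since the game focuses on tiny dogs, we care about dogs far below the mean, where $x = (\ln m - \mu)/\sqrt{2}\sigma$ is large and negative. The gorgeous asymptotic formula, valid for large positive $z$,

\[1 - \text{erf}(z) \sim \frac{e^{-z^2}}{z\sqrt{\pi}}\sum_{k\geq 0} (-1)^k \frac{(2k-1)!!}{(2z^2)^k},\]

combined with $\text{erf}(-z) = -\text{erf}(z)$, gives $\Phi(m) = \frac{1}{2}\left[1 - \text{erf}(|x|)\right]$ and hence (from the first term) the approximation

\[1 - M_n(m) \approx \left[1 - \frac{e^{-x^2}}{2|x|\sqrt{\pi}}\right]^n, \quad x = \frac{\ln m - \mu}{\sqrt{2}\sigma} < 0.\]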

The mean of log dog size is around $\mu = 3 \text{ log kg}$ (corresponding to a median size of $e^3 \approx 20 \text{ kg}$), with standard deviation $\sigma = 1 \text{ log kg}$.
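To get a feel for the numbers, here is a minimal sketch of the finite-$n$ minimum cdf for log-normal dog sizes, using the parameters quoted above. The function names are invented for illustration. One assumption to flag: tiny dogs sit below the median, where $x$ is negative, so the leading tail term is evaluated with $|x|$ and approximates $\Phi(m)$ itself.

```python
import math

MU, SIGMA = 3.0, 1.0  # log-kg parameters quoted in the text


def _x(m_kg: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Standardised argument x = (ln m - mu) / (sqrt(2) sigma)."""
    return (math.log(m_kg) - mu) / (math.sqrt(2.0) * sigma)


def lognormal_cdf(m_kg: float) -> float:
    """Phi(m) = (1 + erf(x)) / 2, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-_x(m_kg))


def prob_smaller_dog(m_kg: float, n: float) -> float:
    """M_n(m): probability that at least one of n further dogs is smaller than m."""
    return 1.0 - (1.0 - lognormal_cdf(m_kg)) ** n


def tail_approx(m_kg: float) -> float:
    """Leading asymptotic term for Phi(m); only meaningful well below the median."""
    x = _x(m_kg)
    return math.exp(-x * x) / (2.0 * abs(x) * math.sqrt(math.pi))


for size in (1.0, 2.0, 5.0):
    print(size, lognormal_cdf(size), tail_approx(size), prob_smaller_dog(size, 12))
```

For a 2 kg dog the single-dog probability is about one percent, and the one-term approximation overestimates it somewhat, as expected from an alternating series truncated after its first term.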

3. Poisson processes

For the process of observing dogs, we assume that every interval of the same length is just as likely to yield an observation, with a mean of $\lambda$ dogs per unit time. If $T$ is the total length of the day and $t$ the elapsed time, the average number of dogs still to be observed is $n = \lambda (T - t)$. Obviously, we are ignoring temporal factors (e.g. owners like to walk dogs at certain times of day) and geographic factors (e.g. dog parks are more likely to contain dogs)! Taking just the temporal factors into account is easily accomplished; we just make $\lambda(t)$ time-dependent. The resulting stochastic process is called a Poisson point process, and if the form of $\lambda(t)$ is known (“seasonality” over the course of the day) the expected remaining number can be computed by integrating the rate from the current time $t$ to the end of the day:

\[n = \int_{t}^T \lambda(t') \, \mathrm{d}t'.\]

The constant $\lambda$ process is called homogeneous. Returning to the problem at hand, suppose the first spotter sees a dog of size $m$. They must compute two things: first, the average number of dogs remaining to be observed for the day, $n = \lceil \lambda (T-t)\rceil$ or the full Poisson point process expression; second, the probability $M_{n,\mathcal{D}}(m)$ that a dog smaller than that will be observed. They set some threshold probability $\alpha$ such that, if

\[M_{n,\mathcal{D}}(m) \leq \alpha,\]

i.e., the probability of a smaller dog is below the threshold, then they call it. The first calculation is relatively easy for the homogeneous process if you know $\lambda$ and have a watch; approximate values in our neighbourhood are one dog every five minutes outside, so $\lambda \sim 3.3 \text{ mHz}$.
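The two calculations can be sketched in a few lines. This is a rough illustration under stated assumptions: times are in seconds, the rate function is integrated with a plain trapezoid rule, `evening_rate` is a hypothetical seasonality profile invented for the example, and $n$ is left as a real number rather than rounded up.

```python
import math
from typing import Callable


def expected_remaining(rate: Callable[[float], float], t: float, T: float,
                       steps: int = 1000) -> float:
    """Expected number of dogs between elapsed time t and end of day T (trapezoid rule)."""
    h = (T - t) / steps
    total = 0.5 * (rate(t) + rate(T))
    total += sum(rate(t + k * h) for k in range(1, steps))
    return total * h


def smaller_dog_probability(F_m: float, n: float) -> float:
    """M_n(m) given the single-dog cdf value F(m) and expected remaining count n."""
    return 1.0 - (1.0 - F_m) ** n


def should_call(F_m: float, n: float, alpha: float) -> bool:
    """Call 'tiny dog!' when the chance of a smaller dog is at most alpha."""
    return smaller_dog_probability(F_m, n) <= alpha


LAMBDA = 1.0 / 300.0  # one dog every five minutes, in dogs per second


def evening_rate(s: float) -> float:
    """Hypothetical profile: the rate doubles in the second half of a one-hour walk."""
    return LAMBDA if s < 1800.0 else 2.0 * LAMBDA


n_flat = expected_remaining(lambda s: LAMBDA, t=0.0, T=3600.0)
n_peak = expected_remaining(evening_rate, t=0.0, T=3600.0)
print(n_flat, n_peak, should_call(0.01, n_flat, alpha=0.2))
```

With an hour left at the homogeneous rate there are about twelve dogs to come, so a dog in the smallest one percent is called at $\alpha = 0.2$. Applying `math.ceil` to $n$ would reproduce the rounded version in the text.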

4. Strategy

It’s clear that smaller $\alpha$ is better for the first spotter: it is literally the probability of losing if you are the first spotter. The “if” is important; the problem is that the smaller $\alpha$, the longer on average you must wait to see a dog of that size, giving your opponent time to spot a larger dog. We will ignore the role of attentiveness for simplicity. Choosing a large $\alpha$ gives you a greater likelihood of being first spotter, but obviously increases your chance of losing as first spotter. Without loss of generality, suppose we are playing and pick a threshold $\alpha$, with our opponent selecting a threshold $\beta \sim \mathcal{T}$ at random. Since we are first spotter exactly when $\alpha > \beta$, the probability we win is then

\[\mathbb{P}[\text{win}] = (1-\alpha)\int_{-\infty}^{\alpha} f_\mathcal{T}(\beta)\, \mathrm{d}\beta + \int_{\alpha}^\infty f_\mathcal{T}(\beta) \beta\, \mathrm{d}\beta,\]

since $1 - \alpha$ is the probability we win as first spotter, and $\beta$ is the probability the other player loses as first spotter. Differentiating with respect to $\alpha$ and setting to zero gives the condition

\[0 = g(\alpha^\ast) = F_\mathcal{T}(\alpha^\ast) - (1- 2\alpha^\ast)f_\mathcal{T}(\alpha^\ast).\]

If $f_\mathcal{T}(0) = 0$, then $\alpha = 0$ is a solution, and we recover the strategy to decrease $\alpha$ we observed earlier. Otherwise, a nonzero solution exists by virtue of the intermediate value theorem, since

\[g(0) = -f_\mathcal{T}(0) < 0, \quad g(1) = 1 + f_\mathcal{T}(1) > 0.\]

For a concrete distribution $\mathcal{T}$, we can attempt to solve this analytically or numerically. A natural choice is the beta distribution $\text{Beta}(a, b)$ with pdf and cdf

\[f_{\text{Beta}}(x) = \frac{x^{a - 1}(1 - x)^{b - 1}}{B(a, b)}, \quad B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}, \quad F_{\text{Beta}}(x) = I_x(a, b)\]

for $x \in [0, 1]$, $\Gamma(z)$ the Gamma function, and $I_x$ the regularized incomplete beta function. Then the optimal first spotting probability obeys

\[I_{\alpha^\ast}(a, b)B(a, b) = (1- 2\alpha^\ast)(\alpha^\ast)^{a - 1}(1 - \alpha^\ast)^{b - 1},\]

which can be solved numerically but not analytically in general.
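As a minimal sketch of the numerical route, the snippet below finds a root of $g$ by bisection using only the standard library. Two assumptions to flag: the incomplete beta function is approximated by Simpson's rule, which needs $a, b \geq 1$ so the pdf stays bounded, and the search is restricted to $(0, 1/2)$, where $g$ changes sign. It returns one root and does not check uniqueness. All function names are illustrative.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density on [0, 1]."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # B(a, b)
    return x ** (a - 1) * (1.0 - x) ** (b - 1) / norm


def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """I_x(a, b) by Simpson's rule; assumes a >= 1 and b >= 1 and an even step count."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * beta_pdf(k * h, a, b)
    return total * h / 3.0


def g(alpha: float, a: float, b: float) -> float:
    """Optimality condition g(alpha) = F(alpha) - (1 - 2 alpha) f(alpha)."""
    return beta_cdf(alpha, a, b) - (1.0 - 2.0 * alpha) * beta_pdf(alpha, a, b)


def optimal_alpha(a: float, b: float, tol: float = 1e-8) -> float:
    """Bisection on (0, 1/2): g < 0 just above zero and g(1/2) = F(1/2) > 0."""
    lo, hi = 1e-9, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, a, b) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(optimal_alpha(1.0, 6.0), optimal_alpha(2.0, 5.0))
```

Since $g$ goes from negative to positive at the root, the derivative of the win probability goes from positive to negative there, so the root found is a local maximum.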

5. Bayesian decision theory

We’ve neglected an important strategic element: every dog your opponent does not call is a piece of information, namely, $\beta < \beta'$ where $\beta'$ is the probability of seeing a dog smaller than the one just observed. This suggests we treat $\mathcal{T}$ as a Bayesian prior. The posterior density when your opponent passes some number of dogs, with $\beta_\min$ the smallest such probability among them, is given by Bayes’ law:

\[f_{\mathcal{T}}(\beta | \beta_\min) = \frac{f_{\mathcal{T}}(\beta)}{F_{\mathcal{T}}(\beta_\min)},\]

for $\beta < \beta_\min$, and vanishes otherwise, while the cdf is modified to

\[F_{\mathcal{T}}(\beta | \beta_\min) = \int_{-\infty}^\beta \frac{f_{\mathcal{T}}(t)}{F_{\mathcal{T}}(\beta_\min)} \, \mathrm{d}t = \frac{F_{\mathcal{T}}(\beta)}{F_{\mathcal{T}}(\beta_\min)}.\]

For the concrete case of the beta distribution, the nonvanishing part is:

\[f_{\text{Beta}}(\beta | \beta_\min) = \frac{\beta^{a - 1}(1-\beta)^{b - 1}}{B(a, b) I_{\beta_\min}(a, b)}, \quad F_{\text{Beta}}(\beta | \beta_\min)=\frac{I_{\beta}(a, b)}{I_{\beta_\min}(a, b)}\]

Interestingly, the optimality condition is unchanged for $\alpha < \beta_\min$! This is because the factor of $I_{\beta_\min}(a, b)$ drops out on both sides. This leads to two phases of optimal play:

Stationary phase: Calculate the globally optimal $\alpha^\ast$ based on the initial prior. Use this threshold as long as $\beta_\min > \alpha^\ast$.
Boundary phase: Once $\beta_\min \leq \alpha^\ast$, “snap” to just below $\beta_\min$.
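The two phases amount to a small piece of bookkeeping, sketched here. The function names and the default gap `eps` are assumptions made for illustration, and `beta_min` is `None` until the opponent has passed at least one dog.

```python
from typing import Optional


def update_beta_min(beta_min: Optional[float], beta_passed: float) -> float:
    """Tighten the bound after the opponent passes a dog with smaller-dog probability beta_passed."""
    return beta_passed if beta_min is None else min(beta_min, beta_passed)


def active_threshold(alpha_star: float, beta_min: Optional[float],
                     eps: float = 1e-3) -> float:
    """Stationary phase: keep alpha_star. Boundary phase: snap just below beta_min."""
    if beta_min is None or beta_min > alpha_star:
        return alpha_star
    return max(beta_min - eps, 0.0)


bound = None
for passed in (0.6, 0.4, 0.5, 0.2):  # dogs the opponent declined to call
    bound = update_beta_min(bound, passed)
    print(passed, bound, active_threshold(0.3, bound))
```

In this run the threshold stays at 0.3 until the opponent passes the dog with probability 0.2, after which it snaps to just under 0.2.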

By the way, you might wonder if we should be playing Bayesian game theory, but here, there is no real sequential decision making; either you’re the first spotter (which is a decision process, essentially) or you are the second spotter and your strategy is completely dictated.

6. Shape parameters

You will have noticed that we fixed the “shape parameters” $a, b$ in the beta distribution; that makes sense for a single game, but we can update those parameters over the course of multiple games. A reasonable initial choice of distribution is skewed towards small values of $\beta$ with a long tail, e.g. $a = 1$ and $b > 1$. This has mode $\beta = 0$; this is a feature or a bug depending on the play style of your opponent. The cdf has a particularly simple form:

\[I_x(1, b) = \frac{\Gamma(b+1)}{\Gamma(b)\Gamma(1)}\int_{0}^x (1-t)^{b-1} \mathrm{d}t = b\int_{1-x}^{1} s^{b-1} \mathrm{d}s = 1- (1 - x)^b.\]

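The exact equation for optimal $\alpha^\ast$ is $1 - (1-\alpha^\ast)^b = b(1 - 2\alpha^\ast)(1-\alpha^\ast)^{b-1}$. For small $\alpha^\ast$ we can approximate $1 - 2\alpha^\ast \approx (1 - \alpha^\ast)^2$, so that it becomes

\[1 - (1- \alpha^\ast)^b = b(1- \alpha^\ast)^{b+1}.\]

Rearranging to $(1 - \alpha^\ast)^b(1 + b - b\alpha^\ast) = 1$ and taking logs gives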

\[b \log(1 - \alpha^\ast) + \log(1 + b - b\alpha^\ast) = 0\]

and assuming $\alpha^\ast$ is small, the Taylor expansion of $\log$ gives

\[-b\alpha^\ast + \log(1 + b - b\alpha^\ast) = 0.\]

If $b \gg 1$, we can set $\log(1 + b - b\alpha^\ast) \approx \log b$ and hence

\[-b\alpha^\ast + \log b = 0 \quad \Longrightarrow \quad \alpha^\ast \approx \frac{\log b}{b}.\]

This is eminently computable, though at small $b$ it overshoots, e.g. by roughly $25\%$ at $b = 6$. Higher order corrections are possible but left to the reader. Note that, for $a = 1$ and $b > 1$, the mean of the beta distribution is

\[\kappa = \frac{a}{a + b} = \frac{1}{1+b}.\]

This gives a simple update rule when the opponent was first spotter: modify $b$ to match the observed mean of their thresholds $\beta_i$ over $N$ such games, or in equations,

\[b = \frac{N}{\sum_{i=1}^N\beta_i} - 1 = \langle \beta\rangle^{-1} - 1.\]
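In code this moment-matching update is a one-liner. The sketch below assumes the opponent's thresholds from past opponent-first games are available as a list, and it flags the case where the observed mean is too large for the $a = 1$, $b > 1$ family. The function name is illustrative.

```python
from typing import Sequence


def update_b(betas: Sequence[float]) -> float:
    """Match the Beta(1, b) mean 1 / (1 + b) to the observed mean threshold."""
    if not betas:
        raise ValueError("need at least one opponent-first game")
    mean = sum(betas) / len(betas)
    b = 1.0 / mean - 1.0
    if b <= 1.0:
        # A mean of 1/2 or more falls outside the b > 1 regime assumed in the text.
        raise ValueError("observed mean too large for the a = 1, b > 1 family")
    return b


print(update_b([0.1, 0.2, 0.3]))  # mean 0.2, so b is close to 4
```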

When you are first spotter, you still get information about the opponent’s distribution, but the update is more delicate and we defer it to Appendix A.
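Returning to the closed form $\alpha^\ast \approx \log b / b$, its accuracy is easy to check against the exact optimality condition for $\text{Beta}(1, b)$, namely $1 - (1-\alpha)^b = (1 - 2\alpha)\,b\,(1-\alpha)^{b-1}$. The following sketch solves that condition by bisection; the function names are illustrative.

```python
import math


def exact_alpha(b: float, tol: float = 1e-12) -> float:
    """Root of F(alpha) - (1 - 2 alpha) f(alpha) for a Beta(1, b) prior, by bisection."""
    def g(alpha: float) -> float:
        return 1.0 - (1.0 - alpha) ** b - (1.0 - 2.0 * alpha) * b * (1.0 - alpha) ** (b - 1.0)

    lo, hi = 0.0, 0.5  # g(0) = -b < 0 and g(1/2) = 1 - 2**(-b) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_alpha(b: float) -> float:
    """Closed-form approximation log(b) / b, intended for large b."""
    return math.log(b) / b


for b in (6.0, 20.0, 100.0):
    print(b, exact_alpha(b), approx_alpha(b), approx_alpha(b) / exact_alpha(b))
```

The ratio in the last column shows the overshoot shrinking as $b$ grows, consistent with the large-$b$ assumption behind the approximation.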

7. Putting it all together

So, let’s summarize how to play in the simple case of a homogeneous point process and the various approximations we’ve used above:

Before game: Set the optimal threshold $\alpha^\ast = \log b/b$ based on opponent prior.
During game: When dog is encountered,
- Compute expected number of dogs left to observe, $n = \lambda(T - t)$.
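- If $\left[1 - e^{-x^2}/(2|x|\sqrt{\pi})\right]^n \geq 1 - \alpha^\ast$, with $x = (\ln m - \mu)/\sqrt{2}\sigma$ for a dog of size $m$, call it (stationary phase).
- If your opponent doesn’t call it and $\beta_\min \leq \alpha^\ast$, update $\alpha^\ast \leftarrow \beta_\min - \epsilon$ (boundary phase).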
- If opponent calls it, look for and call smaller dogs.
Following game: Update $b\leftarrow \langle \beta\rangle^{-1} - 1$ (they spot first) or via the likelihood update in Appendix A (you spot first).
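The in-game part of this recipe can be sketched as a single decision function. It is an illustration under the stated parameters ($\mu = 3$, $\sigma = 1$, one dog per five minutes), not a definitive implementation. One deliberate swap: it evaluates the log-normal cdf exactly with `math.erfc` instead of using the one-term tail approximation. The names `decide`, `RATE` and `eps` are assumptions.

```python
import math
from typing import Optional

MU, SIGMA = 3.0, 1.0   # log-kg parameters from the text
RATE = 1.0 / 300.0     # dogs per second (one every five minutes)


def smaller_dog_probability(m_kg: float, seconds_left: float) -> float:
    """M_n(m) with n = RATE * seconds_left and an exact log-normal cdf."""
    x = (math.log(m_kg) - MU) / (math.sqrt(2.0) * SIGMA)
    phi = 0.5 * math.erfc(-x)
    n = RATE * seconds_left
    return 1.0 - (1.0 - phi) ** n


def decide(m_kg: float, seconds_left: float, b: float,
           beta_min: Optional[float] = None, eps: float = 1e-3) -> bool:
    """Return True if we should call 'tiny dog!' on a dog of size m_kg."""
    alpha = math.log(b) / b                       # stationary-phase threshold
    if beta_min is not None and beta_min <= alpha:
        alpha = max(beta_min - eps, 0.0)          # boundary phase: snap below beta_min
    return smaller_dog_probability(m_kg, seconds_left) <= alpha


print(decide(2.0, 3600.0, b=6.0))                 # 2 kg dog, one hour left
print(decide(2.0, 3600.0, b=6.0, beta_min=0.1))   # opponent already passed a rarer dog
print(decide(5.0, 3600.0, b=6.0))                 # 5 kg dog, one hour left
```

Once the opponent calls, none of this is needed: the second spotter simply looks for anything smaller.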

Thus, we learn that behind the seemingly trivial game of spotting tiny dogs, there is a wealth of subtlety: from finite-$n$ extreme values to Poisson point processes to Bayesian decision-making on the theory side, and on the practical side, a panoply of approximations needed for optimal play. The lesson is that as you look harder at things, they tend to get deeper; so don’t look too hard unless you’re prepared to blog about it!

Appendix A. Log-likelihood update

The update rule in §6 applies only when the opponent spots first. When you spot first, information lies in silence, i.e., every dog they passed exceeded their threshold, leaving only the one-sided bound

\[\beta < \beta_{\min}.\]

This is a censored observation, with likelihood equal to mass below the bound:

\[\mathbb{P}[\beta < \beta_{\min}] = F_\mathcal{T}(\beta_{\min}) = 1 - (1 - \beta_{\min})^b.\]

Writing $\beta_i$ for the thresholds seen in opponent-first games and $\beta_{\min, j}$ for the bounds from our own, the log-likelihood for $b$ is

\[\log \mathcal{L}(b) = \sum_i \big[\log b + (b-1)\log(1 - \beta_i)\big] \;+\; \sum_j \log\!\big[1 - (1 - \beta_{\min, j})^b\big].\]

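The first sum is agreeable, since $\text{Beta}(1, b)$ is an exponential family with sufficient statistic $-\log(1 - \beta)$. A $\text{Gamma}(\alpha, \lambda)$ prior on $b$ is conjugate (here $\alpha$ and $\lambda$ are the shape and rate hyperparameters, unrelated to the threshold and dog rate above), and each opponent-first game updates as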

\[\text{Gamma}(\alpha, \lambda) \leftarrow \text{Gamma}\big(\alpha + 1,\; \lambda - \log(1 - \beta_i)\big),\]

with point estimate $\hat b = \alpha / \lambda$. This is distinct from the mean update rule in §6. The second sum is the delicate part, since $1 - (1 - \beta_{\min, j})^b$ is not log-linear in $b$ and so spoils conjugacy. One either folds it into a posterior carried on a grid of $b$, or solves the score equation

\[\frac{f}{b} - \sum_j \frac{(1 - \beta_{\min, j})^b\,\log(1 - \beta_{\min, j})} {1 - (1 - \beta_{\min, j})^b} \;=\; -\sum_i \log(1 - \beta_i)\]

numerically, with $f$ the number of opponent-first games. Each first-spotter game adds a positive term on the left and so raises $\hat b$. This is sensible, since spotting first is itself evidence the opponent’s threshold sits low.
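Both routes can be sketched briefly. The snippet below is a minimal illustration, assuming all thresholds and bounds lie strictly between 0 and 1 and that at least one opponent-first game has been observed (otherwise the score never changes sign). The hyperparameters are named `shape` and `rate` to avoid clashing with the threshold $\alpha$ and dog rate $\lambda$; all function names are invented for the example.

```python
import math
from typing import Sequence, Tuple


def gamma_update(shape: float, rate: float, beta_i: float) -> Tuple[float, float]:
    """Conjugate update of a Gamma(shape, rate) prior on b after one opponent-first game."""
    return shape + 1.0, rate - math.log1p(-beta_i)


def score(b: float, observed: Sequence[float], bounds: Sequence[float]) -> float:
    """Derivative of the log-likelihood with respect to b."""
    total = sum(1.0 / b + math.log1p(-beta) for beta in observed)
    for bound in bounds:
        q = (1.0 - bound) ** b
        total -= q * math.log1p(-bound) / (1.0 - q)  # censored (we-spotted-first) term
    return total


def mle_b(observed: Sequence[float], bounds: Sequence[float],
          lo: float = 1e-6, hi: float = 1e6, iters: int = 200) -> float:
    """Solve score(b) = 0 by bisection on a log scale; the score is decreasing in b."""
    if not observed:
        raise ValueError("need at least one opponent-first game")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid, observed, bounds) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


shape, rate = gamma_update(2.0, 0.5, 0.2)
print(shape / rate)                      # point estimate after one opponent-first game
print(mle_b([0.1, 0.2], []))             # opponent-first games only
print(mle_b([0.1, 0.2], [0.05]))         # adding one game where we spotted first
```

Adding a censored bound pushes the estimate of $b$ upward, matching the sign argument above.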

Written on June 13, 2026

𝔥𝔢𝔭𝔱𝔞𝔯۰𝔠𝔥