Probability with Martingales by David Williams

\[ \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\Z}{\mathbb Z} \newcommand{\N}{\mathbb N} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left \lVert #1 \right \rVert} \newcommand{\Abs}[1]{\left \lvert #1 \right \rvert} \newcommand{\pind}{\bot \!\!\! \bot} \newcommand{\probto}{\buildrel P\over \to} \newcommand{\vect}[1]{\boldsymbol #1} \DeclareMathOperator{\EE}{\mathbb E} \DeclareMathOperator{\PP}{\mathbb P} \DeclareMathOperator{\E}{E} \DeclareMathOperator{\dnorm}{\mathcal N} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Leb}{Leb} \DeclareMathOperator{\Bin}{Bin} \newcommand{\wto}{\buildrel w\over \to} \]

Problem 18.1
Prove that
\[ \begin{equation} \lim_{n\to \infty} \Pr\left(\frac{ X_1+X_2+\dots +X_n} n \leq x\right) = \frac 1 2 + \pi^{-1} \arctan x \end{equation} \]
where \(\arctan\) denotes the branch taking values in \((-\frac \pi 2, \frac \pi 2)\), and the \(X_i\) are i.i.d. with the common density \((1-\cos x)/(\pi x^2)\) studied below. Before tackling this, consider a warm-up: the weak convergence of the binomial to the Poisson distribution. For a Bernoulli random variable \(Z\) with parameter \(p\) (writing \(q = 1-p\)),
\[ \begin{equation} \phi_Z(\theta) = \E e^{i\theta Z} = p e^{i \theta\cdot 1} + q e^{i \theta\cdot 0} = p (e^{i\theta} -1)+ 1 \end{equation} \]
Now let \(F_n \sim \Bin(n, \lambda/n)\), a sum of \(n\) independent Bernoulli variables with \(p = \lambda/n\). By independence,
\[ \begin{equation} \phi_{F_n}(\theta) = \left( p e^{i\theta} + q \right)^n = \left( 1+\frac \lambda n (e^{i\theta} -1) \right)^n \to \exp( \lambda( e^{i\theta}-1)) \end{equation} \]
On the other hand, if \(F\) has a Poisson distribution with parameter \(\lambda\), its CF is given by
\[ \begin{equation} \phi_F(\theta) = \E e^{i\theta F} = \sum_{k=0}^\infty e^{i\theta k} e^{-\lambda} \frac{\lambda^k}{k!}= \exp(\lambda e^{i\theta} - \lambda) \end{equation} \]
which is exactly \(\lim_{n\to \infty}\phi_{F_n}(\theta)\). Thus we conclude that \(F_n \to F\) weakly, i.e. in distribution.
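This limit is easy to check numerically. A minimal sketch, assuming numpy is available (the value \(\lambda = 3\) and the grid of \(\theta\) are arbitrary illustrative choices): it compares the \(\Bin(n, \lambda/n)\) CF against its Poisson limit in sup-norm.

```python
import numpy as np

# Compare the Bin(n, lambda/n) CF against its Poisson(lambda) limit.
lam = 3.0
theta = np.linspace(-10, 10, 201)
poisson_cf = np.exp(lam * (np.exp(1j * theta) - 1))
for n in (10, 100, 10_000):
    binom_cf = (1 + (lam / n) * (np.exp(1j * theta) - 1)) ** n
    print(n, np.max(np.abs(binom_cf - poisson_cf)))  # gap shrinks with n
```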
Returning to the main problem, we compute the CF of \(X\):
\[ \begin{equation} \phi_X(\theta) = \E e^{i\theta X} = \int_\R e^{i\theta x} \frac{1-\cos x}{\pi x^2}\, dx \end{equation} \]
To this end, first compute for \(k\in \N\) and \(\alpha>0\)
\[ \begin{equation} I_k(\alpha) = \int_\R x^{-k} e^{i \alpha x}\, dx \end{equation} \]
(Aside: there is something subtle going on here because, like the integral in 16.1, the integrand is not integrable in the \(L^1\) sense; the integral must be understood as a principal value.) As in problem 16.1, take a contour \(\gamma\) along the real axis from \(-R\) to \(-\epsilon\), a semicircle in the upper half plane from \(-\epsilon\) to \(\epsilon\), a line from \(\epsilon\) to \(R\), and then a semicircle in the upper half plane back to \(-R\). We use a variation of the residue theorem, \(\int_{\gamma'} \frac{f(z)}{(z-z_0)^k}\, dz = \frac{i \Delta \theta }{(k-1)!} f^{(k-1)}(z_0)\), where \(\Delta \theta\) is the angle in radians subtended by the small arc \(\gamma'\). This gives
\[ \begin{equation} \oint_\gamma \frac{e^{i \alpha z}}{z^k} \, dz = \pi i \frac{ (i\alpha)^{k-1} }{(k-1)!} \end{equation} \]
Using the technique of 16.1, we get an estimate of the integral along the big semicircle \(\gamma''\):
\[ \begin{equation} \Abs{ \int_{\gamma''} \frac{e^{i \alpha z}}{z^k} \, dz } \leq \int_0^\pi \frac{ e^{- R \alpha \sin \theta}}{R^{k-1}} \, d\theta \to 0 \qquad \text{ as } R\to \infty \end{equation} \]
Since \(\abs{z^{-k}}\leq \abs{z^{-1}}\) on the big semicircle once \(R\geq 1\), the estimate for the contribution of this segment holds. Therefore as \(R\to \infty\) and \(\epsilon\to 0\) we get
\[ \begin{equation} \oint_\gamma \frac{e^{i \alpha z}}{z^k} \, dz \to I_k(\alpha) \end{equation} \]
When \(\alpha <0\) we can modify the argument by taking semicircles in the lower half plane. This again makes the integral vanish along the big semicircle, but now the sense of integration along the small semicircle is reversed, from counterclockwise to clockwise. We can summarize both cases as
\[ \begin{equation} I_k(\alpha) = \frac{ \pi (i \alpha)^k }{(k-1)! \abs{\alpha}} \end{equation} \]
This gives \(I_1(\alpha) = \pi i \sgn(\alpha)\) and \(I_2(\alpha) = -\pi \abs{\alpha}\). Expressing \(\cos x = \frac 1 2 (e^{ix} + e^{-ix})\), our CF can be written
\[ \begin{equation} \begin{split} \phi_X(\theta) &= \frac 1 \pi \left( I_2(\theta) - \frac 1 2 ( I_2(\theta+1) + I_2(\theta-1) ) \right)\\ &= \frac 1 2 \left( \abs{\theta+1} + \abs{\theta-1} \right) - \abs{\theta} \\ &= \begin{cases} 1-\abs{\theta} & \abs{\theta} \leq 1 \\ 0& \text{otherwise} \end{cases} \end{split} \end{equation} \]
Thus \(\phi_X\) is a kind of witch's hat centered at 0. As in 16.3, for \(S_n = X_1+\dots + X_n\),
\[ \begin{equation} \phi_{S_n/n}(\theta) =\phi_{X}(\theta/n)^n = \begin{cases} \left(1-\frac{\abs{\theta}} n \right)^n & \text{if } \frac{\abs{\theta}} n \leq 1\\ 0 & \text{otherwise} \end{cases} \end{equation} \]
For any fixed \(\theta\) and large enough \(n\) we have \(\abs{\theta}/n \leq 1\), so the first case applies eventually. Taking the limit as \(n\to \infty\),
\[ \begin{equation} \phi_{S_n/n}(\theta) \to e^{-\abs{\theta}} \end{equation} \]
This is the CF of the Cauchy distribution, so by Lévy's convergence theorem \(S_n/n\) converges in distribution to the Cauchy distribution. We conclude
\[ \begin{equation} \Pr(S_n/n \leq x) \to \int_{-\infty}^x \frac{dt}{\pi(1+t^2)} = \frac 1 2 +\pi^{-1} \arctan(x) \end{equation} \]
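Although the individual \(x^{-k}\) pieces above are only principal values, the density itself is absolutely integrable, so the witch's-hat formula can be checked by direct numerical integration. A minimal sketch, assuming numpy; the cutoff 2000 and step size are arbitrary (the truncation error is of order \(1/(\pi \cdot 2000)\)):

```python
import numpy as np

# Riemann-sum check that the CF of f(x) = (1 - cos x)/(pi x^2)
# is the witch's hat (1 - |theta|)^+.
dx = 1e-3
x = np.arange(dx / 2, 2000.0, dx)          # positive half-line; f is even
f = (1 - np.cos(x)) / (np.pi * x**2)
for theta in (0.0, 0.3, 0.7, 1.0, 2.5):
    phi = 2 * np.sum(np.cos(theta * x) * f) * dx   # double the half-line sum
    print(theta, round(phi, 4), max(1.0 - abs(theta), 0.0))
```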
Problem 18.2

Prove the weak law of large numbers in the following form. Suppose \(X_1,X_2,\dots\) are i.i.d. RVs, each with the same distribution as \(X\). Suppose \(X\in \mathcal L^1\) and that \(\E X = \mu\). Prove by use of CFs that the distribution of
\[ \begin{equation} S_n = n^{-1}(X_1+\dots+X_n) \end{equation} \]
converges weakly to the unit mass at \(\mu\). Deduce that
\[ \begin{equation} S_n \to \mu \text{ in probability} \end{equation} \]
Of course, the SLLN implies this weak law. Let \(Z \sim \delta_\mu\) be equal to \(\mu\) almost surely. Then
\[ \begin{equation} \phi_Z(\theta) = \E e^{i\theta Z} = e^{i \theta \mu} \end{equation} \]
Now consider the Taylor expansion of \(\phi_X(\theta)\). Note that
\[ \begin{equation} \phi_X(0) = 1 \qquad \phi_X'(0) = \left. \E i X e^{i\theta X} \right|_{\theta=0}= i \mu \end{equation} \]
where differentiation under the expectation is justified by dominated convergence, since \(\abs{iXe^{i\theta X}} \leq \abs{X} \in \mathcal L^1\). Therefore
\[ \begin{equation} \phi_{X}(\theta) = 1 + i \mu \theta + o(\theta) \end{equation} \]
So the characteristic function satisfies
\[ \begin{equation} \phi_{S_n}(\theta) = \phi_X(\theta/n )^n = \left(1+ \frac{i \mu \theta } n + o\left( \frac \theta n\right) \right)^n \to \exp(i\theta \mu ) = \phi_Z(\theta) \end{equation} \]
This implies that \(S_n \to \delta_\mu\) in distribution. However, for a unit mass, convergence in distribution is the same as convergence in probability, since for any \(\epsilon>0\)
\[ \begin{equation} \Pr( S_n < \mu-\epsilon) \to \Pr( Z < \mu-\epsilon ) = 0 \qquad \Pr( S_n > \mu+\epsilon) \to \Pr( Z > \mu+\epsilon ) = 0 \end{equation} \]
and hence \(\Pr( \abs{S_n-\mu}>\epsilon) \to 0\) for any \(\epsilon>0\). This proves the weak law of large numbers.
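A small simulation illustrating the conclusion, as a sketch assuming numpy (the Exp(1) law with \(\mu = 1\), the tolerance 0.1, and the 5000 replications are arbitrary choices): the empirical probability that the sample mean strays from \(\mu\) shrinks as \(n\) grows.

```python
import numpy as np

# Empirical P(|S_n - mu| > eps) for sample means of Exp(1) (mu = 1).
rng = np.random.default_rng(0)
for n in (10, 100, 10_000):
    means = rng.exponential(size=(5_000, n)).mean(axis=1)  # 5000 copies of S_n
    print(n, np.mean(np.abs(means - 1.0) > 0.1))
```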
Weak Convergence for \(\Pr[0,1]\)

Problem 18.3

Let \(X\) and \(Y\) be RVs taking values in \([0,1]\). Suppose that
\[ \begin{equation} \E X^k = \E Y^k \qquad k=0,1,2,\dots \end{equation} \]
Prove that (i) \(\E p(X) = \E p(Y)\) for every polynomial \(p\), (ii) \(\E f(X) = \E f(Y)\) for every continuous function \(f\) on \([0,1]\), and (iii) \(X\) and \(Y\) have the same distribution.
Item (i) follows from linearity of expectations. If \(p(x) = \sum_{k=0}^n c_k x^k\) then
\[ \begin{equation} \E p(X) = \sum_{k=0}^n c_k \E X^k = \sum_{k=0}^n c_k \E Y^k = \E p(Y) \end{equation} \]
Item (ii) follows from the Weierstrass approximation theorem. On the compact set \([0,1]\), every continuous function \(f\) can be approximated uniformly by a sequence of polynomials \(p_n\). Given \(\epsilon>0\), choose \(N\) large enough so that for all \(n>N\), \(\abs{f(x)-p_n(x)}<\epsilon\) for all \(x\). In this case
\[ \begin{equation} \Abs{\E f(X) - \E p_n(X)} \leq \E \abs{f(X)-p_n(X)} \leq \epsilon \end{equation} \]
and likewise for \(Y\). Therefore
\[ \begin{equation} \begin{split} \Abs{ \E f(X) - \E f(Y)} &\leq \Abs{\E f(X) - \E p_n(X)} + \Abs{\E p_n(X) - \E p_n(Y)} + \Abs{\E p_n(Y) - \E f(Y)}\\ &\leq \epsilon + 0 + \epsilon = 2\epsilon \end{split} \end{equation} \]
Since \(\epsilon\) is arbitrary, this shows \(\E f(X) = \E f(Y)\). For item (iii), approximate \(I_{[0,\alpha]}\) by the continuous piecewise linear functions
\[ \begin{equation} f_n(x) = \begin{cases} 1 & x \in [0, \alpha )\\ 1 - n\left(x - \alpha\right) & x \in [ \alpha, \alpha + \frac 1 n ) \\ 0 & x \in [ \alpha + \frac 1 n, 1] \end{cases} \end{equation} \]
Using the dominated convergence theorem (\(f_n \to I_{[0,\alpha]}\) pointwise and \(0\leq f_n \leq 1\)),
\[ \begin{equation} \lim_{n\to \infty} \E f_n(X) = \E I_{[0,\alpha]}(X) = \Pr\left( X\leq \alpha \right) \end{equation} \]
The same holds for \(Y\). However, \(\E f_n(X) = \E f_n(Y)\) for all \(n\) by part (ii). Therefore the limits are equal and \(\Pr( X\leq \alpha) = \Pr(Y\leq \alpha)\). Thus the moments uniquely determine the distribution on \([0,1]\).
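Parts (i) and (ii) have a computational payoff: \(\E f(X)\) is determined by the moments alone, via a polynomial approximation of \(f\). Here is a minimal sketch, assuming numpy and scipy are available; the Bernstein polynomial of degree 12, the test function \(\sin 3x\), and the Beta(2,3) law are all arbitrary illustrative choices.

```python
import numpy as np
from math import comb
from scipy.stats import beta

a, b, n = 2.0, 3.0, 12                     # test law Beta(2,3); Bernstein degree
f = lambda x: np.sin(3 * x)                # any continuous function on [0,1]

# Closed-form moments m_r = E X^r of Beta(a, b)
m = [np.prod([(a + i) / (a + b + i) for i in range(r)]) for r in range(n + 1)]

# E[X^k (1 - X)^{n-k}] expanded into power moments
def mixed(k):
    return sum(comb(n - k, j) * (-1) ** j * m[k + j] for j in range(n - k + 1))

# E[(B_n f)(X)], computable from the moments alone
approx = sum(f(k / n) * comb(n, k) * mixed(k) for k in range(n + 1))

# Monte Carlo estimate of E f(X) for comparison; they agree up to the
# Bernstein approximation error, which -> 0 as the degree n grows
exact = f(beta(a, b).rvs(size=1_000_000, random_state=1)).mean()
print(approx, exact)
```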
Problem 18.4

Suppose that \((F_n)\) is a sequence of DFs with
\[ \begin{equation} F_n(x) = 0 \text{ for } x<0, \qquad F_n(1)=1,\qquad \text{ for every } n \end{equation} \]
Suppose that
\[ \begin{equation} m_k = \lim_n \int_{[0,1]} x^k dF_n \text{ exists for } k=0,1,2,\dots \end{equation} \]
Use the Helly-Bray Lemma and 18.3 to show that \(F_n\wto F\), where \(F\) is characterized by \(\int_{[0,1]}x^k \, dF = m_k\) for all \(k\).

Let \(F_n\) be associated with the measure \(\mu_n\). Since all the mass lies in \([0,1]\), the sequence is automatically tight, so by Helly-Bray there is a subsequence \(n_i\) such that \(F_{n_i} \wto F\) for some DF \(F\); call the associated measure \(\mu\). By the definition of weak convergence, since \(x^k\) is a continuous function on \([0,1]\) and \(n_i\) is a subsequence of \(n\),
\[ \begin{equation} m_k = \lim_n \mu_n(x^k) = \lim_{n_i} \mu_{n_i}(x^k) = \mu(x^k) = \int_{[0,1]} x^k\, dF \end{equation} \]
If another subsequence \(n'_i\) satisfies \(F_{n'_i}\wto F'\) with associated measure \(\mu'\), then 18.3 implies that \(F'=F\): the above argument shows \(\mu'(x^k) = m_k\) as well, so \(\mu'(x^k)= \mu(x^k)\) for all \(k\). This implies that \(F_n \wto F\). For suppose there is some continuous \(g:[0,1]\to\R\) such that \(\limsup \mu_n(g) \neq \mu(g)\) or \(\liminf \mu_n(g) \neq \mu(g)\). Then choose a subsequence \(n_i\) so that \(\mu_{n_i}(g)\) converges to a value other than \(\mu(g)\); this is possible since the sequence \(\mu_n(g)\) is bounded by \(\sup \abs g\). By Helly-Bray we can take a further subsequence converging in distribution to some \(F'\), whose measure must satisfy \(\mu'(g) \neq \mu(g)\). But this contradicts what we've previously shown, that every convergent subsequence has the same limit \(F\). Therefore \(\lim_n \mu_n(g)\) exists for all \(g\in C[0,1]\) and \(\lim_n \mu_n(g) = \mu(g)\).
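To see the statement in action, here is a small sketch assuming scipy; the choice \(\mu_n = \text{Beta}(n,n)\) is an arbitrary illustration. Every moment of Beta(n,n) converges (e.g. the second moment \((n+1)/(2(2n+1)) \to 1/4\), the second moment of the unit mass at \(1/2\)), and the DFs indeed converge weakly to the unit mass at \(1/2\).

```python
from scipy.stats import beta

# mu_n = Beta(n, n): moments converge, and F_n -> unit mass at 1/2.
for n in (1, 10, 100, 1000):
    bn = beta(n, n)
    print(n, bn.moment(2), bn.cdf(0.4), bn.cdf(0.6))
# second moments approach 1/4; cdf(0.4) -> 0 and cdf(0.6) -> 1
```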
Problem 18.5 Improving on 18.3, a Moment Inversion Formula

Let \(F\) be a distribution function with \(F(0-)= 0\) and \(F(1)=1\). Let \(\mu\) be the associated law, and define
\[ \begin{equation} m_k = \int_{[0,1]} x^k\, dF(x) \end{equation} \]
Define
\[ \begin{equation} \begin{split} &\Omega= [0,1]\times[0,1]^{\N},\,\,\, \mathcal F = \mathcal B \times \mathcal B^{\N}, \,\,\, \Pr = \mu\times \Leb^\N \\ &\Theta(\omega) = \omega_0, \,\,\, H_k(\omega) = I_{[0,\omega_0]}(\omega_k) \end{split} \end{equation} \]
This models the situation in which \(\Theta\) is chosen with law \(\mu\), then a coin with probability \(\Theta\) of heads is minted and tossed at times \(1,2,\dots\). See E10.8. The RV \(H_k\) is 1 if the \(k\)th toss produces heads, 0 otherwise. Define
\[ \begin{equation} S_n = H_1+H_2+\dots+H_n \end{equation} \]
By the strong law of large numbers and Fubini's theorem,
\[ \begin{equation} S_n /n \to \Theta,\qquad \text{a.s.} \end{equation} \]
Define a map \(D\) on the space of real sequences \((a_n : n\in \Z_+)\) by setting
\[ \begin{equation} Da = (a_n-a_{n+1} : n \in \Z_+) \end{equation} \]
Prove that
\[ \begin{equation} F_n(x) = \sum_{i \leq nx} \binom n i (D^{n-i}m)_i \to F(x) \label{eq:dist from moments} \end{equation} \]
at every point of continuity of \(F\).

By the SLLN, \(S_n/n \to \Theta\) a.s., and since almost sure convergence implies convergence in distribution,
\[ \begin{equation} F(x) = \Pr( \Theta \leq x) = \lim_{n\to \infty} \Pr( S_n/n \leq x ) \end{equation} \]
at every continuity point \(x\) of \(F\). Conditional on \(\Theta\), \(S_n \sim \Bin( n, \Theta )\), so marginalizing over \(\Theta\) and using Fubini's theorem we find
\[ \begin{equation} \Pr( S_n \leq nx)= \int_{[0,1]} \sum_{k\leq nx} \binom{n}{k} \theta^{k} (1-\theta)^{n-k} \, \mu(d\theta) = \sum_{k\leq nx} \binom{n}{k} \int_{[0,1]} \theta^{k}(1-\theta)^{n-k} \, \mu(d\theta) \end{equation} \]
But we can write the integral in terms of the \(m_k\). By induction I claim
\[ \begin{equation} (D^n m)_k = \int_{[0,1]} x^k (1-x)^n \, \mu(dx) \end{equation} \]
Clearly this is true for \(n=0\); for \(n\geq 1\)
\[ \begin{equation} \begin{split} (D^n m)_k &= (D \cdot (D^{n-1} m))_k = (D^{n-1}m)_{k} - (D^{n-1}m)_{k+1} \\ &= \int_{[0,1]} x^{k} (1-x)^{n-1} \, \mu(dx) - \int_{[0,1]} x^{k+1} (1-x)^{n-1} \, \mu(dx) \\ &= \int_{[0,1]} x^{k}(1-x)^{n-1} (1-x)\, \mu(dx) = \int_{[0,1]} x^{k}(1-x)^{n} \, \mu(dx) \end{split} \end{equation} \]
In particular \(\int_{[0,1]} \theta^k(1-\theta)^{n-k}\,\mu(d\theta) = (D^{n-k}m)_k\). Pulling together the parts, we get \(\eqref{eq:dist from moments}\).
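The inversion formula is easy to test numerically. A minimal sketch; exact rational arithmetic is used because iterated differencing of floating-point moments cancels catastrophically, and the uniform law, with \(m_k = 1/(k+1)\) and DF \(F(x) = x\), is an arbitrary test case.

```python
from fractions import Fraction
from math import comb, floor

# Moments m_k = 1/(k+1) of the uniform law on [0,1], as exact rationals.
m = [Fraction(1, k + 1) for k in range(61)]

def inversion_df(m, n, x):
    """F_n(x) = sum_{i <= nx} C(n,i) (D^{n-i} m)_i  from problem 18.5."""
    diffs = [m[: n + 1]]                        # diffs[r] = D^r m
    for _ in range(n):
        prev = diffs[-1]
        diffs.append([a - b for a, b in zip(prev, prev[1:])])
    return float(sum(comb(n, i) * diffs[n - i][i]
                     for i in range(floor(n * x) + 1)))

for x in (0.25, 0.5, 0.75):
    print(x, inversion_df(m, 60, x))            # approaches the true DF F(x) = x
```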
Problem 18.6 Moment Problem

Prove that if \((m_k : k\in \Z_+)\) is a sequence of numbers in \([0,1]\), then there exists a RV \(X\) with values in \([0,1]\) such that \(\E(X^k) = m_k\) for all \(k\) if and only if \(m_0=1\) and
\[ \begin{equation} (D^r m)_s \ge 0 \qquad (r,s \in \Z_+) \end{equation} \]
(Note that necessity is immediate from the identity in 18.5: if \(m_k = \int x^k\,\mu(dx)\), then \((D^r m)_s = \int_{[0,1]} x^s (1-x)^r \, \mu(dx) \geq 0\).)
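A sketch of the criterion in code (with finitely many moments one can only test the finitely many differences they determine): the uniform moments pass, while a sequence with \(m_2 < m_1^2\), impossible for any RV since \(\Var X \geq 0\), fails.

```python
from fractions import Fraction

def hausdorff_ok(m):
    """Check (D^r m)_s >= 0 for every difference determined by m."""
    seq = list(m)
    while seq:
        if any(a < 0 for a in seq):
            return False
        seq = [a - b for a, b in zip(seq, seq[1:])]  # apply D once
    return True

uniform = [Fraction(1, k + 1) for k in range(12)]        # moments of Leb[0,1]
bogus = [Fraction(1), Fraction(9, 10), Fraction(1, 10)]  # m_2 < m_1^2
print(hausdorff_ok(uniform), hausdorff_ok(bogus))        # True False
```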
Weak Convergence for \(\Pr[0,\infty)\)

Problem 18.7

Here we use Laplace transforms instead of CFs. Suppose that \(F\) and \(G\) are DFs on \(\R_+\) such that \(F(0-) = G(0-)\) and
\[ \begin{equation} \int_{[0,\infty)} e^{-\lambda x} \, dF(x) = \int_{[0,\infty)} e^{-\lambda x}\, dG(x),\qquad \forall \lambda\geq 0 \label{eq:laplace condition} \end{equation} \]
Note that the integral on the LHS has a contribution \(F(0)\) from \(\{0\}\). Prove that \(F=G\). Suppose that \((F_n)\) is a sequence of distribution functions on \(\R\), each with \(F_n(0-) = 0\), and such that
\[ \begin{equation} L(\lambda) = \lim_n \int e^{-\lambda x}\, dF_n(x) \end{equation} \]
exists for \(\lambda\geq 0\) and that \(L\) is continuous at 0. Prove that \((F_n)\) is tight and that
\[ \begin{equation} F_n \wto F \text{ where } \int e^{-\lambda x}\, dF(x) = L(\lambda),\qquad \forall \lambda\geq 0 \end{equation} \]
Let \(X\) have DF \(F\) and \(Y\) have DF \(G\), and set \(U=e^{-X}\) and \(V=e^{-Y}\). Since the range of \(X,Y\) is \([0,\infty)\), the values of \(U,V\) lie in \([0,1]\). From \(\eqref{eq:laplace condition}\),
\[ \begin{equation} \E U^k = \E e^{-k X} = \E e^{-k Y} = \E V^k \end{equation} \]
Hence by 18.3 the distributions of \(U\) and \(V\) are equal. However, since \(x\mapsto e^{-x}\) is monotonic and invertible, this means that the distributions of \(X\) and \(Y\) are equal. More concretely, the relationship \(\Pr( U \geq c ) = \Pr( X \leq -\log c)\) provides a dictionary to translate between the distributions of \(X\) and \(U\), and similarly for \(Y\) and \(V\). For the second part, given \(\epsilon>0\), choose \(\lambda\) small enough so that \(L(\lambda) > 1-\epsilon/4\). This is possible because \(L\) is continuous at 0 and \(L(0)=1\). Thus, because \(L\) is the limit of the Laplace transforms of the \(F_n\), for all but finitely many \(n\) we must have
\[ \begin{equation} \int_{[0,\infty)} ( 1-e^{-\lambda x} )\, dF_n(x) < \epsilon/2 \end{equation} \]
Thus for any \(K\), for all but finitely many \(n\),
\[ \begin{equation} (1-F_n(K)) (1-e^{-\lambda K}) \leq \int_{[0,\infty)} ( 1-e^{-\lambda x} )\, dF_n(x) < \epsilon/2 \end{equation} \]
Choose \(K\) large enough so that \(1-e^{-\lambda K}>\frac 1 2\). Then, for all but finitely many \(n\),
\[ \begin{equation} F_n(K) > 1-\epsilon \end{equation} \]
which shows that \((F_n)\) is tight. The rest of the proof is essentially the same as 18.4. Having proved tightness, Helly-Bray implies that some subsequence converges in distribution to some DF, say \(F\). Any DF which is the weak limit of a subsequence of \((F_n)\) has Laplace transform \(L\), since along a subsequence the Laplace transforms still converge to \(L\). By the argument above, there is at most one distribution with a given Laplace transform, so the limit of a convergent subsequence of \((F_n)\) is the same distribution \(F\) regardless of the subsequence. This implies that \(F_n \wto F\): for any bounded continuous \(g\) we must have \(\limsup \int g\, dF_n = \liminf \int g\, dF_n = \int g\,dF\), since otherwise we could use Helly-Bray to find a subsequence converging to a distribution other than \(F\).
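The \(U = e^{-X}\) dictionary is concrete enough to simulate. A sketch, assuming numpy, with Exp(1) as an arbitrary choice of law on \([0,\infty)\): the Laplace transform of Exp(1) at integer \(k\) is \(1/(1+k)\), which is exactly the \(k\)th moment of \(U = e^{-X}\) (indeed here \(U\) is uniform on \([0,1]\)).

```python
import numpy as np

# E U^k for U = exp(-X), X ~ Exp(1), versus the Laplace transform 1/(1+k).
rng = np.random.default_rng(0)
U = np.exp(-rng.exponential(size=1_000_000))
for k in range(5):
    print(k, (U**k).mean(), 1 / (1 + k))
```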