Probability with Martingales by David Williams

\[ \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\Z}{\mathbb Z} \newcommand{\N}{\mathbb N} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left \lVert #1 \right \rVert} \newcommand{\Abs}[1]{\left \lvert #1 \right \rvert} \newcommand{\pind}{\bot \!\!\! \bot} \newcommand{\probto}{\buildrel P\over \to} \newcommand{\vect}[1]{\boldsymbol #1} \DeclareMathOperator{\EE}{\mathbb E} \DeclareMathOperator{\PP}{\mathbb P} \DeclareMathOperator{\E}{E} \DeclareMathOperator{\dnorm}{\mathcal N} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Leb}{Leb} \DeclareMathOperator{\Bin}{Bin} \newcommand{\wto}{\buildrel w\over \to} \]

Problem 18.1
Prove that
\[ \begin{equation} \lim_{n\to \infty} \Pr\left(\frac{ X_1+X_2+\dots +X_n} n \leq x\right) = \frac 1 2 + \pi^{-1} \arctan x \end{equation} \]
where \(\arctan\) denotes the branch taking values in \((-\frac \pi 2, \frac \pi 2)\), and the \(X_i\) are i.i.d. with the common density \((1-\cos x)/(\pi x^2)\) studied below. Before tackling this, consider a warm-up: the weak convergence of the binomial to the Poisson distribution. For a Bernoulli random variable \(Z\) with parameter \(p\) (writing \(q = 1-p\)),
\[ \begin{equation} \phi_Z(\theta) = \E e^{i\theta Z} = p e^{i \theta\cdot 1} + q e^{i \theta\cdot 0} = p (e^{i\theta} -1)+ 1 \end{equation} \]
Now let \(F_n \sim \Bin(n, \lambda/n)\), a sum of \(n\) independent Bernoulli variables with \(p = \lambda/n\). By independence,
\[ \begin{equation} \phi_{F_n}(\theta) = \left( p e^{i\theta} + q \right)^n = \left( 1+\frac \lambda n (e^{i\theta} -1) \right)^n \to \exp( \lambda( e^{i\theta}-1)) \end{equation} \]
On the other hand, if \(F\) has a Poisson distribution with parameter \(\lambda\), its CF is given by
\[ \begin{equation} \phi_F(\theta) = \E e^{i\theta F} = \sum_{k=0}^\infty e^{i\theta k} e^{-\lambda} \frac{\lambda^k}{k!}= \exp(\lambda e^{i\theta} - \lambda) \end{equation} \]
which is exactly \(\lim_{n\to \infty}\phi_{F_n}(\theta)\). Thus we conclude that \(F_n \to F\) weakly, i.e. in distribution.
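This limit is easy to check numerically. A minimal sketch, assuming numpy is available (the value \(\lambda = 3\) and the grid of \(\theta\) are arbitrary illustrative choices): it compares the \(\Bin(n, \lambda/n)\) CF against its Poisson limit in sup-norm.

```python
import numpy as np

# Compare the Bin(n, lambda/n) CF against its Poisson(lambda) limit.
lam = 3.0
theta = np.linspace(-10, 10, 201)
poisson_cf = np.exp(lam * (np.exp(1j * theta) - 1))
for n in (10, 100, 10_000):
    binom_cf = (1 + (lam / n) * (np.exp(1j * theta) - 1)) ** n
    print(n, np.max(np.abs(binom_cf - poisson_cf)))  # gap shrinks with n
```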
Returning to the main problem, we compute the CF of \(X\):
\[ \begin{equation} \phi_X(\theta) = \E e^{i\theta X} = \int_\R e^{i\theta x} \frac{1-\cos x}{\pi x^2}\, dx \end{equation} \]
To this end, first compute for \(k\in \N\) and \(\alpha>0\)
\[ \begin{equation} I_k(\alpha) = \int_\R x^{-k} e^{i \alpha x}\, dx \end{equation} \]
(Aside: there is something subtle going on here because, like the integral in 16.1, the integrand is not integrable in the \(L^1\) sense; the integral must be understood as a principal value.) As in problem 16.1, take a contour \(\gamma\) along the real axis from \(-R\) to \(-\epsilon\), a semicircle in the upper half plane from \(-\epsilon\) to \(\epsilon\), a line from \(\epsilon\) to \(R\), and then a semicircle in the upper half plane back to \(-R\). We use a variation of the residue theorem, \(\int_{\gamma'} \frac{f(z)}{(z-z_0)^k}\, dz = \frac{i \Delta \theta }{(k-1)!} f^{(k-1)}(z_0)\), where \(\Delta \theta\) is the angle in radians subtended by the small arc \(\gamma'\). This gives
\[ \begin{equation} \oint_\gamma \frac{e^{i \alpha z}}{z^k} \, dz = \pi i \frac{ (i\alpha)^{k-1} }{(k-1)!} \end{equation} \]
Using the technique of 16.1, we get an estimate of the integral along the big semicircle \(\gamma''\):
\[ \begin{equation} \Abs{ \int_{\gamma''} \frac{e^{i \alpha z}}{z^k} \, dz } \leq \int_0^\pi \frac{ e^{- R \alpha \sin \theta}}{R^{k-1}} \, d\theta \to 0 \qquad \text{ as } R\to \infty \end{equation} \]
Since \(\abs{z^{-k}}\leq \abs{z^{-1}}\) on the big semicircle once \(R\geq 1\), the estimate for the contribution of this segment holds. Therefore as \(R\to \infty\) and \(\epsilon\to 0\) we get
\[ \begin{equation} \oint_\gamma \frac{e^{i \alpha z}}{z^k} \, dz \to I_k(\alpha) \end{equation} \]
When \(\alpha <0\) we can modify the argument by taking semicircles in the lower half plane. This again makes the integral vanish along the big semicircle, but now the sense of integration along the small semicircle is reversed, from counterclockwise to clockwise. We can summarize both cases as
\[ \begin{equation} I_k(\alpha) = \frac{ \pi (i \alpha)^k }{(k-1)! \abs{\alpha}} \end{equation} \]
This gives \(I_1(\alpha) = \pi i \sgn(\alpha)\) and \(I_2(\alpha) = -\pi \abs{\alpha}\). Expressing \(\cos x = \frac 1 2 (e^{ix} + e^{-ix})\), our CF can be written
\[ \begin{equation} \begin{split} \phi_X(\theta) &= \frac 1 \pi \left( I_2(\theta) - \frac 1 2 ( I_2(\theta+1) + I_2(\theta-1) ) \right)\\ &= \frac 1 2 \left( \abs{\theta+1} + \abs{\theta-1} \right) - \abs{\theta} \\ &= \begin{cases} 1-\abs{\theta} & \abs{\theta} \leq 1 \\ 0& \text{otherwise} \end{cases} \end{split} \end{equation} \]
Thus \(\phi_X\) is a kind of witch's hat centered at 0. As in 16.3, for \(S_n = X_1+\dots + X_n\),
\[ \begin{equation} \phi_{S_n/n}(\theta) =\phi_{X}(\theta/n)^n = \begin{cases} \left(1-\frac{\abs{\theta}} n \right)^n & \text{if } \frac{\abs{\theta}} n \leq 1\\ 0 & \text{otherwise} \end{cases} \end{equation} \]
For any fixed \(\theta\) and large enough \(n\) we have \(\abs{\theta}/n \leq 1\), so the first case applies eventually. Taking the limit as \(n\to \infty\),
\[ \begin{equation} \phi_{S_n/n}(\theta) \to e^{-\abs{\theta}} \end{equation} \]
This is the CF of the Cauchy distribution, so by Lévy's convergence theorem \(S_n/n\) converges in distribution to the Cauchy distribution. We conclude
\[ \begin{equation} \Pr(S_n/n \leq x) \to \int_{-\infty}^x \frac{dt}{\pi(1+t^2)} = \frac 1 2 +\pi^{-1} \arctan(x) \end{equation} \]
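Although the individual \(x^{-k}\) pieces above are only principal values, the density itself is absolutely integrable, so the witch's-hat formula can be checked by direct numerical integration. A minimal sketch, assuming numpy; the cutoff 2000 and step size are arbitrary (the truncation error is of order \(1/(\pi \cdot 2000)\)):

```python
import numpy as np

# Riemann-sum check that the CF of f(x) = (1 - cos x)/(pi x^2)
# is the witch's hat (1 - |theta|)^+.
dx = 1e-3
x = np.arange(dx / 2, 2000.0, dx)          # positive half-line; f is even
f = (1 - np.cos(x)) / (np.pi * x**2)
for theta in (0.0, 0.3, 0.7, 1.0, 2.5):
    phi = 2 * np.sum(np.cos(theta * x) * f) * dx   # double the half-line sum
    print(theta, round(phi, 4), max(1.0 - abs(theta), 0.0))
```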
Problem 18.2

Prove the weak law of large numbers in the following form. Suppose \(X_1,X_2,\dots\) are i.i.d. RVs, each with the same distribution as \(X\). Suppose \(X\in \mathcal L^1\) and that \(\E X = \mu\). Prove by use of CFs that the distribution of
\[ \begin{equation} S_n = n^{-1}(X_1+\dots+X_n) \end{equation} \]
converges weakly to the unit mass at \(\mu\). Deduce that
\[ \begin{equation} S_n \to \mu \text{ in probability} \end{equation} \]
Of course, the SLLN implies this weak law. Let \(Z \sim \delta_\mu\) be equal to \(\mu\) almost surely. Then
\[ \begin{equation} \phi_Z(\theta) = \E e^{i\theta Z} = e^{i \theta \mu} \end{equation} \]
Now consider the Taylor expansion of \(\phi_X(\theta)\). Note that
\[ \begin{equation} \phi_X(0) = 1 \qquad \phi_X'(0) = \left. \E i X e^{i\theta X} \right|_{\theta=0}= i \mu \end{equation} \]
where differentiation under the expectation is justified by dominated convergence, since \(\abs{iXe^{i\theta X}} \leq \abs{X} \in \mathcal L^1\). Therefore
\[ \begin{equation} \phi_{X}(\theta) = 1 + i \mu \theta + o(\theta) \end{equation} \]
So the characteristic function satisfies
\[ \begin{equation} \phi_{S_n}(\theta) = \phi_X(\theta/n )^n = \left(1+ \frac{i \mu \theta } n + o\left( \frac \theta n\right) \right)^n \to \exp(i\theta \mu ) = \phi_Z(\theta) \end{equation} \]
This implies that \(S_n \to \delta_\mu\) in distribution. However, for a unit mass, convergence in distribution is the same as convergence in probability, since for any \(\epsilon>0\)
\[ \begin{equation} \Pr( S_n < \mu-\epsilon) \to \Pr( Z < \mu-\epsilon ) = 0 \qquad \Pr( S_n > \mu+\epsilon) \to \Pr( Z > \mu+\epsilon ) = 0 \end{equation} \]
and hence \(\Pr( \abs{S_n-\mu}>\epsilon) \to 0\) for any \(\epsilon>0\). This proves the weak law of large numbers.
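A small simulation illustrating the conclusion, as a sketch assuming numpy (the Exp(1) law with \(\mu = 1\), the tolerance 0.1, and the 5000 replications are arbitrary choices): the empirical probability that the sample mean strays from \(\mu\) shrinks as \(n\) grows.

```python
import numpy as np

# Empirical P(|S_n - mu| > eps) for sample means of Exp(1) (mu = 1).
rng = np.random.default_rng(0)
for n in (10, 100, 10_000):
    means = rng.exponential(size=(5_000, n)).mean(axis=1)  # 5000 copies of S_n
    print(n, np.mean(np.abs(means - 1.0) > 0.1))
```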
Weak Convergence for \(\Pr[0,1]\)

Problem 18.3

Let \(X\) and \(Y\) be RVs taking values in \([0,1]\). Suppose that
\[ \begin{equation} \E X^k = \E Y^k \qquad k=0,1,2,\dots \end{equation} \]
Prove that (i) \(\E p(X) = \E p(Y)\) for every polynomial \(p\), (ii) \(\E f(X) = \E f(Y)\) for every continuous function \(f\) on \([0,1]\), and (iii) \(X\) and \(Y\) have the same distribution.
Item (i) follows from linearity of expectations. If \(p(x) = \sum_{k=0}^n c_k x^k\) then
\[ \begin{equation} \E p(X) = \sum_{k=0}^n c_k \E X^k = \sum_{k=0}^n c_k \E Y^k = \E p(Y) \end{equation} \]
Item (ii) follows from the Weierstrass approximation theorem. On the compact set \([0,1]\), every continuous function \(f\) can be approximated uniformly by a sequence of polynomials \(p_n\). Given \(\epsilon>0\), choose \(N\) large enough so that for all \(n>N\), \(\abs{f(x)-p_n(x)}<\epsilon\) for all \(x\). In this case
\[ \begin{equation} \Abs{\E f(X) - \E p_n(X)} \leq \E \abs{f(X)-p_n(X)} \leq \epsilon \end{equation} \]
and likewise for \(Y\). Therefore
\[ \begin{equation} \begin{split} \Abs{ \E f(X) - \E f(Y)} &\leq \Abs{\E f(X) - \E p_n(X)} + \Abs{\E p_n(X) - \E p_n(Y)} + \Abs{\E p_n(Y) - \E f(Y)}\\ &\leq \epsilon + 0 + \epsilon = 2\epsilon \end{split} \end{equation} \]
Since \(\epsilon\) is arbitrary, this shows \(\E f(X) = \E f(Y)\). For item (iii), approximate \(I_{[0,\alpha]}\) by the continuous piecewise linear functions
\[ \begin{equation} f_n(x) = \begin{cases} 1 & x \in [0, \alpha )\\ 1 - n\left(x - \alpha\right) & x \in [ \alpha, \alpha + \frac 1 n ) \\ 0 & x \in [ \alpha + \frac 1 n, 1] \end{cases} \end{equation} \]
Using the dominated convergence theorem (\(f_n \to I_{[0,\alpha]}\) pointwise and \(0\leq f_n \leq 1\)),
\[ \begin{equation} \lim_{n\to \infty} \E f_n(X) = \E I_{[0,\alpha]}(X) = \Pr\left( X\leq \alpha \right) \end{equation} \]
The same holds for \(Y\). However, \(\E f_n(X) = \E f_n(Y)\) for all \(n\) by part (ii). Therefore the limits are equal and \(\Pr( X\leq \alpha) = \Pr(Y\leq \alpha)\). Thus the moments uniquely determine the distribution on \([0,1]\).
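Parts (i) and (ii) have a computational payoff: \(\E f(X)\) is determined by the moments alone, via a polynomial approximation of \(f\). Here is a minimal sketch, assuming numpy and scipy are available; the Bernstein polynomial of degree 12, the test function \(\sin 3x\), and the Beta(2,3) law are all arbitrary illustrative choices.

```python
import numpy as np
from math import comb
from scipy.stats import beta

a, b, n = 2.0, 3.0, 12                     # test law Beta(2,3); Bernstein degree
f = lambda x: np.sin(3 * x)                # any continuous function on [0,1]

# Closed-form moments m_r = E X^r of Beta(a, b)
m = [np.prod([(a + i) / (a + b + i) for i in range(r)]) for r in range(n + 1)]

# E[X^k (1 - X)^{n-k}] expanded into power moments
def mixed(k):
    return sum(comb(n - k, j) * (-1) ** j * m[k + j] for j in range(n - k + 1))

# E[(B_n f)(X)], computable from the moments alone
approx = sum(f(k / n) * comb(n, k) * mixed(k) for k in range(n + 1))

# Monte Carlo estimate of E f(X) for comparison; they agree up to the
# Bernstein approximation error, which -> 0 as the degree n grows
exact = f(beta(a, b).rvs(size=1_000_000, random_state=1)).mean()
print(approx, exact)
```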
Problem 18.4

Suppose that \((F_n)\) is a sequence of DFs with
\[ \begin{equation} F_n(x) = 0 \text{ for } x<0, \qquad F_n(1)=1,\qquad \text{ for every } n \end{equation} \]
Suppose that
\[ \begin{equation} m_k = \lim_n \int_{[0,1]} x^k dF_n \text{ exists for } k=0,1,2,\dots \end{equation} \]
Use the Helly-Bray Lemma and 18.3 to show that \(F_n\wto F\), where \(F\) is characterized by \(\int_{[0,1]}x^k \, dF = m_k\) for all \(k\).

Let \(F_n\) be associated with the measure \(\mu_n\). Since all the mass lies in \([0,1]\), the sequence is automatically tight, so by Helly-Bray there is a subsequence \(n_i\) such that \(F_{n_i} \wto F\) for some DF \(F\); call the associated measure \(\mu\). By the definition of weak convergence, since \(x^k\) is a continuous function on \([0,1]\) and \(n_i\) is a subsequence of \(n\),
\[ \begin{equation} m_k = \lim_n \mu_n(x^k) = \lim_{n_i} \mu_{n_i}(x^k) = \mu(x^k) = \int_{[0,1]} x^k\, dF \end{equation} \]
If another subsequence \(n'_i\) satisfies \(F_{n'_i}\wto F'\) with associated measure \(\mu'\), then 18.3 implies that \(F'=F\): the above argument shows \(\mu'(x^k) = m_k\) as well, so \(\mu'(x^k)= \mu(x^k)\) for all \(k\). This implies that \(F_n \wto F\). For suppose there is some continuous \(g:[0,1]\to\R\) such that \(\limsup \mu_n(g) \neq \mu(g)\) or \(\liminf \mu_n(g) \neq \mu(g)\). Then choose a subsequence \(n_i\) so that \(\mu_{n_i}(g)\) converges to a value other than \(\mu(g)\); this is possible since the sequence \(\mu_n(g)\) is bounded by \(\sup \abs g\). By Helly-Bray we can take a further subsequence converging in distribution to some \(F'\), whose measure must satisfy \(\mu'(g) \neq \mu(g)\). But this contradicts what we've previously shown, that every convergent subsequence has the same limit \(F\). Therefore \(\lim_n \mu_n(g)\) exists for all \(g\in C[0,1]\) and \(\lim_n \mu_n(g) = \mu(g)\).
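To see the statement in action, here is a small sketch assuming scipy; the choice \(\mu_n = \text{Beta}(n,n)\) is an arbitrary illustration. Every moment of Beta(n,n) converges (e.g. the second moment \((n+1)/(2(2n+1)) \to 1/4\), the second moment of the unit mass at \(1/2\)), and the DFs indeed converge weakly to the unit mass at \(1/2\).

```python
from scipy.stats import beta

# mu_n = Beta(n, n): moments converge, and F_n -> unit mass at 1/2.
for n in (1, 10, 100, 1000):
    bn = beta(n, n)
    print(n, bn.moment(2), bn.cdf(0.4), bn.cdf(0.6))
# second moments approach 1/4; cdf(0.4) -> 0 and cdf(0.6) -> 1
```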
Problem 18.5 Improving on 18.3, a Moment Inversion Formula

Let \(F\) be a distribution function with \(F(0-)= 0\) and \(F(1)=1\). Let \(\mu\) be the associated law, and define
\[ \begin{equation} m_k = \int_{[0,1]} x^k\, dF(x) \end{equation} \]
Define
\[ \begin{equation} \begin{split} &\Omega= [0,1]\times[0,1]^{\N},\,\,\, \mathcal F = \mathcal B \times \mathcal B^{\N}, \,\,\, \Pr = \mu\times \Leb^\N \\ &\Theta(\omega) = \omega_0, \,\,\, H_k(\omega) = I_{[0,\omega_0]}(\omega_k) \end{split} \end{equation} \]
This models the situation in which \(\Theta\) is chosen with law \(\mu\), then a coin with probability \(\Theta\) of heads is minted and tossed at times \(1,2,\dots\). See E10.8. The RV \(H_k\) is 1 if the \(k\)th toss produces heads, 0 otherwise. Define
\[ \begin{equation} S_n = H_1+H_2+\dots+H_n \end{equation} \]
By the strong law of large numbers and Fubini's theorem,
\[ \begin{equation} S_n /n \to \Theta,\qquad \text{a.s.} \end{equation} \]
Define a map \(D\) on the space of real sequences \((a_n : n\in \Z_+)\) by setting
\[ \begin{equation} Da = (a_n-a_{n+1} : n \in \Z_+) \end{equation} \]
Prove that
\[ \begin{equation} F_n(x) = \sum_{i \leq nx} \binom n i (D^{n-i}m)_i \to F(x) \label{eq:dist from moments} \end{equation} \]
at every point of continuity of \(F\).

By the SLLN, \(S_n/n \to \Theta\) a.s., and since almost sure convergence implies convergence in distribution,
\[ \begin{equation} F(x) = \Pr( \Theta \leq x) = \lim_{n\to \infty} \Pr( S_n/n \leq x ) \end{equation} \]
at every continuity point \(x\) of \(F\). Conditional on \(\Theta\), \(S_n \sim \Bin( n, \Theta )\), so marginalizing over \(\Theta\) and using Fubini's theorem we find
\[ \begin{equation} \Pr( S_n \leq nx)= \int_{[0,1]} \sum_{k\leq nx} \binom{n}{k} \theta^{k} (1-\theta)^{n-k} \, \mu(d\theta) = \sum_{k\leq nx} \binom{n}{k} \int_{[0,1]} \theta^{k}(1-\theta)^{n-k} \, \mu(d\theta) \end{equation} \]
But we can write the integral in terms of the \(m_k\). By induction I claim
\[ \begin{equation} (D^n m)_k = \int_{[0,1]} x^k (1-x)^n \, \mu(dx) \end{equation} \]
Clearly this is true for \(n=0\); for \(n\geq 1\)
\[ \begin{equation} \begin{split} (D^n m)_k &= (D \cdot (D^{n-1} m))_k = (D^{n-1}m)_{k} - (D^{n-1}m)_{k+1} \\ &= \int_{[0,1]} x^{k} (1-x)^{n-1} \, \mu(dx) - \int_{[0,1]} x^{k+1} (1-x)^{n-1} \, \mu(dx) \\ &= \int_{[0,1]} x^{k}(1-x)^{n-1} (1-x)\, \mu(dx) = \int_{[0,1]} x^{k}(1-x)^{n} \, \mu(dx) \end{split} \end{equation} \]
In particular \(\int_{[0,1]} \theta^k(1-\theta)^{n-k}\,\mu(d\theta) = (D^{n-k}m)_k\). Pulling together the parts, we get \(\eqref{eq:dist from moments}\).
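The inversion formula is easy to test numerically. A minimal sketch; exact rational arithmetic is used because iterated differencing of floating-point moments cancels catastrophically, and the uniform law, with \(m_k = 1/(k+1)\) and DF \(F(x) = x\), is an arbitrary test case.

```python
from fractions import Fraction
from math import comb, floor

# Moments m_k = 1/(k+1) of the uniform law on [0,1], as exact rationals.
m = [Fraction(1, k + 1) for k in range(61)]

def inversion_df(m, n, x):
    """F_n(x) = sum_{i <= nx} C(n,i) (D^{n-i} m)_i  from problem 18.5."""
    diffs = [m[: n + 1]]                        # diffs[r] = D^r m
    for _ in range(n):
        prev = diffs[-1]
        diffs.append([a - b for a, b in zip(prev, prev[1:])])
    return float(sum(comb(n, i) * diffs[n - i][i]
                     for i in range(floor(n * x) + 1)))

for x in (0.25, 0.5, 0.75):
    print(x, inversion_df(m, 60, x))            # approaches the true DF F(x) = x
```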
Problem 18.6 Moment Problem

Prove that if \((m_k : k\in \Z_+)\) is a sequence of numbers in \([0,1]\), then there exists a RV \(X\) with values in \([0,1]\) such that \(\E(X^k) = m_k\) for all \(k\) if and only if \(m_0=1\) and
\[ \begin{equation} (D^r m)_s \ge 0 \qquad (r,s \in \Z_+) \end{equation} \]
(Note that necessity is immediate from the identity in 18.5: if \(m_k = \int x^k\,\mu(dx)\), then \((D^r m)_s = \int_{[0,1]} x^s (1-x)^r \, \mu(dx) \geq 0\).)
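A sketch of the criterion in code (with finitely many moments one can only test the finitely many differences they determine): the uniform moments pass, while a sequence with \(m_2 < m_1^2\), impossible for any RV since \(\Var X \geq 0\), fails.

```python
from fractions import Fraction

def hausdorff_ok(m):
    """Check (D^r m)_s >= 0 for every difference determined by m."""
    seq = list(m)
    while seq:
        if any(a < 0 for a in seq):
            return False
        seq = [a - b for a, b in zip(seq, seq[1:])]  # apply D once
    return True

uniform = [Fraction(1, k + 1) for k in range(12)]        # moments of Leb[0,1]
bogus = [Fraction(1), Fraction(9, 10), Fraction(1, 10)]  # m_2 < m_1^2
print(hausdorff_ok(uniform), hausdorff_ok(bogus))        # True False
```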
Weak Convergence for \(\Pr[0,\infty)\)

Problem 18.7

Here we use Laplace transforms instead of CFs. Suppose that \(F\) and \(G\) are DFs on \(\R_+\) such that \(F(0-) = G(0-)\) and
\[ \begin{equation} \int_{[0,\infty)} e^{-\lambda x} \, dF(x) = \int_{[0,\infty)} e^{-\lambda x}\, dG(x),\qquad \forall \lambda\geq 0 \label{eq:laplace condition} \end{equation} \]
Note that the integral on the LHS has a contribution \(F(0)\) from \(\{0\}\). Prove that \(F=G\). Suppose that \((F_n)\) is a sequence of distribution functions on \(\R\), each with \(F_n(0-) = 0\), and such that
\[ \begin{equation} L(\lambda) = \lim_n \int e^{-\lambda x}\, dF_n(x) \end{equation} \]
exists for \(\lambda\geq 0\) and that \(L\) is continuous at 0. Prove that \((F_n)\) is tight and that
\[ \begin{equation} F_n \wto F \text{ where } \int e^{-\lambda x}\, dF(x) = L(\lambda),\qquad \forall \lambda\geq 0 \end{equation} \]
Let \(X\) have DF \(F\) and \(Y\) have DF \(G\), and set \(U=e^{-X}\) and \(V=e^{-Y}\). Since the range of \(X,Y\) is \([0,\infty)\), the values of \(U,V\) lie in \([0,1]\). From \(\eqref{eq:laplace condition}\),
\[ \begin{equation} \E U^k = \E e^{-k X} = \E e^{-k Y} = \E V^k \end{equation} \]
Hence by 18.3 the distributions of \(U\) and \(V\) are equal. However, since \(x\mapsto e^{-x}\) is monotonic and invertible, this means that the distributions of \(X\) and \(Y\) are equal. More concretely, the relationship \(\Pr( U \geq c ) = \Pr( X \leq -\log c)\) provides a dictionary to translate between the distributions of \(X\) and \(U\), and similarly for \(Y\) and \(V\). For the second part, given \(\epsilon>0\), choose \(\lambda\) small enough so that \(L(\lambda) > 1-\epsilon/4\). This is possible because \(L\) is continuous at 0 and \(L(0)=1\). Thus, because \(L\) is the limit of the Laplace transforms of the \(F_n\), for all but finitely many \(n\) we must have
\[ \begin{equation} \int_{[0,\infty)} ( 1-e^{-\lambda x} )\, dF_n(x) < \epsilon/2 \end{equation} \]
Thus for any \(K\), for all but finitely many \(n\),
\[ \begin{equation} (1-F_n(K)) (1-e^{-\lambda K}) \leq \int_{[0,\infty)} ( 1-e^{-\lambda x} )\, dF_n(x) < \epsilon/2 \end{equation} \]
Choose \(K\) large enough so that \(1-e^{-\lambda K}>\frac 1 2\). Then, for all but finitely many \(n\),
\[ \begin{equation} F_n(K) > 1-\epsilon \end{equation} \]
which shows that \((F_n)\) is tight. The rest of the proof is essentially the same as 18.4. Having proved tightness, Helly-Bray implies that some subsequence converges in distribution to some DF, say \(F\). Any DF which is the weak limit of a subsequence of \((F_n)\) has Laplace transform \(L\), since along a subsequence the Laplace transforms still converge to \(L\). By the argument above, there is at most one distribution with a given Laplace transform, so the limit of a convergent subsequence of \((F_n)\) is the same distribution \(F\) regardless of the subsequence. This implies that \(F_n \wto F\): for any bounded continuous \(g\) we must have \(\limsup \int g\, dF_n = \liminf \int g\, dF_n = \int g\,dF\), since otherwise we could use Helly-Bray to find a subsequence converging to a distribution other than \(F\).
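The \(U = e^{-X}\) dictionary is concrete enough to simulate. A sketch, assuming numpy, with Exp(1) as an arbitrary choice of law on \([0,\infty)\): the Laplace transform of Exp(1) at integer \(k\) is \(1/(1+k)\), which is exactly the \(k\)th moment of \(U = e^{-X}\) (indeed here \(U\) is uniform on \([0,1]\)).

```python
import numpy as np

# E U^k for U = exp(-X), X ~ Exp(1), versus the Laplace transform 1/(1+k).
rng = np.random.default_rng(0)
U = np.exp(-rng.exponential(size=1_000_000))
for k in range(5):
    print(k, (U**k).mean(), 1 / (1 + k))
```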