Probability with Martingales by David Williams

\[ \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\Z}{\mathbb Z} \newcommand{\N}{\mathbb N} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left \lVert #1 \right \rVert} \newcommand{\Abs}[1]{\left \lvert #1 \right \rvert} \newcommand{\pind}{\bot \!\!\! \bot} \newcommand{\probto}{\buildrel P\over \to} \newcommand{\vect}[1]{\boldsymbol #1} \DeclareMathOperator{\EE}{\mathbb E} \DeclareMathOperator{\PP}{\mathbb P} \DeclareMathOperator{\E}{E} \DeclareMathOperator{\dnorm}{\mathcal N} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Leb}{Leb} \DeclareMathOperator{\Bin}{Bin} \newcommand{\wto}{\buildrel w\over \to} \]

The Strong Law

Problem 7.1

Let \(f\) be a bounded continuous function on \([0,\infty)\). The Laplace transform of \(f\) is the function \(L\) on \((0,\infty)\) defined by

\[ \begin{equation} L(\lambda) = \int_0^\infty e^{-\lambda x} f(x)\, dx \end{equation} \]

Let \(X_1,X_2,\dots\) be i.i.d. random variables with the exponential distribution of rate \(\lambda\), so \(\Pr(X>x) = e^{-\lambda x}\), \(\E[X] = \frac 1 \lambda\), \(\Var(X) = \frac 1 {\lambda^2}\). If \(S_n=X_1+X_2+\dots +X_n\), show that

\[ \begin{equation} \E f(S_n) = \frac{ (-1)^{n-1} \lambda^n }{(n-1)!} L^{(n-1)} (\lambda) \end{equation} \]

Show that \(f\) may be recovered from \(L\) as follows: for \(y>0\)

\[ \begin{equation} f(y) = \lim_{n\uparrow \infty} (-1)^{n-1} \frac{(n/y)^n L^{(n-1)}(n/y)} {(n-1)!} \end{equation} \]

We can write

\[ \begin{equation} \begin{split} \E f(S_n) &= \int_{\R_+^n} f(x_1 + x_2 + \dots + x_n) \lambda e^{-\lambda x_1} \lambda e^{-\lambda x_2} \dots \lambda e^{-\lambda x_n} \,dx_1 dx_2 \dots dx_n \\ &= \int_{0}^\infty \int_{u_1}^\infty \dots \int_{u_{n-1}}^\infty f(u_n) \lambda^n e^{-\lambda u_n} \, du_n\dots du_1 \label{eq:laplacechain} \end{split} \end{equation} \]

where we performed a change of variables \(u_1= x_1, u_2=x_1+x_2, \dots, u_n = x_1+x_2 + \dots +x_n\).

To simplify this, we're going to successively apply Fubini's theorem. To that end consider the calculation

\[ \begin{equation} \int_0^\infty u^k \int_u^\infty g(v) \, dv \,du = \int_0^\infty g(v) \int_0^v u^k \, du \,dv = \int_0^\infty \frac {v^{k+1}}{k+1} g(v)\, dv \end{equation} \]

So let \(u=u_1\), \(v=u_2\), \(k=0\) and \(g(v) = \int_v^\infty \dots \int_{u_{n-1}}^\infty f(u_n) \lambda^n e^{-\lambda u_n}\, du_n \dots du_3\). We are left with

\[ \begin{equation} \int_{0}^\infty u_2 \int_{u_2}^\infty \dots \int_{u_{n-1}}^\infty f(u_n) \lambda^n e^{-\lambda u_n} \, du_n\dots du_2 \end{equation} \]

We may repeatedly interchange the order of integration in this way, integrating out one variable at a time and raising the power by one at each step, until we are left with

\[ \begin{equation} \E f(S_n) = \int_0^\infty \frac{u_n^{n-1} \lambda^n}{(n-1)!} f(u_n) e^{-\lambda u_n}\, du_n \label{eq:simplified expectation} \end{equation} \]
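Equation \(\eqref{eq:simplified expectation}\) says that \(S_n\) has the \(\Gamma(n,\lambda)\) density. As a quick numerical sanity check (not part of the proof), here is a minimal sketch comparing a Monte Carlo estimate of \(\E f(S_n)\) against quadrature of \(\eqref{eq:simplified expectation}\); the test function \(f(x)=\cos x\) and the values \(n=5\), \(\lambda=2\) are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate
from math import factorial

rng = np.random.default_rng(0)
lam, n = 2.0, 5
f = np.cos

# Monte Carlo: S_n is a sum of n i.i.d. rate-lambda exponentials
samples = rng.exponential(scale=1 / lam, size=(200_000, n)).sum(axis=1)
mc_estimate = f(samples).mean()

# Quadrature against the Gamma(n, lambda) density from the identity above
density = lambda u: lam**n * u ** (n - 1) * np.exp(-lam * u) / factorial(n - 1)
quad_value, _ = integrate.quad(lambda u: f(u) * density(u), 0, np.inf)

print(mc_estimate, quad_value)  # should agree to a couple of decimal places
```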

Now we need a fact from the theory of Laplace transforms. Since \(f\) is bounded, we may differentiate under the integral sign:

\[ \begin{align} \frac d {d\lambda} L[f](\lambda) &= \int_0^\infty \frac{\partial}{\partial\lambda} f(x)e^{-\lambda x}\, dx = \int_0^\infty -x f(x)e^{-\lambda x}\, dx \\ &= -L[xf](\lambda) \label{eq:deriv of laplace} \end{align} \]

Iterating \(\eqref{eq:deriv of laplace}\) gives \(L^{(n-1)}(\lambda) = (-1)^{n-1} L[x^{n-1}f](\lambda)\). Comparing this to \(\eqref{eq:simplified expectation}\), it's clear that

\[ \begin{equation} \E f(S_n) = \frac {(-1)^{n-1} \lambda^n} {(n-1)!} L^{(n-1)}(\lambda) \end{equation} \]
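As a symbolic spot check of this identity, consider the (illustrative) test function \(f(x) = e^{-x}\), for which \(L(\lambda) = 1/(1+\lambda)\) and \(\E f(S_n) = (\E e^{-X})^n = (\lambda/(1+\lambda))^n\) are available in closed form:

```python
import sympy as sp

lam, x = sp.symbols("lambda x", positive=True)
n = 7

# Laplace transform of f(x) = exp(-x): L(lambda) = 1/(1 + lambda)
L = sp.integrate(sp.exp(-x) * sp.exp(-lam * x), (x, 0, sp.oo))

# Right-hand side of the identity above
rhs = (-1) ** (n - 1) * lam**n / sp.factorial(n - 1) * sp.diff(L, lam, n - 1)
# Left-hand side: E exp(-S_n) = (lambda/(1+lambda))^n by independence
lhs = (lam / (1 + lam)) ** n

print(sp.simplify(rhs - lhs))  # prints 0
```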

Note \(\E X^4 = \Gamma(5)\lambda^{-4} < \infty\), so the fourth-moment hypothesis of the SLLN is satisfied. Therefore \(S_n/n \to 1/\lambda\) almost surely. By continuity, \(f(S_n/n) \to f(1/\lambda)\) almost surely. By the bounded convergence theorem (since \(\abs f \leq B\) for some constant \(B\) and \(\E B = B\)), \(\E f(S_n/n) \to f(1/\lambda)\). We connect this to our previous expression by first noting that for \(\alpha > 0\)

\[ \begin{equation} L[f(\alpha x)](\lambda) = \int_0^\infty f(\alpha x)e^{-\lambda x}\, dx = \alpha^{-1} \int_0^\infty e^{-\lambda u/\alpha} f(u) \, du = \alpha^{-1} L[f(x)](\lambda/\alpha) \end{equation} \]

where the middle equality comes from the substitution \(u=\alpha x\). Differentiating \(n-1\) times, it follows that

\[ \begin{equation} \frac {d^{n-1}}{d\lambda^{n-1}} L[f(\alpha x)](\lambda) = \alpha^{-n} L^{(n-1)}(\lambda/\alpha) \end{equation} \]
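A minimal sympy check of the scaling rule, again assuming the test function \(f(x)=e^{-x}\) and symbolic \(\alpha, \lambda > 0\):

```python
import sympy as sp

lam, x, alpha = sp.symbols("lambda x alpha", positive=True)

# L[f] evaluated at a symbolic argument s, for f(x) = exp(-x)
L = lambda s: sp.integrate(sp.exp(-x) * sp.exp(-s * x), (x, 0, sp.oo))
# L[f(alpha x)](lambda), computed directly
L_scaled = sp.integrate(sp.exp(-alpha * x) * sp.exp(-lam * x), (x, 0, sp.oo))

print(sp.simplify(L_scaled - L(lam / alpha) / alpha))  # prints 0
```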

Taking \(\alpha = 1/n\), so that \(\E f(S_n/n) = \E g(S_n)\) with \(g(x) = f(x/n)\), we get

\[ \begin{equation} \E f(S_n/n) = \frac{(-1)^{n-1} \lambda^n}{(n-1)!}\, n^n L^{(n-1)}(n\lambda) = (-1)^{n-1} \frac{(n\lambda)^n L^{(n-1)}(n\lambda)}{(n-1)!} \end{equation} \]

Setting \(y = 1/\lambda\) and letting \(n\to\infty\) gives the desired result

\[ \begin{equation} f(y) = \lim_{n\to \infty} \E f(S_n/n) = \lim_{n\to \infty} (-1)^{n-1}\frac{(n/y)^n L^{(n-1)}(n/y) }{ (n-1)!} \end{equation} \]
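To watch the inversion formula converge numerically, take once more the illustrative \(f(x) = e^{-x}\), whose transform \(L(s) = 1/(1+s)\) has the closed-form derivatives \(L^{(n-1)}(s) = (-1)^{n-1}(n-1)!/(1+s)^n\); the approximant then simplifies to \((n/(n+y))^n \to e^{-y}\):

```python
import math

# f(y) = exp(-y); for this f the approximant
# (-1)^(n-1) (n/y)^n L^(n-1)(n/y) / (n-1)!  collapses to  (n/(n+y))^n
y = 1.5
for n in (1, 10, 100, 1000):
    print(n, (n / (n + y)) ** n, math.exp(-y))
```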

Problem 7.2

As usual, write \(S^{n-1} = \{x\in \R^n : \abs{x}=1 \}\). There is a unique probability measure \(\nu^{n-1}\) on \((S^{n-1},\mathcal B(S^{n-1}))\) such that \(\nu^{n-1}(A) = \nu^{n-1}(HA)\) for every orthogonal \(n\times n\) matrix \(H\) and every \(A\) in \(\mathcal B(S^{n-1})\). This is the uniform measure on the sphere (the angular part of polar coordinates), or equivalently the Haar measure for the action of the orthogonal group.

  • Prove that if \(\vect X\) is a vector in \(\R^n\), the components of which are independent \(\dnorm(0,1)\) variables, then for every orthogonal \(n \times n\) matrix \(H\) the vector \(H\vect X\) has the same property. Deduce that \(\vect X / \norm{\vect X}\) has law \(\nu^{n-1}\).

  • Let \(Z_1,Z_2,\dots \sim \dnorm(0,1)\) and let \(R_n = \norm{(Z_1,\dots,Z_n)} = (Z_1^2 +Z_2^2+\dots+Z_n^2)^{\frac 1 2}\). Show \(R_n/\sqrt n \to 1\) a.s.

  • For each \(n\), let \((Y^{(n)}_1,Y^{(n)}_2,\dots,Y^{(n)}_n)\) be a point chosen on \(S^{n-1}\) according to the distribution \(\nu^{n-1}\). Then

\[ \begin{equation} \lim_{n\to\infty} \Pr( \sqrt n\, Y^{(n)}_1 \leq x) = \Phi(x) =\frac 1 {\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2}\, dy \end{equation} \]

\[ \begin{equation} \lim_{n\to\infty} \Pr( \sqrt n\, Y^{(n)}_1 \leq x_1; \sqrt n\, Y^{(n)}_2 \leq x_2 ) = \Phi(x_1)\Phi(x_2) \end{equation} \]

Let \(\vect Y=H\vect X\). Each component of \(\vect Y\) is a linear combination of independent Gaussian random variables, so \(\vect Y\) is jointly Gaussian. Let's compute the moments of the components. Note \(\E \vect Y = \E H\vect X = H \E \vect X = \vect 0\). Therefore

\[ \begin{equation} \Cov \vect Y = \E \vect Y \vect Y^\intercal = \E (H\vect X) (H\vect X)^\intercal = \E H \vect X \vect X^\intercal H^\intercal = H (\E\vect X \vect X^\intercal) H^\intercal = HIH^\intercal = I \end{equation} \]

We've used the identity \(H H^\intercal = I\) for orthogonal matrices. Thus each \(Y_i \sim \dnorm(0,1)\). Furthermore \(\Cov(Y_i,Y_j) = 0\) if \(i\neq j\). Since these are jointly Gaussian random variables, zero correlation is the same thing as independence. Now \(\vect X / \norm{\vect X} \in S^{n-1}\), and its law is invariant under orthogonal transformations: since \(\norm{H\vect X} = \norm{\vect X}\) and \(H\vect X\) has the same distribution as \(\vect X\), the vector \(H(\vect X / \norm{\vect X}) = H\vect X / \norm{H\vect X}\) has the same distribution as \(\vect X / \norm{\vect X}\). We conclude that the law of \(\vect X / \norm{\vect X}\) is \(\nu^{n-1}\) by the uniqueness of this measure.
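A small simulation sketch of the invariance claim (illustrative only): draw a random orthogonal \(H\) via the QR decomposition of a Gaussian matrix, and confirm that the sample covariance of \(H\vect X\) is close to the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A random orthogonal matrix from the QR decomposition of a Gaussian matrix
H, _ = np.linalg.qr(rng.standard_normal((n, n)))

X = rng.standard_normal((n, 500_000))  # columns are i.i.d. N(0, I) vectors
Y = H @ X
print(np.round(np.cov(Y), 2))          # approximately the identity matrix
```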

Note \(\E Z_k^2 = 1\) since this is just the variance of \(Z_k\). Also \(\E (Z_k^2)^4 = \E Z^8_k < \infty\) since a Gaussian has finite moments of all orders, so the fourth-moment hypothesis of the SLLN holds. By the SLLN applied to the random variables \(Z_k^2\),

\[ \begin{equation} R_n^2 / n = (\sum_{k=1}^n Z^2_k)/n \to 1 \text{ a.s.} \end{equation} \]

By continuity of square roots, it must also be the case that \(R_n/\sqrt n \to 1\) almost surely.

Note \(\vect Y^{(n)}\) has the same law as \(\vect X / \norm{\vect X} = (Z_1,\dots,Z_n)/R_n\), so \(\sqrt n\, Y^{(n)}_1\) has the same law as \(\sqrt n\, Z_1/R_n = Z_1/(R_n/\sqrt n)\). Since \(R_n/\sqrt n \to 1\) almost surely, \(Z_1/(R_n/\sqrt n) \to Z_1\) almost surely, and almost-sure convergence implies convergence in distribution. Thus \(\Pr(\sqrt n\, Y^{(n)}_1 \leq x) \to \Pr( Z_1 \leq x) = \Phi(x)\). Similarly \((\sqrt n\, Z_1/R_n, \sqrt n\, Z_2/R_n) \to (Z_1, Z_2)\) almost surely, so \(\Pr(\sqrt n\, Y^{(n)}_1 \leq x_1; \sqrt n\, Y^{(n)}_2 \leq x_2) \to \Pr( Z_1 \leq x_1; Z_2 \leq x_2) = \Phi(x_1)\Phi(x_2)\).
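Finally, a Monte Carlo sketch of this last claim: sample uniform points on \(S^{n-1}\) by normalizing Gaussian vectors (justified by the first part), and compare the empirical distribution of \(\sqrt n\, Y^{(n)}_1\) with \(\Phi\); the dimension and sample count below are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, num_samples = 200, 50_000

# Uniform points on S^{n-1}: normalize i.i.d. Gaussian vectors
Z = rng.standard_normal((num_samples, n))
Y1 = Z[:, 0] / np.linalg.norm(Z, axis=1)  # first coordinate of each point

for x in (-1.0, 0.0, 1.0):
    print(x, (np.sqrt(n) * Y1 <= x).mean(), norm.cdf(x))
```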

Contact

For comments or corrections please contact Ryan McCorvie at ryan@martingale.group