Probability with Martingales by David Williams

\[ \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\Z}{\mathbb Z} \newcommand{\N}{\mathbb N} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left \lVert #1 \right \rVert} \newcommand{\Abs}[1]{\left \lvert #1 \right \rvert} \newcommand{\pind}{\bot \!\!\! \bot} \newcommand{\probto}{\buildrel P\over \to} \newcommand{\vect}[1]{\boldsymbol #1} \DeclareMathOperator{\EE}{\mathbb E} \DeclareMathOperator{\PP}{\mathbb P} \DeclareMathOperator{\E}{E} \DeclareMathOperator{\dnorm}{\mathcal N} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Leb}{Leb} \DeclareMathOperator{\Bin}{Bin} \newcommand{\wto}{\buildrel w\over \to} \]

Independence

Problem 4.1

Let \((\Omega, \mathcal F, \Pr)\) be a probability triple. Let \(\mathcal I_1\), \(\mathcal I_2\) and \(\mathcal I_3\) be three \(\pi\)-systems on \(\Omega\) such that for \(k=1,2,3\)

\[ \begin{equation} \mathcal I_k \subset \mathcal F \text{ and } \Omega \in \mathcal I_k \end{equation} \]

Prove that if

\[ \begin{equation} \Pr(I_1 \cap I_2 \cap I_3) = \Pr(I_1) \Pr(I_2) \Pr(I_3) \label{eq:tripleind} \end{equation} \]

whenever \(I_k \in \mathcal I_k\) for \(k=1,2,3\), then the \(\sigma\)-algebras \(\sigma( \mathcal I_k)\) are independent. Why did we require \(\Omega \in \mathcal I_k\)?

Fix \(I_2 \in \mathcal I_2\) and \(I_3 \in \mathcal I_3\), and consider the two measures defined for \(I\in \mathcal F\) by

\[ \begin{equation} \mu(I) = \Pr(I \cap I_2 \cap I_3) \qquad \mu'(I) = \Pr(I) \Pr(I_2) \Pr( I_3) \end{equation} \]

Each of \(\mu\) and \(\mu'\) is a countably additive measure, because \(\Pr\) is a countably additive measure on \(\mathcal F\). By the hypothesis \(\eqref{eq:tripleind}\), the two measures agree on the \(\pi\)-system \(\mathcal I_1\). By assumption \(\Omega \in \mathcal I_1\) as well, so \(\mu(\Omega) = \mu'(\Omega)\). If we dropped the assumption that \(\Omega \in \mathcal I_1\), the condition \(\mu(\Omega) = \mu'(\Omega)\) would amount to \(\Pr( I_2 \cap I_3) = \Pr( I_2) \Pr(I_3)\), which need not hold. We conclude from Theorem 1.6 that \(\mu(I)=\mu'(I)\) for all \(I\in \sigma(\mathcal I_1)\). Since \(I_2\) and \(I_3\) were arbitrary, equation \(\eqref{eq:tripleind}\) holds for all \(I_1\in\sigma(\mathcal I_1), I_2\in \mathcal I_2, I_3\in \mathcal I_3\).

Now we can repeat the above argument with the \(\pi\)-systems \(\mathcal I'_1 = \mathcal I_2, \mathcal I'_2 = \mathcal I_3, \mathcal I'_3 = \sigma(\mathcal I_1)\), observing that \(\eqref{eq:tripleind}\) is invariant under permutations of \(\{1,2,3\}\), to conclude that \(\eqref{eq:tripleind}\) holds for \(I_1\in\sigma(\mathcal I_1), I_2\in \sigma(\mathcal I_2), I_3\in \mathcal I_3\). Applying the argument one more time with \(\mathcal I'_1 = \mathcal I_3, \mathcal I'_2 = \sigma(\mathcal I_1), \mathcal I'_3 = \sigma(\mathcal I_2)\) gives the result we want: \(\eqref{eq:tripleind}\) holds for all \(I_k \in \sigma(\mathcal I_k)\).

Problem 4.2

For \(s>1\) define \(\zeta(s) = \sum_{n\in\N} n^{-s}\) and consider the distribution on \(\N\) given by \(\Pr(X=n) = \zeta(s)^{-1} n^{-s}\). Let \(E_k = \{ n : n \text{ divisible by } k\}\). Show that \(E_p\) and \(E_q\) are independent for primes \(p\neq q\). Show Euler's formula \(1/\zeta(s) = \prod_p (1-1/p^s)\). Show that \(\Pr(X \text{ is square-free}) = 1/\zeta(2s)\). Let \(Y\) be independent of \(X\) with the same distribution and let \(H = \gcd(X,Y)\). Show that \(\Pr(H=n)= n^{-2s}/\zeta(2s)\).

The elements \(m\in E_k\) are in one-to-one correspondence with the elements \(n\in \N\) via the relationship \(m=nk\). Therefore

\[ \begin{equation} \Pr( E_k) = \sum_{m\in E_k} \Pr(m) = \sum_{n\in \N} \Pr(nk) \end{equation} \]

Since

\[ \begin{equation} \Pr(nk) = k^{-s}n^{-s}/\zeta(s) = k^{-s} \Pr(n) \end{equation} \]

we have

\[ \begin{equation} \Pr(E_k) = k^{-s} \sum_{n\in\N} \Pr(n) = k^{-s} \end{equation} \]

Furthermore, the conditional probability satisfies

\[ \begin{equation} \Pr( X = k n \mid X \in E_k) = \frac{ \Pr(X=n k)} {\Pr( E_k)} = \frac{(n k)^{-s}/\zeta(s)}{k^{-s}} = n^{-s} / \zeta( s ) \end{equation} \]

Thus, conditional on \(X\in E_k\), the quotient \(X/k\) again has the \(\zeta(s)\) distribution.

Now, for \(m,n\) relatively prime, \(E_m \cap E_n = E_{mn}\) by unique factorization. Therefore

\[ \begin{equation} \Pr(E_m \cap E_n) = \Pr( E_{mn}) = (mn)^{-s} = m^{-s} n^{-s} = \Pr(E_m) \Pr(E_n) \end{equation} \]

which proves independence.

Consider the probability that \(X\) is not divisible by any of the first \(n\) primes \(p_1,p_2,\dots,p_n\). This is the event \(E = E^c_{p_1} \cap \dots \cap E^c_{p_n}\). Since \(E_{p_{i_1}} \cap \dots \cap E_{p_{i_j}} = E_{p_{i_1}\cdots p_{i_j}}\) for distinct primes, the same computation as above shows that the \(E_{p_i}\) are mutually independent, hence so are their complements, and

\[ \begin{equation} \Pr(E) = \prod_{i=1}^n (1-\Pr(E_{p_i}) )= \prod_{i=1}^n (1-p_i^{-s}) \end{equation} \]

As \(n\to\infty\), the event \(E\) consists of \(1\) together with integers all of whose prime factors exceed \(p_n\); in particular every element of \(E\) other than 1 is at least \(p_{n+1}\). Since \(\Pr(X\geq p_{n+1}) \downarrow 0\) as \(n\to\infty\), we get \(\Pr(E) \to \Pr(X=1) = 1/\zeta(s)\). This shows

\[ \begin{equation} \zeta(s)^{-1} = \prod_p (1-1/p^s) \end{equation} \]
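
As a quick numerical sanity check of Euler's formula (not part of the proof), one can compare a truncated product over primes against a truncated series for \(\zeta(s)\). The sketch below is Python; the value \(s=2\) and the truncation limits are arbitrary choices.

```python
# Numerical sanity check of Euler's product: compare a truncated product over primes
# with 1/zeta(s) computed from a truncated series.  Truncation limits are arbitrary.

def primes_up_to(n):
    """Return all primes <= n by a simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

s = 2.0
zeta_s = sum(n ** -s for n in range(1, 200_000))      # truncated zeta(s)
euler_product = 1.0
for p in primes_up_to(10_000):
    euler_product *= 1.0 - p ** -s                    # truncated prod_p (1 - p^{-s})

print(1.0 / zeta_s, euler_product)                    # both ~ 6/pi^2 ~ 0.6079 for s = 2
```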

The set of integers not divisible by the square of any of the first \(n\) primes is given by

\[ \begin{equation} F=E^c_{p_1^2} \cap \dots \cap E^c_{p_n^2} \end{equation} \]

The events \(E_{p_1^2},\dots,E_{p_n^2}\) are independent, because the numbers \(p_i^2\) are pairwise relatively prime, so \(\Pr(F) = \prod_{i\leq n}(1-\Pr(E_{p_i^2}))\). As \(n\to \infty\) the events \(F\) decrease to the event that \(X\) is square-free, so

\[ \begin{equation} \Pr(F) \to \prod_p (1-(p^2)^{-s}) = \prod_p (1-1/p^{2s}) = 1/\zeta(2s) \end{equation} \]

Now \(\gcd(X,Y)=n\) iff both \(X\) and \(Y\) are divisible by \(n\) and the quotients \(X/n\) and \(Y/n\) are relatively prime. By independence, the event that both \(X \in E_n\) and \(Y\in E_n\) satisfies

\[ \begin{equation} \Pr( X\in E_n, Y\in E_n) = \Pr( X \in E_n) \Pr( Y\in E_n) = n^{-2s} \end{equation} \]

Let \(R = \{ (a,b) \mid a,b \text{ relatively prime}\}\); note that \((1,1)\in R\) since \(\gcd(1,1)=1\). We can calculate \(\Pr( (X,Y) \in R )\) as follows. First consider the events \(F_p = \{ X\in E_p \} \cap \{ Y \in E_p\}\). By independence \(\Pr( F_p ) = \Pr(X\in E_p) \Pr( Y\in E_p)= p^{-2s}\). The \(F_p\) are independent of each other since the \(E_p\) are. Now \(F_p^c\) is the event that \(p\) fails to divide at least one of \(X\) and \(Y\), so \(F = \bigcap_p F^c_p\) is the event that \(X\) and \(Y\) have no common prime divisor, i.e. are relatively prime. Hence

\[ \begin{equation} \Pr( X,Y \text{ relatively prime} )= \Pr( F ) =\prod_p (1-p^{-2s}) = 1/\zeta(2s) \end{equation} \]

Pulling it together

\[ \begin{equation} \Pr(\gcd(X,Y)=n) = \Pr( A_n ) \Pr( B_n \mid A_n) = n^{-2s} / \zeta(2s) \end{equation} \]

where \(A_n = \{ X \in E_n \} \cap \{ Y\in E_n \}\) and \(B_n = \{ (X/n,Y/n) \in R \}\). Indeed, by the conditional distribution computed above, given \(A_n\) the quotients \(X/n\) and \(Y/n\) are again independent with the \(\zeta(s)\) distribution, so \(\Pr(B_n \mid A_n) = \Pr((X,Y)\in R) = 1/\zeta(2s)\).
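
The gcd formula can also be checked numerically by truncating the double sum over the joint law of \((X,Y)\). The snippet below is only an illustration; the truncation level \(M\) and the choice \(s=2\) are arbitrary, so the match is approximate.

```python
# Approximate check of Pr(gcd(X, Y) = n) = n^{-2s} / zeta(2s) by truncating the double
# sum over the joint law of (X, Y).  The truncation M is arbitrary, so expect small errors.
from math import gcd

s, M = 2.0, 1000
zeta_s = sum(k ** -s for k in range(1, 200_000))
zeta_2s = sum(k ** -(2 * s) for k in range(1, 200_000))

prob = {n: 0.0 for n in range(1, 6)}
for x in range(1, M + 1):
    for y in range(1, M + 1):
        h = gcd(x, y)
        if h in prob:
            prob[h] += (x ** -s) * (y ** -s) / zeta_s ** 2   # Pr(X=x) Pr(Y=y)

for n in range(1, 6):
    print(n, prob[n], n ** -(2 * s) / zeta_2s)               # columns should nearly agree
```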

Problem 4.3

Let \(X_1,X_2,\dots\) be i.i.d. random variables with a continuous distribution. Let \(E_1= \Omega\) and for \(n\geq 2\) let \(E_n = \{ X_n = \max(X_1,X_2,\dots,X_n)\}\); that is, \(E_n\) is the event that a “record” occurs at time \(n\). Show that the events \(E_n\) are independent and that \(\Pr(E_n) = 1/n\).

A continuous distribution has the property that singletons have probability zero: \(\Pr(X_i=x) = 0\) for every \(x\). Conditioning on the value of \(X_1\), it follows from Fubini's theorem that \(\Pr( X_1=X_2) = \int_\R \Pr(X_2=x) \, \mu_{X_1}(dx) = 0 \), where \(\mu_{X_1}\) is the law of \(X_1\). Since there are countably many pairs of variables, we may assume none of the \(X_i\) coincide by excluding a set of measure zero.

Let a permutation \(\sigma\in S_n\) act on \(\R^n\) by permuting the coordinates, \(\sigma : (x_1,x_2,\dots,x_n) \mapsto (x_{\sigma 1}, x_{\sigma 2 }, \dots, x_{\sigma n})\). For a point \(x\) with distinct coordinates, let \(\rho(x)\) be the permutation obtained by mapping each component to its ordinal rank, from smallest to largest. The action permutes the ranks, so that \(\rho_i(\sigma x) = \rho_{\sigma i}(x)\), or in vector form, \(\rho(\sigma x) = \sigma \rho(x)\). Because the \(X_i\) are i.i.d., every permutation \(\sigma\) preserves the law of \((X_1,\dots,X_n)\): \(\Pr( (X_1,\dots,X_n)\in E) = \Pr( (X_1,\dots,X_n)\in \sigma E)\) for any Borel set \(E\subset \R^n\). Consider the Weyl chamber \(W =\{ x_1 < x_2 < \dots < x_n\}\). The disjoint sets \(\sigma W\), \(\sigma \in S_n\), partition \(\R^n\) up to the null set of points with coincident coordinates, since every point with distinct coordinates lies in exactly one of the \(\sigma W\), namely the one indexed by the permutation that sorts its coordinates.

We infer that \(\Pr(W) = 1/n!\), since the disjoint copies of \(W\) have equal probability and comprise the entire space. The event \(E_n\) is the union of all the sets \(\sigma W\) where \( \sigma n = n\). There are \((n-1)!\) such permutations, so \(\Pr E_n = (n-1)! / n! = 1/n\).

For any \(\sigma W \subset E_n\) and \(m<n\), we can apply any permutation \(\sigma' \in S_m \subset S_n\) which acts only on the first \(m\) letters and keeps the rest fixed. For any \(\sigma' \in S_m\) we have \(\sigma' \sigma W\subset E_n\), since the \(n\)th component is the largest and is unchanged by the action of \(\sigma'\). Group the sets \(\sigma W \subset E_n\) into orbits of the action of \(S_m\). From each orbit we may choose the representative \(\sigma W\) satisfying \(\rho_1(x) < \rho_2(x) <\dots <\rho_m(x)\) for \(x\in \sigma W\), which is formed by permuting the first \(m\) elements into sorted order. Let \(U\) be the union of all orbit representatives. For \(\sigma' \in S_m\) the sets \(\sigma' U\) are disjoint and partition \(E_n\). There are \(m!\) such sets, they all have equal probability, and their union has probability \(1/n\), so \(\Pr(U) = 1/(n \cdot m!)\). Of the sets \(\sigma' U\), exactly \((m-1)!\) are also contained in \(E_m\): these correspond to the \(\sigma' \in S_m\) with \(\sigma' m = m\). Therefore \(\Pr(E_m \cap E_n) = (m-1)! \cdot \Pr(U) = 1/(mn) = \Pr( E_m) \Pr(E_n)\), and the events are pairwise independent. The same orbit-counting argument, applied one index at a time to \(n_1 < n_2 < \dots < n_j\), gives \(\Pr(E_{n_1} \cap \dots \cap E_{n_j}) = \prod_i 1/n_i\), so the whole family is independent.
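
A quick Monte Carlo check of these conclusions: the sketch below estimates \(\Pr(E_m)\), \(\Pr(E_n)\) and \(\Pr(E_m \cap E_n)\). The indices \(m=3\), \(n=7\), the sample size, and the use of the uniform distribution (one convenient continuous law) are all arbitrary choices.

```python
# Monte Carlo check that Pr(E_n) = 1/n and that E_m, E_n are independent.
import random

trials, m, n = 200_000, 3, 7
count_m = count_n = count_both = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]     # i.i.d. draws from a continuous law
    rec_m = xs[m - 1] == max(xs[:m])             # record at time m
    rec_n = xs[n - 1] == max(xs)                 # record at time n
    count_m += rec_m
    count_n += rec_n
    count_both += rec_m and rec_n

print(count_m / trials, 1 / m)                   # ~ 1/3
print(count_n / trials, 1 / n)                   # ~ 1/7
print(count_both / trials, 1 / (m * n))          # ~ 1/21
```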

Borel-Cantelli Lemmas

Problem 4.4

Suppose a coin with probability \(p\) of heads is tossed repeatedly. Let \(A_k\) be the event that a sequence of \(k\) (or more) heads occurs among the tosses numbered \(2^k, 2^k+1, 2^k+2,\dots,2^{k+1}-1\). Prove that

\[ \begin{equation} \Pr( A_k \text{ i.o}) = \begin{cases} 1 & \text{if } p\geq \frac 1 2 \\ 0 & \text{if } p< \frac 1 2 \end{cases} \end{equation} \]

Let \(E_{k,i}\) be the event that there is a string of \(k\) heads at times \(j\) for

\[ \begin{equation} 2^k+i \leq j \leq 2^k + i + k-1 \end{equation} \]

Observe \(\Pr(E_{k,i}) = p^k\). If \(m_k = 2^k-k\) and \(i\leq m_k\), then \(E_{k,i} \subset A_k\). Furthermore \(A_k \subset \bigcup_{i=0}^{m_k} E_{k,i}\), since a string of \(k\) heads in this range has to start somewhere. Thus by the union bound

\[ \begin{equation} \Pr(A_k) \leq \sum_{i=0}^{m_k} \Pr( E_{k,i} ) = (m_k+1) \cdot p^k \leq (2p)^k \end{equation} \]

If \(p<\frac 1 2\) then \(\sum_k \Pr(A_k) < \infty\). By Borel-Cantelli 1, we must have \(\Pr(A_k \text{ i.o.}) = 0\).

Now consider the events \(E_{k,0}, E_{k,k}, E_{k,2k},\dots\) for \(ik \leq m_k\). These events are independent, since they involve disjoint blocks of tosses, and

\[ \begin{equation} A_k \supset \bigcup_{i=0}^{\lfloor m_k/k \rfloor} E_{k,ik} \end{equation} \]

By Bonferroni's inequality (inclusion-exclusion truncated after the second term)

\[ \begin{equation} \Pr(A_k) \geq \sum_i \Pr( E_{k,ik}) - \sum_{i<j} \Pr( E_{k,ik} \cap E_{k,jk}) \end{equation} \]

All of the terms in the first sum equal \(p^k\), and there are \(\lfloor m_k/k \rfloor + 1 \geq 2^k/k-2\) of them. All the terms in the second sum equal \(p^{2k}\), since the \(E_{k,ik}\) are independent, and there are \(\binom{\lfloor m_k/k\rfloor + 1}{2} \leq 2^{2k}/(2k^2)\) of them. Therefore when \(p=\frac 1 2\)

\[ \begin{equation} \Pr( A_k ) \geq \left(\frac{2^k}{k} - 2\right)p^k - \left(\frac{2^{2k}}{2k^2} \right) p^{2k} = \frac 1 k - 2^{-(k-1)} - \frac 1 {2k^2} \end{equation} \]

Therefore \(\sum_k \Pr( A_k) = \infty\), since the terms \(1/k\) form the divergent harmonic series while the other two terms form convergent series. Since the \(A_k\) are independent, the second Borel-Cantelli lemma gives \(\Pr( A_k \text{ i.o.})=1\) when \(p = \frac 1 2\).

As a function of \(p\), the probability \(\Pr( A_k \text{ i.o.})\) must be non-decreasing, since increasing the probability of heads increases the likelihood of the event (couple the coins so that a head at the smaller \(p\) is also a head at the larger \(p\)). We just showed this probability is 1 for \(p=\frac 1 2\), so it equals 1 for every \(p>\frac 1 2\) as well.
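
For a rough numerical illustration (not needed for the proof), one can estimate \(\Pr(A_k)\) by simulation and compare it with the union bound \((2p)^k\). The value of \(p\), the range of \(k\), and the number of trials in this sketch are arbitrary choices.

```python
# Rough Monte Carlo estimates of Pr(A_k) compared with the union bound (2p)^k.
import random

def block_has_run(k, p):
    """Simulate the 2^k tosses numbered 2^k, ..., 2^{k+1}-1 and look for k heads in a row."""
    run = 0
    for _ in range(2 ** k):
        if random.random() < p:
            run += 1
            if run >= k:
                return True
        else:
            run = 0
    return False

p, trials = 0.4, 10_000
for k in range(3, 9):
    estimate = sum(block_has_run(k, p) for _ in range(trials)) / trials
    print(k, estimate, (2 * p) ** k)             # the estimate should sit below the bound
```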

Problem 4.5

Prove that if \(G \sim \dnorm(0,1)\) then for \(x>0\)

\[ \begin{equation} \Pr(G > x) \leq \frac 1 {x\sqrt{2\pi}} e^{-\frac 1 2 x^2} \end{equation} \]

Let \(X_i \sim \dnorm(0,1)\) be i.i.d. Show that \(L = 1\) almost surely where

\[ \begin{equation} L = \limsup( X_n / \sqrt{2 \log n}) \end{equation} \]

Let \(S_n = \sum_i X_i\). Recall \(S_n/\sqrt n \sim \dnorm(0,1)\). Prove that

\[ \begin{equation} \abs{S_n} < 2\sqrt{n\log n}, \text{ eventually almost surely} \end{equation} \]

Let's do a calculation, noting that \(u/x\geq 1\) in the range of integration:

\[ \begin{align} \int_x^\infty \frac{e^{-\frac 1 2 u^2}}{u^k} \, du \leq \frac 1 {x^{k+1}} \int_x^\infty u e^{-\frac 1 2 u^2}\, du = \frac {e^{-\frac 1 2 x^2}} {x^{k+1}} \label{eq:erf_moment_bound} \end{align} \]

Now integrating by parts, splitting the integrand into the factors \(u e^{-\frac 1 2 u^2}\) (whose antiderivative is \(-e^{-\frac 1 2 u^2}\)) and \(1/u\), we get

\[ \begin{equation} \sqrt{2 \pi} \Pr( G > x) =\int_x^\infty e^{-\frac 1 2 u^2}\, du = \frac{e^{-\frac 1 2 x^2}} x -\int_x^\infty \frac{ e^{-\frac 1 2 u^2}}{u^2} \, du \end{equation} \]

Dropping the last term, or bounding it using \(\eqref{eq:erf_moment_bound}\) with \(k=2\), we find

\[ \begin{equation} \frac 1 {\sqrt{2\pi}} \left( \frac 1 x - \frac 1 {x^3} \right) e^{-\frac 1 2 x^2} \leq \Pr(G>x) \leq \frac 1 {\sqrt{2\pi}} \frac 1 x e^{-\frac 1 2 x^2} \end{equation} \]
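
This two-sided bound is easy to check numerically, since \(\Pr(G>x) = \tfrac 1 2 \operatorname{erfc}(x/\sqrt 2)\). The snippet below is a small illustrative check at a few arbitrarily chosen values of \(x\).

```python
# Check the two-sided tail bound numerically, using Pr(G > x) = erfc(x / sqrt(2)) / 2.
from math import erfc, exp, pi, sqrt

for x in (1.0, 2.0, 3.0, 5.0):
    tail = 0.5 * erfc(x / sqrt(2))
    upper = exp(-x * x / 2) / (x * sqrt(2 * pi))
    lower = (1 / x - 1 / x ** 3) * exp(-x * x / 2) / sqrt(2 * pi)
    print(x, lower <= tail <= upper, lower, tail, upper)
```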

If \(I(x) = x^{-1} e^{-\frac 1 2 x^2}\), the two-sided bound above shows that for any \(\epsilon>0\), if \(x\) is large enough then \(\frac{1-\epsilon}{\sqrt{2\pi} }I(x) \leq \Pr(G>x) \leq \frac 1 {\sqrt{2\pi}} I(x)\). Now

\[ \begin{equation} I(\sqrt{2\alpha \log n}) = \frac 1 {\sqrt{2\alpha\log n}} e^{-\alpha \log n } = \frac 1 {n^\alpha \sqrt{2\alpha \log n}} \end{equation} \]

Using the substitution \(u=\sqrt{\log x}\) we can apply the integral test:

\[ \begin{equation} \int_a^\infty I(\sqrt{2\alpha \log x}) \, dx = \int_a^\infty \frac{dx}{x^\alpha \sqrt{2\alpha \log x}} = \sqrt{\frac 2 \alpha} \int_{\sqrt{\log a}}^\infty e^{(1-\alpha) u^2} \, du \end{equation} \]

The integral converges if \(\alpha > 1\) and diverges if \(\alpha\leq 1\), so by the integral test the same is true of \(\sum_n I(\sqrt{2\alpha \log n})\). Since \(\Pr(G>x)\) is bounded above and below by constant multiples of \(I(x)\) for large \(x\), the series

\[ \begin{equation} \sum_n \Pr(X_n>\sqrt{2\alpha \log n}) \end{equation} \]

converges for \(\alpha > 1\) and diverges for \(\alpha \leq 1\).

The \(X_n\) are independent, so we may apply the first Borel-Cantelli lemma (for \(\alpha > 1\)) and the second (for \(\alpha = 1\)) to conclude

\[ \begin{equation} \limsup_n X_n / \sqrt{2\alpha \log n} \leq 1 \text{ a.s. for } \alpha>1, \qquad \limsup_n X_n / \sqrt{2\log n} \geq 1 \text{ a.s.} \end{equation} \]

Letting \(\alpha \downarrow 1\) through a countable sequence in the first statement, and combining with the second, we conclude \(L = 1\) almost surely.

Note that independence was only used for the lower bound; the upper bound uses the first Borel-Cantelli lemma, which does not require independence. The variables \(Y_n = S_n/\sqrt n\) form a (dependent) sequence of \(\dnorm(0,1)\) random variables, so the upper-bound argument still applies. Taking \(\alpha = 2\), so that \(\sqrt{2\alpha \log n} = 2\sqrt{\log n}\), we conclude

\[ \begin{equation} Y_n < 2\sqrt {\log n} \text{ eventually, a.s.} \end{equation} \]

The same is true for \(-Y_n\), which has the same distribution, so (taking the later of the two thresholds) eventually \(\abs{Y_n} < 2\sqrt{\log n}\), which is the same as \(\abs{S_n }< 2\sqrt{n \log n}\) eventually, almost surely. In particular this implies the strong law of large numbers for \(X_n\sim \dnorm(\mu,\sigma^2)\), since then \(\abs{S_n/n-\mu} < 2\sigma \sqrt{\frac {\log n}n} \to 0\) eventually, almost surely.
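
As an illustration (the horizon \(N\) is arbitrary, and of course no finite simulation proves an almost-sure statement), one can watch the running maximum of \(X_n/\sqrt{2\log n}\) settle near 1 while \(\abs{S_n}/(2\sqrt{n\log n})\) stays small:

```python
# Illustrative simulation of the two conclusions above.
import math, random

N = 100_000                                   # arbitrary horizon
s, max_ratio = 0.0, float("-inf")
for n in range(1, N + 1):
    x = random.gauss(0.0, 1.0)
    s += x
    if n >= 2:
        max_ratio = max(max_ratio, x / math.sqrt(2 * math.log(n)))
    if n in (1_000, 10_000, 100_000):
        print(n, abs(s) / (2 * math.sqrt(n * math.log(n))))   # typically well below 1

print("running max of X_n / sqrt(2 log n):", max_ratio)       # typically close to 1
```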

Problem 4.6

This is a converse to the SLLN. Let \(X_n\) be a sequence of i.i.d. RV with \(\E[\abs{X_n}] = \infty\). Prove that

\[ \begin{equation} \sum_n \Pr(\abs{X_n} > kn) = \infty \text{ for every } k \qquad \text{and} \qquad \limsup \frac{\abs{X_n}} n =\infty \text{ a.s.} \end{equation} \]

For \(S_n = \sum_{i=1}^n X_i\) conclude

\[ \begin{equation} \limsup \frac{\abs{S_n}} n = \infty \text{ a.s.} \end{equation} \]

First, for any random variable \(Z\geq 0\),

\[ \begin{equation} \lfloor Z \rfloor = \sum_{n\in \N} I_{Z\geq n} \end{equation} \]

since each indicator with \(n\leq Z\) contributes 1 to the sum, all the other indicators are 0, and there are \(\lfloor Z \rfloor\) such indicators. Taking expectations of both sides and noting that \(\lfloor Z \rfloor \leq Z \leq \lfloor Z \rfloor + 1\),

\[ \begin{equation} E[Z]-1 \leq \sum_{n\in \N} \Pr( Z \geq n ) \leq E[Z ] \end{equation} \]
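
As a sanity check of this tail-sum bound, take a concrete \(Z\) with finite mean, say exponential with mean 3 (an arbitrary choice), for which \(\Pr(Z\geq n) = e^{-n/3}\):

```python
# Check of E[Z] - 1 <= sum_n Pr(Z >= n) <= E[Z] for an exponential Z with mean 3.
import math

mean = 3.0
tail_sum = sum(math.exp(-n / mean) for n in range(1, 200))   # truncated; tail is geometric
print(mean - 1.0, tail_sum, mean)                            # 2.0 <= ~2.53 <= 3.0
```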

For any \(k\in \N\), take \(Z = \abs{X_1}/k\); since \(\E[Z] = \infty\) we get

\[ \begin{equation} \sum_n \Pr( \abs{X_1}/k > n ) \geq \sum_n \Pr( \abs{X_1}/k \geq n+1 ) \geq \E[Z]-2 = \infty \end{equation} \]

Since the \(X_n\) are identically distributed, \(\Pr( \abs{X_n} > \alpha) = \Pr( \abs{X_1} > \alpha)\) for any \(\alpha \geq 0\), so

\[ \begin{equation} \sum_n \Pr( \abs{X_n} > kn ) =\infty \end{equation} \]

The \(X_n\) are independent, so by the second Borel-Cantelli lemma, almost surely \(\abs{X_n}/n > k\) infinitely often, and therefore \(\limsup \abs{X_n} / n \geq k\) almost surely, for every \(k\in \N\). Intersecting over \(k\), we conclude \(\limsup \abs{X_n} / n = \infty\) almost surely.

Suppose \(\abs{S_n}/n \leq k\) eventually for some \(k>0\). That is, there exists a \(k>0\) and \(N>0\) such that if \(n\geq N\) then \(\abs{S_n} \leq kn\). We know that almost surely there exists an \(n\geq N+1\) such that \(\abs{X_n} \geq 2kn\). But then

\[ \begin{equation} \abs{S_n} \geq \abs{X_n} - \abs{S_{n-1}} \geq 2kn - k(n-1) > kn \end{equation} \]

This is a contradiction. Hence, with probability 1, \(\abs{S_n}/n > k\) infinitely often for any \(k>0\) and therefore

\[ \begin{equation} \limsup \frac {\abs{S_n}} n = \infty \text{ a.s.} \end{equation} \]
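
For an illustration, take the \(X_n\) to be standard Cauchy, which have \(\E\abs{X_n} = \infty\); then \(S_n/n\) is itself standard Cauchy for every \(n\), so \(\abs{S_n}/n\) never settles down. The simulation below is a sketch; the horizon and checkpoints are arbitrary.

```python
# Illustration with standard Cauchy increments: |S_n| / n keeps producing large values.
import math, random

def cauchy():
    return math.tan(math.pi * (random.random() - 0.5))   # standard Cauchy sample

N, s, running_max = 1_000_000, 0.0, 0.0                  # arbitrary horizon
for n in range(1, N + 1):
    s += cauchy()
    running_max = max(running_max, abs(s) / n)
    if n in (10**3, 10**4, 10**5, 10**6):
        print(n, abs(s) / n, running_max)                # running max typically keeps growing
```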

Problem 4.7

What's fair about a fair game? Let \(X_1,X_2,\dots\) be independent random variables such that

\[ \begin{equation} X_n = \begin{cases} n^2-1 & \text{with probability } n^{-2} \\ -1 & \text{with probability } 1-n^{-2} \end{cases} \end{equation} \]

Prove that \(\E X_n = 0\) for all \(n\), but that if \(S_n = X_1 + X_2 + \dots + X_n\) then

\[ \begin{equation} \frac {S_n} n \to -1 \text{ a.s.} \end{equation} \]

An easy calculation shows

\[ \begin{align} \E X_n &= (n^2-1) \cdot \Pr(X_n = n^2-1) + (-1) \cdot \Pr(X_n = -1) \\ &= (1-n^{-2} ) + (n^{-2}-1) = 0 \end{align} \]

Now let \(A_n\) be the event \(X_n = -1\). Since

\[ \begin{equation} \sum_n \Pr(A_n^c) = \sum_n n^{-2} <\infty \end{equation} \]

By Borel-Cantelli, with probability 1, only finitely many of the \(A^c_n\) occur. Therefore, with probability 1, \(X_n=-1\) with a finite number of exceptions. On any such sample, let \(N\) be the largest index such that \(X_N = N^2-1\) (or \(N=0\) if there is no such index). Then for \(n>N\)

\[ \begin{equation} \frac {S_n} {n} = \frac {S_N} n + \frac{ -1 \cdot (n-N)} n \end{equation} \]

Then as \(n\to \infty\), the first term tends to zero because \(S_N\) is fixed, and the second term tends to \(-1\). (Note that the independence of the \(X_n\) was not needed in this argument.)
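
A short simulation makes the drift visible; the horizon and checkpoints below are arbitrary choices.

```python
# Simulation of the 'fair' game: every E[X_n] = 0, yet S_n / n drifts to -1.
import random

N, s = 1_000_000, 0
for n in range(1, N + 1):
    if random.random() < n ** -2:
        s += n * n - 1                 # X_n = n^2 - 1 with probability n^{-2}
    else:
        s -= 1                         # X_n = -1 otherwise
    if n in (10, 10**2, 10**4, 10**6):
        print(n, s / n)                # approaches -1
```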

Problem 4.8

This exercise assumes that you are familiar with continuous-parameter Markov chains with two states.

For each \(n\in \N\), let \(X^{(n)} = \{X^{(n)}(t) : t \geq 0 \}\) be a Markov chain with state-space the two-point set \(\{0, 1\}\) with \(Q\)-matrix

\[ \begin{equation} Q^{(n)} = \begin{pmatrix} -a_n & a_n \\ b_n & -b_n \end{pmatrix}, \qquad a_n,b_n>0 \end{equation} \]

and transition function \(P^{(n)}(t) = \exp(tQ^{(n)})\). Show that, for every \(t\),

\[ \begin{equation} p_{00}^{(n)}(t) \geq b_n / (a_n+b_n), \qquad p_{01}^{(n)}(t) \leq a_n / (a_n+b_n) \label{eq:blackwell transition bounds} \end{equation} \]

The processes \((X^{(n)} : n \in \N)\) are independent and \(X^{(n)}(0) = 0\) for every \(n\). Each \(X^{(n)}\) has right-continuous paths.

Suppose that \(\sum a_n = \infty\) and \(\sum a_n/b_n <\infty\). Prove that if \(t\) is a fixed time then

\[ \begin{equation} \Pr( X^{(n)}(t)=1 \text{ for infinitely many }n) = 0 \end{equation} \]

Use Weierstrass's \(M\)-test to show that \(\sum_n \log p_{00}^{(n)}(t)\) is uniformly convergent on \([0,1]\), and deduce that

\[ \begin{equation} \Pr( X^{(n)}(t)=0 \text{ for ALL }n) \to 1\qquad \text{as } t\downarrow 0 \end{equation} \]

Prove that

\[ \begin{equation} \Pr( X^{(n)}(s)=0, \forall s\leq t, \forall n ) = 0 \text{ for every } t>0 \end{equation} \]

and discuss with your tutor why it is almost surely true that within every non-empty time interval, infinitely many of the \(X^{(n)}\) chains jump.

First let's calculate the matrix exponential

\[ \begin{equation} Q = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix} \qquad \text{we have} \qquad Q^2 = \begin{pmatrix} a^2 +ab & -a^2-ab \\ -ab-b^2 & ab + b^2 \end{pmatrix} = -(a+b)Q \end{equation} \]

Thus, by induction, \(Q^n = (-a-b)^{n-1} \, Q\), and therefore

\[ \begin{equation} \begin{split} \exp(tQ) &= I + \sum_{k\geq 1} \frac{t^k Q^k}{k!} = I - \frac Q {a+b} \sum_{k\geq 1} \frac{t^k(-a-b)^k}{k!} = I- \frac{e^{-(a+b)t}-1}{a+b} Q \\ &= \frac 1 {a+b} \begin{pmatrix} b + ae^{-t(a+b)} & a - ae^{-t(a+b)}\\ b- b e^{-t(a+b)} & a+b e^{-t(a+b)} \end{pmatrix} \end{split} \end{equation} \]

From this \(\eqref{eq:blackwell transition bounds}\) is evident.
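
The closed form is easy to verify numerically against a truncated power series for \(\exp(tQ)\); the values of \(a\), \(b\), \(t\) and the truncation order below are arbitrary choices.

```python
# Numerical check of the closed form for exp(tQ) against a truncated power series.
import numpy as np

a, b, t = 1.3, 0.7, 0.9
Q = np.array([[-a, a], [b, -b]])

series = np.eye(2)                     # sum_k (tQ)^k / k!, truncated
term = np.eye(2)
for k in range(1, 30):
    term = term @ (t * Q) / k
    series += term

e = np.exp(-t * (a + b))
closed = np.array([[b + a * e, a - a * e],
                   [b - b * e, a + b * e]]) / (a + b)

print(np.max(np.abs(series - closed)))   # ~ 0, up to floating point error
```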

For fixed \(t>0\),

\[ \begin{equation} \sum_n \Pr( \{ X^{(n)}(t)=1 \}) = \sum_n p^{(n)}_{01}(t)\leq \sum_n \frac{a_n}{a_n+b_n} \leq \sum_n \frac{a_n}{b_n} <\infty \end{equation} \]

Thus by Borel-Cantelli, for the fixed time \(t\), almost surely only finitely many of the \(X^{(n)}(t)\) equal 1.

Now consider \(\Pi(t) = \prod_n p^{(n)}_{00}(t)\), which by independence is the probability that \(X^{(n)}(t)=0\) for every \(n\). It satisfies the bound

\[ \begin{equation} -\log \Pi(t) = -\log \prod_n p^{(n)}_{00}(t) \leq \sum_n \log\left( \frac{ a_n + b_n }{b_n}\right) \leq \sum_n \frac{a_n}{b_n} < \infty \end{equation} \]

For every \(t\geq 0\) we have \(0 \leq -\log p^{(n)}_{00}(t) \leq \log\frac{a_n+b_n}{b_n} \leq a_n/b_n\), so by the Weierstrass \(M\)-test with \(M_n = a_n/b_n\) the series \(\sum_n \log p^{(n)}_{00}(t)\) converges uniformly on \([0,1]\) (indeed on all of \([0,\infty)\)). Each term is continuous in \(t\), so \(\Pi(t)\) is continuous, and in particular

\[ \begin{equation} \lim_{t\downarrow 0} \Pi(t) = \Pi(0) = \prod_{n=1}^\infty 1 = 1 \end{equation} \]

For the final part, the holding time of \(X^{(n)}\) in state 0 is exponential with rate \(a_n\), so \(\Pr( X^{(n)}(s)=0 \text{ for all } s\leq t) = e^{-a_n t}\). By independence,

\[ \begin{equation} \Pr( X^{(n)}(s)=0, \forall s\leq t, \forall n ) = \prod_n e^{-a_n t} = \exp\Big( -t \sum_n a_n \Big) = 0 \end{equation} \]

since \(\sum_n a_n = \infty\). For the discussion: for any \(N\) the tail sum \(\sum_{n\geq N} a_n\) is still infinite, and at any fixed time all but finitely many chains are in state 0, so the same computation (applying the Markov property at the left endpoint) shows that almost surely some chain with index \(\geq N\) jumps in \((u, u+h)\), for every rational \(u\geq 0\) and rational \(h>0\). Taking a countable intersection over \(u\), \(h\) and \(N\) shows that, almost surely, infinitely many of the \(X^{(n)}\) jump within every non-empty time interval.

Tail \(\sigma\)-algebras

Problem 4.9

Let \(Y_0, Y_1, Y_2,\dots\) be independent random variables with

\[ \begin{equation} Y_n = \begin{cases} +1 & \text{probability } \frac 1 2 \\ -1 & \text{probability } \frac 1 2 \\ \end{cases} \end{equation} \]

Define

\[ \begin{equation} X_n = Y_0 Y_1 \cdots Y_n \end{equation} \]

Prove the \(X_n\) are independent. Define

\[ \begin{equation} \mathcal Y = \sigma(Y_1,Y_2,\dots), \qquad \mathcal T_n = \sigma(X_r : r > n) \end{equation} \]

Prove

\[ \begin{equation} \mathcal L := \bigcap_n \sigma(\mathcal Y, \mathcal T_n) \neq \sigma\left( \mathcal Y, \bigcap_n \mathcal T_n \right) =: \mathcal R \end{equation} \]

First note \(\Pr( X_n = 1) = \Pr( X_n = -1 ) = \frac 1 2\), so \(X_n\) has the same distribution as \(Y_n\). If we let \(X_0=Y_0\), this follows by induction since

\[ \begin{equation} X_n = \begin{cases} X_{n-1} & \text{with probability } \frac 1 2 \\ -X_{n-1} & \text{with probability } \frac 1 2 \\ \end{cases} \end{equation} \]

and so, for example,

\[ \begin{equation} \Pr( X_n = 1 ) = \frac 1 2 \Pr( X_{n-1}=1 ) + \frac 1 2 \Pr( X_{n-1}=-1 ) = 2 \cdot \frac 1 4 = \frac 1 2 \end{equation} \]

For \(m<n\) and \(\epsilon_m,\epsilon_n\in \{-1,1\}\), consider \(\Pr( \{ X_m = \epsilon_m \} \cap \{ X_n = \epsilon_n \})\). Clearly this is the same as \(\Pr( \{ X_m = \epsilon_m \} \cap \{ X_m \cdot X_n = \epsilon_m \epsilon_n \})\). Given \(X_m\) and \(X_n\) we can multiply them to generate \(X_m \cdot X_n\). Conversely, given \(X_m\) and \(X_m \cdot X_n\) we can multiply them to recover \(X_n\). Note also that

\[ \begin{equation} X_m\cdot X_n = (Y_0 \cdot Y_1 \cdots Y_m)^2 \, Y_{m+1}\cdots Y_n = Y_{m+1}\cdots Y_n \end{equation} \]

Hence \(X_m\) and \(X_m \cdot X_n\) are independent, since they are functions of independent random variables. Thus

\[ \begin{equation} \begin{split} \Pr( \{ X_m = \epsilon_m \} \cap \{ X_n = \epsilon_n \}) &= \Pr( \{ X_m = \epsilon_m \} \cap \{ X_m \cdot X_n = \epsilon_m \epsilon_n \}) \\ &= \Pr( \{ X_m = \epsilon_m \} ) \Pr( \{ X_m \cdot X_n = \epsilon_m \epsilon_n \})\\ &= \frac 1 2 \cdot \frac 1 2 = \Pr( X_m = \epsilon_m )\Pr( X_n = \epsilon_n ) \end{split} \end{equation} \]

This argument generalizes to any finite set \(X_{n_1}, \dots, X_{n_k}\) with \(n_1 < n_2 < \dots < n_k\) by considering the invertible transformation

\[ \begin{equation} \begin{pmatrix} X_{n_1} \\ X_{n_2} \\ \vdots \\ X_{n_k} \end{pmatrix} \to \begin{pmatrix} X_{n_1} \\ X_{n_1}X_{n_2} \\ \vdots\\ X_{n_{k-1}}X_{n_k} \end{pmatrix} \end{equation} \]

The variables on the right-hand side are products over disjoint blocks of the \(Y_i\), so they are independent, and each has the same distribution as \(X_n\). Hence the transformed vector takes any fixed sign pattern with probability \(2^{-k}\), and since the transformation is a bijection of \(\{-1,1\}^k\), so does \((X_{n_1},\dots,X_{n_k})\). But \(2^{-k} = \prod_i \Pr( X_{n_i} = \epsilon_{n_i})\), which proves independence.
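
An empirical illustration of this independence: simulate the \(Y_i\) and tabulate the joint frequencies of \((X_1,X_2,X_3)\), which should each be about \(1/8\). The sample size below is an arbitrary choice.

```python
# Empirical check that (X_1, X_2, X_3) is uniform on {-1, +1}^3, i.e. the X_n look independent.
import random
from collections import Counter

trials, counts = 200_000, Counter()
for _ in range(trials):
    ys = [random.choice((-1, 1)) for _ in range(4)]   # Y_0, Y_1, Y_2, Y_3
    xs = [ys[0]]
    for y in ys[1:]:
        xs.append(xs[-1] * y)                         # X_n = Y_0 Y_1 ... Y_n
    counts[tuple(xs[1:])] += 1                        # record (X_1, X_2, X_3)

for pattern in sorted(counts):
    print(pattern, counts[pattern] / trials)          # each ~ 1/8 = 0.125
```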

Furthermore, \(X_n\) is independent of \(Y_m\) for every \(m\geq 1\). If \(m>n\) this is obvious, because \(X_n\) is a function of \(Y_0,\dots,Y_n\), which are independent of \(Y_m\). If \(m\leq n\), consider the invertible transformation \((Y_m, X_n) \to (Y_m, X_n \cdot Y_m)\). The second variable is the product \(Y_0 \cdots Y_{m-1} \cdot Y_{m+1} \cdots Y_n\), which is independent of \(Y_m\); as before this gives the independence of \(Y_m\) and \(X_n\).

Suppose we know the values of \(Y_k\) for all \(k\geq 1\) and also some \(X_r\). Then we can immediately deduce the value of \(Y_0\) since

\[ \begin{equation} Y_0 = Y_0 \cdot (Y_1 \cdots Y_r )^2 = X_r \cdot Y_1 \cdot Y_2 \cdots Y_r \end{equation} \]

This shows that, for any \(n\), \(Y_0\) is measurable with respect to \(\sigma(Y_1,\dots, Y_n, X_n)\subset \sigma(\mathcal Y, \mathcal T_{n-1})\). Therefore \(Y_0\) is measurable with respect to \(\mathcal L = \bigcap_n \sigma( \mathcal Y, \mathcal T_n)\).

On the other hand, \(Y_0\) is independent of \(\mathcal R\). Let \(\mathcal T = \bigcap_n \mathcal T_n\) be the tail field. Essentially the same argument as above shows that \(Y_0\) is independent of \(\sigma(Y_1,\dots,Y_m, \mathcal T_{n})\) for any \(n\geq m\): after the change of variables, each \(X_r\) with \(r>n\) carries the factor \(Y_{m+1}\cdots Y_r\), which is independent of \(Y_0,\dots,Y_m\). Therefore \(Y_0\) is independent of \(\mathcal R_m = \sigma(Y_1,\dots,Y_m,\mathcal T)\), since this is a sub-\(\sigma\)-algebra of \(\sigma(Y_1,\dots,Y_m,\mathcal T_n)\). But \(\mathcal R_m \subset \mathcal R_{m+1}\) and \(\bigcup_m \mathcal R_m\) is a \(\pi\)-system generating \(\mathcal R\), so \(Y_0\) is independent of \(\mathcal R\). Since \(Y_0\) is a non-degenerate random variable which is \(\mathcal L\)-measurable but independent of \(\mathcal R\), it cannot be \(\mathcal R\)-measurable, and therefore \(\mathcal L \neq \mathcal R\).

Contact

For comments or corrections please contact Ryan McCorvie at ryan@martingale.group