Convergence

You can read the LaTeX document (with the latest updated chapters) online at this link: probability.pdf

Chapter 5: Convergence

Contents
 1.  Convergence of random variables
 2.  Convergence of measures
 3.  Relationships among various modes of convergence

1. Convergence of random variables

Here and hereafter the term ''convergence'' will be used to mean convergence to a finite limit. The first notion of convergence is the following:

Definition 1 (Convergence almost everywhere). The sequence of random variables \{X _ n\} is said to converge almost everywhere (to the random variable X) iff there exists a null set N such that for all \omega\in\Omega\setminus N, X _ n(\omega)\to X(\omega)<\infty.

Theorem 2. The sequence \{X _ n\} converges a.e. to X iff
\[
\lim _ {n\to\infty} \mathbb{P}(|X _ m-X|<\varepsilon,\, \forall\, m\geqslant n)=1,\quad \forall\varepsilon>0;
\]
or equivalently
\[
\lim _ {n\to\infty} \mathbb{P}(|X _ m-X|>\varepsilon,\, \exists\, m\geqslant n)=0,\quad \forall\varepsilon>0.
\]

The proof can be found at this link: https://gaomj.cn/convergence-almost-surely/.

A weaker concept of convergence is of basic importance in probability theory.

Definition 3 (Convergence in probability). The sequence of random variables \{X _ n\} is said to converge in probability (to the random variable X) iff
\[
\lim _ {n\to\infty}\mathbb P(|X _ n-X|>\varepsilon)=0, \quad\forall \varepsilon>0.
\]

From the previous theorem we have the immediate consequence below.

Theorem 4. Convergence a.e. (to X) implies convergence in probability (to X).

Theorem 5. The sequence \{X _ n\} converges a.e. iff for every \varepsilon>0 we have
\[
\lim _ {N\to\infty}\mathbb P(|X _ m-X _ n|<\varepsilon, \, \forall m,n>N)=1.
\]

Its obvious analogue for convergence in probability is
\[
\lim _ {m,n\to\infty}\mathbb P(|X _ m-X _ n|<\varepsilon)=1,\quad \forall\varepsilon>0.
\]
It can be shown that this implies the existence of a finite X such that X _ n\to X in probability.


Definition 6 (Convergence in L^p, 0<p<\infty). The sequence of random variables \{X _ n\} is said to converge in L^p (to the random variable X) iff X _ n,X\in L^p and \mathbb E(|X _ n-X|^p)\to0.

As in functional analysis, we mostly consider the case where 1\leqslant p\leqslant\infty.

We will investigate the relations among these modes of convergence in later sections. First, let us look at the following proposition; it can also be obtained as a corollary of later results.

Proposition 7. If X _ n\stackrel{L^p}{\to}X, then X _ n\stackrel{p}{\to}X. The converse is true provided that \{X _ n\} is dominated by some Y\in L^p.

As a consequence, X _ n\stackrel p\to X if and only if \mathbb E(|X _ n-X|\mathbin/(1+|X _ n-X|))\to0. Furthermore, \rho(X,Y):=\mathbb E(|X-Y|\mathbin/(1+|X-Y|)) defines a metric on the space of random variables, provided that we identify random variables that are equal a.e.
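To see the equivalence, note that g(x):=x/(1+x) is increasing and bounded by 1, so for every \varepsilon>0
\[
\frac{\varepsilon}{1+\varepsilon}\, \mathbb P(|X _ n-X|>\varepsilon)\leqslant\mathbb E\frac{|X _ n-X|}{1+|X _ n-X|}\leqslant\frac{\varepsilon}{1+\varepsilon}+\mathbb P(|X _ n-X|>\varepsilon).
\]
Letting n\to\infty and then \varepsilon\to0 in the right inequality gives one direction; the left inequality gives the other.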

This section ends with another kind of convergence which is basic in functional analysis; however, we confine ourselves to L^1 here. Recall that in functional analysis, when X is a normed vector space, \{x _ n\}\subseteq X is said to converge weakly in X if there exists x\in X such that for each x'\in X' (the dual space), x'(x _ n)\to x'(x) as n\to\infty. In L^1, given any random variable Y that is bounded a.e., the relation \ell(X)=\mathbb E(XY) defines a continuous linear functional \ell on L^1. Furthermore, the linear isometry Y\mapsto\ell defined in this fashion is bijective, i.e., given any continuous linear functional \ell on L^1, there exists one and only one Y (up to null sets) which is bounded a.e. such that \ell(X)=\mathbb E(XY) for all X\in L^1. This result may be called the F. Riesz representation theorem for L^1; it identifies the dual of L^1 with L^\infty.

Hence \{X _ n\} in L^1 is said to converge weakly in L^1 to X iff for each Y that is bounded a.e. we have \mathbb E(X _ nY)\to\mathbb E(XY). Clearly convergence in L^1 defined above implies weak convergence; hence the former is sometimes referred to as ''strong''.

Borel-Cantelli lemma
The Borel–Cantelli lemma is a theorem about sequences of events. The following lemma is usually called the first Borel-Cantelli lemma.

Theorem 8 (Borel-Cantelli lemma Ⅰ). For arbitrary events \{A _ n\}:
\[
\sum _ {n=1}^\infty\mathbb{P}(A _ n)<\infty\implies\mathbb{P}\Big(\limsup _ {n\to\infty}A _ n\Big)=0.
\]
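The proof is short: for every k, \limsup _ n A _ n\subseteq\bigcup _ {n=k}^\infty A _ n, so by countable subadditivity
\[
\mathbb P\Big(\limsup _ {n\to\infty}A _ n\Big)\leqslant\sum _ {n=k}^\infty\mathbb P(A _ n)\to0\quad(k\to\infty),
\]
since the tail of a convergent series vanishes.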

It can be seen easily that if \sum _ {n=1}^\infty\mathbb{P}(|X _ n-X|>\varepsilon)<\infty for all \varepsilon>0, then X _ n\to X almost everywhere, by the following theorem:

Theorem 9. We have
\[
\mathbb{P}\Big(\limsup _ {n\to\infty}\{|X _ n-X|>\varepsilon\}\Big)=0 \, \Longleftrightarrow\, X _ n\stackrel{\text{a.e.}}{\to}X.
\]

This theorem can be shown by taking complements and using continuity from below:
\[
\mathbb{P}\Big(\limsup _ {n\to\infty}\{|X _ n-X|>\varepsilon\}\Big)=0 \, \Longleftrightarrow\, \lim _ {k\to\infty} \mathbb{P}(|X _ n-X|\leqslant\varepsilon,\, \forall\, n\geqslant k)=1,
\]
and, since \varepsilon>0 is arbitrary, the right side for all \varepsilon is exactly the criterion of Theorem 2.

Theorem 10.
  • If X _ n\to X in probability, then there exists a subsequence that converges to X almost everywhere.
  • In fact, X _ n\stackrel{p}{\to}X if and only if every subsequence of \{X _ n\} _ {n=1}^\infty has a sub-subsequence that converges to X almost everywhere.

The proof is the same as the one in real analysis.
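To make the gap between the two modes concrete, here is a minimal numerical sketch in Python (NumPy assumed; the helper typewriter is ad hoc for this illustration) of the classical ''typewriter'' sequence on \Omega=[0,1): writing n=2^k+j with 0\leqslant j<2^k, let X _ n be the indicator of [j2^{-k},(j+1)2^{-k}). Then \mathbb P(X _ n\neq0)=2^{-k}\to0, so X _ n\stackrel{p}{\to}0, yet each \omega is hit once per dyadic level, so X _ n(\omega) converges for no \omega; the subsequence X _ {2^k}=\chi _ {[0,2^{-k})} converges to 0 a.e., as Theorem 10 promises.

import numpy as np

def typewriter(n, omega):
    # X_n is the indicator of the dyadic interval [j/2^k, (j+1)/2^k), n = 2^k + j.
    k = int(np.floor(np.log2(n)))
    j = n - 2**k
    return ((j / 2**k <= omega) & (omega < (j + 1) / 2**k)).astype(float)

rng = np.random.default_rng(0)
omega = rng.uniform(0.0, 1.0, size=100_000)   # sample points of ([0,1), Lebesgue)

# Convergence in probability: the empirical P(X_n = 1) shrinks like 2^{-k}.
for n in (10, 100, 1_000, 10_000):
    print(n, typewriter(n, omega).mean())

# No a.e. convergence: a fixed omega is hit exactly once per dyadic level,
# hence X_n(omega) = 1 for infinitely many n (12 times for n < 2^12 here).
w = omega[:1]
print(sum(int(typewriter(n, w)[0]) for n in range(1, 2**12)))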

Under the assumption of independence, the first Borel-Cantelli lemma has a striking complement.

Theorem 11 (Borel-Cantelli lemma Ⅱ). If \{A _ n\} are independent, then
\[
\sum _ {n=1}^\infty\mathbb P(A _ n)=\infty\implies \mathbb P\Big(\limsup _ {n\to\infty} A _ n\Big)=1.
\]

Theorem 12. The implication of the last theorem remains true if \{A _ n\} are pairwise independent.

For the proof, we just need \mathbb P(\liminf A _ n^c)=0, i.e., \mathbb P(\bigcap _ {n=m}^\infty A _ n^c)\to0 as m\to\infty. From independence and the inequality 1-x\leqslant e^{-x} we have \mathbb P(\bigcap _ {n=m}^{m'} A _ n^c)=\prod _ {n=m}^{m'}(1-\mathbb P(A _ n))\leqslant\exp(-\sum _ {n=m}^{m'}\mathbb P(A _ n))\to0 as m'\to\infty. For the case where the events are only pairwise independent, the proof is a little longer. Let I _ n:=\chi _ {A _ n}, so that the hypothesis of pairwise independence becomes \mathbb E(I _ mI _ n)=\mathbb E(I _ m)\mathbb E(I _ n) for any distinct m,n; the desired conclusion becomes \mathbb P(\sum I _ n=+\infty)=1, because \omega\in \limsup A _ n just means that \omega lies in infinitely many of the A _ n; and the hypothesis \sum\mathbb P(A _ n)=\infty becomes \sum\mathbb E(I _ n)=+\infty.

Consider the partial sum J _ k=\sum _ {n=1}^{k}I _ n. Using Chebyshev's inequality, we have for every A>0, \mathbb P(|J _ k-\mathbb E(J _ k)|\leqslant A\sigma(J _ k))\geqslant1-\frac{\mathrm{Var}(J _ k)}{A^2\mathrm{Var}(J _ k)}=1-\frac1{A^2}. Denote p _ n=\mathbb E(I _ n). Now we want to calculate \mathrm{Var}(J _ k)=\mathbb E(J _ k^2)-\mathbb E(J _ k)^2. We have \mathbb E(J _ k^2)=\sum \mathbb E(I _ n^2)+2\sum _ {m<n} \mathbb E(I _ m)\mathbb E(I _ n)=\sum p _ n^2+2\sum _ {m<n} p _ mp _ n+\sum(p _ n-p _ n^2), so \mathrm {Var}(J _ k)=\sum _ {n=1}^{k}(p _ n-p _ n^2). (If you are familiar with the notion of covariance, the calculation is much simpler; see the sketch after this paragraph.) Now since \sum p _ n=\mathbb E(J _ k)\to\infty, it follows that \sigma(J _ k)\leqslant(\sum p _ n)^{1/2}=\sqrt{\mathbb E(J _ k)}=o(\mathbb E(J _ k)). We obtained above that with probability at least 1-\frac1{A^2} it holds that |J _ k-\mathbb E(J _ k)|\leqslant A\sigma(J _ k). For sufficiently large k the right side is less than \frac12 \mathbb E(J _ k), so with probability at least 1-\frac1{A^2} we have J _ k\geqslant\frac12\mathbb E(J _ k), and therefore \lim _ n J _ n\geqslant\frac12\mathbb E(J _ k) (since J _ k is increasing in k). Letting k\to\infty we obtain that \lim _ n J _ n=+\infty with probability at least 1-\frac1{A^2}. Since A can be arbitrarily large, \lim _ n J _ n=+\infty a.e.
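For the record, here is the covariance shortcut just alluded to: pairwise independence gives \mathrm{Cov}(I _ m,I _ n)=0 for m\neq n, so
\[
\mathrm{Var}(J _ k)=\sum _ {n=1}^{k}\mathrm{Var}(I _ n)+2\sum _ {m<n}\mathrm{Cov}(I _ m,I _ n)=\sum _ {n=1}^{k}(p _ n-p _ n^2),
\]
since \mathrm{Var}(I _ n)=\mathbb E(I _ n^2)-\mathbb E(I _ n)^2=p _ n-p _ n^2.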

Corollary 13. If \{A _ n\} are independent, then \mathbb P(\limsup A _ n)=0 or 1 according as \sum\mathbb P(A _ n) converges or diverges.
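A standard illustration: toss a fair coin independently and let A _ n:=\{\text{the }n\text{-th toss is a head}\}. Then \sum _ n\mathbb P(A _ n)=\sum _ n\frac12=\infty, so by the divergence part heads occur infinitely often with probability 1 (and, by symmetry, so do tails).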

The first Borel-Cantelli lemma is often referred to as the ''convergence part'' and the second as ''the divergence part''. The first is more useful, since the events there may be completely arbitrary. The second has the extension to pairwise independent events; although the result is of some interest, it is the method of proof that is more important: it is a useful technique in probability theory.

2. Convergence of measures

In this section the contents are presented in a general framework. Not all proofs are given, since some of them are beyond the scope of this review.

Due to the generality, we need some definitions and facts from topology. For example, we will deal with sets that are compact, relatively compact, or (relatively) sequentially compact, and with spaces that are locally compact (every point has an open neighborhood whose closure is compact). The concept of Polish space was mentioned in previous chapters. In short, a Polish space is a separable completely metrizable topological space. This is one type of topological space that is of particular importance in probability theory. Examples of Polish spaces are countable discrete spaces, the Euclidean spaces \mathbb R^n, and the space C[0, 1] of continuous functions on [0,1], equipped with the sup-norm. In practice, all spaces that are of importance in probability theory are Polish spaces.

In the following, let (E, \tau) be a topological space with Borel \sigma-algebra \mathcal E=\mathcal B(E):=\sigma(\tau) and with complete metric d. Define the following spaces of measures on E: \mathcal M _ f(E):=\{\text{finite measures on }E\}, \mathcal M _ 1(E):=\{\text{probability measures on }E\}, and \mathcal M _ {\leqslant1}(E):=\{\mu\in\mathcal M _ f(E)\mid \mu(E)\leqslant 1\}. The elements of the last space are called sub-probability measures on E. Define the following spaces of continuous functions on E: let C(E) be all continuous functions on E, C _ K(E) all continuous functions with compact support, C _ 0(E) all continuous functions that vanish at infinity, and C _ b(E) all bounded continuous functions.

Now we are ready for the definitions of weak convergence and vague convergence.

Definition 14 (weak and vague convergence). Let E be a metric space. We consider a sequence of finite measures \{\mu _ n\} and a finite measure \mu.
  1. We say that \{\mu _ n\} _ {n=1}^\infty converges weakly to \mu, denoted by \mu _ n\stackrel{w}{\to}\mu if
    \[
    \int f\, \mathrm d\mu _ n\to\int f\, \mathrm d\mu,\quad\forall f\in C _ b(E).
    \]
  2. We say that \{\mu _ n\} _ {n=1}^\infty converges vaguely to \mu, denoted by \mu _ n\stackrel{v}{\to}\mu if
    \[
    \int f\, \mathrm d\mu _ n\to\int f\, \mathrm d\mu,\quad\forall f\in C _ K(E).
    \]
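For a first example, take E=\mathbb R and \mu _ n=\delta _ {1/n}. For every f\in C _ b(\mathbb R) we have \int f\, \mathrm d\mu _ n=f(1/n)\to f(0)=\int f\, \mathrm d\delta _ 0, so \mu _ n\stackrel{w}{\to}\delta _ 0; since C _ K(\mathbb R)\subseteq C _ b(\mathbb R), also \mu _ n\stackrel{v}{\to}\delta _ 0.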

Weak convergence is a notion originally introduced in functional analysis. Consider the set of all finite signed measures, denoted by X. It can be shown that X can be equipped with the so-called total variation norm |\cdot|: \mu\mapsto\sup _ A(\mu(A)-\mu(A^c)). For this normed vector space, we consider the dual space X' of continuous linear functionals X\to\mathbb R. Weak convergence in the sense of functional analysis, under the topology induced by this norm, reads: \mu _ n\stackrel{w}{\to}\mu if x'(\mu _ n)\to x'(\mu) for every x'\in X'. This can be proved to be equivalent to: (\mu _ n) is bounded and \mu _ n(A)\to\mu(A) for every measurable set A. However, this is different from the convergence in Definition 14, which holds if and only if \mu _ n(A)\to\mu(A) for all measurable sets A with \mu(\partial A)=0 (such an A is called a continuity set). Hence the convergence in Definition 14 is weaker than the one in functional analysis. In short, if we topologize in the above way and call the convergence in Definition 14 ''weak convergence'', then this kind of weak convergence is somewhat different from the one in functional analysis.

If we topologize in another way, we see that the weak convergence of Definition 14 actually induces on \mathcal M _ f(E) the weak* topology. This is presumably the main reason why some older books prefer ''vague convergence'' over ''weak convergence'': their authors were well aware of the terminological conflict between probability and functional analysis, since what probabilists call ''weak'' is really ''weak*'' in the functional-analytic sense. People seldom speak of ''weak* convergence'' in probability, preferring ''vague convergence'' or ''weak convergence'', if only to avoid saying things like ''converges weak-star-ly'' or scattering asterisks throughout the text.

Recall that in functional analysis, when X is a normed vector space, (x _ n')\subseteq X' is said to converge weakly * in X' if there exists x'\in X' such that for each x\in X, x' _ n(x)\to x'(x); and the weak * topology on X' is by definition the weakest topology on X' such that all the mappings \varphi _ x:x'\in X'\mapsto \varphi _ x(x'):= x'(x)\in\mathbb K are continuous. In the case here, X=C _ b(E) with the sup-norm |\cdot| _ \infty; the dual C _ b(E)' consists of all continuous linear functionals on C _ b(E). Consider the functionals f\in C _ b(E)\mapsto \mu(f):=\int f\, \mathrm d\mu. They are obviously all linear and continuous, so \mathcal M _ f(E)\subseteq C _ b(E)'. By the definition of weak * convergence in functional analysis, a sequence (\mu _ n) converges weakly * in C _ b(E)' if there exists \mu\in C _ b(E)' such that \mu _ n(f)\to\mu(f) for all f\in C _ b(E), which coincides with the defining formula in Definition 14. Since \mathcal M _ f(E)\subseteq C _ b(E)', the topology induced here is the trace of the weak * topology on \mathcal M _ f(E) (the trace topology is also called the subspace topology).

The following theorem is known as the Portmanteau theorem.

Theorem 15. Let E be a metric space and let \mu,\mu _ 1,\mu _ 2,\dots\in \mathcal M _ {\leqslant1}(E). The following are equivalent.
  1. \mu _ n\stackrel{w}{\to}\mu.
  2. For all bounded Lipschitz continuous f, \int f\, \mathrm d\mu _ n\to\int f\, \mathrm d\mu.
  3. For all bounded measurable f with \mu(U _ f)=0, where U _ f is the set of points of discontinuity of f (which can be shown to be measurable), we have \int f\, \mathrm d\mu _ n\to\int f\, \mathrm d\mu.
  4. \liminf \mu _ n(E)\geqslant\mu(E) and for all closed F\subseteq E, \limsup\mu _ n(F)\leqslant\mu(F).
  5. \limsup\mu _ n(E)\leqslant\mu(E) and for all open G\subseteq E, \liminf\mu _ n(G)\geqslant\mu(G).
  6. For all measurable A with \mu(\partial A)=0 (i.e., continuity set), \mu _ n(A)\to\mu(A).
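The restriction to continuity sets in item 6 cannot be dropped: with \mu _ n=\delta _ {1/n}\stackrel{w}{\to}\delta _ 0=\mu as before and A=(0,\infty), we have \mu _ n(A)=1 for all n but \mu(A)=0; indeed \partial A=\{0\} carries \mu-mass 1, so A is not a continuity set of \mu.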

Prohorov's theorem
In the following, let E be a Polish space with Borel \sigma-algebra \mathcal E. A fundamental question is: When does a sequence \{\mu _ n\} _ {n=1}^\infty of measures on (E, \mathcal E) converge weakly, or at least have a weak limit point? Evidently, a necessary condition is that \{\mu _ n(E)\} _ {n=1}^\infty is bounded. Hence, without loss of generality, we will consider only sequences in \mathcal M _ {\leqslant1}(E), as we did before. However, this condition is not sufficient for the existence of weak limit points: for example, the sequence \{\delta _ n\} _ {n\in\mathbb N} of probability measures on \mathbb R does not have a weak limit point (although it converges vaguely to the zero measure). This example suggests that we also have to make sure that no mass ''vanishes at infinity''. This idea is made precise by the notion of tightness.

Definition 16. A family \mathcal F\subseteq\mathcal M _ f(E) is called tight if, for any \varepsilon > 0, there exists a compact set K\subseteq E such that \sup\{\mu(E\setminus K)\mid \mu\in\mathcal F\}<\varepsilon.

Recall that a family \mathcal F of measures is called weakly relatively sequentially compact if every sequence in \mathcal F has a weakly convergent subsequence (whose limit need not belong to \mathcal F).

Theorem 17 (Prohorov). Let (E, d) be a metric space and \mathcal F\subseteq \mathcal M _ {\leqslant1}(E). Then:
  1. \mathcal F is tight \implies \mathcal F is weakly relatively sequentially compact.
  2. If E is Polish, then the converse holds: \mathcal F is tight \Longleftarrow \mathcal F is weakly relatively sequentially compact.

If E is compact, then \mathcal M _ 1(E) and \mathcal M _ {\leqslant1}(E) are tight, so they are weakly sequentially compact.

Corollary 18. If E is a locally compact separable metric space, then \mathcal M _ {\leqslant1}(E) is vaguely sequentially compact.

When E=\mathbb R, the following theorem of Helly is useful.

Theorem 19. Let \{F _ n\} _ {n=1}^\infty be a uniformly bounded sequence in V, where V is the set of functions that are right continuous, monotone increasing and bounded. Then there exists an F \in V and a subsequence \{F _ {n _ k}\} with F _ {n _ k}(x)\to F(x) at all points of continuity of F.
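Note that the limit F need not correspond to a probability measure even if every F _ n does: for F _ n=\chi _ {[n,\infty)}, the distribution function of \delta _ n, we have F _ n(x)\to0 at every x, so the Helly limit is F\equiv0\in V. The mass escapes to infinity, which is why sub-probability measures are the natural setting here.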

Recall the following fact about numerical sequences: if every subsequence of a sequence of real numbers that tends to a limit (\pm\infty allowed) has the same value for its limit, then the whole sequence tends to this limit. In particular, a bounded sequence such that every convergent subsequence has the same limit converges to this limit. The next theorem generalizes this result to vague convergence of sub-probability measures on \mathbb R.

Theorem 20. Suppose E=\mathbb R. If every vaguely convergent subsequence of the sequence of sub-probability measures \{\mu _ n\} converges to the same \mu, then \mu _ n\stackrel{v}{\to}\mu.

To prove the theorem, suppose for contradiction that \mu _ n does not converge vaguely to \mu. Then it can be shown that there exists a continuity interval (a,b) of \mu such that \mu _ n(a,b) does not converge to \mu(a,b); since these numbers lie in [0,1], there exists a subsequence \{\mu _ {n _ k}\} such that \mu _ {n _ k}(a,b) converges to a limit L different from \mu(a,b). By vague sequential compactness (Corollary 18), from \{\mu _ {n _ k}\} we can extract a sub-subsequence, say \{\mu _ {n _ {k(n')}}\}, which converges vaguely, and by the hypothesis of the theorem its limit must be \mu. Hence \mu _ {n _ {k(n')}}(a,b)\to\mu(a,b). But the left side also converges to L, a contradiction.

We end this section with the following theorem on the connection between weak and vague convergence.

Theorem 21. Let E be a locally compact Polish space and let \mu,\mu _ 1,\mu _ 2,\dots\in\mathcal M _ f(E). Then the following are equivalent.
  1. \mu _ n\stackrel{w}{\to}\mu.
  2. \mu _ n\stackrel{v}{\to}\mu and \mu(E)=\lim\mu _ n(E).
  3. \mu _ n\stackrel{v}{\to}\mu and \mu(E)\geqslant\limsup _ n\mu _ n(E).
  4. \mu _ n\stackrel{v}{\to}\mu and \{\mu _ n\} is tight.

3. Relationships among various modes of convergence

In Section 5.1, we obtained some implications between convergence a.e. and convergence in probability. In this section we investigate further the relationships among the various modes of convergence. We first need one more notion of convergence of random variables: convergence in distribution.

Definition 22. The sequence of random variables \{X _ n\} is said to converge in distribution (to the random variable X) if their distributions, as probability measures, converge weakly.

Definition 23. Let F,F _ 1,F _ 2,\dots be distribution functions of probability measures on \mathbb R. We say that \{F _ n\} converges weakly to F if, for all points of continuity x of F, F _ n(x)\to F(x).

If F,F _ 1,F _ 2,\dots are distribution functions of sub-probability measures, then we define F(\infty):=\lim _ {x\to\infty}F(x) and for weak convergence require in addition F(\infty)\geqslant\limsup F _ n(\infty).

Note that the defining relation in the above definition implies F(\infty)\leqslant\liminf F _ n(\infty). Hence if F _ n\to F in distribution, then F(\infty)=\lim F _ n(\infty).
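The restriction to continuity points in Definition 23 is essential: let X _ n\equiv1/n and X\equiv0, so that F _ n=\chi _ {[1/n,\infty)} and F=\chi _ {[0,\infty)}. Then F _ n(0)=0 for every n while F(0)=1; nevertheless F _ n(x)\to F(x) at every x\neq0, i.e., at every continuity point of F, so F _ n\stackrel{d}{\to}F, as one expects since X _ n\to X in every mode.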

For the connection between these two definitions, we have:

Theorem 24. Let \mu,\mu _ 1,\mu _ 2,\dots\in\mathcal M _ {\leqslant1}(\mathbb R) with corresponding distribution functions F,F _ 1,F _ 2,\dots. Then
\[
\mu _ n\stackrel{w}\to\mu\, \Longleftrightarrow\, F _ n\stackrel{d}\to F.
\]

Corollary 25. Let X,X _ 1,X _ 2,\dots be real random variables with distribution functions F,F _ 1,F _ 2,\dots. Then the following are equivalent.
  • X _ n\stackrel{d}{\to}X.
  • \mathbb E[f(X _ n)]\to\mathbb E[f(X)] for all f\in C _ b(\mathbb R).
  • F _ n\stackrel{d}{\to}F.

The continuous-mapping theorem is a simple but extremely useful result: if the sequence \{X _ n\} converges to X and g is continuous, then g(X _ n) converges to g(X). This is true for each of the three modes of stochastic convergence.

Theorem 26 (Continuous mapping). Let (E _ 1,d _ 1) and (E _ 2,d _ 2) be metric spaces and let \varphi:E _ 1\to E _ 2 be measurable. Denote by U _ \varphi the set of points of discontinuity of \varphi. Suppose X,X _ 1,X _ 2,\dots are E _ 1-valued random variables with \mathbb P(X\in U _ \varphi)=0.
  1. If X _ n\stackrel{d}{\to}X, then \varphi(X _ n)\stackrel{d}{\to}\varphi(X).
  2. If X _ n\stackrel{p}{\to}X, then \varphi(X _ n)\stackrel{p}{\to}\varphi(X).
  3. If X _ n\stackrel{a.s.}{\to}X, then \varphi(X _ n)\stackrel{a.s.}{\to}\varphi(X).

Proof. First note that U _ \varphi\subseteq E _ 1 is Borel measurable. Hence the conditions make sense.

Denote the distributions (probability measures) of X,X _ 1,\dots by \mu,\mu _ 1,\dots. Then \mu(U _ \varphi)=0. Let f\in C _ b(E _ 2). Then f\circ \varphi is bounded and measurable, and U _ {f\circ\varphi}\subseteq U _ \varphi, because f\circ\varphi is continuous at every point at which \varphi is continuous. Hence \mu(U _ {f\circ\varphi})=0. By Theorem 15,
\[
\lim _ {n\to\infty}\int f\, \mathrm d(\mu _ n\circ\varphi^{-1})=\lim _ {n\to\infty}\int(f\circ\varphi)\, \mathrm d\mu _ n=\int(f\circ\varphi)\, \mathrm d\mu=\int f\, \mathrm d(\mu\circ\varphi^{-1}).
\]
Therefore \mu _ n\circ\varphi^{-1}\stackrel{w}{\to}\mu\circ\varphi^{-1}, again by Theorem 15. (1) holds since \mu _ {\varphi(X)}=\mu _ X\circ\varphi^{-1}.

Now we consider (2). Fix \varepsilon>0; we want to show \mathbb P[d(\varphi(X _ n),\varphi(X))\geqslant\varepsilon]\to0 as n\to\infty. Let \delta>0, and let B _ \delta denote the complement of the set \{x\in E _ 1\mid \forall y\in E _ 1:\ d(x,y)<\delta\Longrightarrow d(\varphi(x),\varphi(y))<\varepsilon\}. If d(\varphi(X _ n),\varphi(X))\geqslant\varepsilon, then either d(X _ n,X)\geqslant\delta, or X\in B _ \delta. Consequently,
\[
\mathbb P[d(\varphi(X _ n),\varphi(X))\geqslant\varepsilon]\leqslant\mathbb P(X\in B _ \delta)+\mathbb P(d(X _ n,X)\geqslant\delta).
\]
By the definition of convergence in probability, \mathbb P(d(X _ n,X)\geqslant\delta)\to0 as n\to\infty. Then we let \delta\to0: \mathbb P(X\in B _ \delta)\to0 as \delta\downarrow0, because B _ \delta\cap U _ \varphi^c\downarrow\varnothing (at every point of continuity of \varphi the defining implication holds for small enough \delta) and \mathbb P(X\in U _ \varphi)=0.

Assertion (3) is trivial.

Theorem 27. Let X _ n,Y _ n,X be random variables with values in a metric space (E,d).
  1. X _ n\stackrel{a.s.}{\to}X\implies X _ n\stackrel{p}{\to}X\implies X _ n\stackrel{d}{\to}X.
  2. X _ n\stackrel{p}\to c for a constant c iff X _ n\stackrel{d}{\to}c.
  3. If d(X _ n,Y _ n)\stackrel{p}\to0 and X _ n\stackrel{d}\to X, then Y _ n\stackrel{d}\to X.
  4. If X _ n\stackrel{d}\to X and Y _ n\stackrel{d}\to c for a constant c, then (X _ n,Y _ n)\stackrel{d}\to (X,c).
  5. If X _ n\stackrel{p}\to X and Y _ n\stackrel{p}\to Y, then (X _ n,Y _ n)\stackrel{p}\to (X,Y).

Proof. (1) The first implication: see Section 5.1. For the second, note that d(X _ n,X)\stackrel{p}{\to}0 and X\stackrel d\to X; then apply (3).

(2) We just need to show the ''if'' part. \mathbb P(d(X _ n,c)\geqslant\varepsilon)=\mathbb P(X _ n\in B(c,\varepsilon)^c). By part (4) of Theorem 15, \limsup \mathbb P(X _ n\in F)\leqslant\mathbb P(X\in F) for every closed set F. The complement of an open ball is clearly closed, so \limsup\mathbb P(X _ n\in B(c,\varepsilon)^c)\leqslant\mathbb P(c\in B(c,\varepsilon)^c)=0.

(3) We prove this by showing that \mathbb E(f(X _ n)) and \mathbb E(f(Y _ n)) have the same limit for all bounded Lipschitz continuous functions f, and then applying part (2) of Theorem 15. For f with sup-norm M and Lipschitz constant K, take \varepsilon>0; then
\begin{align*}
&|\mathbb E(f(X _ n))-\mathbb E(f(Y _ n))| \leqslant\mathbb E|f(Y _ n)-f(X _ n)| \\
={}& \mathbb E[|f(Y _ n)-f(X _ n)|\cdot\chi _ {\{d(X _ n,Y _ n)<\varepsilon\}}]+\mathbb E[|f(Y _ n)-f(X _ n)|\cdot\chi _ {\{d(X _ n,Y _ n)\geqslant\varepsilon\}}]\\
\leqslant{}& K\varepsilon\, \mathbb E\chi _ {\{d(X _ n,Y _ n)<\varepsilon\}}+2M\, \mathbb P(d(X _ n,Y _ n)\geqslant\varepsilon)
\leqslant K\varepsilon+2M\, \mathbb P(d(X _ n,Y _ n)\geqslant\varepsilon).
\end{align*}
Let n\to\infty. The second term converges to zero. Then let \varepsilon\to0. The first term also converges to zero.

(4) Note that d[(X _ n,Y _ n),(X _ n,c)]=d(Y _ n,c)\stackrel{p}\to0. Thus, according to (3), it suffices to show that (X _ n, c)\stackrel{d}{\to}(X,c). For every bounded continuous function f(x,y), the function x\mapsto f(x,c) is bounded and continuous; since X _ n\stackrel d\to X, we get \mathbb E[f(X _ n,c)]\to\mathbb E[f(X,c)]. This implies (X _ n, c)\stackrel{d}{\to}(X,c).

(5) This follows from d[(x _ 1,y _ 1),(x _ 2,y _ 2)]\leqslant d(x _ 1,x _ 2)+d(y _ 1,y _ 2).

Corollary 28 (Slutsky). If X _ n\stackrel{d}\to X and Y _ n\stackrel{d}\to c for a constant c, then:

\[X _ n+Y _ n\stackrel{d}\to X+c;\quad X _ nY _ n\stackrel{d}\to cX;\quad Y _ n^{-1}X _ n\stackrel{d}\to c^{-1}X \text{ provided } c\neq0.\]
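As a quick sanity check, here is a simulation sketch in Python (NumPy and SciPy assumed; the Exponential(1) population and the sample sizes are arbitrary choices). By the central limit theorem \sqrt n(\bar X _ n-\mu)/\sigma\stackrel d\to N(0,1), and since the sample standard deviation S _ n\stackrel p\to\sigma, Slutsky's corollary gives \sqrt n(\bar X _ n-\mu)/S _ n\stackrel d\to N(0,1) as well.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 1_000, 5_000
mu = 1.0  # mean of Exponential(1); its standard deviation is also 1

x = rng.exponential(scale=1.0, size=(reps, n))
xbar = x.mean(axis=1)              # sample means
s = x.std(axis=1, ddof=1)          # S_n -> sigma = 1 in probability
t = np.sqrt(n) * (xbar - mu) / s   # studentized mean; Slutsky applies

# The empirical law of t should be close to N(0,1).
print(stats.kstest(t, "norm"))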

Uniform integrability
We now introduce the concept of uniform integrability, which is of basic importance in some later theorems.

Definition 29. A family of random variables \mathcal F\subseteq L^1 is called uniformly integrable if
\[
\lim _ {A\to\infty}\int _ {|X|>A}|X|\, \mathrm d\mathbb P=0
\]
uniformly in X\in\mathcal F, that is,
\[
\inf _ {A\in[0,\infty)}\sup _ {X\in\mathcal F}\int _ {|X|>A}|X|\, \mathrm d\mathbb P=0.
\]

Here are some simple criteria.
  • Finite family: If \mathcal F is a finite set, then it is uniformly integrable.
  • Domination: If \mathcal F is uniformly integrable and for any g\in\mathcal G there exists an f\in\mathcal F with |g|\leqslant|f|, then \mathcal G is also uniformly integrable.

Let p\in[1,\infty]. A family \mathcal F\subseteq L^p is called bounded in L^p if \sup\{|f| _ p\mid f\in\mathcal F\}<\infty. When p>1, if \mathcal F is bounded in L^p, then \mathcal F is uniformly integrable.
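The last claim follows from a Markov-type estimate: on \{|X|>A\} we have |X|\leqslant|X|^p/A^{p-1}, hence
\[
\int _ {|X|>A}|X|\, \mathrm d\mathbb P\leqslant\frac{1}{A^{p-1}}\, \mathbb E(|X|^p)\leqslant\frac{1}{A^{p-1}}\sup _ {f\in\mathcal F}|f| _ p^p\to0\quad(A\to\infty),
\]
uniformly in X\in\mathcal F, because p>1.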

Theorem 30. A family of random variables \{X _ t\} _ {t\in T}\subseteq L^1 is uniformly integrable iff the following two conditions are satisfied.
  1. Uniformly bounded first absolute moments: \sup _ {t\in T}\mathbb E(|X _ t|)<\infty.
  2. Uniform absolute continuity: For every \varepsilon>0, there is a \delta(\varepsilon)>0 such that for any A\in\mathcal A (the underlying \sigma-algebra):
    \[
    \mathbb P(A)<\delta(\varepsilon)\implies \int _ A|X _ t|\, \mathrm d\mathbb P<\varepsilon\quad (\forall t\in T).
    \]

We are now ready for the following theorem.

Theorem 31. Let X _ n\in L^1. The following statements are equivalent.
  1. There is an X\in L^1 with X _ n\stackrel{L^1}\to X.
  2. \{X _ n\} _ {n=1}^\infty is an L^1-Cauchy sequence; that is, \mathbb E|X _ m-X _ n|\to0 as m,n\to\infty.
  3. \{X _ n\} _ {n=1}^\infty is uniformly integrable and there is an X such that X _ n\stackrel p\to X.

Theorem 32. Let p\in[1,\infty] and X _ 1,X _ 2,\dots\in L^p. The following statements are equivalent.
  1. There is an X\in L^p with X _ n\stackrel{L^p}\to X.
  2. \{X _ n\} _ {n=1}^\infty is a Cauchy sequence in L^p.
  3. (Only valid for p<\infty.) \{|X _ n|^p\} _ {n=1}^\infty is uniformly integrable and there exists an X with X _ n\stackrel p\to X.

Finally, we end this chapter with some results on the convergence of moments. The function |x|^r\, (r>0) is continuous but not bounded, so we cannot directly apply the definition of weak convergence.

Theorem 33. If X _ n\stackrel d\to X and \{X _ n\} is bounded in L^p, then for each r<p: \mathbb E|X _ n|^r\to\mathbb E|X|^r<\infty.

Theorem 34. Let 0<r<\infty, X _ n\in L^r, and X _ n\stackrel p\to X. Then X _ n\stackrel{L^r}\to X \Longleftrightarrow\mathbb E|X _ n|^r\to\mathbb E|X|^r<\infty.
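The moment condition here is not automatic: on ([0,1],\mathcal B,\mathrm{Leb}) let X _ n:=n\chi _ {[0,1/n]}. Then X _ n\stackrel p\to0, but \mathbb E|X _ n|=1\not\to0, so X _ n does not converge in L^1; accordingly \{X _ n\} is not uniformly integrable, all of its mass sitting on the small set [0,1/n].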

