Central limit theorem

The LaTeX document (with the latest updated chapters) can be read online at the link: probability.pdf

Chapter 8: Central limit theorem

The main goal of this chapter is the central limit theorem (CLT) for sums of independent random variables and for independent arrays of random variables (Lindeberg–Feller theorem). For the latter, we will not cover its proof.

With the knowledge of characteristic functions, we can prove the CLT for real-valued random variables. Later, we prove a multidimensional CLT.

Contents
 1.  The central limit theorem
 2.  Lindeberg–Feller theorem
 3.  Multidimensional central limit theorem

1. The central limit theorem

In the strong law of large numbers, we saw that, for large n, the sum S _ n = X _ 1+\dots+X _ n of i.i.d. integrable random variables is of order n\, \mathbb E(X _ 1). In the central limit theorem (CLT), we study the size and shape of the typical fluctuations around n\, \mathbb E(X _ 1) in the case where the X _ i have finite variance.

Theorem 1 (Central limit theorem). Let X _ 1,X _ 2,\dots be i.i.d. real random variables with \mu:=\mathbb E(X _ 1)\in\mathbb R and \sigma^2:=\operatorname{Var}(X _ 1)\in(0,\infty). For n\in\mathbb N, let
\[
S _ n^\ast:=\frac{1}{\sqrt{n\sigma^2}}\sum _ {i=1}^{n}(X _ i-\mu).
\]
Then the probability distribution measure of S _ n^\ast satisfies
\[
\mathbb P _ {S _ n^\ast}\stackrel{w}{\to}N(0,1).
\]
Moreover, for -\infty\leqslant a<b\leqslant+\infty, we have
\[
\lim _ {n\to\infty}\mathbb P(S _ n^\ast\in[a,b])=\frac{1}{\sqrt{2\pi}}\int _ {a}^{b}\mathrm e^{-x^2/2}\, \mathrm dx.
\]

Proof. Denote the characteristic function of X _ n-\mu by \varphi. By Theorem thm731, \varphi(t)=1-\frac{\sigma^2}{2}t^2+o(t^2) as t\to0. A simple calculation gives \varphi _ {S _ n^\ast}(t)=\big[\varphi\big(\tfrac{t}{\sqrt{n\sigma^2}}\big)\big]^n (recall that for independent r.v.s, the ch.f. of their sum equals the product of their ch.f.s).

Now
\[
\Big(1-\frac{t^2}{2n}\Big)^{n}=\bigg[\Big(1-\frac{t^2}{2n}\Big)^{-\frac{2n}{t^2}}\bigg]^{-\frac{t^2}{2}}\to\mathrm e^{-\frac{t^2}{2}},\quad n\to\infty,
\]
and by the inequality |u^n-v^n|=|u-v|\,|u^{n-1}+u^{n-2}v+\dots+uv^{n-2}+v^{n-1}|\leqslant n|u-v|\cdot\max(|u|,|v|)^{n-1} (u,v\in\mathbb C), applied here with |u|,|v|\leqslant1 (note that |\varphi|\leqslant1 and |1-\frac{t^2}{2n}|\leqslant1 for n large),
\[
\bigg|\Big(1-\frac{t^2}{2n}\Big)^{n}-\varphi^n\Big(\frac{t}{\sqrt{n\sigma^2}}\Big)\bigg|\leqslant
n\Big|1-\frac{t^2}{2n}-\varphi\Big(\frac{t}{\sqrt{n\sigma^2}}\Big)\Big|=n\, o\Big(\frac{t^2}{n\sigma^2}\Big)=o(1),\quad n\to\infty.
\]
Thus we conclude that \varphi _ {S _ n^\ast}(t)\to\mathrm e^{-t^2/2} (n\to\infty) for all t\in\mathbb R. By Proposition prop711, \mathrm e^{-t^2/2} is the ch.f. of N(0,1), so \varphi _ {S _ n^\ast} converges pointwise to the ch.f. of N(0,1). By Lévy's continuity theorem (Theorem thm723), \mathbb P _ {S _ n^\ast} converges weakly to N(0,1).

The second assertion follows from the fact that \mathbb P _ {N(0,1)}(\partial[a,b])=0 and the Portmanteau theorem (Theorem thm521).

This completes the proof of the CLT.
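As a quick numerical aside (not part of the original notes), here is a minimal Python sketch, assuming NumPy is available and taking standard exponential summands purely as an example, that compares a Monte Carlo estimate of \mathbb P(S _ n^\ast\in[a,b]) with the limiting value \Phi(b)-\Phi(a).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def clt_interval_probability(n, a, b, n_trials=20_000):
    """Monte Carlo estimate of P(S_n* in [a, b]) for i.i.d. Exp(1) summands (mu = sigma^2 = 1)."""
    x = rng.exponential(scale=1.0, size=(n_trials, n))
    s_star = (x.sum(axis=1) - n) / np.sqrt(n)   # (S_n - n*mu) / sqrt(n*sigma^2)
    return np.mean((s_star >= a) & (s_star <= b))

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

a, b = -1.0, 1.0
for n in (5, 50, 500):
    print(n, clt_interval_probability(n, a, b), round(norm_cdf(b) - norm_cdf(a), 4))
```

The simulated probabilities drift toward \Phi(1)-\Phi(-1)\approx0.6827 as n grows, which is exactly what the second assertion of the theorem predicts.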

The speed of convergence in the CLT
The following theorem quantifies the speed of convergence; we state it without proof. With different constants (in place of 0.8), the statement was found independently by Berry and Esseen.

Theorem 2 (Berry–Esseen). Let X _ 1,X _ 2,\dots be i.i.d. with mean 0 and variance \sigma^2\in(0,\infty), and let \gamma:=\mathbb E(|X _ 1|^3)<\infty. Let S _ n^\ast:=(X _ 1+\dots+X _ n)/\sqrt{n\sigma^2} and let \Phi be the distribution function of the standard normal distribution. Then for all n,
\[
\sup _ {x\in\mathbb R}\big|\mathbb P(S _ n^\ast\leqslant x)-\Phi(x)\big|\leqslant\frac{0.8\gamma}{\sigma^3\sqrt n}.
\]
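For intuition only, the following sketch (again assuming NumPy; the Rademacher summands, trial counts and sample sizes are arbitrary choices) estimates the left-hand side by simulation and prints it next to the bound 0.8\gamma/(\sigma^3\sqrt n); for \pm1-valued summands, \sigma=\gamma=1.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def norm_cdf(x):
    """Vectorized standard normal distribution function."""
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def kolmogorov_distance(samples):
    """sup_x |F_emp(x) - Phi(x)| evaluated at the sample points."""
    s = np.sort(samples)
    m = len(s)
    cdf = norm_cdf(s)
    return max(np.abs(np.arange(1, m + 1) / m - cdf).max(),
               np.abs(np.arange(0, m) / m - cdf).max())

def berry_esseen_check(n, n_trials=20_000):
    # Rademacher summands: mean 0, sigma^2 = 1, gamma = E|X|^3 = 1
    x = rng.choice([-1.0, 1.0], size=(n_trials, n))
    s_star = x.sum(axis=1) / np.sqrt(n)
    # Monte Carlo error of roughly 1/sqrt(n_trials) is not accounted for here
    return kolmogorov_distance(s_star), 0.8 / np.sqrt(n)

for n in (10, 100, 400):
    dist, bound = berry_esseen_check(n)
    print(n, round(dist, 4), round(bound, 4))
```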

2. Lindeberg–Feller theorem

We now weaken the assumption that the r.v.s are i.i.d. In fact, we can even take a different set of summands for every n. The essential assumptions are that the summands are independent, that each summand contributes only a little to the sum, and that the sum is centered and has variance 1.

Definition 3. For every n\in\mathbb N, let k _ n\in\mathbb N, let X _ {n,1},\dots,X _ {n,k _ n} be real random variables and let S _ n=X _ {n,1}+\dots+X _ {n,k _ n}.

(X _ {n,l})=(X _ {n,l},\, 1\leqslant l\leqslant k _ n,\, n\in\mathbb N) is called an array of random variables. The array is called

  • independent if for every n, (X _ {n,l}) _ {1\leqslant l\leqslant k _ n} is independent,
  • centered if X _ {n,l}\in L^1 and \mathbb E(X _ {n,l})=0 for all n,l, and
  • normed if X _ {n,l}\in L^2 and \sum _ {l=1}^{k _ n}\operatorname{Var}(X _ {n,l})=1 for all n.

A centered array is called a null array if its individual components are asymptotically negligible in the sense that \max _ l\mathbb P(|X _ {n,l}|>\varepsilon)\to0 (n\to\infty) for all \varepsilon>0.


Definition 4. A centered array of random variables (X _ {n,l}) with X _ {n,l}\in L^2 for every n\in\mathbb N and 1\leqslant l\leqslant k _ n is said to satisfy the Lindeberg condition if, for all \varepsilon > 0,
\[
L _ n(\varepsilon):=\frac{1}{\operatorname{Var}(S _ n)}\sum _ {l=1}^{k _ n}\mathbb E[X _ {n,l}^2\mathbb I(X _ {n,l}^2>\varepsilon^2 \operatorname{Var}(S _ n))]\to0,\quad n\to\infty,
\]
and is said to satisfy the Lyapunov condition if there exists a \delta>0 such that
\[
\frac{1}{\operatorname{Var}(S _ n)^{1+\delta/2}}\sum _ {l=1}^{k _ n}\mathbb E(|X _ {n,l}|^{2+\delta})\to0,\quad n\to\infty.
\]

Proposition 5. The Lyapunov condition implies the Lindeberg condition.

The proof is short. We have X^2\mathbb I\big[|X|>\varepsilon\sqrt{\operatorname{Var}(S _ n)}\big]\leqslant \big(\varepsilon\sqrt{\operatorname{Var}(S _ n)}\big)^{-\delta}|X|^{2+\delta}\mathbb I\big[|X|>\varepsilon\sqrt{\operatorname{Var}(S _ n)}\big] \leqslant \big(\varepsilon\sqrt{\operatorname{Var}(S _ n)}\big)^{-\delta}|X|^{2+\delta}. Hence
\[
L _ n(\varepsilon)\leqslant\varepsilon^{-\delta}\frac{1}{\operatorname{Var}(S _ n)^{1+\delta/2}}\sum _ {l=1}^{k _ n}\mathbb E(|X _ {n,l}|^{2+\delta}).
\]

The right-hand side tends to 0 by the Lyapunov condition, which proves the proposition.

Example. Let \{Y _ n\} _ {n\in\mathbb N} be i.i.d. with mean 0 and variance 1. Let k _ n=n and X _ {n,l}=\frac{Y _ l}{\sqrt n}. Then (X _ {n,l}) is independent, centered and normed. Clearly \mathbb P(|X _ {n,l}|>\varepsilon) =\mathbb P(|Y _ 1|>\varepsilon\sqrt n)\to 0 (n\to\infty), so (X _ {n,l}) is a null array. Furthermore, L _ n(\varepsilon)=\mathbb E[Y _ 1^2\mathbb I(|Y _ 1|>\varepsilon\sqrt{n})]\to0, so (X _ {n,l}) satisfies the Lindeberg condition. If Y _ 1\in L^{2+\delta} for some \delta>0, then \sum _ {l=1}^{n}\mathbb E(|X _ {n,l}|^{2+\delta})=n^{-\delta/2}\mathbb E(|Y _ 1|^{2+\delta})\to0. In this case, (X _ {n,l}) also satisfies the Lyapunov condition.
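A hedged numerical check of this example (assuming NumPy; Exp(1)-1 is just one convenient choice of Y with mean 0 and variance 1): the sketch below estimates L _ n(\varepsilon)=\mathbb E[Y _ 1^2\mathbb I(|Y _ 1|>\varepsilon\sqrt n)] by Monte Carlo and shows it decreasing to 0 as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(1.0, size=1_000_000) - 1.0   # Y has mean 0 and variance 1

def lindeberg_term(y, n, eps):
    """Monte Carlo estimate of L_n(eps) = E[Y^2 1{|Y| > eps*sqrt(n)}] for the i.i.d. array."""
    cutoff = eps * np.sqrt(n)
    return np.mean(np.where(np.abs(y) > cutoff, y ** 2, 0.0))

for n in (10, 100, 1_000, 10_000):
    print(n, lindeberg_term(y, n, eps=0.1))
```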

Theorem 6 (Lindeberg–Feller CLT). Let (X _ {n,l}) be an independent, centered and normed array of real random variables. For every n\in\mathbb N, let S _ n=X _ {n,1}+\dots+X _ {n,k _ n}. Then the following are equivalent.
  1. The Lindeberg condition holds.
  2. (X _ {n,l}) is a null array and \mathbb P _ {S _ n}\stackrel{w}{\to}N(0,1).

The implication (1)\Longrightarrow(2) is due to Lindeberg (1922); the converse implication (2)\Longrightarrow(1) is attributed to Feller (1935 and 1937). The proof is not given here.
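Although the proof is omitted, the statement is easy to probe numerically. The sketch below (assuming NumPy; the unequal scales and uniform summands are arbitrary choices) builds an independent, centered and normed array whose rows are not identically distributed, checks that the largest individual variance vanishes (so the bounded summands satisfy the Lindeberg condition), and compares the law of S _ n with N(0,1).

```python
import numpy as np

rng = np.random.default_rng(3)

def row_sum_samples(n, n_trials=50_000):
    """Sample S_n = sum_l X_{n,l} with X_{n,l} = c_{n,l} * U_l, Var(U_l) = 1, sum_l c_{n,l}^2 = 1."""
    weights = np.arange(1, n + 1, dtype=float)            # unequal scales across the row
    c = np.sqrt(weights / weights.sum())                  # normed: the variances sum to 1
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_trials, n))  # mean 0, variance 1
    return (c * u).sum(axis=1), c.max() ** 2

s, max_var = row_sum_samples(n=200)
print("largest single variance:", round(max_var, 5))       # small, so no summand dominates
print("mean ~ 0:", round(s.mean(), 3), " variance ~ 1:", round(s.var(), 3))
print("P(S_n <= 1) ~", round(np.mean(s <= 1.0), 3), " (Phi(1) = 0.8413)")
```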

We can apply this theorem to prove the three-series theorem, which is due to Kolmogorov.

Theorem 7 (Kolmogorov's three-series theorem). Let X _ 1,X _ 2,\dots be independent real random variables. Let K>0 and, for all n, let Y _ n be the truncated version of X _ n: Y _ n:=X _ n\mathbb I(|X _ n|\leqslant K).

The series \sum _ {n=1}^{\infty}X _ n converges almost everywhere if and only if each of the following three conditions holds:

(1) \sum _ {n=1}^{\infty}\mathbb P(|X _ n|>K)<\infty.

(2) \sum _ {n=1}^{\infty}\mathbb E(Y _ n) converges.

(3) \sum _ {n=1}^{\infty}\operatorname{Var}(Y _ n)<\infty.
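As a concrete illustration (not from the original text), take X _ n=\varepsilon _ n/n with independent signs \varepsilon _ n=\pm1 and K=1: series (1) and (2) are identically zero and (3) equals \sum 1/n^2<\infty, so the theorem predicts that \sum X _ n converges almost surely. The Python sketch below (assuming NumPy) evaluates the three series and shows a sample path of the partial sums settling down.

```python
import numpy as np

rng = np.random.default_rng(4)

# Example: X_n = eps_n / n with independent random signs eps_n = +/- 1, truncation level K = 1
N = 100_000
n = np.arange(1, N + 1)
x = rng.choice([-1.0, 1.0], size=N) / n

# The three series for K = 1: |X_n| <= 1 always, so Y_n = X_n and E(Y_n) = 0
print("sum of P(|X_n| > 1):", 0.0)
print("sum of E(Y_n):      ", 0.0)
print("sum of Var(Y_n):    ", round(np.sum(1.0 / n ** 2), 4))   # converges (about pi^2/6)

# Partial sums of one sample path stabilise, consistent with almost sure convergence
partial = np.cumsum(x)
print("partial sums at n = 10^3, 10^4, 10^5:",
      round(partial[999], 4), round(partial[9999], 4), round(partial[-1], 4))
```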

We first prove ''\Longleftarrow''. A lemma is needed.

Lemma 8. Let X _ 1,X _ 2,\dots be independent, square integrable random variables with zero mean. If \sum _ {i=1}^{\infty}\operatorname{Var}(X _ i)<\infty, then there exists a real random variable X with \sum _ {i=1}^{n}X _ i\to X almost everywhere.

A standard proof of this lemma is as follows. Recall Theorem thm513, which states that \{X _ n\} converges almost everywhere iff, for every \varepsilon>0, \mathbb P(\, \exists\, m,n\geqslant N,\, |X _ m-X _ n|>\varepsilon)\to0 (N\to\infty). Denote S _ n=X _ 1+\dots+X _ n. By Kolmogorov's maximal inequality,
\[
\mathbb P\Big(\max _ {N\leqslant m\leqslant M}|S _ m-S _ N|>\varepsilon/2\Big)\leqslant \frac{4}{\varepsilon^{2}}\operatorname{Var}\Big(\sum _ {i=N+1}^{M}X _ i\Big)\leqslant \frac{4}{\varepsilon^{2}}\sum _ {i=N+1}^{\infty}\operatorname{Var}(X _ i).
\]
Since |S _ m-S _ n|\leqslant|S _ m-S _ N|+|S _ n-S _ N| for m,n\geqslant N, letting M\to\infty gives
\[
\mathbb P\Big(\, \exists\, m,n\geqslant N,\, |S _ m-S _ n|>\varepsilon\Big)\leqslant\frac{4}{\varepsilon^{2}}\sum _ {i=N+1}^{\infty}\operatorname{Var}(X _ i)\to0,\quad N\to\infty.
\]
Thus we can conclude that S _ n converges almost everywhere to a real random variable.

One might hope for a simpler argument via Fatou's lemma: since S _ n^2\geqslant0, \int\liminf _ {n\to\infty}S _ n^2\, \mathrm d\mathbb P\leqslant\liminf _ {n\to\infty}\operatorname{Var}(S _ n)=\sum _ {i=1}^{\infty}\operatorname{Var}(X _ i)<\infty, so \liminf _ {n\to\infty}|S _ n|<\infty almost everywhere. However, this does not by itself give the convergence of S _ n, so it does not replace the argument above.

Now we return to the theorem.

Since (3) holds, the lemma shows that the series \sum _ {n=1}^{\infty}(Y _ n-\mathbb E(Y _ n)) converges almost everywhere. Since (2) holds, \sum _ {n=1}^{\infty}Y _ n converges almost everywhere. As in the proof of the SLLN, by (1) and the Borel–Cantelli lemma, for almost every \omega there exists an N(\omega) such that X _ n=Y _ n for all n\geqslant N(\omega). Hence for almost all \omega, \sum _ {n=1}^{\infty}X _ n=\sum _ {n=1}^{N-1}X _ n+\sum _ {n=N}^{\infty}Y _ n converges.

Then we prove ''\Longrightarrow''. Assume \sum _ {n=1}^{\infty}X _ n converges almost everywhere.

If (1) does not hold, then by the second Borel–Cantelli lemma, |X _ n|>K infinitely often almost everywhere; but the convergence of \sum _ {n=1}^{\infty}X _ n forces X _ n\to0, so |X _ n|>K can occur only finitely often almost everywhere. This contradiction shows that (1) holds.

If (3) does not hold, in order to produce a contradiction, let \sigma _ n^2:=\sum _ {i=1}^{n}\operatorname{Var}(Y _ i) and define an array (X _ {n,l};\, 1\leqslant l\leqslant n,\, n\in\mathbb N) by X _ {n,l}=(Y _ l-\mathbb E(Y _ l))\mathbin/\sigma _ n. This array is independent, centered and normed. The assumption that (3) does not hold means \sigma _ n^2\to\infty, so for every \varepsilon>0, when n is sufficiently large, 2K<\varepsilon\sigma _ n. Since |Y _ l|\leqslant K, we have |Y _ l-\mathbb E(Y _ l)|\leqslant 2K and hence |X _ {n,l}|<\varepsilon for all 1\leqslant l\leqslant n. Therefore L _ n(\varepsilon)=\sum _ {l=1}^{n}\mathbb E[X _ {n,l}^2\, \mathbb I(|X _ {n,l}|>\varepsilon)]=0 for large n, so the Lindeberg condition holds. By the Lindeberg–Feller theorem, S _ n:=X _ {n,1}+\dots+X _ {n,n}\stackrel{d}\to N(0,1). As in the proof of the ''if'' part, X _ n=Y _ n for all n\geqslant N(\omega), so the almost everywhere convergence of \sum _ {n=1}^{\infty}X _ n implies that \sum _ {n=1}^{\infty}Y _ n converges almost everywhere. Since \sigma _ n\to\infty, T _ n:=(Y _ 1+\dots+Y _ n)\mathbin/\sigma _ n\stackrel{d}\to0. By Slutsky's theorem, S _ n-T _ n\stackrel{d}\to N(0,1). On the other hand, S _ n-T _ n=-\sigma _ n^{-1}\sum _ {l=1}^{n}\mathbb E(Y _ l) is deterministic, and a deterministic sequence cannot converge in distribution to N(0,1). This contradiction shows that (3) holds.

Finally, (3) together with the lemma implies that \sum _ {n=1}^{\infty}(Y _ n-\mathbb E(Y _ n)) converges almost everywhere. Since \sum _ {n=1}^{\infty}Y _ n converges almost everywhere, the difference \sum _ {n=1}^{\infty}\mathbb E(Y _ n) converges, which is (2).
3. Multidimensional central limit theorem

In this section we use slightly more elaborate notation: x,\boldsymbol x,X denote a number, a vector and a matrix, respectively; \mathrm x, \mathbf x, \mathrm X denote a real random variable, a random vector and a random matrix, respectively. In other words, a bold letter represents a vector, a capital letter represents a matrix, and an upright letter represents a random object. Capital Greek letters, however, are upright but always denote constant matrices, not random matrices.

Definition 9. Let \Sigma be a (strictly) positive definite symmetric real d \times d matrix and let \boldsymbol\mu\in\mathbb R^d. A random vector \mathbf x = (\mathrm x _ 1,\dots,\mathrm x _ d)^\mathsf T is called d-dimensional normally distributed with expectation \boldsymbol\mu and covariance matrix \Sigma if \mathbf x has the density
\[
\frac{1}{\sqrt{(2\pi)^d}}\frac{1}{\sqrt{|\Sigma|}}\exp\Big(-\frac12(\boldsymbol x-\boldsymbol\mu)^\mathsf T\Sigma^{-1}(\boldsymbol x-\boldsymbol\mu)\Big)
\]
for \boldsymbol x\in\mathbb R^d. In this case, we write \mathbf x\sim N _ d(\boldsymbol \mu,\Sigma).

If \Sigma is only positive semidefinite, we define d-dimensional normal distribution on \mathbb R^d as the one with characteristic function \exp(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol \mu-\frac12\boldsymbol t^\mathsf T\Sigma\boldsymbol t).
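In practice one samples from N _ d(\boldsymbol\mu,\Sigma) by setting \mathbf x=\boldsymbol\mu+A\mathbf y with \mathbf y\sim N _ d(\boldsymbol0,I _ d) and any matrix A satisfying AA^\mathsf T=\Sigma. Below is a minimal Python sketch (assuming NumPy and a strictly positive definite \Sigma, and using the Cholesky factor as one convenient choice of A; the proof of Theorem 10 below instead uses the symmetric root \Sigma^{1/2}).

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_multivariate_normal(mu, sigma, size):
    """Draw samples of N_d(mu, Sigma) via x = mu + A y, where A A^T = Sigma and y ~ N_d(0, I)."""
    a = np.linalg.cholesky(sigma)             # requires Sigma to be strictly positive definite
    y = rng.standard_normal(size=(size, len(mu)))
    return mu + y @ a.T

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = sample_multivariate_normal(mu, sigma, size=200_000)
print("empirical mean:       ", x.mean(axis=0).round(3))
print("empirical covariance:\n", np.cov(x, rowvar=False).round(3))
```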

Theorem 10. If \mathbf x\sim N _ d(\boldsymbol\mu,\Sigma) where \Sigma\succ0, then
  1. \mathbb E(\mathrm x _ i)=\mu _ i for i=1,\dots, d.
  2. \operatorname{Cov}(\mathrm x _ i,\mathrm x _ j)=\Sigma _ {ij} for i,j=1,\dots,d.
  3. \boldsymbol\lambda^\mathsf T\mathbf x\sim N(\boldsymbol\lambda^\mathsf T\boldsymbol\mu, \boldsymbol\lambda^\mathsf T\Sigma\boldsymbol \lambda) for every \boldsymbol\lambda\in\mathbb R^d.
  4. \varphi(\boldsymbol t):=\mathbb E(\mathrm e^{\mathrm i\boldsymbol t^\mathsf T\mathbf x})=\exp(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol \mu-\frac12\boldsymbol t^\mathsf T\Sigma\boldsymbol t) for every \boldsymbol t\in\mathbb R^d.

Moreover,
\[
\mathbf x\sim N _ d(\boldsymbol \mu,\Sigma)\iff (\mathit{3})\iff (\mathit{{4}}).
\]

Proof. Let \mathbf y:=\Sigma^{-1/2}(\mathbf x-\boldsymbol\mu), so that \mathbf x=\Sigma^{1/2}\mathbf y+\boldsymbol\mu. By Theorem thm315, \mathbf y has density
\[
f _ {\mathbf y}(\boldsymbol y)=f _ {\mathbf x}(\Sigma^{1/2}\boldsymbol y+\boldsymbol\mu)\cdot|\Sigma^{1/2}|=\frac{1}{(2\pi)^{d/2}}\exp\Big(-\frac12\boldsymbol y^\mathsf T\boldsymbol y\Big),
\]
indicating that \mathbf y\sim N _ d(\boldsymbol0,I _ d). The density of \mathbf y factorizes into the product of the marginal densities of the \mathrm y _ i, each of which is the density of N(0,1). Thus \mathrm y _ i\sim N(0,1) and the \mathrm y _ i are independent.

Now we can calculate the characteristic function of \mathbf x. Let \boldsymbol\alpha:=\Sigma^{1/2}\boldsymbol t, so that \boldsymbol t^\mathsf T\Sigma^{1/2}\mathbf y=\boldsymbol\alpha^\mathsf T\mathbf y=\sum _ {j=1}^{d}\alpha _ j\mathrm y _ j. Then
\begin{align*}
\varphi _ {\mathbf x}(\boldsymbol t) & =\mathbb E[\mathrm e^{\mathrm i\boldsymbol t^\mathsf T\mathbf x}] =\mathbb E[\exp(\mathrm i\, \boldsymbol t^\mathsf T(\Sigma^{1/2}\mathbf y+\boldsymbol \mu))] =\exp(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol\mu)\prod _ {j=1}^{d}\mathbb E(\mathrm e^{\mathrm i\, \alpha _ j\mathrm y _ j}) \\
& =\exp(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol\mu)\prod _ {j=1}^{d}\mathrm e^{-\frac12\alpha _ j^2} =\exp\Big(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol \mu-\frac12\boldsymbol\alpha^\mathsf T\boldsymbol\alpha\Big) =\exp\Big(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol \mu-\frac12\boldsymbol t^\mathsf T\Sigma\boldsymbol t\Big).
\end{align*}
By Theorem thm714, we have proved: \mathbf x\sim N _ d(\boldsymbol \mu,\Sigma)\iff (\mathit{{4}}).

Now we prove (\mathit{3})\iff(\mathit{4}). By the uniqueness of characteristic functions, (3) is equivalent to \mathbb E(\mathrm e^{\mathrm it\boldsymbol \lambda^\mathsf T\mathbf x})=\exp(\mathrm i\, t\boldsymbol\lambda^\mathsf T\boldsymbol\mu-\frac12 t^2\boldsymbol\lambda^\mathsf T\Sigma\boldsymbol\lambda) for all t\in\mathbb R and \boldsymbol\lambda\in\mathbb R^d. As t and \boldsymbol\lambda are arbitrary, this is in turn equivalent to \mathbb E(\mathrm e^{\mathrm i\boldsymbol t^\mathsf T\mathbf x})=\exp(\mathrm i\, \boldsymbol t^\mathsf T\boldsymbol \mu-\frac12\boldsymbol t^\mathsf T\Sigma\boldsymbol t) for every \boldsymbol t\in\mathbb R^d, that is, to (4). In short, (\mathit{3})\iff(\mathit{4}).

(1) and (2). Simple computations give \mathbb E(\mathbf y)=\boldsymbol 0 and \operatorname{Cov}(\mathbf y)=I _ d. By linearity of expectation, \mathbb E(\mathbf x)=\Sigma^{1/2}\mathbb E(\mathbf y)+\boldsymbol\mu=\boldsymbol\mu. By the transformation rule for covariance matrices, \operatorname{Cov}(\mathbf x)=\Sigma^{1/2}\operatorname{Cov}(\mathbf y)\Sigma^{1/2}=\Sigma.

Theorem 11 (Cramér–Wold device). Let \mathbf x _ n=(\mathrm x _ {n1},\dots,\mathrm x _ {nd})^\mathsf T\in\mathbb R^d, n\in\mathbb N. Then the following are equivalent:
  1. There is a random vector \mathbf x such that \mathbf x _ n\stackrel{d}\to \mathbf x.
  2. For any \boldsymbol\lambda\in\mathbb R^d, there is a random variable \mathrm x^\lambda such that \boldsymbol\lambda^\mathsf T\mathbf x _ n\stackrel{d}\to\mathrm x^\lambda.

If (1) and (2) hold, then \mathrm x^\lambda\stackrel{d}=\boldsymbol\lambda^\mathsf T\mathbf x for all \boldsymbol\lambda\in\mathbb R^d.

Proof. Assume (1). For fixed s\in\mathbb R and \boldsymbol\lambda\in\mathbb R^d, the function f(\boldsymbol x)=\exp(\mathrm is\boldsymbol\lambda^\mathsf T\boldsymbol x) is continuous and bounded. Since \mathbf x _ n\stackrel{d}\to \mathbf x implies \mathbb E(f(\mathbf x _ n))\to\mathbb E(f(\mathbf x)) for all bounded continuous functions, we have \mathbb E[\exp(\mathrm is\boldsymbol \lambda^\mathsf T\mathbf x _ n)]\to\mathbb E[\exp(\mathrm is\boldsymbol \lambda^\mathsf T\mathbf x)] for every s. By Lévy's continuity theorem, (2) holds with \mathrm x^\lambda:=\boldsymbol\lambda^\mathsf T\mathbf x.

Now assume (2). Taking \boldsymbol\lambda to be the standard unit vectors, we see that every component \mathrm x _ {n,l} (1\leqslant l\leqslant d) of \mathbf x _ n converges weakly as n\to\infty, so \{\mathbb P _ {\mathrm x _ {n,l}}\} _ {n\in\mathbb N} is tight for every l. Hence, as in the proof of Lévy's continuity theorem, \{\mathbb P _ {\mathbf x _ n}\} _ {n\in\mathbb N} is tight, and every subsequence of \{\mathbb P _ {\mathbf x _ n}\} _ {n\in\mathbb N} has a further subsequence that converges weakly to a probability measure \mathbb P _ {\mathbf x _ \infty}. By (2) we have \int\exp(\mathrm i\boldsymbol\lambda^\mathsf T\boldsymbol x)\, \mathbb P _ {\mathbf x _ \infty}(\mathrm d\boldsymbol x)=\mathbb E(\mathrm e^{\mathrm i\mathrm x^\lambda}) for every \boldsymbol\lambda, so the characteristic function of \mathbb P _ {\mathbf x _ \infty} is determined and the limit \mathbb P _ {\mathbf x _ \infty} is unique. It then follows, again as in the proof of Lévy's continuity theorem, that \mathbb P _ {\mathbf x _ n}\stackrel{w}\to\mathbb P _ {\mathbf x _ {\infty}}. That is, (1) holds.

The additional assertion is easy to check. We have shown that the distribution of \mathrm x^\lambda has been uniquely determined and that \boldsymbol\lambda^\mathsf T\mathbf x is one possible choice. Thus \mathrm x^\lambda\stackrel{d}=\boldsymbol\lambda^\mathsf T\mathbf x.

Theorem 12 (Central limit theorem in \mathbb R^d). Let \mathbf x _ 1,\mathbf x _ 2,\dots be i.i.d. random vectors with mean \boldsymbol 0 and covariance matrix \Sigma. Let \mathbf s _ n^\ast:=(\mathbf x _ 1+\dots+\mathbf x _ n)/\sqrt{n}. Then
\[
\mathbb P _ {\mathbf s _ n^\ast}\stackrel{w}\to N _ d(\boldsymbol 0,\Sigma).
\]

With the Cramér–Wold device, this theorem follows easily. Let \boldsymbol\lambda\in\mathbb R^d. Define \mathrm x^\lambda _ n=\boldsymbol\lambda^\mathsf T\mathbf x _ n, \mathrm s _ n^\lambda=\boldsymbol \lambda^\mathsf T\mathbf s _ n^\ast and let \mathbf s _ \infty\sim N _ d(\boldsymbol 0,\Sigma). Then the \mathrm x^\lambda _ n are i.i.d. with mean 0 and variance \boldsymbol\lambda^\mathsf T\Sigma\boldsymbol\lambda. By the one-dimensional CLT, \mathbb P _ {\mathrm s _ n^\lambda}\stackrel{w}\to \mathbb P _ {N(0,\boldsymbol\lambda^\mathsf T\Sigma\boldsymbol\lambda)}=\mathbb P _ {\boldsymbol\lambda^\mathsf T\mathbf s _ \infty}. By the Cramér–Wold device, this yields the claim.
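A hedged simulation of Theorem 12 (assuming NumPy; the covariance matrix, the uniform building blocks and the direction \boldsymbol\lambda are arbitrary choices): the empirical covariance of \mathbf s _ n^\ast should be close to \Sigma, and the projection \boldsymbol\lambda^\mathsf T\mathbf s _ n^\ast/\sqrt{\boldsymbol\lambda^\mathsf T\Sigma\boldsymbol\lambda} should be close to N(0,1), as the Cramér–Wold argument suggests.

```python
import numpy as np

rng = np.random.default_rng(6)

d, n, n_trials = 2, 200, 20_000
sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
a = np.linalg.cholesky(sigma)

# i.i.d. centered random vectors x_i = A u_i with Cov(u_i) = I_d, hence Cov(x_i) = Sigma
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_trials, n, d))   # mean 0, Cov = I_d
x = u @ a.T
s_star = x.sum(axis=1) / np.sqrt(n)

print("empirical Cov(s_n*):\n", np.cov(s_star, rowvar=False).round(3))   # close to Sigma

lam = np.array([2.0, -1.0])
proj = s_star @ lam / np.sqrt(lam @ sigma @ lam)     # approximately N(0, 1)
print("P(projection <= 1) ~", round(np.mean(proj <= 1.0), 3), " (Phi(1) = 0.8413)")
```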

