Characteristic functions

You can read the LaTeX document online (for the latest updated chapters) from the link: probability.pdf

Chapter 7: Characteristic functions

Our next goal is to establish the central limit theorem (CLT). The natural tools for treating the CLT are characteristic functions, that is, Fourier transforms of probability measures. There is a great deal of material on characteristic functions; in this chapter we discuss them concisely. The proof of the CLT needs Lévy's continuity theorem, which is presented here in some generality, so the preparation for it may seem long. If your only goal is to prove the CLT, you can skip much of the material on characteristic functions (ch.f.) and go straight to the second part of the continuity theorem.

Contents
 1.  Separating families
 2.  Lévy's continuity theorem
 3.  Some more properties

1. Separating families

We first introduce some notions in a general setting. Let (E,d) be a metric space with Borel \sigma-algebra \mathcal E=\mathcal B(E). It is not difficult to see that a function f: E\to\mathbb C is measurable iff \operatorname{Re}(f) and \operatorname{Im}(f) are measurable. The integral of a complex function is defined by \int f\, \mathrm d\mu:=\int\operatorname{Re}(f)\, \mathrm d\mu+\mathrm i\int\operatorname{Im}(f)\, \mathrm d\mu, provided both integrals exist and are finite.

Definition 1. Let \mathcal F be a family of finite measures. A family \mathcal C of measurable functions is called a separating family for \mathcal F if, for any two measures \mu,\nu\in\mathcal F, the following holds:
\[
\Big(\text{for all integrable }f\in\mathcal C, \, \int f\, \mathrm d\mu=\int f\, \mathrm d\nu\Big)\implies \mu=\nu.
\]

Equivalently, for any \mu,\nu\in\mathcal F with \mu\neq\nu, there exists an integrable f\in\mathcal C such that \int f\, \mathrm d\mu\neq\int f\, \mathrm d\nu.

In other words, \mathcal C characterises each measure \mu in \mathcal F by the integrals \big\{\int f\, \mathrm d\mu\mid f\in\mathcal C\big\}.

We now need the Stone–Weierstrass theorem; see https://gaomj.cn/pma7/#sec:7. For convenience we restate it here.

Theorem 2. Let E be a compact metric space. Let \mathbb K=\mathbb R or \mathbb K=\mathbb C. Let \mathcal C be an algebra of continuous functions on E that separates points and vanishes at no point of E. (If \mathbb K=\mathbb C, then in addition assume that \mathcal C is closed under complex conjugation). Then \mathcal C is dense in C(E) with respect to the supremum norm.

Corollary 3. Let E be a compact metric space. Let \mathbb K=\mathbb R or \mathbb K=\mathbb C. Let \mathcal C be a family of continuous functions E\to\mathbb K that separates points, is closed under multiplication and contains 1. If \mathbb K=\mathbb C, then in addition assume that \mathcal C is closed under complex conjugation. Then \mathcal C is a separating family for \mathcal M _ f(E) (finite measures on (E,\mathcal E)).

We now prove Corollary 3. Let \mu _ 1,\mu _ 2\in\mathcal M _ f(E) with \int g\, \mathrm d\mu _ 1=\int g\, \mathrm d\mu _ 2 for all g\in\mathcal C. Let \mathcal C' be the algebra of finite linear combinations of elements of \mathcal C. We need to show \mu _ 1=\mu _ 2, and this holds as soon as \int f\, \mathrm d\mu _ 1=\int f\, \mathrm d\mu _ 2 for every continuous function f. To see this, note that to show \mu _ 1=\mu _ 2 it is enough to show that \mu _ 1(C)=\mu _ 2(C) for all closed sets C\subseteq E (the closed sets form a \pi-system that contains E and generates \mathcal E), that is, \int\chi _ C\, \mathrm d\mu _ 1=\int\chi _ C\, \mathrm d\mu _ 2. The problem is that the indicator function is not continuous. However, for \varepsilon>0 we can construct continuous functions \rho _ \varepsilon:E\to[0,1] such that \rho _ {\varepsilon}(x)=1 if x\in C and \rho _ \varepsilon(x)= 0 if d(x,C)\geqslant\varepsilon, for example \rho _ \varepsilon(x)=\max\{0,1-d(x,C)/\varepsilon\}. Since C is closed, \rho _ \varepsilon\to\chi _ C pointwise as \varepsilon\to0. By the dominated convergence theorem, \mu _ i(C)=\lim _ {\varepsilon\to0}\int\rho _ \varepsilon\, \mathrm d\mu _ i, so we can conclude \mu _ 1(C)=\mu _ 2(C).

Hence, to prove Corollary 3 it suffices to prove that \int f\, \mathrm d\mu _ 1=\int f\, \mathrm d\mu _ 2 for every continuous function f. By linearity of the integral, \int g\, \mathrm d\mu _ 1=\int g\, \mathrm d\mu _ 2 for all g\in \mathcal C'. Now let f be continuous and \varepsilon>0. Since \mathcal C' satisfies the hypotheses of the Stone–Weierstrass theorem, there exists a g\in\mathcal C' such that |f-g| _ \infty<\varepsilon, so
\begin{align*}
\Big|\int f\, \mathrm d\mu _ 1-\int f\, \mathrm d\mu _ 2\Big| & \leqslant\Big|\int f\, \mathrm d\mu _ 1-\int g\, \mathrm d\mu _ 1\Big| +\Big|\int g\, \mathrm d\mu _ 1-\int g\, \mathrm d\mu _ 2\Big|+\Big|\int g\, \mathrm d\mu _ 2-\int f\, \mathrm d\mu _ 2\Big| \\
& \leqslant\varepsilon(\mu _ 1(E)+\mu _ 2(E)).
\end{align*}
Letting \varepsilon\to0, we get \int f\, \mathrm d\mu _ 1=\int f\, \mathrm d\mu _ 2 and therefore the corollary is proved.
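
The cutoff functions \rho _ \varepsilon used in this proof can be tried out numerically. The following is only a sketch under assumed data: the measure is a hypothetical finite sum of point masses on [0,1], the closed set is C=[0.2,0.5], and \rho _ \varepsilon(x)=\max\{0,1-d(x,C)/\varepsilon\} is one possible choice of cutoff, not the only one.

```python
# Hypothetical finite measure on [0, 1]: point masses at `atoms` with `weights`.
atoms = [0.0, 0.2, 0.25, 0.5, 0.9]
weights = [0.1, 0.3, 0.2, 0.3, 0.1]

def dist_to_C(x, a=0.2, b=0.5):
    """Distance from x to the closed interval C = [a, b]."""
    return max(a - x, 0.0, x - b)

def rho(x, eps):
    """Continuous cutoff: 1 on C, 0 where d(x, C) >= eps, linear in between."""
    return max(0.0, 1.0 - dist_to_C(x) / eps)

def integrate(f):
    """Integral of f against the discrete measure."""
    return sum(w * f(x) for x, w in zip(atoms, weights))

# mu(C) computed directly, and the approximating integrals of rho_eps.
mu_C = sum(w for x, w in zip(atoms, weights) if 0.2 <= x <= 0.5)
approximations = [integrate(lambda x, e=eps: rho(x, e)) for eps in (0.5, 0.1, 0.01)]
```

As \varepsilon decreases, the entries of `approximations` decrease towards `mu_C`, in line with the dominated convergence step.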

Corollary 3 has several useful consequences, which we now collect.

Theorem 4. The distribution of a bounded real random variable X is characterized by its moments.

This theorem follows from Corollary 3. Suppose X takes values in [m,M] with m<M. For n\in\mathbb N, define functions f _ n:[m,M]\to[0,1] by f _ n(x)=[(x-m)\mathbin/(M-m)]^n. Since f _ 0=1, f _ 1 is injective (so the family separates points), and \{f _ n\} _ {n\in\mathbb N} is closed under multiplication, it is a separating family for \mathcal M _ f([m,M]). Thus \mathbb P _ X (from now on we use this notation for the distribution of X) is uniquely determined by the integrals \int f _ n\, \mathbb P _ X(\mathrm dx), and hence, expanding the powers, by \int x^n\, \mathbb P _ X(\mathrm dx)=\mathbb E(X^n) for n\in\mathbb N.

Theorem 5 (Laplace transform). A finite measure \mu on [0,\infty) is characterized by its Laplace transform
\[
\mathcal L _ \mu(\lambda):=\int\mathrm e^{-\lambda x}\, \mu(\mathrm dx),\quad \lambda\geqslant0.
\]

We cannot prove this by applying the Stone–Weierstrass theorem directly, since [0,\infty) is not compact. In general topology there is a device called ''compactification'': the process (or result) of making a topological space into a compact space. Here we use the Alexandroff one-point compactification, so that we can work on the compact space [0,\infty]. For \lambda\geqslant0, define the continuous function f _ \lambda:[0,\infty]\to[0,1] by f _ \lambda(x)=\mathrm e^{-\lambda x} if x<\infty and f _ \lambda(\infty)=\lim _ {x\to\infty}\mathrm e^{-\lambda x}. Then \mathcal{C}:=\{f _ \lambda\mid \lambda\geqslant0\} fulfills the conditions of Corollary 3: 1=f _ 0\in\mathcal C, f _ \kappa\cdot f _ \lambda=f _ {\kappa+\lambda}\in\mathcal C, and f _ 1 is injective on [0,\infty], so \mathcal C separates points. Thus \mathcal C is a separating family for \mathcal M _ f([0,\infty]), and hence also for \mathcal M _ f([0,\infty)).

Definition 6. For \mu\in\mathcal M _ f(\mathbb R^d), define \varphi _ \mu:\mathbb R^d\to\mathbb C by
\[
\varphi _ \mu(\boldsymbol t)=\int\mathrm e^{\mathrm i\langle \boldsymbol t,\boldsymbol x\rangle}\, \mu(\mathrm d\boldsymbol x).
\]
The function \varphi _ \mu is called the characteristic function of \mu.
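
For a measure that is a finite sum of point masses, the integral in Definition 6 becomes a finite sum, which makes the definition easy to experiment with. The sketch below assumes such a discrete measure; the function name `char_fn` and the coin example are illustrative choices, not part of the text.

```python
import cmath
import math

def char_fn(atoms, weights, t):
    """phi_mu(t) = sum_k w_k * exp(i <t, x_k>) for mu = sum_k w_k * delta_{x_k}.

    `atoms` is a list of points in R^d (tuples), `t` is a point in R^d.
    """
    total = 0j
    for x, w in zip(atoms, weights):
        inner = sum(ti * xi for ti, xi in zip(t, x))  # inner product <t, x>
        total += w * cmath.exp(1j * inner)
    return total

# Hypothetical example: a fair coin on {-1, +1} in dimension d = 1.
coin_atoms = [(-1.0,), (1.0,)]
coin_weights = [0.5, 0.5]
phi_at_1 = char_fn(coin_atoms, coin_weights, (1.0,))  # equals cos(1) up to rounding
```

Note that `char_fn(atoms, weights, (0.0,))` returns the total mass of the measure, which is 1 for a probability measure.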

Theorem 7 (Characteristic function). A finite measure \mu\in\mathcal M _ f(\mathbb R^d) is characterized by its characteristic function.

Proof. Let \mu _ 1,\mu _ 2\in\mathcal M _ f(\mathbb R^d) with \varphi _ {\mu _ 1}(\boldsymbol t)=\varphi _ {\mu _ 2}(\boldsymbol t) for all \boldsymbol t\in\mathbb R^d. As in the proof of Corollary 3, the family of cutoff functions \rho _ {C,\varepsilon}, where C ranges over the closed sets and \varepsilon>0, is separating for \mathcal M _ f(\mathbb R^d). We assert that the subfamily \rho _ {K,\varepsilon}, with K compact, is also separating for \mathcal M _ f(\mathbb R^d): indeed, \mu _ 1=\mu _ 2 follows from \mu _ 1(K)=\mu _ 2(K) for all compact sets K, because \mu _ i(A)=\sup\, \{\mu _ i(K)\mid K\subseteq A\text{ is compact}\} for any A\in\mathcal B(\mathbb R^d).

Each \rho _ {K,\varepsilon} is continuous with compact support. Hence it is enough to show that \int f\, \mathrm d\mu _ 1=\int f\, \mathrm d\mu _ 2 for all f\in C _ K(\mathbb R^d) (the class of continuous functions with compact support).

Let f\in C _ K(\mathbb R^d) and \varepsilon>0, and let M be large enough that the support of f is contained in (-M/2,M/2)^d and \mu _ i\big(\mathbb R^d\setminus(-M,M)^d\big)<\varepsilon for i=1,2. For \boldsymbol m\in\mathbb Z^d we define
\[
g _ {\boldsymbol m}:\boldsymbol x\in\mathbb R^d\mapsto\exp\Big(\mathrm i\cdot\frac\pi M\langle\boldsymbol m,\boldsymbol x\rangle\Big).
\]
Then, by the assumption of the theorem, \int g _ {\boldsymbol m}\, \mathrm d\mu _ 1=\int g _ {\boldsymbol m}\, \mathrm d\mu _ 2, since these integrals are the values of the two characteristic functions at \pi\boldsymbol m/M. Moreover, g _ {\boldsymbol m}(\boldsymbol x)=g _ {\boldsymbol m}(\boldsymbol x+2M\boldsymbol n) for all \boldsymbol x\in\mathbb R^d and \boldsymbol n\in\mathbb Z^d.

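Let \mathcal C be the algebra of finite linear combinations of the g _ {\boldsymbol m}, and \tilde{\mathcal C}:=\{g| _ {[-M,M]^d}\mathrel: g\in\mathcal C\}. Every g\in\mathcal C is 2M-periodic in each coordinate, and f vanishes near the boundary of [-M,M]^d, so both may be regarded as continuous functions on the compact torus obtained from [-M,M]^d by identifying opposite faces. On this torus \tilde{\mathcal C} contains 1=g _ {\boldsymbol 0}, is closed under complex conjugation, and separates points, so it fulfills the conditions of the Stone–Weierstrass theorem. Hence there is a g\in\mathcal C such that \big|g| _ {[-M,M]^d}-f\big| _ \infty<\varepsilon. Now |(f-g)\chi _ {[-M,M]^d}| _ \infty<\varepsilon, and, since f=0 outside [-M,M]^d and g is periodic, |(f-g)\chi _ {\mathbb R^d\setminus[-M,M]^d}| _ \infty\leqslant|g| _ \infty=\big|g| _ {[-M,M]^d}\big| _ \infty\leqslant|f| _ \infty+\varepsilon.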

Since \int g\, \mathrm d\mu _ 1=\int g\, \mathrm d\mu _ 2,
\begin{align*}
& \Big|\int f\, \mathrm d\mu _ 1-\int f\, \mathrm d\mu _ 2\Big| =\Big|\int (f-g)\, \mathrm d\mu _ 1-\int (f-g)\, \mathrm d\mu _ 2\Big| \\
\leqslant{} & \int|f-g|\, \mathrm d\mu _ 1+\int|f-g|\, \mathrm d\mu _ 2\leqslant\varepsilon(\mu _ 1(\mathbb R^d)+\mu _ 2(\mathbb R^d)) +2\varepsilon(|f| _ \infty+\varepsilon).
\end{align*}
Letting \varepsilon\to0, we can see the integrals coincide. The proof is now completed.

Normal distribution
We end this section with the example of the normal distribution, since the CLT is a theorem about this important distribution. Other distributions are not covered here; we refer to other texts for them.

Proposition 8. For the normal distribution N(\mu,\sigma^2) with density x\mapsto\frac1{\sqrt{2\pi\sigma^2}}\exp(-\frac{(x-\mu)^2}{2\sigma^2}), the characteristic function is
\[
\varphi(t)=\exp\Big(\mathrm i\mu t-\frac12\sigma^2t^2\Big).
\]

Characteristic functions have the simple property \varphi _ {aX+b}(t)=\varphi _ X(at)\cdot \mathrm e^{\mathrm ibt}, so it is enough to consider the case \mu=0 and \sigma^2=1. Differentiating the characteristic function we get
\begin{align*}
\frac{\mathrm d}{\mathrm dt}\varphi(t) & =\frac1{\sqrt{2\pi}}\int _ {-\infty}^{+\infty}\mathrm e^{\mathrm itx}\, \mathrm ix\cdot\mathrm e^{-{x^2}/{2}}\, \mathrm dx=-\frac1{\sqrt{2\pi}}\int _ {-\infty}^{+\infty}\mathrm e^{\mathrm itx}\, \mathrm i\, \mathrm d\big(\mathrm e^{-{x^2}/{2}}\big) \\
& =\frac1{\sqrt{2\pi}}\int _ {-\infty}^{+\infty}\mathrm e^{-{x^2}/{2}}\, \mathrm i\, \mathrm d\big(\mathrm e^{\mathrm itx}\big) =-\frac1{\sqrt{2\pi}}\int _ {-\infty}^{+\infty}\mathrm e^{-{x^2}/{2}}t\, \mathrm e^{\mathrm itx}\, \mathrm dx=-t\varphi(t).
\end{align*}
Note that we interchanged differentiation and integration (justified by dominated convergence, since |x|\mathrm e^{-x^2/2} is integrable) and integrated by parts; the boundary terms vanish.

This linear differential equation with initial value \varphi(0)=1 has the unique solution \varphi(t)=\mathrm e^{-t^2/2}. The general case follows from the scaling property above with a=\sigma and b=\mu.
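
The closed form in Proposition 8 can be compared with a direct numerical evaluation of the defining integral. The sketch below uses a plain trapezoidal rule on a truncated interval; the truncation width, the number of steps and the function names are our own choices for illustration.

```python
import cmath
import math

def normal_char_fn_numeric(t, mu=0.0, sigma=1.0, half_width=12.0, steps=20000):
    """Trapezoidal approximation of the integral of exp(i t x) against the
    N(mu, sigma^2) density over [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    norm = math.sqrt(2 * math.pi * sigma ** 2)
    total = 0j
    for k in range(steps + 1):
        x = a + k * h
        density = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / norm
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * density * cmath.exp(1j * t * x)
    return total * h

def normal_char_fn_closed(t, mu=0.0, sigma=1.0):
    """Closed form from Proposition 8: exp(i mu t - sigma^2 t^2 / 2)."""
    return cmath.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)
```

For instance, `normal_char_fn_numeric(0.7, 1.5, 2.0)` and `normal_char_fn_closed(0.7, 1.5, 2.0)` should agree to many decimal places.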

2. Lévy's continuity theorem

The main statement of this section is Lévy's continuity theorem. We first review the definition of equicontinuity and the Arzelà–Ascoli theorem. We need Definition 13 and Theorem 16 from https://gaomj.cn/pma7/#sec:6; for convenience they are restated here. (These are needed only for the first statement of Lévy's continuity theorem, so if you are interested only in the second statement, you can go directly to the theorem.)

Definition 9. A family \mathcal F of complex functions f defined on a set E in a metric space X is said to be equicontinuous on E if for every \varepsilon>0 there exists a \delta>0 such that |f(x)-f(y)|<\varepsilon whenever d(x,y)<\delta, x\in E, y\in E, and f\in \mathcal F. Here d denotes the metric of X.

Theorem 10. Suppose K is compact. If f _ n\in C(K) for n=1,2,\dots and if \{f _ n\} _ {n\in\mathbb N} is pointwise bounded and equicontinuous on K, then:
  • \{f _ n\} _ {n\in\mathbb N} is uniformly bounded on K;
  • \{f _ n\} _ {n\in\mathbb N} contains a uniformly convergent subsequence.

With this theorem, we have the following corollary:

Corollary 11. Let (E, d) be a metric space and let f,f _ 1,f _ 2,\dots be real-valued functions on E with f _ n\to f pointwise. If \{f _ n\} _ {n\in\mathbb N} is equicontinuous, then f is uniformly continuous, and on compact sets \{f _ n\} _ {n\in\mathbb N} converges to f uniformly.

The uniform continuity of f is not hard to see, so we only prove the uniform convergence. For any compact set K in E, consider the space (C(K),|\cdot| _ \infty). Since f _ n\to f pointwise, \{f _ n\} _ {n\in\mathbb N} is pointwise bounded, so by Theorem 10 every subsequence of \{f _ n\} _ {n\in\mathbb N} contains a uniformly convergent, i.e. |\cdot| _ \infty-convergent, sub-subsequence. This means that the closure of \{f _ n\} _ {n\in\mathbb N} is a sequentially compact subset of (C(K),|\cdot| _ \infty). Because f _ n\to f pointwise, any |\cdot| _ \infty-convergent subsequence must converge to f. If \{f _ n\} _ {n\in\mathbb N} did not |\cdot| _ \infty-converge to f, there would be an \varepsilon _ 0>0 and a subsequence with |f _ {n _ k}-f| _ \infty>\varepsilon _ 0 for all k, so no subsequence of \{f _ {n _ k}\} could |\cdot| _ \infty-converge to f, which is a contradiction.

Theorem 12. If \mathcal F is a tight family of probability measures on \mathbb R^d, then \{\varphi _ \mu\mid \mu\in\mathcal F\} is equicontinuous. In particular, every characteristic function is uniformly continuous.

We need a lemma for its proof:
Let \mu be a probability measure on \mathbb R^d with characteristic function \varphi. Then
\[
|\varphi(\boldsymbol t)-\varphi(\boldsymbol s)|^2\leqslant2\big(1-\operatorname{Re}(\varphi(\boldsymbol t-\boldsymbol s))\big),\quad \forall \boldsymbol t,\boldsymbol s\in\mathbb R^d.
\]
This lemma can be proved by the Cauchy–Schwarz inequality:
\begin{align*}
|\varphi(\boldsymbol t)-\varphi(\boldsymbol s)|^2 & =\Big|\int\mathrm e^{\mathrm i\langle \boldsymbol t,\boldsymbol x\rangle}-\mathrm e^{\mathrm i\langle \boldsymbol s,\boldsymbol x\rangle}\, \mu(\mathrm d\boldsymbol x)\Big|^2=\Big|\int(\mathrm e^{\mathrm i\langle \boldsymbol t-\boldsymbol s,\boldsymbol x\rangle}-1)\mathrm e^{\mathrm i\langle \boldsymbol s,\boldsymbol x\rangle}\, \mu(\mathrm d\boldsymbol x)\Big|^2 \\
& \leqslant\int|\mathrm e^{\mathrm i\langle \boldsymbol t-\boldsymbol s,\boldsymbol x\rangle}-1|^2\, \mu(\mathrm d\boldsymbol x)\cdot \int|\mathrm e^{\mathrm i\langle \boldsymbol s,\boldsymbol x\rangle}|^2\, \mu(\mathrm d\boldsymbol x) \\
& =\int(\mathrm e^{\mathrm i\langle \boldsymbol t-\boldsymbol s,\boldsymbol x\rangle}-1)(\mathrm e^{-\mathrm i\langle \boldsymbol t-\boldsymbol s,\boldsymbol x\rangle}-1)\, \mu(\mathrm d\boldsymbol x) \\
& =2\big(1-\operatorname{Re}(\varphi(\boldsymbol t-\boldsymbol s))\big).
\end{align*}
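
As a quick sanity check of the lemma in dimension one, the sketch below draws a hypothetical discrete probability measure at random and evaluates the difference between the right-hand and left-hand sides at many random pairs (t,s). The measure, the seed and the sampling ranges are assumptions made only for this illustration.

```python
import cmath
import random

random.seed(0)
# Hypothetical probability measure on R: 6 random atoms with normalised weights.
atoms = [random.uniform(-3, 3) for _ in range(6)]
raw = [random.random() for _ in range(6)]
weights = [r / sum(raw) for r in raw]

def phi(t):
    """Characteristic function of the discrete measure."""
    return sum(w * cmath.exp(1j * t * x) for x, w in zip(atoms, weights))

def lemma_gap(t, s):
    """Right side minus left side of the lemma; should be >= 0."""
    return 2 * (1 - phi(t - s).real) - abs(phi(t) - phi(s)) ** 2

gaps = [lemma_gap(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
```

Up to rounding error, every entry of `gaps` is non-negative, as the lemma predicts.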

Now we return to the theorem. We have to show that for every \varepsilon>0, there exists a \delta>0 such that for all \boldsymbol t,\boldsymbol s\in\mathbb R^d with |\boldsymbol t-\boldsymbol s|<\delta and all \mu\in\mathcal F, we have |\varphi _ \mu(\boldsymbol t)-\varphi _ \mu(\boldsymbol s)|<\varepsilon.

As \mathcal F is tight, there exists an N\in\mathbb N with \mu([-N,N]^d)>1-\varepsilon for all \mu\in\mathcal F. Furthermore, there exists a \delta>0 such that for \boldsymbol x\in [-N,N]^d and |\boldsymbol u|<\delta, we have |1-\mathrm e^{\mathrm i\langle \boldsymbol u,\boldsymbol x\rangle }|<\varepsilon. Hence for all \mu\in\mathcal F,
\[
1-\operatorname{Re}(\varphi _ \mu(\boldsymbol u))\leqslant\int |1-\mathrm e^{\mathrm i\langle\boldsymbol u,\boldsymbol x\rangle}|\, \mu(\mathrm d\boldsymbol x)\leqslant2\varepsilon+\int _ {[-N,N]^d}|1-\mathrm e^{\mathrm i\langle\boldsymbol u,\boldsymbol x\rangle}|\, \mu(\mathrm d\boldsymbol x)\leqslant3\varepsilon.
\]
Thus for |\boldsymbol t-\boldsymbol s|<\delta, by the lemma, |\varphi _ \mu(\boldsymbol t)-\varphi _ \mu(\boldsymbol s)|\leqslant\sqrt{6\varepsilon}. The conclusion follows.

Now we are ready for the main theorem of this section.

Theorem 13 (Lévy's continuity theorem). Let P _ 1,P _ 2,\dots be probability measures on \mathbb R^d with characteristic functions \varphi _ 1,\varphi _ 2,\dots.
  1. If P _ n\stackrel{w}{\to}P, where P is a p.m. with ch.f. \varphi, then \varphi _ n\to\varphi uniformly on compact sets.
  2. If \varphi _ n\to\varphi pointwise for some \varphi:\mathbb R^d\to\mathbb C that is continuous at \boldsymbol 0 along each axis, then P _ n converges weakly to a probability measure P with characteristic function equal to \varphi.
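
A simple instance of statement 1 can be checked by hand or numerically. In the sketch below we take P _ n=N(0,1+1/n), which converges weakly to N(0,1); this choice of example, the compact set [-5,5] and the grid size are ours. The characteristic functions are real here, so ordinary floats suffice.

```python
import math

def phi_n(t, n):
    """Characteristic function of N(0, 1 + 1/n)."""
    return math.exp(-0.5 * (1 + 1 / n) * t * t)

def phi_limit(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-0.5 * t * t)

def sup_distance(n, radius=5.0, grid=2001):
    """Maximum of |phi_n - phi| over an evenly spaced grid on [-radius, radius]."""
    pts = [-radius + 2 * radius * k / (grid - 1) for k in range(grid)]
    return max(abs(phi_n(t, n) - phi_limit(t)) for t in pts)

distances = [sup_distance(n) for n in (1, 10, 100, 1000)]
```

The entries of `distances` shrink roughly like 1/n, which illustrates uniform convergence on the compact set [-5,5].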

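Proof. (1) Since \boldsymbol x\mapsto\exp(\mathrm i\langle\boldsymbol t,\boldsymbol x\rangle) is bounded and continuous, the definition of weak convergence gives \varphi _ n\to\varphi pointwise. A weakly convergent sequence of probability measures is tight, so by Theorem 12 the family \{\varphi _ n\} _ {n\in\mathbb N} is equicontinuous. By Corollary 11, applied to the real and imaginary parts, this implies uniform convergence on compact sets.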

(2) We first show that it suffices to prove that the sequence \{P _ n\} _ {n\in\mathbb N} is tight. Assume this has been proved. By the corollary of Prohorov's theorem (Corollary cor521), \mathcal M _ {\leqslant 1}(\mathbb R^d) is vaguely sequentially compact, so \{P _ n\} _ {n\in\mathbb N} is vaguely sequentially compact. By Theorem thm525, tightness and vague sequential compactness imply weak sequential compactness, and every weakly convergent subsequence of \{P _ n\} converges to a probability measure. Thus, for any subsequence \{P _ {n _ k}\}, there exist a probability measure P and a further sub-subsequence \{P _ {n _ {k(n')}}\} converging weakly to P. We assert that this P does not depend on the choice of the subsequence. Since \boldsymbol x\mapsto\exp(\mathrm i\langle\boldsymbol t,\boldsymbol x\rangle) is a bounded continuous function, we have \varphi _ {n _ {k(n')}}(\boldsymbol t)\to\int \exp(\mathrm i\langle\boldsymbol t,\boldsymbol x\rangle)\, P(\mathrm d\boldsymbol x); on the other hand \varphi _ {n _ {k(n')}}(\boldsymbol t)\to\varphi(\boldsymbol t) by assumption, so \int \exp(\mathrm i\langle\boldsymbol t,\boldsymbol x\rangle)\, P(\mathrm d\boldsymbol x)=\varphi(\boldsymbol t) for all \boldsymbol t\in\mathbb R^d. Therefore \varphi is the characteristic function of P, and by Theorem 7 every weakly convergent subsequence converges to the same P. Now if \{P _ n\} did not converge weakly to P, there would exist an f\in C _ b, an \varepsilon>0 and a subsequence P _ {n _ k} such that |\int f\, \mathrm dP _ {n _ k}-\int f\, \mathrm dP|>\varepsilon for all k. However, there is a sub-subsequence P _ {n _ {k(n')}} such that \int f\, \mathrm dP _ {n _ {k(n')}}\to \int f\, \mathrm dP. This contradiction implies that P _ n\stackrel{w}{\to}P.

Thus we just need to prove the tightness, and it suffices to show that, for every k\, (1\leqslant k\leqslant d), the sequence \{P _ n^k\} _ {n\in\mathbb N} of k-th marginal distributions is tight. If each coordinate is proved to be tight, then for each k there is M _ k such that P _ n^k(\mathbb R\setminus [-M _ k,M _ k])<\varepsilon for all n\in\mathbb N. Now we define a compact set C:=\prod _ {k=1}^{d}[-M _ k,M _ k], and then we have P _ n(\mathbb R^d\setminus C)\leqslant\sum _ {k=1}^{d}P _ n^k(\mathbb R\setminus[-M _ k,M _ k])<d\varepsilon for all n\in\mathbb N. This implies the tightness of \{P _ n\}.

Let \boldsymbol e _ k be the k-th unit vector in \mathbb R^d. Then \varphi _ {P _ n^k}(t)=\varphi _ n(t\boldsymbol e _ k) is the characteristic function of P _ n^k. By assumption, as n\to\infty, \varphi _ {P _ n^k}\to \varphi^k pointwise for some function \varphi^k that is continuous at 0. We have thus reduced the problem to the one-dimensional situation and will henceforth assume d=1.

Clearly \varphi(0)=\lim\varphi _ n(0)=\lim1=1. Define the function h:\mathbb R\to[0,\infty) by h(x)=1-\frac{\sin x}x for x\neq0 and h(0)=0. It is continuously differentiable on \mathbb R. Let \alpha:=\inf\, \{h(x)\mid |x|\geqslant1\}=1-\sin 1>0. Now for K>0,
\begin{align*}
& P _ n([-K,K]^c)\leqslant\frac1\alpha\int _ {[-K,K]^c}h\Big(\frac xK\Big)\, P _ n(\mathrm dx)\leqslant \frac1\alpha\int _ {\mathbb R}h\Big(\frac xK\Big)\, P _ n(\mathrm dx) \\
={} & \frac1\alpha\int _ \mathbb R\Big(\int _ {0}^{1}1-\cos\Big(\frac {tx}K\Big)\, \mathrm dt\Big)\, P _ n(\mathrm dx) =\frac1\alpha\int _ 0^1\Big(\int _ \mathbb R 1-\cos\Big(\frac {tx}K\Big)\, P _ n(\mathrm dx)\Big)\, \mathrm dt \\
={} & \frac1\alpha\int _ 0^11-\operatorname{Re}(\varphi _ n(t/K))\, \mathrm dt.
\end{align*}
Here we have used Fubini's theorem. Now, by the dominated convergence theorem (the integrands are bounded by 2), we have
\begin{align*}
& \limsup _ {n\to\infty}P _ n([-K,K]^c)\leqslant\frac1\alpha\limsup _ {n\to\infty}\int _ 0^11-\operatorname{Re}(\varphi _ n(t/K))\, \mathrm dt \\
={} & \frac1\alpha\int _ 0^1\lim _ {n\to\infty}(1-\operatorname{Re}(\varphi _ n(t/K)))\, \mathrm dt=\frac1\alpha\int _ 0^1 1-\operatorname{Re}(\varphi(t/K))\, \mathrm dt.
\end{align*}
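Since \varphi is continuous at 0 with \varphi(0)=1, for every \varepsilon>0 we have |1-\operatorname{Re}(\varphi(t/K))|<\varepsilon for all t\in[0,1] once K is large enough, so the last integral converges to 0 as K\to\infty. Hence \limsup _ {n\to\infty}P _ n([-K,K]^c) can be made arbitrarily small by choosing K large; since each single P _ n is tight, enlarging K to handle the finitely many remaining n shows that \{P _ n\} _ {n\in\mathbb N} is tight. The theorem is proved.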
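
The tail estimate at the heart of this proof, P([-K,K]^c)\leqslant\frac1\alpha\int _ 0^1(1-\operatorname{Re}\varphi(t/K))\, \mathrm dt, can be illustrated for a single measure. The sketch below assumes P=N(0,1), whose tail is available through `math.erfc`, and evaluates the integral with a midpoint rule; the step count and the sample values of K are arbitrary choices.

```python
import math

ALPHA = 1 - math.sin(1.0)  # infimum of h(x) = 1 - sin(x)/x over |x| >= 1

def tail_normal(K):
    """P(|X| > K) for X ~ N(0, 1)."""
    return math.erfc(K / math.sqrt(2))

def bound(K, steps=2000):
    """(1/alpha) * integral_0^1 (1 - Re phi(t/K)) dt with phi(t) = exp(-t^2/2),
    approximated by the midpoint rule."""
    h = 1.0 / steps
    total = sum(1 - math.exp(-0.5 * ((k + 0.5) * h / K) ** 2) for k in range(steps))
    return total * h / ALPHA

# Rows of (K, exact tail probability, upper bound from the proof).
table = [(K, tail_normal(K), bound(K)) for K in (1.0, 2.0, 5.0, 20.0)]
```

The bound is far from sharp, but it tends to 0 as K grows, which is all the tightness argument needs.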

3. Some more properties

In this section, we first study the connection between the moments of a real random variable X and the derivatives of its characteristic function \varphi _ X. Then we look at some theorems that help us decide whether a given function is a characteristic function.

Since characteristic functions are obtained by taking the expectation of a complex exponential, it is helpful to have an estimate for the exponential function.

Proposition 14. For t\in\mathbb R and n\in\mathbb N, we have
\[
\Big|\mathrm e^{\mathrm it}-1-\frac{\mathrm it}{1!}-\dots-\frac{(\mathrm it)^{n-1}}{(n-1)!}\Big|\leqslant\frac{|t|^n}{n!}.
\]

It can be proved as follows. Recall Taylor's theorem with the integral form of the remainder, applied to s\mapsto\mathrm e^{\mathrm is}:
\[
\mathrm e^{\mathrm it}-\Big(1+\frac{\mathrm it}{1!}+\dots+\frac{(\mathrm it)^{n-1}}{(n-1)!}\Big)=\mathrm i^n\int _ {0}^{t}\frac{(t-s)^{n-1}}{(n-1)!}\, \mathrm e^{\mathrm is}\, \mathrm ds.
\]
Thus
\[
\Big|\mathrm e^{\mathrm it}-1-\frac{\mathrm it}{1!}-\dots-\frac{(\mathrm it)^{n-1}}{(n-1)!}\Big|\leqslant
\Big|\int _ {0}^{t}\frac{|t-s|^{n-1}}{(n-1)!}\, |\mathrm e^{\mathrm is}|\, \mathrm ds\Big|=\frac{|t|^n}{n!}.
\]
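
The estimate of Proposition 14 is easy to test numerically. The sketch below evaluates both sides for a few values of t and n chosen arbitrarily for illustration.

```python
import cmath
import math

def taylor_remainder(t, n):
    """|exp(it) - sum_{k<n} (it)^k / k!|."""
    partial = sum((1j * t) ** k / math.factorial(k) for k in range(n))
    return abs(cmath.exp(1j * t) - partial)

def bound(t, n):
    """Right-hand side |t|^n / n! of Proposition 14."""
    return abs(t) ** n / math.factorial(n)

# Rows of (t, n, remainder, bound) for a small grid of test values.
checks = [(t, n, taylor_remainder(t, n), bound(t, n))
          for t in (-3.0, -0.5, 0.1, 1.0, 2.5) for n in (1, 2, 3, 6)]
```

In every row of `checks` the remainder does not exceed the bound, up to rounding.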

Theorem 15 (Moments and differentiability). Let X be a real random variable with characteristic function \varphi. If \mathbb E (|X|^n)<\infty, then \varphi is n-times continuously differentiable with derivatives
\[
\varphi^{(k)}(t)=\mathbb E[(\mathrm iX)^k \, \mathrm e^{\mathrm itX}],\quad k=0,1,\dots,n.
\]
In particular,
\[
\varphi^{(k)}(0)=\mathrm i^k\, \mathbb E(X^k),\quad k=0,1,\dots,n.
\]
If \mathbb E(X^2)<\infty, then
\[
\varphi(t)=1+\mathrm i\, t\, \mathbb E(X)-\frac12t^2\, \mathbb E(X^2)+o(t^2),\quad t\to0.
\]

The proof is by induction on k. The formula is clearly true for k=0. Suppose 1\leqslant k\leqslant n and \varphi^{(k-1)}(t)=\mathbb E[(\mathrm iX)^{k-1}\, \mathrm e^{\mathrm itX}]; note that |\varphi^{(k-1)}(t)|\leqslant\mathbb E[|X|^{k-1}]<\infty. The integrand (\mathrm iX)^{k-1}\, \mathrm e^{\mathrm itX} is integrable, and its derivative in t, namely (\mathrm iX)^{k}\, \mathrm e^{\mathrm itX}, is bounded in absolute value by |X|^k, which is also integrable. Therefore differentiation under the expectation is permitted: \frac{\mathrm d}{\mathrm dt}\mathbb E[(\mathrm iX)^{k-1}\, \mathrm e^{\mathrm itX}]=\mathbb E[\frac{\mathrm d}{\mathrm dt}(\mathrm iX)^{k-1}\, \mathrm e^{\mathrm itX}], that is, \varphi^{(k)}(t)=\mathbb E[(\mathrm iX)^k \, \mathrm e^{\mathrm itX}]. The continuity of \varphi^{(k)} follows from the dominated convergence theorem with the same bound |X|^k.
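
The relation \varphi^{(k)}(0)=\mathrm i^k\, \mathbb E(X^k) can be observed numerically with finite differences. The sketch below assumes a hypothetical discrete random variable with three values; the step size h is a compromise between truncation and rounding error.

```python
import cmath

# Hypothetical discrete random variable: P(X = x_k) = p_k.
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]

def phi(t):
    """Characteristic function E[exp(itX)]."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))

def moment(k):
    """k-th moment E[X^k]."""
    return sum(p * x ** k for x, p in zip(values, probs))

h = 1e-4
d1 = (phi(h) - phi(-h)) / (2 * h)              # central difference for phi'(0)
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2  # central difference for phi''(0)
```

Here `d1` is close to `1j * moment(1)` and `d2` is close to `-moment(2)`, matching the formula for k=1,2.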

Corollary 16. Let X be a real random variable with
\[
\alpha:=\limsup _ {n\to\infty}\frac{\sqrt[n]{\mathbb E[|X|^n]}}{n}<\infty.
\]
Then the distribution of X is uniquely determined by the moments \mathbb E(X^n). In particular, the statement is true if \mathbb E(\mathrm e^{|tX|})<\infty for some t>0.

To prove this corollary, we can use Stirling's formula. For |h|<\frac1{3\alpha},
\[
\limsup _ {n\to\infty}\frac{|h|^n\, \mathbb E(|X|^n)}{n!}=\limsup _ {n\to\infty}\frac{1}{\sqrt{2\pi n}}\Big(|h|\, \sqrt[n]{\mathbb E[|X|^n]}\cdot\frac{\mathrm e}{n}\Big)^n\leqslant\limsup _ {n\to\infty}\frac{1}{\sqrt{2\pi n}}\Big(\frac{\mathrm e}{3}\Big)^n=0.
\]
By Proposition 14, we have
\begin{align*}
& \Big|\varphi(t+h)-\sum _ {k=0}^{n-1}\mathbb E[\mathrm e^{\mathrm itX}(\mathrm iX)^k]\frac{h^k}{k!}\Big|=\bigg|\mathbb E\Big[\mathrm e^{\mathrm itX}\Big(\mathrm e^{\mathrm ihX}-\sum _ {k=0}^{n-1} \frac{(\mathrm ihX)^k}{k!}\Big)\Big]\bigg| \\
\leqslant{} & \mathbb E\Big|\mathrm e^{\mathrm ihX}-\sum _ {k=0}^{n-1} \frac{(\mathrm ihX)^k}{k!}\Big|\leqslant \frac{|h|^n}{n!}\mathbb E(|X|^n)\to0.
\end{align*}
By Theorem 15, \mathbb E[\mathrm e^{\mathrm itX}(\mathrm iX)^k]=\varphi^{(k)}(t), so the characteristic function can be expanded about any point t in a power series with radius of convergence at least 1/({3\alpha}). Thus \varphi is a real analytic function. The values of \varphi on the interval (-1/(3\alpha),1/(3\alpha)) are determined by the coefficients of its power series about t = 0, that is, by the moments of X.

To show that the moments determine \varphi on the whole real line \mathbb R, we argue as in the proof of the identity theorem for complex analytic functions. Let \psi be the characteristic function of another random variable with the same moments, so that \psi has the same power series about t = 0 as \varphi and, by the argument above, is also real analytic with radius of convergence at least 1/(3\alpha) about every point. Let S be the set of points t at which \varphi^{(n)}(t)=\psi^{(n)}(t) for all n, that is, at which \varphi and \psi have the same Taylor expansion. We will show S is nonempty, open, and closed. Then by connectedness of \mathbb R, S must equal \mathbb R, which implies \varphi=\psi on \mathbb R. First, S contains 0, so it is not empty. Second, for t\in S, both power series about t have non-zero radius of convergence and coincide, so \varphi=\psi on an interval centered at t, and this interval lies in S; hence S is open. Finally, \varphi^{(n)} and \psi^{(n)} are continuous, so \{t\mid \varphi^{(n)}(t)=\psi^{(n)}(t)\} is closed for all n. Hence S is an intersection of closed sets, so it is closed. We conclude that S=\mathbb R: the moments of a real random variable satisfying \alpha<\infty uniquely determine the characteristic function, and thus, by Theorem 7, the distribution as well.
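
A concrete case may help here. For the exponential distribution with rate 1 (our choice of example, not taken from the text) one has \mathbb E(X^n)=n! and \varphi(t)=1/(1-\mathrm it), so \alpha=1/\mathrm e and the power series about 0 is the geometric series \sum _ n(\mathrm it)^n. The sketch below checks both facts numerically.

```python
import math

def alpha_term(n):
    """(E|X|^n)^(1/n) / n for X ~ Exp(1), using E X^n = n! = exp(lgamma(n + 1))."""
    return math.exp(math.lgamma(n + 1) / n) / n

def phi_from_moments(t, terms=200):
    """Partial sum of sum_n (it)^n E(X^n) / n!, which equals sum_n (it)^n here."""
    return sum((1j * t) ** n for n in range(terms))

def phi_exact(t):
    """Characteristic function of Exp(1)."""
    return 1 / (1 - 1j * t)

t0 = 0.5  # lies inside the guaranteed radius 1/(3*alpha) = e/3
```

For large n, `alpha_term(n)` approaches 1/e, and `phi_from_moments(t0)` agrees with `phi_exact(t0)`, so on this interval the moments do recover the characteristic function.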

Theorem 17. Let X be a real random variable and let \varphi be its characteristic function. Let n\in\mathbb N, and assume \varphi is 2n-times differentiable at 0 with derivative \varphi^{(2n)}(0). Then \mathbb E(X^{2n})=(-1)^n\varphi^{(2n)}(0)<\infty.

Note that the statement may be false for odd moments.

By Theorem 15, to prove this theorem it suffices to show \mathbb E(X^{2n})<\infty. We carry out the proof by induction on n. For n=0, the claim is trivially true. Now let n\geqslant1, assume \varphi is 2n-times differentiable at 0, and assume the claim holds for all smaller values of n. We consider the real part of \varphi: define u(t)=\operatorname{Re}(\varphi(t))=\mathbb E(\cos tX). Then u is also 2n-times differentiable at 0, and since u is even, u^{(2k-1)}(0)=0 for k=1,\dots,n. By Taylor's theorem, for t close enough to 0,
\[
\Big|u(t)-\sum _ {k=0}^{n-1} u^{(2 k)}(0) \frac{t^{2 k}}{(2 k) !}\Big| \leq \frac{|t|^{2 n-1}}{(2 n-1) !} \sup _ {\theta \in(0,1]}|u^{(2 n-1)}(\theta t)|.
\]
Define a continuous function f:\mathbb R\to[0,\infty) by f(0)=1 and for non-zero x,
\[
f(x)=\frac{(-1)^n(2n)!}{x^{2n}}\Big(\cos x-\sum _ {k=0}^{n-1}(-1)^k\frac{x^{2k}}{(2k)!}\Big).
\]
By the induction hypothesis, \mathbb E(X^{2k})=(-1)^ku^{(2k)}(0) for k=1,\dots,n-1. Combining these results, we infer
\[
\mathbb E[f(tX)X^{2n}]\leqslant \frac{2 n}{|t|} \sup _ {\theta \in(0,1]}|u^{(2 n-1)}(\theta t)| \leqslant g(t):=2 n \sup _ {\theta \in(0,1]} \Big|\frac{u^{(2 n-1)}(\theta t)}{\theta t}\Big|.
\]
By Fatou's lemma,
\[
\mathbb E(X^{2n})=\mathbb E[f(0)X^{2n}]\leqslant\liminf _ {t\to0}\mathbb E[f(tX)X^{2n}]\leqslant\liminf _ {t\to0}g(t)=2n|u^{(2n)}(0)|<\infty.
\]
We have now proved \mathbb E(X^{2n})<\infty for all n by induction, and the theorem follows.

Conclusion. We now summarise the connections between moments and characteristic functions. A random variable with finite n-th moment has a characteristic function that is n-times continuously differentiable, and the moments can be read off from the derivatives at 0. Conversely, for even n, if the characteristic function is n-times differentiable at 0, then the random variable has a finite n-th moment. Moreover, if all moments exist and do not grow too quickly, then the moments determine the distribution.

We end this section with Pólya's theorem and Bochner's theorem. Pólya's theorem gives a sufficient condition for a symmetric real function to be a characteristic function. That condition is not necessary: for example, the characteristic function of the normal distribution does not fulfill it. Bochner's theorem formulates a necessary and sufficient condition for a function to be the characteristic function of a probability measure. Both theorems are stated without proof.

Theorem 18 (Pólya). Let f:\mathbb R\to[0,1] be continuous and even with f(0)=1. Assume that f is convex on [0,\infty). Then f is the characteristic function of a probability measure.

Definition 19. A function f:\mathbb R^d\to\mathbb C is called positive semidefinite if, for all n\in\mathbb N, \boldsymbol t _ 1,\dots,\boldsymbol t _ n\in\mathbb R^d and y _ 1,\dots,y _ n\in\mathbb C, we have
\[
\sum _ {i,j=1}^{n}y _ i\bar y _ j f(\boldsymbol t _ i-\boldsymbol t _ j)\geqslant0,
\]
in other words, if the matrix (f(\boldsymbol t _ i-\boldsymbol t _ j)) is positive semidefinite.

Theorem 20 (Bochner). A continuous function \varphi:\mathbb R^d\to\mathbb C is the characteristic function of a probability distribution on \mathbb R^d if and only if \varphi is positive semidefinite and \varphi(\boldsymbol 0) = 1.
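
The positive semidefiniteness condition of Definition 19 can be probed numerically in dimension one. The sketch below evaluates the quadratic form for random points and coefficients; the three test functions, the seed and the sample sizes are assumptions made for illustration, and a random search of this kind can only refute positive semidefiniteness, never prove it.

```python
import math
import random

def quadratic_form(f, ts, ys):
    """sum_{i,j} y_i * conj(y_j) * f(t_i - t_j)."""
    return sum(yi * yj.conjugate() * f(ti - tj)
               for ti, yi in zip(ts, ys) for tj, yj in zip(ts, ys))

def gaussian(t):   # characteristic function of N(0, 1)
    return math.exp(-0.5 * t * t)

def triangle(t):   # continuous, even, convex on [0, inf): covered by Pólya's theorem
    return max(0.0, 1.0 - abs(t))

def indicator(t):  # discontinuous at |t| = 1, hence not a characteristic function
    return 1.0 if abs(t) <= 1 else 0.0

random.seed(1)

def smallest_form(f, trials=300, n=5):
    """Smallest real part of the quadratic form over random points and coefficients."""
    worst = float("inf")
    for _ in range(trials):
        ts = [random.uniform(-3, 3) for _ in range(n)]
        ys = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
        worst = min(worst, quadratic_form(f, ts, ys).real)
    return worst

# Explicit witness that the indicator of [-1, 1] is not positive semidefinite.
witness = quadratic_form(indicator, [0.0, 1.0, 2.0],
                         [1 + 0j, -math.sqrt(2) + 0j, 1 + 0j]).real
```

For `gaussian` and `triangle` the search finds no negative value (up to rounding), while `witness` equals 4-4\sqrt2<0 up to rounding, so the indicator function fails the criterion.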
