The LaTeX document (with the latest updated chapters) can be read online at the link: probability.pdf
Chapter 3: Random variables

1. Definitions

Roughly speaking, a random variable is just a measurable function.
It is necessary to give a general definition, as in real analysis, for logical reasons that arise in many applications. A (real, extended-valued) random variable on a set \Delta\in\mathcal A is a function X from \Delta to [-\infty,+\infty] such that, for each Borel set B,
\[
\{\omega \mid X(\omega) \in B\}\in\Delta\cap \mathcal A,
\]
where \Delta\cap \mathcal A is the trace of \mathcal A on \Delta. A complex-valued random variable is a function on a set \Delta in \mathcal A to the complex plane whose real and imaginary parts are both real, finite-valued random variables.
For a discussion of basic properties we may suppose \Delta=\Omega and that X is real and finite-valued with probability one. The general case may be reduced to this one by considering the trace of (\Omega,\mathcal A,\mathbb P) on \Delta, or on the "domain of finiteness" \Omega _ 0=\{\omega\mid|X(\omega)|<\infty\}, and taking real and imaginary parts.
Recalling the theory of measurable functions, we can characterise a random variable with the following theorem.
The probability of the set in the definition above is well defined and will be written as
\[
\mathbb{P}(X(\omega)\in B),\, \text{or }\mathbb{P}(X\in B).
\]
The next theorem relates the probability measure \mathbb P to a probability measure on (\mathbb R,\mathcal B) as discussed in Chapter 2.
\[
\mu(B)=\mathbb{P}(X^{-1}(B))=\mathbb{P}(X\in B),\quad \forall B\in\mathcal B.
\]
The collection of sets \{X^{-1}(S),\, S\subseteq\mathbb R\} is a \sigma-algebra for any function X. If X is a random variable, then the collection \{X^{-1}(B),\, B\in\mathcal B\} is called the \sigma-algebra generated by X. It is the smallest \sigma-algebra contained in \mathcal A which contains all sets of the form \{\omega\mid X(\omega)\leqslant x\}, where x\in\mathbb R. Thus Theorem 3 gives us a convenient way of representing the measure \mathbb P when it is restricted to this sub-\sigma-algebra; symbolically we may write it as follows:
\[
\mu=\mathbb P\circ X^{-1}.
\]
This \mu is called the "probability distribution measure" or probability measure of X, and its associated distribution function F according to Section 2.2 will be called the distribution function of X. Specifically, F is given by
\[
F(x)=\mu((-\infty,x])=\mathbb P(X\leqslant x).
\]
While the random variable X determines \mu and therefore F, the converse is obviously false. A family of random variables having the same distribution is said to be "identically distributed".
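As a quick numerical illustration (a hedged sketch, not part of the text's development): for X distributed as Exp(1), the distribution function is F(x)=\mathbb P(X\leqslant x)=1-e^{-x}, which can be checked against an empirical estimate from simulated draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate X ~ Exp(1); its distribution function is F(x) = 1 - exp(-x).
samples = rng.exponential(scale=1.0, size=100_000)

def empirical_F(x, samples):
    """Empirical estimate of F(x) = P(X <= x)."""
    return np.mean(samples <= x)

for x in [0.5, 1.0, 2.0]:
    est = empirical_F(x, samples)
    exact = 1 - np.exp(-x)
    print(f"F({x}) ~ {est:.4f}  (exact {exact:.4f})")
```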
A random vector is just a vector each of whose components is a random variable. It is sufficient to consider the case of two dimensions, since higher dimensions differ only in notation.
The class of Borel sets in \mathbb R^2 is denoted as \mathcal B^2, and the class of sets, each of which is a finite union of disjoint product sets, forms an algebra denoted \mathcal B _ 0^2.
Now let X and Y be two random variables on (\Omega, \mathcal A,\mathbb P). The random vector (X,Y) induces a probability \nu on \mathcal B^2 as follows:
\[
\nu(A)=\mathbb P((X,Y)\in A), \quad\forall A\in\mathcal B^2,
\]
the right side being an abbreviation of \mathbb P(\omega\mid (X(\omega),Y(\omega))\in A). This \nu is called the (2-dimensional, probability) distribution or simply the probability measure of (X, Y).
Proof. Since [f\circ(X,Y)]^{-1}(\mathcal B)=(X,Y)^{-1}(f^{-1}(\mathcal B))\subseteq (X,Y)^{-1}(\mathcal B^2), it suffices to show (X,Y)^{-1}(\mathcal B^2)\subseteq \mathcal A. If A is a product of two Borel sets, say A=B _ 1\times B _ 2 with B _ 1,B _ 2\in\mathcal B, then clearly (X,Y)^{-1}(A)=X^{-1}(B _ 1)\cap Y^{-1}(B _ 2)\in\mathcal A. Hence the class of sets A for which (X,Y)^{-1}(A)\in\mathcal A contains the algebra \mathcal B _ 0^2. It can be proved that this class forms a \sigma-algebra, so it must contain \mathcal B^2 (the smallest \sigma-algebra containing \mathcal B _ 0^2). This gives the desired conclusion.
If \{X _ i\} _ {i=1}^\infty is a sequence of random variables, then \inf X _ i, \sup X _ i, \liminf X _ i, \limsup X _ i are random variables, not necessarily finite-valued with probability one though everywhere defined, and \lim X _ i is a random variable on the set \Delta on which there is either convergence or divergence to \pm\infty.
The analogue in real analysis should be well known to the reader, so we omit the proof.
A random variable X is called discrete iff there is a countable set B\subseteq\mathbb R such that \mathbb P(X\in B)=1. It is easy to see that X is discrete iff its distribution function is.
Let \{\Lambda _ j\} be a countable partition of \Omega and \{b _ j\} arbitrary real numbers; then the function \varphi defined by \varphi(\omega)=\sum _ j b _ j\chi _ {\Lambda _ j}(\omega)\, (\forall \omega\in\Omega) is a discrete random variable. We shall call \varphi the random variable belonging to the weighted partition \{\Lambda _ j;b _ j\}. Each discrete random variable X belongs to a certain partition: let \{b _ j\} be the countable set in the definition of X and let \Lambda _ j = \{\omega\mid X(\omega) = b _ j \}; then X belongs to the weighted partition \{\Lambda _ j;b _ j\}. If j ranges over a finite index set, the partition is called finite and the random variable belonging to it is called simple.
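The construction above can be sketched concretely (all names hypothetical): take the sample space [0, 1) with Lebesgue measure, a finite partition into subintervals \Lambda _ j, and weights b _ j; the simple random variable \varphi is then a lookup.

```python
# A minimal sketch of the r.v. belonging to a weighted partition
# {Lambda_j; b_j}.  Sample space: [0, 1) with Lebesgue measure;
# Lambda_j are subintervals; phi(omega) = sum_j b_j * chi_{Lambda_j}(omega).
partition = [(0.0, 0.5), (0.5, 0.8), (0.8, 1.0)]   # the sets Lambda_j
weights = [-1.0, 0.0, 2.5]                          # the values b_j

def phi(omega):
    """Value at omega of the r.v. belonging to {Lambda_j; b_j}."""
    for (lo, hi), b in zip(partition, weights):
        if lo <= omega < hi:
            return b
    raise ValueError("omega outside the sample space")

print(phi(0.25), phi(0.6), phi(0.9))   # -1.0 0.0 2.5
```

Since the index set is finite, this is a simple random variable in the sense just defined.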
Transformation in \mathbb R^n
Here we state without proof the transformation formula for measures with continuous densities under differentiable maps; the proof can be found in textbooks on calculus. With this formula we can obtain the density of some transformed random variables.
If a measure on an open set A\subseteq\mathbb R^n has continuous density f, and \varphi is an injective, continuously differentiable map of A onto B=\varphi(A), then the image measure has density
\[
f _ {\varphi}(\boldsymbol x)={f(\varphi^{-1}(\boldsymbol x))}\cdot|\det(\varphi'[\varphi^{-1}(\boldsymbol x)])|^{-1}, \quad \boldsymbol x\in B,\, \det(\varphi'[\varphi^{-1}(\boldsymbol x)])\neq0.
\]
For \boldsymbol x elsewhere, f _ \varphi(\boldsymbol x) is assigned 0.
Now let \mu be the probability measure of \boldsymbol X, i.e., \mu=\mathbb P\circ \boldsymbol X^{-1}, and suppose \boldsymbol X takes values in A almost surely, so that \mu(A)=(\mathbb P\circ \boldsymbol X^{-1})(A)=1 and (\mu\circ\varphi^{-1})(B)=\mu(A)=1. Let \boldsymbol Y=\varphi(\boldsymbol X); then the probability measure of \boldsymbol Y is \mu\circ\varphi^{-1}, and the range of \boldsymbol Y is B. When \boldsymbol X has density f, the theorem gives the density of \boldsymbol Y as
\[
f _ {\boldsymbol Y}(\boldsymbol y)={f(\boldsymbol x)}\cdot|\det(\varphi'(\boldsymbol x))|^{-1}={f(\varphi^{-1}(\boldsymbol y))}\cdot|\det(\varphi'[\varphi^{-1}(\boldsymbol y)])|^{-1},\quad \boldsymbol x=\varphi^{-1}(\boldsymbol y).
\]
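A one-dimensional numerical sketch of this formula (example distribution chosen for illustration): take X \sim Exp(1) with density f(x)=e^{-x} on A=(0,\infty) and \varphi(x)=x^2, so \varphi'(x)=2x and f _ Y(y)=f(\sqrt y)\,(2\sqrt y)^{-1}; a probability computed from this density should match a Monte Carlo estimate.

```python
import numpy as np

# X ~ Exp(1) with density f(x) = exp(-x) on A = (0, inf);
# phi(x) = x**2 maps A onto B = (0, inf), phi'(x) = 2x, so the
# transformation formula gives f_Y(y) = exp(-sqrt(y)) / (2*sqrt(y)).
rng = np.random.default_rng(1)
x = rng.exponential(size=500_000)
y = x**2

def f_Y(y):
    return np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

# Compare P(a <= Y <= b) from the formula (midpoint rule) with a
# Monte Carlo estimate from the simulated sample.
a, b = 0.5, 2.0
grid = np.linspace(a, b, 2001)
mid = (grid[:-1] + grid[1:]) / 2
prob_formula = np.sum(f_Y(mid)) * (grid[1] - grid[0])
prob_mc = np.mean((y >= a) & (y <= b))
print(prob_formula, prob_mc)   # both close to exp(-sqrt(a)) - exp(-sqrt(b))
```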
2. Expectation

The concept of "(mathematical) expectation" is the same as that of integration in the probability space with respect to the measure \mathbb P. The reader is supposed to have some acquaintance with this. The general theory is not much different. The random variables below will be tacitly assumed to be finite everywhere to avoid trivial complications.
We first define the expectation of an arbitrary positive random variable X. For each positive integer m and each integer n\geqslant0, the set \Lambda _ {mn}=\{\omega\mid n/2^m\leqslant X(\omega)<(n+1)/2^m\} belongs to \mathcal A. For each m, let X _ m denote the random variable belonging to the weighted partition \{\Lambda _ {mn};n/2^m\}. It is easy to see that \{X _ m\} is an increasing sequence of random variables with X _ m(\omega)\uparrow X(\omega) for every \omega. The expectation \mathbb E(X) of X is defined as the limit, as m\to\infty, of
\[
\sum _ {n=0}^{\infty}\frac{n}{2^m}\mathbb{P}\left(\frac{n}{2^m}\leqslant X<\frac{n+1}{2^m}\right),
\]
the limit existing, finite or infinite. For an arbitrary X, put as usual X=X^+-X^-; both X^+ and X^- are positive random variables, so their expectations are defined. Unless both \mathbb E(X^+) and \mathbb E(X^-) are +\infty, we define \mathbb E(X)=\mathbb E(X^+)-\mathbb E(X^-) with the usual convention regarding \infty. The expectation, when it exists, is also denoted by
\[
\int _ \Omega X(\omega)\, \mathbb{P}(\mathrm{d}\omega).
\]
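The dyadic sums in this definition can be sketched numerically (a hedged illustration with empirical frequencies in place of probabilities): the sum S _ m=\sum _ n (n/2^m)\,\mathbb P(n/2^m\leqslant X<(n+1)/2^m) equals \mathbb E(\lfloor 2^m X\rfloor/2^m) and increases to \mathbb E(X).

```python
import numpy as np

# Dyadic sums defining E(X) for a positive r.v., with probabilities
# replaced by empirical frequencies for X ~ Exp(1), where E(X) = 1.
rng = np.random.default_rng(2)
x = rng.exponential(size=200_000)

def dyadic_sum(x, m):
    """Empirical version of S_m: average of floor(2^m X)/2^m."""
    return np.mean(np.floor(x * 2**m) / 2**m)

for m in [0, 2, 4, 8]:
    print(m, dyadic_sum(x, m))   # increases toward the sample mean of x
```

By construction each sample value is rounded down by less than 2^{-m}, so the sums approach the sample mean from below.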
For each \Lambda\in\mathcal A, we define
\[\int _ \Lambda X(\omega)\, \mathbb{P}(\mathrm{d}\omega)=\mathbb{E}(X\cdot\chi _ {\Lambda})\]
and call it "the integral of X (with respect to \mathbb P) over the set \Lambda". As a general notation, the left member above will be abbreviated to
\[
\int _ \Lambda X\, \mathrm{d}\mathbb P.
\]
We shall say that X is integrable with respect to \mathbb P over \Lambda iff the integral above exists and is finite.
The general integral has the familiar properties of the Lebesgue integral.
- Absolute integrability. \int _ \Lambda X\, \mathrm{d}\mathbb P is finite iff \int _ \Lambda |X|\, \mathrm{d}\mathbb P<\infty.
- Linearity. \int _ \Lambda(aX+bY)\, \mathrm{d}\mathbb P=a\int _ \Lambda X\, \mathrm{d}\mathbb P+b\int _ \Lambda Y\, \mathrm{d}\mathbb P, provided that the right side is meaningful, namely not +\infty-\infty or -\infty+\infty.
- Additivity over sets. If the \Lambda _ n's are disjoint, then \int _ {\bigcup \Lambda _ n}X\, \mathrm{d}\mathbb P=\sum _ n\int _ {\Lambda _ n}X\, \mathrm{d}\mathbb P.
- Positivity. If X\geqslant 0 a.e. on \Lambda, then \int _ \Lambda X\, \mathrm{d}\mathbb P\geqslant0.
- Monotonicity. If X _ 1\leqslant X\leqslant X _ 2 a.e. on \Lambda, then \int _ \Lambda X _ 1\, \mathrm{d}\mathbb P\leqslant\int _ \Lambda X\, \mathrm{d}\mathbb P\leqslant\int _ \Lambda X _ 2\, \mathrm{d}\mathbb P.
- Mean value theorem. If a\leqslant X\leqslant b a.e. on \Lambda, then a\, \mathbb P(\Lambda)\leqslant\int _ \Lambda X\, \mathrm{d}\mathbb P\leqslant b\, \mathbb P(\Lambda).
- Modulus inequality. \left|\int _ \Lambda X\, \mathrm{d}\mathbb P\right|\leqslant\int _ \Lambda |X|\, \mathrm{d}\mathbb P.
- Dominated convergence theorem. If X _ n\to X a.e. (or merely in measure) on \Lambda and |X _ n|\leqslant Y a.e. on \Lambda, with \int _ \Lambda Y\, \mathrm{d}\mathbb P<\infty, then
\[
\lim _ {n\to\infty}\int _ \Lambda X _ n\, \mathrm{d}\mathbb P=\int _ \Lambda\lim _ {n\to\infty} X _ n\, \mathrm{d}\mathbb P=\int _ \Lambda X\, \mathrm{d}\mathbb P.
\]
- Bounded convergence theorem. If X _ n\to X a.e. (or merely in measure) on \Lambda and there is a constant M such that |X _ n|\leqslant M a.e. on \Lambda, then the dominated convergence formula above holds.
- Monotone convergence theorem. If X _ n\geqslant0 and X _ n\uparrow X a.e. on \Lambda, then the dominated convergence formula above is again true, provided that +\infty is allowed as a value for either member. The condition "X _ n\geqslant 0" may be weakened to: "\mathbb E(X _ n) > -\infty for some n".
- Integration term by term. If \sum _ n\int _ \Lambda |X _ n|\, \mathrm d\mathbb P<\infty, then \sum _ n|X _ n|<\infty a.e. on \Lambda so that \sum _ n X _ n converges a.e. on \Lambda and
\[
\int _ \Lambda\sum _ n X _ n\, \mathrm{d}\mathbb P=\sum _ n\int _ \Lambda X _ n\, \mathrm{d}\mathbb P.
\]
- Fatou's lemma. If X _ n\geqslant0 a.e. on \Lambda, then
\[
\int _ \Lambda\liminf _ {n\to\infty} X _ n\, \mathrm{d}\mathbb P\leqslant\liminf _ {n\to\infty}\int _ \Lambda X _ n\, \mathrm{d}\mathbb P.
\]
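Fatou's inequality can be strict, as the classical sliding-bump example shows; here is a hedged numerical sketch. On ([0,1) with Lebesgue measure), let X _ n=n\,\chi _ {(0,1/n)}: then X _ n(\omega)\to0 for every \omega>0, so the left side of Fatou is 0, while \mathbb E(X _ n)=1 for every n.

```python
import numpy as np

# X_n = n * 1_{(0, 1/n)} on the uniform probability space [0, 1):
# X_n -> 0 pointwise on (0, 1), yet E(X_n) = n * (1/n) = 1 for all n.
rng = np.random.default_rng(3)
omega = rng.uniform(size=400_000)

def X(n, omega):
    return n * ((omega > 0) & (omega < 1.0 / n))

for n in [1, 10, 100]:
    print(n, np.mean(X(n, omega)))   # each estimate is close to 1
```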
For any random variable X we have
\[
\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n)\leqslant\mathbb E(|X|)\leqslant1+\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n)
\]
so that \mathbb E(|X|)<\infty iff the series above converges.
In particular, if X takes only positive integer values, then
\[
\mathbb E(X)=\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n).
\]
To verify the theorem, set \Lambda _ n=\{n\leqslant |X|<n+1\}, so that \mathbb E(|X|)=\sum _ {n=0}^\infty\int _ {\Lambda _ n}|X|\, \mathrm d\mathbb P, and
\[
\sum _ {n=0}^{\infty}n\, \mathbb P(\Lambda _ n)=\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n),
\]
from which both bounds follow since n\leqslant |X|<n+1 on \Lambda _ n. When X takes only positive integer values, this equation is just the above corollary.
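Both statements can be checked exactly for simple distributions (a hedged numerical sketch; the distributions are chosen for illustration):

```python
import numpy as np

# Positive-integer-valued X ~ Geometric(p) on {1, 2, ...}:
# P(X >= n) = (1-p)**(n-1) and E(X) = 1/p, so the corollary's
# series sums to the expectation (up to negligible truncation).
p = 0.3
series_geom = sum((1 - p) ** (n - 1) for n in range(1, 200))
print(series_geom, 1 / p)

# X ~ Exp(1): E(|X|) = 1 and P(|X| >= n) = exp(-n), so the series
# equals 1/(e-1) ~ 0.582 and the theorem's bounds read 0.582 <= 1 <= 1.582.
series_exp = sum(np.exp(-n) for n in range(1, 100))
print(series_exp, "<= 1 <=", 1 + series_exp)
```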
There is a basic relation between the abstract integral with respect to \mathbb P over sets in \mathcal A on the one hand, and the Lebesgue-Stieltjes integral with respect to \mu over sets in \mathcal B on the other, induced by each random variable. We give the version in one dimension first.
Let f be a Borel measurable function on \mathbb R; then
\[
\int _ \Omega f(X(\omega))\, \mathbb P(\mathrm d\omega)=\int _ \mathbb R f(x)\, \mu(\mathrm dx)
\]
provided that either side exists.
The key point of the proof is approximation by simple functions. If f is the characteristic function of a Borel set B, then the left side is \mathbb P(X\in B) and the right side is \mu(B); these are equal by the definition of \mu. By linearity the proposition holds when f is a simple function. For positive f we construct an increasing sequence of positive simple functions f _ n\uparrow f and take limits on both sides of the identity for f _ n. The general case follows in the usual way.
As a consequence of this theorem, we have: if \mu _ X and F _ X denote, respectively, the probability measure and distribution function induced by X, then we have
\[
\mathbb E(X)=\int _ \mathbb R x\, \mu _ X(\mathrm dx)=\int _ {-\infty}^{+\infty}x\, \mathrm dF _ X(x),
\]
and more generally,
\[
\mathbb E(f(X))=\int _ \mathbb R f(x)\, \mu _ X(\mathrm dx)=\int _ {-\infty}^{+\infty}f(x)\, \mathrm dF _ X(x),
\]
with the usual proviso regarding existence and finiteness.
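A hedged numerical sketch of the identity \mathbb E(f(X))=\int f\,\mathrm d\mu _ X, with X standard normal and f(x)=x^2 (so both sides equal 1): the left side is estimated by Monte Carlo, the right side by numerically integrating f against the normal density.

```python
import numpy as np

# E(f(X)) two ways for X ~ N(0,1), f(x) = x**2, where E(X^2) = 1.
rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)
lhs = np.mean(x**2)                      # Monte Carlo average of f(X)

# Right side: integral of f against the standard normal density.
grid = np.linspace(-8, 8, 400_001)
dx = grid[1] - grid[0]
density = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
rhs = np.sum(grid**2 * density) * dx

print(lhs, rhs)   # both close to 1
```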
We shall need the generalization of the preceding theorem in several dimensions. No change is necessary except for notation, which we will give in two dimensions. Let us write the "mass element" as \mu^2(\mathrm dx,\mathrm dy) so that
\[
\nu(A)=\iint _ A\mu^2(\mathrm dx,\mathrm dy).
\]
\[
\int _ \Omega f(X(\omega),Y(\omega))\, \mathbb P(\mathrm d\omega)=\iint _ {\mathbb R^2}f(x,y)\, \mu^2(\mathrm dx,\mathrm dy).
\]
Note that f(X,Y) is a random variable.
If we take f(x,y)=x+y, we obtain
\[
\mathbb E(X+Y)=\mathbb E(X)+\mathbb E(Y).
\]
This is a useful relation.
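Note that the relation requires no independence; a quick sketch (example pair chosen for illustration) with the dependent pair Y=X^3, X uniform on (0,1), where \mathbb E(X)=1/2 and \mathbb E(Y)=1/4:

```python
import numpy as np

# Linearity of expectation for a dependent pair: Y = X**3 with
# X ~ Uniform(0,1), so E(X) = 1/2, E(Y) = 1/4, E(X+Y) = 3/4.
rng = np.random.default_rng(5)
x = rng.uniform(size=1_000_000)
y = x**3
print(np.mean(x + y), np.mean(x) + np.mean(y))   # both close to 0.75
```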
More generally, for a measurable map X from a measure space (\Omega,\mu) to a measurable space \Omega' and a measurable function f on \Omega', we have the following change-of-variables theorem (you can prove it on your own):
\[
\int_\Omega (f\circ X)\, \mathrm d\mu=\int_{\Omega'}f\, \mathrm d(\mu\circ X^{-1}).
\]
Moments

Let a\in\mathbb R, r\in\mathbb R^+, then \mathbb E(|X-a|^r) is called the absolute moment of X of order r, about a. It may be +\infty; otherwise, and if r is an integer, \mathbb E(X-a)^r is the corresponding moment. If \mu and F denote, respectively, the probability measure and distribution function induced by X, then
\begin{align*}
\mathbb E|X-a|^r & =\int _ \mathbb R |x-a|^r\, \mu(\mathrm dx)=\int _ {-\infty}^{+\infty}|x-a|^r\, \mathrm dF(x), \\
\mathbb E(X-a)^r & =\int _ \mathbb R (x-a)^r\, \mu(\mathrm dx)=\int _ {-\infty}^{+\infty}(x-a)^r\, \mathrm dF(x).
\end{align*}
For r=1, a=0, this reduces to \mathbb E(X), which is also called the mean of X. The moments about the mean are called central moments. That of order 2 is particularly important; it is called the variance, \operatorname{Var}(X), and its positive square root the standard deviation \sigma(X):
\[
\operatorname{Var}(X)=\sigma^2(X)=\mathbb E(X-\mathbb EX)^2=\mathbb E(X^2)-(\mathbb EX)^2.
\]
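The two expressions for the variance agree exactly on the empirical distribution of any sample (a minimal sketch, distribution chosen for illustration):

```python
import numpy as np

# Var(X) = E(X^2) - (E X)^2 checked on a sample from an exponential
# distribution with mean 2 (so variance 4); the two empirical
# expressions coincide up to floating-point rounding.
rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=100_000)
lhs = np.mean((x - np.mean(x)) ** 2)
rhs = np.mean(x**2) - np.mean(x) ** 2
print(lhs, rhs)   # equal up to rounding, both near 4
```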
For any positive number p, X is said to belong to L^p=L^p(\Omega,\mathcal A,\mathbb P) iff \mathbb E|X|^p<\infty.
Here are some well-known inequalities.
- Hölder's inequality: for 1<p<\infty and q satisfying 1/p+1/q=1,
\[
|\mathbb E(XY)|\leqslant \mathbb E|XY|\leqslant (\mathbb E|X|^p)^{1/p}(\mathbb E|Y|^q)^{1/q}.
\]
- Minkowski inequality: for p\geqslant1,
\[
(\mathbb E|X+Y|^p)^{1/p}\leqslant(\mathbb E|X|^p)^{1/p}+(\mathbb E|Y|^p)^{1/p}.
\]
- Cauchy-Schwarz inequality:
\[
(\mathbb E|XY|)^2\leqslant (\mathbb E|X|^2)(\mathbb E|Y|^2).
\]
- If Y\equiv1 in Hölder's inequality, then for p>1,
\[
\mathbb E|X|\leqslant (\mathbb E|X|^p)^{1/p}.
\]
- Liapounov inequality:
\[
(\mathbb E|X|^r)^{1/r}\leqslant(\mathbb E|X|^{r^\prime})^{1/{r}^\prime},\quad0<r<r^\prime<\infty.
\]
- Jensen's inequality: If \varphi is a convex function on \mathbb R, and X and \varphi(X) are integrable random variables, then
\[
\varphi(\mathbb E(X))\leqslant \mathbb E(\varphi(X)).
\]
- Chebyshev inequality: If \varphi is a strictly positive and increasing function on (0, +\infty), \varphi(u) = \varphi(-u), and X is a random variable such that \mathbb E(\varphi(X))<\infty, then for each u>0,
\[
\mathbb P(|X|\geqslant u)\leqslant\frac{\mathbb E(\varphi(X))}{\varphi(u)}.
\]
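Two of these inequalities can be checked numerically on the empirical distribution of a sample, where they hold exactly rather than only up to sampling error (a hedged sketch; the distributions and exponents are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(200_000)
y = rng.exponential(size=200_000)

# Chebyshev with phi(u) = u**2:  P(|X| >= u) <= E(X^2) / u^2.
for u in [1.0, 2.0, 3.0]:
    assert np.mean(np.abs(x) >= u) <= np.mean(x**2) / u**2

# Hoelder with p = 3, q = 3/2:  E|XY| <= (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.
p, q = 3.0, 1.5
lhs = np.mean(np.abs(x * y))
rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
print(lhs, "<=", rhs)
```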