
Random variables

You can read the LaTeX document online (for the latest updated chapters) from the link: probability.pdf

Chapter 3: Random variables

1. Definitions

Basically speaking, a random variable is just a measurable function.

A general definition, in the manner of real analysis, is necessary for logical reasons in many applications.

Definition 1. A real, extended-valued random variable is a function $X$ whose domain is a set $\Delta$ in $\mathcal A$ and whose range is contained in $\overline{\mathbb{R}}$, such that for each $B$ in $\overline{\mathcal B}$ we have

\{\omega\mid X(\omega)\in B\}\in\Delta\cap\mathcal A,

where $\Delta\cap\mathcal A$ is the trace of $\mathcal A$ on $\Delta$. A complex-valued random variable is a function on a set $\Delta$ in $\mathcal A$ to the complex plane whose real and imaginary parts are both real, finite-valued random variables.

For a discussion of basic properties we may suppose $\Delta=\Omega$ and that $X$ is real and finite-valued with probability one. The general case may be reduced to this one by considering the trace of $(\Omega,\mathcal A,\mathbb P)$ on $\Delta$, or on the "domain of finiteness" $\Omega_0=\{\omega\mid|X(\omega)|<\infty\}$, and by taking real and imaginary parts.

Recalling the theory of measurable functions, we can characterise random variables by the following theorem.

Theorem 2. $X$ is a random variable iff for each real number $x$ we have $\{\omega\mid X(\omega)\leqslant x\}\in\mathcal A$.

The probability of the set in Definition 1 is clearly defined and will be written as

\mathbb P(X(\omega)\in B), \quad\text{or}\quad \mathbb P(X\in B).

The next theorem relates the probability measure $\mathbb P$ to a probability measure on $(\mathbb R,\mathcal B)$ as discussed in Chapter 2.

Theorem 3. Each random variable on the probability space $(\Omega,\mathcal A,\mathbb P)$ induces a probability space $(\mathbb R,\mathcal B,\mu)$ by means of the following correspondence:

\mu(B)=\mathbb P(X^{-1}(B))=\mathbb P(X\in B), \quad\forall B\in\mathcal B.

The collection of sets $\{X^{-1}(S)\mid S\subseteq\mathbb R\}$ is a $\sigma$-algebra for any function $X$. If $X$ is a random variable, then the collection $\{X^{-1}(B)\mid B\in\mathcal B\}$ is called the $\sigma$-algebra generated by $X$. It is the smallest $\sigma$-algebra contained in $\mathcal A$ which contains all sets of the form $\{\omega\mid X(\omega)\leqslant x\}$, where $x\in\mathbb R$. Thus Theorem 3 gives us a convenient way of representing the measure $\mathbb P$ when it is restricted to this sub-$\sigma$-algebra; symbolically we may write it as follows:

\mu=\mathbb P\circ X^{-1}.

This $\mu$ is called the "probability distribution measure" or probability measure of $X$, and its associated distribution function $F$ according to Section 2.2 will be called the distribution function of $X$. Specifically, $F$ is given by

F(x)=\mu((-\infty,x])=\mathbb P(X\leqslant x).

While the random variable $X$ determines $\mu$ and therefore $F$, the converse is obviously false. A family of random variables having the same distribution is said to be "identically distributed".
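For a concrete finite example, the induced measure $\mu=\mathbb P\circ X^{-1}$ and its distribution function can be computed directly. The following Python sketch uses the two-dice space with $X$ the sum of the faces (an illustrative choice, not from the text) and exact rational arithmetic:

```python
from fractions import Fraction

# A finite probability space: two fair dice with the uniform measure P.
Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in Omega}

# The random variable X = sum of the two faces.
X = lambda w: w[0] + w[1]

# Induced distribution mu = P o X^{-1} on the range of X.
mu = {}
for w, p in P.items():
    mu[X(w)] = mu.get(X(w), Fraction(0)) + p

# Distribution function F(x) = mu((-inf, x]) = P(X <= x).
def F(x):
    return sum(p for v, p in mu.items() if v <= x)

assert sum(mu.values()) == 1          # mu is a probability measure
assert F(7) == Fraction(21, 36)       # P(X <= 7): 21 of 36 outcomes
assert F(12) == 1
```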

Theorem 4. If $X$ is a random variable and $f$ a Borel measurable function, then $f(X)$ is a random variable.

A random vector is simply a vector each of whose components is a random variable. It suffices to consider the case of two dimensions, since higher dimensions differ only in notation.

The class of Borel sets in $\mathbb R^2$ is denoted by $\mathcal B^2$, and the class of sets, each of which is a finite union of disjoint product sets, forms an algebra denoted by $\mathcal B_0^2$.

Now let $X$ and $Y$ be two random variables on $(\Omega,\mathcal A,\mathbb P)$. The random vector $(X,Y)$ induces a probability measure $\nu$ on $\mathcal B^2$ as follows:
\nu(A)=\mathbb P((X,Y)\in A), \quad\forall A\in\mathcal B^2,

the right side being an abbreviation of $\mathbb P(\{\omega\mid (X(\omega),Y(\omega))\in A\})$. This $\nu$ is called the (2-dimensional, probability) distribution or simply the probability measure of $(X,Y)$.

Theorem 5. If $X$ and $Y$ are random variables and $f$ is a Borel measurable function of two variables, then $f(X,Y)$ is a random variable.

Proof. Since $[f\circ(X,Y)]^{-1}(\mathcal B)=(X,Y)^{-1}(f^{-1}(\mathcal B))\subseteq (X,Y)^{-1}(\mathcal B^2)$, we just need to show $(X,Y)^{-1}(\mathcal B^2)\subseteq\mathcal A$. If $A$ is the product of two Borel sets, say $A=B_1\times B_2$ with $B_1,B_2\in\mathcal B$, then clearly $(X,Y)^{-1}(A)=X^{-1}(B_1)\cap Y^{-1}(B_2)\in\mathcal A$. Hence the class of sets $A$ for which $(X,Y)^{-1}(A)\in\mathcal A$ contains the algebra $\mathcal B_0^2$. It can be proved that this class forms a $\sigma$-algebra, so it must contain $\mathcal B^2$ (which is the smallest $\sigma$-algebra containing $\mathcal B_0^2$). We now have the desired conclusion.

Corollary 6. If $X$ is a random variable and $f$ a continuous function on $\mathbb R$, then $f(X)$ is a random variable; in particular $X^n$ for positive integer $n$, $|X|^r$ for positive real $r$, and $\mathrm e^{\lambda X}$, $\mathrm e^{\mathrm itX}$ for real $\lambda$ and $t$ are all random variables. If $X$ and $Y$ are random variables, then $\max\{X,Y\}$, $\min\{X,Y\}$, $X+Y$, $X-Y$, $XY$, and $X/Y$ (provided $Y$ does not vanish) are all random variables.

If $\{X_i\}_{i=1}^\infty$ is a sequence of random variables, then $\inf X_i$, $\sup X_i$, $\liminf X_i$, $\limsup X_i$ are random variables, not necessarily finite-valued with probability one though everywhere defined, and $\lim X_i$ is a random variable on the set $\Delta$ on which there is either convergence or divergence to $\pm\infty$.

The analogue in real analysis should be well known to the reader, so we omit the proof.

Definition 7. A random variable $X$ is called discrete (or countably valued) iff there is a countable set $B\subseteq\mathbb R$ such that $\mathbb P(X\in B)=1$.

It is easy to see that $X$ is discrete iff its distribution function is.

Let $\{\Lambda_j\}$ be a countable partition of $\Omega$ and $\{b_j\}$ arbitrary real numbers; then the function $\varphi$ defined by $\varphi(\omega)=\sum_j b_j\chi_{\Lambda_j}(\omega)$ for all $\omega\in\Omega$ is a discrete random variable. We shall call $\varphi$ the random variable belonging to the weighted partition $\{\Lambda_j;b_j\}$. Each discrete random variable $X$ belongs to a certain partition: let $\{b_j\}$ be the countable set in the definition of $X$ and let $\Lambda_j=\{\omega\mid X(\omega)=b_j\}$; then $X$ belongs to the weighted partition $\{\Lambda_j;b_j\}$. If $j$ ranges over a finite index set, the partition is called finite and the random variable belonging to it simple.
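The weighted-partition construction is easy to make concrete. Here is a small Python sketch; the particular $\Omega$, partition, and weights are illustrative assumptions, not from the text:

```python
# A weighted partition {Lambda_j; b_j} of a finite Omega, and the discrete
# random variable belonging to it: phi(w) = sum_j b_j * chi_{Lambda_j}(w).
Omega = set(range(10))
partition = [({0, 1, 2}, -1.0), ({3, 4}, 0.5), (set(range(5, 10)), 2.0)]

def phi(w):
    # Exactly one Lambda_j contains w, so the sum has a single nonzero term.
    return sum(b for (Lam, b) in partition if w in Lam)

# Sanity check: the Lambda_j are disjoint and cover Omega.
assert set().union(*(Lam for Lam, _ in partition)) == Omega
assert phi(1) == -1.0 and phi(4) == 0.5 and phi(9) == 2.0
```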

Transformation in $\mathbb R^n$

Here we state without proof the transformation formula for measures with continuous densities under differentiable maps; the proof can be found in textbooks on calculus. With this formula we can obtain the densities of transformed random variables.

Theorem 8. Let $\mu$ be a measure on $\mathbb R^n$ that has a continuous density $f:\mathbb R^n\to[0,\infty)$; that is, for any box $\prod_{i=1}^{n}(x_i,y_i]$, it holds that $\mu\left(\prod_{i=1}^{n}(x_i,y_i]\right)=\int_{\prod_{i=1}^{n}(x_i,y_i]} f(\boldsymbol t)\,\mathrm d\boldsymbol t$. Let $A\subseteq\mathbb R^n$ be an open or closed subset of $\mathbb R^n$ with $\mu(\mathbb R^n\setminus A)=0$, and let $B\subseteq\mathbb R^n$ be open or closed. Assume $\varphi:A\to B$ is a continuously differentiable bijection. Then the image measure $\mu\circ\varphi^{-1}$ has the density
f _ {\varphi}(\boldsymbol x)={f(\varphi^{-1}(\boldsymbol x))}\cdot|\det(\varphi'[\varphi^{-1}(\boldsymbol x)])|^{-1}, \quad \boldsymbol x\in B,\, \det(\varphi'[\varphi^{-1}(\boldsymbol x)])\neq0.

For $\boldsymbol x$ elsewhere, $f_\varphi(\boldsymbol x)$ is assigned the value $0$.

Now let $\mu$ be the probability measure of $\boldsymbol X$, i.e., $\mu=\mathbb P\circ\boldsymbol X^{-1}$. We have $\mu(A)=(\mathbb P\circ\boldsymbol X^{-1})(A)=1$ and $(\mu\circ\varphi^{-1})(B)=\mu(A)=1$. Let $\boldsymbol Y^{-1}=\boldsymbol X^{-1}\circ\varphi^{-1}$; then $\boldsymbol Y=\varphi(\boldsymbol X)$, and the range of $\boldsymbol Y$ is $B$. When $\boldsymbol X$ has density $f(\boldsymbol x)$, the theorem gives the density of $\boldsymbol Y$:
f _ {\boldsymbol Y}(\boldsymbol y)={f(\boldsymbol x)}\cdot|\det(\varphi'(\boldsymbol x))|^{-1}={f(\varphi^{-1}(\boldsymbol y))}\cdot|\det(\varphi'[\varphi^{-1}(\boldsymbol y)])|^{-1}.
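As a sanity check of the one-dimensional case, take $X\sim N(0,1)$ and $\varphi(x)=\mathrm e^x$ (an illustrative choice): the formula should reproduce the known log-normal density. A Python sketch:

```python
import math

# X ~ N(0,1) with density f; phi(x) = exp(x) is a C^1 bijection R -> (0, inf).
f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
phi_inv = math.log
phi_der = math.exp   # phi'(x) = exp(x)

# Density of Y = phi(X) from the theorem:
# f_Y(y) = f(phi^{-1}(y)) * |phi'(phi^{-1}(y))|^{-1}
def f_Y(y):
    x = phi_inv(y)
    return f(x) / abs(phi_der(x))

# Compare with the standard log-normal density at a few points.
lognormal = lambda y: math.exp(-math.log(y) ** 2 / 2) / (y * math.sqrt(2 * math.pi))
for y in (0.5, 1.0, 2.0, 5.0):
    assert abs(f_Y(y) - lognormal(y)) < 1e-12
```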

2. Expectation

The concept of "(mathematical) expectation" is the same as that of integration on the probability space with respect to the measure $\mathbb P$. The reader is supposed to have some acquaintance with this, and the general theory is not much different. The random variables below will be tacitly assumed to be finite everywhere to avoid trivial complications.

We first define the expectation of an arbitrary positive random variable $X$. For any two positive integers $m,n$, the set $\Lambda_{mn}=\{\omega\mid n/2^m\leqslant X(\omega)<(n+1)/2^m\}$ belongs to $\mathcal A$. For each $m$, let $X_m$ denote the random variable belonging to the weighted partition $\{\Lambda_{mn};n/2^m\}$. It is easy to see that we have obtained an increasing sequence of random variables with monotone convergence: $X_m(\omega)\uparrow X(\omega)$. The expectation $\mathbb E(X)$ of $X$ is defined as the limit as $m\to\infty$ of
\sum _ {n=0}^{\infty}\frac{n}{2^m}\mathbb{P}\left(\frac{n}{2^m}\leqslant X<\frac{n+1}{2^m}\right),

the limit existing, finite or infinite. For an arbitrary $X$, put as usual $X=X^+-X^-$; both $X^+$ and $X^-$ are positive random variables, so their expectations are defined. Unless both $\mathbb E(X^+)$ and $\mathbb E(X^-)$ are $+\infty$, we define $\mathbb E(X)=\mathbb E(X^+)-\mathbb E(X^-)$ with the usual convention regarding $\infty$. The expectation, when it exists, is also denoted by
\int _ \Omega X(\omega)\, \mathbb{P}(\mathrm{d}\omega).
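The dyadic sums above can be evaluated numerically. The following Python sketch assumes $X\sim\operatorname{Exp}(1)$ (an illustrative choice, not from the text), for which the interval probabilities are explicit and $\mathbb E(X)=1$:

```python
import math

# For X ~ Exp(1): P(a <= X < b) = e^{-a} - e^{-b}.
def P_interval(a, b):
    return math.exp(-a) - math.exp(-b)

def dyadic_expectation(m):
    """E(X_m) = sum_n (n/2^m) P(n/2^m <= X < (n+1)/2^m),
    truncated where the exponential tail is numerically negligible."""
    h = 2.0 ** -m
    total, n = 0.0, 0
    while n * h < 50.0:          # e^{-50} is far below float precision
        total += n * h * P_interval(n * h, (n + 1) * h)
        n += 1
    return total

# E(X_m) increases to E(X) = 1, and X - X_m < 2^{-m} gives the error bound.
assert dyadic_expectation(4) < dyadic_expectation(8) < 1.0
assert 1.0 - dyadic_expectation(10) < 2.0 ** -10
```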

For each $\Lambda\in\mathcal A$, we define
\int _ \Lambda X(\omega)\, \mathbb{P}(\mathrm{d}\omega)=\mathbb{E}(X\cdot\chi _ {\Lambda})

and call it "the integral of $X$ (with respect to $\mathbb P$) over the set $\Lambda$". As a general notation, the left member above will be abbreviated to
\int _ \Lambda X\, \mathrm{d}\mathbb P.

We shall say that $X$ is integrable with respect to $\mathbb P$ over $\Lambda$ iff the integral above exists and is finite.

The general integral has the familiar properties of the Lebesgue integral.

Proposition 9.
  1. Absolute integrability. $\int_\Lambda X\,\mathrm d\mathbb P$ is finite iff $\int_\Lambda|X|\,\mathrm d\mathbb P<\infty$.
  2. Linearity. $\int_\Lambda(aX+bY)\,\mathrm d\mathbb P=a\int_\Lambda X\,\mathrm d\mathbb P+b\int_\Lambda Y\,\mathrm d\mathbb P$, provided that the right side is meaningful, namely not $+\infty-\infty$ or $-\infty+\infty$.
  3. Additivity over sets. If the $\Lambda_n$'s are disjoint, then $\int_{\bigcup\Lambda_n}X\,\mathrm d\mathbb P=\sum_n\int_{\Lambda_n}X\,\mathrm d\mathbb P$.
  4. Positivity. If $X\geqslant0$ a.e. on $\Lambda$, then $\int_\Lambda X\,\mathrm d\mathbb P\geqslant0$.
  5. Monotonicity. If $X_1\leqslant X\leqslant X_2$ a.e. on $\Lambda$, then $\int_\Lambda X_1\,\mathrm d\mathbb P\leqslant\int_\Lambda X\,\mathrm d\mathbb P\leqslant\int_\Lambda X_2\,\mathrm d\mathbb P$.
  6. Mean value theorem. If $a\leqslant X\leqslant b$ a.e. on $\Lambda$, then $a\,\mathbb P(\Lambda)\leqslant\int_\Lambda X\,\mathrm d\mathbb P\leqslant b\,\mathbb P(\Lambda)$.
  7. Modulus inequality. $\left|\int_\Lambda X\,\mathrm d\mathbb P\right|\leqslant\int_\Lambda|X|\,\mathrm d\mathbb P$.
  8. Dominated convergence theorem. If $X_n\to X$ a.e. (or merely in measure) on $\Lambda$ and $|X_n|\leqslant Y$ a.e. on $\Lambda$ with $\int_\Lambda Y\,\mathrm d\mathbb P<\infty$, then
    \lim _ {n\to\infty}\int _ \Lambda X _ n\, \mathrm{d}\mathbb P=\int _ \Lambda\lim _ {n\to\infty} X _ n\, \mathrm{d}\mathbb P=\int _ \Lambda X\, \mathrm{d}\mathbb P.

  9. Bounded convergence theorem. If $X_n\to X$ a.e. (or merely in measure) on $\Lambda$ and there is a constant $M$ such that $|X_n|\leqslant M$ a.e. on $\Lambda$, then the formula in (8) is true.
  10. Monotone convergence theorem. If $X_n\geqslant0$ and $X_n\uparrow X$ a.e. on $\Lambda$, then the formula in (8) is again true provided that $+\infty$ is allowed as a value for either member. The condition "$X_n\geqslant0$" may be weakened to: "$\mathbb E(X_n)>-\infty$ for some $n$".
  11. Integration term by term. If $\sum_n\int_\Lambda|X_n|\,\mathrm d\mathbb P<\infty$, then $\sum_n|X_n|<\infty$ a.e. on $\Lambda$, so that $\sum_n X_n$ converges a.e. on $\Lambda$ and
    \int _ \Lambda\sum _ n X _ n\, \mathrm{d}\mathbb P=\sum _ n\int _ \Lambda X _ n\, \mathrm{d}\mathbb P.

  12. Fatou's lemma. If $X_n\geqslant0$ a.e. on $\Lambda$, then
    \int _ \Lambda\liminf _ {n\to\infty} X _ n\, \mathrm{d}\mathbb P\leqslant\liminf _ {n\to\infty}\int _ \Lambda X _ n\, \mathrm{d}\mathbb P.


Theorem 10. We have
\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n)\leqslant\mathbb E(|X|)\leqslant1+\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n)

so that $\mathbb E(|X|)<\infty$ iff the series above converges.


Corollary 11. If $X$ takes only positive integer values, then

\mathbb E(X)=\sum _ {n=1}^{\infty}\mathbb P(X\geqslant n).

To verify the theorem, set $\Lambda_n=\{n\leqslant|X|<n+1\}$; then $\mathbb E(|X|)=\sum_{n=0}^\infty\int_{\Lambda_n}|X|\,\mathrm d\mathbb P$, and

\sum _ {n=0}^{\infty}n\, \mathbb P(\Lambda _ n)=\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n).

When $X$ only takes positive integer values, this equation is just the above corollary.
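A quick check of the corollary: for a geometric random variable on $\{1,2,\dots\}$ with $\mathbb P(X=k)=(1-p)^{k-1}p$ (an illustrative choice), the tail sum $\sum_n\mathbb P(X\geqslant n)$ reproduces $\mathbb E(X)=1/p$:

```python
from fractions import Fraction

# X geometric on {1, 2, ...}: P(X = k) = (1-p)^{k-1} p,
# so P(X >= n) = (1-p)^{n-1} and E(X) = 1/p.
p = Fraction(1, 4)
tail = lambda n: (1 - p) ** (n - 1)

# Partial sums of sum_{n>=1} P(X >= n) converge to E(X) = 4.
partial = sum(tail(n) for n in range(1, 200))
assert abs(float(partial) - 4.0) < 1e-12
```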


There is a basic relation between the abstract integral with respect to $\mathbb P$ over sets in $\mathcal A$ on the one hand, and the Lebesgue-Stieltjes integral with respect to $\mu$ over sets in $\mathcal B$ on the other, induced by each random variable. We give the one-dimensional version first.

Theorem 12. Let $X$ on $(\Omega,\mathcal A,\mathbb P)$ induce the probability space $(\mathbb R,\mathcal B,\mu)$ according to Theorem 3, and let $f$ be Borel measurable. Then we have
\int _ \Omega f(X(\omega))\, \mathbb P(\mathrm d\omega)=\int _ \mathbb R f(x)\, \mu(\mathrm dx)

provided that either side exists.

The key point of the proof is approximation by simple functions. If $f$ is the characteristic function of a Borel set $B$, then the left side is $\mathbb P(X\in B)$ and the right side is $\mu(B)$; they are equal by definition. Hence the proposition holds when $f$ is a simple function. For positive $f$, we construct an increasing sequence of positive simple functions with $f_n\uparrow f$ and take limits on both sides of the identity for $f_n$. The general case follows in the usual way.

As a consequence of this theorem we have: if $\mu_X$ and $F_X$ denote, respectively, the probability measure and distribution function induced by $X$, then
\mathbb E(X)=\int _ \mathbb R x\, \mu _ X(\mathrm dx)=\int _ {-\infty}^{+\infty}x\, \mathrm dF _ X(x),

and more generally,
\mathbb E(f(X))=\int _ \mathbb R f(x)\, \mu _ X(\mathrm dx)=\int _ {-\infty}^{+\infty}f(x)\, \mathrm dF _ X(x),

with the usual proviso regarding existence and finiteness.
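Theorem 12 can be verified directly on a finite space, where both integrals are finite sums. A Python sketch, in which the particular $\Omega$, $X$, and $f$ are illustrative assumptions:

```python
from fractions import Fraction

# A finite probability space with four equally likely outcomes.
Omega = [0, 1, 2, 3]
P = {w: Fraction(1, 4) for w in Omega}
X = lambda w: (w - 1) ** 2      # values 1, 0, 1, 4
f = lambda x: 2 * x + 3         # a Borel function of the value of X

# Left side: integral of f(X) over Omega against P.
lhs = sum(f(X(w)) * P[w] for w in Omega)

# Right side: integral of f over R against mu_X = P o X^{-1}.
mu = {}
for w in Omega:
    mu[X(w)] = mu.get(X(w), Fraction(0)) + P[w]
rhs = sum(f(x) * p for x, p in mu.items())

assert lhs == rhs
```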

We shall need the generalization of the preceding theorem to several dimensions. No change is necessary except in notation, which we give in two dimensions. Let us write the "mass element" as $\mu^2(\mathrm dx,\mathrm dy)$, so that
\nu(A)=\iint _ A\mu^2(\mathrm dx,\mathrm dy).

Theorem 13. Let $(X,Y)$ on $(\Omega,\mathcal A,\mathbb P)$ induce the probability space $(\mathbb R^2,\mathcal B^2,\mu^2)$ and let $f$ be a Borel measurable function of two variables. Then we have
\int _ \Omega f(X(\omega),Y(\omega))\, \mathbb P(\mathrm d\omega)=\iint _ {\mathbb R^2}f(x,y)\, \mu^2(\mathrm dx,\mathrm dy).

Note that $f(X,Y)$ is a random variable by Theorem 5.

If we take $f(x,y)=x+y$, we obtain
\mathbb E(X+Y)=\mathbb E(X)+\mathbb E(Y).

This is a useful relation.

More generally, we have the following theorem (you can prove it on your own):

Theorem 14 (Image measure). Let $(\Omega,\mathcal A)$ and $(\Omega',\mathcal A')$ be measurable spaces, let $\mu$ be a measure on $(\Omega,\mathcal A)$, and let $X:\Omega\to\Omega'$ be measurable. Let $\mu'=\mu\circ X^{-1}$ be the image measure of $\mu$ under the map $X$. Assume that $f:\Omega'\to\overline{\mathbb R}$ is $\mu'$-integrable. Then $f\circ X$ is $\mu$-integrable and
\int_\Omega (f\circ X)\, \mathrm d\mu=\int_{\Omega'}f\, \mathrm d(\mu\circ X^{-1}).

Moments

Let $a\in\mathbb R$ and $r\in\mathbb R^+$; then $\mathbb E(|X-a|^r)$ is called the absolute moment of $X$ of order $r$, about $a$. It may be $+\infty$; otherwise, and if $r$ is an integer, $\mathbb E(X-a)^r$ is the corresponding moment. If $\mu$ and $F$ denote, respectively, the probability measure and distribution function induced by $X$, then
\begin{align*} \mathbb E|X-a|^r & =\int _ \mathbb R |x-a|^r\, \mu(\mathrm dx)=\int _ {-\infty}^{+\infty}|x-a|^r\, \mathrm dF(x), \\ \mathbb E(X-a)^r & =\int _ \mathbb R (x-a)^r\, \mu(\mathrm dx)=\int _ {-\infty}^{+\infty}(x-a)^r\, \mathrm dF(x). \end{align*}

For $r=1$, $a=0$, this reduces to $\mathbb E(X)$, which is also called the mean of $X$. The moments about the mean are called central moments. That of order $2$ is particularly important and is called the variance, $\operatorname{Var}(X)$; its positive square root is the standard deviation $\sigma(X)$:
\operatorname{Var}(X)=\sigma^2(X)=\mathbb E(X-\mathbb EX)^2=\mathbb E(X^2)-(\mathbb EX)^2.
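The two expressions for the variance can be checked on any discrete law with exact rational arithmetic; a Python sketch (the particular law is an illustrative assumption):

```python
from fractions import Fraction

# A discrete law: values with probabilities summing to 1.
law = {0: Fraction(1, 2), 1: Fraction(1, 3), 6: Fraction(1, 6)}
E = lambda g: sum(g(x) * p for x, p in law.items())   # E(g(X))

mean = E(lambda x: x)
var_central = E(lambda x: (x - mean) ** 2)            # E(X - EX)^2
var_moments = E(lambda x: x * x) - mean ** 2          # E(X^2) - (EX)^2
assert var_central == var_moments
```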

For any positive number $p$, $X$ is said to belong to $L^p=L^p(\Omega,\mathcal A,\mathbb P)$ iff $\mathbb E|X|^p<\infty$.

Here are some well-known inequalities.

Theorem 15. Let $X$ and $Y$ be random variables, $1<p<\infty$ and $1/p+1/q=1$. Then
  1. Hölder's inequality:
    |\mathbb E(XY)|\leqslant \mathbb E|XY|\leqslant (\mathbb E|X|^p)^{1/p}(\mathbb E|Y|^q)^{1/q}.

  2. Minkowski inequality:
    (\mathbb E|X+Y|^p)^{1/p}\leqslant(\mathbb E|X|^p)^{1/p}+(\mathbb E|Y|^p)^{1/p}.

  3. Cauchy-Schwarz inequality:
    (\mathbb E|XY|)^2\leqslant (\mathbb E|X|^2)(\mathbb E|Y|^2).

  4. Taking $Y\equiv1$ in 1.:
    \mathbb E|X|\leqslant (\mathbb E|X|^p)^{1/p}.

  5. Liapounov inequality:
    (\mathbb E|X|^r)^{1/r}\leqslant(\mathbb E|X|^{r^\prime})^{1/{r}^\prime},\quad0<r<r^\prime<\infty.

  6. Jensen's inequality: If $\varphi$ is a convex function on $\mathbb R$, and $X$ and $\varphi(X)$ are integrable random variables, then
    \varphi(\mathbb E(X))\leqslant \mathbb E(\varphi(X)).

  7. Chebyshev inequality: If $\varphi$ is a strictly positive and increasing function on $(0,+\infty)$ with $\varphi(u)=\varphi(-u)$, and $X$ is a random variable such that $\mathbb E(\varphi(X))<\infty$, then for each $u>0$,
    \mathbb P(|X|\geqslant u)\leqslant\frac{\mathbb E(\varphi(X))}{\varphi(u)}.
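With $\varphi(u)=u^2$, this is the classical bound $\mathbb P(|X|\geqslant u)\leqslant\mathbb E(X^2)/u^2$. Here is a Python sketch checking it on a small discrete law (the law is an illustrative assumption):

```python
from fractions import Fraction

# Chebyshev with phi(u) = u^2:  P(|X| >= u) <= E(X^2) / u^2.
law = {-2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 4),
       1: Fraction(1, 4), 3: Fraction(1, 8)}
EX2 = sum(x * x * p for x, p in law.items())          # E(X^2) = 17/8

def tail(u):
    return sum(p for x, p in law.items() if abs(x) >= u)

for u in (1, 2, 3):
    assert tail(u) <= EX2 / Fraction(u) ** 2
```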

