You can read the LaTeX document online (with the latest updated chapters) at the link: probability.pdf
Chapter 3: Random variables

1. Definitions

Roughly speaking, a random variable is just a measurable function.
It is necessary to give a general definition, as in real analysis, for logical reasons in many applications.
A random variable is a function $X$ from a set $\Delta\in\mathcal A$ to the extended real line such that for each Borel set $B$,

\{\omega\mid X(\omega)\in B\}\in\Delta\cap\mathcal A,

where $\Delta\cap\mathcal A$ is the trace of $\mathcal A$ on $\Delta$. A complex-valued random variable is a function on a set in $\mathcal A$ to the complex plane whose real and imaginary parts are both real, finite-valued random variables.

For a discussion of basic properties we may suppose $\Delta=\Omega$ and that $X$ is real and finite-valued with probability one. The general case may be reduced to this one by considering the trace of $(\Omega,\mathcal A,\mathbb P)$ on $\Delta$, or on the "domain of finiteness" $\Delta_0=\{\omega: |X(\omega)|<\infty\}$, and taking real and imaginary parts.
Recalling the theory of measurable functions, we can characterise random variables with the following theorem.
The probability of the set in the definition above is well defined and will be written as
\mathbb P(X(\omega)\in B)\quad\text{or}\quad\mathbb P(X\in B).
The next theorem relates the probability measure $\mathbb P$ to a probability measure $\mu$ on $(\mathbb R,\mathcal B)$, as discussed in Chapter 2.
\mu(B)=\mathbb P(X^{-1}(B))=\mathbb P(X\in B),\quad\forall B\in\mathcal B.
The collection of sets $\{X^{-1}(B): B\in\mathcal B\}$ is a $\sigma$-algebra for any function $X$. If $X$ is a random variable, then this collection is called the $\sigma$-algebra generated by $X$. It is the smallest $\sigma$-algebra contained in $\mathcal A$ which contains all sets of the form $\{\omega: X(\omega)\leqslant x\}$, where $x\in\mathbb R$. Thus Theorem 3 gives us a convenient way of representing the measure $\mathbb P$ when it is restricted to this $\sigma$-algebra; symbolically we may write it as follows:

\mu=\mathbb P\circ X^{-1}.
This is called the "probability distribution measure" or probability measure of $X$, and its associated distribution function $F$ according to Section 2.2 will be called the distribution function of $X$. Specifically, $F$ is given by

F(x)=\mu((-\infty,x])=\mathbb P(X\leqslant x),\quad x\in\mathbb R.

While the random variable $X$ determines $\mu$ and therefore $F$, the converse is obviously false. A family of random variables having the same distribution is said to be "identically distributed".
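As a toy illustration of $\mu=\mathbb P\circ X^{-1}$ and of the distribution function (the six-point sample space and the map $X$ are hypothetical choices, not from the text):

```python
from fractions import Fraction

# A toy probability space: Omega = {1,...,6} with the uniform measure P.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

# A random variable X on Omega (hypothetical choice for illustration).
X = lambda w: w % 3  # takes the values 0, 1, 2

# Induced measure mu(B) = P(X^{-1}(B)) = P(X in B).
def mu(B):
    return sum(P[w] for w in omega if X(w) in B)

# Distribution function F(x) = mu((-inf, x]) = P(X <= x).
def F(x):
    return sum(P[w] for w in omega if X(w) <= x)

assert mu({0, 1, 2}) == 1        # mu is a probability measure on the range of X
assert F(1) == Fraction(2, 3)    # P(X <= 1) = P(X in {0, 1}) = 4/6
```

The measure $\mu$ lives on the line while $\mathbb P$ lives on the sample space; the identity `mu(B) == P(X in B)` is exactly the defining relation above.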
A random vector is just a vector each of whose components is a random variable. It is sufficient to consider the case of two dimensions, since higher dimensions present no essential difference apart from notation.
The class of Borel sets in $\mathbb R^2$ is denoted by $\mathcal B^2$, and the class of sets, each of which is a finite union of disjoint product sets, forms an algebra.
Now let $X$ and $Y$ be two random variables on $(\Omega,\mathcal A,\mathbb P)$. The random vector $(X,Y)$ induces a probability measure $\nu$ on $\mathcal B^2$ as follows:
\nu(A)=\mathbb P((X,Y)\in A), \quad\forall A\in\mathcal B^2,
the right side being an abbreviation of $\mathbb P(\{\omega:(X(\omega),Y(\omega))\in A\})$. This is called the (2-dimensional) probability distribution or simply the probability measure of $(X,Y)$.
Proof. We just need to show that $(X,Y)^{-1}(A)\in\mathcal A$ for every $A\in\mathcal B^2$. If $A$ has the simple form of a product of two Borel sets, say $A=B_1\times B_2$ where $B_1,B_2\in\mathcal B$, then it is clear that $(X,Y)^{-1}(A)=X^{-1}(B_1)\cap Y^{-1}(B_2)\in\mathcal A$. Hence the algebra of finite unions of disjoint product sets is contained in the class of sets $A$ for which $(X,Y)^{-1}(A)\in\mathcal A$. It can be proved that this class forms a $\sigma$-algebra, thus it must contain $\mathcal B^2$ (which is the smallest $\sigma$-algebra containing that algebra). We now have the desired conclusion.
If $\{X_n, n\geqslant1\}$ is a sequence of random variables, then $\inf_n X_n$, $\sup_n X_n$, $\liminf_n X_n$, $\limsup_n X_n$ are random variables, not necessarily finite-valued with probability one though everywhere defined, and $\lim_n X_n$ is a random variable on the set on which there is either convergence or divergence to $+\infty$ or $-\infty$.
The analogue in real analysis should be well known to the reader, so we omit the proof.
It is easy to see that $X$ is discrete iff its distribution function is.
Let $\{\Lambda_j, j\geqslant1\}$ be a countable partition of $\Omega$ and $\{b_j\}$ arbitrary real numbers; then the function $\varphi$ defined by $\varphi(\omega)=b_j$ for $\omega\in\Lambda_j$ is a discrete random variable. We shall call $\varphi$ the random variable belonging to the weighted partition $\{\Lambda_j; b_j\}$. Each discrete random variable $X$ belongs to a certain partition: let $\{b_j\}$ be the countable set of values in the definition of $X$ and let $\Lambda_j=\{\omega: X(\omega)=b_j\}$; then $X$ belongs to the weighted partition $\{\Lambda_j; b_j\}$. If $j$ ranges over a finite index set, the partition is called finite and the r.v. belonging to it simple.
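A small sketch of a simple random variable belonging to a finite weighted partition (the six-point space, the sets $\Lambda_j$ and the weights $b_j$ are hypothetical choices):

```python
# A weighted partition {Lambda_j; b_j} of a six-point sample space.
omega = range(1, 7)
partition = [{1, 2}, {3, 4, 5}, {6}]   # the Lambda_j: disjoint sets covering Omega
weights = [0.0, 1.0, 2.5]              # the b_j

def phi(w):
    """The simple r.v. belonging to the weighted partition: phi = b_j on Lambda_j."""
    for Lam, b in zip(partition, weights):
        if w in Lam:
            return b
    raise ValueError("partition does not cover this sample point")

# phi takes exactly the values b_j, constant on each cell of the partition.
assert sorted({phi(w) for w in omega}) == weights
```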
Transformation in $\mathbb R^n$
Here we state without proof the transformation formula for measures with continuous densities under differentiable maps; the proof can be found in textbooks on calculus. With this formula we can obtain the densities of various transformed random variables.
f _ {\varphi}(\boldsymbol x)={f(\varphi^{-1}(\boldsymbol x))}\cdot|\det(\varphi'[\varphi^{-1}(\boldsymbol x)])|^{-1}, \quad \boldsymbol x\in B,\, \det(\varphi'[\varphi^{-1}(\boldsymbol x)])\neq0.
Elsewhere, $f_\varphi(\boldsymbol x)$ is assigned the value $0$.
Now let $\mu$ be the probability measure of $\boldsymbol X$, i.e., $\mu(A)=\mathbb P(\boldsymbol X\in A)$ for $A\in\mathcal B^n$. Let $\boldsymbol Y=\varphi(\boldsymbol X)$; then $\mathbb P(\boldsymbol Y\in A)=\mathbb P(\boldsymbol X\in\varphi^{-1}(A))=\mu(\varphi^{-1}(A))$, and the range of $\boldsymbol Y$ is the image under $\varphi$ of the range of $\boldsymbol X$. When $\boldsymbol X$ has density $f$, from the theorem the density of $\boldsymbol Y$ is
f _ {\boldsymbol Y}(\boldsymbol y)={f(\boldsymbol x)}\cdot|\det(\varphi'(\boldsymbol x))|^{-1}={f(\varphi^{-1}(\boldsymbol y))}\cdot|\det(\varphi'[\varphi^{-1}(\boldsymbol y)])|^{-1}.
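As a one-dimensional illustration of the formula (a sketch with the hypothetical choice of $X$ standard normal and $\varphi(x)=e^x$, so that $Y=\varphi(X)$ is lognormal):

```python
import math

# Change of variables: X has density f, and Y = phi(X) with phi(x) = exp(x),
# a differentiable map with nonvanishing derivative on the range of X.
f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # standard normal density

phi_inv = math.log    # phi^{-1}(y) = log y
dphi = math.exp       # phi'(x) = exp(x)

def f_Y(y):
    """Density of Y = exp(X): f(phi^{-1}(y)) * |phi'(phi^{-1}(y))|^{-1}."""
    x = phi_inv(y)
    return f(x) / abs(dphi(x))

# Sanity check against the known lognormal density at a few points.
for y in (0.5, 1.0, 2.0):
    lognormal = math.exp(-math.log(y) ** 2 / 2) / (y * math.sqrt(2 * math.pi))
    assert abs(f_Y(y) - lognormal) < 1e-12
```

Here the Jacobian factor $|\varphi'(\varphi^{-1}(y))|^{-1}$ reduces to $1/y$, which is exactly the extra factor in the lognormal density.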
2. Expectation

The concept of "(mathematical) expectation" is the same as that of integration in the probability space $(\Omega,\mathcal A,\mathbb P)$ with respect to the measure $\mathbb P$. The reader is supposed to have some acquaintance with this, and the general theory is not much different. The random variables below will be tacitly assumed to be finite everywhere, to avoid trivial complications.
We first define the expectation of an arbitrary positive random variable $X$. For each positive integer $m$ and each integer $n\geqslant0$, the set $\Lambda_{mn}=\{\omega: n/2^m\leqslant X(\omega)<(n+1)/2^m\}$ belongs to $\mathcal A$. For each $m$, let $X_m$ denote the random variable belonging to the weighted partition $\{\Lambda_{mn}; n/2^m\}$. It is easy to see that we have an increasing sequence of random variables, and there is monotone convergence: $X_m\uparrow X$. The expectation of $X$ is defined as the limit, as $m\to\infty$, of
\sum _ {n=0}^{\infty}\frac{n}{2^m}\mathbb{P}\left(\frac{n}{2^m}\leqslant X<\frac{n+1}{2^m}\right),
the limit existing, finite or infinite. For an arbitrary $X$, put as usual $X=X^+-X^-$; both $X^+$ and $X^-$ are positive random variables, so their expectations are defined. Unless both $\mathbb E(X^+)$ and $\mathbb E(X^-)$ are $+\infty$, we define $\mathbb E(X)=\mathbb E(X^+)-\mathbb E(X^-)$, with the usual convention regarding $\infty$. The expectation, when it exists, is also denoted by
\int _ \Omega X(\omega)\, \mathbb{P}(\mathrm{d}\omega).
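To see the dyadic sums in action numerically, here is a sketch assuming $X\sim\text{Exponential}(1)$, so that $\mathbb P(a\leqslant X<b)=e^{-a}-e^{-b}$ and $\mathbb E(X)=1$; the truncation level $N$ is a practical choice, not part of the definition:

```python
import math

# Dyadic lower sums sum_n (n/2^m) * P(n/2^m <= X < (n+1)/2^m) for X ~ Exp(1).
def dyadic_expectation(m, N=64):
    total = 0.0
    for n in range(0, N * 2 ** m):          # the tail beyond N is negligible here
        a, b = n / 2 ** m, (n + 1) / 2 ** m
        total += a * (math.exp(-a) - math.exp(-b))
    return total

# The sums increase with m and converge to E(X) = 1 from below.
approx = [dyadic_expectation(m) for m in (2, 4, 8)]
assert approx[0] < approx[1] < approx[2] <= 1.0
assert abs(approx[2] - 1.0) < 1e-2
```

Each refinement halves the mesh of the weighted partition, so the approximation error is at most $2^{-m}$.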
For each $\Lambda\in\mathcal A$, we define
\int _ \Lambda X(\omega)\, \mathbb{P}(\mathrm{d}\omega)=\mathbb{E}(X\cdot\chi _ {\Lambda})
and call it "the integral of $X$ (with respect to $\mathbb P$) over the set $\Lambda$". As a general notation, the left member above will be abbreviated to
\int _ \Lambda X\, \mathrm{d}\mathbb P.
We shall say that $X$ is integrable with respect to $\mathbb P$ over $\Lambda$ iff the integral above exists and is finite.
The general integral has the familiar properties of the Lebesgue integral.
- Absolute integrability. $\int_\Lambda X\,\mathrm d\mathbb P$ is finite iff $\int_\Lambda|X|\,\mathrm d\mathbb P<\infty$.
- Linearity. $\int_\Lambda(aX+bY)\,\mathrm d\mathbb P=a\int_\Lambda X\,\mathrm d\mathbb P+b\int_\Lambda Y\,\mathrm d\mathbb P$, provided that the right side is meaningful, namely not $+\infty-\infty$ or $-\infty+\infty$.
- Additivity over sets. If the $\Lambda_n$'s are disjoint, then $\int_{\bigcup_n\Lambda_n}X\,\mathrm d\mathbb P=\sum_n\int_{\Lambda_n}X\,\mathrm d\mathbb P$.
- Positivity. If $X\geqslant0$ a.e. on $\Lambda$, then $\int_\Lambda X\,\mathrm d\mathbb P\geqslant0$.
- Monotonicity. If $X_1\leqslant X\leqslant X_2$ a.e. on $\Lambda$, then $\int_\Lambda X_1\,\mathrm d\mathbb P\leqslant\int_\Lambda X\,\mathrm d\mathbb P\leqslant\int_\Lambda X_2\,\mathrm d\mathbb P$.
- Mean value theorem. If $a\leqslant X\leqslant b$ a.e. on $\Lambda$, then $a\,\mathbb P(\Lambda)\leqslant\int_\Lambda X\,\mathrm d\mathbb P\leqslant b\,\mathbb P(\Lambda)$.
- Modulus inequality. $\bigl|\int_\Lambda X\,\mathrm d\mathbb P\bigr|\leqslant\int_\Lambda|X|\,\mathrm d\mathbb P$.
- Dominated convergence theorem. If $X_n\to X$ a.e. (or merely in measure) on $\Lambda$ and $|X_n|\leqslant Y$ a.e. on $\Lambda$ with $\int_\Lambda Y\,\mathrm d\mathbb P<\infty$, then
\lim _ {n\to\infty}\int _ \Lambda X _ n\, \mathrm d\mathbb P=\int _ \Lambda\lim _ {n\to\infty} X _ n\, \mathrm d\mathbb P=\int _ \Lambda X\, \mathrm d\mathbb P.

- Bounded convergence theorem. If $X_n\to X$ a.e. (or merely in measure) on $\Lambda$ and there is a constant $M$ such that $|X_n|\leqslant M$ a.e. on $\Lambda$, then the convergence formula above holds.
- Monotone convergence theorem. If $X_n\geqslant0$ and $X_n\uparrow X$ a.e. on $\Lambda$, then the convergence formula above is again true, provided that $+\infty$ is allowed as a value for either member. The condition "$X_n\geqslant0$" may be weakened to: "$\mathbb E(X_n)>-\infty$ for some $n$".
- Integration term by term. If $\sum_n\int_\Lambda|X_n|\,\mathrm d\mathbb P<\infty$, then $\sum_n|X_n|<\infty$ a.e. on $\Lambda$, so that $\sum_n X_n$ converges a.e. on $\Lambda$ and

\int _ \Lambda\sum _ n X _ n\, \mathrm d\mathbb P=\sum _ n\int _ \Lambda X _ n\, \mathrm d\mathbb P.

- Fatou's lemma. If $X_n\geqslant0$ a.e. on $\Lambda$, then
\int _ \Lambda\liminf _ {n\to\infty} X _ n\, \mathrm{d}\mathbb P\leqslant\liminf _ {n\to\infty}\int _ \Lambda X _ n\, \mathrm{d}\mathbb P.
There is a basic estimate relating $\mathbb E(|X|)$ to the tail probabilities $\mathbb P(|X|\geqslant n)$:

\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n)\leqslant\mathbb E(|X|)\leqslant1+\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n),

so that $\mathbb E(|X|)<\infty$ iff the series above converges. In particular, if $X$ takes only positive integer values, then

\mathbb E(X)=\sum _ {n=1}^{\infty}\mathbb P(X\geqslant n).

To verify the theorem, set $\Lambda_n=\{\omega: n\leqslant|X(\omega)|<n+1\}$; then $\sum_{n=0}^\infty n\,\mathbb P(\Lambda_n)\leqslant\mathbb E(|X|)\leqslant\sum_{n=0}^\infty(n+1)\,\mathbb P(\Lambda_n)=1+\sum_{n=0}^\infty n\,\mathbb P(\Lambda_n)$, and

\sum _ {n=0}^{\infty}n\, \mathbb P(\Lambda _ n)=\sum _ {n=1}^{\infty}\mathbb P(|X|\geqslant n).

When $X$ takes only positive integer values, $\mathbb E(X)=\sum_{n=0}^\infty n\,\mathbb P(\Lambda_n)$ exactly, so this equation is just the above corollary.
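A quick numerical check of the corollary, assuming (hypothetically) $X\sim\text{Geometric}(p)$ on $\{1,2,\dots\}$, for which $\mathbb E(X)=1/p$:

```python
# Tail-sum formula for a positive integer-valued r.v.:
#     E(X) = sum_{n>=1} P(X >= n),
# checked for X ~ Geometric(p), where P(X = k) = (1-p)^{k-1} p and E(X) = 1/p.
p = 0.3
N = 200  # truncation level; (1-p)^N is negligible at this size

pmf = [(1 - p) ** (k - 1) * p for k in range(1, N + 1)]
mean_direct = sum(k * pk for k, pk in enumerate(pmf, start=1))   # sum k P(X = k)
tail_sum = sum((1 - p) ** (n - 1) for n in range(1, N + 1))      # sum P(X >= n)

assert abs(mean_direct - tail_sum) < 1e-9
assert abs(tail_sum - 1 / p) < 1e-9
```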
There is a basic relation between the abstract integral with respect to $\mathbb P$ over sets in $\mathcal A$ on the one hand, and the Lebesgue-Stieltjes integral with respect to $\mu$ over sets in $\mathcal B$ on the other, induced by each random variable. We give the version in one dimension first: if $X$ induces the measure $\mu$ on $(\mathbb R,\mathcal B)$ and $f$ is a Borel measurable function, then
\int _ \Omega f(X(\omega))\, \mathbb P(\mathrm d\omega)=\int _ \mathbb R f(x)\, \mu(\mathrm dx)
provided that either side exists.
The key point of the proof is approximation by simple functions. If $f$ is the characteristic function of a Borel set $B$, then the left side is $\mathbb P(X\in B)$ and the right side is $\mu(B)$; they are equal by the definition of $\mu$. Hence the proposition holds when $f$ is a simple function. Then we construct a positive increasing sequence of simple functions $f_n$ such that $f_n\uparrow f$, and take limits on both sides of the proposition applied to each $f_n$. The general case follows in the usual way by decomposing $f=f^+-f^-$.
As a consequence of this theorem, we have: if $\mu_X$ and $F_X$ denote, respectively, the probability measure and distribution function induced by $X$, then
\mathbb E(X)=\int _ \mathbb R x\, \mu _ X(\mathrm dx)=\int _ {-\infty}^{+\infty}x\, \mathrm dF _ X(x),
and more generally,
\mathbb E(f(X))=\int _ \mathbb R f(x)\, \mu _ X(\mathrm dx)=\int _ {-\infty}^{+\infty}f(x)\, \mathrm dF _ X(x),
with the usual proviso regarding existence and finiteness.
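The identity $\mathbb E(f(X))=\int f(x)\,\mu_X(\mathrm dx)$ can be checked numerically in a small sketch (standard normal $X$ and $f(x)=x^2$, so $\mathbb E(f(X))=1$; the midpoint integrator and the cutoff $[-10,10]$ are ad-hoc choices):

```python
import math

# E(f(X)) as an integral on the line: int f(x) phi(x) dx for X standard normal.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # density of mu_X
f = lambda x: x * x

def integrate(g, a, b, n=200_000):
    """Plain midpoint rule; accurate enough for this smooth, rapidly decaying integrand."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

second_moment = integrate(lambda x: f(x) * phi(x), -10, 10)
assert abs(second_moment - 1.0) < 1e-6   # E(X^2) = 1 for the standard normal
```

The point is that no reference to the underlying sample space is needed: the computation happens entirely on $(\mathbb R,\mathcal B,\mu_X)$.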
We shall need the generalization of the preceding theorem in several dimensions. No change is necessary except for notation, which we will give in two dimensions. Let us write the "mass element" as $\mu^2(\mathrm dx,\mathrm dy)$ so that
\nu(A)=\iint _ A\mu^2(\mathrm dx,\mathrm dy).
Then, for every Borel measurable function $f$ on $\mathbb R^2$ for which either side exists,

\int _ \Omega f(X(\omega),Y(\omega))\, \mathbb P(\mathrm d\omega)=\iint _ {\mathbb R^2}f(x,y)\, \mu^2(\mathrm dx,\mathrm dy).
Note that $f(X,Y)$ is a random variable.
If we take $f(x,y)=x+y$, we obtain
\mathbb E(X+Y)=\mathbb E(X)+\mathbb E(Y).
This is a useful relation.
More generally, we have the following change-of-variables theorem, for a measurable map $X$ from a measure space $(\Omega,\mu)$ to a measurable space $\Omega'$ and a measurable function $f$ on $\Omega'$ (you can prove it on your own):
\int_\Omega (f\circ X)\, \mathrm d\mu=\int_{\Omega'}f\, \mathrm d(\mu\circ X^{-1}).
Moments

Let $a\in\mathbb R$ and $r>0$; then $\mathbb E|X-a|^r$ is called the absolute moment of $X$ of order $r$, about $a$. It may be $+\infty$; otherwise, and if $r$ is an integer, $\mathbb E(X-a)^r$ is the corresponding moment. If $\mu$ and $F$ denote, respectively, the probability measure and distribution function induced by $X$, then
\begin{align*}
\mathbb E|X-a|^r & =\int _ \mathbb R |x-a|^r\, \mu(\mathrm dx)=\int _ {-\infty}^{+\infty}|x-a|^r\, \mathrm dF(x), \\
\mathbb E(X-a)^r & =\int _ \mathbb R (x-a)^r\, \mu(\mathrm dx)=\int _ {-\infty}^{+\infty}(x-a)^r\, \mathrm dF(x).
\end{align*}
For $r=1$, $a=0$, this reduces to $\mathbb E(X)$, which is also called the mean of $X$. The moments about the mean are called central moments. That of order $2$ is particularly important and is called the variance, $\operatorname{Var}(X)$; its positive square root is the standard deviation $\sigma(X)$:
\operatorname{Var}(X)=\sigma^2(X)=\mathbb E(X-\mathbb EX)^2=\mathbb E(X^2)-(\mathbb EX)^2.
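The two expressions for the variance can be checked on a toy discrete distribution (the values and probabilities are arbitrary choices for illustration):

```python
# Var(X) = E((X - EX)^2) = E(X^2) - (EX)^2, checked on a small discrete law.
values = [(-1, 0.2), (0, 0.5), (2, 0.3)]        # pairs (value, probability)

mean = sum(x * p for x, p in values)            # E(X)   = -0.2 + 0 + 0.6 = 0.4
second = sum(x * x * p for x, p in values)      # E(X^2) =  0.2 + 0 + 1.2 = 1.4
var_central = sum((x - mean) ** 2 * p for x, p in values)

assert abs(var_central - (second - mean ** 2)) < 1e-12   # both give 1.24
```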
For any positive number $p$, $X$ is said to belong to $L^p=L^p(\Omega,\mathcal A,\mathbb P)$ iff $\mathbb E(|X|^p)<\infty$.
Here are some well-known inequalities.
- Hölder's inequality: for $p>1$ and $1/p+1/q=1$,

|\mathbb E(XY)|\leqslant \mathbb E|XY|\leqslant (\mathbb E|X|^p)^{1/p}(\mathbb E|Y|^q)^{1/q}.

- Minkowski's inequality: for $p\geqslant1$,

(\mathbb E|X+Y|^p)^{1/p}\leqslant(\mathbb E|X|^p)^{1/p}+(\mathbb E|Y|^p)^{1/p}.

- Cauchy-Schwarz inequality:

(\mathbb E|XY|)^2\leqslant (\mathbb E|X|^2)(\mathbb E|Y|^2).

- If we take $Y\equiv1$ in Hölder's inequality,

\mathbb E|X|\leqslant (\mathbb E|X|^p)^{1/p}.

- Liapounov's inequality:

(\mathbb E|X|^r)^{1/r}\leqslant(\mathbb E|X|^{r'})^{1/r'},\quad 0<r<r'<\infty.

- Jensen's inequality: if $\varphi$ is a convex function on $\mathbb R$, and $X$ and $\varphi(X)$ are integrable random variables, then

\varphi(\mathbb E(X))\leqslant \mathbb E(\varphi(X)).

- Chebyshev's inequality: if $\varphi$ is a strictly positive and increasing function on $(0,\infty)$, $\varphi(u)=\varphi(-u)$, and $X$ is a random variable such that $\mathbb E(\varphi(X))<\infty$, then for each $u>0$,

\mathbb P(|X|\geqslant u)\leqslant\frac{\mathbb E(\varphi(X))}{\varphi(u)}.
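A numerical sanity check of the Chebyshev inequality with $\varphi(x)=x^2$ (the five-point uniform distribution is a hypothetical example):

```python
# Chebyshev-type bound: P(|X| >= u) <= E(phi(X)) / phi(u) with phi(x) = x^2,
# checked for X uniform on {-2, -1, 0, 1, 2}.
values = [-2, -1, 0, 1, 2]
p = 1 / len(values)

E_phi = sum(x * x * p for x in values)              # E(X^2) = (4+1+0+1+4)/5 = 2
for u in (1.0, 1.5, 2.0):
    prob = sum(p for x in values if abs(x) >= u)    # exact P(|X| >= u)
    assert prob <= E_phi / (u * u) + 1e-12          # the bound holds at each u
```

At $u=2$ the bound gives $0.5$ against the true probability $0.4$, so the inequality can be fairly tight; at small $u$ it is vacuous, as expected.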