Functions of several variables

Chapter 9: Functions of several variables

Contents
 1.  Linear transformations
 2.  Differentiation
 3.  The contraction principle
 4.  The inverse function theorem
 5.  The implicit function theorem
 6.  The rank theorem
 7.  Determinants
 8.  Derivatives of higher order
 9.  Differentiation of integrals

1. Linear transformations

We begin this chapter with a discussion of sets of vectors in Euclidean n-space \mathbb R^n. The algebraic facts presented here extend without change to finite-dimensional vector spaces over any field of scalars. However, for our purposes it is quite sufficient to stay within the familiar framework provided by the Euclidean spaces.

This section can be a review of some linear algebra.

Definition 1.
  • A nonempty set X\subset \mathbb{R}^n is a \textbf{vector space} if \mathbf{x+y}\in X and c\mathbf{x}\in X for all \mathbf{x}\in X, \mathbf{y}\in X, and for all scalars c.
  • If \mathbf{x} _ 1,\dots,\mathbf{x} _ k\in\mathbb{R}^n and c _ 1,\dots,c _ k are scalars, the vector c _ 1\mathbf{x} _ 1+\dots+c _ k\mathbf{x} _ k is called a \textbf{linear combination} of \mathbf{x} _ 1,\dots,\mathbf{x} _ k. If S\subset \mathbb{R}^n and if E is the set of all linear combinations of elements of S, we say that S spans E, or that E is the span of S. (Observe that every span is a vector space.)
  • A set consisting of vectors \mathbf{x} _ 1,\dots,\mathbf{x} _ k (we shall use the notation \{\mathbf{x} _ 1,\dots,\mathbf{x} _ k\} for such a set) is said to be \textbf{independent} if the relation c _ 1\mathbf{x} _ 1+\dots+c _ k\mathbf{x} _ k = 0 implies that c _ 1 = \dots = c _ k = 0. Otherwise \{\mathbf{x} _ 1,\dots,\mathbf{x} _ k\} is said to be \textbf{dependent}.
  • If a vector space X contains an independent set of r vectors but contains no independent set of r + 1 vectors, we say that X has dimension r, and write \dim X = r. (The set consisting of \mathbf{0} alone is a vector space; its dimension is 0.)
  • An independent subset of a vector space X which spans X is called a basis of X.

Observe that if B = \{\mathbf{x} _ 1,\dots,\mathbf{x} _ r\} is a basis of X, then every \mathbf{x} \in X has a unique representation of the form \mathbf{x}=\sum c _ j\mathbf{x} _ j. Such a representation exists since B spans X, and it is unique since B is independent. The numbers c _ 1, \dots , c _ r are called the coordinates of \mathbf{x} with respect to the basis B.

The most familiar example of a basis is the set \{\mathbf{e} _ 1, \dots , \mathbf{e} _ n\}, where \mathbf{e} _ j is the vector in \mathbb{R}^n whose jth coordinate is 1 and whose other coordinates are all 0. If \mathbf{x} \in \mathbb{R}^n, \mathbf{x} = (x _ 1, \dots , x _ n), then \mathbf{x} = \sum x _ j\mathbf{e} _ j. We shall call \{\mathbf{e} _ 1, \dots , \mathbf{e} _ n\} the standard basis of \mathbb{R}^n.

Theorem 2. Let r be a positive integer. If a vector space X is spanned by a set of r vectors, then \dim X\leqslant r.

Corollary 3. \dim \mathbb{R}^n=n.

Theorem 4. Suppose X is a vector space, and \dim X= n.
  1. A set E of n vectors in X spans X if and only if E is independent.
  2. X has a basis, and every basis consists of n vectors.
  3. If 1\leqslant r\leqslant n and \{\mathbf{y} _ 1,\dots,\mathbf{y} _ r\} is an independent set in X, then X has a basis containing \{\mathbf{y} _ 1,\dots,\mathbf{y} _ r\}.

Definition 5. A mapping A of a vector space X into a vector space Y is said to be a \textbf{linear transformation} if A(\mathbf{x} _ 1+\mathbf{x} _ 2)=A\mathbf{x} _ 1+A\mathbf{x} _ 2, A(c\mathbf{x})=cA\mathbf{x} for all \mathbf{x},\mathbf{x} _ 1,\mathbf{x} _ 2\in X and all scalars c. Note that one often writes A\mathbf{x} instead of A(\mathbf{x}) if A is linear.

Observe that A\mathbf{0} = \mathbf{0} if A is linear. Observe also that a linear transformation A of X into Y is completely determined by its action on any basis.

Linear transformations of X into X are often called linear operators on X. If A is a linear operator on X which (i) is one-to-one and (ii) maps X onto X, we say that A is invertible. In this case we can define an operator A^{-1} on X by requiring that A^{-1}(A\mathbf{x}) = \mathbf{x} for all \mathbf{x} \in X. It is trivial to verify that we then also have A(A^{-1}\mathbf{x}) = \mathbf{x}, for all \mathbf{x} \in X, and that A^{-1} is linear.

An important fact about linear operators on finite-dimensional vector spaces is that each of the above conditions (i) and (ii) implies the other:

Theorem 6. A linear operator A on a finite-dimensional vector space X is one-to-one if and only if the range of A is all of X.

Definition 7.
  • Let L(X, Y) be the set of all linear transformations of the vector space X into the vector space Y. Instead of L(X, X), we shall simply write L(X). If A _ 1, A _ 2 \in L(X, Y) and if c _ 1, c _ 2 are scalars, define c _ 1A _ 1 + c _ 2 A _ 2 by (c _ 1A _ 1 + c _ 2 A _ 2)\mathbf{x}=c _ 1A _ 1\mathbf{x} + c _ 2 A _ 2\mathbf{x}\, (\mathbf{x}\in X). It is then clear that c _ 1A _ 1 + c _ 2 A _ 2\in L(X, Y).
  • If X, Y, Z are vector spaces, and if A \in L(X, Y) and B\in L(Y, Z), we define their product BA to be the composition of A and B: (BA)\mathbf{x}=B(A\mathbf{x})\, (\mathbf{x}\in X). Then BA\in L(X,Z). (Note that BA need not be the same as AB, even if X= Y = Z.)
  • For A\in L(\mathbb{R}^n,\mathbb{R}^m), define the \textbf{norm} |A| of A to be the sup of all numbers |A\mathbf{x}|, where \mathbf{x} ranges over all vectors in \mathbb{R}^n with |\mathbf{x}|\leqslant1.

Observe that the inequality |A\mathbf{x}|\leqslant|A||\mathbf{x}| holds for all \mathbf{x}\in\mathbb{R}^n. Also, if \lambda is such that |A\mathbf{x}|\leqslant\lambda|\mathbf{x}| for all \mathbf{x}\in\mathbb{R}^n, then |A|\leqslant\lambda.
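
For the Euclidean norms used here, |A| is the largest singular value of the matrix of A, so it can be computed numerically. A minimal numpy sketch (the matrix below is an arbitrary illustration, not taken from the text) that also checks the inequality |A\mathbf{x}|\leqslant|A||\mathbf{x}|:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # an arbitrary linear map R^5 -> R^3

# |A| = sup{|Ax| : |x| <= 1}; for the Euclidean norms used here this is
# the largest singular value (the spectral norm) of the matrix.
opnorm = np.linalg.norm(A, 2)

# Check |Ax| <= |A||x| on a batch of random vectors.
for _ in range(1000):
    x = rng.standard_normal(5)
    assert np.linalg.norm(A @ x) <= opnorm * np.linalg.norm(x) + 1e-12
```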

Theorem 8.
  1. If A\in L(\mathbb{R}^n,\mathbb{R}^m), then |A|<\infty and A is a uniformly continuous mapping of \mathbb{R}^n into \mathbb{R}^m.
  2. If A,B\in L(\mathbb{R}^n,\mathbb{R}^m) and c is a scalar, then |A+B|\leqslant|A|+|B|, |cA|=|c||A|. With the distance between A and B defined as |A- B|, L(\mathbb{R}^n,\mathbb{R}^m) is a metric space.
  3. If A\in L(\mathbb{R}^n,\mathbb{R}^m) and B\in L(\mathbb{R}^m,\mathbb{R}^k), then |BA|\leqslant|B||A|.

Since we now have metrics in the spaces L(\mathbb{R}^n,\mathbb{R}^m), the concepts of open set, continuity, etc., make sense for these spaces. Our next theorem utilizes these concepts.

Theorem 9. Let \Omega be the set of all invertible linear operators on \mathbb{R}^n.
  1. If A\in\Omega, B\in L(\mathbb{R}^n), and |B-A||A^{-1}|<1, then B\in\Omega.
  2.  \Omega is an open subset of L(\mathbb{R}^n), and the mapping A\to A^{-1} is continuous on \Omega. (This mapping is also a 1-1 mapping of \Omega onto \Omega, which is its own inverse.)

Proof of Theorem 8. (1) Suppose \mathbf{x}=\sum c _ i\mathbf{e} _ i, |\mathbf{x}|\leqslant1, so that |c _ i|\leqslant1 for i=1,\dots,n. Then
\[|A\mathbf{x}|=\Big|\sum _ {i=1}^{n} c _ iA\mathbf{e} _ i\Big|\leqslant\sum _ {i=1}^{n}|c _ i||A\mathbf{e} _ i| \leqslant\sum _ {i=1}^{n}|A\mathbf{e} _ i| \Longrightarrow|A|\leqslant\sum _ {i=1}^{n} |A\mathbf{e} _ i|<\infty.\]

Since |A\mathbf{x}-A\mathbf{y}|\leqslant|A||\mathbf{x}-\mathbf{y}| if \mathbf{x},\mathbf{y}\in \mathbb{R}^n, we see that A is uniformly continuous.

(2) The first inequality follows from
\[|(A+B)\mathbf{x}|=|A\mathbf{x}+B\mathbf{x}|\leqslant|A\mathbf{x}|+|B\mathbf{x}|\leqslant(|A|+|B|)|\mathbf{x}|.\]
The second inequality is proved in the same manner.

If A,B,C\in L(\mathbb{R}^n,\mathbb{R}^m), we have the inequality |A-C|=|A-B+B-C|\leqslant |A-B|+|B-C|, and it is easily verified that | A - B| has the other properties of a metric.

(3) This follows from
\[|(BA)\mathbf{x}|=|B(A\mathbf{x})|\leqslant|B||A\mathbf{x}|\leqslant|B||A||\mathbf{x}|.\]

Proof of Theorem 9. (1) Put |A^{-1}|=1/\alpha and put |B-A|=\beta. The hypothesis gives \beta<\alpha. For every \mathbf{x}\in \mathbb{R}^n,
\begin{gather*}
\alpha|\mathbf{x}|=\alpha|A^{-1}A\mathbf{x}|\leqslant\alpha|A^{-1}|\cdot|A\mathbf{x}| =|A\mathbf{x}|\leqslant|(A-B)\mathbf{x}|+|B\mathbf{x}|\leqslant\beta|\mathbf{x}|+|B\mathbf{x}|, \\
\Longrightarrow\quad (\alpha-\beta)|\mathbf{x}|\leqslant|B\mathbf{x}|\quad (\mathbf{x}\in \mathbb{R}^n).
\end{gather*}

Since \alpha-\beta>0, this shows that B\mathbf{x}\neq\mathbf{0} if \mathbf{x}\neq\mathbf{0}. Hence B is 1-1. By Theorem 6, B\in\Omega. This holds for all B with |B-A|<\alpha. Thus we have (1) and the fact that \Omega is open.

(2) Next, replace \mathbf{x} by B^{-1}\mathbf{y} in the above formula. The resulting inequality (\alpha-\beta)|B^{-1}\mathbf{y}|\leqslant|BB^{-1}\mathbf{y}|=|\mathbf{y}|\, (\mathbf{y}\in \mathbb{R}^n) shows that |B^{-1}|\leqslant(\alpha-\beta)^{-1}. The identity B^{-1}-A^{-1}=B^{-1}(A-B)A^{-1}, combined with Theorem theorem914(3), implies therefore that
\[|B^{-1}-A^{-1}|\leqslant|B^{-1}||A-B||A^{-1}|\leqslant\frac{\beta}{\alpha(\alpha-\beta)}.\]
This establishes the continuity assertion made in (2), since \beta\to 0 as B\to A.
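
The bound just obtained can be checked numerically. The sketch below uses an arbitrary, comfortably invertible matrix A and a small perturbation B=A+E; it only illustrates the inequality |B^{-1}-A^{-1}|\leqslant\beta/(\alpha(\alpha-\beta)).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 3 * n * np.eye(n)   # comfortably invertible
alpha = 1.0 / np.linalg.norm(np.linalg.inv(A), 2)     # |A^{-1}| = 1/alpha

# A perturbation with |B - A| = beta < alpha, so B is invertible as well.
E = rng.standard_normal((n, n))
E *= 0.1 * alpha / np.linalg.norm(E, 2)
B = A + E
beta = np.linalg.norm(B - A, 2)

lhs = np.linalg.norm(np.linalg.inv(B) - np.linalg.inv(A), 2)
rhs = beta / (alpha * (alpha - beta))
print(lhs, rhs)   # lhs <= rhs, as part (2) of the proof asserts
```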

Matrices
Suppose \{\mathbf{x} _ 1, \dots , \mathbf{x} _ n\} and \{\mathbf{y} _ 1, \dots , \mathbf{y} _ m\} are bases of vector spaces
X and Y, respectively. Then every A \in L(X, Y) determines a set of numbers a _ {ij} such that A\mathbf{x} _ {j}=\sum _ {i=1}^{m}a _ {ij}\mathbf{y} _ i\, (1\leqslant j\leqslant n). It is convenient to visualize these numbers in a rectangular array of m rows and n columns, called an m by n matrix:
\[[A]=
\begin{bmatrix}
a _ {11} & a _ {12} & \cdots & a _ {1n} \\
a _ {21} & a _ {22} & \cdots & a _ {2n} \\
\vdots & \vdots & \ddots & \vdots \\
a _ {m1} & a _ {m2} & \cdots & a _ {mn}
\end{bmatrix}.
\]
Observe that the coordinates a _ {ij} of the vector A\mathbf{x} _ j (with respect to the basis \{\mathbf{y} _ 1, \dots , \mathbf{y} _ m\}) appear in the jth column of [A]. The vectors A\mathbf{x} _ j are therefore sometimes called the column vectors of [A]. With this terminology, the range of A is spanned by the column vectors of [A].

If \mathbf{x}=\sum c _ j\mathbf{x} _ j, the linearity of A shows that A\mathbf{x}=\sum _ {i=1}^m(\sum _ {j=1}^na _ {ij}c _ j)\mathbf{y} _ i. Thus the coordinates of A\mathbf{x} are \sum _ ja _ {ij}c _ j.

Suppose next that an m by n matrix is given, with real entries a _ {ij}. If A is then defined by A\mathbf{x}=\sum _ i(\sum _ j a _ {ij}c _ j)\mathbf{y} _ i, it is clear that A \in L(X, Y) and that [A] is the given matrix. Thus there is a natural 1-1 correspondence between L(X, Y) and the set of all real m by n matrices. We emphasize, though, that [A] depends not only on A but also on the choice of bases in X and Y. The same A may give rise to many different matrices if we change bases, and vice versa. We shall not pursue this observation any further, since we shall usually work with fixed bases.

If Z is a third vector space, with basis \{\mathbf{z} _ 1, \dots , \mathbf{z} _ p\}, if A is given by A\mathbf{x} _ {j}=\sum _ {i=1}^{m}a _ {ij}\mathbf{y} _ i, if B\mathbf{y} _ i=\sum _ k b _ {ki}\mathbf{z} _ k, and if we define c _ {kj} by (BA)\mathbf{x} _ j=\sum _ k c _ {kj}\mathbf{z} _ k, then since B(A\mathbf{x} _ j)=B\sum _ i a _ {ij}\mathbf{y} _ i=\sum _ i a _ {ij}B\mathbf{y} _ i=\sum _ i a _ {ij}\sum _ k b _ {ki}\mathbf{z} _ k =\sum _ k\left(\sum _ i b _ {ki}a _ {ij}\right)\mathbf{z} _ k, the independence of \{\mathbf{z} _ 1, \dots , \mathbf{z} _ p\} implies that c _ {kj}=\sum _ i b _ {ki}a _ {ij}\, (1\leqslant k\leqslant p,\, 1\leqslant j\leqslant n). This shows how to compute the p by n matrix [BA] from [B] and [A]. If we define the product [B][A] to be [BA], then this describes the usual rule of matrix multiplication.

Finally, suppose \{\mathbf{x} _ 1,\dots , \mathbf{x} _ n\} and \{\mathbf{y} _ 1,\dots , \mathbf{y} _ m\} are the standard bases of \mathbb{R}^n and \mathbb{R}^m, and A is given by A\mathbf{x}=\sum _ {i=1}^m(\sum _ {j=1}^na _ {ij}c _ j)\mathbf{y} _ i. The Schwarz inequality shows that
\begin{gather*}
|A\mathbf{x}|^2=\sum _ i\Big(\sum _ j a _ {ij}c _ j\Big)^2\leqslant\sum _ i\Big(\sum _ j a _ {ij}^2\sum _ jc _ j^2\Big) =\sum _ {i,j}a _ {ij}^2|\mathbf{x}|^2 \\
\Longrightarrow \quad |A|\leqslant\Big\{\sum _ {i,j}a _ {ij}^2\Big\}^{1/2}.
\end{gather*}

If we apply this to B-A in place of A, where A, B\in L(\mathbb{R}^n, \mathbb{R}^m), we see that if the matrix elements a _ {ij} are continuous functions of a parameter, then the same is true of A. More precisely:

If S is a metric space, if a _ {11} , \dots , a _ {mn} are real continuous functions on S, and if, for each p\in S, A _ p is the linear transformation of \mathbb{R}^n into \mathbb{R}^m whose matrix has entries a _ {ij}(p), then the mapping p \to A _ p is a continuous mapping of S into L(\mathbb{R}^n,\mathbb{R}^m).
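
The estimate |A|\leqslant\{\sum _ {i,j}a _ {ij}^2\}^{1/2} says that the operator norm is dominated by the Frobenius norm of [A]; a quick numerical sanity check (the matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

spectral  = np.linalg.norm(A, 2)      # |A| = sup over |x| <= 1 of |Ax|
frobenius = np.linalg.norm(A, 'fro')  # (sum_{i,j} a_ij^2)^{1/2}
print(spectral, frobenius)            # spectral <= frobenius always holds
```
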
2. Differentiation
Definition 10. Suppose E is an open set in \mathbb{R}^n, f maps E into \mathbb{R}^m, and \mathbf{x} \in E. If there exists a linear transformation A of \mathbb{R}^n into \mathbb{R}^m such that
\begin{equation}\label{91}
\lim _ {\mathbf{h}\to\mathbf{0}}\frac{|\mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-A\mathbf{h}|}{|\mathbf{h}|}=0,
\end{equation}
then we say that \mathbf{f} is differentiable at \mathbf{x}, and we write
\[\mathbf{f}^\prime (\mathbf{x})=A.\]
If \mathbf{f} is differentiable at every \mathbf{x} \in E, we say that \mathbf{f} is differentiable in E.

It is of course understood in (91) that \mathbf{h}\in \mathbb{R}^n. If |\mathbf{h}| is small enough, then \mathbf{x}+\mathbf{h}\in E, since E is open. Thus \mathbf{f}(\mathbf{x}+\mathbf{h}) is defined, \mathbf{f}(\mathbf{x}+\mathbf{h})\in \mathbb{R}^m, and since A\in L(\mathbb{R}^n,\mathbb{R}^m), A\mathbf{h}\in \mathbb{R}^m. Thus \mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-A\mathbf{h}\in \mathbb{R}^m. The norm in the numerator of (91) is that of \mathbb{R}^m. In the denominator we have the \mathbb{R}^n-norm of \mathbf{h}.

There is an obvious uniqueness problem which has to be settled before we go any further.

Theorem 11. Suppose E and \mathbf{f} are as in Definition 10, \mathbf{x}\in E, and (91) holds with A=A _ 1 and A=A _ 2. Then A _ 1=A _ 2.

If B=A _ 1-A _ 2, the inequality |B\mathbf{h}|\leqslant|\mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-A _ 1\mathbf{h}|+ |\mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-A _ 2\mathbf{h}| shows that |B\mathbf{h}|/|\mathbf{h}|\to0 as \mathbf{h}\to\mathbf{0}. For fixed \mathbf{h}\neq\mathbf{0}, it follows that |B(t\mathbf{h})|/|t\mathbf{h}|\to0 as t\to0. The linearity of B shows that |B(t\mathbf{h})|/|t\mathbf{h}| is independent of t. Thus B\mathbf{h}=\mathbf{0} for every \mathbf{h}\in \mathbb{R}^n. Hence B=0.

The relation (91) can be rewritten in the form
\begin{equation}\label{92}
\mathbf{f(x+h)-f(x)=f^\prime (x)h+r(h)}
\end{equation}
where the remainder \mathbf{r(h)} satisfies \displaystyle\lim _ {\mathbf{h}\to\mathbf{0}}|\mathbf{r}(\mathbf{h})|/|\mathbf{h}|=0. We may interpret (92) by saying that for fixed \mathbf{x} and small \mathbf{h}, the left side of (92) is approximately equal to \mathbf{f^\prime (x)h}, that is, to the value of a linear transformation applied to \mathbf{h}.

Suppose \mathbf{f} and E are as in Definition 10, and \mathbf{f} is differentiable in E. For every \mathbf{x}\in E, \mathbf{f^\prime (x)} is then a function, namely, a linear transformation of \mathbb{R}^n into \mathbb{R}^m. But \mathbf{f}^\prime is also a function: \mathbf{f}^\prime maps E into L(\mathbb{R}^n, \mathbb{R}^m).

A glance at (92) shows that \mathbf{f} is continuous at any point at which \mathbf{f} is differentiable.

The derivative defined by (91) or (92) is often called the differential of \mathbf{f} at \mathbf{x}, or the total derivative of \mathbf{f} at \mathbf{x}, to distinguish it from the partial derivatives that will occur later.

We have defined derivatives of functions carrying \mathbb{R}^n to \mathbb{R}^m to be linear transformations of \mathbb{R}^n into \mathbb{R}^m. What is the derivative of such a linear transformation? The answer is very simple.

If A\in L(\mathbb{R}^n,\mathbb{R}^m) and if \mathbf{x}\in \mathbb{R}^n, then A^\prime (\mathbf{x})=A.

Note that \mathbf{x} appears on the left side of the formula, but not on the right. Both sides are members of L(\mathbb{R}^n, \mathbb{R}^m), whereas A\mathbf{x}\in \mathbb{R}^m. (Since A(\mathbf{x+h})-A\mathbf{x}=A\mathbf{h}, by the linearity of A. With \mathbf{f(x)}=A\mathbf{x}, the numerator in (91) is thus 0 for every \mathbf{h}\in \mathbb{R}^n. In (92), \mathbf{r(h)=0}.)

Theorem 12. Suppose E is an open set in \mathbb{R}^n, f maps E into \mathbb{R}^m, \mathbf{f} is differentiable at \mathbf{x} _ 0 \in E, \mathbf{g} maps an open set containing \mathbf{f}(E) into \mathbb{R}^k, and \mathbf{g} is differentiable at \mathbf{f}(\mathbf{x} _ 0). Then the mapping \mathbf{F} of E into \mathbb{R}^k defined by \mathbf{F(x)=g(f(x))} is differentiable at \mathbf{x} _ 0, and
\begin{equation}\label{93}
\mathbf{F}^\prime (\mathbf{x} _ 0)=\mathbf{g}^\prime (\mathbf{f}(\mathbf{x} _ 0))\mathbf{f}^\prime (\mathbf{x} _ 0).
\end{equation}

On the right side of (93), we have the product of two linear transformations, as defined in Definition 7.

If \mathbf{f} is known to be differentiable at a point \mathbf{x}, then its partial derivatives exist at \mathbf{x}, and they determine the linear transformation \mathbf{f^\prime (x)} completely:

Theorem 13. Suppose \mathbf{f} maps an open set E\subset \mathbb{R}^n into \mathbb{R}^m, and \mathbf{f} is differentiable at a point \mathbf{x}\in E. Then the partial derivatives (D _ jf _ i)(\mathbf{x}) exist, and
\begin{equation}\label{94}
\mathbf{f^\prime (x)}\mathbf{e} _ j=\sum _ {i=1}^{m}(D _ jf _ i)(\mathbf{x})\mathbf{u} _ i\quad (1\leqslant j\leqslant n).
\end{equation}
Here, \{\mathbf{e} _ 1 , \dots , \mathbf{e} _ n\} and \{\mathbf{u} _ 1, \dots , \mathbf{u} _ m\} are the standard bases of \mathbb{R}^n and \mathbb{R}^m.

Here are some consequences of Theorem 13. Let [\mathbf{f^\prime (x)}] be the matrix that represents \mathbf{f^\prime (x)} with respect to our standard bases.

Then \mathbf{f^\prime (x)}\mathbf{e} _ j is the jth column vector of [\mathbf{f^\prime (x)}], and (94) shows therefore that the number (D _ jf _ i)(\mathbf{x}) occupies the spot in the ith row and jth column of \mathbf{[f^\prime (x)]}. Thus
\[[\mathbf{f^\prime (x)}]=
\begin{bmatrix}
(D _ 1f _ 1)(\mathbf{x}) & \cdots & (D _ nf _ 1)(\mathbf{x}) \\
\vdots & \ddots & \vdots \\
(D _ 1f _ m)(\mathbf{x}) & \cdots & (D _ nf _ m)(\mathbf{x})
\end{bmatrix}.
\]
If \mathbf{h}=\sum h _ j\mathbf{e} _ j is any vector in \mathbb{R}^n, then (94) implies that
\[\mathbf{f}^\prime (\mathbf{x}) \mathbf{h}=\sum _ {i=1}^{m}\Big\{\sum _ {j=1}^{n}(D _ {j} f _ {i})(\mathbf{x}) h _ {j}\Big\} \mathbf{u} _ {i}.\]
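
For a concrete map, the matrix [\mathbf{f^\prime (x)}] of partial derivatives can be approximated by finite differences and compared with \mathbf{f(x+h)-f(x)}, as (92) and (94) suggest. The map f in the sketch below is an arbitrary example, not one appearing in the text.

```python
import numpy as np

def f(x):
    # An arbitrary C' map from R^2 to R^3, used only for illustration.
    return np.array([x[0] * x[1], np.sin(x[0]), x[0] + x[1] ** 2])

def jacobian(f, x, eps=1e-6):
    # Column j of [f'(x)] holds the partials D_j f_i (formula (94)),
    # approximated here by forward differences.
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (f(x + e) - fx) / eps
    return J

x = np.array([0.7, -0.3])
h = 1e-4 * np.array([1.0, 2.0])
J = jacobian(f, x)
# By (92), f(x+h) - f(x) - f'(x)h is o(|h|); the residual below is tiny.
print(np.linalg.norm(f(x + h) - f(x) - J @ h))
```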

Example. Let \gamma be a differentiable mapping of the segment (a, b) \subset \mathbb{R}^1 into an open set E \subset\mathbb{R}^n; in other words, \gamma is a differentiable curve in E. Let f be a real-valued differentiable function with domain E. Thus f is a differentiable mapping of E into \mathbb{R}^1. Define g(t)=f(\gamma(t))\, (a<t<b). The chain rule then asserts that g^\prime (t)=f^\prime (\gamma(t))\gamma^\prime (t)\, (a<t<b). Since \gamma^\prime (t)\in L(\mathbb{R}^1, \mathbb{R}^n) and f^\prime (\gamma(t)) \in L(\mathbb{R}^n, \mathbb{R}^1), this defines g^\prime (t) as a linear operator on \mathbb{R}^1. This agrees with the fact that g maps (a, b) into \mathbb{R}^1. However, g^\prime (t) can also be regarded as a real number. This number can be computed in terms of the partial derivatives of f and the derivatives of the components of \gamma, as we shall now see.

With respect to the standard basis \{\mathbf{e} _ 1 , \dots , \mathbf{e} _ n\} of \mathbb{R}^n, [\gamma^\prime (t)] is the n by 1 matrix (a "column matrix") which has \gamma^\prime _ i(t) in the ith row, where \gamma _ 1,\dots,\gamma _ n are the components of \gamma. For every \mathbf{x} \in E, [f^\prime (\mathbf{x})] is the 1 by n matrix (a "row matrix") which has (D _ jf)(\mathbf{x}) in the jth column. Hence [g^\prime (t)] is the 1 by 1 matrix whose only entry is the real number g^\prime (t)=\sum _ {i=1}^{n} (D _ if)(\gamma(t))\gamma^\prime _ i(t).

This is a frequently encountered special case of the chain rule. It can be rephrased in the following manner.

Associate with each \mathbf{x} \in E a vector, the so-called "gradient" of f at \mathbf{x}, defined by
\begin{equation}\label{95}
(\nabla f)(\mathbf{x})=\sum _ {i=1}^{n}(D _ if)(\mathbf{x})\mathbf{e} _ i.
\end{equation}

Since \gamma^\prime (t)=\sum \gamma^\prime _ i(t)\mathbf{e} _ i, g^\prime (t) can be written in the form g^\prime (t)=(\nabla f)(\gamma(t))\cdot\gamma^\prime (t), the scalar product of the vectors (\nabla f)(\gamma(t)) and \gamma^\prime (t).

Let us now fix an \mathbf{x} \in E, let \mathbf{u}\in \mathbb{R}^n be a unit vector (that is, |\mathbf{u}|=1), and specialize \gamma so that \gamma(t)=\mathbf{x}+t\mathbf{u}\, (-\infty<t<\infty). Then \gamma^\prime (t)=\mathbf{u} for every t. Hence g^\prime (0)=(\nabla f)(\mathbf{x})\cdot \mathbf{u}.

On the other hand, g(t)-g(0)=f(\mathbf{x}+t\mathbf{u})-f(\mathbf{x}). Hence
\[\lim _ {t\to0}\frac{f(\mathbf{x}+t\mathbf{u})-f(\mathbf{x})}{t}=(\nabla f)(\mathbf{x})\cdot \mathbf{u}.\]

This limit is usually called the directional derivative of f at \mathbf{x}, in the direction of the unit vector \mathbf{u}, and may be denoted by (D _ {\mathbf{u}}f)(\mathbf{x}).

If f and \mathbf{x} are fixed but \mathbf{u} varies, then the formula g^\prime (0)=(\nabla f)(\mathbf{x})\cdot \mathbf{u} shows that (D _ \mathbf{u}f)(\mathbf{x}) attains its maximum when \mathbf{u} is a positive scalar multiple of (\nabla f)(\mathbf{x}). [The case (\nabla f)(\mathbf{x}) = \mathbf{0} should be excluded here.]

If \mathbf{u}=\sum u _ i\mathbf{e} _ i, then the same formula shows that (D _ \mathbf{u}f)(\mathbf{x}) can be expressed in terms of the partial derivatives of f at \mathbf{x} by the formula
\[(D _ \mathbf{u}f)(\mathbf{x})=\sum _ {i=1}^{n}(D _ if)(\mathbf{x})u _ i.\]
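
Numerically, the difference quotient (f(\mathbf{x}+t\mathbf{u})-f(\mathbf{x}))/t should approach (\nabla f)(\mathbf{x})\cdot\mathbf{u} as t\to0. A short sketch with an arbitrary smooth f (central differences approximate the gradient):

```python
import numpy as np

def f(x):
    # An arbitrary smooth function on R^3, used only for illustration.
    return x[0] ** 2 * x[1] + np.exp(x[2])

def grad(f, x, eps=1e-6):
    # (nabla f)(x) = sum_i (D_i f)(x) e_i, via central differences.
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([1.0, -2.0, 0.5])
u = np.array([1.0, 2.0, 2.0])
u /= np.linalg.norm(u)              # a unit vector
t = 1e-5
print((f(x + t * u) - f(x)) / t)    # difference quotient (D_u f)(x)
print(grad(f, x) @ u)               # (nabla f)(x) . u  -- nearly equal
```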

Some of these ideas will play a role in the following theorem.

Theorem 14. Suppose \mathbf{f} maps a convex open set E \subset \mathbb{R}^n into \mathbb{R}^m, \mathbf{f} is differentiable in E, and there is a real number M such that |\mathbf{f^\prime (x)}|\leqslant M for every \mathbf{x}\in E. Then |\mathbf{f(b)-f(a)}|\leqslant M|\mathbf{b-a}| for all \mathbf{a}\in E, \mathbf{b}\in E.

Theorem 15. If, in addition, \mathbf{f^\prime (x)=0} for all \mathbf{x}\in E, then \mathbf{f} is constant.

Definition 16. A differentiable mapping \mathbf{f} of an open set E \subset \mathbb{R}^n into \mathbb{R}^m is said to be \textbf{continuously differentiable} in E if \mathbf{f}^\prime is a continuous mapping of E into L(\mathbb{R}^n, \mathbb{R}^m).

More explicitly, it is required that to every \mathbf{x} \in E and to every \varepsilon > 0 there corresponds a \delta > 0 such that |\mathbf{f^\prime (y)-f^\prime (x)}|<\varepsilon if \mathbf{y}\in E and |\mathbf{x-y}|<\delta.

If this is so, we also say that \mathbf{f} is a \mathcal{C}^\prime-mapping, or that \mathbf{f}\in \mathcal{C}^\prime (E).


Theorem 17. Suppose \mathbf{f} maps an open set E \subset \mathbb{R}^n into \mathbb{R}^m. Then \mathbf{f}\in \mathcal{C}^\prime (E) if and only if the partial derivatives D _ jf _ i exist and are continuous on E for 1\leqslant i\leqslant m, 1\leqslant j\leqslant n.

Proof of Theorem 17. Assume first that \mathbf{f}\in\mathcal{C}^\prime (E). By (94), (D _ jf _ i)(\mathbf{x})=(\mathbf{f^\prime (x)}\mathbf{e} _ j)\cdot\mathbf{u} _ i for all i,j, and for all \mathbf{x}\in E. Hence
\[\left(D _ {j} f _ {i}\right)(\mathbf{y})-\left(D _ {j} f _ {i}\right)(\mathbf{x})=\left\{\left[\mathbf{f}^{\prime}(\mathbf{y})-\mathbf{f}^{\prime}(\mathbf{x})\right] \mathbf{e} _ {j}\right\} \cdot \mathbf{u} _ {i}\Longrightarrow|\left(D _ {j} f _ {i}\right)(\mathbf{y})-\left(D _ {j} f _ {i}\right)(\mathbf{x})|\leqslant|\mathbf{f^\prime (y)-f^\prime (x)}|.\]
Hence D _ jf _ i is continuous.

For the converse, it suffices to consider the case m = 1. (Why?) Fix \mathbf{x} \in E and \varepsilon > 0. Since E is open, there is an open ball S \subset E, with center at \mathbf{x} and radius r, and the continuity of the functions D _ jf shows that r can be chosen so that \left|\left(D _ {j} f\right)(\mathbf{y})-\left(D _ {j} f\right)(\mathbf{x})\right|<{\varepsilon}/{n}\, (\mathbf{y} \in S, 1 \leq j \leq n).

Suppose \mathbf{h}=\sum h _ j\mathbf{e} _ j, |\mathbf{h}|<r, put \mathbf{v} _ 0=\mathbf{0}, and \mathbf{v} _ k=h _ 1\mathbf{e} _ 1+\dots+h _ k\mathbf{e} _ k, for 1\leqslant k\leqslant n. Then
\begin{equation}\label{96}
f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})=\sum _ {j=1}^{n} \left[f\left(\mathbf{x}+\mathbf{v} _ {j}\right)-f\left(\mathbf{x}+\mathbf{v} _ {j-1}\right)\right].
\end{equation}
Since |\mathbf{v} _ k|<r for 1\leqslant k\leqslant n and since S is convex, the segments with end points \mathbf{x} + \mathbf{v} _ {j-1} and \mathbf{x} + \mathbf{v} _ j lie in S. Since \mathbf{v} _ j = \mathbf{v} _ {j-1} + h _ j\mathbf{e} _ j , the mean value theorem shows that the jth summand in (96) is equal to h _ {j}\left(D _ {j} f\right)\left(\mathbf{x}+\mathbf{v} _ {j-1}+\theta _ {j} h _ {j} \mathbf{e} _ {j}\right) for some \theta _ j\in(0,1), and this differs from h _ j(D _ jf)(\mathbf{x}) by less than |h _ j|\varepsilon/n. By (96), it follows that
\[\Big|f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\sum _ {j=1}^{n} h _ {j}\left(D _ {j} f\right)(\mathbf{x})\Big| \leqslant \frac{1}{n} \sum _ {j=1}^{n}\left|h _ {j}\right| \varepsilon \leqslant|\mathbf{h}| \varepsilon\]
for all \mathbf h such that |\mathbf{h}|<r.

This says that f is differentiable at \mathbf{x} and that f^\prime (\mathbf{x}) is the linear function which assigns the number \sum h _ j(D _ jf)(\mathbf{x}) to the vector \mathbf{h}=\sum h _ j\mathbf{e} _ j. The matrix [f^\prime (\mathbf{x})] consists of the row (D _ 1f)(\mathbf{x}), \dots , (D _ nf)(\mathbf{x}); and since D _ 1f, \dots , D _ nf are continuous functions on E, we have f\in\mathcal{C}^\prime (E).
3. The contraction principle

We now interrupt our discussion of differentiation to insert a fixed point theorem that is valid in arbitrary complete metric spaces. It will be used in the proof of the inverse function theorem.

Definition 18. Let X be a metric space, with metric d. If \varphi maps X into X and if there is a number c<1 such that d(\varphi(x),\varphi(y))\leqslant c\, d(x,y) for all x,y\in X, then \varphi is said to be a \textbf{contraction} of X into X.

Theorem 19. If X is a complete metric space, and if \varphi is a contraction of X into X, then there exists one and only one x\in X such that \varphi(x)=x.

In other words, \varphi has a unique fixed point. The uniqueness is a triviality, for if \varphi(x) = x and \varphi(y) = y, then d(x, y)\leqslant c\, d(x, y), which can only happen when d(x, y) = 0.

The existence of a fixed point of \varphi is the essential part of the theorem. The proof actually furnishes a constructive method for locating the fixed point.

Proof. Pick x _ 0\in X arbitrarily, and define \{x _ n\} recursively, by setting x _ {n+1}=\varphi(x _ n)\, (n=0,1,2,\dots). Choose c<1 so that d(\varphi(x),\varphi(y))\leqslant c\, d(x,y) holds. For n\geqslant1 we then have d(x _ {n+1},x _ n)=d\left(\varphi\left(x _ {n}\right), \varphi\left(x _ {n-1}\right)\right) \leqslant c\, d\left(x _ {n}, x _ {n-1}\right). Hence induction gives d(x _ {n+1},x _ n)\leqslant c^n d(x _ 1,x _ 0)\, (n=0,1,2,\dots). If n<m, it follows that
\[d(x _ n,x _ m)\leqslant\sum _ {i=n+1}^{m}d(x _ i,x _ {i-1})\leqslant(c^n+\dots+c^{m-1})d(x _ 1,x _ 0)\leqslant[(1-c)^{-1}d(x _ 1,x _ 0)]c^n.\]
Thus \{x _ n\} is a Cauchy sequence. Since X is complete, \lim x _ n=x for some x\in X.

Since \varphi is a contraction, \varphi is continuous (in fact, uniformly continuous) on X. Hence \varphi(x)=\lim\varphi(x _ n)=\lim x _ {n+1}=x.
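
The proof is constructive: starting from any x _ 0, the iterates x _ {n+1}=\varphi(x _ n) converge to the fixed point. A minimal sketch, using the arbitrary contraction \varphi(x)=\tfrac12\cos x on \mathbb{R} (contraction constant 1/2), chosen only for illustration:

```python
import numpy as np

def phi(x):
    # An arbitrary contraction of R into R: |phi'(x)| = |sin x|/2 <= 1/2,
    # so the contraction constant is c = 1/2.
    return 0.5 * np.cos(x)

x = 10.0                    # any starting point x_0 works
for n in range(60):
    x = phi(x)              # x_{n+1} = phi(x_n)

print(x, phi(x) - x)        # the unique fixed point; the residual is ~ 0
```
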
4. The inverse function theorem

The inverse function theorem states, roughly speaking, that a continuously differentiable mapping \mathbf{f} is invertible in a neighborhood of any point \mathbf{x} at which the linear transformation \mathbf{f^\prime (x)} is invertible:

Theorem 20. Suppose \mathbf{f} is a \mathcal{C}^\prime-mapping of an open set E \subset \mathbb{R}^n into \mathbb{R}^n, \mathbf{f^\prime (a)} is invertible for some \mathbf{a} \in E, and \mathbf{b = f(a)}. Then
  1. there exist open sets U and V in \mathbb{R}^n such that \mathbf{a} \in U, \mathbf{b}\in V, \mathbf{f} is one-to-one on U, and \mathbf{f}(U) = V;
  2. if \mathbf{g} is the inverse of \mathbf{f} [which exists, by (1)], defined in V by \mathbf{g(f(x))=x}\, (\mathbf{x}\in U), then \mathbf{g}\in\mathcal{C}^\prime (V).

Writing the equation \mathbf{y = f(x)} in component form, we arrive at the following interpretation of the conclusion of the theorem: The system of n equations y _ i=f _ i(x _ 1,\dots,x _ n)\, (1\leqslant i\leqslant n) can be solved for x _ 1,\dots,x _ n in terms of y _ 1,\dots,y _ n, if we restrict \mathbf{x} and \mathbf{y} to small enough neighborhoods of \mathbf{a} and \mathbf{b}; the solutions are unique and continuously differentiable.

Definition 21. Suppose \varphi is a bijective mapping of an open set U\subset\mathbb{R}^m onto an open set V\subset\mathbb{R}^m. Then \varphi is said to be a \textbf{diffeomorphism}, if \varphi\in\mathcal{C}^\prime (U) and \varphi^{-1}\in\mathcal{C}^\prime (V).

Suppose \mathbf{f}\in\mathcal{C}^\prime and \mathbf{f(a)=b}. If there exist open sets U\subset\mathbb{R}^m and V\subset\mathbb{R}^m such that \mathbf{a}\in U, \mathbf{b}\in V and \mathbf{f} restricted to U is a diffeomorphism from U to V, then we say that \mathbf{f} is a \textbf{local diffeomorphism} at \mathbf{a}.

The inverse function theorem answers the question of under what conditions \mathbf{f} is a local diffeomorphism at \mathbf{a}.

Suppose \mathbf{f} is a \mathcal{C}^\prime-mapping of an open set E \subset \mathbb{R}^n into \mathbb{R}^n, \mathbf{f^\prime (a)} is invertible for some \mathbf{a} \in E, and \mathbf{b = f(a)}. Then \mathbf{f} is a local diffeomorphism at \mathbf{a}.

The following is an immediate consequence of part (1) of the inverse function theorem.

Theorem 22. If \mathbf{f} is a \mathcal{C}^\prime-mapping of an open set E\subset \mathbb{R}^n into \mathbb{R}^n and if \mathbf{f^\prime (x)} is invertible for every \mathbf{x}\in E, then \mathbf{f}(W) is an open subset of \mathbb{R}^n for every open set W\subset E.

In other words, \mathbf{f} is an open mapping of E into \mathbb{R}^n.

The hypotheses made in this theorem ensure that each point \mathbf{x} \in E has a neighborhood in which \mathbf{f} is 1-1. This may be expressed by saying that \mathbf{f} is locally one-to-one in E. But \mathbf{f} need not be 1-1 in E under these circumstances.

Proof. We first recall Newton's method for root-finding. Suppose f(x) has suitable properties, say f\in \mathcal{C}^\prime and f^\prime (x)\neq0. To solve the equation f(x)=0, we start with a suitable initial approximation x _ 1 and define x _ {n+1}=x _ n-\frac{f(x _ n)}{f^\prime (x _ n)}\, (n=1,2,\dots). If the sequence \{x _ n\} converges to x _ \ast, then x _ \ast is a root of the equation. (This can be seen by letting n\to\infty.)

Now we want to solve the equation \mathbf{f(x)=y} for \mathbf{x}. Similarly, initialize \mathbf{x} _ 1 suitably, and define \mathbf{x} _ {n+1}=\mathbf{x} _ n+(\mathbf{f}^\prime (\mathbf{x} _ n))^{-1}(\mathbf{y}-\mathbf{f}(\mathbf{x} _ n))\, (n=1,2,\dots). If this sequence converges, the existence of a solution follows.

For the convenience of calculation, we may simplify the construction of \{\mathbf{x} _ n\}. Since \mathbf{x} _ n should be very close to \mathbf{a}, we may expect \mathbf{f}^\prime (\mathbf{x} _ n) to be close to \mathbf{f^\prime (a)}. If we replace \mathbf{f}^\prime (\mathbf{x} _ n) with \mathbf{f^\prime (a)}, we obtain an iteration that is more convenient to compute: \mathbf{x} _ {n+1}=\mathbf{x} _ n+(\mathbf{f^\prime (a)})^{-1}(\mathbf{y}-\mathbf{f}(\mathbf{x} _ n)).

For the convergence of \{\mathbf{x} _ n\}, the contraction principle of the preceding section is an effective tool.
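
A sketch of this simplified iteration for an arbitrary example map \mathbf{f}:\mathbb{R}^2\to\mathbb{R}^2 (the map, the point \mathbf{a}, and the target \mathbf{y} below are all illustrative choices; the iteration is exactly the map \varphi of (97) below):

```python
import numpy as np

def f(x):
    # An arbitrary C' map from R^2 to R^2, used only for illustration.
    return np.array([x[0] + x[1] ** 2, x[0] ** 2 - x[1]])

a = np.array([1.0, 1.0])
b = f(a)
A = np.array([[1.0,  2.0],          # f'(a) for this particular f,
              [2.0, -1.0]])         # computed by hand; det = -5 != 0
A_inv = np.linalg.inv(A)

y = b + np.array([0.05, 0.02])      # a target value close to b
x = a.copy()                        # start the iteration at a
for n in range(50):
    x = x + A_inv @ (y - f(x))      # x_{n+1} = x_n + f'(a)^{-1}(y - f(x_n))

print(x, f(x) - y)                  # f(x) ~ y: a local solution of f(x) = y
```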

(1) Put \mathbf{f^\prime (a)}=A, and choose \lambda so that 2\lambda|A^{-1}|=1. Since \mathbf{f^\prime } is continuous at \mathbf{a}, there is an open ball U \subset E, with center at \mathbf{a}, such that |\mathbf{f^\prime (x)}-A|\leqslant\lambda\, (\mathbf{x}\in U).

We associate to each \mathbf{y}\in \mathbb{R}^n a function \varphi, defined by
\begin{equation}\label{97}
\varphi(\mathbf{x})=\mathbf{x}+A^{-1}(\mathbf{y}-\mathbf{f}(\mathbf{x})) \quad(\mathbf{x} \in E).
\end{equation}

Note that \mathbf{f(x) = y} if and only if \mathbf{x} is a fixed point of \varphi.

Since \varphi^\prime (\mathbf{x})=I-A^{-1} \mathbf{f}^\prime (\mathbf{x})=A^{-1}\left(A-\mathbf{f}^\prime (\mathbf{x})\right), we have |\varphi^\prime (\mathbf{x})|\leqslant1/2\, (\mathbf{x}\in U). Hence
\begin{equation}\label{98}
\left|\varphi\left(\mathbf{x} _ {1}\right)-\varphi\left(\mathbf{x} _ {2}\right)\right| \leqslant \frac{1}{2}\left|\mathbf{x} _ {1}-\mathbf{x} _ {2}\right| \quad\left(\mathbf{x} _ {1}, \mathbf{x} _ {2} \in U\right)
\end{equation}
by Theorem 14. It follows that \varphi has at most one fixed point in U, so that \mathbf{f(x) = y} for at most one \mathbf{x} \in U. Thus \mathbf{f} is 1-1 in U.

Next, put V=f(U), and pick \mathbf{y} _ 0\in V. Then \mathbf{y} _ 0 = \mathbf{f}(\mathbf{x} _ 0) for some \mathbf{x} _ 0 \in U. Let B be an open ball with center at \mathbf{x} _ 0 and radius r > 0, so small that its closure \bar B lies in U. We will show that \mathbf{y} \in V whenever |\mathbf{y}-\mathbf{y} _ 0|<\lambda r. This proves, of course, that V is open.

Fix \mathbf{y}, |\mathbf{y}-\mathbf{y} _ 0|<\lambda r. With \varphi as in (97),
\[
\left|\varphi\left(\mathbf{x} _ {0}\right)-\mathbf{x} _ {0}\right|=\left|A^{-1}\left(\mathbf{y}-\mathbf{y} _ {0}\right)\right|< |A^{-1}| \lambda r=\frac{r}{2}.
\]
If \mathbf{x}\in\bar B, it therefore follows from (98) that
\[\left|\varphi(\mathbf{x})-\mathbf{x} _ {0}\right| \leqslant\left|\varphi(\mathbf{x})-\varphi\left(\mathbf{x} _ {0}\right)\right|+ \left|\varphi\left(\mathbf{x} _ {0}\right)-\mathbf{x} _ {0}\right|<\frac{1}{2}\left|\mathbf{x}-\mathbf{x} _ {0}\right|+ \frac{r}{2}\leqslant r;\]
hence \varphi(\mathbf{x})\in B. Note that (98) holds if \mathbf{x} _ 1\in\bar{B}, \mathbf{x} _ 2\in\bar{B}.

Thus \varphi is a contraction of \bar B into \bar B. Being a closed subset of \mathbb{R}^n, \bar B is complete. Theorem 19 implies therefore that \varphi has a fixed point \mathbf{x} \in\bar B. For this \mathbf{x}, \mathbf{f(x) = y}. Thus \mathbf{y}\in \mathbf{f}(\bar B)\subset \mathbf{f}(U)=V.

This proves part (1) of the theorem.

(2) Pick \mathbf{y}\in V, \mathbf{y+k}\in V. Then there exist \mathbf{x}\in U, \mathbf{x+h}\in U, so that \mathbf{y=f(x)}, \mathbf{y+k=f(x+h)}. Write \varphi as in (97),
\[\varphi(\mathbf{x}+\mathbf{h})-\varphi(\mathbf{x})= \mathbf{h}+A^{-1}[\mathbf{f}(\mathbf{x})-\mathbf{f}(\mathbf{x}+\mathbf{h})]=\mathbf{h}-A^{-1} \mathbf{k}.\]
By (98), |\mathbf{h}-A^{-1}\mathbf{k}|\leqslant|\mathbf{h}|/2. Hence |A^{-1}\mathbf{k}|\geqslant|\mathbf{h}|/2, and |\mathbf{h}|\leqslant2|A^{-1}||\mathbf{k}|=\lambda^{-1}|\mathbf{k}|.

By Theorem 9, \mathbf{f^\prime (x)} has an inverse, say T. Since
\[\mathbf{g}(\mathbf{y}+\mathbf{k})-\mathbf{g}(\mathbf{y})-T \mathbf{k}=\mathbf{h}-T \mathbf{k}=-T\left[\mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-\mathbf{f}^\prime (\mathbf{x}) \mathbf{h}\right],\]
|\mathbf{h}|\leqslant\lambda^{-1}|\mathbf{k}| implies
\[\frac{|\mathbf{g}(\mathbf{y}+\mathbf{k})-\mathbf{g}(\mathbf{y})-T \mathbf{k}|}{|\mathbf{k}|} \leqslant \frac{|T|}{\lambda} \cdot \frac{\left|\mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-\mathbf{f}^{\prime}(\mathbf{x}) \mathbf{h}\right|}{|\mathbf{h}|}.\]

As \mathbf{k\to0}, we can see that \mathbf{h\to 0}. The right side of the last inequality thus tends to 0. Hence the same is true of the left. We have thus proved that \mathbf{g^\prime (y)}=T. But T was chosen to be the inverse of \mathbf{f^\prime (x)=f^\prime (g(y))}. Thus
\begin{equation}\label{99}
\mathbf{g^\prime (y)=\{f^\prime (g(y))\}^{-1}}\quad(\mathbf{y}\in V).
\end{equation}

Finally, note that \mathbf{g} is a continuous mapping of V onto U (since \mathbf{g} is differentiable), that \mathbf{f^\prime } is a continuous mapping of U into the set \Omega of all invertible elements of L(\mathbb{R}^n), and that inversion is a continuous mapping of \Omega onto \Omega, by Theorem 9. If we combine these facts with (99), we see that \mathbf{g}\in\mathcal{C}^\prime (V).

This completes the proof.

The full force of the assumption that \mathbf{f}\in \mathcal{C}^\prime (E) was only used in the last paragraph of the preceding proof. Everything else, down to Eq.(99), was derived from the existence of \mathbf{f^\prime (x)} for \mathbf{x} \in E, the invertibility of \mathbf{f^\prime (a)}, and the continuity of \mathbf{f^\prime } at just the point \mathbf{a}.
5. The implicit function theorem

The proof of the so-called "implicit function theorem" makes strong use of the fact that continuously differentiable transformations behave locally very much like their derivatives. Accordingly, we first prove Theorem 23, the linear version of Theorem 24.

Notation. If \mathbf{x}=(x _ 1,\dots,x _ n)\in\mathbb{R}^n and \mathbf{y}=(y _ 1,\dots,y _ m)\in\mathbb{R}^m, let us write (\mathbf{x,y}) for the point (or vector) (x _ 1,\dots,x _ n,y _ 1,\dots,y _ m)\in\mathbb{R}^{n+m}. In what follows, the first entry in (\mathbf{x,y}) or in a similar symbol will always be a vector in \mathbb{R}^n, the second will be a vector in \mathbb{R}^m.

Every A \in L(\mathbb{R}^{n+m}, \mathbb{R}^n) can be split into two linear transformations A _ x and A _ y, defined by A _ x\mathbf{h}=A(\mathbf{h,0}), A _ y\mathbf{k}=A(\mathbf{0,k}) for any \mathbf{h}\in\mathbb{R}^n, \mathbf{k}\in\mathbb{R}^m. Then A _ x\in L(\mathbb{R}^n), A _ y\in L(\mathbb{R}^m,\mathbb{R}^n), and A(\mathbf{h,k})=A _ x\mathbf{h}+A _ y\mathbf{k}.

The linear version of the implicit function theorem is now almost obvious.

Theorem 23. If A \in L(\mathbb{R}^{n+m}, \mathbb{R}^n) and if A _ x is invertible, then there corresponds to every \mathbf{k}\in\mathbb{R}^m a unique \mathbf{h}\in\mathbb{R}^n such that A\mathbf{(h,k)}=\mathbf{0}.

This \mathbf{h} can be computed from \mathbf{k} by the formula \mathbf{h}=-(A _ x)^{-1}A _ y\mathbf{k}.

The conclusion of this theorem is, in other words, that the equation A(\mathbf{h, k}) = \mathbf{0} can be solved (uniquely) for \mathbf{h} if \mathbf{k} is given, and that the solution \mathbf{h} is a linear function of \mathbf{k}. Those who have some acquaintance with linear algebra will recognize this as a very familiar statement about systems of linear equations.

Theorem 24. Let \mathbf{f} be a \mathcal{C}^\prime-mapping of an open set E \subset \mathbb{R}^{n+m} into \mathbb{R}^n, such that \mathbf{f(a, b) = 0} for some point \mathbf{(a, b)} \in E. Put A=\mathbf{f^\prime (a,b)} and assume that A _ x is invertible.

Then there exist open sets U\subset \mathbb{R}^{n + m} and W\subset \mathbb{R}^m, with \mathbf{(a, b)} \in U and \mathbf{b} \in W, having the following property:

To every \mathbf{y}\in W corresponds a unique \mathbf{x} such that \mathbf{(x,y)}\in U and \mathbf{f(x,y)=0}.

If this \mathbf{x} is defined to be \mathbf{g(y)}, then \mathbf{g} is a \mathcal{C}^\prime-mapping of W into \mathbb{R}^n, \mathbf{g(b)=a},
\begin{gather*}
\mathbf{f(g(y),y)=0}\quad(\mathbf{y}\in W), \\
\mathbf{g^\prime (b)}=-(A _ x)^{-1}A _ y.
\end{gather*}

The function g is "implicitly" defined by \mathbf{f(g(y),y)=0}. Hence the name of the theorem.

The equation \mathbf{f(x,y)=0} can be written as a system of n equations in n + m variables:
\begin{gather*}
f _ 1(x _ 1,\dots,x _ n,y _ 1,\dots,y _ m) =0, \\
\vdots\\
f _ n(x _ 1,\dots,x _ n,y _ 1,\dots,y _ m) =0.
\end{gather*}

The assumption that A _ x is invertible means that the n by n matrix [D _ jf _ i] evaluated at \mathbf{(a, b)} defines an invertible linear operator in \mathbb{R}^n; in other words, its column vectors should be independent, or, equivalently, its determinant should be \neq0. If, furthermore, the n equations holds when \mathbf{x=a} and \mathbf{y = b}, then the conclusion of the theorem is that they can be solved for x _ 1,\dots,x _ n in terms of y _ 1, \dots , y _ m, for every \mathbf{y} near \mathbf{b}, and that these solutions are continuously differentiable functions of \mathbf{y}.
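
For a concrete f one can solve \mathbf{f(x,y)=0} numerically for \mathbf{x} near \mathbf{a} and compare the derivative of the resulting \mathbf{g} with -(A _ x)^{-1}A _ y. The sketch below uses the arbitrary scalar example (n=m=1) f(x,y)=x^2+y^2-2 with (a,b)=(1,1); it is only an illustration, not part of the theorem's proof.

```python
import numpy as np

def f(x, y):
    # Arbitrary scalar example (n = m = 1): f(x, y) = x^2 + y^2 - 2,
    # with f(a, b) = 0 at (a, b) = (1, 1).
    return x ** 2 + y ** 2 - 2.0

def g(y, x0=1.0):
    # Solve f(x, y) = 0 for x near x0 by Newton's method in x.
    x = x0
    for _ in range(50):
        x -= f(x, y) / (2 * x)      # D_x f = 2x for this f
    return x

b = 1.0
A_x, A_y = 2.0, 2.0                 # the partials of f at (a, b) = (1, 1)
eps = 1e-6
slope = (g(b + eps) - g(b - eps)) / (2 * eps)
print(slope, -A_y / A_x)            # both are -1: g'(b) = -(A_x)^{-1} A_y
```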

Proof. Define \mathbf{F} by \mathbf{F(x,y)=(f(x,y),y)}\, ((\mathbf{x,y})\in E). Then \mathbf{F} is a \mathcal{C}^\prime-mapping of E into \mathbb{R}^{n+m}. We claim that \mathbf{F^\prime (a, b)} is an invertible element of L(\mathbb{R}^{n+m}).

Since \mathbf{f(a,b)=0}, we have \mathbf{f(a+h,b+k)}=A\mathbf{(h,k)+r(h,k)}, where \mathbf{r} is the remainder that occurs in the definition of \mathbf{f^\prime (a, b)}. Since
\[\mathbf{F(a+h,b+k)-F(a,b)=(f(a+h,b+k),k)}=(A\mathbf{(h,k),k})+(\mathbf{r(h,k),0}),\]
it follows that \mathbf{F^\prime (a, b)} is the linear operator on \mathbb{R}^{n+m} that maps \mathbf{(h, k)} to (A\mathbf{(h, k), k}). If this image vector is \mathbf{0}, then A\mathbf{(h, k) = 0} and \mathbf{k = 0}, hence A\mathbf{(h, 0) = 0}, and Theorem 23 implies that \mathbf{h = 0}. It follows that \mathbf{F^\prime (a, b)} is 1-1; hence it is invertible (Theorem 6).

The inverse function theorem can therefore be applied to \mathbf{F}. It shows that there exist open sets U and V in \mathbb{R}^{n+m}, with \mathbf{(a, b)} \in U, \mathbf{(0, b)} \in V, such that \mathbf{F} is a 1-1 mapping of U onto V.

We let W be the set of all \mathbf{y}\in\mathbb{R}^m such that \mathbf{(0,y)}\in V. Note that \mathbf{b}\in W.

It is clear that W is open since V is open.

If \mathbf{y}\in W, then \mathbf{(0,y)=F(x,y)} for some \mathbf{(x,y)}\in U. Hence, \mathbf{f(x,y)=0} for this \mathbf{x}.

Suppose, with the same \mathbf{y}, that \mathbf{(x^\prime , y)} \in U and \mathbf{f(x^\prime , y) = 0}. Then \mathbf{F}\left(\mathbf{x}^\prime , \mathbf{y}\right)=\left(\mathbf{f}\left(\mathbf{x}^{\prime}, \mathbf{y}\right), \mathbf{y}\right)=(\mathbf{f}(\mathbf{x}, \mathbf{y}), \mathbf{y})=\mathbf{F}(\mathbf{x}, \mathbf{y}). Since \mathbf{F} is 1-1 in U, it follows that \mathbf{x^\prime = x}.

This proves the first part of the theorem.

For the second part, define \mathbf{g(y)}, for \mathbf{y}\in W, so that (\mathbf{g(y),y})\in U and \mathbf{f(g(y),y)=0} holds. Then \mathbf{F(g(y),y)=(0,y)}\, (\mathbf{y}\in W). If \mathbf{G} is the mapping of V onto U that inverts \mathbf{F}, then \mathbf{G}\in \mathcal{C}^\prime, by the inverse function theorem, and it gives \mathbf{(g(y),y)=G(0,y)}\, (\mathbf{y}\in W). Since \mathbf{G}\in \mathcal{C}^\prime, this shows that \mathbf{g}\in \mathcal{C}^\prime.

Finally, to compute \mathbf{g^\prime (b)}, put \mathbf{(g(y),y)=G(0,y)}=\Phi(\mathbf{y}). Then \Phi^\prime (\mathbf{y}) \mathbf{k}=\left(\mathbf{g}^\prime (\mathbf{y}) \mathbf{k}, \mathbf{k}\right) \, (\mathbf{y} \in W, \mathbf{k} \in \mathbb{R}^{m}), and \mathbf{f}(\Phi\mathbf{(y)})=\mathbf{0} in W. The chain rule shows therefore that \mathbf{f}^\prime (\Phi(\mathbf{y}))\Phi^\prime (\mathbf{y})=0. When \mathbf{y=b}, then \Phi\mathbf{(y)=(a,b)} and \mathbf{f}^\prime (\Phi(\mathbf{y}))=A. Thus A\Phi^\prime (\mathbf{b})=0.

Now we have
\[A _ {x} \mathbf{g}^\prime (\mathbf{b}) \mathbf{k}+A _ {y} \mathbf{k}=A\left(\mathbf{g}^\prime (\mathbf{b}) \mathbf{k}, \mathbf{k}\right)=A \Phi^\prime (\mathbf{b}) \mathbf{k}=\mathbf{0}\]
for every \mathbf{k}\in\mathbb{R}^m. Thus A _ {x} \mathbf{g}^\prime (\mathbf{b})+A _ {y}=0, that is, \mathbf{g^\prime (b)}=-(A _ x)^{-1}A _ y, which is the desired formula.
6. The rank theorem

Although this theorem is not as important as the inverse function theorem or the implicit function theorem, we include it as another interesting illustration of the general principle that the local behavior of a continuously differentiable mapping \mathbf{F} near a point \mathbf{x} is similar to that of the linear transformation \mathbf{F^\prime (x)}. Before stating it, we need a few more facts about linear transformations.

Definition 25. Suppose X and Y are vector spaces, and A \in L( X, Y), as in Definition 7. The null space of A, \mathcal{N}(A), is the set of all \mathbf{x} \in X at which A\mathbf{x} = \mathbf{0}. It is clear that \mathcal{N}(A) is a vector space in X.

Likewise, the range of A, \mathcal{R}(A), is a vector space in Y.

The rank of A is defined to be the dimension of \mathcal{R}(A).

If A\in L(X,Y) and A has rank 0, then A\mathbf{x}=\mathbf{0} for all \mathbf{x}\in X, hence \mathcal{N}(A)=X.

Projections. Let X be a vector space. An operator P\in L(X) is said to be a \textbf{projection} in X if P^2=P, that is, if P(P\mathbf{x})=P\mathbf{x} for every \mathbf{x}\in X. In other words, P fixes every vector in its range \mathcal{R}(P).

Here are some elementary properties of projections:

(a) If P is a projection in X, then every \mathbf{x} \in X has a unique representation of the form \mathbf{x}=\mathbf{x} _ 1+\mathbf{x} _ 2 where \mathbf{x} _ 1\in \mathcal{R}(P), \mathbf{x} _ 2\in \mathcal{N}(P). (To obtain the representation, put \mathbf{x} _ 1 = P\mathbf{x}, \mathbf{x} _ 2 = \mathbf{x} - \mathbf{x} _ 1.)

(b) If X is a finite-dimensional vector space and if X _ 1 is a vector space in X, then there is a projection P in X with \mathcal{R}(P)=X _ 1. (Note also that there are infinitely many projections in X, with range X _ 1 , if 0 <\dim X _ 1 <\dim X.)

Theorem 26. Suppose m, n, r are nonnegative integers, m \geqslant r, n \geqslant r, \mathbf{F} is a \mathcal{C}^\prime-mapping of an open set E \subset \mathbb{R}^n into \mathbb{R}^m, and \mathbf{F^\prime (x)} has rank r for every \mathbf{x}\in E.

Fix \mathbf{a}\in E, put A=\mathbf{F^\prime (a)}, let Y _ 1 be the range of A, and let P be a projection in \mathbb{R}^m whose range is Y _ 1. Let Y _ 2 be the null space of P.

Then there are open sets U and V in \mathbb{R}^n, with \mathbf{a}\in U, U\subset E, and there is a 1-1 \mathcal{C}^\prime-mapping \mathbf{H} of V onto U (whose inverse is also of class \mathcal{C}^\prime) such that
\[\mathbf{F(H(x))}=A\mathbf{x}+\varphi(A\mathbf{x})\quad(\mathbf{x}\in V)\]
where \varphi is a \mathcal{C}^\prime-mapping of the open set A(V)\subset Y _ 1 into Y _ 2.

We omit the proof, and give a more geometric description of the information that the theorem contains.

If \mathbf{y}\in \mathbf{F}(U) then \mathbf{y=F(H(x))} for some \mathbf{x}\in V, and the formula shows that P\mathbf{y}=A\mathbf{x}. Therefore
\[\mathbf{y}=P\mathbf{y}+\varphi(P\mathbf{y})\quad(\mathbf{y}\in \mathbf{F}(U)).\]

This shows that \mathbf{y} is determined by its projection P\mathbf{y}, and that P, restricted to \mathbf{F}(U), is a 1-1 mapping of \mathbf{F}(U) onto A(V). Thus \mathbf{F}(U) is an "r-dimensional surface" with precisely one point "over" each point of A(V). We may also regard \mathbf{F}(U) as the graph of \varphi.

We state another form of this theorem.

Suppose M and N are smooth manifolds of dimensions m and n, respectively, and F:M\to N is a smooth map with constant rank r. For each p\in M there exist smooth charts (U,\varphi) for M centered at p and (V,\psi) for N centered at F(p) such that F(U)\subseteq V, in which F has a coordinate representation of the form
\[\hat{F}(x^1,\dots,x^r,x^{r+1},\dots,x^m)=(x^1,\dots,x^r,0,\dots,0).\]
In particular, if F is a smooth submersion, this becomes
\[\hat{F}(x^1,\dots,x^r,x^{r+1},\dots,x^m)=(x^1,\dots,x^n),\]
and if F is a smooth immersion, it is
\[\hat{F}(x^1,\dots,x^m)=(x^1,\dots,x^m,0,\dots,0).\]
7. Determinants

Determinants are numbers associated to square matrices, and hence to the operators represented by such matrices. They are 0 if and only if the corresponding operator fails to be invertible. They can therefore be used to decide whether the hypotheses of some of the preceding theorems are satisfied. They will play an even more important role in Chap. 10.

Theorem 27.
  1. If I is the identity operator on \mathbb{R}^n, then \det[I]=1.
  2. \det is a linear function of each of the column vectors \mathbf{x} _ j, if the others are held fixed.
  3. If [A] _ 1 is obtained from [A] by interchanging two columns, then \det [A] _ 1 = -\det [A].
  4. If [A] has two equal columns, then \det [A]= 0.


Theorem 28. If [A] and [B] are n by n matrices, then \det([B][A])=\det[B]\det[A].

Theorem 29. A linear operator A on \mathbb{R}^n is invertible if and only if \det [A] \neq 0.

The determinant of the matrix of a linear operator does not depend on the basis which is used to construct the matrix. It is thus meaningful to speak of the determinant of a linear operator, without having any basis in mind.

Jacobians. If \mathbf{f} maps an open set E \subset \mathbb{R}^n into \mathbb{R}^n, and if \mathbf{f} is differentiable at a point \mathbf{x} \in E, the determinant of the linear operator \mathbf{f^\prime (x)} is called the Jacobian of \mathbf{f} at \mathbf{x}. In symbols, J _ \mathbf{f}(\mathbf{x})=\det \mathbf{f^\prime (x)}. We shall also use the notation \dfrac{\partial(y _ 1,\dots,y _ n)}{\partial(x _ 1,\dots,x _ n)} for J _ \mathbf{f}(\mathbf{x}), if (y _ 1,\dots,y _ n)=\mathbf{f}(x _ 1,\dots,x _ n).

In terms of Jacobians, the crucial hypothesis in the inverse function theorem is that J _ \mathbf{f}(\mathbf{a})\neq0. If the implicit function theorem is stated in terms of the system of n equations f _ i(x _ 1,\dots,x _ n,y _ 1,\dots,y _ m)=0 displayed in Section 5, the assumption made there on A amounts to \dfrac{\partial(f _ 1,\dots,f _ n)}{\partial(x _ 1,\dots,x _ n)}\neq0.
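
A standard example: for the polar-coordinate map (r,\theta)\mapsto(r\cos\theta,r\sin\theta) the Jacobian is r. The finite-difference sketch below is only an illustration of the definition J _ \mathbf{f}(\mathbf{x})=\det \mathbf{f^\prime (x)}.

```python
import numpy as np

def f(x):
    # Polar coordinates: (r, theta) -> (r cos(theta), r sin(theta)).
    r, t = x
    return np.array([r * np.cos(t), r * np.sin(t)])

def jacobian_det(f, x, eps=1e-6):
    # Approximate [f'(x)] by central differences and take its determinant.
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.linalg.det(J)

x = np.array([2.0, 0.7])
print(jacobian_det(f, x))   # ~ 2.0, i.e. J_f(r, theta) = r
```
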
8. Derivatives of higher order

Suppose f is a real function defined in an open set E \subset \mathbb{R}^n, with partial derivatives D _ 1f,\dots,D _ nf. If the functions D _ jf are themselves differentiable, the second-order partial derivatives are defined by D _ {ij}f=D _ iD _ jf\, (1\leqslant i\leqslant n,\, 1\leqslant j\leqslant n). If all these functions D _ {ij}f are continuous in E, we say that f is of class \mathcal{C}^{\prime\prime} in E. A mapping \mathbf{f} of E into \mathbb{R}^m is said to be of class \mathcal{C}^{\prime\prime} if each component of \mathbf{f} is of class \mathcal{C}^{\prime\prime}.

It can happen that D _ {ij}f\neq D _ {ji}f at some point, although both derivatives exist. However, we shall see below that D _ {ij}f=D _ {ji}f whenever these derivatives are continuous.

For simplicity (and without loss of generality) we state our next two theorems for real functions of two variables. The first one is a mean value theorem.

Theorem 30. Suppose f is defined in an open set E \subset \mathbb{R}^2 , and D _ 1f and D _ {21}f exist at every point of E. Suppose Q \subset E is a closed rectangle with sides parallel to the coordinate axes, having (a, b) and (a +h, b + k) as opposite vertices (h \neq 0, k \neq 0). Put \Delta(f,Q)=f(a+h,b+k)-f(a+h,b)-f(a,b+k)+f(a,b). Then there is a point (x, y) in the interior of Q such that \Delta(f,Q)=hk(D _ {21}f)(x,y).

Theorem 31. Suppose f is defined in an open set E \subset \mathbb{R}^2, suppose that D _ 1f, D _ {21}f, and D _ 2f exist at every point of E, and D _ {21}f is continuous at some point (a, b) \in E. Then D _ {12}f exists at (a,b) and (D _ {12}f)(a,b)=(D _ {21}f)(a,b).

Corollary 32. (D _ {21}f)(a,b)=(D _ {12}f)(a,b) if f\in\mathcal{C}^{\prime\prime} (E).
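
The quotient \Delta(f,Q)/(hk) of Theorem 30 gives a convenient numerical approximation to the common value of the mixed partials. A sketch with an arbitrary \mathcal{C}^{\prime\prime} function (the exact value is computed by hand for this particular f):

```python
import numpy as np

def f(x, y):
    # An arbitrary C'' function, used only for illustration.
    return np.sin(x * y) + x ** 3 * y

def mixed(f, a, b, h=1e-5, k=1e-5):
    # Delta(f, Q)/(hk) approximates (D_21 f)(a, b) = (D_12 f)(a, b).
    return (f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)) / (h * k)

a, b = 0.4, 1.3
exact = np.cos(a * b) - a * b * np.sin(a * b) + 3 * a ** 2   # D_21 f by hand
print(mixed(f, a, b), exact)                                 # nearly equal
```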

9. Differentiation of integrals

Suppose \varphi is a function of two variables which can be integrated with respect to one and which can be differentiated with respect to the other. Under what conditions will the result be the same if these two limit processes are carried out in the opposite order? To state the question more precisely: Under what conditions on \varphi can one prove that the equation
\[\frac{d}{dt}\int _ {a}^{b}\varphi(x,t)\, dx=\int _ {a}^{b}\frac{\partial\varphi}{\partial t}(x,t)\, dx\]
is true?

It will be convenient to use the notation \varphi^t(x)=\varphi(x,t). Thus \varphi^t is, for each t, a function of one variable.

Theorem 33. Suppose
  • \varphi(x,t) is defined for a\leqslant x\leqslant b, c\leqslant t\leqslant d;
  • \alpha is an increasing function on [a,b];
  • \varphi^t\in\mathcal{R}(\alpha) for every t\in[c,d];
  • c<s<d, and to every \varepsilon>0 corresponds a \delta>0 such that |(D _ 2\varphi)(x,t)-(D _ 2\varphi)(x,s)|<\varepsilon for all x\in[a,b] and for all t\in(s-\delta,s+\delta).
Define \displaystyle f(t)=\int _ {a}^{b}\varphi(x,t)\, d\alpha(x)\, (c\leqslant t\leqslant d). Then (D _ 2\varphi)^s\in\mathcal{R}(\alpha), f^\prime (s) exists, and \displaystyle f^\prime (s)=\int _ {a}^{b}(D _ 2\varphi)(x,s)\, d\alpha(x).

Note that the third point simply asserts the existence of the integrals in the definition of f(t) for all t \in [c, d]. Note also that the fourth point certainly holds whenever D _ 2\varphi is continuous on the rectangle on which \varphi is defined.
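
With \alpha(x)=x this is ordinary differentiation under the integral sign, and it is easy to check numerically for a concrete \varphi. The sketch below uses the arbitrary integrand \varphi(x,t)=e^{-tx^2} and scipy's quad routine for the integrals (illustrative choices, not part of the text).

```python
import numpy as np
from scipy.integrate import quad

def phi(x, t):
    # Arbitrary smooth integrand, used only for illustration.
    return np.exp(-t * x ** 2)

def D2phi(x, t):
    # Partial derivative of phi with respect to t.
    return -x ** 2 * np.exp(-t * x ** 2)

a, b, s, eps = 0.0, 1.0, 0.5, 1e-6

def f(t):
    # f(t) = integral of phi(x, t) over [a, b]  (here alpha(x) = x)
    return quad(lambda x: phi(x, t), a, b)[0]

print((f(s + eps) - f(s - eps)) / (2 * eps))   # f'(s) by central differences
print(quad(lambda x: D2phi(x, s), a, b)[0])    # integral of (D_2 phi)(x, s)
```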

