Chinese version: [[偏导数与梯度:从“切片”到“基变换”的直观理解]]
## The Core Act: Freeze and Slice
Given a surface $z = f(x, y)$, the partial derivative $\frac{\partial f}{\partial x}$ answers: **if I freeze $y$ and walk along $x$, how steeply does $z$ change?**
The act has two steps:
1. **Freeze** one variable — this slices the surface with a vertical plane, reducing it to a curve $C$.
2. **Differentiate** that curve — ordinary single-variable calculus on the slice.
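The two steps above can be sketched numerically. This is a minimal sketch under stated assumptions, not a definitive implementation: the example surface `f`, the helper names `slice_at_y` and `derivative`, and the step size `h` are all chosen here for illustration.

```python
def f(x, y):
    # Example surface (an assumption for illustration): z = x^2 * y
    return x**2 * y

def slice_at_y(func, y0):
    # Step 1: freeze y at y0, leaving a single-variable curve in x.
    return lambda x: func(x, y0)

def derivative(g, x0, h=1e-6):
    # Step 2: ordinary single-variable derivative (central difference).
    return (g(x0 + h) - g(x0 - h)) / (2 * h)

curve = slice_at_y(f, 3.0)       # the slice at y = 3
df_dx = derivative(curve, 2.0)   # slope of that slice at x = 2
print(df_dx)                     # close to 2 * x * y = 12
```

Nothing in `derivative` knows about $y$; all of the multivariable content lives in the freezing step.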
Three ways to see the same thing:
- **The knife cut.** Hold a vertical knife at $y = y_0$ and slice through the surface. The cross-section is a curve in the plane $y = y_0$, which is parallel to the $xz$-plane. $\frac{\partial f}{\partial x}$ at $P$ is the slope of that curve at $P$.
- **The ant on the surface.** An ant at $P$ marches purely along $x$, refusing to deviate in $y$. $\frac{\partial f}{\partial x}$ tells the ant how steeply it climbs per unit step in $x$.
- **The controlled experiment.** Treat $x$ and $y$ as two independent knobs. Lock one knob, vary the other, measure sensitivity. The partial derivative is **sensitivity of output to one input while everything else holds still**.
> [!important] Direction matters
> Slicing at $y = y_0$ gives $\frac{\partial f}{\partial x}$. Slicing at $x = x_0$ gives $\frac{\partial f}{\partial y}$. Each partial derivative measures the slope **along a different axis**.
---
## The Notation Hides Something
$\frac{\partial f}{\partial x}$ looks like it only relates $z$ to $x$. **It does not.** The result is a function of **all input variables**.
Consider $f(x, y) = x^2 y$:
$\frac{\partial f}{\partial x} = 2xy$
Both $x$ and $y$ survive. Freezing $y$ during differentiation does not erase $y$ — it treats $y$ as a constant, and **constants carry through**, just as the 3 carries through in $\frac{d}{dx}(3x^2) = 6x$.
Geometrically this makes sense: the slope of the slice at a point depends on **where you sliced**. Slicing at $y = 1$ gives one curve; slicing at $y = 5$ gives another. The answer must remember which $y$ you were at.
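A worked instance may make this concrete. Using the same $f(x, y) = x^2 y$ and evaluating at $x = 1$ on the two slices just mentioned (the point $x = 1$ is chosen here only for illustration):

$\frac{\partial f}{\partial x}\bigg|_{(1,\,1)} = 2 \cdot 1 \cdot 1 = 2 \qquad \frac{\partial f}{\partial x}\bigg|_{(1,\,5)} = 2 \cdot 1 \cdot 5 = 10$

Same $x$, same formula, but the slice at $y = 5$ is five times steeper than the slice at $y = 1$.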
> [!note] The honest notation
> A more explicit form would be $\frac{\partial f}{\partial x}\bigg|_{(x,y)}$, making the dependency on the full point visible.
---
## The Gradient: Reassembly
Each partial derivative isolates one direction. The **gradient** assembles them back:
$\nabla f = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right)$
The gradient lives in the **input space**, not the output. A function $f(x, y)$ takes 2 inputs and produces 1 output — so $\nabla f$ is a 2-component vector. $z$ does not appear because $z$ is **determined** by $(x, y)$; it responds, it does not drive.
The rule scales mechanically:
|Function|Inputs|Gradient dimension|
|---|---|---|
|$f(x, y)$|2|$\nabla f \in \mathbb{R}^2$|
|$f(x, y, z)$|3|$\nabla f \in \mathbb{R}^3$|
|$f(x_1, \dots, x_n)$|$n$|$\nabla f \in \mathbb{R}^n$|
Each component freezes the other $n - 1$ variables and measures sensitivity along one axis. The algebra does not care about dimension; only your visual imagination gives out beyond 3.
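The "one component per input" rule can be sketched as a loop. This is an illustrative sketch only: `numerical_gradient`, the three-input function `g`, and the step size `h` are hypothetical names and values introduced here, and central differences are just one convenient way to estimate each partial.

```python
def numerical_gradient(func, point, h=1e-6):
    # One component per input: nudge coordinate i, freeze the other n - 1.
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad

def g(p):
    # Hypothetical 3-input example: g(x, y, z) = x*y + z^2
    x, y, z = p
    return x * y + z**2

grad = numerical_gradient(g, [1.0, 2.0, 3.0])
print(len(grad))   # 3 components for 3 inputs
print(grad)        # close to [y, x, 2z] = [2, 1, 6]
```

The same function works unchanged for any number of inputs, which is the sense in which the algebra does not care about dimension.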
---
## From Slopes to Plane: The Tangent Plane
Two slopes at a point $P = (x_0, y_0, z_0)$, one along $x$ and one along $y$, together pin down a **tangent plane** — a flat sheet kissing the surface at $P$:
$z \approx f(x_0, y_0) + \frac{\partial f}{\partial x}\bigg|_P (x - x_0) + \frac{\partial f}{\partial y}\bigg|_P (y - y_0)$
This is the **best linear approximation** of $f$ near $P$ — the multivariable analogue of the tangent line in single-variable calculus. The tangent plane is the first object that uses both partial derivatives simultaneously.
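To see what "best linear approximation near $P$" means in numbers, here is a small sketch. It assumes the example $f(x, y) = x^2 y$ and the base point $(1, 2)$, both chosen for illustration; the helper name `tangent_plane` is hypothetical.

```python
def f(x, y):
    # Example surface (an assumption for illustration): z = x^2 * y
    return x**2 * y

def tangent_plane(x, y, x0=1.0, y0=2.0):
    # Analytic partials of x^2 * y at (x0, y0): f_x = 2*x0*y0, f_y = x0^2
    fx = 2 * x0 * y0
    fy = x0**2
    return f(x0, y0) + fx * (x - x0) + fy * (y - y0)

# Error of the plane close to the base point and further away from it
near_error = abs(f(1.01, 2.01) - tangent_plane(1.01, 2.01))
far_error = abs(f(1.5, 2.5) - tangent_plane(1.5, 2.5))
print(near_error, far_error)   # the near error is far smaller
```

The plane tracks the surface closely next to $P$ and drifts away as you move off, which is exactly the behavior of a tangent line in one variable.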
### Tangent Vectors That Span the Plane
The tangent plane equation tells you **where** the plane sits, but not **which directions** it contains. To find those, parameterize each slice curve and differentiate.
On the slice at $y = y_0$, the curve is $(x,\ y_0,\ f(x, y_0))$. Let $x$ be the parameter:
$\mathbf{t}_x = \frac{d}{dx}(x,\ y_0,\ f(x, y_0)) = \left(1,\ 0,\ \frac{\partial f}{\partial x}\bigg|_P\right)$
This says: move 1 unit in $x$, 0 in $y$ (frozen), and $\frac{\partial f}{\partial x}$ units in $z$. The slope is the ratio of the third component to the first. The same logic on the $y$-slice gives:
$\mathbf{t}_y = \left(0,\ 1,\ \frac{\partial f}{\partial y}\bigg|_P\right)$
These two vectors **span** the tangent plane. The tangent plane is the plane generated by $\mathbf{t}_x$ and $\mathbf{t}_y$ anchored at $P$. Two slices, two tangent vectors, one plane. The partial derivatives sit inside those vectors as the $z$-components, controlling how steeply each direction tilts off the horizontal.
### The Normal Vector
Two vectors spanning a plane — their cross product gives the normal:
$\mathbf{t}_x \times \mathbf{t}_y = (1,\ 0,\ f_x) \times (0,\ 1,\ f_y) = \left(0 \cdot f_y - f_x \cdot 1,\ f_x \cdot 0 - 1 \cdot f_y,\ 1 \cdot 1 - 0 \cdot 0\right) = \left(-f_x,\ -f_y,\ 1\right)$
where $f_x = \frac{\partial f}{\partial x}\bigg|_P$ and $f_y = \frac{\partial f}{\partial y}\bigg|_P$.
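The cross product can be checked with concrete numbers. The sketch below assumes the example $f(x, y) = x^2 y$ at $(x_0, y_0) = (1, 2)$, where $f_x = 4$ and $f_y = 1$; the helpers `cross` and `dot` are written out by hand for illustration.

```python
def cross(a, b):
    # Standard 3D cross product a x b.
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Partials of the assumed example f(x, y) = x^2 * y at (1, 2)
fx, fy = 4.0, 1.0

t_x = (1.0, 0.0, fx)   # tangent vector along the y = y0 slice
t_y = (0.0, 1.0, fy)   # tangent vector along the x = x0 slice
normal = cross(t_x, t_y)

print(normal)                               # (-fx, -fy, 1) = (-4.0, -1.0, 1.0)
print(dot(normal, t_x), dot(normal, t_y))   # both zero: normal is perpendicular
```

Both dot products vanish, confirming that the result is perpendicular to the two vectors that span the plane.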
The gradient reappears. The first two components of this normal are exactly $-\nabla f$. The normal to the tangent plane **encodes the gradient** as its shadow on the $xy$-plane.
Project this normal down onto the input space (drop the $z$-component) and you recover $-\nabla f$, the direction of steepest descent. The upward-pointing normal leans away from the vertical toward the direction in which the surface falls fastest, which is exactly opposite to the direction of steepest ascent. The steeper the surface, the more the normal tips, and the longer that projected shadow becomes.
> [!important] Three views of the same information
> The **tangent plane** is the geometric object. The **normal vector** is its orientation in 3D. The **gradient** is that orientation collapsed back into the 2D input space where you actually walk.
---
## The Gradient's Geometric Meaning
The gradient $\nabla f$ at $P$ is not just a bookkeeping vector. It is the **direction along which the tangent plane tilts most steeply**. In the input space:
- $\nabla f$ **points in the direction of steepest ascent**.
- $|\nabla f|$ gives the **magnitude of that steepest slope**.
- $-\nabla f$ points in the direction of steepest descent.
The two isolated slopes fuse into a single object with both direction and strength.
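One way to test these claims is brute force: try many directions in the input plane and see which one climbs the tangent plane fastest. The sketch below assumes the example gradient $(4, 1)$ (that of $f(x, y) = x^2 y$ at $(1, 2)$) and a scan in 0.1 degree steps; both are illustrative choices.

```python
import math

# Gradient of the assumed example f(x, y) = x^2 * y at (1, 2): (2xy, x^2) = (4, 1)
grad = (4.0, 1.0)

best_angle, best_slope = 0.0, -math.inf
for k in range(3600):                         # scan directions in 0.1 degree steps
    theta = math.radians(k / 10)
    u = (math.cos(theta), math.sin(theta))    # unit step in the input plane
    slope = grad[0] * u[0] + grad[1] * u[1]   # rise of the tangent plane per unit step
    if slope > best_slope:
        best_angle, best_slope = theta, slope

print(best_angle, math.atan2(grad[1], grad[0]))   # winning angle vs. angle of the gradient
print(best_slope, math.hypot(*grad))              # winning slope vs. length of the gradient
```

Up to the resolution of the scan, the winning direction is the direction of the gradient and the winning slope is its length.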
---
## Directional Derivative: Arbitrary Directions
What if the ant walks at 45°, not along any axis? The **directional derivative** answers this — the slope along an arbitrary unit vector $\mathbf{u}$:
$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$
The gradient is the master key: one dot product gives the slope in **any** direction. The partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are just the special cases where $\mathbf{u} = (1, 0)$ and $\mathbf{u} = (0, 1)$.
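For the ant walking at 45°, the dot-product formula can be compared against actually taking a tiny step in that direction. This is a sketch under assumptions: the example $f(x, y) = x^2 y$, the point $(1, 2)$, and the step size `h` are chosen for illustration.

```python
import math

def f(x, y):
    # Example surface (an assumption for illustration): z = x^2 * y
    return x**2 * y

x0, y0 = 1.0, 2.0
grad = (2 * x0 * y0, x0**2)                           # analytic gradient: (4, 1)
u = (math.cos(math.pi / 4), math.sin(math.pi / 4))    # unit vector at 45 degrees

# Route 1: the formula, gradient dotted with u
via_dot = grad[0] * u[0] + grad[1] * u[1]

# Route 2: walk a tiny step along u and measure the change (central difference)
h = 1e-6
ahead = f(x0 + h * u[0], y0 + h * u[1])
behind = f(x0 - h * u[0], y0 - h * u[1])
via_walk = (ahead - behind) / (2 * h)

print(via_dot, via_walk)   # both near 5 / sqrt(2), about 3.536
```

Setting `u = (1.0, 0.0)` or `u = (0.0, 1.0)` in the same code returns the two partial derivatives, 4 and 1.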
---
## Connection to Change of Basis (Linear Algebra)
The directional derivative is not a new operation — it is **change of basis applied to the gradient**.
$\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are the gradient's components in the **standard basis** $\{(1,0),\ (0,1)\}$. They answer: how does $f$ change when you move along the first basis vector? The second? The standard basis is not special; it is just the one you happened to start with.
Rotate your coordinate axes by some angle and get a new orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2\}$. The gradient as a geometric arrow **does not move**. But its components in the new basis become:
$\nabla f \text{ in new basis} = (\nabla f \cdot \mathbf{u}_1,\ \nabla f \cdot \mathbf{u}_2)$
These are exactly the **directional derivatives** along $\mathbf{u}_1$ and $\mathbf{u}_2$ — the "partial derivatives with respect to the new axes."
The dot product $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ is doing the same thing as projecting a vector onto a new basis direction. In linear algebra, the scalar projection of $\mathbf{v}$ onto a unit vector $\mathbf{u}$ is $\mathbf{v} \cdot \mathbf{u}$. The directional derivative **is** that scalar projection, applied to the gradient.
> [!tip] The unifying principle
> **The vector is basis-independent; only its components depend on which axes you choose.** The gradient is one geometric object: a direction and a magnitude at a point. Partial derivatives are what you see when you decompose it into the standard basis. Directional derivatives are what you see when you decompose it into any other direction. Change of basis is the bridge.
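The claim that the arrow stays put while its components change can be checked directly. The sketch below assumes an example gradient $(4, 1)$ and a 30 degree rotation, both arbitrary illustrative choices; `dot` is a hand-written helper.

```python
import math

grad = (4.0, 1.0)            # example gradient in the standard basis
theta = math.radians(30)     # rotation angle, chosen arbitrarily

# New orthonormal basis: the standard basis rotated by theta
u1 = (math.cos(theta), math.sin(theta))
u2 = (-math.sin(theta), math.cos(theta))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

# Components in the new basis = directional derivatives along u1 and u2
c1, c2 = dot(grad, u1), dot(grad, u2)

# Rebuild the arrow from its new components: c1 * u1 + c2 * u2
rebuilt = (c1 * u1[0] + c2 * u2[0], c1 * u1[1] + c2 * u2[1])

print((c1, c2))                                # different numbers than (4, 1)
print(rebuilt)                                 # same arrow, close to (4, 1)
print(math.hypot(c1, c2), math.hypot(*grad))   # same length in both bases
```

The components change with the basis, but the rebuilt vector and its length do not.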
---
## The Storyline
$\text{Slice} \xrightarrow{\text{isolate}} \text{Slope} \xrightarrow{\text{parameterize}} \text{Tangent Vectors} \xrightarrow{\text{span}} \text{Tangent Plane} \xrightarrow{\text{cross product}} \text{Normal} \xrightarrow{\text{project}} \text{Gradient} \xrightarrow{\text{generalize}} \text{Directional Derivative} \xrightarrow{\text{recognize}} \text{Change of Basis}$
Each step **reassembles** what the previous step took apart. Each answers a question the previous step left open. The final recognition closes a loop: the directional derivative is not new machinery — it is the same projection operation from linear algebra, applied to the gradient.