Some useful math results.

Inverse Function Theorem

Let $f(x)$ be a function that is invertible and differentiable. Let $y=f^{-1}(x)$ be the inverse function of $f(x)$. For all $x$ satisfying $f'(f^{-1}(x))\neq 0$, $$ \frac{dy}{dx}=\frac{d}{dx}(f^{-1}(x))=\frac{1}{f'(f^{-1}(x))}. $$ Alternatively, if $y=g(x)$ is the inverse of $f(x)$, then $$ g'(x)=\frac{1}{f'(g(x))}. $$
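A minimal numerical sketch (an assumed example, not from the statement above): take $f(x)=e^x$, so $f^{-1}(x)=\ln x$ and the theorem predicts $(f^{-1})'(x)=1/x$.

```python
import numpy as np

# Assumed example: f(x) = exp(x), so f^{-1}(x) = log(x) and f'(u) = exp(u).
f_inv = np.log
f_prime = np.exp

x = np.linspace(0.5, 5.0, 10)

# Derivative of the inverse via the theorem: 1 / f'(f^{-1}(x)).
dydx_theorem = 1.0 / f_prime(f_inv(x))

# Direct derivative of log(x) for comparison: 1 / x.
dydx_direct = 1.0 / x

print(np.allclose(dydx_theorem, dydx_direct))  # True
```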

Sion’s minimax theorem

(See the Wikipedia article on Sion's minimax theorem.) Let $X$ be a compact convex subset of a linear topological space and $Y$ a convex subset of a linear topological space. If $f$ is a real-valued function on $X\times Y$ with

  • $f(x,\cdot)$ upper semicontinuous and quasi-concave on $Y$ for all $x\in X$, and
  • $f(\cdot,y)$ lower semicontinuous and quasi-convex on $X$ for all $y\in Y$,

then $$ \min_{x\in X}\sup_{y\in Y} f(x,y)=\sup_{y\in Y}\min_{x\in X} f(x,y). $$

Note: quasi-concave and quasi-convex are weaker than concave and convex, i.e., concave implies quasi-concave and convex implies quasi-convex, but not vice versa.
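As a small sanity check, here is a grid-based sketch on an assumed example: $f(x,y)=xy$ with $X=Y=[-1,1]$. Here $f$ is linear (hence lower semicontinuous and quasi-convex) in $x$ and linear (hence upper semicontinuous and quasi-concave) in $y$, and $X$ is compact, so the theorem applies and both sides equal $0$.

```python
import numpy as np

# Assumed example: f(x, y) = x * y on X = Y = [-1, 1], evaluated on a grid.
xs = np.linspace(-1.0, 1.0, 201)
ys = np.linspace(-1.0, 1.0, 201)
F = np.outer(xs, ys)               # F[i, j] = f(xs[i], ys[j])

min_sup = F.max(axis=1).min()      # min over x of sup over y
sup_min = F.min(axis=0).max()      # sup over y of min over x

print(min_sup, sup_min)            # both approximately 0, as the theorem predicts
```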

Cauchy-Schwarz inequality for RVs

For any two random variables $X$ and $Y$, we have $$ |E[XY]| \leq \sqrt{E[X^2]E[Y^2]}, $$ where equality holds if and only if $X=\alpha Y$ almost surely for some constant $\alpha$ (or $Y=0$ almost surely). This is a special case of the Cauchy-Schwarz inequality.
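A quick Monte Carlo sanity check, with an arbitrarily chosen dependent pair of RVs (an assumed example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: Y is a noisy linear function of X, so the two are correlated.
x = rng.standard_normal(100_000)
y = 2.0 * x + rng.standard_normal(100_000)

lhs = abs(np.mean(x * y))                      # |E[XY]|  (approx. 2)
rhs = np.sqrt(np.mean(x**2) * np.mean(y**2))   # sqrt(E[X^2] E[Y^2])  (approx. sqrt(5))

print(lhs <= rhs)  # True
```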

Markov’s inequality

If $X$ is any nonnegative random variable, then $$ P(X\geq a)\leq \frac{EX}{a}, \forall a>0. $$
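A Monte Carlo sketch with an assumed nonnegative example, $X\sim\text{Exp}(1)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: X ~ Exponential(1), a nonnegative random variable.
x = rng.exponential(scale=1.0, size=100_000)

for a in (0.5, 1.0, 2.0, 4.0):
    empirical = np.mean(x >= a)      # P(X >= a) estimated from samples
    bound = x.mean() / a             # E[X] / a
    print(a, empirical <= bound)     # the Markov bound holds for each a
```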

Chernoff Bounds

Let $M_X(s):=Ee^{sX}$ denote the moment generating function of $X$. Then for any $a\in\mathbb{R}$, we have $$ P(X\geq a)\leq e^{-sa}M_X(s), \forall s>0,\\ P(X\leq a)\leq e^{-sa}M_X(s), \forall s<0. $$
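In practice the right-hand side is minimized over $s$ to get the tightest bound. A sketch for an assumed $X\sim N(0,1)$, whose MGF is $M_X(s)=e^{s^2/2}$ (the optimized upper-tail bound is $e^{-a^2/2}$ for $a>0$):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

a = 2.0

def chernoff_objective(s):
    # e^{-s a} * M_X(s) with M_X(s) = exp(s^2 / 2) for N(0, 1) (assumed example)
    return np.exp(-s * a + s**2 / 2.0)

# Minimize the Chernoff bound over s > 0.
res = minimize_scalar(chernoff_objective, bounds=(1e-6, 10.0), method="bounded")

print(res.fun)      # optimized bound exp(-a**2 / 2) ≈ 0.135
print(norm.sf(a))   # true tail P(X >= 2) ≈ 0.0228, below the bound
```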

The Union Bound

For any events $A_1,A_2,\dots,A_n$ in a probability space, we have $$ P(\bigcup_{i=1}^n A_i)\leq \sum_{i=1}^n P(A_i). $$
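A Monte Carlo sketch with assumed, deliberately overlapping events $A_i=\{U\le p_i\}$ for a single uniform $U$, so the bound is far from tight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: nested events A_i = {U <= p_i} built from one uniform U.
u = rng.uniform(size=100_000)
ps = [0.1, 0.2, 0.3]

p_union = np.mean(np.any([u <= p for p in ps], axis=0))  # P(union of A_i), approx. 0.3
p_sum = sum(np.mean(u <= p) for p in ps)                 # sum of P(A_i), approx. 0.6

print(p_union <= p_sum)  # True
```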

Gaussian Distribution

If $X\sim N(\mu_X,\sigma_X^2)$ and $Y\sim N(\mu_Y,\sigma_Y^2)$ are independent and $Z=X+Y$, then $$Z\sim N(\mu_X+\mu_Y,\ \sigma_X^2+\sigma_Y^2).$$
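A Monte Carlo sketch with assumed parameters (independence of $X$ and $Y$ is what makes the variances add):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters for two independent Gaussians.
mu_x, sig_x = 1.0, 2.0
mu_y, sig_y = -0.5, 1.5

x = rng.normal(mu_x, sig_x, size=1_000_000)
y = rng.normal(mu_y, sig_y, size=1_000_000)
z = x + y

print(z.mean())   # approx. mu_x + mu_y = 0.5
print(z.var())    # approx. sig_x**2 + sig_y**2 = 6.25
```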

  • Mutual information between bivariate Gaussian RVs $$ I(X;Y)=\frac12 \log \frac1{1-\rho^2_{XY}} $$ where $\rho_{XY}$ is the correlation coefficient between $X$ and $Y$.

  • One-dimensional Gaussian Entropy: for $p=N(\mu,\sigma^2)$, $$H(p) = \frac12(1+\log 2\pi\sigma^2)$$

  • High-dimensional Gaussian Entropy: for a $J$-dimensional Gaussian with diagonal covariance $\mathrm{diag}(\sigma_1^2,\dots,\sigma_J^2)$, $$H(p)=\frac{J}2\log(2\pi)+\frac12 \sum_{j=1}^J(1+\log\sigma_j^2)$$

  • KL divergence between two univariate Gaussian RVs: given $p(x)=N(\mu_1,\sigma_1^2), q(x)=N(\mu_2,\sigma_2^2)$, we have $$D(p,q)=\log \frac{\sigma_2}{\sigma_1}+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac12.$$ A numerical check of the entropy and KL formulas appears after this list.
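The sketch below checks the one-dimensional entropy and KL formulas numerically, with assumed parameters, against scipy's `norm.entropy` and direct numerical integration:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Assumed parameters for two univariate Gaussians p and q.
mu1, s1 = 0.0, 1.0
mu2, s2 = 1.0, 2.0
p, q = norm(mu1, s1), norm(mu2, s2)

# Differential entropy of p: closed form 0.5 * (1 + log(2*pi*s1^2)) vs scipy.
h_closed = 0.5 * (1.0 + np.log(2.0 * np.pi * s1**2))
print(h_closed, p.entropy())   # both approx. 1.4189

# KL(p || q): closed form vs numerical integration of p(x) * log(p(x) / q(x)).
kl_closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5
kl_numeric, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)), -20, 20)
print(kl_closed, kl_numeric)   # both approx. 0.4431
```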