7.9. Subgaussian Distributions#

In this section we review subgaussian distributions and matrices drawn from subgaussian distributions.

Examples of subgaussian distributions include

  • Gaussian distribution

  • Rademacher distribution taking values $\pm 1$

  • Any zero mean distribution with a bounded support

Definition 7.33

A random variable $X$ is called subgaussian if there exists a constant $c > 0$ such that

$$
M_X(t) = \mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{c^2 t^2}{2}\right) \tag{7.2}
$$

holds for all $t \in \mathbb{R}$. We use the notation $X \sim \mathrm{Sub}(c^2)$ to denote that $X$ satisfies the constraint (7.2). We also say that $X$ is $c$-subgaussian.

$\mathbb{E}[\exp(Xt)]$ is the moment generating function of $X$.

$\exp\left(\frac{c^2 t^2}{2}\right)$ is the moment generating function of a Gaussian random variable with variance $c^2$.

The definition means that for a subgaussian variable $X$, its M.G.F. is bounded by the M.G.F. of a Gaussian random variable $\mathcal{N}(0, c^2)$.

Example 7.16 (Gaussian r.v. as subgaussian r.v.)

Consider a zero-mean Gaussian random variable $X \sim \mathcal{N}(0, \sigma^2)$ with variance $\sigma^2$. Then

$$
\mathbb{E}[\exp(Xt)] = \exp\left(\frac{\sigma^2 t^2}{2}\right).
$$

Putting $c = \sigma$ we see that (7.2) is satisfied. Hence $X \sim \mathrm{Sub}(\sigma^2)$ is a subgaussian r.v.; i.e., $X$ is $\sigma$-subgaussian.

Example 7.17 (Rademacher distribution)

Consider $X$ with

$$
P_X(x) = \frac{1}{2}\delta(x - 1) + \frac{1}{2}\delta(x + 1),
$$

i.e. $X$ takes the value $1$ with probability $0.5$ and the value $-1$ with probability $0.5$.

Then

$$
\mathbb{E}[\exp(Xt)] = \frac{1}{2}\exp(t) + \frac{1}{2}\exp(-t) = \cosh t \leq \exp\left(\frac{t^2}{2}\right).
$$

Thus $X \sim \mathrm{Sub}(1)$, or $X$ is 1-subgaussian.
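As a quick sanity check, here is a minimal numerical sketch (not part of the original argument; it assumes NumPy is available) verifying on a grid of $t$ values that the Rademacher M.G.F. $\cosh t$ never exceeds the Gaussian bound $\exp(t^2/2)$:

```python
import numpy as np

# Check that the Rademacher MGF cosh(t) stays below the Gaussian MGF exp(t^2 / 2).
t = np.linspace(-5, 5, 1001)
mgf = np.cosh(t)
bound = np.exp(t**2 / 2)

print(np.max(mgf / bound))  # stays <= 1; the maximum value 1 is attained at t = 0
```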

Example 7.18 (Uniform distribution)

Consider $X$ as uniformly distributed over the interval $[-a, a]$ for some $a > 0$, i.e.

$$
f_X(x) = \begin{cases} \frac{1}{2a} & -a \leq x \leq a \\ 0 & \text{otherwise.} \end{cases}
$$

Then

$$
\mathbb{E}[\exp(Xt)] = \frac{1}{2a}\int_{-a}^{a} \exp(xt)\, dx = \frac{1}{2at}\left[e^{at} - e^{-at}\right] = \sum_{n=0}^{\infty} \frac{(at)^{2n}}{(2n+1)!}.
$$

But $(2n+1)! \geq n!\, 2^n$. Hence we have

$$
\sum_{n=0}^{\infty} \frac{(at)^{2n}}{(2n+1)!} \leq \sum_{n=0}^{\infty} \frac{(at)^{2n}}{n!\, 2^n} = \sum_{n=0}^{\infty} \frac{(a^2 t^2 / 2)^n}{n!} = \exp\left(\frac{a^2 t^2}{2}\right).
$$

Thus

$$
\mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{a^2 t^2}{2}\right).
$$

Hence $X \sim \mathrm{Sub}(a^2)$, or $X$ is $a$-subgaussian.

Example 7.19 (Random variable with bounded support)

Consider $X$ as a zero-mean, bounded random variable, i.e.

$$
\mathbb{P}(|X| \leq B) = 1
$$

for some $B \in \mathbb{R}^+$ and

$$
\mathbb{E}(X) = 0.
$$

Then the following upper bound holds:

$$
\mathbb{E}[\exp(Xt)] = \int_{-B}^{B} \exp(xt)\, f_X(x)\, dx \leq \exp\left(\frac{B^2 t^2}{2}\right).
$$

This result can be proven with a convexity argument (it is a special case of Hoeffding's lemma). Thus $X \sim \mathrm{Sub}(B^2)$, or $X$ is $B$-subgaussian.
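The bound can also be checked numerically for a specific bounded distribution. The sketch below is illustrative only; the particular distribution is a hypothetical choice and NumPy is assumed. It uses a zero-mean discrete variable supported on $[-1, 1]$, so $B = 1$:

```python
import numpy as np

# A zero-mean discrete variable supported on [-1, 1] (B = 1), chosen for illustration:
# values (-1, 0, 0.5) with probabilities (0.2, 0.4, 0.4).
values = np.array([-1.0, 0.0, 0.5])
probs = np.array([0.2, 0.4, 0.4])
assert abs(values @ probs) < 1e-12  # verify the zero-mean assumption

B = 1.0
t = np.linspace(-10, 10, 2001)
mgf = (probs[:, None] * np.exp(values[:, None] * t)).sum(axis=0)
bound = np.exp(B**2 * t**2 / 2)

print(np.max(mgf / bound))  # <= 1, consistent with X ~ Sub(B^2)
```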

There are some useful properties of subgaussian random variables.

Lemma 7.7 (Mean and variance of subgaussian random variables)

If $X \sim \mathrm{Sub}(c^2)$ then

$$
\mathbb{E}(X) = 0
$$

and

$$
\mathbb{E}(X^2) \leq c^2.
$$

Thus subgaussian random variables are always zero-mean. Their variance is always bounded by the variance of the bounding Gaussian distribution.

Proof. We proceed as follows:

  1. Note that

    $$
    \sum_{n=0}^{\infty} \frac{t^n}{n!} \mathbb{E}(X^n) = \mathbb{E}\left(\sum_{n=0}^{\infty} \frac{(Xt)^n}{n!}\right) = \mathbb{E}(\exp(Xt)).
    $$

  2. But since $X \sim \mathrm{Sub}(c^2)$, hence

    $$
    \sum_{n=0}^{\infty} \frac{t^n}{n!} \mathbb{E}(X^n) \leq \exp\left(\frac{c^2 t^2}{2}\right) = \sum_{n=0}^{\infty} \frac{c^{2n} t^{2n}}{2^n\, n!}.
    $$

  3. Restating,

    $$
    \mathbb{E}(X)\, t + \mathbb{E}(X^2) \frac{t^2}{2!} \leq \frac{c^2 t^2}{2} + o(t^2) \quad \text{as } t \to 0.
    $$

  4. Dividing throughout by $t > 0$ and letting $t \to 0$ we get $\mathbb{E}(X) \leq 0$.

  5. Dividing throughout by $t < 0$ and letting $t \to 0$ we get $\mathbb{E}(X) \geq 0$.

  6. Thus $\mathbb{E}(X) = 0$. So $\mathrm{Var}(X) = \mathbb{E}(X^2)$.

  7. Now we are left with

    $$
    \mathbb{E}(X^2) \frac{t^2}{2!} \leq \frac{c^2 t^2}{2} + o(t^2) \quad \text{as } t \to 0.
    $$

  8. Dividing throughout by $t^2$ and letting $t \to 0$ we get $\mathrm{Var}(X) \leq c^2$.

Subgaussian variables have a linear structure.

Theorem 7.24 (Linearity of subgaussian variables)

If $X \sim \mathrm{Sub}(c^2)$, i.e. $X$ is $c$-subgaussian, then for any $\alpha \in \mathbb{R}$, the r.v. $\alpha X$ is $|\alpha| c$-subgaussian.

If $X_1, X_2$ are r.v.s such that $X_i$ is $c_i$-subgaussian, then $X_1 + X_2$ is $(c_1 + c_2)$-subgaussian.

Proof. Scalar multiplication:

  1. Let $X$ be $c$-subgaussian.

  2. Then

    $$
    \mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{c^2 t^2}{2}\right).
    $$

  3. Now for $\alpha \neq 0$, we have

    $$
    \mathbb{E}[\exp(\alpha X t)] \leq \exp\left(\frac{\alpha^2 c^2 t^2}{2}\right) = \exp\left(\frac{(|\alpha| c)^2 t^2}{2}\right).
    $$

  4. Hence $\alpha X$ is $|\alpha| c$-subgaussian.

Addition:

  1. Consider $X_1$ as $c_1$-subgaussian and $X_2$ as $c_2$-subgaussian.

  2. Thus

    $$
    \mathbb{E}(\exp(X_i t)) \leq \exp\left(\frac{c_i^2 t^2}{2}\right).
    $$

  3. Let $p, q > 1$ be two numbers s.t. $\frac{1}{p} + \frac{1}{q} = 1$.

  4. Using Hölder's inequality, we have

    $$
    \begin{aligned}
    \mathbb{E}(\exp((X_1 + X_2)t))
    &\leq \left[\mathbb{E}\left(\exp(X_1 t)^p\right)\right]^{\frac{1}{p}} \left[\mathbb{E}\left(\exp(X_2 t)^q\right)\right]^{\frac{1}{q}}
    = \left[\mathbb{E}(\exp(p X_1 t))\right]^{\frac{1}{p}} \left[\mathbb{E}(\exp(q X_2 t))\right]^{\frac{1}{q}}\\
    &\leq \left[\exp\left(\frac{(p c_1)^2 t^2}{2}\right)\right]^{\frac{1}{p}} \left[\exp\left(\frac{(q c_2)^2 t^2}{2}\right)\right]^{\frac{1}{q}}
    = \exp\left(\frac{t^2}{2}\left(p c_1^2 + q c_2^2\right)\right)
    = \exp\left(\frac{t^2}{2}\left(p c_1^2 + \frac{p}{p-1} c_2^2\right)\right).
    \end{aligned}
    $$

  5. Since this is valid for any $p > 1$, we can minimize the r.h.s. over $p > 1$.

  6. It suffices to minimize the term

    $$
    r = p c_1^2 + \frac{p}{p-1} c_2^2.
    $$

  7. We have

    $$
    \frac{\partial r}{\partial p} = c_1^2 - \frac{1}{(p-1)^2} c_2^2.
    $$

  8. Equating it to 0 gives us

    $$
    p - 1 = \frac{c_2}{c_1} \implies p = \frac{c_1 + c_2}{c_1} \implies \frac{p}{p-1} = \frac{c_1 + c_2}{c_2}.
    $$

  9. Taking the second derivative, we can verify that this is indeed a minimum.

  10. Thus

    $$
    r_{\min} = (c_1 + c_2)^2.
    $$

  11. Hence we have the result

    $$
    \mathbb{E}(\exp((X_1 + X_2)t)) \leq \exp\left(\frac{(c_1 + c_2)^2 t^2}{2}\right).
    $$

  12. Thus $X_1 + X_2$ is $(c_1 + c_2)$-subgaussian.

If $X_1$ and $X_2$ are independent, then $X_1 + X_2$ is $\sqrt{c_1^2 + c_2^2}$-subgaussian.
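The two bounds can be compared numerically. The following sketch is illustrative and not part of the original text (NumPy assumed); it takes $X_1, X_2$ to be independent Rademacher variables, each 1-subgaussian, so that the M.G.F. of $X_1 + X_2$ is $\cosh^2 t$:

```python
import numpy as np

# X1, X2: independent Rademacher variables, each 1-subgaussian.
# By independence, E[exp((X1 + X2) t)] = cosh(t)^2.
t = np.linspace(-5, 5, 1001)
mgf_sum = np.cosh(t) ** 2
bound_indep = np.exp((1**2 + 1**2) * t**2 / 2)  # sqrt(c1^2 + c2^2)-subgaussian bound
bound_general = np.exp((1 + 1)**2 * t**2 / 2)   # (c1 + c2)-subgaussian bound

print(np.max(mgf_sum / bound_indep))    # <= 1
print(np.max(mgf_sum / bound_general))  # <= 1, and far smaller for large |t|
```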

If $X$ is $c$-subgaussian then naturally $X$ is $d$-subgaussian for any $d \geq c$. A natural question arises: what is the minimum value of $c$ such that $X$ is $c$-subgaussian?

Definition 7.34 (Subgaussian moment)

For a centered random variable $X$, the subgaussian moment of $X$, denoted by $\sigma(X)$, is defined as

$$
\sigma(X) = \inf\left\{ c \geq 0 \;\middle|\; \mathbb{E}(\exp(Xt)) \leq \exp\left(\frac{c^2 t^2}{2}\right) \;\; \forall t \in \mathbb{R} \right\}.
$$

$X$ is subgaussian if and only if $\sigma(X)$ is finite.

We can also show that $\sigma(\cdot)$ is a norm on the space of subgaussian random variables, and that this normed space is complete.

For a centered Gaussian r.v. $X \sim \mathcal{N}(0, \sigma^2)$, the subgaussian moment coincides with the standard deviation: $\sigma(X) = \sigma$.
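For a variable whose M.G.F. is known, the subgaussian moment can be computed as $\sigma(X) = \sup_{t \neq 0} \sqrt{2 \ln \mathbb{E}[\exp(Xt)] / t^2}$. The sketch below (illustrative only; NumPy assumed) estimates this supremum numerically for a Rademacher variable and recovers $\sigma(X) = 1$:

```python
import numpy as np

# sigma(X) = sup_t sqrt(2 * ln E[exp(Xt)] / t^2); for Rademacher, E[exp(Xt)] = cosh(t).
t = np.linspace(1e-3, 20, 200_000)
log_mgf = np.log(np.cosh(t))
sigma_x = np.sqrt(np.max(2 * log_mgf / t**2))

print(sigma_x)  # approximately 1; the supremum is approached as t -> 0
```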

Sometimes it is useful to consider a more restrictive class of subgaussian random variables.

Definition 7.35 (Strictly subgaussian distribution)

A random variable $X$ is called strictly subgaussian if $X \sim \mathrm{Sub}(\sigma^2)$ where $\sigma^2 = \mathbb{E}(X^2)$, i.e. the inequality

$$
\mathbb{E}(\exp(Xt)) \leq \exp\left(\frac{\sigma^2 t^2}{2}\right)
$$

holds true for all $t \in \mathbb{R}$.

We will denote strictly subgaussian variables by $X \sim \mathrm{SSub}(\sigma^2)$.

Example 7.20 (Gaussian distribution)

If $X \sim \mathcal{N}(0, \sigma^2)$ then $X \sim \mathrm{SSub}(\sigma^2)$.

7.9.1. Characterization#

We quickly review Markov's inequality, which will help us establish the results in this subsection.

Let $X$ be a non-negative random variable and let $t > 0$. Then

$$
\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}(X)}{t}.
$$

Theorem 7.25

For a centered random variable X, the following statements are equivalent:

  1. Moment generating function condition: there exists some $c > 0$ such that

    $$
    \mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{c^2 t^2}{2}\right) \quad \forall t \in \mathbb{R}.
    $$

  2. Subgaussian tail estimate: there exists $a > 0$ such that

    $$
    \mathbb{P}(|X| \geq \lambda) \leq 2\exp(-a\lambda^2) \quad \forall \lambda > 0.
    $$

  3. $\psi_2$-condition: there exists some $b > 0$ such that

    $$
    \mathbb{E}[\exp(b X^2)] \leq 2.
    $$

Proof. (1) $\Rightarrow$ (2)

  1. Using Markov's inequality, for any $t > 0$ we have

    $$
    \mathbb{P}(X \geq \lambda) = \mathbb{P}(tX \geq t\lambda) = \mathbb{P}\left(e^{tX} \geq e^{t\lambda}\right) \leq \frac{\mathbb{E}\left(e^{tX}\right)}{e^{t\lambda}} \leq \exp\left(-t\lambda + \frac{c^2 t^2}{2}\right).
    $$

  2. Since this is valid for every $t > 0$, it must also hold for the value of $t$ minimizing the r.h.s.

  3. The minimum value is obtained for $t = \frac{\lambda}{c^2}$.

  4. Thus we get

    $$
    \mathbb{P}(X \geq \lambda) \leq \exp\left(-\frac{\lambda^2}{2 c^2}\right).
    $$

  5. Since $X$ is $c$-subgaussian, $-X$ is also $c$-subgaussian.

  6. Hence

    $$
    \mathbb{P}(X \leq -\lambda) = \mathbb{P}(-X \geq \lambda) \leq \exp\left(-\frac{\lambda^2}{2 c^2}\right).
    $$

  7. Thus

    $$
    \mathbb{P}(|X| \geq \lambda) = \mathbb{P}(X \geq \lambda) + \mathbb{P}(X \leq -\lambda) \leq 2\exp\left(-\frac{\lambda^2}{2 c^2}\right).
    $$

  8. Thus we can choose $a = \frac{1}{2 c^2}$ to complete the proof.
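For a concrete illustration of the tail estimate (not part of the original proof; it uses Python's standard math module), take $X$ to be a standard Gaussian, which is 1-subgaussian, so the bound reads $\mathbb{P}(|X| \geq \lambda) \leq 2\exp(-\lambda^2/2)$:

```python
import math

# Exact two-sided Gaussian tail erfc(lam / sqrt(2)) versus the bound 2 exp(-lam^2 / 2).
for lam in [0.5, 1.0, 2.0, 3.0, 4.0]:
    tail = math.erfc(lam / math.sqrt(2))
    bound = 2 * math.exp(-lam**2 / 2)
    print(f"lambda={lam:.1f}  P(|X| >= lambda)={tail:.3e}  bound={bound:.3e}")
```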

(2) $\Rightarrow$ (3)

TODO PROVE THIS

$$
\mathbb{E}(\exp(b X^2)) = 1 + \int_0^{\infty} 2bt \exp(b t^2)\, \mathbb{P}(|X| > t)\, dt
$$

(3) $\Rightarrow$ (1)

TODO PROVE THIS

7.9.2. More Properties#

We also have the following result on the exponential moment of a subgaussian random variable.

Lemma 7.8

Suppose $X \sim \mathrm{Sub}(c^2)$. Then

$$
\mathbb{E}\left[\exp\left(\frac{\lambda X^2}{2 c^2}\right)\right] \leq \frac{1}{\sqrt{1 - \lambda}}
$$

for any $\lambda \in [0, 1)$.

Proof. We are given that

$$
\mathbb{E}(\exp(Xt)) \leq \exp\left(\frac{c^2 t^2}{2}\right)
\iff
\int_{-\infty}^{\infty} \exp(tx)\, f_X(x)\, dx \leq \exp\left(\frac{c^2 t^2}{2}\right) \quad \forall t \in \mathbb{R}.
$$

Multiplying on both sides with $\exp\left(-\frac{c^2 t^2}{2\lambda}\right)$:

$$
\int_{-\infty}^{\infty} \exp\left(tx - \frac{c^2 t^2}{2\lambda}\right) f_X(x)\, dx
\leq \exp\left(-\frac{c^2 t^2}{2}\,\frac{1 - \lambda}{\lambda}\right)
= \exp\left(-\frac{t^2}{2}\,\frac{c^2 (1 - \lambda)}{\lambda}\right).
$$

Integrating on both sides w.r.t. $t$ we get:

$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left(tx - \frac{c^2 t^2}{2\lambda}\right) f_X(x)\, dx\, dt
\leq \int_{-\infty}^{\infty} \exp\left(-\frac{t^2}{2}\,\frac{c^2 (1 - \lambda)}{\lambda}\right) dt,
$$

which, after evaluating the Gaussian integrals over $t$, reduces to:

$$
\frac{\sqrt{2\pi\lambda}}{c} \int_{-\infty}^{\infty} \exp\left(\frac{\lambda x^2}{2 c^2}\right) f_X(x)\, dx
\leq \frac{\sqrt{2\pi\lambda}}{c}\,\frac{1}{\sqrt{1 - \lambda}}
\iff
\mathbb{E}\left(\exp\left(\frac{\lambda X^2}{2 c^2}\right)\right) \leq \frac{1}{\sqrt{1 - \lambda}},
$$

which completes the proof.
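In the Gaussian case the lemma is tight: for $X \sim \mathcal{N}(0, \sigma^2)$ with $c = \sigma$, $\mathbb{E}[\exp(\lambda X^2 / (2c^2))]$ equals $1/\sqrt{1-\lambda}$ exactly. The sketch below (illustrative only; NumPy assumed) checks this by numerical integration against the Gaussian density:

```python
import numpy as np

# Numerically integrate exp(lam * x^2 / (2 sigma^2)) against the N(0, sigma^2) density.
sigma = 2.0
x = np.linspace(-80, 80, 400_001)
dx = x[1] - x[0]
pdf = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

for lam in [0.1, 0.5, 0.9]:
    moment = np.sum(np.exp(lam * x**2 / (2 * sigma**2)) * pdf) * dx
    print(f"lambda={lam}: integral={moment:.4f}  bound={1 / np.sqrt(1 - lam):.4f}")
```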

7.9.3. Subgaussian Random Vectors#

The linearity property of subgaussian r.v.s can also be extended to random vectors. This is stated more formally in the following result.

Theorem 7.26

Suppose that $X = [X_1, X_2, \dots, X_N]$, where each $X_i$ is i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$. Then, for any $\alpha \in \mathbb{R}^N$, $\langle X, \alpha \rangle \sim \mathrm{Sub}(c^2 \|\alpha\|_2^2)$. Similarly, if each $X_i \sim \mathrm{SSub}(\sigma^2)$, then for any $\alpha \in \mathbb{R}^N$, $\langle X, \alpha \rangle \sim \mathrm{SSub}(\sigma^2 \|\alpha\|_2^2)$.
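A quick numerical check of this statement (illustrative, not part of the original text; NumPy assumed, and the weight vector below is an arbitrary hypothetical choice) uses Rademacher entries, for which the M.G.F. of $\langle X, \alpha \rangle$ factorizes as $\prod_i \cosh(\alpha_i t)$:

```python
import numpy as np

# Rademacher entries are Sub(1); <X, alpha> should then be Sub(||alpha||_2^2).
rng = np.random.default_rng(42)
alpha = rng.normal(size=8)              # arbitrary weight vector (hypothetical)
t = np.linspace(-5, 5, 1001)

mgf = np.prod([np.cosh(a * t) for a in alpha], axis=0)
bound = np.exp(np.sum(alpha**2) * t**2 / 2)

print(np.max(mgf / bound))  # <= 1
```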

Norm of a subgaussian random vector

  1. Let $X$ be a random vector where each $X_i$ is i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$ and $\sigma^2 = \mathbb{E}(X_i^2)$.

  2. Consider the $\ell_2$ norm $\|X\|_2$. It is a random variable in its own right.

  3. It would be useful to understand the average behavior of the norm.

  4. Suppose $N = 1$. Then $\|X\|_2 = |X_1|$.

  5. Also $\|X\|_2^2 = X_1^2$. Thus $\mathbb{E}(\|X\|_2^2) = \sigma^2$.

  6. It looks like $\mathbb{E}(\|X\|_2^2)$ should be connected with $\sigma^2$ in general.

  7. The norm can increase or decrease compared to its average value.

  8. A ratio-based measure between the actual value and the average value would be useful.

  9. What is the probability that the norm increases beyond a given factor?

  10. What is the probability that the norm reduces beyond a given factor?

These bounds are stated formally in the following theorem.

Theorem 7.27

Suppose that $X = [X_1, X_2, \dots, X_N]$, where each $X_i$ is i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$ and $\sigma^2 = \mathbb{E}(X_i^2)$. Then

$$
\mathbb{E}(\|X\|_2^2) = N \sigma^2. \tag{7.3}
$$

Moreover, for any $\alpha \in (0, 1)$ and for any $\beta \in \left[\frac{c^2}{\sigma^2}, \beta_{\max}\right]$, there exists a constant $\kappa \geq 4$ depending only on $\beta_{\max}$ and the ratio $\frac{\sigma^2}{c^2}$ such that

$$
\mathbb{P}(\|X\|_2^2 \leq \alpha N \sigma^2) \leq \exp\left(-\frac{N (1 - \alpha)^2}{\kappa}\right) \tag{7.4}
$$

and

$$
\mathbb{P}(\|X\|_2^2 \geq \beta N \sigma^2) \leq \exp\left(-\frac{N (\beta - 1)^2}{\kappa}\right). \tag{7.5}
$$
  • The first equation gives the average value of the square of the norm.

  • The second inequality gives an upper bound on the probability that the squared norm falls below its average by a factor $\alpha < 1$.

  • The third inequality gives an upper bound on the probability that the squared norm exceeds its average by a factor $\beta > 1$.

  • Note that if the $X_i$ are strictly subgaussian, then $c = \sigma$. Hence $\beta \in (1, \beta_{\max})$.

Proof. Since the $X_i$ are i.i.d., by linearity of expectation we have

$$
\mathbb{E}[\|X\|_2^2] = \mathbb{E}\left[\sum_{i=1}^{N} X_i^2\right] = \sum_{i=1}^{N} \mathbb{E}[X_i^2] = N\sigma^2.
$$

This proves the first part.

Now let us look at (7.5).

By applying Markov's inequality, for any $\lambda > 0$ we have:

$$
\mathbb{P}(\|X\|_2^2 \geq \beta N \sigma^2)
= \mathbb{P}\left(\exp(\lambda \|X\|_2^2) \geq \exp(\lambda \beta N \sigma^2)\right)
\leq \frac{\mathbb{E}\left(\exp(\lambda \|X\|_2^2)\right)}{\exp(\lambda \beta N \sigma^2)}
= \frac{\prod_{i=1}^{N} \mathbb{E}\left(\exp(\lambda X_i^2)\right)}{\exp(\lambda \beta N \sigma^2)}.
$$

Since $X_i$ is $c$-subgaussian, from Lemma 7.8 we have (for $0 < \lambda < \frac{1}{2 c^2}$)

$$
\mathbb{E}\left(\exp(\lambda X_i^2)\right) = \mathbb{E}\left(\exp\left(\frac{2 c^2 \lambda\, X_i^2}{2 c^2}\right)\right) \leq \frac{1}{\sqrt{1 - 2 c^2 \lambda}}.
$$

Thus:

$$
\prod_{i=1}^{N} \mathbb{E}\left(\exp(\lambda X_i^2)\right) \leq \left(\frac{1}{1 - 2 c^2 \lambda}\right)^{\frac{N}{2}}.
$$

Putting it back we get:

$$
\mathbb{P}(\|X\|_2^2 \geq \beta N \sigma^2) \leq \left(\frac{\exp(-2\lambda\beta\sigma^2)}{1 - 2 c^2 \lambda}\right)^{\frac{N}{2}}.
$$

Since the above is valid for any admissible $\lambda$, i.e. $0 < \lambda < \frac{1}{2 c^2}$, we can minimize the r.h.s. over $\lambda$ by setting the derivative w.r.t. $\lambda$ to 0.

Thus we get the optimum $\lambda$ as:

$$
\lambda = \frac{\beta\sigma^2 - c^2}{2 c^2 \beta \sigma^2}.
$$

Plugging this back we get:

$$
\mathbb{P}(\|X\|_2^2 \geq \beta N \sigma^2) \leq \left(\frac{\beta\sigma^2}{c^2} \exp\left(1 - \frac{\beta\sigma^2}{c^2}\right)\right)^{\frac{N}{2}}.
$$

Similarly proceeding for (7.4) we get

$$
\mathbb{P}(\|X\|_2^2 \leq \alpha N \sigma^2) \leq \left(\frac{\alpha\sigma^2}{c^2} \exp\left(1 - \frac{\alpha\sigma^2}{c^2}\right)\right)^{\frac{N}{2}}.
$$

We need to simplify these expressions. Consider the function

$$
f(\gamma) = \frac{2(\gamma - 1)^2}{(\gamma - 1) - \ln\gamma}, \quad \gamma > 0.
$$

By differentiating twice, we can show that this is a strictly increasing function. Let $\gamma \in (0, \gamma_{\max}]$. Define

$$
\kappa = \max\left(4, \frac{2(\gamma_{\max} - 1)^2}{(\gamma_{\max} - 1) - \ln\gamma_{\max}}\right).
$$

Clearly

$$
\kappa \geq \frac{2(\gamma - 1)^2}{(\gamma - 1) - \ln\gamma} \quad \forall \gamma \in (0, \gamma_{\max}].
$$

Which gives us:

$$
\ln(\gamma) \leq (\gamma - 1) - \frac{2(\gamma - 1)^2}{\kappa}.
$$

Hence by exponentiating on both sides we get:

$$
\gamma \leq \exp\left[(\gamma - 1) - \frac{2(\gamma - 1)^2}{\kappa}\right].
$$

By slight manipulation we get:

$$
\gamma \exp(1 - \gamma) \leq \exp\left[-\frac{2(1 - \gamma)^2}{\kappa}\right].
$$

We now choose

$$
\gamma = \frac{\alpha\sigma^2}{c^2}.
$$

Substituting we get:

$$
\mathbb{P}(\|X\|_2^2 \leq \alpha N \sigma^2) \leq \left(\gamma \exp(1 - \gamma)\right)^{\frac{N}{2}} \leq \exp\left[-\frac{N (1 - \gamma)^2}{\kappa}\right].
$$

Finally,

$$
c \geq \sigma \implies \frac{\sigma^2}{c^2} \leq 1 \implies \gamma \leq \alpha \implies 1 - \gamma \geq 1 - \alpha.
$$

Thus we get

$$
\mathbb{P}(\|X\|_2^2 \leq \alpha N \sigma^2) \leq \exp\left[-\frac{N (1 - \alpha)^2}{\kappa}\right].
$$

Similarly, choosing $\gamma = \frac{\beta\sigma^2}{c^2}$ proves the other bound.

We can now map $\gamma_{\max}$ to some $\beta_{\max}$ by:

$$
\gamma_{\max} = \frac{\beta_{\max}\sigma^2}{c^2}.
$$

This result tells us that, given a vector with entries drawn from a subgaussian distribution, we can expect its squared norm $\|X\|_2^2$ to concentrate around its expected value $N\sigma^2$.
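The concentration phenomenon is easy to observe empirically. The Monte Carlo sketch below (illustrative only, not part of the original text; NumPy assumed) uses i.i.d. $\mathcal{N}(0, 1)$ entries, for which $\sigma = c = 1$:

```python
import numpy as np

# Squared norms of 20,000 random vectors with N = 200 i.i.d. N(0, 1) entries.
rng = np.random.default_rng(0)
N, trials = 200, 20_000
sq_norms = (rng.normal(size=(trials, N)) ** 2).sum(axis=1)

print("mean of ||X||_2^2 / N:", sq_norms.mean() / N)   # close to sigma^2 = 1
for beta in [1.1, 1.2, 1.5]:
    print(f"P(||X||_2^2 >= {beta} N) ~ {np.mean(sq_norms >= beta * N):.4f}")
for alpha in [0.9, 0.8, 0.5]:
    print(f"P(||X||_2^2 <= {alpha} N) ~ {np.mean(sq_norms <= alpha * N):.4f}")
```

Both tail probabilities decay rapidly as the factors move away from 1, in line with (7.4) and (7.5).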