7.4. Basic Inequalities#

Probability theory is full of inequalities, and many results are derived by applying them. This section collects some of the basic ones.

A good reference is the Wikipedia list of inequalities; in particular, see the section on probability inequalities.

7.4.1. Markov’s inequality#

Theorem 7.18

Let $X$ be a non-negative random variable and $a > 0$. Then

$$
P(X \geq a) \leq \frac{E[X]}{a}.
$$
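As a quick numerical illustration, here is a minimal simulation sketch (assuming NumPy; the exponential distribution and the threshold $a = 5$ are arbitrary choices, not part of the theorem):

```python
import numpy as np

# Sanity check of Markov's inequality on an arbitrary non-negative r.v.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # non-negative, E[X] = 2

a = 5.0
empirical = np.mean(x >= a)   # simulated P(X >= a)
bound = x.mean() / a          # Markov bound E[X] / a
print(f"P(X >= {a}) ~ {empirical:.4f} <= {bound:.4f}")
```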

7.4.2. Chebyshev’s inequality#

Theorem 7.19

Let $X$ be a random variable with finite mean $\mu$ and finite non-zero variance $\sigma^2$. Then for any real number $k > 0$, the following holds

$$
P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.
$$

Proof. Apply Markov's inequality (Theorem 7.18) to the non-negative random variable $(X - \mu)^2$ with $a = k^2 \sigma^2$:

$$
P(|X - \mu| \geq k\sigma) = P\left((X - \mu)^2 \geq k^2 \sigma^2\right) \leq \frac{E\left[(X - \mu)^2\right]}{k^2 \sigma^2} = \frac{\sigma^2}{k^2 \sigma^2} = \frac{1}{k^2}.
$$

Choosing $k = 2$, we see that at least three quarters of the values lie in the interval $(\mu - 2\sigma, \mu + 2\sigma)$.
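The following sketch checks the bound empirically for a few values of $k$ (again assuming NumPy, with an arbitrarily chosen skewed distribution):

```python
import numpy as np

# Empirical check of Chebyshev's inequality for several k.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)

mu, sigma = x.mean(), x.std()
for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: P(|X - mu| >= k sigma) ~ {empirical:.4f} <= {1 / k**2:.4f}")
```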

7.4.3. Boole’s inequality#

This is also known as the union bound: for a countable collection of events $A_1, A_2, \dots$,

$$
P\left(\bigcup_i A_i\right) \leq \sum_i P(A_i).
$$
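A minimal simulation sketch of the bound (assuming NumPy; the normal distribution and cutoffs are arbitrary choices for illustration):

```python
import numpy as np

# Union bound check: A_i = {X >= c_i} for a standard normal X.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

cutoffs = (1.0, 1.5, 2.0)
events = [x >= c for c in cutoffs]
p_union = np.mean(np.any(events, axis=0))   # P(A_1 u A_2 u A_3)
p_sum = sum(np.mean(e) for e in events)     # sum_i P(A_i)
print(f"P(union) ~ {p_union:.4f} <= {p_sum:.4f}")
```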

7.4.4. Fano’s inequality#

7.4.5. Cramér–Rao inequality#

7.4.6. Hoeffding’s inequality#

This inequality provides an upper bound on the probability that the sum of random variables deviates from its expected value.

We start with a version of the inequality for i.i.d. Bernoulli random variables.

Theorem 7.20

Let $X_1, \dots, X_n$ be i.i.d. Bernoulli random variables with probability of success $p$, so that $E\left[\sum_i X_i\right] = pn$. The probability of the sum deviating from the mean by $\epsilon n$ for some $\epsilon > 0$ is bounded by

$$
P\left(\sum_i X_i \leq (p - \epsilon) n\right) \leq \exp\left(-2 \epsilon^2 n\right)
$$

and

$$
P\left(\sum_i X_i \geq (p + \epsilon) n\right) \leq \exp\left(-2 \epsilon^2 n\right).
$$

The two inequalities can be summarized as

$$
P\left[(p - \epsilon) n \leq \sum_i X_i \leq (p + \epsilon) n\right] \geq 1 - 2 \exp\left(-2 \epsilon^2 n\right).
$$

The inequality states that the number of successes we see is concentrated around its mean, with exponentially small tails.
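To see the exponential tail at work, here is a small simulation sketch (assuming NumPy; $p$, $n$, and $\epsilon$ are arbitrary choices):

```python
import numpy as np

# Compare the simulated upper-tail probability of a Bernoulli sum
# with the Hoeffding bound exp(-2 eps^2 n).
rng = np.random.default_rng(0)
p, n, eps, trials = 0.3, 500, 0.05, 200_000

sums = rng.binomial(n, p, size=trials)      # each draw is sum_i X_i
empirical = np.mean(sums >= (p + eps) * n)
bound = np.exp(-2 * eps**2 * n)
print(f"P(sum >= (p + eps) n) ~ {empirical:.4f} <= {bound:.4f}")
```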

We now state the inequality in the general case, for any (almost surely) bounded random variables.

Theorem 7.21

Let $X_1, \dots, X_n$ be independent r.v.s. Assume that the $X_i$ are almost surely bounded; i.e.,

$$
P(X_i \in [a_i, b_i]) = 1, \quad 1 \leq i \leq n.
$$

Define the empirical mean of the variables as

$$
\overline{X} \triangleq \frac{1}{n}(X_1 + \dots + X_n).
$$

Then the probability that $\overline{X}$ deviates from its mean $E(\overline{X})$ by an amount $t > 0$ is bounded by the following inequalities:

$$
P\left(\overline{X} - E(\overline{X}) \geq t\right) \leq \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)
$$

and

$$
P\left(\overline{X} - E(\overline{X}) \leq -t\right) \leq \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).
$$

Together, we have

$$
P\left(\left|\overline{X} - E(\overline{X})\right| \geq t\right) \leq 2 \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).
$$

Note that we don’t require the $X_i$ to be identically distributed in this formulation. For the special case when the $X_i$ are i.i.d. uniform r.v.s over $[0, 1]$, we have $E(\overline{X}) = E(X_i) = \frac{1}{2}$ and

$$
P\left(\left|\overline{X} - \frac{1}{2}\right| \geq t\right) \leq 2 \exp\left(-2 n t^2\right).
$$

Clearly, $\overline{X}$ concentrates around its mean as $n$ increases, and the tails fall off exponentially.
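A simulation sketch of the uniform special case (assuming NumPy; $n$ and $t$ are arbitrary choices):

```python
import numpy as np

# Empirical mean of n i.i.d. uniform [0, 1] r.v.s versus the bound
# 2 exp(-2 n t^2) from the special case above.
rng = np.random.default_rng(0)
n, t, trials = 100, 0.1, 100_000

means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(f"P(|mean - 1/2| >= {t}) ~ {empirical:.5f} <= {bound:.5f}")
```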

The proof of Theorem 7.21 depends on what is known as Hoeffding’s Lemma.

Lemma 7.6

Let $X$ be a zero-mean r.v. with $P(X \in [a, b]) = 1$. Then, for every $t \in \mathbb{R}$,

$$
E[\exp(t X)] \leq \exp\left(\frac{1}{8} t^2 (b - a)^2\right).
$$
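The lemma can be checked numerically; the sketch below (assuming NumPy) uses a zero-mean two-point distribution on $\{a, b\}$, which is the worst case among zero-mean r.v.s on $[a, b]$ by convexity of $\exp$:

```python
import numpy as np

# Zero-mean r.v. on {a, b}: P(X = a) = b / (b - a) makes E[X] = 0.
a, b = -1.0, 2.0
p = b / (b - a)

for t in (0.5, 1.0, 2.0):
    mgf = p * np.exp(t * a) + (1 - p) * np.exp(t * b)   # E[exp(t X)]
    bound = np.exp(t**2 * (b - a) ** 2 / 8)
    print(f"t={t}: E[exp(tX)] = {mgf:.4f} <= {bound:.4f}")
```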

7.4.7. Jensen’s inequality#

Jensen’s inequality relates the value of a convex function of an integral to the integral of the convex function. In the context of probability theory, the inequality takes the following form.

Theorem 7.22

Let $f : \mathbb{R} \to \mathbb{R}$ be a convex function. Then

$$
f(E[X]) \leq E[f(X)].
$$

The equality holds if and only if either $X$ is (almost surely) a constant r.v. or $f$ coincides with an affine function on the support of $X$.
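For instance, with the convex function $f(x) = x^2$, Jensen's inequality reduces to $(E[X])^2 \leq E[X^2]$, i.e. $\operatorname{Var}(X) \geq 0$. A simulation sketch (assuming NumPy; the distribution is an arbitrary choice):

```python
import numpy as np

# Jensen's inequality for the convex function f(x) = x**2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)

lhs = x.mean() ** 2     # f(E[X])
rhs = (x**2).mean()     # E[f(X)]
print(f"f(E[X]) = {lhs:.4f} <= E[f(X)] = {rhs:.4f}")
```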

7.4.8. Bernstein inequalities#

7.4.9. Chernoff’s inequality#

This is also known as the Chernoff bound.

7.4.10. Fréchet inequalities#