(sec:prob:intro)=
# Probability Spaces
Probability measures the amount of uncertainty in an event.
Typically, the probability of an event is expressed
as a nonnegative real number ranging between $0$ and $1$.
1. When an event has a probability $1$, we are certain that
the event will occur.
1. When an event has a probability $0$, we are certain that
the event will not occur.
1. When we toss a coin, we are not certain whether it will
come up heads (H) or tails (T). We normally assign
a probability of $0.5$ to both H and T outcomes as
we believe that both outcomes are equally likely.
1. When we throw a die, then the possible outcomes are
$\{ 1, 2, 3, 4, 5, 6 \}$. If we believe that each
outcome is equally likely, then we can assign
a probability of $\frac{1}{6}$ to each outcome.
1. If we have a fairly reliable digital communication
channel, then the probability of error may be
as low as $10^{-6}$. In other words, there is a
one in a million chance of a transmitted bit
flipping during the transmission.
Given a sample space of outcomes, there are two main
activities involved:
1. Assigning a probability to different outcomes or
events in a manner that the probabilities
are sensible.
1. Using the laws of probability theory to infer
the probabilities of other outcomes or events.
Our notes will be based on the axiomatic treatment of probability.
We describe the rules for assigning sensible probabilities
to different events. We then develop the theory for computing with
the probabilities. We leave out the task of estimating the
probabilities of individual events, which is covered extensively
in statistics.
The foundations of modern probability theory are rooted in
measure theory, the study of measures.
A measure is a generalization of geometric notions like
length, area and volume. The probability of a set is
also a measure.
Although we don't provide extensive coverage of
measure theory in these notes,
we cover some fundamental concepts like $\sigma$-algebras
in our introduction to probability theory.
For an introduction to probability theory, see
{cite}`ross2014introduction,ross2014first,bertsekas2008introduction,papoulis2002probability`.
For engineering applications, see
{cite}`stark2012probability`.
For a comprehensive measure theoretic treatment,
see {cite}`billingsley2012probability`.
```{index} Random experiment
```
```{prf:definition} Random experiment
:label: def-prob-random-experiment
A *random experiment* is an experiment in which the
outcomes are nondeterministic.
In other words, different outcomes can occur each time
the experiment is run.
```
## Sample Space
```{index} Sample space
```
```{prf:definition} Sample space
:label: def-prob-sample-space
The *sample space* associated with a random experiment
is the set of all possible outcomes of the experiment.
```
We shall often denote the sample space by the symbol
$\Omega$. Individual outcomes shall often be denoted
by $\zeta$.
## Sigma Algebras and Fields
```{index} Field; probability, Algebra; probability
```
```{prf:definition} Field, Algebra
:label: def-prob-field
Consider a sample space $\Omega$
and a certain collection of subsets of $\Omega$ denoted by $\FFF$.
We say that $\FFF$ forms an *algebra* (in the set theoretic sense)
over $\Omega$ if it meets the following rules:
1. $\EmptySet \in \FFF$.
1. If $A, B \in \FFF$ then $A \cup B \in \FFF$ and $A \cap B \in \FFF$.
1. If $E \in \FFF$ then $\Omega \setminus E = E^c \in \FFF$.
In other words,
$\FFF$ contains the empty set,
and $\FFF$ is closed under the union, intersection and complement operations.
The pair $(\Omega, \FFF)$ is known as a *field*.
```
```{note}
The term *algebra* is used here in the sense of Boolean
algebra from set theory whose operations include
union, intersection and complement.
The notion of *field* here is different from the notion
of fields (e.g. $\RR$ and $\CC$)
in ring theory.
Similarly the term algebra over $\Omega$ should not
be confused with algebras over fields or rings in ring theory.
```
```{prf:example} Trivial algebra
:label: ex-prob-algebra-1
Let $\Omega$ be an arbitrary sample space.
Then, $\{ \EmptySet, \Omega \}$ is a trivial algebra over $\Omega$.
```
```{prf:example} Algebras over a set of 4 elements
:label: ex-prob-algebra-4
Let $\Omega = \{ 1, 2, 3, 4 \}$.
1. $\{\EmptySet, \{ 1, 2, 3, 4 \} \}$ is an algebra.
1. $\{\EmptySet, \{ 1, 2\}, \{3, 4 \}, \{ 1, 2, 3, 4 \} \}$ is an algebra.
1. $\{\EmptySet, \{ 1\}, \{2, 3, 4 \}, \{ 1, 2, 3, 4 \} \}$ is an algebra.
1. The power set consisting of all subsets of $\Omega$ is an algebra.
```
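The algebra axioms can be checked mechanically for finite collections. The following Python sketch (the helper name `is_algebra` is ours, not from the text) verifies the collections in the example above:

```python
from itertools import chain, combinations

def is_algebra(omega, collection):
    """Check that `collection` satisfies the algebra axioms over `omega`."""
    F = {frozenset(s) for s in collection}
    omega = frozenset(omega)
    if frozenset() not in F:                      # must contain the empty set
        return False
    for A in F:
        if omega - A not in F:                    # closed under complement
            return False
        for B in F:
            if A | B not in F or A & B not in F:  # closed under union/intersection
                return False
    return True

omega = {1, 2, 3, 4}
print(is_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))   # True
print(is_algebra(omega, [set(), {1}, {2, 3, 4}, omega]))   # True
# the power set of omega is always an algebra
power_set = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
print(is_algebra(omega, [set(s) for s in power_set]))      # True
# missing the complement of {1}, so not an algebra
print(is_algebra(omega, [set(), {1}, omega]))              # False
```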
```{prf:theorem} Properties of an algebra
:label: res-prob-algebra-props
Let $\FFF$ be an algebra over $\Omega$. Then
1. $\Omega \in \FFF$.
1. If $A_1, \dots, A_n \in \FFF$, then $A_1 \cup \dots \cup A_n \in \FFF$.
1. If $A_1, \dots, A_n \in \FFF$, then $A_1 \cap \dots \cap A_n \in \FFF$.
1. If $A, B \in \FFF$ then $A \setminus B \in \FFF$.
```
In other words, $\FFF$ includes the sample space, and it is closed
under finite unions, finite intersections and set differences.
```{prf:proof}
(1) Sample space
1. $\EmptySet \in \FFF$.
1. $\Omega = \EmptySet^c$.
1. Hence $\Omega \in \FFF$.
(2) Closure under finite union by mathematical induction
1. Base case: the statement for $n=2$ holds by the definition of an algebra.
1. Assume that the statement is true for some $n \geq 2$.
1. Let $A_1, \dots, A_n, A_{n+1} \in \FFF$.
1. By inductive hypothesis $A = A_1 \cup \dots \cup A_n \in \FFF$.
1. Then
$$
A_1 \cup \dots \cup A_n \cup A_{n + 1}
= (A_1 \cup \dots \cup A_n) \cup A_{n + 1}
= A \cup A_{n + 1}.
$$
1. Since both $A, A_{n + 1} \in \FFF$, their union
is also in $\FFF$.
1. Hence by mathematical induction, $\FFF$ is closed under
all finite unions.
(3) Closure under finite intersection
1. We note that
$$
A_1 \cap \dots \cap A_n = (A_1^c \cup \dots \cup A_n^c)^c.
$$
1. Since $A_i \in \FFF$, hence $A_i^c \in \FFF$ for $i=1,\dots,n$.
1. Then $A = A_1^c \cup \dots \cup A_n^c \in \FFF$ due to (2).
1. Then $A_1 \cap \dots \cap A_n = A^c \in \FFF$.
(4) Set difference
1. We note that $A \setminus B = A \cap B^c$.
1. Since $B \in \FFF$, hence $B^c \in \FFF$.
1. Since $A, B^c \in \FFF$, hence $A \cap B^c \in \FFF$.
```
```{prf:example} Algebra from a partition
:label: ex-prob-algebra-from-partition
Let $\Omega$ be a sample space. Let
$A = \{ A_1, \dots, A_n \}$ be a (finite) partition of $\Omega$.
In other words, $A_i$ are pairwise disjoint and their
union is $\Omega$. Then the collection $\FFF$
consisting of all unions of the sets $A_i$
(including the empty set which is the union of zero sets)
forms an algebra.
1. Since there are $n$ sets in the partition $A$, the
number of elements of $\FFF$ is $2^n$.
1. By definition $\EmptySet$ and $\Omega$ are in $\FFF$.
1. Let $X, Y \in \FFF$. Then both $X$ and $Y$ are unions
of some members of $A$.
1. Hence $X \cup Y$ is also a union of some members of $A$.
1. Hence $\FFF$ is closed under union.
1. If $X$ and $Y$ are disjoint, then $X \cap Y = \EmptySet \in \FFF$.
1. Otherwise, $X \cap Y$ is the union of some members of $A$
which are common in both $X$ and $Y$.
Hence $X \cap Y \in \FFF$.
1. Hence $\FFF$ is closed under intersection.
1. Let $X \in \FFF$. Then $X$ is union of some members of $A$.
1. Then $\Omega \setminus X$ is the union of remaining members of $A$.
1. Hence $\Omega \setminus X \in \FFF$.
1. Hence $\FFF$ is closed under complement.
1. Hence $\FFF$ is an algebra.
```
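This construction is easy to carry out for small partitions. The following Python sketch (function name ours, illustrative only) builds all $2^n$ unions of partition cells and confirms the count:

```python
from itertools import chain, combinations

def algebra_from_partition(parts):
    """All unions of zero or more partition cells; yields 2**n distinct sets."""
    cells = [frozenset(p) for p in parts]
    picks = chain.from_iterable(combinations(cells, r) for r in range(len(cells) + 1))
    return {frozenset().union(*pick) for pick in picks}

parts = [{1, 2}, {3}, {4, 5, 6}]       # a partition of {1, ..., 6}
F = algebra_from_partition(parts)
print(len(F))                          # 8, i.e. 2**3
print(frozenset() in F)                # True (union of zero cells)
print(frozenset(range(1, 7)) in F)     # True (union of all cells)
```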
Often, the sample space $\Omega$ is an infinite set (e.g., $\RR$)
and the field of subsets of $\Omega$ is also infinite.
For example, the Borel field defined over $\RR$ contains
all the open and closed intervals in $\RR$.
We have to deal with situations which
involve countable unions and intersections of sets in the field.
Mathematical induction shows that an algebra is
closed under finite unions and intersections,
but it is not enough to prove that it is also
closed under countable unions.
To handle such cases, we need to extend the
definition of a field.
```{index} $\sigma$ field, $\sigma$ algebra
```
```{prf:definition} $\sigma$-field, $\sigma$-algebra
:label: def-prob-sigma-field
Consider a sample space $\Omega$
and a certain collection of subsets of $\Omega$ denoted by $\FFF$.
We say that $\FFF$ is a $\sigma$-*algebra* over $\Omega$ if
it is an algebra over $\Omega$ and it is closed
under countable unions.
In other words, if $A_1, A_2, \dots$ is a countable
collection of sets in $\FFF$, then, their union
$$
\bigcup_{i=1}^{\infty} A_i \in \FFF.
$$
The pair $(\Omega, \FFF)$ is known as a $\sigma$-*field*.
```
When the sample space is obvious from the context, we will
often refer to $\FFF$ itself as a field or a $\sigma$-field,
as appropriate.
```{prf:example} Power set
:label: ex-prob-power-set-field
Let $\Omega$ be an arbitrary sample space.
Then its power set $\PPP(\Omega)$ is a $\sigma$-algebra.
```
```{prf:theorem} Countable intersection
:label: res-prob-sigma-algebra-countable-intersection
Let $\FFF$ be a $\sigma$-algebra over $\Omega$.
Then $\FFF$ is closed under countable intersection.
In other words, if $A_1, A_2, \dots$ is a countable
collection of sets in $\FFF$, then, their intersection
$$
\bigcap_{i=1}^{\infty} A_i \in \FFF.
$$
```
```{prf:proof}
We use the complement property.
1. Let $A_1, A_2, \dots$ be subsets in $\FFF$.
1. Then $A_1^c, A_2^c, \dots$ are also in $\FFF$.
1. Then their countable union:
$$
\bigcup_{i=1}^{\infty} A_i^c \in \FFF.
$$
1. Taking complement, we get
$$
\bigcap_{i=1}^{\infty} A_i \in \FFF
$$
as desired.
```
```{prf:remark}
:label: rem-prob-sigma-field-bounds
Any $\sigma$-algebra $\FFF$ of subsets of a sample space $\Omega$
lies between the two extremes:
$$
\{ \EmptySet, \Omega \} \subseteq \FFF \subseteq \PPP(\Omega).
$$
```
### Generated $\sigma$-Algebra
```{index} Atom; $\sigma$-algebra
```
```{prf:definition} Atom
:label: def-prob-atom
Let $\FFF$ be a $\sigma$-algebra over a sample space $\Omega$.
An *atom* of $\FFF$ is a nonempty set $A \in \FFF$ such that
the only subsets of $A$ which are also in $\FFF$
are the empty set $\EmptySet$ and $A$ itself.
```
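For a finite collection, atoms can be found by brute force. A small Python sketch (function name ours; it skips $\EmptySet$, which would satisfy the subset condition vacuously):

```python
def atoms(collection):
    """Nonempty members of the collection whose only subsets inside the
    collection are the empty set and themselves."""
    F = {frozenset(s) for s in collection}
    return [set(A) for A in F
            if A and all(B in (frozenset(), A) for B in F if B <= A)]

# the algebra generated by the partition {{1, 2}, {3, 4}} of {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
print(sorted(sorted(a) for a in atoms(F)))   # [[1, 2], [3, 4]]
```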
```{prf:theorem} Intersection of $\sigma$-algebras
:label: res-prob-intersection-sigma-algebras
Let $\GGG = \{ \GGG_i \}_{i \in I}$
be a nonempty collection of $\sigma$-algebras over $\Omega$
where $I$ is some index set.
Then their intersection $\bigcap_{i \in I} \GGG_i$ is
also a $\sigma$-algebra.
```
```{prf:proof}
Denote $\FFF = \bigcap_{i \in I} \GGG_i$. We shall
verify all the properties of a $\sigma$-algebra.
Empty Set
1. Since $\EmptySet \in \GGG_i$ for every $i$, hence
$\EmptySet \in \FFF$.
Union
1. Let $A, B \in \FFF$.
1. Then $A, B \in \GGG_i$ for every $i$.
1. Hence $A \cup B \in \GGG_i$ for every $i$.
1. Hence $A \cup B \in \FFF$.
Intersection
1. Let $A, B \in \FFF$.
1. Then $A, B \in \GGG_i$ for every $i$.
1. Hence $A \cap B \in \GGG_i$ for every $i$.
1. Hence $A \cap B \in \FFF$.
Countable union
1. Let $A_1, A_2, \dots$ be a countable collection of subsets
in $\FFF$.
1. Then $A_1, A_2, \dots \in \GGG_i$ for every $i$.
1. Since each $\GGG_i$ is a $\sigma$-algebra, hence
$\bigcup_{j=1}^{\infty} A_j \in \GGG_i$ for every $i$.
1. Hence $\bigcup_{j=1}^{\infty} A_j \in \FFF$.
```
```{index} Generated sigma algebra
```
```{prf:definition} $\sigma$-algebra generated by a collection of sets
:label: res-prob-generated-field
Let $\AAA = \{ A_i \}_{i \in I}$ be a collection of subsets of $\Omega$
where $I$ is an index set.
Let $\FFF$ be the smallest $\sigma$-algebra such that
$A_i \in \FFF$ for every $i \in I$.
Then $\FFF$ is called the $\sigma$-*algebra generated by* $\AAA$
and is denoted by $\sigma(\AAA)$.
```
1. Here by smallest, we mean that if there is any other $\sigma$-algebra
$\GGG$ such that $A_i \in \GGG$ for every $i \in I$, then
$\FFF \subseteq \GGG$.
1. Since the power set $\PPP(\Omega)$ is a $\sigma$-algebra
and it contains every subset of $\Omega$ (in particular, every $A_i$),
there is always at least one $\sigma$-algebra containing a given
collection of sets.
1. It may not be possible to visualize every member of $\FFF$ easily
from the descriptions of $A_i$.
We next show that a smallest $\sigma$-algebra exists
for every collection $\AAA$.
```{prf:theorem} Existence of the generated $\sigma$-algebra
:label: res-prob-existence-generated-algebra
Let $\AAA$ be a collection of subsets of $\Omega$. Then
there exists a smallest $\sigma$-algebra containing $\AAA$.
In other words, there is a $\sigma$-algebra generated by $\AAA$.
```
```{prf:proof}
The main issue is to verify that there is a
$\sigma$-algebra containing $\AAA$ which is contained in
every other $\sigma$-algebra containing $\AAA$.
We provide a constructive proof.
1. Let $\GGG = \{ \GGG_j \}_{j \in J}$
be the collection of all $\sigma$-algebras
containing all sets of $\AAA$.
1. We can see that $\GGG$ is nonempty since $\PPP(\Omega) \in \GGG$.
1. By {prf:ref}`res-prob-intersection-sigma-algebras`,
$\FFF = \bigcap_{j \in J} \GGG_j$ is a $\sigma$-algebra.
1. We claim that $\sigma (\AAA) = \FFF$.
1. Since $\AAA \subseteq \GGG_j$ for every $j \in J$, hence
$\AAA \subseteq \FFF$.
1. By construction $\FFF \subseteq \GGG_j$ for every $j \in J$.
In other words, $\FFF$ is contained in every $\sigma$-algebra
containing $\AAA$.
1. Hence $\FFF$ is indeed the $\sigma$-algebra generated by $\AAA$.
We note that if $\AAA$ itself is a $\sigma$-algebra, then
of course $\sigma(\AAA) = \AAA$.
```
```{prf:example} Algebra generated by 2 sets
:label: ex-prob-gen-algebra-2-sets
Let $\Omega$ be an arbitrary sample space. Let $A$ and $B$
be two subsets of $\Omega$ which are not necessarily disjoint.
We shall construct the algebra $\FFF$ generated by $A$ and $B$.
1. Since $A \in \FFF$, hence $A^c \in \FFF$.
1. Similarly, $B^c \in \FFF$.
1. Since $A$ and $B$ and their complements are in $\FFF$,
hence $AB, AB^c, A^c B, A^c B^c \in \FFF$
as $\FFF$ is closed under intersection.
1. Let us name them $E = AB$, $F = AB^c$, $G = A^c B$ and $H = A^c B^c$.
1. We can see that these four sets $E, F, G, H$
are disjoint and
$$
\Omega = E \cup F \cup G \cup H.
$$
1. We now have a partition of $\Omega$ into $4$ disjoint sets.
We can follow the lead from {prf:ref}`ex-prob-algebra-from-partition`
to construct an algebra
by constructing all the unions of $0$ or more sets from the
collection $\{ E, F, G, H \}$.
1. There is ${4 \choose 0} = 1$ union of zero sets, namely the empty set $\EmptySet$.
1. There are ${4 \choose 1} = 4$ of these disjoint subsets.
1. There are ${4 \choose 2} = 6$ pair-wise unions of these $4$ sets.
1. There are ${4 \choose 3} = 4$ unions of $3$ of these $4$ subsets.
1. There is ${4 \choose 4} = 1$ union of all the $4$ subsets which is $\Omega$.
1. A total of $1 + 4 + 6 + 4 + 1 = 16 = 2^4$ possible subsets are formed.
1. In particular, note that $A = E \cup F$, $B = E \cup G$,
$A^c = G \cup H$ and $B^c = F \cup H$.
1. We can see that this collection of $16$ subsets of $\Omega$ is
an algebra following {prf:ref}`ex-prob-algebra-from-partition`.
1. This is indeed the smallest algebra containing
$A$ and $B$ as any other algebra must contain $\FFF$.
1. Hence it is the algebra generated by $A$ and $B$.
1. We can see that $E, F, G, H$ are the atoms of the algebra $\FFF$.
```
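The sixteen-set construction above can be reproduced programmatically. A Python sketch (illustrative only, names ours) that forms the four atoms and all of their unions:

```python
from itertools import chain, combinations

def generated_algebra(omega, A, B):
    """Algebra generated by A and B: all unions of the atoms
    E = A∩B, F = A∩B^c, G = A^c∩B, H = A^c∩B^c."""
    omega, A, B = map(frozenset, (omega, A, B))
    cells = [c for c in (A & B, A - B, B - A, omega - (A | B)) if c]
    picks = chain.from_iterable(combinations(cells, r) for r in range(len(cells) + 1))
    return {frozenset().union(*pick) for pick in picks}

omega = set(range(1, 9))
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}
G = generated_algebra(omega, A, B)
print(len(G))                     # 16, i.e. 2**4
print(frozenset(A) in G)          # True: A = E ∪ F
print(frozenset(omega - B) in G)  # True: B^c = F ∪ H
```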
### Dynkin $\pi-\lambda$ Theorem
When constructing probability measures
(see {ref}`sec:prob:probability:measure`),
it is generally impractical to assign
a probability explicitly to each subset in a $\sigma$-algebra.
The Carathéodory extension theorem allows
us to define a measure explicitly on only a small
collection of simple sets, which may or may not
form a $\sigma$-algebra (e.g., the atoms inside
a $\sigma$-algebra), and automatically extends
the measure to all other sets in the $\sigma$-algebra.
The uniqueness claim in the extension theorem
makes use of the Dynkin $\pi-\lambda$ theorem
(below). Readers may skip this subsection
on a first reading.
```{index} Pi system
```
```{prf:definition} $\pi$-system
:label: def-prob-pi-system
Let $\Omega$ be a sample space. A collection $P$
of subsets of $\Omega$ is a $\pi$-*system* if
$P$ is closed under finite intersections.
In other words, if $A, B \in P$, then $A \cap B \in P$.
```
```{index} Lambda system
```
```{prf:definition} $\lambda$-system
:label: def-prob-lambda-system
Let $\Omega$ be a sample space. A collection $L$
of subsets of $\Omega$ is a $\lambda$-*system* if
1. $L$ contains the empty set $\EmptySet$
1. $L$ is closed under complements:
if $A \in L$ then $A^c \in L$
1. $L$ is closed under countable *disjoint* union:
if $A_1, A_2, \dots \in L$ and $A_i \cap A_j = \EmptySet$
for every $i \neq j$, then $\bigcup_{i=1}^{\infty} A_i \in L$.
```
```{prf:theorem} Dynkin $\pi-\lambda$ theorem
:label: res-prob-dynkin-pi-lambda
If $P$ is a $\pi$ system and $L$ a $\lambda$-system
of subsets of $\Omega$ with $P \subseteq L$ then
$$
\sigma(P) \subseteq L.
$$
In other words, the $\sigma$-algebra generated by $P$
is contained in $L$.
```
The proof of this result is involved.
The overall strategy works as follows:
1. We shall construct a $\sigma$-algebra
that lies between $P$ and $L$.
1. In particular, we shall construct the
set $l(P)$ which is the intersection
of all $\lambda$-systems containing $P$.
1. We then show that $l(P)$ is a $\lambda$-system.
1. We then show that $l(P)$ is also a $\pi$-system.
1. We show that a collection which is both a $\lambda$-system
and a $\pi$-system is also a $\sigma$-algebra.
1. Thus, we claim that $l(P)$ is a $\sigma$-algebra.
1. We finally show that $\sigma(P) \subseteq l(P) \subseteq L$.
In order to prove this result, we first prove
some of the intermediate steps individually.
```{prf:lemma} Closedness under proper differences
:label: res-prob-lambda-sys-proper-diff
A $\lambda$-system is closed under proper differences.
In other words, if $A, B \in L$ where $L$ is a
$\lambda$-system and $A \subseteq B$ then the difference
$B \setminus A$ is also in $L$.
```
```{prf:proof}
We note that $B \setminus A = B \cap A^c$. We shall show
this as a complement of the disjoint union of sets.
1. Since $B \in L$, hence $B^c \in L$.
1. Since $A \subseteq B$, hence $A$ and $B^c$ are disjoint.
1. Hence $D = A \cup B^c \in L$ since it is a disjoint union.
1. Hence $D^c = A^c \cap B \in L$.
1. But $B \setminus A = B \cap A^c = D^c$.
1. Hence $B \setminus A \in L$.
```
```{prf:lemma} $\pi + \lambda \implies \sigma$
:label: res-prob-pi-lambda-sigma
A family of subsets of $\Omega$ which is both a $\pi$-system
and a $\lambda$-system is a $\sigma$-algebra.
```
```{prf:proof}
Let $S$ be a family of subsets of $\Omega$ which is
both a $\pi$-system and a $\lambda$-system.
1. $\EmptySet \in S$ since it is a $\lambda$-system.
1. $S$ is closed under finite intersections since it is a
$\pi$-system.
We just need to show that it is closed under countable
unions of (not necessarily disjoint) sets.
1. Let $A_1, A_2, \dots \in S$.
1. We shall write $\bigcup_{i=1}^{\infty} A_i$ as a
countable union of disjoint sets.
1. Let $B_1 = A_1$.
1. For $n \geq 2$, let
$$
B_n = A_n \setminus (A_1 \cup A_2 \cup \dots \cup A_{n-1})
= A_n \cap A_1^c \cap A_2^c \cap \dots \cap A_{n-1}^c.
$$
In other words, $B_n$ consists of all elements of $A_n$
which don't appear in any of the sets $A_1, \dots, A_{n-1}$.
1. Then $B_1, B_2, \dots$ is a sequence of disjoint sets.
1. We can also see that $A_1 \cup \dots \cup A_n = B_1 \cup \dots \cup B_n$
for every $n$.
1. Hence $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$.
1. Since $S$ is a $\lambda$-system, hence $A_i^c \in S$ for every $i$.
1. Since $S$ is a $\pi$-system, hence $B_i \in S$ for every $i$.
1. Hence $\bigcup_{i=1}^{\infty} B_i \in S$ as it is a countable
union of disjoint sets.
```
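The disjointification step in this proof ($B_1 = A_1$ and $B_n = A_n \setminus (A_1 \cup \dots \cup A_{n-1})$) is a standard trick that can be tried on a finite list of sets. A minimal Python sketch (function name ours):

```python
def disjointify(sets):
    """B_1 = A_1; B_n = A_n minus everything already covered.
    The B_n are pairwise disjoint with the same union as the A_n."""
    seen, result = set(), []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
B = disjointify(A)
print(B)                                    # [{1, 2, 3}, {4}, {5}]
print(set().union(*A) == set().union(*B))   # True: unions agree
```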
```{prf:lemma} $\lambda$-co-systems
:label: res-prob-dynkin-lem-3
Suppose $L'$ is a $\lambda$-system of subsets of $\Omega$.
For any set $A \in L'$, let $S_A$ be the collection of all
$B \subseteq \Omega$ for which $A \cap B \in L'$.
Then $S_A$ is a $\lambda$-system.
```
```{prf:proof}
Empty set
1. Since $\EmptySet = A \cap \EmptySet \in L'$, hence
$\EmptySet \in S_A$.
Countable union of disjoint sets
1. Let $B_1, B_2, \dots \in S_A$ be a sequence of
disjoint subsets in $S_A$.
1. Then $A \cap B_i \in L'$ for every $i$.
1. Then $\bigcup (A \cap B_i) \in L'$
since $A \cap B_i$ are also disjoint.
1. Also $\bigcup (A \cap B_i) = A \cap (\bigcup B_i)$.
1. Since $A \cap (\bigcup B_i) \in L'$, hence
$\bigcup B_i \in S_A$.
1. Hence $S_A$ is closed under countable union of disjoint sets.
Complements
1. Let $B \in S_A$. Then $A \cap B \in L'$.
1. Now $A \cap B^c = A \setminus B = A \setminus (A \cap B)$.
1. Since $A \in L'$ and $A \cap B \in L'$
and $A \cap B \subseteq A$, hence
due to {prf:ref}`res-prob-lambda-sys-proper-diff`,
$A \setminus (A \cap B) \in L'$.
1. Since $A \cap B^c \in L'$ hence $B^c \in S_A$.
1. Hence $S_A$ is closed under complements.
Thus, $S_A$ is indeed a $\lambda$-system.
```
```{prf:lemma}
:label: res-prob-dynkin-lem-4
Let $l(P)$ be the intersection of all $\lambda$-systems
containing $P$. Then $l(P)$ is a $\lambda$-system.
```
```{prf:proof}
We can easily verify that $l(P)$ satisfies all the properties
of a $\lambda$-system.
1. Let $\LLL = \{L_i \}$ be the collection of all $\lambda$-systems
containing $P$.
1. Then $l(P) = \bigcap L_i$.
1. Since $\EmptySet \in L_i$ for every $i$, hence
$\EmptySet \in l(P)$.
1. Let $A \in l(P)$.
1. Then $A \in L_i$ for every $i$.
1. Hence $A^c \in L_i$ for every $i$.
1. Hence $A^c \in l(P)$.
1. Let $A_1,A_2, \dots \in l(P)$ be a collection
of pairwise disjoint sets.
1. Then $A_1,A_2,\dots \in L_i$ for every $i$.
1. Hence $\bigcup A_i \in L_i$ for every $i$.
1. Hence $\bigcup A_i \in l(P)$.
```
```{prf:lemma}
:label: res-prob-dynkin-lem-5
Let $l(P)$ be the intersection of all $\lambda$-systems
containing $P$. Then $l(P)$ is a $\pi$-system.
```
```{prf:proof}
The proof uses a *bootstrap* argument often used in
measure theory.
We first show that for any $A \in P$ and $B \in l(P)$,
$A \cap B \in l(P)$.
1. Let $A \in P$.
1. Then $A \in l(P)$.
1. Let $S_A$ be the set of all sets $B \subseteq \Omega$
such that $A \cap B \in l(P)$.
1. By {prf:ref}`res-prob-dynkin-lem-3`, $S_A$
is a $\lambda$-system.
1. Let $B \in P$. Then $A \cap B \in P$ since $P$ is a $\pi$-system.
1. But $P \subseteq l(P)$.
1. Hence $A \cap B \in l(P)$.
1. Hence $B \in S_A$.
1. Hence $P \subseteq S_A$.
1. Thus, $S_A$ is a $\lambda$-system containing $P$.
1. Hence $l(P) \subseteq S_A$ as $l(P)$ is the intersection
of all $\lambda$-systems containing $P$.
1. Thus, for any $A \in P$ and for any $B \in l(P)$,
the intersection $A \cap B \in l(P)$:
indeed, $B \in l(P) \implies B \in S_A$,
and $B \in S_A \implies A \cap B \in l(P)$.
We now show that for any $A, B \in l(P)$,
$A \cap B \in l(P)$.
1. Consider any $B \in l(P)$.
1. Let $S_B$ be the set of all sets $C \subseteq \Omega$
such that $B \cap C \in l(P)$.
1. We claim that $P \subseteq S_B$.
1. Let $A \in P$.
1. Since $A \in P$ and $B \in l(P)$
hence $A \cap B \in l(P)$.
1. Hence $A \in S_B$.
1. Hence $P \subseteq S_B$.
1. By {prf:ref}`res-prob-dynkin-lem-3`, $S_B$
is a $\lambda$-system.
1. Therefore $l(P) \subseteq S_B$.
1. This means that for any $A \in l(P)$, the intersection
$A \cap B \in l(P)$:
indeed, $A \in l(P) \implies A \in S_B$,
and $A \in S_B \implies B \cap A \in l(P)$.
Thus, $l(P)$ is closed under intersections and
is indeed a $\pi$-system.
```
We are now ready to prove the Dynkin $\pi-\lambda$ theorem
({prf:ref}`res-prob-dynkin-pi-lambda`).
```{prf:proof}
Let $l(P)$ be the intersection of all $\lambda$-systems
containing $P$.
1. By hypothesis $L$ is a $\lambda$-system
containing $P$.
1. By definition $l(P) \subseteq L$.
1. By {prf:ref}`res-prob-dynkin-lem-4`,
$l(P)$ is a $\lambda$-system.
1. By {prf:ref}`res-prob-dynkin-lem-5`,
$l(P)$ is a $\pi$-system.
1. By {prf:ref}`res-prob-pi-lambda-sigma`,
$l(P)$ is a $\sigma$-algebra.
1. By definition $\sigma(P)$ is the smallest
$\sigma$-algebra containing $P$.
1. Hence $P \subseteq \sigma(P) \subseteq l(P) \subseteq L$.
```
### Borel $\sigma$-Algebra
Recall the notions of topology (open sets, closed sets, etc.)
on the real line from {ref}`sec:bra:real-line-topology`.
If we consider the collection of open sets
of $\RR$, denoted by $\OOO$, then it is
clear that it is not a $\sigma$-algebra
as it is not closed under complements.
We are interested in the $\sigma$-algebra
generated by $\OOO$.
By {prf:ref}`res-prob-existence-generated-algebra`
we know that such a $\sigma$-algebra exists.
```{index} Borel field, Borel algebra
```
```{prf:definition} Borel $\sigma$-algebra
:label: def-prob-borel-field
The $\sigma$-algebra generated by the open sets of the
real line $\RR$ is known as the
*Borel* $\sigma$-*algebra* of $\RR$
and is denoted by $\BBB$.
In other words, if $\OOO$ is the collection
of all open subsets of $\RR$, then
$$
\BBB = \sigma(\OOO).
$$
The pair $(\RR, \BBB)$ is known as the
*Borel field*.
The members of the Borel $\sigma$-algebra
are known as *Borel sets*.
```
Since $\BBB$ is a $\sigma$-algebra, it contains
all open sets, all closed sets,
all countable unions of closed sets,
and all countable intersections of open sets.
There are subsets of the real line which are not
included in the Borel $\sigma$-algebra. However, they don't
happen to be of much engineering or scientific
interest.
```{prf:example} Examples of Borel sets
:label: ex-prob-borel-sets-1
Following are some examples of Borel sets:
1. $\{ x \}$ for any $x \in \RR$.
1. The set of rational numbers.
1. Any countable subset of $\RR$.
1. The intervals $(0,1)$, $[0,1]$, $(0, 1]$, $[0,1)$.
1. $[1,2] \cup [4,5]$.
```
```{prf:definition} $G_{\delta}$-set
:label: def-prob-g-delta
A countable intersection of open sets is known as
a $G_{\delta}$-set.
```
```{prf:definition} $F_{\sigma}$-set
:label: def-prob-f-sigma
A countable union of closed sets is known as
an $F_{\sigma}$-set.
```
We recall that a countable intersection of open
sets need not be open and a countable union
of closed sets need not be closed.
However $\BBB$ contains all the $G_{\delta}$
and $F_{\sigma}$ sets since it is a $\sigma$-algebra.
We haven't yet shown that $\BBB \neq 2^{\RR}$.
In other words, there exist non-Borel subsets of
$\RR$. There are other characterizations of the
Borel $\sigma$-algebra which provide us with
a better understanding of its structure.
```{prf:theorem} Generation from one-sided closed intervals
:label: res-prob-borel-gen-intervals
The Borel $\sigma$-algebra $\BBB$ is generated
by intervals of the form $(-\infty, a]$, where
$a \in \QQ$ is a rational number.
```
```{prf:proof}
1. Let $\OOO_0$ denote the collection of all open intervals.
1. By {prf:ref}`res-rl-open-set-countable-union`, every
open set in $\RR$ is an at most countable union
of open intervals.
1. Hence $\sigma(\OOO_0) = \BBB$.
1. Let $\DDD$ denote the collection of all intervals
of the form $(-\infty, a], a \in \QQ$.
1. Let $(a,b) \in \OOO_0$ for some $a < b$.
1. Let $a_n$ be a rational number in $(a, a + \frac{1}{n})$.
We can see that $a_n \downarrow a$ as $n \to \infty$.
1. Let $b_n$ be a rational number in $(b - \frac{1}{n}, b)$.
We can see that $b_n \uparrow b$ as $n \to \infty$.
1. Thus,
$$
(a, b) = \bigcup_{n=1}^{\infty}(a_n, b_n]
= \bigcup_{n=1}^{\infty}
\left ( (-\infty, b_n] \cap (-\infty, a_n]^c \right ).
$$
1. Hence $(a, b) \in \sigma(\DDD)$.
1. Hence $\OOO_0 \subseteq \sigma(\DDD)$.
1. Hence $\sigma(\OOO_0) \subseteq \sigma(\DDD)$.
1. However, every element of $\DDD$ is a closed set.
1. Hence $\sigma(\DDD) \subseteq \BBB$.
1. We have
$$
\BBB = \sigma(\OOO_0) \subseteq \sigma(\DDD)
\subseteq \BBB.
$$
1. Hence $\BBB = \sigma(\DDD)$.
```
## Events
```{index} Event; probability
```
```{prf:definition} Event
:label: def-prob-event
An *event* is a subset of the sample space of
a random experiment that belongs to some
$\sigma$-algebra defined on it.
Let $\Omega$ be the sample space of a random experiment.
Let $\FFF$ be a $\sigma$-algebra of subsets of $\Omega$.
Then every member of $\FFF$ is an event.
```
```{note}
Events are the subsets of the
sample space to which probabilities can
be assigned.
For finite sample spaces, any subset can
be an event. For infinite sample spaces,
it may not be possible to meaningfully
assign probabilities to every possible
subset. We need the notion of a $\sigma$-algebra
which is a collection of subsets of the
sample space satisfying closure under countable unions,
intersections and complements.
The subsets belonging to a $\sigma$-algebra
can be assigned probabilities and are
events.
```
We can translate the set-theoretic language
to the language of events as follows.
Let $A$ and $B$ be two events.
1. The event that $A$ doesn't occur is denoted by $A^c$.
1. The event that either $A$ or $B$ occurs is denoted by $A \cup B$.
1. The event that both $A$ and $B$ occur is denoted by $A B$.
1. The event that $A$ occurs and $B$ doesn't occur is denoted by $A \setminus B$.
This can also be written as $A B^c$.
1. The events $A$ and $B$ are *exhaustive* if
$\Omega = A \cup B$.
In particular, $A \cup A^c = \Omega$.
1. The events $A$ and $B$ are *exclusive* if $A B = \EmptySet$.
```{index} Singleton event, Elementary event
```
```{prf:definition} Elementary event
:label: def-prob-elementary-event
An event consisting of only one outcome
is called a *singleton event* or an *elementary event*.
```
```{index} Certain event
```
```{prf:definition} Certain event
:label: def-prob-certain-event
The sample space $\Omega$ is known as the *certain event*.
```
Since $\Omega$ contains all possible outcomes of an
experiment, hence this event always occurs whenever
the experiment is run.
```{index} Null event
```
```{prf:definition} Null event
:label: def-prob-null-event
The empty set $\EmptySet$ is known as the *null event*.
```
The null event never occurs.
```{index} Mutually exclusive events
```
```{prf:definition} Mutually exclusive events
:label: def-prob-mutually-exclusive-events
Let $E$ and $F$ be two events. If $E$ and $F$ are disjoint sets
then we say that the two events are mutually exclusive.
```
(sec:prob:probability:measure)=
## Probability Measure and Space
We next provide an axiomatic definition of a probability measure.
Note that we will often write a joint event (intersection of two
events)
as $A B$ rather than $A \cap B$.
```{index} Probability measure
```
```{prf:definition} Probability measure
:label: def-prob-probability-measure
Let $\Omega$ be a sample space and
let $\FFF$ be a $\sigma$ algebra of subsets of $\Omega$.
A *probability measure* is a set function $\PP : \FFF \to \RR$
that assigns to every event $E \in \FFF$ a real number
$\PP(E)$ called the probability of the event $E$ satisfying
the following rules:
1. Nonnegativity: $\PP(E) \geq 0$.
1. Unit measure or normalization: $\PP(\Omega) = 1$.
1. Additivity: $\PP(E \cup F) = \PP(E) + \PP(F)$ if $E F = \EmptySet$.
```
We can write it in the form of axioms
(first introduced by Andrey Kolmogorov).
```{prf:axiom} First axiom: nonnegativity
:label: ax-prob-nonnegativity
The probability of an event is a nonnegative real number.
$$
\PP(E) \in \RR, \PP(E) \geq 0 \quad \Forall E \in \FFF.
$$
```
This axiom implies that the probability is always finite.
```{prf:axiom} Second axiom: unit measure
:label: ax-prob-unit-measure
$$
\PP(\Omega) = 1.
$$
```
```{prf:axiom} Third axiom: additivity
:label: ax-prob-additivity
Let $E, F \in \FFF$ be two disjoint sets. Then
$$
\PP(E \cup F) = \PP(E) + \PP(F).
$$
```
### Probability Space
```{index} Probability space
```
```{prf:definition} Probability space
:label: def-prob-probability-space
A sample space endowed with a $\sigma$-algebra and
a probability measure is known as a *probability space*.
In other words,
let $\Omega$ be the sample space of a random experiment,
let $\FFF$ be a $\sigma$ algebra of subsets of $\Omega$
and let $\PP$ be a probability measure defined on $\FFF$.
Then the triplet $(\Omega, \FFF, \PP)$ is known as
a probability space.
```
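As a concrete sanity check, a fair die yields a finite probability space $(\Omega, \PPP(\Omega), \PP)$ with $\PP(E) = |E|/6$. A minimal Python sketch (illustrative only) verifying the axioms and a couple of derived properties:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))        # sample space of a fair die

def P(E):
    """Uniform probability measure: P(E) = |E| / 6."""
    return Fraction(len(frozenset(E) & omega), len(omega))

E, F = {1, 2}, {2, 4, 6}              # "at most 2" and "even"
assert P(omega) == 1                  # unit measure
assert P(set()) == 0                  # null event
assert P(E) >= 0 and P(F) >= 0        # nonnegativity
assert P(E | F) == P(E) + P(F) - P(E & F)   # sum rule
assert P(E) == 1 - P(omega - E)       # complement rule
print(P(E), P(F), P(E | F))           # 1/3 1/2 2/3
```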
We next establish some basic facts about a probability measure.
### Basic Properties
```{prf:theorem} Properties of a probability measure
:label: res-prob-prob-measure-props
Let $(\Omega, \FFF, \PP)$ be a probability space.
Then the events contained in $\FFF$ satisfy
the following properties.
1. Probability of null event: $\PP(\EmptySet) = 0$.
1. $\PP(E F^c) = \PP(E) - \PP(EF)$.
1. Complement rule: $\PP(E) = 1 - \PP(E^c)$.
1. Sum rule: $\PP(E \cup F) = \PP(E) + \PP(F) - \PP(EF)$.
1. Monotonicity: If $E \subseteq F$, then
$$
\PP (E)\leq \PP(F).
$$
1. Numeric bound:
$$
0 \leq \PP(E) \leq 1.
$$
1. Finite additivity: For any positive integer $n$, we have
$$
\PP\left ( \bigcup_{i=1}^n E_i \right) = \sum_{i=1}^n \PP(E_i)
$$
if $E_1, E_2, \dots, E_n$ are pairwise disjoint events.
```
```{prf:proof}
(1)
1. $\EmptySet$ and $\Omega$ are disjoint.
1. Hence
$$
\PP(\EmptySet \cup \Omega) = \PP(\EmptySet) + \PP(\Omega).
$$
1. This simplifies to
$$
1 = \PP(\EmptySet) + 1.
$$
1. Hence $\PP(\EmptySet) = 0$.
(2)
1. Recall that $E F^c$ and $EF$ are disjoint sets
with $E F^c \cup EF = E$.
1. Hence
$$
\PP(E) = \PP(E F^c) + \PP(E F).
$$
1. Hence
$$
\PP(E F^c) = \PP(E) - \PP(EF).
$$
(3)
1. $E$ and $E^c$ are disjoint with $E \cup E^c = \Omega$.
1. Hence
$$
1 = \PP(\Omega) = \PP(E \cup E^c) = \PP(E) + \PP(E^c).
$$
1. Hence $\PP(E) = 1 - \PP(E^c)$.
(4)
1. Recall that we can split $E \cup F$ into disjoint sets
$$
E \cup F = EF^c \cup EF \cup E^c F.
$$
1. By additivity, we have
$$
\PP(E \cup F) &= \PP(EF^c \cup (EF \cup E^c F))\\
&= \PP(EF^c) + \PP(EF \cup E^c F)\\
&= \PP(EF^c) + \PP(EF) + \PP(E^c F)\\
&= \PP(E) - \PP(EF) + \PP(EF) + \PP(F) - \PP(EF)\\
&= \PP(E) + \PP(F) - \PP(EF).
$$
(5)
1. We have $F = F E \cup F E^c = E \cup F E^c$.
1. $E$ and $FE^c$ are disjoint.
1. Then
$$
\PP (F) = \PP(E \cup F E^c) = \PP(E) + \PP(F E^c).
$$
1. By nonnegativity, $\PP(F E^c) \geq 0$.
1. Hence $\PP(E) \leq \PP(F)$.
(6)
1. By the nonnegativity axiom, $\PP(E) \geq 0$.
1. We have $\PP(E) = 1 - \PP(E^c)$.
1. But $\PP(E^c) \geq 0$.
1. Hence $\PP(E) \leq 1$.
(7)
1. The statement is trivially true for $n=1$.
1. The statement is true for $n=2$ by the additivity rule.
1. Assume that the statement is true for some $k \geq 2$.
1. In other words, for every collection of events
$E_1, \dots, E_k$, such that the events are pairwise
disjoint, we have
$$
\PP\left ( \bigcup_{i=1}^k E_i \right) = \sum_{i=1}^k \PP(E_i).
$$
1. Let $E_1, E_2, \dots, E_k, E_{k+1}$ be a collection of $k+1$
pairwise disjoint events. Define $E= \bigcup_{i=1}^k E_i$.
1. We have $E \cap E_{k+1} = \EmptySet$.
1. Then
$$
\PP\left ( \bigcup_{i=1}^{k+1} E_i \right)
&= \PP ( E \cup E_{k + 1}) \\
&= \PP (E) + \PP(E_{k + 1})\\
&= \PP\left ( \bigcup_{i=1}^k E_i \right) + \PP(E_{k + 1})\\
&= \sum_{i=1}^k \PP(E_i) + \PP(E_{k + 1})\\
&= \sum_{i=1}^{k+1} \PP(E_i).
$$
1. By principle of mathematical induction, the statement is true
for every $n$.
```
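As a quick numerical check (not part of the formal development), we can model a finite probability space in Python with exact rational arithmetic and verify the complement, sum, and monotonicity properties for a fair die:

```python
from fractions import Fraction

# A finite probability space for a fair die: each singleton gets measure 1/6.
omega = {1, 2, 3, 4, 5, 6}
prob = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability of an event (a subset of the sample space)."""
    return sum(prob[w] for w in event)

E = {1, 2, 3}   # first three faces
F = {2, 4, 6}   # even faces

# Complement rule: P(E) = 1 - P(E^c)
assert P(E) == 1 - P(omega - E)
# Sum rule: P(E ∪ F) = P(E) + P(F) - P(EF)
assert P(E | F) == P(E) + P(F) - P(E & F)
# Monotonicity: E ⊆ E ∪ F implies P(E) <= P(E ∪ F)
assert P(E) <= P(E | F)
print(P(E | F))  # 5/6
```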
```{note}
We assign probabilities to events and not to individual outcomes
of a random experiment.
If the sample space is finite or countable, it is often convenient
to assign probabilities to individual outcomes. One should treat
this as the assignment of probability
to the event consisting of a single outcome,
i.e., a singleton event.
```
In the following, we shall assume that a
probability space $(\Omega, \FFF, \PP)$ has
been given and all events are contained in $\FFF$.
```{prf:theorem} Union of three events
:label: res-prob-union-3-events
Let $A, B, C$ be three events. Then
$$
\PP(A \cup B \cup C) = \PP(A) + \PP(B) + \PP(C) - \PP(AB) - \PP(BC) - \PP(AC) + \PP(ABC).
$$
```
```{prf:proof}
Define $D = B \cup C$.
1. Then
$$
\PP(A \cup B \cup C) = \PP(A \cup D) = \PP(A) + \PP(D) - \PP(AD).
$$
1. Further
$$
\PP(D) = \PP(B \cup C) = \PP(B) + \PP(C) - \PP(BC).
$$
1. Note that $AD = AB \cup AC$.
1. Also $AB \cap AC = ABC$.
1. Hence
$$
\PP(AD) = \PP(AB \cup AC) = \PP(AB) + \PP(AC) - \PP(ABC).
$$
1. Putting these back, we get
$$
\PP(A \cup B \cup C)
&= \PP(A) + (\PP(B) + \PP(C) - \PP(BC)) - (\PP(AB) + \PP(AC) - \PP(ABC)) \\
&= \PP(A) + \PP(B) + \PP(C) - \PP(AB) - \PP(BC) - \PP(AC) + \PP(ABC).
$$
```
### Inclusion-Exclusion Principle
{prf:ref}`res-prob-union-3-events` can be extended
to the union of $n$ events. This is known
as the inclusion-exclusion principle.
```{index} Inclusion-exclusion principle
```
```{prf:theorem} Inclusion-exclusion principle
:label: res-prob-inc-ex
Let $A_1, A_2, \dots, A_n$ be $n$ events
in a probability space $(\Omega, \FFF, \PP)$.
Then
$$
\PP \left( \bigcup_{i=1}^n A_i \right )
&= S_1 - S_2 + S_3 - \dots + (-1)^{n + 1} S_n\\
&= \sum_{k=1}^n (-1)^{k+1}S_k
$$
where $S_k$ is the sum of the probability of all $k$-cardinality
intersections among the sets $A_1, \dots, A_n$.
In particular,
$$
& S_1 = \sum_{i=1}^n \PP(A_i) \\
& S_2 = \sum_{1 \leq i < j \leq n} \PP(A_i A_j) \\
& S_3 = \sum_{1 \leq i < j < k \leq n} \PP(A_i A_j A_k)\\
& \vdots\\
& S_n = \PP(A_1 A_2 \dots A_n).
$$
In general, for every $k \in \{1,\dots,n\}$, we can write:
$$
S_k = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq n} \PP(A_{i_1} A_{i_2} \dots A_{i_k}).
$$
```
It is known as the inclusion-exclusion principle since $S_1$ is
included, then $S_2$ is excluded, then $S_3$ is included, and
so on.
```{prf:proof}
The proof is by mathematical induction on $n$.
The base case $n=2$ is the sum rule, and the inductive step
applies the sum rule to $\bigcup_{i=1}^n A_i$ and $A_{n+1}$.
```
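The alternating sum in the inclusion-exclusion principle can be checked mechanically on a small discrete space. The following sketch (with an assumed toy space of 12 equally likely outcomes) computes each $S_k$ over all $k$-wise intersections and compares the alternating sum with the probability of the union:

```python
from fractions import Fraction
from itertools import combinations

# Toy example: outcomes 0..11, each equally likely.
omega = set(range(12))

def P(event):
    return Fraction(len(event), len(omega))

A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6, 7}]
n = len(A)

# Right-hand side: sum over k of (-1)^(k+1) S_k, where S_k sums the
# probabilities of all k-wise intersections.
rhs = Fraction(0)
for k in range(1, n + 1):
    S_k = sum(P(set.intersection(*sub)) for sub in combinations(A, k))
    rhs += (-1) ** (k + 1) * S_k

lhs = P(set.union(*A))
assert lhs == rhs
print(lhs)  # 2/3
```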
### Boole's Inequality
```{index} Boole's inequality, Union bound
```
````{prf:theorem} Boole's inequality
:label: res-prob-boole-inequality
Let $A_1, A_2, \dots, A_n$ be a finite collection of events.
Then we have
$$
\PP \left ( \bigcup_{i=1}^n A_i \right) \leq \sum_{i=1}^n \PP \left ( A_i \right).
$$
````
````{prf:proof}
We prove it using induction.
1. For $n=1$, obviously
$$
\PP (A_1) \leq \PP (A_1).
$$
1. Assume the inequality is true for the set of $n$ events for some $n \geq 1$.
In other words,
$$
\PP \left ( \bigcup_{i=1}^n A_i \right) \leq \sum_{i=1}^n \PP \left ( A_i \right).
$$
1. Since
$$
\PP (A \cup B ) = \PP (A) + \PP(B) - \PP (A \cap B),
$$
hence
$$
\PP \left ( \bigcup_{i=1}^{n + 1} A_i \right)
= \PP \left ( \bigcup_{i=1}^n A_i \right)
+ \PP (A_{n + 1}) - \PP \left ( \left( \bigcup_{i=1}^n A_i \right) \cap A_{n +1} \right ).
$$
1. Since
$$
\PP \left ( \left( \bigcup_{i=1}^n A_i \right) \cap A_{n +1} \right ) \geq 0,
$$
hence
$$
\PP \left ( \bigcup_{i=1}^{n + 1} A_i \right)
\leq \PP \left ( \bigcup_{i=1}^n A_i \right) + \PP (A_{n + 1})
\leq \sum_{i=1}^{n + 1} \PP \left ( A_i \right).
$$
````
### Countable Additivity
Often, we need to work with problems where we need to estimate
the probability of a countable union of events. The basic
axioms of a probability measure are unable to handle
this. We need one more axiom that a probability measure
must satisfy.
```{prf:axiom} Fourth axiom: countable additivity
:label: ax-prob-countable-additivity
Let $E_1, E_2, \dots$ be a (countable) sequence of mutually exclusive
events (disjoint sets). Then
$$
\PP\left ( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} \PP(E_i).
$$
```
## Joint Probability
```{index} Joint probability
```
```{prf:definition} Joint probability
:label: def-prob-joint-probability
Let $A$ and $B$ be two different events. Then
the *joint probability* of the events $A$
and $B$ is the probability that the two events
occur together and is given by $\PP(A B)$.
Similarly, let $\{A_i \}_{i \in I}$ be a collection
of events indexed by the set $I$. Then their *joint probability*
is given by
$$
\PP \left (\bigcap_{i \in I} A_i \right ).
$$
In other words, it is the probability of every event
happening together.
```
## Conditional Probability
Conditional probability provides us the means
to reason about experiments with partial information.
If an event is known to have happened, then
the probabilities of other events associated
with an experiment change.
Here are some examples:
1. If a cancer test is 90% reliable and it turns positive for
a person, then the probability of the person
having cancer increases dramatically.
1. One can analyze a corpus of English literature to
estimate the probabilities of a letter coming after
another. Given that a letter $t$ has appeared
as the first letter of the word, the probability
that the next letter will be $h$ is higher than
the general probability of $h$ being the second
letter of a word.
1. If a die is rolled twice successively and we
are told that the sum of the two rolls is $3$, then
the probability that the first roll is $1$ is
$0.5$.
One point to note is that conditional probability
doesn't establish any chronological order between
events. It merely describes how the probabilities
change based on partial information about an
experiment.
```{index} Conditional probability
```
```{prf:definition} Conditional probability
:label: def-prob-conditional-probability
Let $A$ and $B$ be two events.
Assume that $\PP(A) > 0$.
The *conditional probability* of the event $B$
given that the event $A$ has happened is denoted by
$\PP(B | A)$. It is defined as
$$
\PP(B | A) \triangleq \frac{\PP(AB)}{\PP(A)}.
$$
```
Note that the conditional probability is not
defined if $\PP(A) = 0$.
By definition, we can see that
$$
\PP(A B) = \PP(B | A) \PP(A) = \PP(A | B) \PP(B).
$$
```{prf:example}
:label: ex-prob-cond-prob-1-toss
Consider an experiment of tossing a coin 3 times.
1. The sample space is
$$
\Omega =
\{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.
$$
1. Assume that all the outcomes are equally likely.
Each outcome has the probability $\frac{1}{8}$.
1. Let $A$ denote the event that the first toss is
a head. We have
$$
A = \{ HHH, HHT, HTH, HTT \}.
$$
1. We can see that $\PP(A) = \frac{1}{2}$.
1. Let $B$ be the event that more heads than tails
come up in the three tosses. We have
$$
B = \{HHH, HHT, HTH, THH \}.
$$
1. We can see that $\PP(B) = \frac{1}{2}$.
1. If the first outcome is a head, then the probability
that more heads than tails come up should increase.
1. Let us first check the event $A \cap B$. We have
$$
A \cap B = \{ HHH, HHT, HTH \}.
$$
1. Hence $\PP(A B) = \frac{3}{8}$.
1. Then the probability that more heads than tails
come up given that first toss is a head is
given by
$$
\PP(B | A) = \frac{3 / 8}{ 1/2} = \frac{3}{4}.
$$
1. We can also compute the probability that the
first toss is a head given that more heads
than tails come up as
$$
\PP(A | B) = \frac{3 / 8}{ 1/2} = \frac{3}{4}.
$$
```
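The three-toss example above can be reproduced by enumerating the sample space and applying the definition $\PP(B|A) = \PP(AB)/\PP(A)$ directly:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of three coin tosses.
omega = [''.join(t) for t in product('HT', repeat=3)]

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 'H'}                  # first toss is a head
B = {w for w in omega if w.count('H') > w.count('T')}  # more heads than tails

P_B_given_A = P(A & B) / P(A)
P_A_given_B = P(A & B) / P(B)
assert P_B_given_A == Fraction(3, 4)
assert P_A_given_B == Fraction(3, 4)
```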
We should verify that the conditional probability
as defined above satisfies the axioms of probability.
```{prf:theorem}
:label: res-prob-cond-prob-measure
The conditional probability is a probability measure.
```
```{prf:proof}
(Nonnegativity)
By definition, it is a ratio of nonnegative quantities.
Hence, it is nonnegative.
(Normalization)
We can see that
$$
\PP(\Omega | A) = \frac{\PP(A \Omega)}{\PP(A)}
= \frac{\PP(A)}{\PP(A)} = 1.
$$
(Additivity)
1. Let $B_1$ and $B_2$ be disjoint events.
1. Then $A B_1$ and $A B_2$ are also disjoint events.
1. Hence
$$
\PP(B_1 \cup B_2 | A) &= \frac{\PP(A (B_1 \cup B_2 ))}{\PP(A)}\\
&= \frac{\PP(A B_1 \cup A B_2 )}{\PP(A)}\\
&= \frac{\PP(A B_1) + \PP(A B_2 )}{\PP(A)}\\
&= \frac{\PP(A B_1)}{\PP(A)} + \frac{\PP(A B_2 )}{\PP(A)}\\
&= \PP(B_1 | A) + \PP(B_2 | A).
$$
The argument for countable additivity is similar.
```
We note that
$$
\PP(A | A) = \frac{\PP(AA)}{\PP(A)}
= \frac{\PP(A)}{\PP(A)} = 1.
$$
### Properties
Since $\PP(B | A)$ is a valid probability measure,
all the properties of a probability measure are
applicable for the conditional probability also.
```{prf:theorem} Properties of a conditional probability measure
:label: res-prob-cond-prob-measure-props
Let $(\Omega, \FFF, \PP)$ be a probability space.
Let all probabilities be conditioned on an event $A$.
Then the following properties hold:
1. $\PP(\EmptySet | A) = 0$.
1. $\PP(A | A) = 1$.
1. $\PP(E F^c | A) = \PP(E | A) - \PP(EF | A)$.
1. $\PP(E | A) = 1 - \PP(E^c | A)$.
1. $\PP(E \cup F | A) = \PP(E | A) + \PP(F | A) - \PP(EF | A)$.
1. If $E \subseteq F$, then
$$
\PP (E | A)\leq \PP(F | A).
$$
1. For any positive integer $n$, we have
$$
\PP\left ( \bigcup_{i=1}^n E_i | A \right) = \sum_{i=1}^n \PP(E_i | A)
$$
if $E_1, E_2, \dots, E_n$ are pairwise disjoint events.
1. Union of three events
$$
\PP(B \cup C \cup D | A )
&= \PP(B | A) + \PP(C | A) + \PP(D | A) \\
&- \PP(BC | A) - \PP(CD | A) - \PP(BD | A) + \PP(BCD | A).
$$
1. Let $B_1, B_2, \dots, B_n$ be a finite collection of events.
Then we have
$$
\PP \left ( \bigcup_{i=1}^n B_i | A \right)
\leq \sum_{i=1}^n \PP \left ( B_i | A \right).
$$
```
The proofs are similar to {prf:ref}`res-prob-prob-measure-props`
and other results in the following.
```{note}
Since $\PP(A | A) = 1$, one can see that all
of the conditional probability is concentrated
on the outcomes in $A$. Thus, we might as well
discard the outcomes in $A^c$ and treat the
conditional probabilities as a probability
measure on the new sample space $A$.
```
### Multiplication Rule
```{prf:theorem} Multiplication rule
:label: res-prob-cond-multiplication-rule
Let $A_1, A_2, \dots, A_n$ be a finite collection
of events and let $A$ be an event which occurs
if and only if each of these events occur.
In other words,
$$
A = A_1 A_2 \dots A_n.
$$
Assume that $\PP(A_1 A_2 \dots A_{n-1}) > 0$.
Then
$$
\PP(A) = \PP(A_1) \PP(A_2 | A_1) \PP(A_3 | A_1 A_2)
\dots
\PP(A_n | A_1 A_2 \dots A_{n-1}).
$$
```
```{prf:proof}
We can see that
$$
\PP(A) &= \PP(A_1 A_2 \dots A_n) \\
&= \PP(A_1) \frac{\PP(A_1 A_2 \dots A_n)}{\PP(A_1)} \\
&= \PP(A_1) \frac{\PP(A_1 A_2)}{\PP(A_1)}
\frac{\PP(A_1 A_2 \dots A_n)}{\PP(A_1 A_2)} \\
&= \PP(A_1) \frac{\PP(A_1 A_2)}{\PP(A_1)}\frac{\PP(A_1 A_2 A_3)}{\PP(A_1 A_2)}
\frac{\PP(A_1 A_2 \dots A_n)}{\PP(A_1 A_2 A_3)} \\
&= \PP(A_1) \frac{\PP(A_1 A_2)}{\PP(A_1)}\frac{\PP(A_1 A_2 A_3)}{\PP(A_1 A_2)} \dots
\frac{\PP(A_1 A_2 \dots A_n)}{\PP(A_1 A_2 A_3 \dots A_{n-1})} \\
&= \PP(A_1) \PP(A_2 | A_1) \PP(A_3 | A_1 A_2)
\dots
\PP(A_n | A_1 A_2 \dots A_{n-1}).
$$
```
```{prf:example}
:label: ex-prob-cond-mult-1
Draw 4 cards from a deck of 52 cards.
What is the probability that none of them
is a spade?
1. Define $A_i$ as the event that the $i$-th card
is not a spade.
1. Our event of interest is $A = A_1 A_2 A_3 A_4$.
1. There are 13 cards of the suit spade.
There are 39 other cards.
1. $\PP(A_1) = \frac{39}{52}$.
1. Given that $A_1$ has happened (i.e. the first card
is not a spade), we are left with 51 cards out of
which 38 are not spade.
1. Hence $\PP(A_2 | A_1) = \frac{38}{51}$.
1. The probability that the third card is not a spade is
$\PP(A_3 | A_1 A_2) = \frac{37}{50}$.
1. The probability that the fourth card is not a spade is
$\PP(A_4 | A_1 A_2 A_3) = \frac{36}{49}$.
1. Applying the multiplication rule
$$
\PP(A) = \PP(A_1 A_2 A_3 A_4)
&= \PP(A_1) \PP(A_2 | A_1) \PP(A_3 | A_1 A_2)
\PP(A_4 | A_1 A_2 A_3) \\
&= \frac{39}{52}\frac{38}{51}\frac{37}{50}\frac{36}{49}.
$$
Another way to calculate this is through counting
the number of ways four cards can be chosen.
1. The total number of ways four cards can be chosen
is $52 \choose 4$.
1. The number of ways four cards can be chosen such that
none of them is a spade is $39 \choose 4$.
1. Hence the probability that none of the four cards
is a spade is
$$
\frac{39 \choose 4}{52 \choose 4}
= \frac{39!}{4! 35!}\frac{4! 48!}{52!}
= \frac{39 \cdot 38 \cdot 37 \cdot 36}{52 \cdot 51 \cdot 50 \cdot 49}.
$$
```
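Both computations in this example can be carried out exactly and compared, confirming that the multiplication rule and the counting argument agree:

```python
from fractions import Fraction
from math import comb

# Two ways to compute the probability that 4 cards drawn from a standard
# 52-card deck contain no spade: the multiplication rule vs. counting.
p_chain = Fraction(1)
non_spades, deck = 39, 52
for i in range(4):
    # After i non-spade draws, 39 - i non-spades remain among 52 - i cards.
    p_chain *= Fraction(non_spades - i, deck - i)

p_count = Fraction(comb(39, 4), comb(52, 4))
assert p_chain == p_count
print(p_chain)  # 6327/20825
```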
### Marginal Probability
```{index} Marginal probability
```
```{prf:definition} Marginal probability
:label: def-prob-marginal-probability
Let $A$ and $B$ be two events.
The *marginal probability* of the event $A$
is the probability $\PP(A)$ which is not
conditioned on the event $B$.
```
### Total Probability Theorem
```{prf:theorem} Total probability theorem
:label: res-prob-total-prob
Let $A_1, \dots, A_n$ be disjoint events that
form a partition of the sample space $\Omega$.
Assume that $\PP(A_i) > 0$ for every $i$.
Then for any event $B$ we have
$$
\PP(B)
= \PP(A_1 B) + \dots + \PP(A_n B)
= \PP(A_1)\PP(B | A_1) + \dots + \PP(A_n)\PP(B | A_n).
$$
```
```{prf:proof}
1. Since $A_1, \dots, A_n$ are disjoint, so are
$A_1 B, \dots, A_n B$.
1. We have
$$
B = A_1 B \cup \dots \cup A_n B.
$$
1. Applying additivity axiom,
$$
\PP(B) = \PP(A_1 B) + \dots + \PP(A_n B).
$$
1. Applying conditional probability definition, we have
$$
\PP(B) = \PP(A_1)\PP(B | A_1) + \dots + \PP(A_n)\PP(B | A_n).
$$
```
## Bayes Rule
The most famous application of total probability theorem
is the Bayes rule. It relates the conditional probabilities
of the form $\PP(A | B)$
with conditional probabilities of the form $\PP(B | A)$
in which the order of conditioning is reversed.
```{prf:theorem} Bayes rule
:label: res-prob-bayes-rule
Let $A_1, \dots, A_n$ be disjoint events that
form a partition of the sample space $\Omega$.
Assume that $\PP(A_i) > 0$ for every $i$.
Let $B$ be another event such that $\PP(B) > 0$.
Then we have
$$
\PP(A_i | B)
&= \frac{\PP(A_i) \PP(B | A_i)}{\PP(B)} \\
&= \frac{\PP(A_i) \PP(B | A_i)}
{\PP(A_1)\PP(B | A_1) + \dots + \PP(A_n)\PP(B | A_n)}.
$$
```
```{prf:proof}
1. By definition of conditional probability, we have
$$
\PP (A_i B) = \PP(A_i | B) \PP(B)
$$
and
$$
\PP (A_i B) = \PP(B | A_i) \PP(A_i).
$$
1. Hence
$$
\PP(A_i | B) \PP(B) = \PP(B | A_i) \PP(A_i).
$$
1. Dividing both sides with $\PP(B)$, we get
$$
\PP(A_i | B) = \frac{\PP(A_i) \PP(B | A_i)}{\PP(B)}.
$$
1. Expanding $\PP(B)$ via the
total probability theorem ({prf:ref}`res-prob-total-prob`),
we get the desired result.
```
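The total probability theorem and Bayes' rule combine naturally in a numeric check. The numbers below are illustrative assumptions (a disease with prior $1/100$, a test with true positive rate $9/10$ and false positive rate $1/10$), not data from the text:

```python
from fractions import Fraction

# Assumed illustrative numbers: A1 = "has disease", A2 = A1^c,
# B = "test is positive".
prior = {'A1': Fraction(1, 100), 'A2': Fraction(99, 100)}
likelihood = {'A1': Fraction(9, 10), 'A2': Fraction(1, 10)}  # P(B | A_i)

# Total probability theorem: P(B) = sum_i P(A_i) P(B | A_i).
P_B = sum(prior[a] * likelihood[a] for a in prior)

# Bayes rule: P(A1 | B) = P(A1) P(B | A1) / P(B).
posterior = prior['A1'] * likelihood['A1'] / P_B
print(posterior)  # 1/12
```

Even with a fairly reliable test, the posterior probability of the disease is only $1/12$ because the prior is small.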
```{prf:remark} Statistical inference
:label: rem-prob-bayes-inference
Bayes' rule is a key tool in the field of
*statistical inference*.
1. We conduct an experiment where we can observe an
*effect* which may be due to a number of *causes*.
1. The events $A_1, \dots, A_n$ denote the causes
which may not be observable directly.
1. The event $B$ is the effect which can be observed.
1. The probability $\PP(B | A_i)$ models the relationship
between the cause $A_i$ and the effect $B$.
It represents the likelihood of $B$ happening
given that $A_i$ has happened.
1. We are often interested in knowing the probability
of $A_i$ given that $B$ has been observed.
1. This is an *inference* since the events $A_i$ cannot be
observed directly.
1. The probability $\PP(A_i | B)$ is known as
the *posterior probability* of the event $A_i$
given that $B$ has happened.
1. This is distinguished from the unconditional/marginal
probability $\PP(A_i)$ which is the probability
of the event $A_i$ without any information about
the event $B$.
1. $\PP(A_i)$ is known as the *prior* probability of
$A_i$.
```
## Independence
The conditional probability $\PP(A | B)$
captures the information that occurrence
of the event $B$ provides about the event $A$.
In many cases, it may happen that observing $B$
provides us no information about the event $A$.
Consequently, the probability $\PP(A)$ is
not altered when conditioning on the event $B$.
Then, we say that the events $A$ and $B$ are
independent.
```{index} Independence
```
```{prf:definition} Independence of two events
:label: def-prob-independence-2
Let $A$ and $B$ be two events.
We say that $A$ and $B$ are *independent* if
$$
\PP(A B) = \PP(A) \PP(B).
$$
```
Independence is a symmetric property. We can
see that if $A$ is independent of $B$ then
$B$ is independent of $A$.
It follows that for independent events
1. If $\PP(A) > 0$ then
$$
\PP(B | A) = \PP(B).
$$
1. And if $\PP(B) > 0$ then
$$
\PP(A | B) = \PP(A).
$$
```{prf:example}
Consider the problem of rolling a $4$-sided die
twice. Assume that all outcomes are equally likely.
1. Let $A$ be the event that the first roll is $1$.
1. Let $B$ be the event that the sum of two rolls is $5$.
1. We have $A = \{(1,1), (1,2), (1,3), (1,4) \}$.
1. We have $B = \{(1, 4), (2, 3), (3, 2), (4, 1) \}$.
1. Number of outcomes is $16$.
1. $\PP(A) = \frac{1}{4}$.
1. $\PP(B) = \frac{1}{4}$.
1. We can see that $A B = \{(1, 4)\}$.
1. Then $\PP(A B) = \frac{1}{16}$.
1. We can see that $\PP(A B) = \PP(A) \PP(B)$.
1. The two events $A$ and $B$ are independent.
1. On the surface, it may seem that there should
be dependence between the events $A$ and $B$
but it is not so.
1. Since the outcomes in the event $B$ are equally
likely, we can carefully examine the list of
outcomes in the event $B$ to see that observing the event $B$
gives us no additional information about whether the
first roll will give us $1$.
Now consider the problem of rolling a $6$-sided
die twice with the same events as before.
1. We have $A = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6) \}$.
1. We have $B = \{(1, 4), (2, 3), (3, 2), (4, 1) \}$.
1. Number of outcomes is $36$.
1. $\PP(A) = \frac{1}{6}$.
1. $\PP(B) = \frac{1}{9}$.
1. We can see that $A B = \{(1, 4)\}$.
1. Then $\PP(A B) = \frac{1}{36}$.
1. Clearly, $\PP(A B) \neq \PP(A) \PP(B)$.
1. The two events are dependent.
1. In fact, $\PP(A | B) = \frac{1}{4}$.
In other words, the probability of $A$ increases
given that $B$ has been observed.
1. Looking at the outcomes in $B$ we can see that
observance of $B$ eliminates the possibility of
$5$ and $6$ from the first roll.
1. This leads to the increase in the probability of $1$
in the first roll.
1. We can also see that $\PP(B | A) = \frac{1}{6}$.
1. The probability of $B$ given $A$ has also increased.
```
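The contrast between the two dice in this example can be verified by enumeration; the same events are independent on a $4$-sided die but dependent on a $6$-sided die:

```python
from fractions import Fraction
from itertools import product

# A = "first roll is 1", B = "sum of the two rolls is 5".
def check(sides):
    omega = list(product(range(1, sides + 1), repeat=2))
    P = lambda ev: Fraction(len(ev), len(omega))
    A = {w for w in omega if w[0] == 1}
    B = {w for w in omega if sum(w) == 5}
    # Independence test: P(AB) == P(A) P(B)?
    return P(A & B) == P(A) * P(B)

assert check(4)       # independent on a 4-sided die
assert not check(6)   # dependent on a 6-sided die
```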
### Conditional Independence
Recall that the conditional probability is a probability
measure in itself. This enables us to introduce the
notion of conditional independence.
```{index} Conditional independence
```
```{prf:definition} Conditional independence
:label: def-prob-conditional-independence
Given an event $C$, two events $A$ and $B$ are
called *conditionally independent* if
$$
\PP (A B | C) = \PP (A | C) \PP (B | C).
$$
```
```{prf:theorem}
:label: res-prob-cond-ind-charac
Let $A, B, C$ be three different events.
Assume that $\PP(B C) > 0$.
Given that $C$ has occurred, $A$ and $B$ are
conditionally independent if and only if
$$
\PP(A | B C) = \PP (A | C).
$$
```
```{prf:proof}
Note that $\PP(B C) > 0$ implies that
$\PP(B) > 0$, $\PP(C) > 0$,
$\PP(B | C) > 0$ and $\PP(C | B) > 0$.
1. By conditional probability definition and
multiplication rule, we have
$$
\PP (A B | C) &= \frac{\PP (A B C)}{\PP(C)}\\
&= \frac{\PP(C) \PP(B | C) \PP (A | B C )}{\PP(C)}\\
&= \PP(B | C) \PP (A | B C ).
$$
1. Comparing this with the expression of conditional
independence, we have
$$
\PP(B | C) \PP (A | B C ) = \PP (A | C) \PP (B | C).
$$
1. Eliminating the common term, we have
$$
\PP (A | B C ) = \PP (A | C).
$$
```
It is important to note that
1. Unconditional independence of $A$ and $B$
doesn't imply conditional independence given $C$.
1. Conditional independence of $A$ and $B$ given $C$
doesn't imply unconditional independence.
```{prf:example}
:label: ex-prob-cond-prob-counter-1
Consider tossing a fair coin twice.
Assume that all outcomes are equally likely.
1. $\Omega = \{ HH, HT, TH, TT \}$.
1. Let $A$ be the event that the first toss is a head.
$A = \{ HH, HT \}$.
1. Let $B$ be the event that the second toss is a head.
$B = \{TH, HH \}$.
1. It is easy to see that $A$ and $B$ are independent.
1. Let $C$ be the event that the two tosses have
different results.
$C = \{TH, HT \}$.
1. $\PP (A | C) = \frac{1}{2}$.
1. $\PP (B | C) = \frac{1}{2}$.
1. $\PP (A B | C) = 0$.
1. Clearly, the two events are not conditionally
independent.
```
```{prf:example}
:label: ex-prob-cond-prob-counter-2
Let there be two coins. The first coin
is fair while the second coin is unfair
with the probability of head being $0.9$.
Assume that a coin can be chosen at random
with equal probability and then it is
tossed twice (the two tosses being independent).
1. Let $C$ be the event that the second (unfair) coin
was chosen.
1. Let $H_i$ be the event that the $i$-th toss
resulted in heads.
1. Given the choice of the coin, the two events
$H_1$ and $H_2$ are independent.
1. $\PP(H_1 | C) = 0.9$.
1. $\PP(H_2 | C) = 0.9$.
1. $\PP(H_1 H_2 | C) = 0.81$.
1. However, the two events $H_1$ and $H_2$ are
not independent unconditionally.
1. If the first toss is a head, then the probability
that the coin is unfair increases.
Consequently, the probability that the second
toss will also be a head increases.
1. From total probability theorem, we have
$$
\PP(H_1) = \PP(C)\PP(H_1 | C)
+ \PP(C^c)\PP(H_1 | C^c)
= 0.5 \cdot 0.9 + 0.5 \cdot 0.5
= 0.7.
$$
1. Similarly $\PP(H_2) = 0.7$.
1. However,
$$
\PP(H_1 H_2) = \PP(C)\PP(H_1 H_2 | C)
+ \PP(C^c)\PP(H_1 H_2 | C^c)
= 0.5 \cdot 0.81 + 0.5 \cdot 0.25
= 0.53.
$$
1. $\PP(H_1) \PP(H_2) = 0.7 \cdot 0.7 = 0.49 \neq \PP(H_1 H_2)$.
```
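The arithmetic of this two-coin example can be replayed exactly, showing conditional independence given the coin choice alongside unconditional dependence:

```python
from fractions import Fraction

p_unfair = Fraction(9, 10)   # P(head) for the unfair coin
p_fair = Fraction(1, 2)      # P(head) for the fair coin
P_C = Fraction(1, 2)         # probability of picking the unfair coin

# Conditioned on the coin, the two tosses are independent:
P_H1H2_given_C = p_unfair * p_unfair          # 81/100

# Total probability theorem for the unconditional probabilities:
P_H1 = P_C * p_unfair + (1 - P_C) * p_fair    # 7/10
P_H1H2 = P_C * p_unfair**2 + (1 - P_C) * p_fair**2

assert P_H1 == Fraction(7, 10)
assert P_H1H2 == Fraction(53, 100)
assert P_H1H2 != P_H1 * P_H1   # 53/100 != 49/100: unconditionally dependent
```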
### Three Events
Naturally, we are interested in extending the notion of
independence for multiple events. However this is a bit
tricky. We start with the simpler case of three events
which can explain the issues involved.
Independence means that occurrence or non-occurrence of
any event or any pair of events should not provide
any information about the occurrence of any other
event or pair of events.
```{index} Independence; 3 events
```
```{prf:definition} Independence of three events
:label: def-prob-independence-3
Let $A, B$ and $C$ be three events.
We say that $A, B$ and $C$ are *jointly independent* if
$$
& \PP(A B) = \PP(A) \PP(B)\\
& \PP(B C) = \PP(B) \PP(C)\\
& \PP(A C) = \PP(A) \PP(C)\\
& \PP(A B C) = \PP(A) \PP(B) \PP(C).
$$
```
The first three conditions establish the
*pairwise independence* of the three sets. However, the
fourth condition is necessary and it doesn't follow
from the first three conditions.
Conversely, the fourth condition doesn't imply the first
three conditions.
Let us look at some counter examples.
```{prf:example}
:label: ex-prob-3-event-pw-ind-no-ind
Consider the experiment of tossing a fair coin twice
with equally likely outcomes.
1. $\Omega = \{ HH, HT, TH, TT \}$.
1. Let $H_i$ be the event that $i$-th toss is a head.
1. Let $A$ be the event that the two tosses have
different results.
1. $H_1 = \{ HT, HH \}$.
1. $H_2 = \{TH, HH \}$.
1. $A = \{HT, TH \}$.
1. $A H_1 = \{HT \}$.
1. $A H_2 = \{TH \}$.
1. $H_1 H_2 = \{ HH \}$.
1. $\PP(H_1) = \PP(H_2) = \PP(A) = \frac{1}{2}$.
1. $\PP(A H_1) = \PP (A H_2) = \PP (H_1 H_2) = \frac{1}{4}$.
1. We can see that $A, H_1, H_2$ are pairwise independent.
1. However $A H_1 H_2 = \EmptySet$ since occurrence of $A$
makes $H_1$ and $H_2$ mutually exclusive.
1. Hence $\PP (A H_1 H_2) = 0 \neq \PP(A) \PP(H_1) \PP(H_2)$.
1. Hence $A, H_1,H_2$ are not independent.
```
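Enumerating the four outcomes confirms that the three events here are pairwise independent while the product condition fails on the triple:

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses: H1, H2 = head on toss i, A = "different results".
omega = list(product('HT', repeat=2))
P = lambda ev: Fraction(len(ev), len(omega))

H1 = {w for w in omega if w[0] == 'H'}
H2 = {w for w in omega if w[1] == 'H'}
A = {w for w in omega if w[0] != w[1]}

# Each pair is independent ...
assert P(A & H1) == P(A) * P(H1)
assert P(A & H2) == P(A) * P(H2)
assert P(H1 & H2) == P(H1) * P(H2)
# ... but the triple intersection is empty, breaking joint independence.
assert P(A & H1 & H2) == 0
assert P(A) * P(H1) * P(H2) != 0
```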
```{prf:example}
:label: ex-prob-3-event-3-ind-no-pw-ind
Consider the experiment of rolling a fair die twice
with equally likely outcomes.
1. The sample space has $36$ possible outcomes.
1. Let $A$ be the event that the first roll has
the outcomes $1$, $2$, or $3$.
1. Let $B$ be the event that the first roll has
the outcomes $3$, $4$, or $5$.
1. Let $C$ be the event that the sum of two
rolls is $9$.
1. We can see that
$$
C = \{ (3,6), (4,5), (5,4), (6,3) \}.
$$
1. We have
$$
\PP(A) = \frac{1}{2},
\PP(B) = \frac{1}{2},
\PP(C) = \frac{1}{9}.
$$
1. We can see that
$$
A B C = \{ (3,6) \}.
$$
1. Hence $\PP(ABC) = \frac{1}{36}$.
1. Clearly,
$$
\PP(A B C) = \PP(A) \PP(B) \PP(C).
$$
1. Now $A B$ is the event that the first roll is $3$.
1. $B C = \{ (3,6), (4,5), (5,4) \}$.
1. $A C = \{ (3,6)\}$.
1. Hence, we have
$$
\PP(A B) = \frac{1}{6},
\PP(B C) = \frac{1}{12},
\PP(A C) = \frac{1}{36}.
$$
1. Thus no pair among $A, B, C$ is
independent.
1. Hence $A, B, C$ are not jointly independent.
```
### $n$ Events
We are now ready to generalize the
notion of independence to any finite
number of events. The key idea is that
information about occurrence or non-occurrence
of any subset of events should not provide
any information about the occurrence or
non-occurrence of the remaining events.
```{index} Independence; n events
```
```{prf:definition} Independence of $n$ events
:label: def-prob-independence-n
Let $A_1, A_2, \dots, A_n$ be $n$ events contained in $\FFF$.
We say that $A_1, A_2, \dots, A_n$ are *jointly independent*
if
$$
& \PP(A_{i_1} A_{i_2}) = \PP(A_{i_1}) \PP(A_{i_2})\\
& \PP(A_{i_1} A_{i_2} A_{i_3}) = \PP(A_{i_1}) \PP(A_{i_2}) \PP(A_{i_3})\\
& \vdots\\
& \PP(A_{i_1} A_{i_2} A_{i_3} \dots A_{i_k})
= \PP(A_{i_1}) \PP(A_{i_2}) \PP(A_{i_3}) \dots \PP(A_{i_k})\\
& \vdots\\
& \PP(A_1 A_2 \dots A_n) = \PP(A_1) \PP(A_2) \dots \PP(A_n)
$$
for all combinations of indices such that $1 \leq i_1 < i_2 < \dots < i_k \leq n$.
In other words, $A_1, A_2, \dots, A_n$ are *independent* if
$$
\PP \left (\bigcap_{i \in S}A_i \right) = \prod_{i \in S} \PP(A_i)
$$
for every (nonempty) subset $S$ of $\{1,2,\dots, n\}$.
```
### Independent Trials
```{index} Independent trials
```
```{prf:definition} Independent trials
:label: def-prob-independent-trials
If an experiment involves a sequence of
independent but identical stages, we say
that we have a sequence of *independent trials*.
```
```{index} Bernoulli trials
```
```{prf:definition} Bernoulli trials
:label: def-prob-bernoulli-trials
If every stage in a sequence of independent trials
has only two possible outcomes, we say that
we have a sequence of *Bernoulli trials*.
```
It is customary to denote the two outcomes
of a Bernoulli trial by $H$ and $T$
with the probabilities $p$ and $q=1-p$
respectively.
```{prf:example} Repeated coin tosses
:label: ex-prob-repeated-coin-tosses
Consider tossing a coin $n$ times
as a sequence of Bernoulli trials.
1. The outcomes of each experiment are $H$ (head) and $T$ (tail).
1. Let $\PP(H) = p$ and $\PP(T) = q = 1 - p$.
1. The outcome of $n$ tosses can be described as a string
of $n$ letters each of which is $H$ or $T$.
1. There are $2^n$ possible strings. They form the sample space
of the compound experiment.
1. Let a particular string have $k$ heads and $n-k$ tails.
1. Then the probability of this string is given by
$$
\PP(\zeta_1, \dots, \zeta_n) = \prod_{i=1}^n \PP(\{ \zeta_i \})
= p^k q^{n - k}.
$$
1. There are $n \choose k$ strings which consist of $k$ heads and
$n-k$ tails.
1. Each of these strings (singleton events) are mutually exclusive
of each other.
1. Hence, by the additivity axiom, the probability of having $k$
heads and $n-k$ tails in $n$ trials is given by
$$
{n \choose k} p^k q^{n - k}.
$$
```
```{prf:definition} Binomial probability
:label: def-prob-k-n-binomial-prob
The probability of $k$ heads in a sequence of $n$
Bernoulli trials, denoted by $p(k)$,
is known as *binomial probability* and
is given by
$$
p(k) = {n \choose k} p^k (1-p)^{n - k}
$$
where ${n \choose k}$ is a binomial coefficient
and $p$ is the probability of a head.
```
See {ref}`sec:alg:combinatorics` for a quick review
of binomial coefficients.
```{prf:theorem} Sum of binomial probabilities
:label: res-prob-sum-binomial-probs
$$
\sum_{k=0}^n p(k) = 1.
$$
```
```{prf:proof}
Recall from {prf:ref}`res-comb-binomial-theorem`
that
$$
(a + b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n - k}.
$$
Putting $a = p$ and $b=1-p$, we get
$$
1 = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n - k}
= \sum_{k=0}^n p(k).
$$
```
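A short sketch computes the binomial probabilities exactly and confirms that they sum to one, as the theorem requires (the parameters $n=5$, $p=1/3$ are an arbitrary illustration):

```python
from fractions import Fraction
from math import comb

# Binomial probabilities for n Bernoulli trials with head probability p.
def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

probs = binomial_pmf(5, Fraction(1, 3))
assert sum(probs) == 1                # the probabilities sum to one
assert probs[2] == Fraction(80, 243)  # P(2 heads in 5 tosses)
```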
(sec:prob:compound:experiment)=
## Compound Experiments
Often we need to examine the outcomes of different experiments together.
Here are some examples:
- Tossing a coin and throwing a die
- Tossing a coin twice in succession (first and second tosses are separate experiments)
Two or more experiments together form a compound experiment.
Repeated trials are an example of a compound experiment.
```{index} Compound experiment, Compound sample space
```
```{prf:definition} Compound experiment
:label: def-prob-compound-experiment
Let $A$ and $B$ be two different experiments.
Let $\Omega_1$ be the sample space of $A$
and $\Omega_2$ be the sample space of $B$.
Then the sample space of the *compound experiment*
is given by the Cartesian product $\Omega = \Omega_1 \times \Omega_2$.
```
### Product $\sigma$-Algebra
If $\FFF_1$ and $\FFF_2$ are $\sigma$-algebras over $\Omega_1$
and $\Omega_2$ respectively, it is natural to
ask how can we construct a $\sigma$-algebra
for the compound sample space.
Let $E_1$ be an event associated with experiment $A$
and $E_2$ be an event associated with experiment $B$.
Then $E = E_1 \times E_2$ is a *product event* associated with
the compound experiment. We have
$$
E_1 \times E_2 = \{\zeta = (\zeta_1, \zeta_2) \ST \zeta_1 \in E_1, \zeta_2 \in E_2 \}.
$$
However, we can see that the set
$$
\{ E_1 \times E_2 \ST E_1 \in \FFF_1, E_2 \in \FFF_2 \}
$$
is not a $\sigma$-algebra. Naturally, the
closest $\sigma$-algebra is the $\sigma$-algebra
generated by this set.
```{prf:definition} Product $\sigma$ algebra
:label: def-prob-prod-sigma-algebra
Let $(\Omega_1, \FFF_1)$ and $(\Omega_2, \FFF_2)$ be
two measurable spaces. Then the
*product* $\sigma$-*algebra* $\FFF_1 \otimes \FFF_2$
is the $\sigma$-algebra on $\Omega_1 \times \Omega_2$
generated by the collection of all product events:
$$
\FFF_1 \otimes \FFF_2 \triangleq
\sigma(\{ E_1 \times E_2 \ST E_1 \in \FFF_1, E_2 \in \FFF_2 \}).
$$
The members of a product $\sigma$-algebra are known
as *compound events*.
```
### Compound Events
A *compound event* can be written
as a finite (or countable) disjoint union of product events
from the two experiments. A finite union looks like
$$
E = \bigcup_{i=1}^k E_{1, i} \times E_{2, i}
$$
where $E_{1,i}$ and $E_{2, i}$ are events in
$\FFF_1$ and $\FFF_2$ respectively.
```{prf:example} Compound events as union of product events
:label: ex-prob-compound-exp-union-products
1. Consider two experiments each of which consists
of throwing a die.
1. The sample space for both experiments is
$$
\Omega_1 = \Omega_2 = \{ 1,2,3,4,5,6 \}.
$$
1. There are $36$ possible outcomes in the compound experiment.
1. The compound sample space is given by
$$
\Omega = \Omega_1 \times \Omega_2 = \{
(1,1), (1,2), \dots, (1,6),
(2,1), \dots, (2,6),
\dots,
(6,1), \dots, (6,6)
\}.
$$
```
### Independent Experiments
```{index} Independent experiments
```
```{prf:definition} Independent experiments
:label: def-prob-independent-experiment
Two experiments are called *independent*
if the outcome of one experiment doesn't
depend on the (past, present or future) outcomes
of the other experiment.
In that case, for every product event $E = E_1 \times E_2$,
we can write
$$
\PP(E) = \PP(E_1) \PP(E_2).
$$
```
Let $E$ be a compound event of two independent
experiments given as a disjoint union of product events.
Then the probability measure for the compound event
is given by
```{math}
:label: eq-prob-compound-event-prob-measure-1
\PP(E) = \sum_{i=1}^k \PP(E_{1, i}) \PP(E_{2, i}).
```