Proposition 38.7.15 Suppose $X_i$ for $i = 1, \ldots, m$ are independent random variables with $E(X_i) = \mu$ and $E\left((X_i - \mu)^2\right) = \sigma^2$. If $Z \equiv \frac{1}{m}\sum_{i=1}^{m} X_i$ is their average, then $E(Z) = \mu$ and $E\left((Z - \mu)^2\right) = \sigma^2/m$.
Proof: $E(Z) = E\left(\frac{1}{m}\sum_{i=1}^{m} X_i\right) = \frac{1}{m}\sum_{i=1}^{m} E(X_i) = \frac{1}{m}\sum_{i=1}^{m} \mu = \mu$. Also, using the independence of these random variables,
$$E\left((Z-\mu)^2\right) = E\left(\frac{1}{m}\sum_{k=1}^{m} X_k - \mu\right)^2 = E\left(\sum_{k=1}^{m}\frac{X_k-\mu}{m}\right)^2 = \frac{1}{m^2}\,E\left(\sum_{k,l}(X_k-\mu)(X_l-\mu)\right)$$
$$= \frac{1}{m^2}\sum_{k,l} E\left((X_k-\mu)(X_l-\mu)\right) = \frac{1}{m^2}\sum_{k=1}^{m} E\left((X_k-\mu)^2\right) = \frac{\sigma^2}{m}$$
Here the cross terms vanish because, by independence, $E\left((X_k-\mu)(X_l-\mu)\right) = E(X_k-\mu)\,E(X_l-\mu) = 0$ whenever $k \neq l$. $\blacksquare$
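For a numerical illustration, here is a minimal Python sketch, assuming each $X_i$ is uniform on $[0,1]$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$; these choices are only for illustration), which estimates $E(Z)$ and $E\left((Z-\mu)^2\right)$ by simulation and compares the latter with $\sigma^2/m$:

```python
import random

# Illustrative assumption: each X_i is uniform on [0, 1], so that
# mu = 1/2 and sigma^2 = 1/12.
mu, sigma2 = 0.5, 1.0 / 12.0
m, trials = 10, 100_000

# Accumulate estimates of E(Z) and E((Z - mu)^2) over many
# independent realizations of the average Z.
mean_z = var_z = 0.0
for _ in range(trials):
    z = sum(random.random() for _ in range(m)) / m
    mean_z += z
    var_z += (z - mu) ** 2
mean_z /= trials
var_z /= trials

print(f"E(Z)         ~ {mean_z:.4f}   (mu = {mu})")
print(f"E((Z-mu)^2)  ~ {var_z:.5f}  (sigma^2/m = {sigma2 / m:.5f})")
```

With $m = 10$ the second estimate should land near $\sigma^2/m = (1/12)/10 \approx 0.00833$.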
From this proposition and Proposition 38.7.13 follows the next important result, which says that the average of observations of independent random variables which have the same mean and same variance converges in probability to that common mean as more and more independent observations are taken. Actually much more can be said, but this is enough here.
Proposition 38.7.16 Let $X_i$, $i = 1, \ldots$, be independent random variables with common mean $\mu$ and common variance $\sigma^2$. Then for $Z_m \equiv \frac{1}{m}\sum_{k=1}^{m} X_k$, the average of the first $m$, and for every $\varepsilon > 0$,
$$\lim_{m\to\infty} P\left((Z_m - \mu)^2 \geq \varepsilon\right) = 0$$
Proof: This follows from the above propositions, which imply
$$P\left((Z_m - \mu)^2 \geq \varepsilon\right) \leq \frac{1}{\varepsilon}\,E\left((Z_m - \mu)^2\right) = \frac{\sigma^2}{\varepsilon m}\ \blacksquare$$
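One can watch this convergence numerically. The following minimal Python sketch, again assuming uniform $[0,1]$ observations chosen only for illustration, estimates $P\left((Z_m - \mu)^2 \geq \varepsilon\right)$ for increasing $m$ and prints it beside the bound $\sigma^2/(\varepsilon m)$ from the proof:

```python
import random

# Illustrative assumption: uniform [0, 1] observations, so mu = 1/2
# and sigma^2 = 1/12; eps is the tolerance in the proposition.
mu, sigma2, eps = 0.5, 1.0 / 12.0, 0.01
trials = 5_000

# Estimate P((Z_m - mu)^2 >= eps) by simulation for increasing m and
# compare with the bound sigma^2 / (eps * m) from the proof.
for m in (10, 100, 1000):
    hits = sum(
        (sum(random.random() for _ in range(m)) / m - mu) ** 2 >= eps
        for _ in range(trials)
    )
    print(f"m = {m:4d}: P ~ {hits / trials:.4f}, bound = {sigma2 / (eps * m):.4f}")
```

The simulated frequencies should sit below the bound and shrink toward $0$ as $m$ grows.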
In words, this says that if you average independent observations (that is, the $i$th observation does not depend on the others; for example, you throw the marked fish back into the lake and let them swim around before taking another observation), then as you take more and more of them, the probability that this average differs by very much from the true mean becomes very small. This is a version of the law of large numbers. In words, the average is probably close to the true mean if you average many independent observations.
Example 38.7.17 Let $X$ have the hypergeometric distribution
$$P(X = j) = \frac{\binom{m}{j}\binom{N-m}{k-j}}{\binom{N}{k}}, \quad k \geq 1$$
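This mass function is easy to evaluate directly from binomial coefficients. The following minimal Python sketch does so, with hypothetical values $N = 100$, $m = 20$, $k = 10$ chosen to match the marked-fish picture above ($m$ marked fish among $N$, a sample of $k$ drawn without replacement, $X$ the number of marked fish in the sample):

```python
from math import comb

def hypergeom_pmf(j, N, m, k):
    """P(X = j): probability that exactly j of the k items drawn
    without replacement from N items, m of which are marked, are
    marked."""
    return comb(m, j) * comb(N - m, k - j) / comb(N, k)

# Hypothetical numbers for illustration: 100 fish, 20 marked,
# a sample of 10 recaptured.
N, m, k = 100, 20, 10
pmf = [hypergeom_pmf(j, N, m, k) for j in range(min(m, k) + 1)]
for j, p in enumerate(pmf):
    print(f"P(X = {j:2d}) = {p:.4f}")
print("total =", round(sum(pmf), 10))  # the probabilities sum to 1
```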