Observation 38.7.12 In every case, if $a,b$ are numbers, then if everything makes sense, and $X,\hat{X}$ are two random variables having the same probability distribution, meaning that $P(X \in F) = P(\hat{X} \in F)$ for all $F$ an interval, then
\[
E\left(aX + b\hat{X}\right) = aE(X) + bE\left(\hat{X}\right).
\]
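For a quick illustration (the particular distribution here is only an example, not part of the discussion above), suppose $X$ and $\hat{X}$ are both uniformly distributed on $[0,1]$, so $E(X) = E(\hat{X}) = \frac{1}{2}$. Then the observation would give
\[
E\left(2X + 3\hat{X}\right) = 2E(X) + 3E\left(\hat{X}\right) = 2\cdot\tfrac{1}{2} + 3\cdot\tfrac{1}{2} = \tfrac{5}{2}.
\]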
Suppose you had many random variables $X_i$, each having the same distribution, and the collection of random variables independent, as explained below. If you averaged the $X_i$ over all $i$, what you would get is probably close to $E(X)$. This is why taking the expectation is of interest. I will give a brief explanation why this is so.
Where do independent random variables come from? In practice, you have independent observations from an underlying probability space, meaning that it makes sense to ask for the probability that a random variable is in suitable subsets of $\mathbb{R}$ or $\mathbb{R}^n$. These observations are independent in the sense that the outcome of one observation does not depend on the outcomes of the others. Then the numerical values are called independent random variables. A more precise description is given below.
First, here is an important formula. I will be considering only the case of a continuous distribution in explaining this inequality, but it all works in general. The inequality is called the Chebychev inequality.
Proposition 38.7.13 Let $X$ be a random variable and let $g$ be a function for which $g(X)$ is also a random variable. Then for $\varepsilon > 0$,
\[
P(|g(X)| \geq \varepsilon) \leq \frac{1}{\varepsilon} E(|g(X)|).
\]
Proof: By definition of what is meant by a distribution function, if $E \equiv |g|^{-1}([\varepsilon,\infty)) \equiv \{x : |g(x)| \geq \varepsilon\}$, then on this set, $|g(x)|/\varepsilon \geq 1$, and off this set, $|g(x)| f(x) \geq 0$. Thus
\[
P(|g(X)| \geq \varepsilon) = \int_E f(x)\,dx \leq \int_E \frac{|g(x)|}{\varepsilon}\, f(x)\,dx \leq \frac{1}{\varepsilon}\int_{\mathbb{R}} |g(x)|\, f(x)\,dx = \frac{1}{\varepsilon} E(|g(X)|).\;\blacksquare
\]
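As an illustration of how this inequality is often used (this particular application is not spelled out above, but it follows directly by taking $g(x) = (x-\mu)^2$), suppose $\mu = E(X)$ and $\sigma^2 = E\left((X-\mu)^2\right)$ both exist with $\sigma > 0$. Then for any $k > 0$,
\[
P(|X-\mu| \geq k\sigma) = P\left((X-\mu)^2 \geq k^2\sigma^2\right) \leq \frac{1}{k^2\sigma^2}\, E\left((X-\mu)^2\right) = \frac{1}{k^2}.
\]
Thus, for example, no more than $1/4$ of the probability can lie more than two standard deviations from the mean, regardless of the particular distribution.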
Now suppose you have random variables $X_i$, each having distribution function $f(x)$, and suppose $\mu = E(X)$ and $\sigma^2 = E\left((X-\mu)^2\right)$ both exist. Suppose all these random variables $X_i$, $i = 1,\ldots$, are independent as in the next definition.
Definition 38.7.14 Let there be random variables $X_1,\ldots$, having well defined mean $\mu \equiv E(X_k)$ and variance $\sigma^2 = E\left((X_k-\mu)^2\right)$. Then to say these are independent implies that $E((X_i-\mu)(X_j-\mu)) = E(X_i-\mu)\,E(X_j-\mu) = 0$ whenever $i \neq j$. The more complete meaning of independence is as follows: For each $m$,
\[
P(X_i \in E_i \text{ for each } i \leq m) = \prod_{i=1}^{m} P(X_i \in E_i).
\]
Here the $E_i$ can be considered intervals. The idea is that what happens in terms of probability involving each $X_j$ for $j \neq i$ does not affect probability involving $X_i$. Also, if you have $X_1,\ldots$, independent, then if $g$ is some continuous function, $g(X_1),\ldots$ will also be independent. This is because $g(X_i) \in E_i$ if and only if $X_i \in g^{-1}(E_i)$. Then there is a significant observation.
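To connect this with the earlier remark that the average of the $X_i$ is probably close to $E(X)$, here is a brief sketch of one standard way to see this, assuming as above that the $X_i$ are independent, each with mean $\mu$ and variance $\sigma^2$; the notation $\overline{X}_n \equiv \frac{1}{n}\sum_{i=1}^n X_i$ for the average is introduced only for this sketch. By independence the cross terms vanish, so
\[
E\left(\left(\overline{X}_n-\mu\right)^2\right)
= \frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n E\left((X_i-\mu)(X_j-\mu)\right)
= \frac{1}{n^2}\sum_{i=1}^n E\left((X_i-\mu)^2\right)
= \frac{\sigma^2}{n}.
\]
Then the Chebychev inequality applied with $g(x) = (x-\mu)^2$ and $\varepsilon$ replaced by $\varepsilon^2$ gives
\[
P\left(\left|\overline{X}_n-\mu\right| \geq \varepsilon\right) \leq \frac{\sigma^2}{n\varepsilon^2},
\]
which is small when $n$ is large.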