June 5, 2021
To understand the concept behind \(E_X\), consider a discrete random variable with range \(R_X = \{x_1, x_2, x_3, \ldots\}\). This random variable results from a random experiment. Suppose that we repeat this experiment a very large number of times \(N\), and that the trials are independent. Let \(N_1\) be the number of times we observe \(x_1\), \(N_2\) the number of times we observe \(x_2\), and, in general, \(N_k\) the number of times we observe \(x_k\). Since \(P(X=x_k) = P_X(x_k)\), by the relative-frequency interpretation of probability we expect that
$$P_X(x_1) \approx \frac{N_1}{N}$$
$$P_X(x_2) \approx \frac{N_2}{N}$$
$$\vdots$$
$$P_X(x_k) \approx \frac{N_k}{N}$$
In other words, we have \(N_k \approx N \cdot P_X(x_k)\). Now, if we take the average of the observed values of \(X\), we obtain
$$\text{Average} = \frac{N_1x_1 + N_2x_2 + N_3x_3 + \ldots}{N}$$
$$\approx \frac{N \cdot P_X(x_1) \cdot x_1 + N \cdot P_X(x_2) \cdot x_2 + \ldots}{N}$$
$$= P_X(x_1) \cdot x_1 + P_X(x_2) \cdot x_2 + \ldots$$
$$= \sum_{i} x_i \cdot P_X(x_i) = E_X$$
Thus, the intuition behind \(E_X\) is that if you repeat the random experiment independently \(N\) times and take the average of the observed data, the average gets closer and closer to \(E_X\) as \(N\) gets larger and larger.
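To see this numerically, here is a minimal Python sketch (the range, PMF, and sample sizes below are made-up values for illustration): it draws \(N\) independent samples from a small discrete distribution and compares the sample average with \(E_X\) as \(N\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up discrete random variable: range R_X and PMF P_X.
x = np.array([1.0, 2.0, 5.0])   # x_1, x_2, x_3
p = np.array([0.5, 0.3, 0.2])   # P_X(x_1), P_X(x_2), P_X(x_3)

e_x = np.sum(x * p)             # E_X = sum_i x_i * P_X(x_i) = 2.1

# Repeat the experiment N times and compare the sample average with E_X.
for n in (100, 10_000, 1_000_000):
    samples = rng.choice(x, size=n, p=p)
    print(f"N = {n:>9,}: average = {samples.mean():.4f} (E_X = {e_x:.1f})")
```

As \(N\) increases, the printed averages should settle ever closer to 2.1.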
June 6, 2021
However, when we model real-life stochastic processes, we often do not know the model parameters \(\theta\); we simply observe the outcomes \(O\), and our goal is then to estimate \(\theta\) from \(O\).
We know that, given a value of \(\theta\), the probability of observing \(O\) is \(P(O|\theta)\). Thus, a 'natural' estimation procedure is to choose the value of \(\theta\) that maximizes the probability of actually observing \(O\).
Find the parameter values \(\theta\) that maximize the following function:
$$L(\theta|O) = P(O|\theta)$$
\(L(\theta|O)\) is called the likelihood function. Notice that by definition, the likelihood function is conditioned on the observed \(O\) and that it is a function of the unknown parameters \(\theta\).
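As a concrete discrete sketch (the coin-flip data and the Bernoulli model below are assumptions for illustration): if \(O\) consists of \(n\) independent coin flips with \(k\) heads, then \(L(\theta|O) = \theta^k (1-\theta)^{n-k}\), and maximizing it numerically recovers the familiar estimate \(\hat{\theta} = k/n\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Made-up observed outcomes O: 10 coin flips, 1 = heads.
O = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
n, k = len(O), O.sum()

def neg_log_likelihood(theta):
    # -log L(theta|O) for a Bernoulli model, L = theta^k * (1 - theta)^(n - k).
    return -(k * np.log(theta) + (n - k) * np.log(1 - theta))

# Maximizing L is the same as minimizing -log L; search over (0, 1).
result = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1 - 1e-9),
                         method="bounded")
print(f"theta_hat = {result.x:.4f} (closed form k/n = {k / n})")
```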
Denote the probability density function (pdf) associated with the outcomes \(O\) as \(f(O|\theta)\). Thus, in the continuous case, we estimate \(\theta\) given the observed outcomes \(O\) by maximizing the following function:
$$L(\theta|O) = f(O|\theta)$$
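A continuous counterpart, again as a sketch on simulated data (the exponential model and the true rate below are assumptions): for i.i.d. observations with pdf \(f(o|\theta) = \theta e^{-\theta o}\), the log-likelihood is \(\sum_i (\log\theta - \theta o_i)\), whose maximizer is \(\hat{\theta} = 1/\bar{o}\); the numeric search should reproduce it.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Simulated continuous outcomes O: exponential data with a made-up true rate.
true_rate = 2.0
O = rng.exponential(scale=1 / true_rate, size=5_000)

def neg_log_likelihood(theta):
    # -log L(theta|O) = -sum_i log f(o_i|theta), f(o|theta) = theta * exp(-theta * o).
    return -(len(O) * np.log(theta) - theta * O.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0),
                         method="bounded")
print(f"theta_hat = {result.x:.4f} (closed form 1/mean = {1 / O.mean():.4f})")
```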