# ML estimators in normal populations

The acronym ML stands for maximum likelihood.

ML estimators are calculated so as to maximize the likelihood of $$n$$ independent observations. That is, if $$f(x_i; \theta_1, \ldots, \theta_p)$$ is the value of the density function for the $$i$$-th observation, which depends on parameters $$\theta_1, \ldots, \theta_p$$, the function to be maximized is $L(\theta_1, \ldots, \theta_p; x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta_1, \ldots, \theta_p).$ The set of numbers $$\hat{\theta}_1, \ldots, \hat{\theta}_p$$ that maximizes $$L$$ is the set of so-called maximum likelihood estimators.
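Since the logarithm is increasing, maximizing $$L$$ is equivalent to maximizing $$\log L$$, which is usually easier. As a minimal numerical sketch of this recipe (the toy sample xs and helper ll below are illustrative additions, not part of the session that follows), the log-likelihood of a three-point normal sample with $$\sigma$$ fixed at 1 peaks when $$\mu$$ equals the sample mean:

load("distrib")$
xs: [4.2, 5.1, 4.8]$  /* toy sample; its mean is 4.7 */
ll(m) := lsum(log(pdf_normal(x0, m, 1)), x0, xs)$  /* log-likelihood as a function of mu */
float([ll(4.0), ll(4.7), ll(5.4)]);  /* the middle value, at the sample mean, is largest */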

load("distrib")$ Given a normal, or gaussian, population with unknown parameters $$\mu$$ and $$\sigma$$, once we have an independent sample of size n, $$(x_1, x_2, \ldots, x_n)$$, our likelihood function takes the form L: product(pdf_normal(x[i],mu,sigma), i, 1, n);  $\prod_{i=1}^{n}{{{e^ {- {{\left(x_{i}-\mu\right)^2}\over{2\,\sigma^ 2}} }}\over{\sqrt{2}\,\sqrt{\pi}\,\sigma}}}$ Taking the logarithm of the above expression, we get the loglikelihood function, logL: ev(log(L), logexpand=all);  $\sum_{i=1}^{n}{\left(-\log \sigma-{{\left(x_{i}-\mu\right)^2}\over{ 2\,\sigma^2}}-{{\log \pi}\over{2}}-{{\log 2}\over{2}}\right)}$ As we want to maximize the likelihood, or equivalently the loglikelihood, we need to equal to zero the partial derivatives of the loglikelihood with respect to $$\mu$$ and $$\sigma$$. The derivative with respect to $$\mu$$ is logLmu: ev(diff(logL, mu), simpsum=true);  ${{\sum_{i=1}^{n}{x_{i}}-\mu\,n}\over{\sigma^2}}$ Equating to zero and isolating $$\mu$$ yields hatmu: solve(logLmu, mu);  $\left[ \mu={{\sum_{i=1}^{n}{x_{i}}}\over{n}} \right]$ Which we recognize as the sample mean, so that our result can be summarized as $$\hat{\mu}=\bar{x}$$. We repeat a similar procedure for $$\sigma^2$$, logLsi: ev(diff(logL, sigma), simpsum=true)$
hatsi: solve(logLsi, sigma^2);


$\left[ \sigma^2={{\sum_{i=1}^{n}{\left(x_{i}-\mu\right)^2}}\over{n}} \right]$

This is the sample variance, so that $$\hat{\sigma}^2=s^2$$.
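Setting the first derivative to zero only locates a critical point. As a quick second-order check, not part of the original session but reusing the logLmu computed above, the second derivative of the log-likelihood with respect to $$\mu$$ is

diff(logLmu, mu);  /* evaluates to -n/sigma^2 */

which is negative for every $$\sigma>0$$, so the critical point is indeed a maximum; an analogous check applies to $$\sigma$$.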

The expression for the variance can be written solely in terms of the sample values by substituting the estimate of $$\mu$$,

subst(hatmu, hatsi);


$\left[ \sigma^2={{\sum_{i=1}^{n}{\left(x_{i}-{{\sum_{i=1}^{n}{x_{i}}}\over{n}}\right)^2}}\over{n}} \right]$
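As a numerical cross-check (the toy list xt below is an illustrative addition), this closed form agrees with function var from package descriptive, which, as used later in this session, divides by $$n$$ rather than $$n-1$$:

load("descriptive")$
xt: [1, 2, 6]$  /* toy sample with mean 3 */
[lsum((x0 - mean(xt))^2, x0, xt)/length(xt), var(xt)];  /* both return 14/3 */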

Let us now proceed with the numerical part of this session. Instead of taking our sample from a real-world population, we can simulate one by calling function random_normal from package distrib. Let $$n=50$$ be the sample size, with $$\mu=50$$ and $$\sigma=2.5$$,

(n: 50, mu: 50, sigma: 2.5)$
fpprintprec: 4$
x: random_normal(mu, sigma, n);


$\left[ 53.21 , 50.45 , 52.16 , 51.37 , 44.37 , 52.65 , 50.92 , 51.25 , 52.18 , 48.49 , \\ 52.49 , 46.19 , 47.32 , 47.73 , 50.58 , 48.6 , 49.21 , 52.32 , 49.95 , 48.7 , \\ 55.38 , 51.36 , 48.3 , 49.34 , 52.6 , 48.8 , 47.52 , 44.53 , 49.25 , 44.47 , \\ 48.58 , 48.33 , 48.39 , 49.36 , 43.98 , 53.73 , 51.93 , 51.76 , 46.22 , 53.61 , \\ 51.2 , 50.16 , 54.3 , 46.86 , 56.86 , 49.21 , 50.75 , 45.22 , 53.87 , 48.85 \right]$
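Your numbers will differ on each run, since the sample is random. If a reproducible session is desired, Maxima's random state can be fixed before drawing the sample; to my knowledge the generators in distrib rely on Maxima's built-in random, so something like the following (the seed 1234 is arbitrary) should make the draw repeatable:

set_random_state(make_random_state(1234))$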

In order to calculate the sample mean and variance, we need to load the descriptive package. We also include the sample standard deviation,

load("descriptive") $ml: [mean(x), var(x), std(x)];  $\left[ 49.9 , 8.529 , 2.92 \right]$ As a result, $$\hat{\mu}=49.9$$, and $$\hat{\sigma}=2.92$$. Remember that our sample was artificially drawn from a normal population with parametrrs $$\mu=50$$, and $$\sigma=2.5$$. Also remember that the ML estimator of the variance is a biased estimator. With help of function histogram, also implemented in package descriptive, we can show a graphical representation of the sample together with the estimated gaussian probability density, [m, s]: [ml[1], ml[3]]$
draw2d(
    grid       = true,
    dimensions = [400, 300],
    histogram_description(
        x,
        nclasses     = 9,
        frequency    = density,
        fill_density = 0.5),
    explicit(pdf_normal(z, m, s), z, m - 3*s, m + 3*s))$
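Since the ML estimator of the variance is biased, one often reports the unbiased version with denominator $$n-1$$ instead; package descriptive provides it as var1 (and std1 for the standard deviation). A quick comparison, added here as a side note:

[var(x), var1(x)];  /* var1(x) = var(x)*n/(n-1), slightly larger */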