The normal, or Gaussian, distribution is the most frequently used probability model, both for real and theoretical reasons. In general, it is used to model physical quantities that are subject to a variety of small random fluctuations. It is also used to approximate other models based on the Central Limit Theorem.
A random variable X is normally distributed with parameters μ∈R and σ>0, X∼N(μ,σ), when its density function has the form
load("distrib")$ pdf_normal(x,mu,sigma);
e−(x−μ)22σ2√2√πσ
Its support is R.
The cumulative distribution function is defined in terms of the error function,
cdf_normal(x,mu,sigma);
erf(x−μ√2σ)2+12
The mean and the standard deviation are equal to the values of parameters μ and σ, respectively.
[mean_normal(mu,sigma), std_normal(mu,sigma)];
[μ,σ]
First quartile,
Q1: quantile_normal(1/4, mu,sigma);
√2inverse_erf(−12)σ+μ
Second quartile or median,
Q2: quantile_normal(1/2, mu,sigma);
μ
Third quartile,
Q3: quantile_normal(3/4, mu,sigma);
√2inverse_erf(12)σ+μ
Interquartile range,
IQR: Q3 - Q1;
√2inverse_erf(12)σ−√2inverse_erf(−12)σ
For the case of the standard distribution, where σ=1,
float(subst(sigma=1,IQR));
1.348979500392164
The moment generating function for any random variable X with distribution function F is defined as MX(t)=E[etX]=∫∞−∞etxdF(x). In the particular case of the normal distribution, the moment generating function becomes
assume(sigma>0) $ M: integrate(exp(t*x)*pdf_normal(x,mu,sigma),x, minf, inf);
eσ2t2+2μt2
Column matrix showing the four first moments of X∼N(μ,σ),
transpose( matrix( makelist( subst(t=0, diff(M, t, n)), n,4)));
(μσ2+μ23μσ2+μ33σ4+6μ2σ2+μ4)
The characteristic function is defined as ϕX(t)=E[eitX]=∫∞−∞eitxdF(x). In our case,
integrate(exp(%i*t*x)*pdf_normal(x,mu,sigma),x, minf, inf);
e2iμt−σ2t22
Random simulation of variable X∼N(15,3), together with its density function,
m: 15 $ s: 3 $ n: 100 $ dat: random_normal(m, s, n) $ /* we need package descriptive for the histogram */ load("descriptive") $ draw2d( grid = true, dimensions = [400,300], terminal = svg, histogram_description( dat, nclasses = 9, frequency = density, fill_density = 0.5), explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s) ) $
As in the simulation section above, let's generate n=15 random Gaussians with μ=15 and σ=3,
fpprintprec: 5$ dat: random_normal(15, 3, n:15);
[11.319,15.544,17.707,12.983,15.611,11.09,12.685,15.667,14.203,20.182,16.63,17.792,11.664,17.897,15.44]
We want to estimate μ and σ by the method of maximum likelihood. First, we obtain the log-likelihood function,
logexpand: all $ loglike: log(product (pdf_normal(dat[i],mu,sigma), i, 1, n));
−15logσ−(20.182−μ)22σ2−(17.897−μ)22σ2−(17.792−μ)22σ2−(17.707−μ)22σ2−(16.63−μ)22σ2−(15.667−μ)22σ2−(15.611−μ)22σ2−(15.544−μ)22σ2−(15.44−μ)22σ2−(14.203−μ)22σ2−(12.983−μ)22σ2−(12.685−μ)22σ2−(11.664−μ)22σ2−(11.319−μ)22σ2−(11.09−μ)22σ2−15logπ2−15log22
The log-likelihood estimators of μ and σ are those who maximize the likelihood or log-likelihood function,
ml: float( solve( [diff(loglike, mu)=0, diff(loglike, sigma)=0], [mu, sigma]));
[[μ=15.094,σ=−2.6359],[μ=15.094,σ=2.6359]]
Since σ must be positive, the true estimators are given by the second solution,
ml: ml[2];
[μ=15.094,σ=2.6359]
These values coincide with the sample mean and standard deviation,
load(descriptive) $ [mean(dat), std(dat)];
[15.094,2.6359]
The observational information matrix is given by
infomatrix: subst(ml, -hessian(loglike, [mu, sigma]));
(2.15885.2374×10−155.2374×10−154.3177)
According to the maximum likelihood theory, estimators are asympthotically normal with covariance matrix equal to the inverse of the information matrix,
covmat: invert(infomatrix);
(0.46321−5.6188×10−16−5.6188×10−160.23161)
The covariance of the two estimators is near zero. It can be demonstrated that they are in fact independent random variables.
Asymthotic confidence interval for μ with significance level α=0.05,
alpha: 0.05 $ float(rhs(ml[1])+[-1,1] * quantile_normal(1-alpha/2,0,1) * sqrt(covmat[1,1]));
[13.76,16.428]
Asymthotic confidence interval for σ,
float(rhs(ml[2])+[-1,1] * quantile_normal(1-alpha/2,0,1) * sqrt(covmat[2,2]));
[1.6927,3.5792]
Akaike's Information Criterion, in case we want to compare this model to others,
-2*subst(ml, loglike) + 2*2, numer;
75.645
© 2017, TecnoStats.