# Central Limit Theorem


Continuing the previous example: we discussed how the Weak Law of Large Numbers helps us estimate the average height of students. Remember that the average height we calculated, \((\overline{X}_n)\), is a random variable, so it must have an underlying probability distribution, and we don't know what that distribution is. The Central Limit Theorem helps us approximate that unknown distribution. Once we know the underlying distribution, we can answer a lot of questions.

## Introduction

Say we have a large population and we take a **sample** out of that population, and we want to study the **distribution** of the **average** of some specific property of that sample.

For example, say we want to study the average height of students in our class. The Central Limit Theorem (CLT) helps us do this. CLT says:

No matter what distribution (with finite mean and variance) our population follows, as we increase the sample size, the sampling distribution of the mean converges to a Normal distribution.

### Sampling Distribution of the Mean

Say you collected **multiple** samples (all of the same sample size) from the population, took the average of each of those samples, and then plotted a histogram of those sample averages. This histogram is what we refer to as the sampling distribution of the mean.
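This procedure is easy to simulate. Below is a minimal sketch in plain Python; the Exp(1) population (a heavily skewed distribution with mean 1) and the sample size of 30 are illustrative assumptions, not part of the theorem:

```python
import random
import statistics

random.seed(42)

# A deliberately skewed (non-normal) population: Exp(1), which has
# population mean 1. Any population with finite mean and variance
# behaves the same way under the CLT.

def sample_mean(n):
    """Draw one sample of size n from Exp(1) and return its average."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Sampling distribution of the mean: collect many sample averages.
n = 30
sample_means = [sample_mean(n) for _ in range(10_000)]

# The sample averages cluster around the population mean (1 here), and a
# histogram of `sample_means` looks roughly bell-shaped even though the
# population itself is heavily skewed.
print(round(statistics.fmean(sample_means), 3))
```

Plotting a histogram of `sample_means` (e.g. with matplotlib) would show the bell shape directly.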

Fortunately, we don't need **multiple** samples. CLT helps us approximate the sampling distribution with just one sample. We only have to make sure that our sample size is sufficiently large.

Now the question arises: how large should our sample size (say \(n\)) be? Rule of thumb: if our distribution is symmetric around the mean, then \(n\geq 30\) is sufficient to apply the Central Limit Theorem.

#### But remember it's just a rule of thumb!

The more the true distribution deviates from the Normal distribution, the larger the sample size required.

If the distribution is not symmetric around the mean, then \(n\geq 30\) might not be anywhere near sufficient!

What you can do is plot the CDF of your data and superimpose the CDF of the corresponding Normal distribution, then see whether they superimpose nicely. (We have covered this in our Python / Julia simulation.)

The definition of "nicely" is up to you; it depends on how much error you are willing to accept.

You can also use a statistical test to check whether our sampling distribution matches the Normal distribution (such as the Kolmogorov–Smirnov test). We will cover these tests in this guide.
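As a rough sketch of such a check, the following Python computes the Kolmogorov–Smirnov *distance statistic* (just the largest gap between the two CDFs, without p-values) for simulated sample means against the Normal CDF the CLT predicts. The Exp(1) population and all constants are illustrative assumptions:

```python
import math
import random
import statistics

random.seed(0)

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Sampling distribution of the mean for samples of size n from Exp(1).
n = 40
means = sorted(
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(2_000)
)

# CLT prediction: means ~ N(mu, sigma^2 / n), with mu = sigma = 1 for Exp(1).
mu, sigma = 1.0, 1.0 / math.sqrt(n)

# KS-style statistic: the largest vertical gap between the empirical CDF
# (a step function over the sorted means) and the predicted normal CDF.
m = len(means)
d = max(
    max(abs((i + 1) / m - normal_cdf(x, mu, sigma)),
        abs(i / m - normal_cdf(x, mu, sigma)))
    for i, x in enumerate(means)
)
print(f"max CDF gap: {d:.3f}")  # a small gap means the curves superimpose nicely
```

In practice you would use a packaged test (e.g. `scipy.stats.kstest`) to also get a p-value; this sketch only shows what the distance being tested actually measures.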

## Convergence of Sampling Distribution

Now let's say we have a population and we draw \(n\) random **I.I.D.** observations \(X_1,X_2,\cdots,X_n\) from it. These \(n\) **I.I.D.** observations are the result of some random process with an **unknown** distribution. This random process has a finite mean, say \((\mu)\), and a finite variance, say \((\sigma^2)\):

\(\mathbb{E}[X]=\mu\) and \(\text{Var}(X)=\sigma^2\)

The estimator we used for \(\mu\) is \(\overline{X}_n=\frac{1}{n}\left(X_1+X_2+\cdots+X_n\right)\). And according to the Weak Law of Large Numbers,

\(\overline{X}_n \xrightarrow [n\to \infty ] {\mathbb{P}} \mu\)
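A quick sketch of this convergence in action (plain Python; the Uniform(0, 1) population, for which \(\mu = 0.5\), and the chosen values of \(n\) are illustrative assumptions):

```python
import random
import statistics

random.seed(5)

# Illustrative population: Uniform(0, 1), so mu = 0.5. Any distribution
# with finite mean and variance would show the same behaviour.
mu = 0.5

# Watch the sample mean settle near mu as n grows.
for n in (10, 1_000, 100_000):
    xbar = statistics.fmean(random.random() for _ in range(n))
    print(f"n = {n:>7}:  sample mean = {xbar:.4f}, error = {abs(xbar - mu):.4f}")
```

The error does not shrink monotonically on any single run, but it shrinks in probability, which is exactly what the WLLN promises.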

## \(\displaystyle\text{Var}(\overline{X}_n)=\frac{\sigma^2}{n}\)

\[
\begin{aligned}
\text{Var}(\overline{X}_n) &= \text{Var}\left(\frac{1}{n}\left(X_1+X_2+\cdots+X_n\right)\right) \\
&= \frac{1}{n^2}\left(\text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)\right) \\
&= \frac{1}{n^2}\left(\underbrace{\sigma^2 + \cdots + \sigma^2}_{n\ \text{times}}\right) \\
&= \frac{1}{n^2}\,(n\sigma^2) \\
&= \frac{\sigma^2}{n}
\end{aligned}
\]

(The second equality uses the independence of the \(X_i\): the variance of a sum of independent random variables is the sum of their variances.)
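This identity is easy to verify numerically. A minimal sketch, assuming a Uniform(0, 1) population (so \(\sigma^2 = 1/12\)):

```python
import random
import statistics

random.seed(1)

# Population: Uniform(0, 1), which has known variance sigma^2 = 1/12.
sigma2 = 1.0 / 12.0
n = 25

# Simulate many sample means and compare their empirical variance
# to the theoretical value sigma^2 / n.
means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(20_000)
]
empirical = statistics.variance(means)
print(f"empirical  Var = {empirical:.6f}")
print(f"theoretical sigma^2/n = {sigma2 / n:.6f}")
```

The two numbers agree to within simulation noise, confirming \(\text{Var}(\overline{X}_n)=\sigma^2/n\).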

As \(n\to \infty\), \(\text{Var}(\overline{X}_n)\to 0\); as a consequence, the probability distribution of \(\overline{X}_n\) becomes concentrated in an arbitrarily small interval around the mean \(\mu\). So this limiting distribution doesn't help us in any way, as it is totally concentrated around the single number \(\mu\).

So let's scale up and consider \(\sqrt{n}\ \overline{X}_n\) instead. Now \(\displaystyle\text{Var}(\sqrt{n}\,\overline{X}_n)=\sigma^2\): for the distribution of \(\sqrt{n}\ \overline{X}_n\), the variance remains unchanged as \(n\) grows. But \(\displaystyle\mathbb{E}[\sqrt{n}\,\overline{X}_n]=\sqrt{n}\,\mu\), and \(\mathbb{E}[\sqrt{n}\,\overline{X}_n] \xrightarrow [n\to \infty ]{} \infty\), so let's center the distribution around \(0\).

So consider \(\sqrt{n}\ (\overline{X}_n - \mu)\). Then

\(\displaystyle\text{Var}\left(\sqrt{n}\ (\overline{X}_n - \mu)\right)=\sigma^2\) and \(\displaystyle\mathbb{E}\left[\sqrt{n}\ (\overline{X}_n - \mu)\right]=0\)

Now \(n\) has no effect on either the variance or the expectation of the distribution of \(\sqrt{n}\ (\overline{X}_n - \mu)\).

Finally, divide by \(\sigma\) to get \(\displaystyle\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\). Then

\(\displaystyle\text{Var}\left(\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\right)=1\) and \(\displaystyle\mathbb{E}\left[\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\right]=0\)

Now we have standardized our random variable \(\overline{X}_n\): these moments hold for every mean \((\mu)\), variance \((\sigma^2)\), and sample size \((n)\).

Say \(Z_n := \displaystyle\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\), and let \(Z\sim\mathcal{N}(0,1)\) (\(Z\) is a standard normal random variable with mean \(0\) and variance \(1\)). Now the **Central Limit Theorem** states:

For every \(z\):

\[\lim_{n\to\infty}\mathbb{P}(Z_n \lt z) = \mathbb{P}(Z \lt z)\]

Equivalently,

\[\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \xrightarrow [n\to \infty ]{(d)} \mathcal{N}(0,1) \]

So the Central Limit Theorem says that as \(n\to \infty\), the **CDF** (cumulative distribution function) of the random variable \(Z_n\) converges to the **CDF** of the standard normal random variable \(Z\). The Central Limit Theorem is a statement about the convergence of CDFs; it is **not** a statement about the convergence of PDFs or PMFs.

Rule of thumb for applying the CLT: when \(n\geq 30\), \(\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma }\) is approximately \(\mathcal{N}(0,1)\) in distribution.
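To see the standardization at work, here is a sketch that draws many copies of \(Z_n\) for \(n=30\) and compares one value of its empirical CDF against the standard normal CDF. The Exp(1) population (for which \(\mu=\sigma=1\)) is an illustrative assumption:

```python
import math
import random
import statistics

random.seed(7)

n = 30
mu, sigma = 1.0, 1.0  # Exp(1) has mean 1 and standard deviation 1

def z_n():
    """One draw of the standardized sample mean Z_n = sqrt(n)(X̄_n - mu)/sigma."""
    xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    return math.sqrt(n) * (xbar - mu) / sigma

draws = [z_n() for _ in range(10_000)]

# Compare the empirical P(Z_n < 1) with Phi(1) for the standard normal.
p_emp = sum(z < 1.0 for z in draws) / len(draws)
phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
print(f"P(Z_n < 1) ~ {p_emp:.3f}, Phi(1) = {phi_1:.3f}")
```

Even with a skewed population and only \(n=30\), the two CDF values are already close, which is the practical content of the rule of thumb.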

## Rate of Convergence

Now we can finally answer our earlier question from the Weak Law of Large Numbers: how fast (at what rate) does \(\overline{X}_n\) approach \(\mu\)? If we draw a standard Gaussian \(\mathcal{N}(0,1)\), say \(Z\), then with probability \(0.9974\), \(Z\in [-3,3]\):

\(P(-3\leq Z\leq3)=0.9974\)

(this can be computed from the standard normal CDF). So \(Z\) is almost always between \(-3\) and \(3\). And we know that:

\[\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \xrightarrow [n\to \infty ]{(d)} \mathcal{N}(0,1) \]

And so, with probability \(0.9974\), we can say that

\(-3\leq\mathcal{N}(0,1)\leq3\)

So:

\[ -3\leq \sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \leq3 \\ \Rightarrow \left|\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \right| \leq 3 \\ \Rightarrow \left| \overline{X}_n-\mu \right| \leq \frac{3\sigma}{\sqrt{n}} \]

So, according to the CLT, \(f(n)=\sqrt{n}\): the error \(\left| \overline{X}_n-\mu \right|\) shrinks at the rate \(\frac{1}{\sqrt{n}}\).
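We can check the \(3\sigma/\sqrt{n}\) bound by simulation. A sketch assuming a Uniform(0, 1) population (so \(\mu = 0.5\), \(\sigma^2 = 1/12\)):

```python
import math
import random
import statistics

random.seed(3)

mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # Uniform(0, 1) population

def within_bound(n, trials=5_000):
    """Fraction of trials where |X̄_n - mu| <= 3*sigma/sqrt(n)."""
    bound = 3.0 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(random.random() for _ in range(n))
        if abs(xbar - mu) <= bound:
            hits += 1
    return hits / trials

# The fraction inside the bound should sit near 0.9974 for every n,
# even though the bound itself shrinks like 1/sqrt(n).
for n in (30, 120):
    print(f"n={n}: fraction within 3*sigma/sqrt(n) = {within_bound(n):.4f}")
```

Note that the *probability* stays fixed near \(0.9974\) while the *interval* tightens as \(n\) grows; that tightening is the \(1/\sqrt{n}\) rate.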

Now let's see some simulation; choose your language of choice.

Launch Statistics App

## Recommended Watching

Central Limit Theorem (by Prof. John Tsitsiklis)

Central Limit Theorem (by Khan Academy)

Central Limit Theorem (by Sir Josh Starmer)

Central Limit Theorem (by Sir Jeremy Balka)

Real-world application of the CLT (by 365 Data Science)