- Learn how to generate random numbers (sample) from some famous discrete and continuous distributions.
- Learn different ways of generating random numbers.
- Learn the visual presentations of a random sample including histogram, boxplot, probability plot and empirical CDF.

** 範例 1：Histogram**

Draw histogram of a random sample drawn from a normal distribution .

Note:

- This example demonstrate the how the
**numpy**method and**scipy**function generate random sample. - Has to know about “random seed”.
- The histogram can be created by counts or by frequency as shown below.
- Try on various sizes of random samples to see the differences on the shape of the histograms.
- The number of bins may alter the visual outlook of the histogram, that may lead to a wrong decision. Try to alter the number of bins especially for the middle size of sample.

```
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
# random number generator with random seed
rng = np.random.default_rng() # (seed=...)
x = rng.normal(loc = 0, scale = 1, size = 1000)
# x = norm.rvs(loc = 0, scale = 1, size = 1000)
fig, (ax1, ax2) = plt.subplots(2, 1)
# typical histogram by counts
ax1.hist(x, bins = 10, alpha = 0.6, color = 'b', edgecolor = 'y', linewidth = 1)
# histogram by frequency
ax2.hist(x, bins = 10, alpha = 0.6, color = 'b', edgecolor = 'y', linewidth = 1, density = True, rwidth = 0.9)
plt.suptitle('Histogram by counts and frequency')
plt.show()
```

** 範例 2：Boxplot**

Draw boxplots of two random samples drawn from two different normal distribution of and

Note:

- This example demonstrates the generation of random number with and without random seed. The reader needs to print out a few random numbers and observe the difference between using and not using random seed.
- The display of Boxplot can be very rich and colorful. The codes below show a couple: the dictionary of “boxprops” and “flierprops”.

```
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
rng = np.random.default_rng(0) # seed is fixed
n = 1000 # sampe size
x1 = norm.rvs(size = n, random_state = rng) # standard normal
x2 = norm.rvs(loc = 1, scale = 2, size = n)
# run a few times and check the random numbers generated
# with and without fixing the random seed.
# print(x1[0:3])
# print(x2[0:3])
fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (9, 6))
ax1.boxplot(np.c_[x1, x2], notch = True, vert = True, labels = ['X1','X2'])
# plt.show()
# use boxplot properties
boxprops = dict(linestyle = '--', linewidth = 3, color = 'darkgoldenrod')
flierprops = dict(marker='o', markerfacecolor = 'green',
markersize = 8, linestyle = 'none') # define outliers
labels = ['N(0,1)','N(1,1)']
ax2.boxplot(np.c_[x1, x2], boxprops = boxprops,
flierprops = flierprops, labels = labels)
plt.show()
```

** 範例 3：Probabilty plot**

Draw a normal probability plot for random sample drawn from normal distribution and a right-skewed distribution, respectively.

Note:

```
import numpy as np
import scipy.stats as stats
x1 = np.random.normal(loc = 2, scale = 5, size=100)
x2 = np.random.chisquare(df = 2, size = 100)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
stats.probplot(x1, dist = "norm", plot = ax1)
stats.probplot(x2, dist = "norm", plot = ax2)
# stats.probplot(x2, dist = "chi2", sparams = 2, plot = ax2)
ax1.set_title('Data from Normal')
ax2.set_title('Data from $\chi^2$')
plt.suptitle('probability Plot')
plt.show()
```

** 範例 4：Empirical CDF**

Draw empirical CDF (ECDF) of a random sample drawn by a given distribution e.g. . Then compare the ECDF with the real CDF and observe the difference when sample size is getting large.

Note:

- The empirical CDF simulates the CDF of the population by random sample of size n.
- The
**scipy**module provides function “cumfreq” (cumulative frequency over sorted sample) for ECDF. This is called “approach 1” in the following codes. - The easiest method to simulate the CDF from the sample is to treat the probability of occurrence for each sample point is . This method is called “approach 2” in the following codes.
- To see how ECDF resemble the real population CDF, the empirical-CDF and real-CDF are plotted on the same figure. The reader should change the sample size to get an idea of the resemblance according to sample size.

```
import numpy as np
from scipy.stats import norm
from scipy.stats import cumfreq # for ECDF
n = 100
sample = norm.rvs(size = n)
# Approach 1
num_bins = n # cumulative frequency over each sample
res = cumfreq(sample, num_bins)
# print(res.lowerlimit, res.binsize, res.cumcount.size)
x = res.lowerlimit + np.linspace(0,
res.binsize * res.cumcount.size, res.cumcount.size)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize = (6, 9))
# ax1.bar(x, res.cumcount/n, width=res.binsize)
ax1.plot(x, res.cumcount/n, drawstyle = 'steps-pre')
ax1.set_title('The Empirical CDF with {} steps'.format(num_bins), fontsize = 'small')
ax1.grid(True)
# Approach 2
x = np.sort(sample)
Y = np.arange(1, n+1) / n
ax2.plot(x, Y, drawstyle = 'steps-pre', label = 'Empirical CDF')
# draw the theoretical CDF of Z
Y_ = norm.cdf(x)
ax2.plot(x, Y_, color = 'r', linestyle = '--',
linewidth = 5, alpha = 0.5, label = 'real CDF')
ax2.legend(), plt.grid(True)
ax2.set_title('The Empirical CDF with {} samples'.format(num_bins), fontsize = 'small')
plt.show()
```

** 練習：Sampling Distribution**

Let the random variable be normally distributed with . If , the random variable is known to follow distribution with degree of freedom 1. Verify this well known theorem by generating random samples of size n from , squaring them and then drawing histogram, boxplot, probability plot and ECDF.

Note:

- To get a good view of the histogram, it needs to adjust the bin numbers. When attached by the real PDF line, it needs to select appropriate ylim.

** 練習：Sampling Distribution** , where

Use simulation data to verify that . Select an appropriate graph for the verification.

Note:

- Simulate the sampling distribution for the sample size of , respectively.

** 練習：Sampling Distribution for Central Limit Theorem (CLT)**

The experiment:

- Let
- Let
- As

- Let be uniform, ,
- Let .

- The histogram of a random sample (size N = 10000) of Y and the expected pdf line of . For example, the graph shown on the left hand side below comes from , , and .
- Compute the sample mean and sample standard deviation of N random samples of Y, where N=10000. The graph on the right hand side below shows the comparison of with the theoretical counterpart.
- The reader needs patience to work out the details including the appropriate Y range, the xticklabels, etc.

** 範例 5：Boxplot for multiple data sets**

Draw N boxplots in a figure for N data sets. In this example, the N data sets are drawn form N independent variables distributed normally. Each data set contains n data points.

Note:

- Boxplot presents the descriptive statistics of a data set visually.
- It is often used to compare among multiple data sets.

```
import matplotlib.pyplot as plt
from scipy.stats import norm
n, N = 20, 100
X = norm.rvs(size=(N, n))
plt.boxplot(X)
plt.title('The boxplots of {} independent random samples from $N(0,1)$'.format(n))
plt.show()
```

** 習題：**

- Choose three sampling distributions you have learned. Verify the theorems by simulations.