# 汪群超 Chun-Chao Wang

Dept. of Statistics, National Taipei University, Taiwan

# Lesson 8: Minimization of Multivariate Functions

### Objective

1. Learn how to draw the 3D graph for a two-variable function.
2. Learn how to draw the contour plot of a two-variable function.
3. Learn to find the constrained and unconstrained local minima of a multivariate function.
4. Learn how to simulate data.
5. Solve the parameter estimation problem for normal and $\beta$ mixture data.

## Prerequisite

1. Draw the mesh (wireframe) graphs for $f(x,y) = xe^{-x^2-y^2}$
2. Find the minimum of $f(x,y)$.

Note:
1. Learn to express multivariate function.
2. The first 3D graph is called wireframe or mesh plot. The second is called surface plot.
3. In the wireframe plot, the arguments “rstride” and “cstride” are used to control the wire density in the row and column directions. The smaller the values, the denser the wires. You may want to change these numbers to see the results.
4. A 3D graph has two viewing angles to adjust: elevation and azimuth. It is necessary to rotate the graph to find the best viewing angle, record the elevation and azimuth numbers, and then fix them with view_init in the code. Note that you cannot rotate a 3D graph in a Jupyter notebook; you have to do it in a plain Python environment.
5. A surface plot is a colorful 3D graph to which a color map can be applied via cmap. Refer to the matplotlib documentation for more color maps, e.g. plt.cm.bone.
6. Since a surface plot is colorful, it is often paired with a color bar that maps the colors to numeric values.
```python
import numpy as np
import matplotlib.pyplot as plt

f = lambda x: x[0] * np.exp(-x[0]**2 - x[1]**2)

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 3, 100)
X, Y = np.meshgrid(x, y)  # mesh grid matrices
Z = f([X, Y])

# for wireframe, control the wire density by rstride and cstride
fig = plt.figure(figsize=(9, 6))
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='blue',
                  alpha=0.3, rstride=1, cstride=1)
ax.set_xlabel('X'), ax.set_ylabel('Y')
ax.set_zlabel('f(X,Y)')
ax.view_init(10, -60)  # elev=10, azim=-60
plt.title('Wireframe (Mesh) Plot')
plt.show()

# for surface 3D plot
fig = plt.figure(figsize=(9, 6))
ax = plt.axes(projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=4, cstride=4,
                       alpha=0.6, cmap='ocean')  # or cmap=plt.cm.bone
fig.colorbar(surf, ax=ax, shrink=0.5, aspect=10)  # aspect = length/width ratio
ax.view_init(10, -60)  # elev=10, azim=-60
ax.set_xlabel('X'), ax.set_ylabel('Y')
plt.title('Surface Plot')
plt.show()
```


Continuing with the previous function, draw contour plots for a two-variable function.

1. The contour lines are the level sets of various function values.
2. You need to decide how many contour lines, or which levels, to show on the graph. The following code demonstrates both.
3. Instead of using lines to exhibit the function levels, the command contourf uses filled colors.
4. The following code also demonstrates a colorbar with details.
... continued with the previous codes ...

```python
# To draw a contour plot
levels = np.arange(-0.4, 0.4, 0.01)  # levels of contour lines
contours = plt.contour(X, Y, Z, levels=levels)  # check dir(contours)
# add function values on the lines
plt.clabel(contours, inline=0, fontsize=10)  # inline = 1 or 0
cbar = plt.colorbar(contours)
plt.xlabel('X'), plt.ylabel('Y')
cbar.ax.set_ylabel('Z = f(X,Y)')  # set colorbar label
plt.title('Contour Plot')
plt.grid(True)
plt.show()
```

... continued with the previous codes ...

```python
# draw a contour plot with contourf
C1 = plt.contourf(X, Y, Z, 30, cmap=plt.cm.bone)
C2 = plt.contour(C1, levels=C1.levels, colors='r')
plt.colorbar(C1)
plt.xlabel('X'), plt.ylabel('Y')
plt.title('contourf + contour')
plt.show()
```


Compute the minimum of the function in the previous example.

Note:

1. scipy.optimize.fmin is used to find a local minimum of a multivariate function.
2. The command scipy.optimize.fmin accepts many arguments that control algorithmic convergence. In the code below, “maxiter” caps the number of iterations at 1000 and “maxfun” limits the number of function evaluations to 1000; “disp = False” suppresses the convergence message at the end; “full_output = True” adds the optimal function value to the output. Other important arguments include “xtol” and “ftol”. To pass all of these parameters conveniently, we like to collect them in a dictionary; refer to the last part of the following code.
3. You may want to print the output variable “OptVal” to see what is inside.
4. scipy.optimize.fmin uses the downhill simplex algorithm, which requires no derivatives.
```python
import numpy as np
import scipy.optimize as opt

f = lambda x: x[0] * np.exp(-x[0]**2 - x[1]**2)
OptVal = opt.fmin(func=f, x0=[0, -1], maxiter=1e3, maxfun=1e3,
                  disp=False, full_output=True)
print('The minimum occurs at x = {:.4f}, y = {:.4f}'.format(OptVal[0][0], OptVal[0][1]))
print('The function value is {:.4f}'.format(OptVal[1]))

# Use a dictionary to collect the algorithmic parameters
opts = dict(disp=1, xtol=1e-6, ftol=1e-6, maxfun=1e4, maxiter=1e4, full_output=True)
OptVal = opt.fmin(func=f, x0=[0, 0], **opts)
```


For the previous function, put a marker at the minimum and at the maximum, respectively, on the contour plot.
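One possible approach is sketched below: reuse the function from above, find the minimum with fmin, and find the maximum by minimizing $-f$. The starting points $[-1, 0]$ and $[1, 0]$ are choices made here after inspecting the contour plot.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

f = lambda x: x[0] * np.exp(-x[0]**2 - x[1]**2)

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 3, 100)
X, Y = np.meshgrid(x, y)
Z = f([X, Y])

# the minimum of f, and the maximum of f as the minimum of -f
xmin = opt.fmin(f, x0=[-1, 0], disp=False)
xmax = opt.fmin(lambda x: -f(x), x0=[1, 0], disp=False)

plt.contour(X, Y, Z, levels=20)
plt.plot(xmin[0], xmin[1], 'bv', markersize=10, label='minimum')
plt.plot(xmax[0], xmax[1], 'r^', markersize=10, label='maximum')
plt.xlabel('X'), plt.ylabel('Y')
plt.legend()
plt.show()
```

By symmetry of $f(x,y) = xe^{-x^2-y^2}$, the two markers should land at $(\mp 1/\sqrt{2}, 0)$.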

For the function $f(x_1,x_2) = (x_1-2)^4 + (x_1-2)^2x_2^2 + (x_2+1)^2$

1. Draw the wireframe and contour graphs.
2. Solve the minimization problem: $\displaystyle\min_{x_1, x_2} f(x_1, x_2)$.

For the famous Rosenbrock’s banana test function
$f(x, y) = 100(y-x^2)^2 + (1-x)^2$

1. Draw wireframe and contour graphs of the function.
2. Compute the minimum of the function.
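A minimal sketch of the minimization step, using fmin as in the previous example (the minimum of the banana function is known to be at $(1, 1)$ with value 0):

```python
import scipy.optimize as opt

# Rosenbrock's banana function
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

# downhill simplex with tight tolerances; full_output also returns fopt
xopt, fopt, *_ = opt.fmin(f, x0=[-1, 2], xtol=1e-8, ftol=1e-8,
                          maxiter=10**4, maxfun=10**4,
                          disp=False, full_output=True)
print(xopt, fopt)
```

The narrow curved valley of this function is what makes it a standard stress test for optimizers; a poor starting point and loose tolerances can stop the search far from $(1, 1)$.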

Solve the constrained optimization problem:

$\displaystyle \min_{x_1, x_2 \in \Omega} (x_1-2)^4 + (x_1-2)^2x_2^2 + (x_2+1)^2$

where $\Omega$ is as follows

1. $\Omega = \{x_1, x_2 \in R|\; x_1 \geq 0, x_2 \geq 0 \}$
2. $\Omega = \{x_1, x_2 \in R|\; x_1 \leq 1, x_2 \leq 2 \}$
3. $\Omega = \{x_1, x_2 \in R|\; 0 \leq x_1 \leq 1, 0 \leq x_2 \leq 2 \}$
4. $\Omega = \{x_1, x_2 \in R|\; 0 \leq x_1 \leq \infty, -\infty \leq x_2 \leq 2 \}$
5. $\Omega = \{x_1, x_2 \in R|\; x_1 +x_2 \leq 0.9 \}$
6. $\Omega = \{x_1, x_2 \in R|\; 1.5 \leq x_1 +x_2 \leq 2 \}$
7. $\Omega = \{x_1, x_2 \in R|\; \sqrt{x_1^2 +x_2^2} \leq 1 \}$
8. $\Omega = \{x_1, x_2 \in R|\; \sqrt{x_1^2 +x_2^2} \leq 1, \; x_1x_2 \geq 0\}$

Note:
1. The constraints include linear and nonlinear inequalities.
2. Constrained optimization is commonly seen in real applications.
3. In this example, scipy.optimize.minimize is employed to solve the constrained minimization problem.
4. The unconstrained version of this problem was solved previously. Compare that result with those under the various constraints.
5. The following code suggests approaches for the various constraints. Uncomment the corresponding lines to see the results.
6. The command minimize can also be used to solve unconstrained problems. Refer to the documentation for the differences between minimize and fmin in scipy.optimize.
```python
import scipy.optimize as opt
import numpy as np

f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
opts = dict(disp=True, maxiter=10**4)
# 1
cons = [{'type': 'ineq', 'fun': lambda x: x[0]},
        {'type': 'ineq', 'fun': lambda x: x[1]}]
bnds = None
# 2: approach 1
# cons = []
# bnds = [(None, 1), (None, 2)]
# 2: approach 2
# cons = [{'type': 'ineq', 'fun': lambda x: 1 - x[0]},
#         {'type': 'ineq', 'fun': lambda x: 2 - x[1]}]
# bnds = None
# 3
# cons = []
# bnds = [(0, 1), (0, 2)]
# 4
# cons = []
# bnds = [(0, np.inf), (-np.inf, 2)]
# 5
# cons = {'type': 'ineq', 'fun': lambda x: -x[0] - x[1] + 0.9}
# bnds = None
# 6: approach 1
# cons = [{'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1.5},
#         {'type': 'ineq', 'fun': lambda x: -x[0] - x[1] + 2}]
# bnds = None
# 6: approach 2
# A = [[1, 1], [-1, -1]]
# b = [-1.5, 2]
# cons = {'type': 'ineq', 'fun': lambda x: A @ x + b}
# bnds = None
# 6: approach 3
# A = [1, 1]
# lb, ub = 1.5, 2
# cons = opt.LinearConstraint(A, lb, ub)
# bnds = None
# 7: approach 1
# cons = {'type': 'ineq', 'fun': lambda x: 1 - np.sqrt(x[0]**2 + x[1]**2)}
# bnds = None
# 7: approach 2
# con = lambda x: np.sqrt(x[0]**2 + x[1]**2)
# lb, ub = 0, 1
# cons = opt.NonlinearConstraint(con, lb, ub)
# bnds = None
# 8
# cons = [{'type': 'ineq', 'fun': lambda x: 1 - np.sqrt(x[0]**2 + x[1]**2)},
#         {'type': 'ineq', 'fun': lambda x: x[0] * x[1]}]
# bnds = None

res = opt.minimize(f, x0=[0, 0],
                   bounds=bnds,
                   constraints=cons,
                   options=opts,
                   tol=1e-8)
# print(res)
print('x1 = {:.4f}, x2 = {:.4f}'.format(res.x[0], res.x[1]))
print('function value is {:.4f}'.format(res.fun))
```



Consider a random sample from the two-component beta mixture with density

$f(x|\Omega) = \pi_1 \beta(x|a_1,b_1) + \pi_2 \beta(x|a_2, b_2)$

Note: We want to estimate the unknown parameters $\Omega$ by maximizing the log joint likelihood function, as shown:

$\displaystyle \max_{\Omega = \{\pi_1, \pi_2, a_1, b_1, a_2, b_2|\pi_1 + \pi_2 =1, \pi_1, \pi_2, a_1, b_1, a_2, b_2 >0\}} \ln \prod_{i=1}^{n} f(x_i|\Omega)$

The objective function is written as

$L(\Omega) = \ln \prod_{i=1}^{n} f(x_i|\Omega) = \sum_{i=1}^n \ln(\pi_1 \beta(x_i|a_1, b_1) + \pi_2 \beta(x_i|a_2, b_2))$

To do the simulation for this example:

1. Set up the scenario (the parameters), e.g. $\pi_1 = 0.7, a_1=2, b_1=9, a_2=3, b_2=3$, sample size = 1000.
2. Plot the pdf graph of the population according to the scenario.
3. Generate simulated sample of size n according to the scenario.
4. Plot histogram of the random sample.
5. Estimate the parameters by solving the MLE problem.
6. Draw the estimated mixture pdf on top of the histogram, together with the theoretical pdf.
```python
import numpy as np
import scipy.optimize as opt
import matplotlib.pyplot as plt
from scipy.stats import beta, binom

# set up the parameters --------------
pi1, a1, b1, a2, b2 = 0.7, 2, 9, 3, 3
# draw the mixture pdf ---------------
f = lambda x: pi1 * beta.pdf(x, a1, b1) + (1 - pi1) * beta.pdf(x, a2, b2)
x = np.linspace(0, 1, 1000)
plt.plot(x, f(x), color='r', linewidth=3, label='True mixture')
# generate the simulated sample -------
n = 1000
n1 = binom.rvs(n, pi1)  # random size of the first component
n2 = n - n1
sample = np.r_[beta.rvs(a1, b1, size=n1),
               beta.rvs(a2, b2, size=n2)]
# plot histogram ------------------------
plt.hist(sample, 50, edgecolor='y', density=True)
# maximize the likelihood (minimize the negative log-likelihood)
L = lambda x: -np.sum(np.log(x[0] * beta.pdf(sample, x[1], x[2])
                             + (1 - x[0]) * beta.pdf(sample, x[3], x[4])))
# the constraints, bounds and options
cons = []
bnds = [(0, 1), (0, np.inf), (0, np.inf), (0, np.inf), (0, np.inf)]
opts = dict(disp=True, maxiter=10**4)
x0 = [0.5, 1, 10, 5, 5]  # initial guess
res = opt.minimize(L, x0=x0,
                   bounds=bnds,
                   constraints=cons,
                   options=opts,
                   tol=1e-8)
# show the results and compare res.fun with the value at the true parameters
print(res)
# print(L([pi1, a1, b1, a2, b2]))  # the function value at the true parameters
# plot the estimated mixture pdf
f_hat = lambda x: res.x[0] * beta.pdf(x, res.x[1], res.x[2]) \
                  + (1 - res.x[0]) * beta.pdf(x, res.x[3], res.x[4])
plt.plot(x, f_hat(x), color='k', linestyle='--',
         linewidth=3, label='Estimated mixture')
plt.legend()
plt.show()
```

Note:
1. This is a constrained optimization problem. The constraints are: $0 \leq\pi_1\leq 1$, $0 < a_1, b_1, a_2, b_2 < \infty$.
2. The algorithm minimize selects by default depends on the problem: with bounds only, it defaults to L-BFGS-B; with general constraints, it defaults to SLSQP.

1. Re-do the previous example by setting up different scenarios. For example, observe the estimation accuracy by changing the sample size from small to large, say n= 50, 100, 500, 1000, 10000.
2. Re-arrange the proportion $\pi_1$, and the parameters $(a_1, b_1)$, $(a_2, b_2)$ of the two $\beta$ distributions.
3. For each scenario, repeat N times, e.g. N = 10000, take the average of each parameter estimate, and compare the averages with the respective true values.
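The repetition in item 3 can be sketched as follows. The estimation code of the previous example is wrapped in a helper function `estimate` (a name introduced here for illustration); $N$ and $n$ are kept small so the sketch runs quickly and should be increased in practice. Note that the two mixture components may occasionally swap labels across repetitions, which slightly complicates the averaging.

```python
import numpy as np
import scipy.optimize as opt
from scipy.stats import beta, binom

pi1, a1, b1, a2, b2 = 0.7, 2, 9, 3, 3  # the scenario (true parameters)

def estimate(n, rng):
    """Draw one sample of size n and return the MLE of the five parameters."""
    n1 = binom.rvs(n, pi1, random_state=rng)
    sample = np.r_[beta.rvs(a1, b1, size=n1, random_state=rng),
                   beta.rvs(a2, b2, size=n - n1, random_state=rng)]
    L = lambda x: -np.sum(np.log(x[0] * beta.pdf(sample, x[1], x[2])
                                 + (1 - x[0]) * beta.pdf(sample, x[3], x[4])))
    bnds = [(1e-6, 1 - 1e-6)] + [(1e-6, np.inf)] * 4
    return opt.minimize(L, x0=[0.5, 1, 10, 5, 5], bounds=bnds).x

rng = np.random.default_rng(0)
N, n = 10, 500  # kept small here; use e.g. N = 10000 in practice
est = np.array([estimate(n, rng) for _ in range(N)])
print('average estimates:', est.mean(axis=0))
print('true values      :', [pi1, a1, b1, a2, b2])
```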

$\max_{\Omega=\{\pi_1, \pi_2,\mu_1,\sigma_1^2, \mu_2,\sigma_2^2| \pi_1+\pi_2=1, \pi_1, \pi_2, \sigma_1^2, \sigma_2^2>0\}} L(\Omega)$

$L(\Omega)=\sum_{i=1}^{n} \ln (\pi_1 f(x_i|\mu_1,\sigma_1^2) + \pi_2 f(x_i|\mu_2,\sigma_2^2))$

$f(x|\mu,\sigma^2)$ is the probability density function of the normal distribution. Following the previous example, generate suitable data and use minimize to estimate the parameters $\Omega=\{\pi_1,\mu_1,\sigma_1^2, \mu_2,\sigma_2^2\}$. Start with a sample size of 100; once the parameters are estimated successfully, change it to 30, 50, 500, 1000, and 10000, and assess the effect of the sample size on the estimates.
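A minimal sketch for the normal mixture, following the beta-mixture example. The scenario values and starting point below are illustrative assumptions, and the model is parameterized by the standard deviations rather than the variances:

```python
import numpy as np
import scipy.optimize as opt
from scipy.stats import norm, binom

# assumed scenario (illustrative values): pi1, mu1, sigma1, mu2, sigma2
pi1, mu1, s1, mu2, s2 = 0.6, 0.0, 1.0, 4.0, 1.5
n = 100
rng = np.random.default_rng(1)
n1 = binom.rvs(n, pi1, random_state=rng)
sample = np.r_[norm.rvs(mu1, s1, size=n1, random_state=rng),
               norm.rvs(mu2, s2, size=n - n1, random_state=rng)]

# negative log-likelihood, parameterized by (pi1, mu1, sigma1, mu2, sigma2)
L = lambda x: -np.sum(np.log(x[0] * norm.pdf(sample, x[1], x[2])
                             + (1 - x[0]) * norm.pdf(sample, x[3], x[4])))
bnds = [(1e-6, 1 - 1e-6), (-np.inf, np.inf), (1e-6, np.inf),
        (-np.inf, np.inf), (1e-6, np.inf)]
res = opt.minimize(L, x0=[0.5, -1, 1, 3, 1], bounds=bnds)
print(res.x)  # compare with (0.6, 0.0, 1.0, 4.0, 1.5)
```

If the estimated variances are wanted, square the estimated standard deviations afterwards.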

$\max_{\alpha,\beta >0} \ln L(\alpha,\beta)$

$L(\alpha,\beta)=\prod_{i=1}^n f_t(v_i |\alpha,\beta)F_T(u_i|\alpha,\beta)^{-1}$

where $f_t(v|\alpha,\beta)=\alpha\beta v^{\beta-1}\exp(-\alpha v^{\beta})$

$F_T(u|\alpha,\beta)=1-\exp(-\alpha u^{\beta})$

(If any part is unfamiliar, look back at earlier examples or chapters for code fragments to adapt.)

1. First download the data file, extract the data, and take a look at it.

2. The objective function $\ln L(\alpha,\beta)$ needs further derivation into a more suitable form, namely converting the product $\prod$ into a sum $\sum$ via $\ln$. It is not that the product $\prod$ cannot be computed; rather, when the sample is large, the product is numerically unstable: multiplying many very large or very small values may exceed the hardware's limits. This is why a typical maximum likelihood problem applies $\ln$ to the objective function to turn it into a sum. Simplify the expression as much as possible. (You may still try computing directly in the $\ln \prod$ form; you might get the same answer!)

3. Using the simplified objective function, draw the 3D and contour plots. When plotting, explore the parameter ranges to find the best viewing window. A well-drawn plot roughly reveals the location of the maximum (as in the figure below).

4. Then set up the stopping criteria of minimize and run the computation. With the help of the contour plot, the answer is usually already apparent; the computation merely pins it down as a more precise set of numbers.
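Since the data file is not included here, the following sketch simulates right-truncated data under assumed true parameters $\alpha=2, \beta=1.5$, and minimizes the simplified negative log-likelihood $-\ln L(\alpha,\beta) = -\sum_i [\ln\alpha + \ln\beta + (\beta-1)\ln v_i - \alpha v_i^{\beta} - \ln(1-e^{-\alpha u_i^{\beta}})]$:

```python
import numpy as np
import scipy.optimize as opt

# the real data file is not shown here, so simulate right-truncated data
# under assumed true parameters alpha = 2, beta = 1.5
alpha0, beta0 = 2.0, 1.5
rng = np.random.default_rng(2)
n = 500
u = rng.uniform(0.5, 2.0, size=n)              # truncation points u_i
F = lambda t, a, b: 1 - np.exp(-a * t**b)      # F_T(t|alpha, beta)
p = rng.uniform(size=n) * F(u, alpha0, beta0)  # inverse-cdf sampling, so v_i <= u_i
v = (-np.log(1 - p) / alpha0)**(1 / beta0)     # observations v_i

# simplified negative log-likelihood:
# -ln L = -sum[ ln a + ln b + (b-1) ln v_i - a v_i^b - ln F_T(u_i|a, b) ]
negL = lambda x: -np.sum(np.log(x[0]) + np.log(x[1]) + (x[1] - 1) * np.log(v)
                         - x[0] * v**x[1] - np.log(F(u, x[0], x[1])))
res = opt.minimize(negL, x0=[1, 1], bounds=[(1e-6, np.inf)] * 2)
print(res.x)  # compare with the true (alpha, beta) = (2, 1.5)
```

With the real data, replace the simulation block by loading $v_i$ and $u_i$ from the file; the objective and the call to minimize stay the same.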

$f(z)=\int_z^1 f_Y(y)f_X(z/y) \frac{1}{y} \;dy$

$\displaystyle \min_{a,b>0} \int_0^1 (f(z) - \beta(z|a,b))^2 \;dz$

1. The problem hints that although $Z$ is not beta distributed, it is close to some beta distribution. Start by plotting its probability density function $f(z)$ and overlay some beta pdf for comparison, as in the figure below. The task is to find the beta distribution closest to $f(z)$.

2. When an anonymous function becomes complicated, move the complicated part into a helper function.

1. The objective function involves two nested integrals, which makes the code a bit awkward to write. It is therefore suggested to first plot the pdf $f(z)$, i.e., to handle one integral first.

2. Then write the objective function $\int_0^1 (f(z) - \beta(z|a,b))^2 \;dz$. Before minimizing it, test whether the function is written correctly (or at least produces a function value), e.g., by plugging in $a=2, b=2$.

3. The command minimize provides many built-in algorithms (BFGS, Nelder-Mead simplex, Newton Conjugate Gradient, COBYLA, SLSQP, etc.), each suited to different kinds of functions (some handle constraints, some do not). Try different algorithms, e.g. method = 'L-BFGS-B', and see which one is faster.
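The whole workflow can be sketched as below. Since the densities $f_X$ and $f_Y$ are not specified in this excerpt, Beta(2,4) and Beta(3,3) are assumed purely for illustration; the inner integral is tabulated on a grid first (step 1), and the outer integral is then approximated with the trapezoid rule:

```python
import numpy as np
import scipy.optimize as opt
from scipy import integrate
from scipy.stats import beta as beta_dist

# f_X and f_Y are not given in this excerpt; Beta(2,4) and Beta(3,3)
# are assumed here purely for illustration
fX = lambda x: beta_dist.pdf(x, 2, 4)
fY = lambda y: beta_dist.pdf(y, 3, 3)

# inner integral: the pdf of Z, f(z) = int_z^1 fY(y) fX(z/y) / y dy
def f_z(z):
    val, _ = integrate.quad(lambda y: fY(y) * fX(z / y) / y, z, 1)
    return val

# tabulate f(z) on a grid once, so the outer integral is cheap
z = np.linspace(1e-3, 1 - 1e-3, 200)
fz = np.array([f_z(zi) for zi in z])

# outer integral: squared L2 distance to a Beta(a, b) pdf (trapezoid rule)
D = lambda x: integrate.trapezoid((fz - beta_dist.pdf(z, x[0], x[1]))**2, z)

print(D([2, 2]))  # sanity check: the objective produces a value
res = opt.minimize(D, x0=[2, 2], bounds=[(1e-3, np.inf)] * 2)
print(res.x)      # the (a, b) of the closest beta distribution
```

Tabulating $f(z)$ once avoids re-evaluating the nested quad at every step of the minimization, which is the main cost of the naive implementation.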