https://datascienceschool.net/view-notebook/4642b9f187784444b8f3a8309c583007/
================================================================================
* Optimization problem in math notation
* $$$x^*=\arg_x \max f(x)$$$
* Find the argument $$$x^*$$$ which maximizes f(x)
* $$$x^*=\arg_x \min f(x)$$$
* Find the argument $$$x^*$$$ which minimizes f(x)

* Code
* Minimization optimization
output_data_list=function_f(input_data_list)            # evaluate f on the candidate inputs
index_of_min_val=get_index_of_min_val(output_data_list) # index of the smallest output value
arg_solution=input_data_list[index_of_min_val]          # input that produced the smallest output

* Code
* Maximization optimization
output_data_list=function_f(input_data_list)
output_data_list=-output_data_list                      # negate so that maximization becomes minimization
index_of_min_val=get_index_of_min_val(output_data_list)
arg_solution=input_data_list[index_of_min_val]

* function_f: objective function, also called cost function, loss function, or error function
================================================================================
$$$f(x)=(x-2)^2 + 2$$$
$$$x^{*}=\arg_x \min f(x)$$$
$$$x^{*}=2$$$
================================================================================
Rosenbrock function
$$$f(x, y) = (1-x)^2 + 100(y-x^2)^2$$$
$$$x^{*},y^{*}=\arg_{x,y} \min f(x,y)$$$
$$$x^{*}=1,y^{*}=1$$$
================================================================================
* Grid search
================================================================================
* Numerical optimization: move the x location step by step until you reach the optimal y value, using as few function evaluations as possible
* Tools for numerical optimization
* Tool 1: Is the current x location the optimal location?
* Tool 2: How do you find the next x?
================================================================================
* Tool 1: Is the current x location the optimal location?
- Univariate function
* $$$\dfrac{d f(x)}{dx}=0$$$
* $$$f^{'}(x)=0$$$
- Multivariate function
* $$$\nabla f = 0$$$
* Gradient of f is 0
* $$$g=0$$$
* g: gradient of f
$$$\dfrac{\partial f(x_1,x_2,\cdots,x_N)}{\partial x_1}=0$$$
$$$\dfrac{\partial f(x_1,x_2,\cdots,x_N)}{\partial x_2}=0$$$
$$$\;\;\;\;\;\;\;\;\;\;\vdots$$$
$$$\dfrac{\partial f(x_1,x_2,\cdots,x_N)}{\partial x_N}=0$$$
* $$$g=0$$$ and $$$g' \gt 0$$$: minimum point
* $$$g=0$$$ and $$$g' \lt 0$$$: maximum point
================================================================================
* Tool 2: How do you find the next x?
* Use SGD (Steepest Gradient Descent)
$$$x_{k+1} \\ = x_k - \alpha \times \nabla f(x_k) \\ = x_k - \alpha \times g(x_k)$$$
* Code
alpha=learning_rate                      # step size
slope_current_x=slope(current_x)         # gradient g(x_k) at the current x
if slope_current_x==0:                   # already at a stationary point, stop moving
    next_x=current_x
else:
    next_x=current_x-alpha*slope_current_x
================================================================================
$$$x_{1}$$$ -> $$$x_{2}$$$ -> $$$x_{3}$$$
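A minimal runnable sketch of the steepest gradient descent iteration above, applied to $$$f(x)=(x-2)^2+2$$$; the learning rate and iteration count are assumed values for illustration.
* Code
def f(x):
    return (x - 2)**2 + 2

def fprime(x):
    """Gradient (slope) of f"""
    return 2 * (x - 2)

alpha = 0.3                  # learning rate (assumed)
x = 0.0                      # initial location x_1
for k in range(20):          # x_1 -> x_2 -> x_3 -> ...
    g = fprime(x)            # gradient at the current x
    if g == 0:               # stationary point reached
        break
    x = x - alpha * g        # x_{k+1} = x_k - alpha * g(x_k)
print(x)                     # approaches the minimizer x* = 2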
================================================================================
* Newton method, which uses the second order derivative
Suppose the objective function is a second order (quadratic) function.
Then you can find the optimal point $$$x_k$$$ in one step.
Method: use a "changed gradient vector"
changed gradient vector (its direction and length) = (inverse of the Hessian matrix) * (gradient vector)
$$${x}_{n+1} = {x}_n - [{H}f({x}_n)]^{-1} \times \nabla f({x}_n)$$$
Meaning:
1. You don't need a learning rate
2. You need the first order derivative (gradient vector)
3. You need the second order derivative (Hessian matrix)
================================================================================
* Example of Newton method
* Second order univariate function f(x)
$$$f(x) \\ = a(x-x_0)^2 + c \\ = ax^2 - 2ax_0x + ax_0^2 + c$$$
* $$$x^{*} = \arg_{x}\min f(x)$$$
$$$x^{*} = x_0$$$
* Apply Newton method
$$$f'(x) = 2ax - 2ax_0$$$
$$$f''(x) = 2a$$$
$$${x}_{n+1} \\ = {x}_n - \dfrac{2ax_n - 2ax_0}{2a} \\ = x_n - (x_n - x_0) \\ = x_0$$$
================================================================================
* Limitation of Newton method
- Newton method converges fast
- But you have to derive the second order derivative (Hessian matrix function) yourself
- Depending on the shape of the function, Newton method may not converge well
================================================================================
* Quasi-Newton method
- It doesn't use a Hessian matrix function derived by hand
- It evaluates f at several x values around current_x
- It approximates the second order derivative (Hessian matrix) from those function values
- BFGS (Broyden-Fletcher-Goldfarb-Shanno) is the most widely used
- CG (conjugate gradient) computes the search direction directly from the first order derivative, without using the second order derivative (Hessian matrix function)
================================================================================
* Optimization by using SciPy
result=minimize(func,x0,jac=jac)
* func: objective function such as $$$f(x)$$$
* x0: initial x values (e.g. random)
* jac: (optional) function that returns the gradient vector (first derivatives)
* result: OptimizeResult object

OptimizeResult object
x: solution (x values)
success: True if the optimization succeeded
status: termination status; 0 if the optimization succeeded
message: description of the termination cause
fun: value of the objective function at x
jac: value of the Jacobian vector (gradient, derivative) at x
hess_inv: value of the inverse Hessian matrix at x
nfev: number of times the objective function was called
njev: number of times the Jacobian was calculated
nhev: number of times the Hessian was calculated
nit: number of iterations (how many times x was moved)
================================================================================
import scipy as sp
import scipy.optimize

# Objective function
def f1(x):
    return (x-2)**2+2

# x0: initial x value
x0=0

result=sp.optimize.minimize(f1,x0)
print(result)
# fun: 2.0
# hess_inv: array([[0.5]])
# jac: array([0.])
# message: 'Optimization terminated successfully.'
# nfev: 9
# nit: 2
# njev: 3
# status: 0
# success: True
# x: array([1.99999999])

nfev: 9 -> the objective function was called 9 times
nit: 2 -> x was moved 2 times
This is because the first order derivative (gradient vector) and the second order derivative (Hessian matrix function) were not given,
so SciPy had to approximate them by calling the objective function at nearby x values
================================================================================
To reduce the number of function evaluations, you can pass the gradient function f1p()

def f1(x):
    return (x-2)**2+2

def f1p(x):
    """Derivative of f1(x)"""
    return 2*(x-2)

x0=0
result = sp.optimize.minimize(f1, x0, jac=f1p)
print(result)
# fun: 2.0
# hess_inv: array([[0.5]])
# jac: array([0.])
# message: 'Optimization terminated successfully.'
# nfev: 3
# nit: 2
# njev: 3
# status: 0
# success: True
# x: array([2.])
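Some minimize() methods can also use the second order derivative directly, in the spirit of the Newton method above, through the hess argument. A minimal sketch, assuming the imports and the f1/f1p definitions above; the helper f1pp and the choice of method="Newton-CG" are assumptions for illustration, not part of the original notebook.
* Code
import numpy as np

def f1pp(x):
    """Second derivative of f1(x), returned as a 1x1 Hessian matrix"""
    return np.array([[2.0]])

x0 = 0
result = sp.optimize.minimize(f1, x0, jac=f1p, hess=f1pp, method="Newton-CG")
print(result.x)   # should end up near the analytic minimizer x* = 2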
================================================================================
* Optimization with a multivariate function

def f2(x):
    return (1-x[0])**2+100.0*(x[1]-x[0]**2)**2

x0=(-2,-2)
result=sp.optimize.minimize(f2,x0)
print(result)
# fun: 1.2197702024999478e-11
# hess_inv: array([[0.50957143, 1.01994476],
#                  [1.01994476, 2.04656074]])
# jac: array([ 9.66714798e-05, -4.64005023e-05])
# message: 'Desired error not necessarily achieved due to precision loss.'
# nfev: 496
# nit: 57
# njev: 121
# status: 2
# success: False
# x: array([0.99999746, 0.99999468])
================================================================================
* Global optimization problem
If $$$f(x)$$$ has multiple local minima, you cannot guarantee that a numerical optimization method reaches the global minimum
The result depends on the initial value, the algorithm, its parameters, etc.
Case 1: reached the global minimum, passing over the local minima
Case 2: fell into a local minimum
================================================================================
* Convex problem
* Optimization problem with a constraint: over the x range of interest, the second order derivative should satisfy
$$$\dfrac{\partial^2 f}{\partial x^2} \geq 0$$$
* Constraint for a multivariate function:
$$$x^THx \geq 0 \;\;\text{for all } x$$$
* The Hessian matrix should be positive semidefinite over the given region
* It is mathematically proven that a convex problem has no local minima other than the global minimum (a strictly convex problem has a unique minimum point)
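The positive semidefinite condition above can be checked numerically: the Hessian is positive semidefinite if all of its eigenvalues are greater than or equal to 0. A minimal sketch, using the simple quadratic f(x, y) = x^2 + y^2 as an illustrative example (not from the original notebook).
* Code
import numpy as np

# Hessian of f(x, y) = x^2 + y^2 (constant, because f is quadratic)
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)                    # eigenvalues of the symmetric Hessian
is_positive_semidefinite = bool(np.all(eigenvalues >= 0))
print(eigenvalues, is_positive_semidefinite)           # [2. 2.] True -> f is convex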