(Solutions for Chapter 2 of Machine Learning Refined: Foundations, Algorithms, and Applications, 2nd edition, by Jeremy Watt, Reza Borhani, and Aggelos Katsaggelos. There is no solution manual for Chapter 1.)

Table of Contents

1. Exercise 2.1. Minimizing a quadratic function and the curse of dimensionality
2. Exercise 2.2. Implementing random search in Python
3. Exercise 2.3. Using random search to minimize a nonconvex function
4. Exercise 2.4. Random search with diminishing steplength
5. Exercise 2.5. Random descent probabilities
6. Exercise 2.6. Revisiting the curse of dimensionality
7. Exercise 2.7. Pseudo-code for the coordinate search algorithm
8. Exercise 2.8. Coordinate search applied to minimize a simple quadratic
9. Exercise 2.9. Coordinate search with diminishing steplength
10. Exercise 2.10. Coordinate search versus coordinate descent
In [1]:
# imports from custom library
import sys
sys.path.append('../')
import matplotlib.pyplot as plt
import autograd.numpy as np

# import custom libraries
from mlrefined_libraries import basics_library as baslib
from mlrefined_libraries import calculus_library as calib
from mlrefined_libraries import math_optimization_library as optlib

# import demos for this notebook
static_plotter = optlib.static_plotter.Visualizer();
optimizers = optlib.optimizers

# this is needed to compensate for matplotlib notebook's tendency to blow up images when plotted inline
%matplotlib notebook
from matplotlib import rcParams
rcParams['figure.autolayout'] = True

%load_ext autoreload
%autoreload 2

Exercise 2.1. Minimizing a quadratic function and the curse of dimensionality
In this experiment we verify the curse of dimensionality issue associated with the use of randomly sampled points for naive evaluation of the simple quadratic function $g(\mathbf{w}) = \mathbf{w}^T\mathbf{w}$, whose minimum value is always $g(\mathbf{0}) = 0$ regardless of the input dimension $N$ (here $\mathbf{w}$ is an $N \times 1$ vector). In this experiment we create a range of these quadratics for input dimension $N = 1$ to $N = 100$. We sample the input space of each quadratic $10{,}000$ times randomly and uniformly on the hypercube $[-1,1] \times [-1,1] \times \cdots \times [-1,1]$ (this hypercube has $N$ sides). The printout below shows the minimum value attained for each dimensional quadratic after $100$, $1{,}000$, and $10{,}000$ samples. As we can see in the plot, the minimum value attained even after $10{,}000$ random samples increases substantially as the dimension of the quadratic increases - all due to the curse of dimensionality. As discussed above we would need to use exponentially many samples to counteract this problem, which quickly becomes infeasible.
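The library routine called in the next cell produces the full printout and plot. For reference, the following is a minimal plain-NumPy sketch of the same experiment; the function name, the dimensions tested, and the sample count here are illustrative choices, not the library's actual implementation.

import numpy as np

# sketch of the random-evaluation experiment: for each input dimension N,
# draw points uniformly from the hypercube [-1,1]^N and record the smallest
# value of g(w) = w^T w seen over a fixed number of samples
def random_eval_sketch(dims=(1, 10, 100), num_samples=10000, seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for N in dims:
        # each row is one uniformly sampled point in [-1,1]^N
        samples = rng.uniform(-1, 1, size=(num_samples, N))
        evals = np.sum(samples**2, axis=1)   # g(w) = w^T w for every sample
        results[N] = evals.min()
    return results

print(random_eval_sketch())   # the minimum value found grows quickly with N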
In [46]:
# run experiment for global random evaluation
optlib.random_method_experiments.random_eval_experiment()

A few simple hand-calculations can precisely affirm why this is happening for the particular set of very simple functions we are studying here. To produce a random point in the input space of the function we sample each axis of the input hypercube according to a uniform distribution on the interval $[-1,1]$. Thus the average value along each input dimension is equal to $0$ (as the average of a uniform on the interval $[a,b]$ is given as $\frac{1}{2}(a+b)$).

The problem is that the probability that all input elements are small in magnitude (close to zero or equal to zero) simultaneously gets exponentially smaller as we go up in dimension. In one dimension, the probability of selecting a value $v$ on the interval $[-0.1, 0.1]$ is - by definition - $p(|v| \leq 0.1) = \frac{0.2}{2} = 0.1$. However as we go up in dimension, since each dimension is drawn independently, in $N$ dimensions the probability of drawing every element $v_i$ so that $|v_i| \leq 0.1$ is $p(|v_i| \leq 0.1,\ i = 1, \ldots, N) = (0.1)^N$. Thus as our dimension increases the probability of randomly accessing points close to the true global minimum at the origin diminishes exponentially. This again implies that in order to keep up with this our sampling would have to increase exponentially with dimension as well - which is computationally infeasible.
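As a quick empirical check of this calculation, here is a small Monte Carlo sketch; the trial count and dimensions are arbitrary illustrative choices.

import numpy as np

# estimate the probability that every coordinate of a uniform sample on
# [-1,1]^N lies within [-0.1, 0.1]; the calculation above says this is 0.1**N
rng = np.random.default_rng(0)
num_trials = 200000   # illustrative sample size
for N in (1, 2, 3):
    samples = rng.uniform(-1, 1, size=(num_trials, N))
    hits = np.all(np.abs(samples) <= 0.1, axis=1)
    print(N, hits.mean(), 0.1**N)   # empirical estimate vs. theoretical 0.1**N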
Exercise 2.2. Implementing random search in Python

Below we have a Python implementation of the random local search algorithm.
In [4]:
# random search function
def random_search(g, alpha_choice, max_its, w, num_samples):
    # run random search
    weight_history = []   # container for weight history
    cost_history = []     # container for corresponding cost function history
    alpha = 0
    for k in range(1, max_its + 1):
        # check if diminishing steplength rule used
        if alpha_choice == 'diminishing':
            alpha = 1/float(k)
        else:
            alpha = alpha_choice

        # record weights and cost evaluation
        weight_history.append(w)
        cost_history.append(g(w))

        # construct set of random unit directions
        directions = np.random.randn(num_samples, np.size(w))
        norms = np.sqrt(np.sum(directions*directions, axis=1))[:, np.newaxis]
        directions = directions/norms

        ### pick best descent direction
        # compute all new candidate points
        w_candidates = w + alpha*directions

        # evaluate all candidates
        evals = np.array([g(w_val) for w_val in w_candidates])

        # if we find a real descent direction take the step in its direction
        ind = np.argmin(evals)
        if g(w_candidates[ind]) < g(w):
            # pluck out best descent direction
            d = directions[ind, :]

            # take step
            w = w + alpha*d

    # record weights and cost evaluation
    weight_history.append(w)
    cost_history.append(g(w))
    return weight_history, cost_history

Notice that the history of function evaluations returned is called cost_history. This is because - in the context of machine learning / deep learning - mathematical functions are often referred to as cost or loss functions.
In [5]:
# This code cell will not be shown in the HTML version of this notebook
# define function
g = lambda w: np.dot(w.T, w) + 2

# run random search algorithm
alpha_choice = 1; w = np.array([3, 4]); num_samples = 1000; max_its = 5;
weight_history, cost_history = random_search(g, alpha_choice, max_its, w, num_samples)

# show run in both three-dimensions and just the input space via the contour plot
static_plotter.two_input_surface_contour_plot(g, weight_history, view=[10, 30], xmin=-4.5, xmax=4.5, ymin=-4.5, ymax=4.5, num_contours=20)

Exercise 2.3. Using random search to minimize a nonconvex function

As another example, we minimize the function

$$g(w_1, w_2) = \tanh(4 w_1 + 4 w_2) + \max(0.4 w_1^2, 1) + 1$$

using random local search, again setting $P = 1000$ and taking 8 steps with $\alpha = 1$ for all steps. Here, because an entire region of global minima exists at $g(w_1, w_2) = 1$, the method - as clumsy as it is given these settings - quickly finds a global minimum when initiated at $\mathbf{w} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$.
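The corresponding code cell is not included in this excerpt. Below is a sketch of how the run described above could be reproduced with the random_search function defined in Exercise 2.2; the function definition and settings follow the text, while the plotting call and axis ranges simply mirror the previous cell and are assumptions, not the original cell.

# define the nonconvex function described above
g = lambda w: np.tanh(4*w[0] + 4*w[1]) + np.maximum(0.4*w[0]**2, 1) + 1

# run random local search with the settings from the text: P = 1000 samples,
# 8 steps, fixed steplength alpha = 1, initialized at w = [2, 2]
alpha_choice = 1; w = np.array([2.0, 2.0]); num_samples = 1000; max_its = 8;
weight_history, cost_history = random_search(g, alpha_choice, max_its, w, num_samples)

# plot in the style of the previous cell (ranges chosen for illustration)
static_plotter.two_input_surface_contour_plot(g, weight_history, view=[10, 30], xmin=-2.5, xmax=2.5, ymin=-2.5, ymax=2.5, num_contours=20)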