A central problem in optimization and artificial intelligence is how an agent, faced with a large solution space and finite resources, should divide its time between exploiting parts of the space known to contain good solutions, and exploring unknown regions that may contain solutions far better, or much worse, than those found so far.
An intuitive example of the problem is humans choosing entertainment options. When deciding what music, films, literature, games, etc. to consume, people tend to “stick with what they know,” i.e. pick options of similar type and genre, and therefore close together in the option space. Trying unfamiliar parts of the option space has a low likelihood of a high payoff and a high likelihood of wasted time.
Because time is a precious resource for any agent, the temptation is to look only at parts of the solution space guaranteed to provide good values. But this strategy risks missing solutions with values far greater than any contained within the familiar regions of the space.
Defined here is a variant of the random search, neighborhood search, and hill climbing classes of algorithms. It is a highly explorative strategy in which the amount of time spent searching a new best solution’s “neighborhood” is a direct function of that solution’s improvement over the previous best: the time budget grows with the square of the improvement, which is what gives the algorithm its name. In other words, the greater the difference in value between the new best solution and the last best solution, the longer the algorithm spends looking for even better options in the new best solution’s neighborhood.
When searching a new best solution’s neighborhood, the algorithm starts with values very close to the best solution and expands the size of the neighborhood the longer it goes without finding an improvement. If the variable amount of time allotted to the neighborhood search phase elapses without any improvement, the algorithm reverts to pure random search.
The algorithm was designed to optimize machine learning model parameters, but it can be applied to any numerical optimization problem.
Below is an outline of the algorithm, followed by a Python implementation that tunes the parameters of a LightGBM prediction model.
Square Neighborhood Algorithm
set var global_best to highest value found with random search
set var best_parameters to an empty array
set var neighborhood_search to 0

while true do
    evaluate pure random solution rs
    if rs better than global_best
        add square of difference(rs score, global_best) to neighborhood_search
        set global_best to rs score and best_parameters to rs parameters
        set var neighborhood_steps to 0
        set var neighborhood_size to 0.1
    while neighborhood_search > 0
        neighborhood solution ns = gaussian distr(mean(best_parameters), deviation(neighborhood_size))
        if ns better than global_best
            add square of difference(ns score, global_best) to neighborhood_search
            set global_best to ns score and best_parameters to ns parameters
            set neighborhood_steps to 0
            set neighborhood_size to 0.1
        else
            increment neighborhood_steps by 1
            increment neighborhood_size by (0.0001 * neighborhood_steps)
        decrement neighborhood_search by 1
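To make the outline concrete, here is a minimal runnable Python sketch of the search logic. The toy objective function, its bounds, and the iteration budget are illustrative assumptions, not part of the outline above; only the search loop follows it.

import math
import random

def objective(x):
    # Toy objective to maximize (an illustrative assumption): a
    # multimodal function over roughly [-10, 10].
    return math.sin(3 * x) + 0.5 * math.cos(7 * x) - 0.01 * x * x

def random_solution():
    # Pure random sample from the solution space.
    return [random.uniform(-10, 10)]

def evaluate(params):
    return objective(params[0])

def square_neighborhood_search(iterations=10_000):
    best_parameters = random_solution()
    global_best = evaluate(best_parameters)
    neighborhood_search = 0.0  # time budget for the neighborhood phase

    for _ in range(iterations):
        # Pure random search phase.
        rs = random_solution()
        rs_score = evaluate(rs)
        if rs_score > global_best:
            # The budget grows with the SQUARE of the improvement.
            neighborhood_search += (rs_score - global_best) ** 2
            global_best, best_parameters = rs_score, rs

        neighborhood_steps = 0
        neighborhood_size = 0.1
        while neighborhood_search > 0:
            # Sample a neighbor from a Gaussian centered on the best parameters.
            ns = [random.gauss(p, neighborhood_size) for p in best_parameters]
            ns_score = evaluate(ns)
            if ns_score > global_best:
                neighborhood_search += (ns_score - global_best) ** 2
                global_best, best_parameters = ns_score, ns
                neighborhood_steps = 0
                neighborhood_size = 0.1
            else:
                # Widen the neighborhood the longer we go without improving.
                neighborhood_steps += 1
                neighborhood_size += 0.0001 * neighborhood_steps
            neighborhood_search -= 1

    return global_best, best_parameters

if __name__ == "__main__":
    score, params = square_neighborhood_search()
    print(score, params)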
Optimizing A LightGBM Machine Learning Model
The Python code below uses the Square Neighborhood Algorithm to search through the space of parameters of a LightGBM machine learning model. The model is trained on, and used to make predictions about, the Kaggle Iowa housing dataset.
LightGBM is a gradient boosting framework with a large number of parameters, each of which can take a huge range of values. This makes parameter optimization a nontrivial problem. For simplicity, the demonstration below uses only a few of the dozens of model parameters available for tuning.
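Below is a minimal sketch of such an implementation, assuming the Kaggle train.csv file has been downloaded locally. The four parameters tuned, their ranges, the 3-fold cross-validated mean absolute error scoring, and the iteration budget are all illustrative choices; the score is expressed in thousands of dollars so that squared improvements translate into a workable number of neighborhood evaluations.

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# Kaggle Iowa ("House Prices") training data; the file path is an
# assumption and depends on where the dataset was downloaded.
train = pd.read_csv("train.csv")
y = train["SalePrice"]
X = train.drop(columns=["Id", "SalePrice"]).select_dtypes(include=[np.number])

# The parameters tuned and their ranges are illustrative choices:
# (num_leaves, learning_rate, n_estimators, min_child_samples).
LOW = np.array([2.0, 0.001, 50.0, 1.0])
HIGH = np.array([256.0, 0.3, 1000.0, 50.0])

def evaluate(params):
    # Score = negative mean absolute error, 3-fold cross-validated, in
    # thousands of dollars (a scale choice so that squared improvements
    # give a workable neighborhood budget).
    num_leaves, learning_rate, n_estimators, min_child_samples = params
    model = lgb.LGBMRegressor(
        num_leaves=int(num_leaves),
        learning_rate=learning_rate,
        n_estimators=int(n_estimators),
        min_child_samples=int(min_child_samples),
    )
    score = cross_val_score(model, X, y, cv=3,
                            scoring="neg_mean_absolute_error").mean()
    return score / 1000.0  # higher is better

def random_params():
    return LOW + np.random.rand(len(LOW)) * (HIGH - LOW)

best_parameters = random_params()
global_best = evaluate(best_parameters)
neighborhood_search = 0.0

for _ in range(50):  # outer random-search budget; arbitrary for the demo
    rs = random_params()
    rs_score = evaluate(rs)
    if rs_score > global_best:
        neighborhood_search += (rs_score - global_best) ** 2
        global_best, best_parameters = rs_score, rs

    neighborhood_steps = 0
    neighborhood_size = 0.1
    while neighborhood_search > 0:
        # Gaussian around the current best; the deviation is scaled by
        # each parameter's range because the parameters differ greatly
        # in scale, and samples are clipped back into bounds.
        ns = np.clip(np.random.normal(best_parameters,
                                      neighborhood_size * (HIGH - LOW)),
                     LOW, HIGH)
        ns_score = evaluate(ns)
        if ns_score > global_best:
            neighborhood_search += (ns_score - global_best) ** 2
            global_best, best_parameters = ns_score, ns
            neighborhood_steps = 0
            neighborhood_size = 0.1
        else:
            neighborhood_steps += 1
            neighborhood_size += 0.0001 * neighborhood_steps
        neighborhood_search -= 1

print("best score (negative MAE, $1000s):", global_best)
print("best parameters:", best_parameters)

Each evaluation trains the model three times, so the outer budget here is deliberately small; increasing it trades run time for search coverage.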