A central problem in optimization and artificial intelligence is how an agent, faced with a large solution space and finite resources, should divide its time between exploiting parts of the space known to contain good solutions, and exploring unknown regions that may contain solutions far better, or much worse, than those found so far.

An intuitive example of the problem is humans choosing entertainment options. When deciding what music, films, literature, games, etc. to consume, people tend to “stick with what they know,” i.e., options of a similar type and genre, and therefore close together in the option space. Trying unfamiliar parts of the option space carries a low likelihood of a high payoff and a high likelihood of wasted time.

Because time is a precious resource for any agent, the temptation is to look only at parts of the solution space known to provide good values, but this strategy risks missing solutions with values far greater than any contained within the familiar regions of the space.

Defined here is a variant of the random search, neighborhood search, and hill climbing classes of algorithms. It is a highly explorative strategy in which the amount of time spent looking at a solution’s “neighborhood” is a direct function of the newly found best solution’s magnitude of improvement over the previous best. In other words, the greater the difference in value between the new best solution and the last best solution, the longer the algorithm will spend looking for even better options in the new best solution’s neighborhood.
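As a rough sketch of the budget rule (the variable names and numbers here are illustrative only; the full implementation appears later in this post):

# Illustrative sketch of the budget rule: the number of neighborhood
# evaluations granted after an improvement is the square of the improvement.
old_best_score = 17000.0   # illustrative scores, not taken from the housing example
new_best_score = 16850.0

improvement = abs(new_best_score - old_best_score)
neighborhood_budget = round(improvement ** 2)   # 150 ** 2 = 22500 neighborhood evaluations
print(neighborhood_budget)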

When searching a new best solution’s neighborhood, the algorithm starts with values very close to the best solution, then expands the size of the neighborhood the longer it goes without finding an improvement. If the variable amount of time allotted to the neighborhood search phase elapses without any improvement, the algorithm reverts to pure random search.
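The expansion schedule is quadratic in the number of consecutive failed steps; the short sketch below uses the same starting value (0.1) and increment (0.0001 * steps) as the implementation later in the post:

# Sketch of the neighborhood expansion schedule: the standard deviation of the
# Gaussian used to sample neighbors starts at 0.1 and, after k consecutive
# non-improving steps, grows to 0.1 + 0.0001 * k * (k + 1) / 2.
neighborhood_size = 0.1
neighborhood_steps = 0

for _ in range(1000):   # 1000 consecutive evaluations without improvement
    neighborhood_steps += 1
    neighborhood_size += 0.0001 * neighborhood_steps

print(round(neighborhood_size, 2))   # 50.15 -- the "neighborhood" is now very wide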

The algorithm was designed to optimize machine learning model parameters, but it can be applied to any numerical optimization problem.

Below is an outline of the algorithm, followed by a Python implementation that tunes the parameters of a LightGBM prediction model.

   

Square Neighborhood Algorithm


 set var global_best to best value found with an initial random search
 set var best_parameters to an empty array 
 set var neighborhood_search to 0

 while true do
      
       evaluate pure random solution rs
      
       if rs better than global_best 
         
           add square of difference(rs score, global_best) to neighborhood_search
          set global_best to rs score and best_parameters to rs parameters
          set var neighborhood_steps to 0
          set var neighborhood_size to 0.1
        
          while neighborhood_search > 0
               
                neighborhood solution ns = gaussian distr(mean(best_parameters), deviation(neighborhood_size))
               
                if ns better than global_best
                    add square of difference(ns score, global_best) to neighborhood_search
                   set global_best to ns score and best_parameters to ns parameters
                   set var neighborhood_steps to 0
                   set var neighborhood_size to 0.1
               
                else
                   increment neighborhood_steps by 1
                   increment neighborhood_size by (0.0001 * neighborhood_steps)
               
                decrement neighborhood_search by 1
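
To make the outline concrete, here is a minimal, self-contained sketch of the algorithm applied to a toy one-dimensional minimization problem. The objective function, bounds, and evaluation limit are assumptions added purely for illustration (the outline above, like the LightGBM example below, runs indefinitely):

from random import uniform, gauss

# Toy objective (an assumption for illustration): minimize f(x) = (x - 3)^2 + 1
def objective(x):
    return (x - 3) ** 2 + 1

# Initial best from a single random evaluation
best_x = uniform(-100, 100)
global_best = objective(best_x)
evaluations = 1

while evaluations < 10000:   # evaluation limit added for the demo

    # Pure random search phase
    x = uniform(-100, 100)
    score = objective(x)
    evaluations += 1

    if score < global_best:

        # Neighborhood budget: square of the improvement
        neighborhood_search = round(abs(score - global_best) ** 2)
        global_best, best_x = score, x
        neighborhood_steps = 0
        neighborhood_size = 0.1

        # Neighborhood phase: sample around the best solution, widening on failure
        while neighborhood_search > 0 and evaluations < 10000:
            nx = gauss(best_x, neighborhood_size)
            nscore = objective(nx)
            evaluations += 1

            if nscore < global_best:
                neighborhood_search += round(abs(nscore - global_best) ** 2)
                global_best, best_x = nscore, nx
                neighborhood_steps = 0
                neighborhood_size = 0.1
            else:
                neighborhood_steps += 1
                neighborhood_size += 0.0001 * neighborhood_steps

            neighborhood_search -= 1

print(best_x, global_best)   # should end up close to x = 3, f(x) = 1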

   

Optimizing A LightGBM Machine Learning Model

The Python code below uses the Square Neighborhood Algorithm to search through the space of parameters of a LightGBM machine learning model. The model is trained on, and used to make predictions about, the Kaggle Iowa housing dataset.

LightGBM is a gradient boosting framework with a large number of parameters, each of which can take a huge range of values. This makes parameter optimization a nontrivial problem. For simplicity, the demonstration below uses only a few of the dozens of model parameters available for tuning.
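For reference, these are the four parameters the demonstration samples at random, with the ranges used below; the commented-out entries are an illustrative assumption showing how the search space could be widened with other LGBMRegressor parameters such as max_depth and min_child_samples:

from random import randrange, uniform

# Search space used in the demonstration below
random_parameters = {
    "boosting_type": randrange(2),       # 0 or 1, mapped to "gbdt" or "dart" before fitting
    "n_estimators": randrange(10000),
    "num_leaves": randrange(10000),
    "learning_rate": uniform(0.01, 1)}

# Illustrative (hypothetical) extensions -- not part of the code below:
# random_parameters["max_depth"] = randrange(3, 64)
# random_parameters["min_child_samples"] = randrange(1, 200)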

temp.py

import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from random import choice, randrange, uniform, gauss
import clevercsv


#Read datasets into memory
complete_train = pd.read_csv(
    "train.csv",
    encoding = "UTF-8",
    index_col = "Id")

complete_test = pd.read_csv(
    "test.csv",
    encoding = "UTF-8",
    index_col = "Id")


#Separate predictors from target variable
X = complete_train.drop(
    columns = "SalePrice")

y = complete_train[
    "SalePrice"]


#Encode categoricals and impute missing data
def encode_impute(*datasets):
    for dataset in datasets:
        for column in dataset.columns:
            dataset[
                column].fillna(
                -999,
                inplace = True)
            if dataset[
                column].dtype == "object":
                dataset[
                    column] = dataset[
                    column].astype("category", copy = False)

encode_impute(X)


#Create validation set
X_train, X_valid, y_train, y_valid = train_test_split(X, y)


#Build and validate the learning model
def model_check(parameters):

    local_copy = {
        key:value for key,value in parameters.items()}

    #Map the integer-coded boosting_type back to its string value
    if local_copy["boosting_type"] == 0:
        local_copy["boosting_type"] = "gbdt"
    else:
        local_copy["boosting_type"] = "dart"

    model = LGBMRegressor().set_params(**local_copy)
    model.fit(X_train, y_train)
    prediction = model.predict(X_valid)
    error_rate = mean_absolute_error(
        y_valid, prediction)

    return error_rate


#Square Neighborhood Algorithm
def square_neighborhood():

    global_best = 17000   #baseline error (MAE) to beat, per the outline above
    best_parameters = {}
    neighborhood_search = 0

    while True:

        try:

            #Pure random search phase
            random_parameters = {
                "boosting_type": randrange(2),
                "n_estimators": randrange(10000),
                "num_leaves": randrange(10000),
                "learning_rate": uniform(0.01, 1)}

            global_evaluate = model_check(
                random_parameters)

            if global_evaluate < global_best:

                #Neighborhood budget: square of the improvement
                neighborhood_search += round(
                    abs(
                        global_evaluate - global_best) ** 2)

                global_best = global_evaluate
                best_parameters = random_parameters
                neighborhood_steps = 0
                neighborhood_size = 0.1

                with open(
                    "values.csv", "a", encoding = "UTF-8") as valuefile:
                    values = clevercsv.writer(valuefile)
                    values.writerow(
                        [global_best, best_parameters])

                #Neighborhood search phase
                while neighborhood_search > 0:

                    neighborhood_parameters = {
                        key:abs(
                            gauss(
                                value,
                                neighborhood_size)) if type(value) == float
                        else abs(
                            round(
                                gauss(
                                    value,
                                    neighborhood_size)))
                        for key,value in best_parameters.items()}

                    neighborhood_evaluate = model_check(
                        neighborhood_parameters)

                    if neighborhood_evaluate < global_best:

                        neighborhood_search += abs(
                            neighborhood_evaluate - global_best) ** 2

                        global_best = neighborhood_evaluate
                        best_parameters = neighborhood_parameters
                        neighborhood_steps = 0
                        neighborhood_size = 0.1

                        with open(
                            "values.csv",
                            "a", encoding = "UTF-8") as valuefile:
                            values = clevercsv.writer(valuefile)
                            values.writerow(
                                [global_best, best_parameters])

                    else:

                        #Widen the neighborhood the longer no improvement is found
                        neighborhood_steps += 1
                        neighborhood_size += 0.0001 * neighborhood_steps

                    neighborhood_search -= 1

        except Exception:

            #Skip parameter combinations that cause LightGBM to raise an error
            continue


square_neighborhood()


   

Full code repo available on GitHub.

 

UPDATE: I’ve formulated an improved optimizer called the Interleaved Neighborhood Algorithm. You can read about it here.

 
