*Prerequisites:*

- LGBM == **lightgbm** (python package): Microsoft's implementation of gradient boosted machines
- **optuna** (python package): automated hyperparameter optimization framework favoured by Kaggle grandmasters. Being algorithm agnostic, it can help find optimal hyperparameters for any model. Apart from grid search, it features tools for pruning unpromising trials for faster results

So what's the catch?

Complete model optimization includes many different operations:

- Choosing the optimal starting hyperparameters for your algorithm (conditional on the task type and data stats)
- Defining the hyperparameters to optimize, their grid and distribution
- Selecting the optimal loss function for optimization
- Configuring the validation strategy
- Further optimization (e.g. n_estimators tuning with early_stopping for tree ensembles like LGBM)
- Results analysis
- and much more…

Putting it all together, even for a single task, requires a lot of code, and any subsequent task will require substantial modifications to this code.

There is a reason this article includes two important keywords:

- LGBM — fastest gradient boosting framework
- optuna — fastest hyperparameter optimization framework

Used together wisely, they can help you build a well-tuned model in considerably less time.

But one has to be prepared to deal with all the implications outlined above.

Luckily, another open-source package combines the advantages of both these frameworks and provides a one-line method to create your best model with lightgbm and optuna:

`pip install verstack`

Not only does it find optimal hyperparameters for your task, it also provides convenient methods for prediction and analytics. It also uses multiprocessing carefully, running at almost full capacity while leaving some processing power for your machine to operate without freezing.

We will use the Boston housing dataset from Kaggle for this demonstration.

```
import pandas as pd
from verstack import LGBMTuner
# import the data
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
X = train.drop('medv', axis = 1)
y = train['medv']
# tune the hyperparameters and fit the optimized model
tuner = LGBMTuner(metric = 'mse') # <- the only required argument
tuner.fit(X, y)
# check the optimization log in the console.
pred = tuner.predict(test)
```

Basically that is it…

- optimal hyperparameters have been selected by running the default 100 trials with early stopping, so the best number of estimators has also been determined
- optimization history, parameters and feature importances have been saved in the plotting pipeline
- the optimized model has been trained on the whole train data
- prediction methods have been prepared to predict on new data (including various heuristics for predicting negatives in regression and predicting classes/probabilities in multiclass/binary tasks)

What else?

Wait, there is much more…

#### categorical_feature support

LGBM has a neat feature that lets the model handle the encoding of categorical features inside your data, so you don't have to encode them yourself. `LGBMTuner` supports this integration.

According to the LGBM docs, you have to transform your unique categories into consecutive integers and then cast them to the `"category"` dtype like so:

```
df['Sex'].unique()
encoding_dict = {val:ix for ix, val in enumerate(df['Sex'].unique())}
df['Sex'] = df['Sex'].map(encoding_dict)
df['Sex'] = df['Sex'].astype('category')
print(df['Sex'].dtype)
#--->CategoricalDtype(categories=[0, 1], ordered=False)
```

Then just pass this data to `LGBMTuner` without any additional settings:

```
from verstack import LGBMTuner
tuner = LGBMTuner(metric = 'accuracy')
X = df.drop('target', axis = 1)
y = df['target']
tuner.fit(X, y)
```
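When a dataset has several categorical columns, the same steps can be wrapped in a small helper. The sketch below is only an illustration: the function name `encode_categoricals` and the toy columns are hypothetical, it uses `pd.factorize` as a shortcut for the dict-based mapping shown earlier, and it assumes the listed columns contain no missing values.

```python
import pandas as pd

def encode_categoricals(df, columns):
    """Map each listed column to consecutive integer codes and cast to 'category'."""
    out = df.copy()
    for col in columns:
        # pd.factorize assigns 0, 1, 2, ... in order of first appearance
        codes, _ = pd.factorize(out[col])
        out[col] = pd.Series(codes, index=out.index).astype('category')
    return out

# toy data, for illustration only
toy = pd.DataFrame({'Sex': ['male', 'female', 'female'],
                    'Embarked': ['S', 'C', 'S'],
                    'Age': [22, 38, 26]})
encoded = encode_categoricals(toy, ['Sex', 'Embarked'])
print(encoded.dtypes)
```

The numeric `Age` column is left untouched, and the original frame is not modified because the helper works on a copy.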

#### Custom grids to iterate over

*LGBMTuner is configured for best performance by default.*

Depending on the given task (classification/regression) and the dataset length, it automatically sets the fixed starting parameters for the LGBM model.

The default grid for parameter selection is shown in the `tuner.grid` output below, where the entries marked `default setting` are the ones tuned by default.

These settings can be overridden, and new parameters with their respective grids can be passed to the `LGBMTuner` instance like so:

```
tuner = LGBMTuner(metric = 'auc', trials = 300)
# SHOW SUPPORTED AND SELECTED OPTIMIZATION PARAMETERS
tuner.grid
#--->{'boosting_type': None,
#--->'num_iterations': None,
#--->'learning_rate': None,
#--->'num_leaves': {'low': 16, 'high': 255}, <--- default setting
#--->'max_depth': None,
#--->'min_data_in_leaf': None,
#--->'min_sum_hessian_in_leaf': {'low': 0.001, 'high': 10.0}, <--- default setting
#--->'bagging_fraction': {'low': 0.5, 'high': 1.0}, <--- default setting
#--->'feature_fraction': {'low': 0.5, 'high': 1.0}, <--- default setting
#--->'max_delta_step': None,
#--->'lambda_l1': {'low': 1e-08, 'high': 10.0}, <--- default setting
#--->'lambda_l2': {'low': 1e-08, 'high': 10.0}, <--- default setting
#--->'linear_lambda': None,
#--->'min_gain_to_split': None,
#--->'drop_rate': None,
#--->'top_rate': None,
#--->'min_data_per_group': None,
#--->'max_cat_threshold': None}
# CHANGE SELECTED OPTIMIZATION PARAMETERS
# parameters can be passed by any of the following ways:
# - list (will be used for a random search)
# - tuple (will be used to define the uniform grid range between the min(tuple) and the max(tuple))
# - dict with keywords 'choice'/'low'/'high'
tuner.grid['boosting_type'] = ['gbdt', 'rf']
tuner.grid['min_data_in_leaf'] = {'choice' : [40, 50, 70]}
tuner.grid['learning_rate'] = (0.001, 0.1)
tuner.grid['lambda_l1'] = {'low': 0.1, 'high': 5}
tuner.fit(X, y)
```

Users can configure custom grids for any or all of the parameters in the above `dict`, which can be accessed via the `.grid` attribute after defining the class instance.
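To make the three formats concrete, here is a rough sketch of how a list, a tuple and a dict specification could each be turned into a sampled value. This is not verstack's internal code: `sample_from_spec` is a hypothetical helper written only to illustrate the semantics described in the code comments above, and the grid values are toy numbers.

```python
import random

def sample_from_spec(spec, rng):
    """Draw one value from a grid spec given as a list, tuple or dict (illustrative only)."""
    if isinstance(spec, list):
        # list: pick one of the listed options at random
        return rng.choice(spec)
    if isinstance(spec, tuple):
        # tuple: uniform draw between min(tuple) and max(tuple)
        return rng.uniform(min(spec), max(spec))
    if isinstance(spec, dict):
        if 'choice' in spec:
            # dict with 'choice': pick one of the listed options
            return rng.choice(spec['choice'])
        # dict with 'low'/'high': uniform draw inside the range
        return rng.uniform(spec['low'], spec['high'])
    raise TypeError(f'unsupported grid spec: {spec!r}')

rng = random.Random(42)
grid = {
    'boosting_type': ['gbdt', 'rf'],
    'learning_rate': (0.001, 0.1),
    'lambda_l1': {'low': 0.1, 'high': 5},
    'max_depth': {'choice': [4, 6, 8]},
}
sampled = {name: sample_from_spec(spec, rng) for name, spec in grid.items()}
print(sampled)
```

In a real tuning run, the sampling is delegated to optuna rather than to a plain uniform draw, so treat this only as a reading aid for the three formats.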

#### Custom LGBM (fixed) params

Based on many requests, the new release of `LGBMTuner` (1.1.0) supports setting any LGBM-supported parameters.

If, for example, you need to configure LGBM for optimization with the `is_unbalance` argument or any other supported argument, use the `custom_lgbm_params` argument at `LGBMTuner` init.

```
from verstack import LGBMTuner
my_custom_params = {'is_unbalance': True, 'zero_as_missing': True}
tuner = LGBMTuner(metric = 'auc', custom_lgbm_params = my_custom_params)
```

#### Metrics

`LGBMTuner` currently supports the following evaluation metrics:

```
'mae', 'mse', 'rmse', 'rmsle', 'mape', 'smape', 'rmspe', 'r2', 'auc', 'gini', 'log_loss', 'accuracy', 'balanced_accuracy', 'precision', 'precision_weighted', 'precision_macro', 'recall', 'recall_weighted', 'recall_macro', 'f1', 'f1_weighted', 'f1_macro', 'lift'
# note the syntax
```

Evaluation metrics become optimization metrics in the case of regression, given the minimize-only strategy. The only exception for regression is `'r2'`. If this metric is selected when initializing `LGBMTuner`, it will be replaced by `'mse'` as the optimization metric during hyperparameter tuning.
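Swapping `'r2'` for `'mse'` during the search is harmless because, on a fixed validation set, R² is a decreasing function of MSE: R² = 1 - MSE / Var(y). The sketch below checks this identity in plain Python on made-up numbers; the helper functions are illustrative and not part of verstack.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
pred_a = [2.5, 5.5, 7.0, 8.0]   # closer predictions
pred_b = [1.0, 6.0, 9.0, 12.0]  # worse predictions

# variance of y equals the MSE of always predicting the mean
variance = mse(y_true, [sum(y_true) / len(y_true)] * len(y_true))
for pred in (pred_a, pred_b):
    # R^2 computed directly matches 1 - MSE / Var(y)
    assert abs(r2(y_true, pred) - (1 - mse(y_true, pred) / variance)) < 1e-12
    print(round(mse(y_true, pred), 4), round(r2(y_true, pred), 4))
```

The set of predictions with the lower MSE is also the one with the higher R², so minimizing one ranks trials the same way as maximizing the other.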

For classification, regardless of the selected evaluation metric, `LGBMTuner` will optimize the `cross_entropy` when searching for hyperparameters.

#### Number of trials

A single trial is a single iteration of training/validation of a model with randomly selected parameters from the search space. By default, `LGBMTuner` will run 100 trials. The number of trials can be defined at tuner initialization: `tuner = LGBMTuner(metric = 'mse', trials = 500)`

#### Prediction

Calling `tuner.fit(X, y)` will eventually fit the model with the best params on X and y.

Then the conventional methods `tuner.predict(test)` and `tuner.predict_proba(test)` are available.

For classification tasks, an additional parameter `threshold` is available: `tuner.predict(test, threshold = 0.3)`

*Tip: One may use `verstack.ThreshTuner` for optimizing the threshold parameter.*
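Conceptually, a threshold turns predicted probabilities of the positive class into class labels. The snippet below is a sketch of that idea in plain Python, not verstack's implementation: `apply_threshold` is a hypothetical helper, and whether the library's comparison is strict or inclusive is an assumption here.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Label a sample 1 when its positive-class probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

proba = [0.10, 0.35, 0.48, 0.72]    # made-up positive-class probabilities
print(apply_threshold(proba))       # default cut-off of 0.5
print(apply_threshold(proba, 0.3))  # a lower cut-off flags more positives
```

Lowering the threshold trades precision for recall, which is why it is worth tuning on imbalanced problems.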

#### Visualizations

`LGBMTuner` ships with built-in plotting methods for static `png` and interactive `html` plots of feature importances and optimization stats.

When `LGBMTuner` is initialized with default parameters, namely `visualization = True`, it will create 4 static plots after optimization is complete. If you are using an interactive shell like Spyder or Jupyter, these plots will be displayed automatically at the end of tuning. This can be disabled at init with `tuner = LGBMTuner(metric = 'mse', visualization = False)`

These plots are also available on demand through their corresponding methods.

**Feature Importance**

```
tuner.fit(X, y)
tuner.plot_importances()
```

`figsize = (10, 6)` and `n_features = 15` are the default arguments, but both can be changed if required.

An interactive plot is available as an html file, which is displayed automatically in the default browser:

`tuner.plot_importances(interactive = True)`

This html can be saved from the browser's file menu.

**Trials validation results plot**

`tuner.plot_intermediate_values()`

The `interactive` argument is most useful in this case:

`tuner.plot_intermediate_values(interactive = True)`

Here, among all the trials, you can see the pruned (terminated) trials and their evaluation results.

**Parameters importances**

This histogram of parameter importances shows which params had the highest impact on the optimization metric.

`tuner.plot_param_importances()`

`tuner.plot_param_importances(interactive = True)`

**Optimization history plot**

`tuner.plot_optimization_history()`

`tuner.plot_optimization_history(interactive = True)`

In interactive mode, you can see how the objective function (optimization metric) values change across trials.

#### Verbosity

This is an important part of the framework. The default verbosity level 1 will display essential optimization results in a nice structured way without cluttering your console all that much.

By default, the `fit` method outputs a moderate amount of information: the results of every i-th trial (omitting the trials that have been pruned) and the final (optimized) model parameters.

The verbosity options are 0, 1, 2, 3, 4 and 5, where 0 is completely silent except for fatal errors and built-in exceptions, and 1 to 5 are based on optuna.logging options. The default verbosity level 1 is enriched with essential optimization statistics (screenshots above).

#### Additional `LGBMTuner` attributes

Feature importance values

```
tuner.feature_importances
>>> ID 0.08145
>>> crim 0.07421
>>> zn 0.00424
>>> indus 0.02870
>>> chas 0.00547
>>> nox 0.06929
>>> rm 0.13872
>>> age 0.11890
>>> dis 0.13448
>>> rad 0.02966
>>> tax 0.04619
>>> ptratio 0.03977
>>> black 0.06027
>>> lstat 0.16865
```
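The printed values sum to 1, so each number can be read as a share of total importance. As a small illustration, the values above are copied into a plain dict below and ranked; how `tuner.feature_importances` is actually stored is not assumed here.

```python
# values copied from the tuner.feature_importances printout above
importances = {
    'ID': 0.08145, 'crim': 0.07421, 'zn': 0.00424, 'indus': 0.02870,
    'chas': 0.00547, 'nox': 0.06929, 'rm': 0.13872, 'age': 0.11890,
    'dis': 0.13448, 'rad': 0.02966, 'tax': 0.04619, 'ptratio': 0.03977,
    'black': 0.06027, 'lstat': 0.16865,
}

# rank features from most to least important and keep the top five
top = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:5]
for name, value in top:
    print(f'{name:<8}{value:.1%}')
```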

Initially defined params

```
tuner.init_params
>>> {'learning_rate': 0.01,
>>> 'num_leaves': 16,
>>> 'colsample_bytree': 0.9,
>>> 'subsample': 0.9,
>>> 'verbosity': -1,
>>> 'n_estimators': 10000,
>>> 'early_stopping_rounds': 100,
>>> 'random_state': 42,
>>> 'objective': 'regression',
>>> 'metric': 'l2',
>>> 'num_threads': 10,
>>> 'reg_alpha': 1}
```

Optimized params

```
tuner.best_params
>>> {'learning_rate': 0.01,
>>> 'num_leaves': 130,
>>> 'colsample_bytree': 0.8246563384855297,
>>> 'subsample': 0.5335500916057069,
>>> 'verbosity': -1,
>>> 'random_state': 42,
>>> 'objective': 'regression',
>>> 'metric': 'l2',
>>> 'num_threads': 10,
>>> 'reg_alpha': 0.0011166918277076062,
>>> 'min_sum_hessian_in_leaf': 0.00270990587924765,
>>> 'reg_lambda': 8.270186047772752e-06,
>>> 'n_estimators': 605}
```
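Comparing the two dictionaries shows what the tuner actually changed. The sketch below copies the values printed above into plain dicts and lists the keys that differ; it is only an illustration and does not call verstack.

```python
# values copied from the tuner.init_params and tuner.best_params printouts above
init_params = {
    'learning_rate': 0.01, 'num_leaves': 16, 'colsample_bytree': 0.9,
    'subsample': 0.9, 'verbosity': -1, 'n_estimators': 10000,
    'early_stopping_rounds': 100, 'random_state': 42,
    'objective': 'regression', 'metric': 'l2', 'num_threads': 10,
    'reg_alpha': 1,
}
best_params = {
    'learning_rate': 0.01, 'num_leaves': 130,
    'colsample_bytree': 0.8246563384855297, 'subsample': 0.5335500916057069,
    'verbosity': -1, 'random_state': 42, 'objective': 'regression',
    'metric': 'l2', 'num_threads': 10, 'reg_alpha': 0.0011166918277076062,
    'min_sum_hessian_in_leaf': 0.00270990587924765,
    'reg_lambda': 8.270186047772752e-06, 'n_estimators': 605,
}

# keys whose value differs, or that exist in only one of the two dicts
changed = {
    key: (init_params.get(key), best_params.get(key))
    for key in sorted(set(init_params) | set(best_params))
    if init_params.get(key) != best_params.get(key)
}
for key, (before, after) in changed.items():
    print(f'{key}: {before} -> {after}')
```

A `None` on either side marks a key that appears in only one of the two dicts. For example, `early_stopping_rounds` is present only in the initial params, while `n_estimators` drops from the 10000 ceiling to the 605 found with early stopping.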

Trained model instance

*After calling `tuner.fit(X, y)`, the `LGBMTuner` instance contains the tuned and fitted LGBM model and provides all the necessary prediction methods, such as `tuner.predict(test)`. The actual LGBM booster model can still be extracted from the `tuner` object:*

```
tuner.fitted_model
>>> <lightgbm.basic.Booster at 0x7ff3b89a5b10>
```

Additional methods and attributes are well described in the documentation.

The proposed framework encapsulates extensive research and best Data Science practices to reduce stress and improve results on the classification/regression tasks it is used for.

And be sure to check out the rest of the tools `verstack` has to offer.

The package includes solutions to some day-to-day tasks that didn't have convenient solutions before.

Current modules:

- `verstack.LGBMTuner`
- `verstack.PandasOptimizer`: automatic memory optimization when reading data into pandas. One-liner for 5-fold memory footprint reduction & significant training time decrease (Medium article)
- `verstack.Stacker`: automated ensembling factory; create multilayer stacking ensembles with a few lines of code (Medium article)
- `verstack.FeatureSelector`: automated feature selection tool based on quick recursive feature elimination by various ML models (Medium article)
- `verstack.DateParser`: ultimate DateParser class that automatically finds and parses datetime feats from all the possible datetime formats in your dataframe (Medium article)
- `verstack.Multicore`: parallelise any function with a single line of code (by far the most popular tool) (Medium article)
- `verstack.NaNImputer`: impute all the NaN values by machine learning with a single line of code (Medium article)
- `verstack.ThreshTuner`: automatic threshold selection for getting the most out of the binary classification predicted probabilities (Medium article)
- `stratified_continuous_split`: continuous data stratification (Medium article)
- categoric encoders: `Factorizer`, `OneHotEncode`, `FrequencyEncoder`, `WeightOfEvidenceEncoder`, `MeanTargetEncoder` (Medium article)
- `timer`: convenient timer to measure any function execution

### Links

- `verstack.LGBMTuner` documentation
- `verstack` documentation