Python-package Introduction

This document gives a basic walk-through of BoostForest Python-package.

List of other helpful links

Install (in the future)

The preferred way to install BoostForest is via pip from Pypi:

pip install BoostForest

To verify your installation, try to import BoostForest in Python:

import BoostForest

Data Interface

The BoostForest Python module can load data from:

  • NumPy 2D array(s)
    import numpy as np
    data = np.random.rand(500, 10)
    label = np.random.randint(2, size=500)
    

Setting Parameters

BoostForest can use a dictionary to set parameters. For instance:

  • When min_sample_leaf_list and reg_alpha_list are certain values:

    param = {'max_leafs': 5, 'node_model': 'Ridge', 'min_sample_leaf_list':5, 'reg_alpha_list': 0.1, 'max_depth': None, 'elm_hidden_layer_nodes': 100, 'random_state':0}
    
  • When min_sample_leaf_list and reg_alpha_list are lists:

    param = {'max_leafs': 5, 'node_model': 'Ridge', 'min_sample_leaf_list': [5, 6, 7], 'reg_alpha_list': [0.1, 0.5, 1.0], 'max_depth': None, 'elm_hidden_layer_nodes': 100, 'random_state':0}
    

Training

Training a model requires a parameter dictionary and data set:

estimator = BoostForest.BoostTreeClassifier(**param).fit(data, label)

After training, the model can be saved:

estimator.save_model('model.joblib')

A saved model can be loaded:

import joblib
estimator = joblib.load('model.joblib')

Predicting

A model that has been trained or loaded can perform predictions on datasets:

# 7 entities, each contains 10 features
data = np.random.rand(7, 10)
ypred = estimator.predict(data)