SPBaDF

Importing the necessary libraries

[1]:
from imbalanced_spdf.ensemble import SPBaDF
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

Creating a synthetic dataset

[2]:
# 100 samples, 20 features (2 informative, 10 redundant); roughly 90% of samples in class 0
X, y = make_classification(
    n_samples=100,
    n_features=20,
    n_informative=2,
    n_redundant=10,
    n_clusters_per_class=1,
    weights=[0.90],
    flip_y=0,
    random_state=1,
)

Inspecting the dataset

[3]:
print(f"X shape: {X.shape}, with {X.shape[0]} samples and {X.shape[1]} features")
X shape: (100, 20), with 100 samples and 20 features
[4]:
print(f"y shape: {y.shape}, with {y.shape[0]} samples, the majority class has {len(y[y==0])} samples and the minority class has {len(y[y==1])} samples")
y shape: (100,), with 100 samples, the majority class has 91 samples and the minority class has 9 samples
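
The class counts printed above can be summarised as a single imbalance ratio. The sketch below is only illustrative: it uses a hypothetical stand-in list `labels` with the same 91/9 split rather than the actual `y` array, so that it runs on its own.

```python
from collections import Counter

# Stand-in labels with the same class counts as the output above (91 vs 9).
# This list is an assumption for illustration; it is not the real `y` array.
labels = [0] * 91 + [1] * 9

counts = Counter(labels)
majority, minority = counts[0], counts[1]
imbalance_ratio = majority / minority

print(f"class counts: {dict(counts)}")
print(f"imbalance ratio (majority / minority): {imbalance_ratio:.1f}")
```

With the real data, the same counts could be obtained by passing `y.tolist()` to `Counter`.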

Splitting the dataset into training and testing

[5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
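
With only 9 minority samples, a purely random split can leave very few of them in the test set. One possible variation, not used in this notebook, is to pass `stratify=y` so that both splits keep approximately the original class proportions. The sketch below regenerates the same synthetic data to stay self-contained; because the split differs from the one above, any scores computed on it may differ from those reported here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same synthetic dataset as in the notebook.
X, y = make_classification(
    n_samples=100, n_features=20, n_informative=2, n_redundant=10,
    n_clusters_per_class=1, weights=[0.90], flip_y=0, random_state=1,
)

# stratify=y asks scikit-learn to preserve the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print(f"minority samples in train: {int(y_train.sum())}")
print(f"minority samples in test:  {int(y_test.sum())}")
```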

Fitting an SPBaDF model

[6]:
spbadf = SPBaDF(weight=10, n_trees=100)
[7]:
spbadf.fit(X_train, y_train)
[7]:
SPBaDF(n_trees=100, weight=10)
[8]:
pred = spbadf.predict(X_test)
[9]:
f1_score(y_test, pred)
[9]:
1.0
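
For binary labels, `f1_score` reports the harmonic mean of precision and recall for the positive class (label 1 by default), so a value of 1.0 means the test predictions contain no false positives and no false negatives. The following is a minimal pure-Python sketch of that calculation, not scikit-learn's implementation. The helper `f1_from_labels` and the small label lists are hypothetical and chosen only to show how much a single missed minority sample moves the score when positives are scarce.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from raw label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Convention chosen for this sketch: no true positives gives F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 4 majority samples and 3 minority samples.
y_true = [0, 0, 0, 0, 1, 1, 1]
perfect = [0, 0, 0, 0, 1, 1, 1]    # every sample classified correctly
one_miss = [0, 0, 0, 0, 1, 1, 0]   # one minority sample missed

print(f"perfect predictions: {f1_from_labels(y_true, perfect):.2f}")
print(f"one missed positive: {f1_from_labels(y_true, one_miss):.2f}")
```

Because a small test set holds only a handful of minority samples, each individual error has a large effect on the F1 score, so a perfect score on this dataset should be read with that in mind.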