Decision Tree Classifier

Recursive binary splitting based on Gini impurity reduction

Data Preprocessing:

Convert continuous Y values [10, 20, 30, 40, 50] into categorical classes: ["low", "low", "medium", "high", "high"]
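
A minimal sketch of this binning step (the cut points below are one illustrative choice that reproduces the mapping; they are not fixed by these notes):

    # Bin each continuous target value into a class label.
    # Thresholds (<= 20 -> "low", <= 30 -> "medium", else "high") are an
    # illustrative choice that reproduces the mapping used in these notes.
    def to_class(value):
        if value <= 20:
            return "low"
        if value <= 30:
            return "medium"
        return "high"

    y_continuous = [10, 20, 30, 40, 50]
    y_classes = [to_class(v) for v in y_continuous]
    print(y_classes)  # ['low', 'low', 'medium', 'high', 'high']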

Original Gini impurity from all Y classes
For each feature
    For each possible split (based on X values)
        Split Y classes into two groups
        Calculate class frequencies and Gini impurity for each group
        Combine Gini impurities with weighted average
        Information Gain = Original Gini - Weighted Gini
    Track split with highest gain for this feature
Select best split across all features
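
A minimal Python sketch of this search for a single feature; the helper name gini(), the candidate thresholds (one per observed X value), and the (gain, threshold) return format are choices made here for illustration:

    from collections import Counter

    def gini(labels):
        # Gini = 1 - sum of squared class probabilities; empty groups count as 0.
        n = len(labels)
        if n == 0:
            return 0.0
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    def best_split(x, y):
        # Try "x < threshold" for every observed value and keep the split with
        # the highest information gain (original Gini minus weighted Gini).
        parent_gini = gini(y)
        best = (0.0, None)  # (gain, threshold)
        for threshold in sorted(set(x)):
            left = [label for xi, label in zip(x, y) if xi < threshold]
            right = [label for xi, label in zip(x, y) if xi >= threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            gain = parent_gini - weighted
            if gain > best[0]:
                best = (gain, threshold)
        return best

    x = [1, 3, 5, 7, 9]
    y = ["low", "low", "medium", "high", "high"]
    print(best_split(x, y))  # (~0.373, 5); the tie with threshold 7 goes to the first one found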

After finding the best split:

Create the split
    Use the best split to divide the data into two child nodes
Recursively repeat the process on each child node
    • For the left child, find the best split among all its data points
    • For the right child, find the best split among all its data points
    • Each child becomes a new decision node with its own best split
Continue until stopping criteria are met
    • Maximum depth reached
    • Minimum samples per leaf reached
    • Information gain below threshold
    • Node becomes "pure enough"
Create leaf nodes
    When splitting stops, the node becomes a leaf that predicts the most frequent class among the Y values in that node
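
A sketch of the recursion with these stopping rules, restricted to a single feature for brevity; the dict-based node layout and the parameter names max_depth, min_samples_leaf, and min_gain are assumptions of this sketch:

    from collections import Counter

    def gini(labels):
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

    def build_tree(x, y, depth=0, max_depth=3, min_samples_leaf=1, min_gain=1e-7):
        # Stop and create a leaf when a stopping criterion is met; the leaf
        # predicts the most frequent class among the Y values in this node.
        majority = Counter(y).most_common(1)[0][0]
        if depth >= max_depth or len(y) <= min_samples_leaf or gini(y) == 0.0:
            return {"leaf": majority}

        # Find the best "x < threshold" split for this node's data.
        best_gain, best_t = 0.0, None
        for t in sorted(set(x)):
            left = [lab for xi, lab in zip(x, y) if xi < t]
            right = [lab for xi, lab in zip(x, y) if xi >= t]
            gain = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if gain > best_gain:
                best_gain, best_t = gain, t
        if best_t is None or best_gain < min_gain:
            return {"leaf": majority}

        # Split the data and recurse into the two child nodes.
        left_i = [i for i, xi in enumerate(x) if xi < best_t]
        right_i = [i for i, xi in enumerate(x) if xi >= best_t]
        return {
            "threshold": best_t,
            "left": build_tree([x[i] for i in left_i], [y[i] for i in left_i],
                               depth + 1, max_depth, min_samples_leaf, min_gain),
            "right": build_tree([x[i] for i in right_i], [y[i] for i in right_i],
                                depth + 1, max_depth, min_samples_leaf, min_gain),
        }

    tree = build_tree([1, 3, 5, 7, 9], ["low", "low", "medium", "high", "high"])
    print(tree)  # root split at X < 5, then X < 7 on the right branch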

Prediction Example:

New data point: X₁=4, X₂=7. To classify it, the point is routed from the root down the tree: at each decision node the relevant feature value is compared against that node's threshold to choose the left or right branch, until a leaf is reached, and the leaf's majority class is returned as the prediction.
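
A sketch of that routing, assuming the dict-based node shape from the sketch above and a tree built from the single-feature worked example below (root split X < 5; the second split at X < 7 on the right branch is what the same procedure would produce on that child, not something computed in these notes). Only X₁ is used here because the worked data has a single feature:

    # Hand-built tree matching the single-feature worked example below.
    tree = {
        "threshold": 5,
        "left": {"leaf": "low"},
        "right": {"threshold": 7, "left": {"leaf": "medium"}, "right": {"leaf": "high"}},
    }

    def predict(node, x):
        # Walk from the root to a leaf, branching on each node's threshold.
        while "leaf" not in node:
            node = node["left"] if x < node["threshold"] else node["right"]
        return node["leaf"]

    print(predict(tree, 4))  # 'low' -- X1 = 4 takes the X < 5 branch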

Decision Tree Split Analysis - Classifier

Data Preprocessing

Convert continuous Y values [10, 20, 30, 40, 50] into categorical classes: ["low", "low", "medium", "high", "high"]

Original Gini Impurity

Feature X values: [1, 3, 5, 7, 9]

All Y classes: ["low", "low", "medium", "high", "high"]

Class frequencies: "low"=2, "medium"=1, "high"=2

Class probabilities: P("low")=2/5, P("medium")=1/5, P("high")=2/5

Gini = 1 - (2/5)² - (1/5)² - (2/5)² = 1 - 0.16 - 0.04 - 0.16 = 0.64
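
A quick check of this number (gini() is just a small local helper, not a library call):

    from collections import Counter

    def gini(labels):
        n = len(labels)
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    y = ["low", "low", "medium", "high", "high"]
    print(gini(y))  # 0.64 (up to floating-point rounding)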

Split 1: X < 1 vs X ≥ 1
    Group 1 (X < 1): X = [], Y = [] (empty) → Gini = 0 (undefined/0 by convention)
    Group 2 (X ≥ 1): X = [1, 3, 5, 7, 9], Y = ["low", "low", "medium", "high", "high"] → Gini = 0.64
    Weighted Gini = (0/5) × 0 + (5/5) × 0.64 = 0.64
    Information Gain = 0.64 - 0.64 = 0

Split 2: X < 3 vs X ≥ 3
    Group 1 (X < 3): X = [1], Y = ["low"] → Gini = 0
    Group 2 (X ≥ 3): X = [3, 5, 7, 9], Y = ["low", "medium", "high", "high"] → Gini = 0.625
    Weighted Gini = (1/5) × 0 + (4/5) × 0.625 = 0.5
    Information Gain = 0.64 - 0.5 = 0.14

Split 3: X < 5 vs X ≥ 5
    Group 1 (X < 5): X = [1, 3], Y = ["low", "low"] → Gini = 0
    Group 2 (X ≥ 5): X = [5, 7, 9], Y = ["medium", "high", "high"] → Gini = 0.444
    Weighted Gini = (2/5) × 0 + (3/5) × 0.444 = 0.267
    Information Gain = 0.64 - 0.267 = 0.373

Split 4: X < 7 vs X ≥ 7
    Group 1 (X < 7): X = [1, 3, 5], Y = ["low", "low", "medium"] → Gini = 0.444
    Group 2 (X ≥ 7): X = [7, 9], Y = ["high", "high"] → Gini = 0
    Weighted Gini = (3/5) × 0.444 + (2/5) × 0 = 0.267
    Information Gain = 0.64 - 0.267 = 0.373

Split 5: X < 9 vs X ≥ 9
    Group 1 (X < 9): X = [1, 3, 5, 7], Y = ["low", "low", "medium", "high"] → Gini = 0.625
    Group 2 (X ≥ 9): X = [9], Y = ["high"] → Gini = 0
    Weighted Gini = (4/5) × 0.625 + (1/5) × 0 = 0.5
    Information Gain = 0.64 - 0.5 = 0.14
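
The table above can be reproduced with a short loop; the gini() helper and the candidate thresholds (one per observed X value) mirror the splits listed:

    from collections import Counter

    def gini(labels):
        n = len(labels)
        if n == 0:
            return 0.0  # empty group: 0 by convention
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    x = [1, 3, 5, 7, 9]
    y = ["low", "low", "medium", "high", "high"]
    parent = gini(y)  # 0.64

    for t in x:
        left = [lab for xi, lab in zip(x, y) if xi < t]
        right = [lab for xi, lab in zip(x, y) if xi >= t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        print(f"X < {t}: weighted Gini = {weighted:.3f}, gain = {parent - weighted:.3f}")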

Result

Splits 3 and 4 tie for highest Information Gain (0.373).
The algorithm would choose one of them (often the first encountered).

Random Forest Classifier

Bootstrap aggregating with majority voting

Create N bootstrap samples (drawn with replacement)
For each bootstrap sample (trees can be built in parallel)
    Original Gini impurity from the Y values in this sample
    For each node split
        Randomly select a subset of features (typically √n_features)
        For each selected feature
            For each possible split (based on X values)
                Split Y values into two groups
                Calculate class counts and Gini for each group
                Combine Gini scores with weighted average
                Information Gain = Original Gini - Weighted Gini
            Track split with highest gain for this feature
        Select best split among the selected features and continue building the tree
Majority vote from all N trees
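
A compact sketch of the bagging-plus-voting idea, leaning on scikit-learn's DecisionTreeClassifier for the per-tree split search (max_features="sqrt" gives the per-split feature subsampling); the toy two-feature X, n_trees=25, and the query point are illustrative:

    import numpy as np
    from collections import Counter
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = np.array([[1, 2], [3, 1], [5, 4], [7, 6], [9, 8]])  # toy features (made up)
    y = np.array(["low", "low", "medium", "high", "high"])  # classes from these notes

    n_trees = 25
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n rows with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        # Each tree considers a random sqrt(n_features) subset at every split.
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)

    def forest_predict(x_new):
        # Majority vote across all trees.
        votes = [t.predict([x_new])[0] for t in trees]
        return Counter(votes).most_common(1)[0][0]

    print(forest_predict([4, 7]))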

XGBoost Classifier

Gradient boosting for classification tasks

Initialize predictions (uniform class probabilities, i.e. zero log-odds)
For each tree (built sequentially)
    Calculate probability residuals (negative gradients: actual class indicator - predicted probability)
    Original impurity is measured on these residuals (not on the original Y)
    For each feature
        For each possible split (based on X values)
            Split residuals into two groups
            Calculate gain metric for each group
            Combine scores with weighted average
            Information Gain = Residual impurity - Weighted impurity
        Track split with highest gain for this feature
    Select best split across all features
    Update log-odds += learning_rate × tree prediction, then convert back to probabilities
Final prediction = class with highest probability
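
A much-simplified sketch of this loop using softmax log-odds and scikit-learn's DecisionTreeRegressor to fit each round's trees to the residuals; real XGBoost also uses second-order gradients, regularization, and its own split-gain formula, all omitted here, and the toy data, n_rounds, and learning_rate are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[1], [3], [5], [7], [9]], dtype=float)
    labels = np.array(["low", "low", "medium", "high", "high"])
    classes = np.array(["low", "medium", "high"])
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot targets

    n_rounds, learning_rate = 20, 0.3
    F = np.zeros_like(Y)  # raw scores (log-odds), initialized to zero

    def softmax(f):
        e = np.exp(f - f.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    for _ in range(n_rounds):
        P = softmax(F)
        residuals = Y - P  # negative gradients of the multiclass log loss
        for k in range(len(classes)):
            # Fit a small regression tree to this class's residuals,
            # then nudge the log-odds by learning_rate * tree prediction.
            tree = DecisionTreeRegressor(max_depth=2)
            tree.fit(X, residuals[:, k])
            F[:, k] += learning_rate * tree.predict(X)

    # Final prediction = class with the highest probability.
    print(classes[softmax(F).argmax(axis=1)])  # expected to recover the training labels
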
[Figure: Decision Tree Classifier visualization]