Scikit-learn Overview

Scikit-learn is a widely used open-source Python library for machine learning, providing simple and efficient tools for data analysis and modeling. It is built on top of popular libraries like NumPy, SciPy, and Matplotlib, and offers a wide range of algorithms for supervised and unsupervised learning.
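Supervised and unsupervised models share the same estimator interface. The sketch below clusters the Iris measurements with KMeans and ignores the labels entirely. KMeans, three clusters, and the parameter values are assumptions chosen for illustration.

```python
# A minimal sketch: unsupervised estimators such as KMeans follow the same
# fit/predict pattern as supervised ones.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)  # labels are discarded: clustering is unsupervised

# Group the 150 flowers into 3 clusters; n_init and random_state are set
# explicitly so the result is reproducible.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

print(cluster_ids[:10])               # cluster index assigned to the first 10 samples
print(kmeans.cluster_centers_.shape)  # (3, 4): one centroid per cluster, 4 features each
```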


Key Features

Scikit-learn is accessible to beginners and professionals alike. Every estimator shares the same fit/predict interface, the API is thoroughly documented, and the library integrates well with other data science tools such as pandas.

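The following example trains a Random Forest classifier on the Iris dataset and measures its accuracy on a held-out test set.
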
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X = data.data  # Features
y = data.target  # Target labels

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy
print(f"Accuracy: {accuracy:.2f}")

# Print a comparison of actual vs predicted values
print("\nActual vs Predicted:")
for actual, predicted in zip(y_test, y_pred):
    print(f"Actual: {actual}, Predicted: {predicted}")

# Print some sample data points from the test set to explain the results
print("\nSample Data Points (Features) from the Test Set:")
for i in range(5):
    print(f"Test Sample {i+1}: {X_test[i]}, Predicted Label: {y_pred[i]}, Actual Label: {y_test[i]}")

#------------------------------------------------------------------------------------#
# Output
# Accuracy: 1.00
# 
# Actual vs Predicted:
# Actual: 1, Predicted: 1
# Actual: 0, Predicted: 0
# Actual: 2, Predicted: 2
# Actual: 1, Predicted: 1
# Actual: 1, Predicted: 1
# Actual: 0, Predicted: 0
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 1, Predicted: 1
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 1, Predicted: 1
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# 
# Sample Data Points (Features) from the Test Set:
# Test Sample 1: [6.1 2.8 4.7 1.2], Predicted Label: 1, Actual Label: 1
# Test Sample 2: [5.7 3.8 1.7 0.3], Predicted Label: 0, Actual Label: 0
# Test Sample 3: [7.7 2.6 6.9 2.3], Predicted Label: 2, Actual Label: 2
# Test Sample 4: [6.  2.9 4.5 1.5], Predicted Label: 1, Actual Label: 1
# Test Sample 5: [6.8 2.8 4.8 1.4], Predicted Label: 1, Actual Label: 1
#------------------------------------------------------------------------------------#
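Printing all 30 pairs works for a test set this small. A confusion matrix and a classification report summarize the same comparison more compactly. The sketch below repeats the split and model from the example above with the same parameters. The exact report figures may vary with your scikit-learn version.

```python
# Summarize the actual-vs-predicted comparison instead of printing every row.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are actual classes, columns are predicted classes; off-diagonal entries are errors.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1, labelled with the species names.
print(classification_report(y_test, y_pred, target_names=list(data.target_names)))
```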


Scikit-learn Code Summary

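This code performs the following actions:

1. Loads the Iris dataset with load_iris(), storing the feature matrix in X and the class labels in y.
2. Splits the data into training (80%) and test (20%) sets with train_test_split(), fixing random_state=42 so the split is reproducible.
3. Trains a RandomForestClassifier with 100 trees on the training set.
4. Predicts labels for the test set and computes the accuracy with accuracy_score().
5. Prints the accuracy, each actual/predicted label pair, and the feature values of the first five test samples.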

About the Iris Dataset

The code loads the Iris dataset with Scikit-learn's load_iris() function. Iris is one of several small toy datasets bundled with Scikit-learn, so no external files need to be downloaded. It contains 150 flower samples from three species (setosa, versicolor, and virginica). Each sample is described by four measurements: sepal length, sepal width, petal length, and petal width.
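You can inspect the loaded dataset directly:

```python
from sklearn.datasets import load_iris

data = load_iris()
print(data.data.shape)     # (150, 4): 150 samples, 4 measurements each
print(data.feature_names)  # sepal/petal length and width, in cm
print(data.target_names)   # ['setosa' 'versicolor' 'virginica']
```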

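Overview of How It Works

load_iris() returns a Bunch, a dictionary-like object. Its attributes include data (the 150 × 4 feature array), target (the integer class labels 0, 1, and 2), feature_names, and target_names. Because the dataset ships with the library, no external setup is needed.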
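A minimal sketch of the return_X_y shortcut follows, together with a check of the class balance. Passing stratify=y to train_test_split is not done in the original example. It keeps the class proportions identical in both splits, which with 30 test samples means exactly 10 per class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True skips the Bunch and returns the feature matrix and labels directly.
X, y = load_iris(return_X_y=True)

# The three classes are perfectly balanced.
counts = np.bincount(y)
print(counts)  # [50 50 50]

# A stratified 80/20 split preserves that balance in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
test_counts = np.bincount(y_test)
print(test_counts)  # [10 10 10]
```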


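Actual vs. Predicted Comparison

An accuracy of 1.00 means every test prediction matched its true label, so every line of the Actual vs Predicted output shows identical values. With 30 test samples, a single error would lower the score to 29/30 ≈ 0.97, so the rounded value 1.00 can only come from a perfect score. A perfect result on such a small test set mostly reflects that Iris is an easy, well-separated dataset. It does not guarantee equally high accuracy on new data.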
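Accuracy is simply the fraction of positions where the prediction equals the true label. You can therefore check the comparison programmatically instead of reading the printed pairs. The following is a minimal sketch that reuses the split and model parameters from the example above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Element-wise comparison: True where the prediction matches the actual label.
matches = y_pred == y_test
manual_accuracy = matches.mean()
print(manual_accuracy, accuracy_score(y_test, y_pred))  # the two values agree

# Indices of misclassified test samples; the list is empty when accuracy is 1.00.
mismatches = np.flatnonzero(~matches)
print(f"{matches.sum()}/{len(y_test)} correct, misclassified indices: {mismatches.tolist()}")
```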