Scikit-learn is a widely used open-source Python library for machine learning, providing simple and efficient tools for data analysis and modeling. It is built on top of popular libraries like NumPy, SciPy, and Matplotlib, and offers a wide range of algorithms for supervised and unsupervised learning.
Scikit-learn is accessible to both beginners and professionals: its APIs are well documented, and it integrates well with other data science tools such as pandas.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load the Iris dataset
data = load_iris()
X = data.data # Features
y = data.target # Target labels
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the classifier
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
# Print the accuracy
print(f"Accuracy: {accuracy:.2f}")
# Print a comparison of actual vs predicted values
print("\nActual vs Predicted:")
for actual, predicted in zip(y_test, y_pred):
    print(f"Actual: {actual}, Predicted: {predicted}")
# Print some sample data points from the test set to explain the results
print("\nSample Data Points (Features) from the Test Set:")
for i in range(5):
    print(f"Test Sample {i+1}: {X_test[i]}, Predicted Label: {y_pred[i]}, Actual Label: {y_test[i]}")
#------------------------------------------------------------------------------------#
# Output
# Accuracy: 1.00
#
# Actual vs Predicted:
# Actual: 1, Predicted: 1
# Actual: 0, Predicted: 0
# Actual: 2, Predicted: 2
# Actual: 1, Predicted: 1
# Actual: 1, Predicted: 1
# Actual: 0, Predicted: 0
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 1, Predicted: 1
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 1, Predicted: 1
# Actual: 1, Predicted: 1
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 2, Predicted: 2
# Actual: 0, Predicted: 0
# Actual: 0, Predicted: 0
#
# Sample Data Points (Features) from the Test Set:
# Test Sample 1: [6.1 2.8 4.7 1.2], Predicted Label: 1, Actual Label: 1
# Test Sample 2: [5.7 3.8 1.7 0.3], Predicted Label: 0, Actual Label: 0
# Test Sample 3: [7.7 2.6 6.9 2.3], Predicted Label: 2, Actual Label: 2
# Test Sample 4: [6. 2.9 4.5 1.5], Predicted Label: 1, Actual Label: 1
# Test Sample 5: [6.8 2.8 4.8 1.4], Predicted Label: 1, Actual Label: 1
#------------------------------------------------------------------------------------#
This code loads the Iris dataset, splits it into training and test sets, trains a Random Forest classifier, predicts labels for the test set, and prints the accuracy along with a comparison of actual and predicted labels.
The dataset is loaded automatically by the load_iris() function provided by Scikit-learn. The Iris dataset is one of several built-in toy datasets included in Scikit-learn, so you don't need to download or load any external files.
load_iris() returns the Iris dataset, which contains 150 samples of iris flowers described by four features: sepal length, sepal width, petal length, and petal width.
The dataset comes back as a Bunch object, which behaves like a dictionary and holds both the data (features) and the target (labels).
Use data.data for the features and data.target for the labels.
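To make the structure of the returned object concrete, here is a minimal sketch that inspects it. It only uses attributes that load_iris() documents (data, target, feature_names, target_names); the variable name same_array is illustrative and not part of the example above.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris dataset; no download is involved
data = load_iris()

# The container is a Bunch, a dictionary-like object
print(type(data).__name__)

# 150 samples with 4 features each, and one label per sample
print(data.data.shape)
print(data.target.shape)

# Human-readable names for the feature columns and the class labels
print(data.feature_names)
print(list(data.target_names))

# Dictionary-style and attribute-style access return the same array
same_array = data["data"] is data.data
print(same_array)
```

Either access style works; the attribute form (data.data, data.target) is the one used in the example above.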
If the accuracy is 1.00, the actual labels and predicted labels should match perfectly for all test samples, and you will see identical values in the comparison output.
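The link between accuracy and matching labels can be checked directly: accuracy_score is the fraction of test samples whose predicted label equals the actual label. The following is a minimal sketch that repeats the same split and classifier settings as the example above; the names manual_accuracy and n_mismatches are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data, split, and model settings as the main example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy as reported by Scikit-learn
accuracy = accuracy_score(y_test, y_pred)

# The same quantity computed by hand: the share of matching labels
manual_accuracy = np.mean(y_test == y_pred)

# Number of test samples where the prediction differs from the actual label
n_mismatches = int(np.sum(y_test != y_pred))

print(f"accuracy_score: {accuracy:.2f}")
print(f"manual accuracy: {manual_accuracy:.2f}")
print(f"mismatches: {n_mismatches} of {len(y_test)}")
```

An accuracy of 1.00 corresponds to zero mismatches. On a dataset as small and clean as Iris this is not unusual, so it should not be read as a guarantee of similar performance on other data.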