The classifier trained to predict the K-means customer segments reached an overall accuracy of approximately 94.6% on the held-out test set of 672 customers. The classification report shows similarly strong per-cluster results:
Cluster 0: Precision 0.92, recall 0.90, F1 score 0.91. These are the lowest scores of the three clusters, though still high in absolute terms.
Cluster 1: Precision 0.95, recall 0.97, F1 score 0.96. The high recall means the model misses very few members of this segment.
Cluster 2: Precision and recall are both 0.96, giving an F1 score of 0.96.
Overall, performance is balanced across the clusters: the macro and weighted averages of precision, recall, and F1 score all fall between 0.94 and 0.95. The model can therefore be used to assign customers to these segments, which makes it useful for targeted marketing and customer relationship management.
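As a quick sanity check on the figures quoted above, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the two-decimal values in the text; the helper name `f1_from` is illustrative and is not part of the notebook.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per cluster, as quoted in the summary above
reported = {0: (0.92, 0.90), 1: (0.95, 0.97), 2: (0.96, 0.96)}

# Round to two decimals, the same precision the classification report prints
f1_by_cluster = {
    cluster: round(f1_from(p, r), 2) for cluster, (p, r) in reported.items()
}
print(f1_by_cluster)
```

Because the inputs are already rounded, this only confirms that the quoted precision, recall, and F1 values are mutually consistent to two decimals.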
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
# Load the dataset
file_path = 'marketing_campaign.csv'
data = pd.read_csv(file_path, delimiter='\t')
# Selecting relevant features for clustering
features = data[['Income', 'Kidhome', 'Teenhome', 'Recency', 'MntWines',
                 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
                 'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
                 'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth']]
# Handling missing values by filling with the mean of the column
features = features.fillna(features.mean())
# Standardizing the features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)
# Using the elbow method to find the optimal number of clusters
wcss = []
for i in range(1, 11):
    # n_init=10 matches the default used in this run and silences the FutureWarning
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=0)
    kmeans.fit(features_scaled)
    wcss.append(kmeans.inertia_)
# Plotting the elbow graph
plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method for Optimal Number of Clusters')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
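To make the quantity on the y-axis concrete: WCSS (what scikit-learn exposes as `inertia_`) is the sum of squared distances from each point to its assigned cluster centre. The following is a minimal pure-Python sketch on hypothetical toy points, not on `marketing_campaign.csv`; the `wcss_for` helper and the toy data are assumptions for illustration only.

```python
def wcss_for(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total

# Hypothetical 2-D toy data with two visibly separate groups
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# k = 1: every point is assigned to the overall mean (5, 1)
wcss_k1 = wcss_for(points, [0, 0, 0, 0], [(5.0, 1.0)])

# k = 2: one centroid per group, at (0, 1) and (10, 1)
wcss_k2 = wcss_for(points, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])

print(wcss_k1, wcss_k2)
```

The sharp drop from one to two clusters in this toy case is the kind of bend the elbow plot is meant to reveal; past the natural number of groups, further clusters reduce WCSS only slightly.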
from sklearn.decomposition import PCA
# Set the optimal number of clusters
optimal_clusters = 3
# Perform K-Means clustering with the optimal number of clusters
# n_init=10 matches the default used in this run and silences the FutureWarning
kmeans = KMeans(n_clusters=optimal_clusters, n_init=10, random_state=0)
clusters = kmeans.fit_predict(features_scaled)
# Add the cluster results to the original data
data['Cluster'] = clusters
# Reduce the features to 2 dimensions using PCA for visualization
pca = PCA(n_components=2)
principal_components = pca.fit_transform(features_scaled)
# Adding the PCA results to the original data
data['PCA1'] = principal_components[:, 0]
data['PCA2'] = principal_components[:, 1]
# Visualize the clusters
plt.figure(figsize=(10, 8))
plt.scatter(data['PCA1'], data['PCA2'], c=data['Cluster'], cmap='viridis', marker='o', edgecolor='k', s=50)
plt.title('Customer Segments Visualization')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.colorbar(label='Cluster')
plt.show()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features_scaled, clusters, test_size=0.3, random_state=42)
# Train a classifier (using Random Forest in this example)
classifier = RandomForestClassifier(random_state=42)
classifier.fit(X_train, y_train)
# Make predictions
y_pred = classifier.predict(X_test)
# Evaluate the classifier
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
Accuracy: 0.9464285714285714

Classification Report:
               precision    recall  f1-score   support

           0       0.92      0.90      0.91       209
           1       0.95      0.97      0.96       173
           2       0.96      0.96      0.96       290

    accuracy                           0.95       672
   macro avg       0.94      0.95      0.95       672
weighted avg       0.95      0.95      0.95       672
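The two average rows in the report differ only in how clusters are weighted: the macro average treats every cluster equally, while the weighted average scales each cluster by its support. The sketch below illustrates this for precision using the rounded per-cluster rows from the output above. Since those rows are rounded to two decimals, the results can differ from the printed averages in the last digit.

```python
# Per-cluster rows copied from the classification report: (precision, recall, support)
rows = {0: (0.92, 0.90, 209), 1: (0.95, 0.97, 173), 2: (0.96, 0.96, 290)}

total_support = sum(support for _, _, support in rows.values())

# Macro average: unweighted mean over clusters
macro_precision = sum(p for p, _, _ in rows.values()) / len(rows)

# Weighted average: each cluster contributes in proportion to its support
weighted_precision = sum(p * s for p, _, s in rows.values()) / total_support

# Accuracy times the test-set size gives the number of correct predictions
correct = round(0.9464285714285714 * total_support)

print(total_support, correct)
print(round(macro_precision, 4), round(weighted_precision, 4))
```

With three clusters of broadly similar size, the two averages stay close, which is consistent with the balanced performance the report shows.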