Behavioral Outlier Segmentation Using a Credit Card Dataset
INFO 523 - Summer 2025 - Final Project
Saumya Gupta, Sathwika Karri
Objective
- Group customers based on credit card spending, payment, and usage behavior
- Identify customers likely to stop using their card and take proactive retention measures
Business Problem
- Detect clusters of customers by transaction behavior (recency, frequency, monetary) and classify risk levels (high, medium, low)
- Predict which customers might churn or switch to competitors
Analytical Approach
- Use clustering techniques to group customers by spending, payment frequency, and transaction history, and identify common behavioral segments
- Use regression analysis to predict each customer's likelihood of churn from behavioral patterns and segment membership
- Tailor retention strategies based on behavioral segments
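The prediction step can be sketched as follows. This is a minimal illustration that assumes logistic regression as one reading of "regression analysis"; the feature names, coefficients, and generated data are hypothetical stand-ins, not the project's dataset or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical behavioral features, generated only for illustration.
rng = np.random.default_rng(3)
n = 2000
cash_advance = rng.gamma(1.5, 1.0, n)
utilization = rng.uniform(0.0, 1.0, n)
purchase_freq = rng.uniform(0.0, 1.0, n)

# Assumed relationship: churn odds rise with cash advances and utilization
# and fall with purchase frequency.
logit = 1.2 * cash_advance + 2.0 * utilization - 2.5 * purchase_freq - 2.0
churn = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = np.column_stack([cash_advance, utilization, purchase_freq])
X_train, X_test, y_train, y_test = train_test_split(
    X, churn, test_size=0.25, stratify=churn, random_state=3
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Per-customer churn probability on held-out data, scored with ROC-AUC.
churn_probability = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, churn_probability)
print(f"Held-out ROC-AUC: {auc:.3f}")
```

Segment labels from the clustering step could be appended to `X` as one-hot columns so the model can use segment membership alongside the raw behavior.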
Outlier Detection
Distribution Analysis
Summary of Cluster Algorithms
K-Means (k=3) produced 3 clusters with a modest Silhouette Score (0.233), indicating that the clusters are separated but overlap noticeably.
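A silhouette score like the one above can be computed along these lines. The sketch uses randomly generated points as a hypothetical stand-in for the scaled customer features, so its score will not match the project's 0.233.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: three overlapping groups in three dimensions.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0, 0, 0], scale=1.0, size=(200, 3)),
    rng.normal(loc=[4, 4, 0], scale=1.0, size=(200, 3)),
    rng.normal(loc=[0, 4, 4], scale=1.0, size=(200, 3)),
])

# K-Means is distance based, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Silhouette ranges from -1 to 1; values near 0 mean overlapping clusters.
score = silhouette_score(X_scaled, labels)
print(f"Silhouette score for k=3: {score:.3f}")
```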
Cluster Elbow Curve
- The Elbow Curve indicates that 4 clusters capture most of the variation in the data.
- Therefore, we divide customers into 4 behavioral segments for further analysis.
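An elbow curve is built by fitting K-Means for a range of k and recording the within-cluster sum of squares (inertia). The sketch below assumes synthetic data with four groups purely for illustration; on the real customer features the bend may be less pronounced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with four well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]])
X = np.vstack([rng.normal(loc=c, scale=0.8, size=(150, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Record inertia (within-cluster sum of squares) for each candidate k.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# The "elbow" is the k after which inertia stops dropping sharply.
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against k (for example with matplotlib) gives the curve; the choice of k at the bend remains a judgment call.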
Clusters based on behavior
Customer Risk Label Distribution
- Most customers fall into the Medium Risk category (6,139), followed by High Risk (2,453).
- Low Risk (261) and Extreme Risk (97) are the smallest groups.
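The relative size of each risk group follows directly from the counts reported above; a short arithmetic check, using only those counts:

```python
# Risk label counts as reported on this slide.
counts = {
    "Low Risk": 261,
    "Medium Risk": 6139,
    "High Risk": 2453,
    "Extreme Risk": 97,
}

total = sum(counts.values())

# Share of all customers in each risk group, in percent.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} customers ({share}%)")
print(f"Total: {total}")
```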
Feature Correlation Analysis
- Highly correlated features (>0.8):
  - PURCHASES ↔︎ ONEOFF_PURCHASES: 0.917
  - PURCHASES_FREQUENCY ↔︎ PURCHASES_INSTALLMENTS_FREQUENCY: 0.863
- These correlations help identify redundant features for model training
Note: Correlation matrix heatmap would be generated during analysis
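Pairs above a correlation threshold can be listed with a few lines of pandas. The helper name `high_correlation_pairs` and the demo data are hypothetical; only the 0.8 threshold and the PURCHASES and ONEOFF_PURCHASES column names come from the slide.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df, threshold=0.8):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair appears once, without self-pairs.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Hypothetical demo data: PURCHASES is dominated by ONEOFF_PURCHASES.
rng = np.random.default_rng(1)
oneoff = rng.gamma(2.0, 300.0, size=500)
installments = rng.gamma(2.0, 100.0, size=500)
demo = pd.DataFrame({
    "ONEOFF_PURCHASES": oneoff,
    "INSTALLMENT_AMOUNT": installments,  # illustrative column name
    "PURCHASES": oneoff + installments,
    "BALANCE": rng.gamma(2.0, 800.0, size=500),
})

pairs = high_correlation_pairs(demo, threshold=0.8)
print(pairs)
```

One feature from each flagged pair is a candidate for removal before model training, since it carries largely the same information as its partner.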
Churn Target Creation
- Synthetic churn target created using composite risk scoring
- Churn rate: 25.01% (2,238 out of 8,950 customers)
Target Distribution:
- Non-Churn Customers: 6,712 (74.99%)
- Churn Customers: 2,238 (25.01%)

Risk factors considered:
- Low purchase frequency
- High cash advance usage
- Irregular payment patterns
- High balance-to-credit ratio
- Risk indicators
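One way a composite risk score can be turned into a synthetic churn flag is sketched below. The column names, weights, and the top-quartile cutoff are assumptions made for illustration; the slide does not state the actual weights or threshold used.

```python
import numpy as np
import pandas as pd


def composite_churn_target(df, weights, churn_quantile=0.75):
    """Score customers on weighted, min-max scaled risk factors and flag the top tail."""
    cols = list(weights)
    lo, hi = df[cols].min(), df[cols].max()
    # Avoid division by zero for constant columns.
    span = (hi - lo).replace(0, 1)
    scaled = (df[cols] - lo) / span
    score = sum(scaled[col] * w for col, w in weights.items())
    # Assumed rule: customers above the chosen score quantile are labeled churn.
    threshold = score.quantile(churn_quantile)
    return score, (score > threshold).astype(int)


# Hypothetical risk factors for 1,000 customers.
rng = np.random.default_rng(7)
n = 1000
demo = pd.DataFrame({
    "low_purchase_freq": 1.0 - rng.uniform(0.0, 1.0, n),
    "cash_advance": rng.gamma(1.5, 500.0, n),
    "balance_to_limit": rng.uniform(0.0, 1.2, n),
})

# Hypothetical weights; in practice these encode business assumptions.
weights = {"low_purchase_freq": 0.4, "cash_advance": 0.3, "balance_to_limit": 0.3}

score, churn = composite_churn_target(demo, weights)
print(f"Churn rate: {churn.mean():.2%}")
```

Because the flag is a deterministic function of the input features, a model trained on those same features can recover it almost perfectly, which is worth keeping in mind when reading evaluation scores.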
Model Evaluation Results
Conclusion
This project combines clustering and classification to analyze customer behavior in financial services. It identifies four customer segments with distinct risk profiles and achieves a ROC-AUC of 99.94% in predicting the synthetic churn target. Key drivers of churn include cash advance usage and credit utilization. These findings support data-driven retention strategies aimed at improving customer retention and lifetime value in the credit card industry.
Limitations
- Synthetic Target: Churn target created using business rules rather than actual churn data
- Feature Availability: Some features like PRC_FULL_PAYMENT were not available in the dataset
- Temporal Aspect: No time-series data to capture actual churn patterns over time
- Domain Expertise: Risk scoring weights based on business assumptions rather than empirical validation
Acknowledgment
Thank you, Professor Greg Chism, for your guidance, support, and valuable feedback throughout this project.
Thank you to our classmates and team members for their collaboration, insights, and contributions.