[Data preview: 284,807 rows × 31 columns (Time, V1–V28, Amount, Class); the first ten transactions shown are all non-fraudulent (Class = 0).]
Detecting Anomalies in Credit Card Transactions
Proposal
Exploring unsupervised anomaly detection methods on imbalanced financial data
Dataset
This project uses the Credit Card Fraud Detection dataset, which contains transactions made by European cardholders over two days in September 2013. The dataset includes 284,807 transactions, of which only 492 are labeled as fraudulent (Class = 1), making it highly imbalanced. Apart from Time and Amount, the features (V1–V28) are anonymized principal components obtained from a PCA transformation of the original data.
Although the dataset includes fraud labels, I am approaching this as an unsupervised anomaly detection task, simulating the real-world scenario in which fraudulent examples are not labeled at training time: the Class column is withheld during training and used only to evaluate the models.
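As a minimal sketch of this setup (assuming pandas and scikit-learn; the split proportion and random seed are illustrative, not part of the plan), the labels can be separated from the features up front so that only the evaluation step ever sees them:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the raw transactions (path taken from the project layout below).
df = pd.read_csv("data/creditcard.csv")

# Separate features from labels; the Class column is never shown to the models.
X = df.drop(columns=["Class"])
y = df["Class"]

# Hold out a test set; stratification only balances the split,
# and y_test is kept aside purely for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
```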
Questions
- How well do unsupervised ensemble-based models detect fraudulent transactions when labels are withheld during training?
- How do distance-based and statistical approaches to outlier detection compare to ensemble-based methods in identifying rare but meaningful anomalies?
Analysis plan
Data Preprocessing (August 6–7)
- Load and clean the dataset; scale the Time and Amount features, which are not on the same scale as the PCA components (a preprocessing sketch follows below).
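Continuing from the loading sketch above, a minimal preprocessing step could scale only Time and Amount, since the V columns are already PCA outputs; RobustScaler is an assumed choice here, not a commitment of the plan:

```python
from sklearn.preprocessing import RobustScaler

# Only Time and Amount need scaling; V1-V28 are already PCA components.
# RobustScaler (an assumption) is less sensitive to extreme Amount values
# than plain standardization.
scaler = RobustScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[["Time", "Amount"]] = scaler.fit_transform(X_train[["Time", "Amount"]])
X_test[["Time", "Amount"]] = scaler.transform(X_test[["Time", "Amount"]])
```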
Modeling: Unsupervised Learning (August 8–10)
- Implement unsupervised anomaly detection methods: an ensemble-based model alongside distance-based and statistical baselines (an illustrative sketch follows below).
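As an illustration of the method families named in the questions above, the sketch below fits one ensemble-based and one distance-based detector; the specific estimators (scikit-learn's IsolationForest and LocalOutlierFactor) and their hyperparameters are my assumptions:

```python
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Ensemble-based: isolates points with random axis-aligned splits;
# anomalies need fewer splits to isolate. Labels are never used.
iso = IsolationForest(n_estimators=200, random_state=42)
iso.fit(X_train)
iso_scores = -iso.score_samples(X_test)  # higher score = more anomalous

# Distance-based: flags points whose local density is low relative to
# that of their nearest neighbors.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)
lof_scores = -lof.score_samples(X_test)
```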
Evaluation (August 11–13)
- Assess model performance against the withheld labels, using metrics suited to the heavy class imbalance (an evaluation sketch follows below).
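A possible evaluation step uses the withheld labels with ranking metrics that stay informative at a roughly 0.17% positive rate; average precision and ROC AUC are my choice of metrics, while the proposal itself only commits to evaluating with the withheld labels:

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# The anomaly scores come from the modeling sketch above; y_test was
# withheld from training and is used only here.
for name, scores in [("IsolationForest", iso_scores), ("LOF", lof_scores)]:
    ap = average_precision_score(y_test, scores)
    auc = roc_auc_score(y_test, scores)
    print(f"{name}: average precision = {ap:.3f}, ROC AUC = {auc:.3f}")
```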
Visualization (August 13–16)
- Create performance plots and visual summaries of detected anomalies (a plotting sketch follows below).
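One way a performance plot could look, assuming matplotlib and scikit-learn's PrecisionRecallDisplay (a sketch of a single plot, not the full visualization set):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay

# Precision-recall curves are more informative than ROC curves when
# positives are this rare.
fig, ax = plt.subplots(figsize=(6, 4))
PrecisionRecallDisplay.from_predictions(y_test, iso_scores, name="IsolationForest", ax=ax)
PrecisionRecallDisplay.from_predictions(y_test, lof_scores, name="LOF", ax=ax)
ax.set_title("Precision-recall on held-out transactions")
fig.tight_layout()
fig.savefig("pr_curves.png", dpi=150)
```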
Presentation Prep (August 17–19)
- Finalize presentation materials and write-up of results.
Final project organization
final-project-rodriguez/
│
├─ data/
│ ├─ creditcard.csv
│ └─ README.md
│
├─ about.qmd
├─ index.qmd
├─ presentation.qmd
├─ proposal.qmd
├─ _quarto.yml
├─ requirements.txt
└─ README.md