Financial Fraud Detection using Machine Learning (XGBoost)

Below is the complete dashboard with Model Evaluations and a full list of transactions to go through. Scroll down for a detailed description of this case study. The full notebooks are available through my GitHub.

(Note: The dashboard takes a couple of seconds to load. Ad blockers might block some visuals in the dashboard.)

The Challenge
Financial fraud poses a significant threat to businesses and consumers alike. In both Australia and the United States, reported financial losses have reached alarming levels: over $2.74 billion in Australia (ABC, 2024) and more than $10 billion in the U.S. (FTC, 2024) in 2023 alone. The number of scam reports in Australia rose by 18.5% (ABC, 2024), while the U.S. saw a 14% increase in total fraud losses year over year (FTC, 2024). With millions of transactions occurring daily, and with scams growing ever more complex (often leveraging advanced technologies like AI), manually reviewing each transaction for potential fraud is simply impossible. This growing challenge highlights the urgent need for automated, intelligent solutions, which is where machine learning models like XGBoost come to the rescue.

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is like a highly intelligent decision-making system. Imagine having thousands of small decision trees working together, each learning from the mistakes of the others to make increasingly accurate predictions. It's similar to how a team of experts might collaborate, with each member bringing their unique insights to solve a complex problem. Unlike a random forest, however, XGBoost does not simply build independent decision trees and pick the best among them: each new tree "learns" from the errors of the previous ones, so the ensemble iteratively becomes better at predicting the correct outcome. This is where the machine learning comes in.
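To make the boosting idea concrete, here is a minimal sketch of training a gradient-boosted ensemble on a toy imbalanced dataset. It uses scikit-learn's `GradientBoostingClassifier` as a lightweight stand-in for XGBoost (same core idea: each shallow tree fits the residual errors of the ensemble so far); the data is synthetic, not the case study's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transaction data: ~5% positive ("fraud") class
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# 100 shallow trees, each one correcting the mistakes of the ensemble built so far
model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```

Note that on data this imbalanced, raw accuracy looks flattering no matter what; that pitfall is exactly what the rest of this case study addresses.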

The Journey: From Good to Excellent

Initial Model Results

My first attempt looked promising on the surface:

  • Overall accuracy: 99%

  • Legitimate transaction detection: Nearly perfect

  • Fraud detection rate: 93%

However, there was a significant problem:

  • For every 100 transactions flagged as fraudulent, only 4 were actual fraud

  • This meant 96 false alarms that would require unnecessary manual review

I therefore had to improve this model to make it viable in a real-world scenario.

Impressive Results

The enhanced model achieved:

  • Perfect accuracy on legitimate transactions

  • 96% precision in detecting fraud

  • Only 4 out of 100 fraudulent transactions are missed

  • Drastically reduced false alarms

How did I get there?

1. Enhanced Exploratory Data Analysis (EDA)

  • Initial Approach:

    • Basic summary statistics and class distribution were produced to gauge data quality and balance.

    • Visualizations were created to understand the distribution of transactions and fraud occurrences.

  • Enhanced Approach:

    • Additional layers to EDA were introduced, including deeper insights about transaction patterns and correlations.

    • The enhanced script provided more precise breakdowns of fraudulent versus legitimate transactions, revealing imbalances that, if unnoticed, could lead to misleading metrics (like high overall accuracy).

    • These insights helped identify critical areas for feature engineering and underscored the need to adjust evaluation metrics beyond just accuracy.
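The core of that imbalance check can be sketched in a few lines of pandas. The frame below is a tiny hypothetical example: the column names (`type`, `amount`, `isFraud`) mirror common transaction-dataset conventions and are assumptions, not the project's exact schema.

```python
import pandas as pd

# Tiny hypothetical transaction frame for illustration
df = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER", "PAYMENT", "CASH_OUT"],
    "amount": [1200.0, 5300.0, 80.0, 99000.0, 45.0, 700.0],
    "isFraud": [0, 1, 0, 1, 0, 0],
})

# Class distribution: with heavy imbalance, overall accuracy alone is misleading
counts = df["isFraud"].value_counts(normalize=True)
print(counts)

# Fraud rate broken down by transaction type reveals where fraud concentrates
print(df.groupby("type")["isFraud"].mean())
```

On the real dataset, the same two lines expose both the severe class imbalance and the fact that fraud clusters in particular transaction types, which directly motivated the feature engineering below.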

2. Advanced Feature Engineering

  • Initial Script:

    • Extracted basic features from the original data. Notable features were derived from the account names (like nameOrig and nameDest).

    • Limited to basic transformations with minimal domain-specific insights.

  • Enhanced Script:

    • Creation of New Features:

      • Features such as originType and destType were derived from specific characters in account IDs, which provided more granular differentiators between transactions.

      • Calculated differences like origBalanceDelta (difference between current and previous balance) to capture financial shifts more dynamically.

    • Improved Feature Engineering Workflow:

      • Systematic handling of missing values as well as scaling and transforming numeric features using methods like StandardScaler and PowerTransformer.

      • Inclusion of multiple financial aggregates and transaction behaviors that help the model learn more nuanced distinctions, especially in fraudulent behaviors.

The advanced feature engineering not only provided greater context to the model but also significantly improved its ability to differentiate between subtle patterns that are typical of fraudulent transactions.
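The derived features described above can be sketched as follows. The balance column names (`oldbalanceOrg`, `newbalanceOrig`) follow common conventions for this kind of dataset and are assumptions; `originType`, `destType`, and `origBalanceDelta` are the features named in the text.

```python
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Hypothetical rows; account IDs in this style of dataset start with a
# category character (e.g. 'C' for customer, 'M' for merchant)
df = pd.DataFrame({
    "nameOrig": ["C12345", "C67890", "M11111"],
    "nameDest": ["M22222", "C33333", "C44444"],
    "oldbalanceOrg": [10000.0, 500.0, 2500.0],
    "newbalanceOrig": [9000.0, 0.0, 2500.0],
})

# originType / destType: first character of the account ID
df["originType"] = df["nameOrig"].str[0]
df["destType"] = df["nameDest"].str[0]

# origBalanceDelta: difference between current and previous balance
df["origBalanceDelta"] = df["newbalanceOrig"] - df["oldbalanceOrg"]

# Transform and scale skewed numeric features (Yeo-Johnson handles
# zeros and negatives, then standardizes to zero mean / unit variance)
num_cols = ["oldbalanceOrg", "newbalanceOrig", "origBalanceDelta"]
df[num_cols] = PowerTransformer().fit_transform(df[num_cols])
print(df[["originType", "destType", "origBalanceDelta"]])
```

The categorical `originType`/`destType` columns would then be one-hot or label encoded before training.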

3. Resampling Techniques and Class Imbalance Handling

  • Initial Approach:

    • Addressed class imbalance by using only SMOTE (Synthetic Minority Over-sampling Technique) from the imblearn library.

    • While effective, SMOTE alone sometimes led to synthetic examples that weren't fully representative of the underlying patterns.

  • Enhanced Approach:

    • Multiple Oversampling Strategies:

      • In addition to SMOTE, other oversampling methods such as ADASYN or combined techniques like SMOTEENN/SMOTETomek were considered. These combinations better capture the diversity within minority class examples and help reduce overfitting on synthetic samples.

    • Result Effect:

      • With a more robust method for generating synthetic data, the model showed a significant jump in fraud precision (up to 96%) and overall better F1 scores for the fraud class.
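The project used the imblearn implementations of SMOTE, ADASYN, and the combined SMOTEENN/SMOTETomek resamplers. To show what these resamplers actually do, here is a simplified, pure NumPy/scikit-learn sketch of SMOTE's core idea: synthesize new minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. It is an illustration, not the library's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like_oversample(X, y, minority_label=1, n_new=100, k=5, seed=None):
    """Simplified SMOTE: create n_new synthetic minority samples by
    interpolating between a minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    # k+1 neighbours because each point's nearest neighbour is itself
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)            # random minority points
    neigh = idx[base, rng.integers(1, k + 1, n_new)]     # one of their neighbours
    gap = rng.random((n_new, 1))                         # interpolation factor
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    X_out = np.vstack([X, X_new])
    y_out = np.concatenate([y, np.full(n_new, minority_label)])
    return X_out, y_out

# Usage: 200 majority vs 20 minority samples, then add 80 synthetic ones
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 4)), rng.normal(loc=3.0, size=(20, 4))])
y = np.array([0] * 200 + [1] * 20)
X_res, y_res = smote_like_oversample(X, y, n_new=80, seed=0)
print(X_res.shape, int((y_res == 1).sum()))
```

ADASYN follows the same interpolation scheme but generates more samples near hard-to-learn boundary points, and the SMOTEENN/SMOTETomek combinations additionally clean away ambiguous samples after oversampling.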

4. Model Parameter Tuning and Threshold Optimization

  • Initial Script:

    • Performed model training with a relatively straightforward parameter tuning approach using GridSearchCV to adjust XGBoost hyperparameters.

  • Enhanced Script:

    • Refined Hyperparameter Search:

      • The enhanced version experimented with a broader range of hyperparameters, tailoring the grid search to optimize models for detecting the minority fraud class.

    • Optimal Threshold Adjustment:

      • Rather than relying on the default probability threshold (0.5), the enhanced model computes an optimal threshold based on maximizing the F1 score.

        • This is achieved by generating a precision-recall curve and selecting the threshold where the F1 score (the harmonic mean of precision and recall) is maximized.

      • This critical adjustment reduced false positives substantially, ensuring that when the model flags a transaction as fraudulent, it is much more likely to be correct.

The dual strategy of tuning both the hyperparameters and the decision threshold led to much better-balanced performance—especially critical in business scenarios where the cost of false alarms (and thus manual review) must be minimized.
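The threshold-selection step can be sketched with scikit-learn's precision-recall curve. A logistic regression on synthetic data stands in here for the tuned XGBoost model; the F1-maximization logic is the same regardless of which classifier produces the probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Toy imbalanced problem (~3% positives) standing in for the fraud data
X, y = make_classification(n_samples=4000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Sweep all candidate thresholds and pick the one maximizing F1
prec, rec, thr = precision_recall_curve(y_te, proba)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best = np.argmax(f1[:-1])  # last precision/recall point has no threshold
print(f"optimal threshold: {thr[best]:.3f}, F1 at that threshold: {f1[best]:.3f}")

# Classify with the tuned threshold instead of the default 0.5
preds = (proba >= thr[best]).astype(int)
```

For a fraud class this rare, the F1-optimal threshold typically differs substantially from 0.5, which is exactly why tuning it cuts false positives so sharply.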

5. Improvement Summary

We improved the model significantly by:

  1. Better Data Processing: Enhanced the way we prepare and clean our transaction data

  2. Feature Engineering: Created more meaningful patterns for the model to learn from

  3. Advanced Parameter Tuning: Fine-tuned the model's settings for optimal performance

Business Impact

This improvement means:

  • Reduced Operational Costs: Fewer false alarms mean less time spent on manual reviews

  • Better Customer Experience: Fewer legitimate transactions flagged incorrectly

  • Improved Fraud Prevention: More accurate detection of actual fraud

  • Scalable Solution: Handles large transaction volumes efficiently

Technical Implementation

  • Built using Python and the XGBoost algorithm

  • Data stored and processed in BigQuery

  • Interactive visualization through Looker Studio

  • Based on a comprehensive financial transaction dataset

Looking Forward

This project demonstrates how machine learning can be practically applied to solve real-world business problems. The dramatic improvement from our initial to enhanced model shows the importance of continuous refinement and optimization in machine learning projects.