Poster in Workshop: Advances in Financial AI: Opportunities, Innovations, and Responsible AI
Rethinking tabular synthetic data generation for improving financial fraud detection: new challenges in the banking scenarios
Dae-Young Park · In-Young Ko
Tabular synthetic data generation has become crucial for more accurate financial fraud detection in the banking sector, especially where data privacy regulations such as the General Data Protection Regulation (GDPR) restrict access to original datasets. In this study, we investigate and analyze two critical yet unexplored challenges that hinder the effectiveness of financial fraud detection models trained on generated tabular synthetic data. First, we define the TSDG challenge, in which the performance of fraud detection models trained on tabular synthetic data declines significantly as the intensity of two key data characteristics increases: high data sparsity and a large number of attributes with non-normal distributions. This indicates that existing generative models fail to capture the structural complexities of financial transactions. Second, we define the ALFA challenge, which stems from irregularly recurring temporal patterns, termed active lifetimes of fraudulent activities, during which fraud frequency and intensity are heightened. Fraud detection models suffer from increased false positives and reduced true positive rates during these active lifetimes. Through extensive empirical studies on both private and public banking datasets, we demonstrate that existing tabular synthetic data generative models suffer from the TSDG challenge. We also reveal that fraud detection models suffer from the ALFA challenge. Our findings underscore the need for novel tabular synthetic data generation approaches and financial fraud detection models that directly address these two challenges, paving the way for more robust financial fraud detection applications in banking scenarios.
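The quantities named in the abstract can be made concrete with a minimal sketch. The abstract does not say how the authors measure them, so everything here is an assumption for illustration only: sparsity is taken as the fraction of zero-valued cells, an attribute is flagged as non-normal when its Jarque-Bera statistic exceeds the 5% chi-square critical value (2 degrees of freedom), and detection quality is tracked per fixed-size time window as a true positive rate and a false positive rate. The function names `sparsity`, `jarque_bera`, `count_non_normal`, and `window_rates` are hypothetical.

```python
import math

# Chi-square critical value (2 degrees of freedom) at the 5% level.
JB_CRITICAL_5PCT = 5.991


def sparsity(rows):
    """Fraction of cells equal to zero (an assumed definition of sparsity)."""
    cells = [v for row in rows for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return math.inf  # constant column: treated as non-normal in this sketch
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def count_non_normal(rows):
    """Number of columns whose Jarque-Bera statistic exceeds the 5% threshold."""
    columns = zip(*rows)
    return sum(1 for col in columns if jarque_bera(col) > JB_CRITICAL_5PCT)


def window_rates(labels, preds, window):
    """Per-window (TPR, FPR) over consecutive chunks of `window` transactions.

    A rate is None when its denominator is zero (no positives or no negatives).
    """
    rates = []
    for start in range(0, len(labels), window):
        y = labels[start:start + window]
        p = preds[start:start + window]
        tp = sum(1 for a, b in zip(y, p) if a == 1 and b == 1)
        fn = sum(1 for a, b in zip(y, p) if a == 1 and b == 0)
        fp = sum(1 for a, b in zip(y, p) if a == 0 and b == 1)
        tn = sum(1 for a, b in zip(y, p) if a == 0 and b == 0)
        tpr = tp / (tp + fn) if (tp + fn) else None
        fpr = fp / (fp + tn) if (fp + tn) else None
        rates.append((tpr, fpr))
    return rates
```

Under these assumptions, one could compute `sparsity` and `count_non_normal` on both a real table and its synthetic counterpart to see how strongly the two characteristics are present, and compare `window_rates` inside and outside suspected active lifetimes. Other sparsity definitions or normality tests would serve equally well.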