
Abstract

This study applies machine learning (ML) techniques to predict fines imposed by the Mutual Fund Dealers Association of Canada on investment advisors who violate securities laws. Anchored in deterrence theory, the research evaluates whether fine allocation reflects proportionality, consistency, and severity. Using probabilistic and ML models with feature selection and extraction methods like PCA and RFE, the study identifies key predictors of fines. Results show that investigation costs and commissions are consistent predictors, while more serious offenses like quasi-criminal violations have limited influence. These findings raise concerns about regulatory leniency and the adequacy of current fine structures in securities violation enforcement. The results of this study offer insights for a data-driven framework to improve fairness and effectiveness in regulatory enforcement.

Keywords

analytical models, law, machine learning, standards, logistics, deep learning, LSTM, SciKit, keras, tensorflow, legal text analytics

Introduction

Investment fraud in Canada has intensified over the past decade, raising concerns about the effectiveness of regulatory enforcement as a service provided by self-regulatory organizations (SROs). Securities violations have resulted in over $45 million in fines (Canadian Broadcasting Corporation, 2020), yet fraudulent activities continue to undermine investor confidence and market stability. High-profile cases, such as the Montreal-based Ponzi scheme that defrauded investors of over $500 million and the InvestTech FX fraud that deceived nearly 2000 investors worldwide, highlight critical gaps in regulatory oversight (Canadian Broadcasting Corporation, 2021). These incidents expose vulnerabilities in the financial sector and reinforce the need for a robust, data-driven regulatory approach to ensure compliance and protect investors. The ability of SROs to effectively allocate fines is not only a matter of enforcement but also a key aspect of service management in financial regulation, where fairness, consistency, and deterrence play a central role in maintaining trust in the industry.

Despite ongoing regulatory efforts by the Canadian Securities Administrators (CSA) and provincial securities regulators, the persistent nature of investment fraud raises questions about the adequacy of existing enforcement mechanisms. Complex fraud schemes often exploit offshore structures and target vulnerable investors, challenging the capacity of regulators to administer penalties that serve as effective deterrents. As a service function, regulatory enforcement should aim to ensure consistency in fine allocation and maintain proportionality in sanctioning offenders. However, concerns persist regarding whether fines reflect the severity of violations and whether penalties are applied equitably. A critical aspect of financial service management is understanding whether regulatory decisions align with deterrence theory, while remaining proportionate to the nature and extent of the misconduct.

This study applies machine learning algorithms to predict fines imposed by the Mutual Fund Dealers Association of Canada (MFDA), an SRO responsible for overseeing mutual fund dealers in Canada. The MFDA plays a crucial role in financial service management by enforcing ethical standards and compliance within the mutual fund industry. However, critics argue that SRO-imposed fines are often inconsistent and may not adequately reflect the actual harm caused to investors (Anand, 2018; Boyle et al., 2024; M. Lokanan & Masannagari, 2021). Grounded in deterrence theory, this research addresses the following question: How effectively can machine learning algorithms predict fines imposed by the MFDA, and to what extent do these fines align with the principles of proportionality and severity central to deterrence theory? The study pursues two key objectives:

(1) To develop machine learning models that predict fines imposed by the MFDA based on offense severity and other key case characteristics.

(2) To evaluate whether fines are proportionate and consistent with deterrence theory, ensuring that enforcement strategies promote regulatory fairness and deter future violations.

By positioning regulatory enforcement as a key component of financial service management, this study provides insights into how data-driven models can enhance the transparency, consistency, and effectiveness of fine allocation within the securities industry.

This study contributes to the literature on artificial intelligence (AI) in regulatory services by demonstrating how machine learning can enhance efficiency, transparency, and consistency in financial crimes enforcement. First, while previous studies have applied machine learning to detect financial crimes (M. Lokanan & Sharma, 2024) and predict dispute resolution outcomes (Fonseca, 2023; Wen & Ti, 2024), limited work has focused on predicting regulatory fines. Addressing this gap, the study applies machine learning to analyze fine allocation by the MFDA, offering new insights into the proportionality and fairness of financial sanctions.

Second, while prior research has primarily employed singular machine learning (ML) experiments to predict penalties in dispute resolution (M. Lokanan & Sharma, 2024; Ruohonen & Hjerppe, 2020; Wen & Ti, 2024), this study adopts a multi-experimental approach that systematically compares a range of statistical and ML models. Traditional benchmarks were established using Ordinary Least Squares (OLS) and Weighted Least Squares (WLS) regression models. Dimensionality reduction techniques, including Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), were used to improve model interpretability, while Recursive Feature Elimination with Cross-Validation (RFECV) helped refine variable selection. The study also evaluates several ML models—including Linear Regression, Ridge, Lasso, Elastic Net, Random Forest, Decision Tree, and K-Nearest Neighbors (KNN)—to capture both linear and non-linear relationships in fine allocation. This comprehensive framework provides regulatory agencies with a data-driven methodology to integrate artificial intelligence into enforcement processes and enhance the efficiency and consistency of compliance monitoring.

The remainder of this paper is structured as follows. First, the literature review discusses deterrence theory as a framework for evaluating fine proportionality. Second, the experimental setup describes data collection, feature engineering, and the statistical and machine learning methods used. Third, findings from both baseline statistical models and machine learning algorithms are presented, highlighting key predictors and model performance. Fourth, the discussion interprets results in relation to deterrence theory and enforcement practices. The paper concludes by summarizing key contributions, outlining limitations, and suggesting future research directions to improve fine allocation in SRO regulatory processes.

Literature Review

Regulatory Enforcement and the Case for Data-Driven Fairness in Securities Penalties

Securities regulation in Canada is overseen by the Canadian Securities Administrators (CSA), with enforcement responsibilities delegated to provincial regulators and SROs, particularly the Canadian Investment Regulatory Organization (CIRO). CIRO—formed through the merger of the MFDA and IIROC—regulates investment and mutual fund dealers, overseeing licensing, compliance, enforcement, and investor protection (CIRO, 2024). Common offenses include fraud, insider trading, market manipulation, and disclosure violations (Anand, 2018; Russell & Cheng, 2019). However, current fine structures have drawn criticism for being inconsistent and insufficiently punitive, failing to reflect the harm caused or deter future misconduct (Boyle et al., 2024; Canadian Broadcasting Corporation, 2020). Others have argued that disparities in penalties for similar offenses undermine public trust, leading to calls for more uniform enforcement (Anand, 2018; Canadian Broadcasting Corporation, 2020; Russell & Cheng, 2019).

Fairness and consistency are essential for maintaining regulatory legitimacy (Agranov & Buyalskaya, 2022; Davies & Malik, 2022). Studies show that SRO-imposed penalties often fail to match the financial harm caused, with similar offenses receiving vastly different sanctions (Boyle et al., 2024; M. Lokanan & Sharma, 2024; Tuch, 2014). Subjective factors—such as an offender’s financial status or jurisdiction—also influence outcomes (Wen & Ti, 2024). Excessive reliance on post-offense mitigating factors further weakens proportionality-based penalties (Ghafoor et al., 2022; M. Lokanan & Masannagari, 2021). These inconsistencies raise concerns about regulatory bias and leniency, prompting calls for the adoption of computational techniques to improve fairness and consistency in enforcement practices. Recent research supports the use of data-driven tools to improve fairness in enforcement practices, arguing that machine learning models can systematically identify offense-relevant features, reducing human bias and enhancing consistency (Metsker et al., 2019; Ruohonen & Hjerppe, 2020). More recent research shows that machine learning’s predictive capabilities make it particularly effective for identifying the factors associated with securities law violations, addressing concerns around arbitrary or inconsistent fines (M. E. Lokanan & Sharma, 2022).

Machine Learning in Predictive Regulatory and Legal Applications

Machine learning has gained considerable traction in regulatory and legal domains for its potential to enhance predictive accuracy and decision-making efficiency. Machine learning has been used to predict tax rulings, outperforming traditional regression methods (Alarie et al., 2016). Support vector machine (SVM) classifiers have been applied to forecast human rights decisions from the European Court of Human Rights, achieving strong accuracy for past cases (Medvedeva et al., 2020). Similarly, K-Nearest Neighbors (KNN) and ensemble models have proven effective in dispute resolution, supporting fair and consistent rulings (M. E. Lokanan, 2023).

Advanced machine learning methods, including deep learning and natural language processing (NLP), have significantly enhanced predictive tools in legal contexts. Researchers have employed logistic regression, SVMs, convolutional neural networks (CNNs), and long short-term memory (LSTM) models to accurately predict court judgments (Shelar & Moharir, 2021). Others demonstrated that deep learning algorithms could outperform human experts in predicting appeal outcomes in Brazilian courts (Jacob de Menezes-Neto & Clementino, 2022). Systematic reviews further support these findings, showing that classifiers such as SVMs, classification and regression trees (CART), and boosted models consistently achieve high predictive performance (Rosili et al., 2021). Rosili et al. (2021) argued that the predictive accuracy is particularly strong when multiple algorithms are combined in ensemble models.

Despite the success of machine learning techniques in regulatory enforcement, limitations regarding tuning, model transparency, and interpretability persist. Machine learning lacks the higher-order reasoning required for nuanced legal decisions (Markou & Deakin, 2019; Surden, 2021) and often functions as a “black box,” limiting transparency (Bhambhoria et al., 2021; Možina et al., 2005). Critics also note machine learning’s limitations in replicating legal intuition, arguing that while these models can identify patterns and correlations in data, they often lack the contextual understanding, moral reasoning, and interpretive flexibility that human legal experts apply when making decisions (Osei Bonsu, 2020). However, most legal work involves service delivery, not just courtroom argumentation, making machine learning a valuable tool for augmenting efficiency and reducing errors in legal services (Brockman et al., 2002; Sheldon & Krieger, 2014).

The Use of “Prediction” in Machine Learning Models

The term prediction can carry varied meanings depending on the context. In common usage or metacognitive reasoning, it may suggest intuition, guesswork, or subjective inference. However, in data science and machine learning, prediction refers to a rigorous computational process in which algorithms estimate outcomes—such as regulatory fines—based on input features derived from structured data (Jacob de Menezes-Neto & Clementino, 2022; M. Lokanan & Sharma, 2024). The goal is not speculation, but accuracy and generalizability, achieved through empirically validated models optimized to minimize error.

In this study, prediction is used within the framework of supervised machine learning to evaluate whether fine amounts can be reliably estimated from observable case characteristics. Predictive performance, assessed using measures such as root mean square error (RMSE), serves as an objective benchmark for comparing algorithms and validating model accuracy. Unlike explanatory models focused on causal inference, predictive models help identify systematic patterns in enforcement decisions, offering insights into the consistency and proportionality of sanctions. Used alongside explanatory techniques, predictive modeling offers a complementary approach for assessing regulatory fairness in fine allocation.
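For reference, RMSE is conventionally defined as follows. The notation is an assumption introduced here for illustration: m is the number of cases, y_i the observed fine for case i, and ŷ_i the model's predicted fine.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}
```

Because the errors are squared before averaging, large misses on individual cases weigh more heavily than many small ones, and the result is expressed in the same units as the fines themselves.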

Theoretical Framework: Deterrence Theory

Deterrence theory posits that the threat of punishment deters individuals from engaging in undesirable behaviors (Natarajan, 2016; Piquero et al., 2011; Rorie & West, 2022). Initially grounded in rational choice models—where actors weigh costs and benefits (Matsueda et al., 2006; Paternoster, 1989; Piliavin et al., 1986)—the theory has since evolved to consider psychological, social, and situational influences on behavior. Current work emphasizes the importance of the certainty, severity, and swiftness of sanctions (Abramovaite et al., 2023; Buckenmaier et al., 2021; Earnhart & Friesen, 2023; Roche et al., 2020) and integrates informal controls such as moral norms and peer influence (Heitkamp & Mowen, 2024; Homer & Maume, 2024; Kim et al., 2019; Wang et al., 2019).

Despite its ascendancy in explaining crime and deviance, deterrence theory has been subjected to criticism. One of the fundamental propositions of deterrence theory is that prospective offenders are actors who calculate the costs and benefits of their actions (Abramovaite et al., 2023; Piquero et al., 2011). However, critics argue that many decisions involve non-rational decision-making (or impulsive decision-making), where individuals act without premeditation due to immediate situational pressures, such as sudden financial strain or retaliatory violence in response to an insult. These crimes often occur in moments of intense emotional arousal or distress, leaving little room for the calculation of the net utility of potential risks and rewards (Mulder, 2018; Paternoster, 1989).

Others argue that deterrence theory overlooks how social, cultural, and economic contexts shape individuals’ responsiveness to sanctions (Kahan, 1997; Makkai & Braithwaite, 1994; Mulder, 2018). Company directors under economic or political strain may view misconduct as a necessity rather than a choice, with external pressures creating organizational stress that drives behavior beyond what deterrence theory can explain (Kahan, 1997; Makkai & Braithwaite, 1994). That said, others have shown that deterrence theory incorporates variables to address socio-economic factors that mitigate some of the concerns of classical deterrence theory (Auriol et al., 2022; Spalding, 2014).

Deterrence theory serves as a valuable framework for evaluating the consistency and proportionality of fines imposed to deter misconduct. The present study utilizes this framework in conjunction with machine learning techniques to assess the alignment of fine allocation with the severity of offenses. However, there are still important gaps in the current research: earlier studies have not used machine learning to look at fines imposed by Canadian SROs, and the use of deterrence theory for financial penalties has not been thoroughly examined. Furthermore, few studies empirically evaluate the fairness of fine allocation through predictive methodologies. By integrating machine learning with deterrence theory, this study provides novel insights into the consistency and proportionality of regulatory enforcement within the securities sector.

Experimental Setting

Data Source

Data for this paper come from cases heard by the MFDA tribunal between 2005 and 2019. The year 2005 marks the first set of data available, and we stop at 2019 to ensure that sufficient time has passed to allow for the publication and completeness of case outcomes. Rather than collecting data on a sample of the cases, we decided to code the entire population of cases available on the MFDA website. Sampling the data could have led to larger Canadian provinces being overrepresented in the database, introducing bias into the analysis. While we cannot guarantee that all cases heard by the MFDA are published online, we made extensive efforts to ensure that every available case on the MFDA website was coded for analysis. After cleaning the data to remove irrelevant entries such as news releases, procedural motions, and adjourned cases, we collected a total of 625 cases heard by the MFDA regulatory tribunal across Canada. The dataset includes detailed information on the infractions committed by individual offenders and the penalties imposed for those infractions, forming a comprehensive basis for the analysis.

Variables and Measurements

The dependent variable in this study is the total fines imposed by the MFDA tribunals for violations of securities law, which serves as a measure of punishment severity—a fundamental aspect of deterrence theory. The fines imposed on offenders reflect the gravity of the offense and various contextual factors that affect penalties. Aggravating factors, such as prior misconduct or investor harm, are likely to result in increased fines, whereas mitigating factors, such as cooperation or remedial actions, may lead to reduced fines. The range of penalties varies from $5000 to over $1 million, fulfilling both punitive and deterrent functions by promoting compliance and safeguarding investors’ interests.

Table 1 presents the independent variables utilized in this study. The certainty of punishment, defined as the perceived likelihood of a sanction being applied, is addressed through the use of proxy variables. The type of hearing indicates whether a formal proceeding occurred, suggesting regulatory action and the likelihood of sanctions. The offender’s appearance conveys transparency and cooperation, both of which are associated with an increased probability of enforcement. Disciplinary history reflects prior infractions, indicating ongoing regulatory scrutiny and heightened attention.

Table 1.

Variables and Measurements.

Variable name	Description	Measurement
District council	The regional council where the case was heard	Categorical
Type of hearing	The nature or type of tribunal hearing	Categorical
Offender appearance	Whether the offender appeared in the tribunal	Numeric: (1 = appeared, 0 = not appeared)
Number of clients	Total number of clients impacted by the offense	Numeric: Count of clients
Total lost	Total financial loss suffered by clients due to the offense	Numeric: Monetary value (in dollars)
Total invested	Total funds invested by clients with the offender	Numeric: Monetary value (in dollars)
Commission	Commission earned by the offender from the offense	Numeric: Monetary value (in dollars)
Offender occupation	The profession or role of the offender at the time of the offense	Numeric: Encoded categories
Firm type	The type of firm associated with the offender	Numeric: Encoded categories
Offender experience	The number of years of experience the offender had in their occupation	Numeric
Offender gender	The gender of the offender	Categorical (Male or Female)
Disciplinary history	Whether the offender had a prior disciplinary history	Categorical (1 = Yes, 0 = No)
Quasi criminal	Indicator for quasi-criminal offenses	Categorical (1 = Yes, 0 = No)
COI	Indicator for conflict-of-interest offenses	Categorical (1 = Yes, 0 = No)
ISP	Indicator for insider trading offenses	Categorical (1 = Yes, 0 = No)
ICO	Indicator for improper conduct offenses	Categorical (1 = Yes, 0 = No)
Misconduct	Indicator for general misconduct offenses	Categorical (1 = Yes, 0 = No)
Aggravating factors	Number of factors that aggravated the severity of the offense	Numeric
Mitigating factors	Number of factors that reduced the severity of the offense	Numeric

Swiftness of punishment, or celerity, pertains to the promptness with which sanctions are administered following the detection of an offense. Direct measurement of swiftness was not feasible in the current study due to the lack of timestamped data indicating the timeline from offense detection to penalty imposition. Nevertheless, variables such as investigation costs and type of hearing were included as indirect proxies for procedural pace. The underlying rationale is that more streamlined and cost-effective enforcement processes may suggest a quicker resolution timeline, thereby reinforcing the perceived swiftness of punishment and enhancing the overall deterrent effect.

Additional variables were incorporated to control for contextual, economic, and demographic influences. District council, offender gender, occupation, firm type, and experience account for jurisdictional and individual-level variations. Financial variables, including total invested, total lost, and commission, serve to control for case magnitude, while the number of clients captures the scale of impact. Offense-type indicators, such as quasi-criminal activities, conflicts of interest, insider trading, improper conduct, and general misconduct, reflect normative distinctions that may influence the severity of fines.

Data Cleaning and Preprocessing

Addressing Missing Values

A structured approach was used to handle missing data while preserving analytical validity. Variables with under 30% missing values—such as “Number of Clients” (10%), “Total Lost” (22%), and “Total Invested” (15%)—were retained and imputed using KNN imputation, which estimates missing values based on similar cases (Cismondi et al., 2013; M. Lokanan & Sharma, 2024). KNN preserves relationships in the data and is suitable for moderate, non-random missingness. Variables with minimal missingness—like “Type of Hearing” (1%) and “Offender Gender” (4%)—were similarly imputed with negligible impact. Despite a higher missing rate (62%), “Commission” was retained due to its relevance in predicting fines. KNN imputation enabled plausible estimation by leveraging related features, minimizing data loss and maintaining dataset robustness.
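The imputation step described above can be sketched with scikit-learn's KNNImputer. This is a minimal illustration, not the authors' code: the toy values, the column order, and n_neighbors=2 are assumptions chosen for readability, since the study does not report the neighbor count it used.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix (hypothetical values); np.nan marks missing entries.
# Columns: number_of_clients, total_lost, total_invested, commission
X = np.array([
    [2.0, 10_000.0, 50_000.0, 1_200.0],
    [3.0, np.nan, 60_000.0, np.nan],
    [1.0, 5_000.0, np.nan, 800.0],
    [8.0, 90_000.0, 400_000.0, np.nan],
    [7.0, 85_000.0, 380_000.0, 9_500.0],
])

# n_neighbors=2 is an illustrative choice, not a value reported in the study.
imputer = KNNImputer(n_neighbors=2, weights="uniform")

# Each missing entry is replaced by the mean of that feature over the
# nearest rows, with distances computed on the features both rows share.
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

Observed values pass through unchanged; only the missing cells are filled. In practice the features would likely need to be on comparable scales first, because unscaled monetary columns dominate the distance calculation.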

Variable Encoding and Feature Engineering

Categorical variables were transformed using OneHotEncoder to make them compatible with machine learning algorithms. “Type of Hearing” and “Offender Gender” were converted into numerical format to ensure accurate interpretation by the models. New features were created to reveal deeper relationships within the data. For instance, the “Total Lost / Total Invested” ratio captured the proportional financial impact of offenses, while counts of aggravating and mitigating factors reflected severity. The ten district councils were grouped into three regional categories—Central, Western, and Atlantic Canada—to streamline jurisdictional analysis. These engineered features improved the model’s ability to detect meaningful patterns in the data.

Outlier Detection and Treatment

Outliers were identified using interquartile ranges (IQR) and z-scores to ensure robust preprocessing of the data. The IQR method flagged values in “Total Lost,” “Total Invested,” and “Commissions” as outliers. These features naturally involve large monetary values, and extreme values may reflect legitimate high-severity cases rather than anomalies. Given the dataset’s size (600 observations), such occurrences are expected and were not removed without further consideration.

To validate these findings, z-scores were also calculated. Outliers detected using z-scores were relatively few—ranging from 0 to 16 per variable—which is acceptable for a dataset of this scale. While z-scores assume normal distribution, which may not hold for all features, they helped cross-check the distribution of extreme values and confirmed that most flagged points were not errors but valid extremes (Chikodili et al., 2021). To minimize the influence of these values on model performance, a robust scaler was employed. Unlike z-score normalization or min-max scaling, which are sensitive to outliers, the robust scaler algorithm transforms data by centering on the median and scaling by the IQR. Scaling the data ensures that features like “Total Invested” and “Commissions” remain informative while preventing distortion due to outlier influence (Ozkara et al., 2023). The result is a dataset that preserves critical information while maintaining modeling stability.
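The two outlier checks and the robust scaling step can be sketched in NumPy as follows. The data are synthetic, and the 1.5 × IQR fences and the |z| > 3 cutoff are conventional thresholds assumed here; the study does not report which thresholds it applied.

```python
import numpy as np

# Synthetic "total lost" values: 20 typical cases plus one extreme case.
total_lost = np.append(np.arange(1_000.0, 21_000.0, 1_000.0), 500_000.0)

# IQR rule: flag values beyond 1.5 * IQR from the quartiles (assumed fences).
q1, q3 = np.percentile(total_lost, [25, 75])
iqr = q3 - q1
iqr_outliers = (total_lost < q1 - 1.5 * iqr) | (total_lost > q3 + 1.5 * iqr)

# z-score rule: flag values more than 3 standard deviations from the mean.
z_scores = (total_lost - total_lost.mean()) / total_lost.std()
z_outliers = np.abs(z_scores) > 3

# Robust scaling: center on the median and divide by the IQR, so the
# extreme case does not determine the scale of the typical cases.
robust_scaled = (total_lost - np.median(total_lost)) / iqr

print(int(iqr_outliers.sum()), int(z_outliers.sum()))
print(robust_scaled[:3], robust_scaled[-1])
```

The extreme case stays in the data after scaling, but the typical cases keep a usable spread because neither the median nor the IQR is moved much by a single large value. With default settings, scikit-learn's RobustScaler applies the same median and IQR transformation.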

Multicollinearity and Variance Inflation Factor

As shown in Figure 1, none of the features were highly correlated with each other, indicating that multicollinearity may not be a widespread issue in the dataset. However, specific variable relationships, such as those with moderate correlations (i.e., > .50), should still be carefully evaluated to ensure robust modeling. Note also that the correlation matrix only displays pairwise relationships between independent variables, providing a limited perspective on multicollinearity. In contrast, the Variance Inflation Factor (VIF) is a much more robust measure because it quantifies how strongly an independent variable is related to all other variables in the dataset and provides a comprehensive view of how the features interact, helping to address potential issues that might impact model stability and interpretability.

Figure 1.

Correlation matrix.

As shown in Table 2, all but two of the features exhibit a VIF below 5, with most values remaining well below 3. While a VIF between 5 and 10 may indicate moderate multicollinearity, values below 5 are generally considered acceptable, suggesting that multicollinearity is not a significant concern in this dataset (Salmerón et al., 2016). The two highest VIF values, “District council_Central Canada” (5.46) and “District council_Western Canada” (5.17), marginally exceed 5 but remain within a manageable range. Keeping these features preserves the information in the dataset and keeps it suitable for regression analysis.

Table 2.

Variance Inflation Factors.

Features	VIF
Offender appearance	1.36183
Number of clients	1.30681
Total lost	1.39350
Total invested	1.27602
Commission	1.48381
Offender occupation	1.42662
Firm type	1.35413
Offender experience	1.19235
Disciplinary history	1.25322
Cost	2.18638
Quasi criminal	1.41608
COI	1.66874
ISP	1.29071
ICO	1.50678
Misconduct	1.33505
Aggravating factors	1.98048
Mitigating factors	2.00495
District council_Central Canada	5.46492
District council_Western Canada	5.17013
Type of hearing_Terminated	1.12152
Type of hearing_Trial	2.55874
Offender gender_Male	1.20968
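To make the VIF calculation concrete, the sketch below computes it from its definition: each feature is regressed on all the others, and VIF = 1 / (1 − R²). The helper name variance_inflation_factors and the synthetic data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on an intercept plus all remaining features.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        vifs[j] = ss_tot / ss_res
    return vifs

# Synthetic check: c is almost a copy of a, b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

The near-duplicate pair (a, c) receives large VIFs while the independent feature b stays close to 1, which is the pattern the thresholds discussed above are meant to detect. The statsmodels library offers a comparable helper, variance_inflation_factor, in statsmodels.stats.outliers_influence.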

Splitting Data for Analysis

To ensure the validity and reliability of the models, multiple data-splitting techniques were employed. A standard train/test split was performed, allocating 80% of the data for training and 20% for testing to evaluate model performance on unseen data. Additionally, five-fold cross-validation was applied to further enhance robustness by training and testing the models on different subsets of the data. These approaches reduce the risk of overfitting and provide a comprehensive assessment of model performance: the train/test split compares in-sample and out-of-sample results, while five-fold cross-validation assesses performance across different subsets of the data.
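A minimal sketch of the two splitting strategies, assuming scikit-learn. The synthetic data, the random seeds, the use of shuffled folds, and the choice to run cross-validation on the full dataset are assumptions; the study does not specify these details.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic regression data standing in for the case features and fines.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=200)

# 80/20 hold-out split: fit on the training part, score on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
test_rmse = float(np.sqrt(np.mean((y_test - model.predict(X_test)) ** 2)))

# Five-fold cross-validation: every observation is held out exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
neg_mse = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error"
)
cv_rmse = np.sqrt(-neg_mse)

print(f"hold-out RMSE: {test_rmse:.3f}")
print(f"5-fold RMSE: mean {cv_rmse.mean():.3f}, std {cv_rmse.std():.3f}")
```

The spread of the five fold-level RMSE values gives a rough sense of how much a single hold-out estimate depends on which cases land in the test set.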

Parameter Tuning

Table 3 shows the parameters used to optimize the model. The hyperparameters for the models were tuned manually to optimize performance, as GridSearchCV proved computationally expensive and yielded minimal improvement. For Random Forest, parameters such as n_estimators=100, max_depth=None, and min_samples_split=2 were selected to balance complexity and generalization. Similarly, the Decision Tree used a limited depth (max_depth=8) and Gini impurity as the splitting criterion. Regularization parameters (alpha=1.0) were applied for Ridge, Lasso, and Elastic Net, with Elastic Net also using l1_ratio=0.5 to balance L1 and L2 penalties. KNN utilized n_neighbors=5 with uniform weights, while the SVM employed the RBF kernel and parameters like C=1.0 and gamma='scale'. XGBoost included settings like learning_rate=0.1, n_estimators=100, and max_depth=6 for effective boosting. Manual hyperparameter tuning ensured computational efficiency while achieving robust and reliable model performance tailored to the dataset.

Table 3.

Hyperparameters.

Algorithm	Hyperparameters
Random forest	n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=42
Decision tree	max_depth=8, min_samples_split=2, criterion='gini', random_state=42
Ridge regression	alpha=1.0, solver='auto'
Lasso regression	alpha=1.0, max_iter=1000
Elastic net	alpha=1.0, l1_ratio=0.5, max_iter=1000
K-nearest neighbors	n_neighbors=5, weights='uniform', metric='minkowski'
Support vector machine	kernel='rbf', C=1.0, gamma='scale', random_state=42
XGBoost	learning_rate=0.1, n_estimators=100, max_depth=6, min_child_weight=1, subsample=0.8
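As an illustration, some of the Table 3 settings could be instantiated in scikit-learn as shown below. The sketch covers only a subset of the listed algorithms, and it assumes the regressor variants of each estimator because the target (fine amount) is continuous; the source does not name the exact classes, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.neighbors import KNeighborsRegressor

# Hyperparameter values copied from Table 3; class choices are assumptions.
models = {
    "Ridge regression": Ridge(alpha=1.0, solver="auto"),
    "Lasso regression": Lasso(alpha=1.0, max_iter=1000),
    "Elastic net": ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=1000),
    "K-nearest neighbors": KNeighborsRegressor(
        n_neighbors=5, weights="uniform", metric="minkowski"
    ),
    "Random forest": RandomForestRegressor(
        n_estimators=100, max_depth=None, min_samples_split=2,
        min_samples_leaf=1, random_state=42,
    ),
}

# Synthetic data, used only to show that each configured model fits.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=120)

predictions = {name: m.fit(X, y).predict(X[:5]) for name, m in models.items()}
for name, pred in predictions.items():
    print(name, np.round(pred, 2))
```

Keeping the configurations in one dictionary makes it straightforward to run every model through the same splitting and scoring routine.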

Experiments and Algorithm Selection

Table 4 summarizes the experiments conducted and the algorithms employed to ensure robust and reliable results. The study incorporated four key types of experiments. First, dimensionality reduction techniques, namely PCA and SVD, were used to simplify the feature set and address multicollinearity, improving interpretability and model efficiency. Second, feature selection was performed using RFECV to identify the most predictive features, complemented by 5-fold cross-validation to validate the models across different data splits and reduce overfitting. Third, single machine learning experiments were conducted with algorithms such as Ridge, Lasso, Elastic Net, KNN, Random Forest, and Decision Tree to optimize their predictive performance. Finally, statistical models using OLS and WLS regression were implemented via Statsmodels to provide a baseline and assess performance with traditional regression techniques. These comprehensive experiments collectively ensured that the models were rigorously tested and optimized for reliability and accuracy.

Table 4.

Types of Experiments.

Algorithm type	Algorithms	Dimensionality reduction		Feature extraction and cross validation		Machine learning and statistical
Algorithm type	Algorithms	PCA	SVD	RFECV	CV	ML models	Statsmodel
Linear	OLS regression (Statsmodel)						X
	WLS regression						X
	OLS regression (SKlearn)	X	X	X	X	X
	Ridge	X	X	X	X	X
	Lasso	X	X	X	X	X
	Elastic net	X	X	X	X	X
Non-linear	KNN	X	X	X	X	X
Ensemble and tree-based	Random forest	X	X	X	X	X
Ensemble and tree-based	Decision tree	X	X	X	X	X

Dimensionality Reduction

After data preprocessing, the final dataset consisted of 21 features. To avoid redundancy, PCA and SVD were applied to reduce the dataset’s dimensionality. These feature reduction techniques improved model interpretability and efficiency by focusing on the most critical features, particularly in a high-dimensional dataset. Additionally, the use of these dimensionality reduction methods helped address potential multicollinearity issues among variables, ensuring that the models performed more robustly and reliably.
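The two reduction techniques can be sketched with scikit-learn as follows. The synthetic 21-column matrix, the 95% explained-variance target for PCA, and the choice of three SVD components are assumptions for illustration; the study does not report how many components it retained.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Synthetic data: 21 observed features driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 21))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 21))

# PCA: keep the fewest components explaining at least 95% of the variance
# (an illustrative threshold). PCA centers the data before decomposing it.
pca = PCA(n_components=0.95, svd_solver="full")
X_pca = pca.fit_transform(X)

# Truncated SVD: decomposes the matrix directly, without centering.
svd = TruncatedSVD(n_components=3, random_state=0)
X_svd = svd.fit_transform(X)

print(X_pca.shape, round(float(pca.explained_variance_ratio_.sum()), 4))
print(X_svd.shape)
```

Each retained component is a linear combination of the original features, so the reduced columns are uncorrelated with one another but no longer map one-to-one onto named case characteristics.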

Recursive Feature Elimination with Cross-Validation

RFECV was used to identify the most predictive variables. This method systematically removed irrelevant or redundant features, retaining only those with the strongest contributions to model performance. Feature selection improved the model’s accuracy and efficiency by reducing noise and focusing on the most critical predictors.
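The elimination loop behind this procedure can be sketched as follows. This is a simplified illustration, not the study's code: `rfe_with_cv`, `toy_importance`, and `toy_cv_score` are hypothetical names, and the two toy functions stand in for refitting a model and scoring it by cross-validation at each step.

```python
def rfe_with_cv(features, importance_fn, cv_score_fn, min_features=1):
    """Drop the least important feature each round; keep the best-scoring subset."""
    current = list(features)
    best_subset, best_score = list(current), cv_score_fn(current)
    while len(current) > min_features:
        importances = importance_fn(current)
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
        score = cv_score_fn(current)
        # ">=" prefers the smaller subset when scores tie.
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Hypothetical stand-ins for a fitted model's importances and a CV score.
INFORMATIVE_GAIN = {"cost": 0.8, "commission": 0.4}

def toy_importance(features):
    weights = {"cost": 0.5, "commission": 0.3, "noise_1": 0.05, "noise_2": 0.01}
    return {f: weights[f] for f in features}

def toy_cv_score(features):
    # Negative error: informative features lower the error, noise raises it.
    error = 4.0
    for f in features:
        error += -INFORMATIVE_GAIN[f] if f in INFORMATIVE_GAIN else 0.1
    return -error

selected, score = rfe_with_cv(
    ["cost", "commission", "noise_1", "noise_2"], toy_importance, toy_cv_score
)
```

In this toy setting the two noise features are removed and the two informative features are retained, since dropping either of them would worsen the cross-validated score.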

Linear Algorithms

As seen in Table 4, linear, non-linear, ensemble, and tree-based algorithms were used to identify patterns and provide insights into the data. Linear algorithms assume a linear relationship between the features and the target variable, making them interpretable and efficient for structured datasets. The general formula for linear algorithms is:

\hat{y} = B_{0} + B_{1} X_{1} + B_{2} X_{2} + \dots + B_{n} X_{n}

Where

• \hat{y} is the predicted value

• B_0 is the intercept

• B_1, B_2, …, B_n are the coefficients

• X_1, X_2, …, X_n are the independent variables

The regression models build on each other to provide a more robust and reliable understanding of the findings. OLS regression with statsmodels serves as the baseline model, estimating the coefficients and their p-values by minimizing the sum of squared residuals. OLS regression assumes homoscedasticity (constant variance of errors) and independence of the observations. In the presence of heteroscedasticity, WLS regression is a better choice (Halunga et al., 2017; Romano & Wolf, 2017). WLS regression addresses heteroscedasticity by weighting observations according to their error variances, which makes it more suitable for datasets where error variances differ across observations (Funke et al., 2021; Zafar & Aslam, 2023). Ridge and Lasso regression build upon the OLS and WLS models by adding a penalty on the coefficients. Ridge applies L2 regularization, which penalizes large coefficients to reduce the risk of overfitting while keeping all the variables in the model. Lasso applies L1 regularization, which not only limits overfitting but also performs feature selection by shrinking some coefficients to exactly zero. Elastic Net regression combines the L1 and L2 penalties, balancing the strengths of Ridge and Lasso: it provides both shrinkage and selection of relevant predictors while mitigating multicollinearity, which makes it particularly effective for datasets with highly correlated or numerous features.
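How the penalties differ can be seen in a minimal sketch of the objective each model minimizes. The function names, the penalty strengths `l1` and `l2`, and the toy numbers are hypothetical; library implementations also scale the terms differently, so this shows the structure of the objectives rather than any particular package's definition.

```python
def predict(intercept, coefs, x):
    """Linear prediction: intercept plus the weighted sum of the features."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def penalized_loss(y_true, y_pred, coefs, l1=0.0, l2=0.0):
    """Residual sum of squares plus L1 (Lasso) and L2 (Ridge) penalties on the slopes."""
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    l1_penalty = l1 * sum(abs(b) for b in coefs)
    l2_penalty = l2 * sum(b * b for b in coefs)
    return rss + l1_penalty + l2_penalty

# Hypothetical toy problem with two features.
intercept, coefs = 1.0, [2.0, -1.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [3.0, 0.0, 2.5]
preds = [predict(intercept, coefs, row) for row in X]

ols_loss = penalized_loss(y, preds, coefs)                    # no penalty
ridge_loss = penalized_loss(y, preds, coefs, l2=0.1)          # L2 only
lasso_loss = penalized_loss(y, preds, coefs, l1=0.1)          # L1 only
enet_loss = penalized_loss(y, preds, coefs, l1=0.1, l2=0.1)   # both penalties
```

Setting both strengths to zero recovers the OLS objective, while the Elastic Net objective adds both penalty terms to the same residual sum of squares.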

Non-Linear Algorithms

Non-linear algorithms capture complex relationships between the features and the target variable that cannot be modeled by linear methods. A commonly used non-linear algorithm for regression tasks is KNN, which predicts the target for an observation as the average target value of its k nearest neighbors. KNN was chosen for this project because it is one of the methods used in the literature on regulatory enforcement. The formula for KNN is given as:

\hat{y} = \frac{1}{k} \sum_{i \in N_{k} (x)} y_{i}

Where

• N_k(x) represents the set of k-nearest neighbors of x

• y_i is the observed target value of neighbor i

• k is the number of neighbors
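The KNN formula translates directly into code. The sketch below assumes Euclidean distance, which the text does not specify, and uses hypothetical toy data; `knn_predict` is an illustrative name.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    # Pair each training target with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical toy data: three points near the origin, two far away.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0, 12.0]
prediction = knn_predict(train_X, train_y, [0.2, 0.1], k=3)
```

Here the three nearest neighbors are the points near the origin, so the prediction is the mean of their targets and the two distant observations have no influence.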

Ensemble and Tree-Based Algorithms

Ensemble and tree-based algorithms are more robust at finding patterns in non-linear data and were used to improve and optimize the performance of the models. Decision Trees split the data recursively on feature values, choosing at each node the split that minimizes an impurity measure, here the Gini impurity. The formula for the Gini impurity is given as:

G = 1 - \sum_{i = 1}^{k} p_{i}^{2}

Where

• p_i is the proportion of samples in class i

• k is the number of classes
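A short sketch of the Gini formula and of how a candidate split is scored follows. The class labels are hypothetical, and the sketch follows the formula as stated in the text; for a continuous target, tree implementations commonly use a variance-based criterion instead.

```python
from collections import Counter

def gini_impurity(labels):
    """G = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Hypothetical labels: fines binned into "low" and "high".
parent = ["low", "low", "high", "high"]
mixed_parent = gini_impurity(parent)                              # evenly mixed node
perfect_split = split_impurity(["low", "low"], ["high", "high"])  # pure children
```

An evenly mixed two-class node has impurity 0.5, and a split that separates the classes completely brings the weighted impurity to 0, which is why the tree would prefer that split.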

Random Forests, as an extension of decision trees, were employed because they are widely used in the literature on regulatory enforcement, demonstrating strong performance on regulatory datasets (Jacob de Menezes-Neto & Clementino, 2022). In this study, Random Forests enhance the traditional decision tree model by constructing multiple trees on random subsets of the data and combining their predictions through averaging, resulting in improved accuracy and robustness. Random Forests further improve generalization by aggregating diverse models. The formula for random forest is given as:

\hat{y} = \frac{1}{T} \sum_{t = 1}^{T} \hat{y}_{t}

Where

• T is the total number of trees

• \hat{y}_t is the prediction of tree t
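The averaging step can be sketched without fitting real trees. In the illustration below each "tree" is a stand-in that simply predicts the mean of its own bootstrap sample; the fine values, the seed, and the function names are hypothetical.

```python
import random

def bootstrap_sample(values, rng):
    """Draw len(values) observations with replacement."""
    return [rng.choice(values) for _ in values]

def forest_predict(tree_predictions):
    """Average the individual tree predictions (the formula above)."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(0)
fines = [5.0, 7.5, 8.0, 9.5, 12.0]  # hypothetical fine values

# Stand-in "trees": each one predicts the mean of its own bootstrap sample.
tree_preds = []
for _ in range(100):
    sample = bootstrap_sample(fines, rng)
    tree_preds.append(sum(sample) / len(sample))

ensemble_prediction = forest_predict(tree_preds)
```

Individual stand-in trees vary with their bootstrap samples, while the averaged prediction is more stable, which is the intuition behind the improved robustness described above.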

Findings and Analysis

Summary Results

Table 5 highlights substantial variability in key financial variables such as “Total Lost” and “Total Invested,” as reflected by their relatively high standard deviations. Such variability points to significant differences in the economic scale of enforcement cases. While these variables may contribute meaningfully to the modeling of fines, their actual influence must be confirmed through inferential or predictive analysis rather than summary statistics alone. “Aggravating Factors” and “Mitigating Factors” provide additional context on the severity and circumstances of each offense, and their inclusion—especially as interaction terms like “Misconduct_x_Aggravating_Factors”—may enhance the model’s ability to reflect the proportionality of sanctions. In contrast, the limited variation observed in “Disciplinary History” suggests a relatively narrow distribution of past infractions, potentially reducing its influence in predictive modeling. More consistent distributions in features such as “Cost” and “Commission” indicate possible relevance for forecasting fines, although their utility depends on their performance within the model. These descriptive patterns suggest that financial loss, offense severity, and mitigating factors are key elements in modelling the severity of fines. (Table 6)

Table 5.

Summary Results.

Independent variables	Count	Mean	Std	Min	25%	50%	75%	Max
Offender appearance	557	2.91	1.46E+00	1	2	3	4.00	8.00E+00
Number of clients	507	18.85	6.63E+01	0	2	6	18.00	1.19E+03
Total lost	444	426282.87	2.29E+06	0	0	0	36000.00	2.00E+07
Total invested	488	697704.69	4.11E+06	0	0	0	48004.50	4.35E+07
Commission	214	63347.05	2.70E+05	0	0	0	30250.00	3.50E+06
Offender occupation	556	1.29	6.18E-01	1	1	1	1.00	4.00E+00
Firm type	556	4.76	1.80E+00	4	4	4	4.00	9.00E+00
Offender experience	555	2.92	1.60E+00	1	2	3	4.00	9.00E+00
Disciplinary history	473	0.02	1.44E-01	0	0	0	0.00	1.00E+00
Cost	559	4210.20	3.56E+03	0	2500	2500	5000.00	3.00E+04
Quasi criminal	566	0.10	2.94E-01	0	0	0	0.00	1.00E+00
COI	566	0.37	6.40E-01	0	0	0	1.00	3.00E+00
ISP	566	1.41	1.31E+00	0	0	1	2.00	7.00E+00
ICO	566	0.26	5.20E-01	0	0	0	0.00	2.00E+00
Misconduct	566	0.80	1.05E+00	0	0	0	1.00	5.00E+00
Aggravating factors	566	1.44	2.07E+00	0	0	1	2.00	9.00E+00
Mitigating factors	566	3.80	2.66E+00	0	1	4	6.00	1.10E+01
Total_lost_x_Aggravating_Factors	444	887220.83	5.27E+06	0	0	0	1.57	6.66E+07
Experience_x_Firm_type	554	13.92	9.62E+00	4	8	12	18.00	8.10E+01
Total_invested_x_Quasi_Criminal	488	29978.83	2.78E+05	0	0	0	0.00	3.79E+06
Misconduct_x_Aggravating_Factors	566	1.81	4.19E+00	0	0	0	1.00	3.00E+01
Misconduct_x_Mitigating_Factors	566	2.01	3.66E+00	0	0	0	3.00	2.00E+01

Table 6.

Results of the OLS Regression.

Independent variables	Coef	Std err	t	P>|t|	[0.025	0.975]
Const	7.5808	0.263	28.783	0.000	7.063	8.098
Offender appearance	−0.6419	0.199	−3.224	0.001	−1.033	−0.251
Number of clients	−0.6439	0.244	−2.64	0.008	−1.12	−0.168
Total lost	−0.0034	0.011	−0.323	0.747	−0.024	0.018
Total invested	0.0419	0.015	2.716	0.007	0.012	0.072
Commission	0.0562	0.015	3.626	0.000	0.026	0.087
Offender occupation	−0.5603	0.202	−2.773	0.006	−0.948	−0.173
Firm type	−0.0207	0.078	−0.266	0.79	−0.173	0.132
Offender experience	−0.4039	0.146	−2.773	0.006	−0.672	−0.135
Disciplinary history	−3.4039	0.951	−3.58	0.000	−5.272	−1.535
Cost	1.0277	0.205	5.021	0.000	0.627	1.428
Quasi criminal	0.4362	0.206	2.114	0.035	0.032	0.841
COI	0.0826	0.18	0.458	0.647	−0.271	0.437
ISP	0.8996	0.206	4.369	0.000	0.495	1.304
ICO	0.0154	0.046	0.335	0.738	−0.075	0.105
Misconduct	−0.0542	0.042	−1.285	0.199	−0.136	0.028
Aggravating factors	0.1252	0.205	0.611	0.541	−0.277	0.528
Mitigating factors	−0.8889	0.362	−2.457	0.014	−1.585	−0.193
District council_Central Canada	−0.6074	0.557	−1.09	0.276	−1.702	0.487
District council_Western Canada	0.1252	0.412	0.304	0.761	−0.585	0.835
Type of hearing_Terminated	−3.3573	1.273	−2.637	0.008	−5.859	−0.856
Type of hearing_Trial	1.1653	0.38	3.063	0.002	0.418	1.913
Offender gender_Male	0.3911	0.371	1.054	0.293	−0.338	1.12

Note. Dep. Variable: Fines; Model: OLS; Method: Least squares; No. Observations: 496; Df residuals: 473; Df model: 22; Covariance type: Nonrobust; R-squared: 0.389; Adj. R-squared: 0.361; F-statistic: 13.70; Prob (F-statistic): 3.65e-38; Log-likelihood: −1221.4; AIC: 2489; BIC: 2586.

Baseline Statistical OLS Models

The baseline OLS model exhibits moderate explanatory power, with an R² of 0.389 and an adjusted R² of 0.361, indicating that about 36.1% of the variance in fines is explained by the predictors after adjusting for the number of predictors. The model’s F-statistic of 13.70 (p < .001) confirms its overall statistical significance. Key financial predictors such as Commission (p < .001), Cost (p < .001), Total Invested (p = .007), and Number of Clients (p = .008) are significantly associated with fine levels, reflecting their substantive importance in enforcement decisions. Offense characteristics including Quasi-Criminal (p = .035) and Improper Sales Practices (ISP) (p < .001), along with offender traits like Appearance (p = .001), Occupation (p = .006), Experience (p = .006), and Disciplinary History (p < .001), also emerge as significant. Contextual and procedural elements such as Mitigating Factors (p = .014) and specific hearing types, Terminated (p = .008) and Trial (p = .002), are similarly influential, highlighting the importance of legal context and circumstances in the fines imposed on offenders.

On the other hand, Firm Type, Misconduct, and Aggravating Factors lack statistical significance. The non-significance of Aggravating Factors is unexpected given their theoretical relevance to offense severity, suggesting they may be inconsistently applied or less quantifiable in practice. Their influence may be overshadowed by variables that are more easily measured, such as costs or disciplinary history. The model’s overall fit remains limited: only about 39% of the variance in fines is explained by the predictors (R² = 0.389), and the drop to 36% in adjusted R² reflects the penalty for predictors that add little explanatory power. The modest fit also raises the possibility of heteroscedasticity, where the variance of the errors differs across observations, possibly affecting the reliability of the coefficient estimates.
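The relationship between R², adjusted R², and the F-statistic can be checked from the summary values in Table 6 (496 observations, 22 predictors). The sketch below uses the standard textbook formulas; the function names are illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def f_statistic(r2, n_obs, n_predictors):
    """Overall F-statistic implied by R-squared and the degrees of freedom."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_obs - n_predictors - 1))

# Values reported for the OLS model in Table 6.
adj = adjusted_r2(0.389, 496, 22)
f_stat = f_statistic(0.389, 496, 22)
```

Starting from the rounded R² of 0.389, the adjusted value rounds to the reported 0.361 and the implied F-statistic is approximately 13.7, in line with the reported 13.70; small differences arise from rounding R² to three decimals.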

To check for heteroscedasticity, we conducted a Breusch-Pagan test. The results revealed a test statistic of 56.62 and a p-value of 6.96 × 10⁻⁵, which is well below the common significance threshold of 0.05. Based on these findings, we reject the null hypothesis of homoscedasticity in the OLS regression model, confirming the presence of heteroscedasticity. The low p-value further suggests that the variance of the residuals is not constant across observations, which may impact the reliability of the coefficient estimates. To address heteroscedasticity, we employed WLS regression.
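The reported p-value can be checked from the test statistic alone. The sketch below assumes the statistic follows a chi-square distribution with 22 degrees of freedom, one per regressor in Table 6; the degrees of freedom are not stated in the text, so this is an assumption. It uses the closed-form upper-tail probability that holds for an even number of degrees of freedom, which avoids any external library. In practice the full test is usually run with a library routine such as `het_breuschpagan` in statsmodels.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    series = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * series

# Reported Breusch-Pagan statistic, with df = 22 as an assumption.
p_value = chi2_sf_even_df(56.62, 22)
```

Under this assumption the computed p-value is close to the reported 6.96 × 10⁻⁵, which is consistent with the rejection of homoscedasticity described above.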

Table 7 shows the results of the WLS model. The WLS model demonstrates a significant improvement in explanatory power compared to the baseline OLS model. The R² and adjusted R² values both increase dramatically to 0.997, indicating that 99.7% of the variance in fines is now explained by the predictors in the model. This marks a substantial improvement from the OLS model’s adjusted R² value of 0.361. The F-statistic is 8417, with a highly significant p-value (p < .001), confirming the overall strength and significance of the WLS model.
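The weighting idea behind WLS can be sketched for a single predictor. The text does not state how the weights were chosen, so the inverse-variance weights, the toy data, and the function name below are all assumptions made for illustration.

```python
def wls_fit(x, y, weights):
    """Weighted least squares for y = b0 + b1 * x with one predictor."""
    w_sum = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: the last observation is assumed to be much noisier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.7, 14.0]

ols_intercept, ols_slope = wls_fit(x, y, [1.0] * 5)                    # equal weights = OLS
wls_intercept, wls_slope = wls_fit(x, y, [1.0, 1.0, 1.0, 1.0, 0.05])   # down-weight the noisy point
```

With equal weights the function reduces to OLS. Down-weighting the noisy observation pulls the fitted slope back toward the trend of the remaining points, which is how WLS limits the influence of high-variance observations.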

Table 7.

Results of the WLS Regression.

Independent variables	Coef	Std err	t	P>|t|	[0.025	0.975]
Const	7.875	0.041	193.027	0.000	7.795	7.955
Offender appearance	−0.5682	0.025	−22.353	0.000	−0.618	−0.518
Number of clients	0.042	0.009	4.594	0.000	0.024	0.06
Total lost	−0.0027	0.004	−0.774	0.439	−0.010	0.004
Total invested	−0.0156	0.001	−17.197	0.000	−0.017	−0.014
Commission	0.0623	0.003	19.213	0.000	0.056	0.069
Offender occupation	0.0462	0.01	4.45	0.000	0.026	0.067
Firm type	−0.0212	0.007	−3.116	0.002	−0.035	−0.008
Offender experience	0.3862	0.018	21.316	0.000	0.351	0.422
Disciplinary history	−2.3746	0.316	−7.515	0.000	−2.996	−1.754
Cost	1.008	0.03	34.004	0.000	0.95	1.066
Quasi criminal	0.1963	0.057	3.47	0.001	0.085	0.307
COI	0.056	0.01	5.872	0.000	0.037	0.075
ISP	0.0721	0.008	9.209	0.000	0.057	0.088
ICO	0.0054	0.006	0.848	0.397	−0.007	0.018
Misconduct	−0.0188	0.016	−1.154	0.249	−0.051	0.013
Aggravating factors	0.2314	0.02	11.477	0.000	0.192	0.271
Mitigating factors	−0.7337	0.039	−19.054	0.000	−0.809	−0.658
District council_Central Canada	−0.5978	0.094	−6.349	0.000	−0.783	−0.413
District council_Western Canada	0.0712	0.024	2.942	0.003	0.024	0.119
Type of hearing_Terminated	−3.5533	1.62	−2.194	0.029	−6.736	−0.371
Type of hearing_Trial	0.9879	0.04	24.858	0.000	0.910	1.066
Offender gender_Male	0.3113	0.045	6.901	0.000	0.223	0.400

Note. Dep. Variable: Fines; Model: WLS; Method: Least squares; No. Observations: 496; Df residuals: 473; Df model: 22; Covariance type: Nonrobust; R-squared: 0.997; Adj. R-squared: 0.997; F-statistic: 8417; Prob (F-statistic): 0.00; Log-likelihood: −724.87; AIC: 1496; BIC: 1592.

Several predictors are statistically significant in both the OLS and WLS models, including Commission (p < .001), Cost (p < .001), Offender Experience (p < .001), Mitigating Factors (p < .001), and Quasi-Criminal offense (p = .001); the p-values shown are from the WLS model. The consistency of these features highlights their contribution to explaining fines across models. Offense-related variables such as Improper Sales Practices (ISP) (p < .001) and the type of hearing, in this case Trial (p < .001), also remain significant predictors of fines.

Several features changed between the OLS and WLS models. Aggravating Factors, which were not significant in the OLS model, become highly significant (p < .001) in the WLS model. Firm Type (p = .002) and District Council_Western Canada (p = .003) are newly significant in the WLS model, suggesting regional and organizational differences in determining fines. Total Invested is significant in both models, but more strongly so in the WLS model (p < .001) than in the OLS model (p = .007), indicating that the variability in this feature’s impact was better captured with the heteroscedasticity adjustment. The significance of Offender Occupation also strengthens in the WLS model (p < .001, compared with p = .006 in the OLS model), suggesting that accounting for heteroscedasticity sharpens the estimate for this variable and underlining its importance in influencing fines. Conversely, Misconduct and Internal Control Offenses (ICO) remain statistically insignificant in both models, suggesting that they do not meaningfully contribute to explaining the variance in fines. The increase in adjusted R² and the changes in the significance of key predictors emphasize the importance of using a model that accounts for non-constant variance in residuals, ensuring more accurate coefficient estimates and better interpretation of results.

Figure 2 presents the results of the WLS regression model. The standardized coefficients highlight the relative importance and direction of influence for each predictor in determining fines. Type of hearing_Terminated (p = .029) and Disciplinary history (p < .001) emerge as the most influential negative predictors, with large negative and statistically significant coefficients. These findings suggest a strong inverse relationship between these variables and the fines imposed. Specifically, the large negative coefficients indicate that when cases are resolved as terminated hearings or when offenders have no prior disciplinary history, the fines tend to decrease significantly. One possible explanation is that terminated hearings, often resolved quickly, may involve less severe cases or procedural dismissals, leading to lower penalties. Likewise, offenders without a history of misconduct may receive more lenient treatment, reflecting a regulatory approach that takes mitigating circumstances into account when determining fines.

Figure 2.

Standardized coefficient for regression model.

Variables such as mitigating factors, district council_Central Canada, and offender appearance also display significant negative coefficients, indicating that these factors play a lesser role in increasing fines. Specifically, their negative coefficients suggest that the presence of mitigating factors, cases heard in Central Canada, or the appearance of the offender at the hearing are associated with lower fines. These findings may reflect the consideration given to mitigating factors in penalty impositions, regional variations, or the perception that offenders appearing at hearings may demonstrate cooperation, potentially influencing the fines imposed by the district council.

Results from Machine Learning Models

Table 8 presents the RMSEs from the respective experiments. The analysis of RMSE results, relative to the mean (8.6) and standard deviation (3.6) of fines, provides critical insights into the accuracy and performance of the predictive models. Across all algorithms and experiments, the RMSEs range between 2.6 and 4.5, which are substantially lower than the mean of fines. These findings indicate that the models generally deliver accurate predictions. Moreover, most RMSE values are close to or below the standard deviation of fines, highlighting that the models effectively capture the natural variability within the data.
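The comparison of RMSE against the spread of the target can be sketched as follows. The toy fines and predictions are hypothetical; the final two ratios use the values reported in the text (RMSE range 2.6 to 4.5, standard deviation of fines 3.6).

```python
import math
import statistics

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical toy example: every prediction is off by exactly 1.
actual = [5.0, 8.0, 9.0, 12.0]
predicted = [6.0, 7.0, 10.0, 11.0]
toy_rmse = rmse(actual, predicted)
toy_ratio = toy_rmse / statistics.stdev(actual)

# Reported values: RMSE relative to the standard deviation of fines.
best_ratio = 2.6 / 3.6    # about 0.72
worst_ratio = 4.5 / 3.6   # 1.25
```

A ratio below 1 means the model's typical error is smaller than the spread of the fines themselves, which is the sense in which the better models improve on simply predicting the mean; a ratio above 1 indicates no such improvement.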

Table 8.

Results of the Machine Learning Models With Sklearn.

Algorithms	Single algorithms	PCA	SVD	RFECV	CV
Multiple regression	4.30	3.50	3.54	4.30	3.0
Ridge	4.20	3.50	3.54	4.29	3.0
Lasso	3.00	3.54	3.54	3.81	3.0
Elastic net	4.00	3.54	3.54	2.99	3.0
Random forest	2.60	3.81	3.81	4.21	3.0
Decision tree	4.00	4.06	3.67	4.36	3.0
KNN	4.50	3.88	3.89	4.35	3.0

Among the single-algorithm approaches, Random Forest achieves the best performance, with an RMSE of 2.6, which is also the lowest RMSE across all experiments. Elastic Net achieves the lowest RMSE among the RFECV experiments at 2.99, demonstrating that feature selection can enhance model performance for some algorithms. The RMSE values for these models, being below the standard deviation of fines (3.6), suggest high precision and proportionality in their predictions. In contrast, KNN and Decision Tree produce some of the highest RMSE values in most experiments, indicating they may be less effective for this dataset.

Dimensionality reduction with PCA and SVD yields RMSEs well below the mean of fines, but some variability is observed across algorithms. Specifically, Random Forest, Decision Tree, and KNN show RMSEs slightly above the standard deviation of fines, suggesting that these models do not fully benefit from dimensionality reduction. The results from the cross-validation experiments further support the robustness and reliability of the models. Across all algorithms, the cross-validation experiments yield the most consistent RMSE scores, clustering around 3.0. These RMSE values are below both the mean and the standard deviation of fines, indicating that the models generalize effectively to unseen data. The low RMSE values further suggest that the predictive models are reliable and capable of forecasting fines proportionate to actual values, making them suitable for practical application and deployment.
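The cross-validation procedure can be sketched with a simple baseline. The example below is illustrative only: the fold splitter uses contiguous, unshuffled folds (in practice the rows are usually shuffled first), the "model" is a stand-in that predicts the mean of the training folds, and all names and data are hypothetical.

```python
import math

def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_rmse(y, k=5):
    """Mean held-out RMSE of a baseline that predicts the training-fold mean."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = sum(train) / len(train)
        mse = sum((y[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        scores.append(math.sqrt(mse))
    return sum(scores) / len(scores)

# Hypothetical fine values.
fines = [5.0, 7.5, 8.0, 9.5, 12.0, 6.0, 10.0, 8.5, 7.0, 11.0]
baseline_cv_rmse = cross_validated_rmse(fines, k=5)
```

Because every observation is held out exactly once, the averaged score reflects performance on data the model did not see during fitting, which is why cross-validated RMSE is used above as evidence of generalization.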

Feature Relevance

Given that Random Forest was the best predictor and that it has its own inbuilt feature relevance mechanism, it was used to identify the most important features contributing to the prediction of fines. Figure 3 shows the feature relevance derived from the Random Forest model, highlighting which variables have the greatest influence on fines. Notably, Cost, Total Invested, Number of Clients Affected, Offender Appearance, and Commissions Earned from the Fraud emerge as the top predictors, indicating their substantial contribution to the imposition of fines. These features represent tangible financial impacts and procedural considerations directly tied to the severity and scale of offenses. On the other hand, features such as ICO, COI, Misconduct, and Aggravating Factors had negligible effects on the prediction, suggesting that their role in determining fines is limited or overshadowed by more quantifiable variables (M. Lokanan & Masannagari, 2021). These findings underscore the importance of focusing on financially and procedurally significant factors when predicting regulatory penalties.

Figure 3.

Feature relevance.

Discussion and Conclusions

The findings underscore the importance of effective regulatory enforcement as a key aspect of financial service management. The superior explanatory power of the WLS model over the OLS model highlights the need for sophisticated analytical techniques to ensure consistency in fine allocation. Addressing heteroscedasticity and assigning appropriate weights to relevant predictors increased the adjusted R² from 0.361 in OLS to 0.997 in WLS, demonstrating the value of using weighted models for complex regulatory datasets (Halunga et al., 2017). Consistently significant predictors across models, such as Cost, Commission, Offender Experience, and Type of Hearing_Trial, reinforce their critical role in fine determination. However, additional significant predictors, including Aggravating Factors, Firm Type, and COI, became evident only in the WLS model, suggesting the necessity of advanced methodologies to uncover hidden patterns in enforcement decisions (Funke et al., 2021; Zafar & Aslam, 2023).

Regulatory enforcement functions as a critical service provided by SROs to ensure compliance and maintain market integrity. The findings suggest that Cost incurred in investigations and Type of Hearing_Trial are the most influential positive predictors of fines, aligning with deterrence theory’s emphasis on ensuring sanctions are severe, certain, and proportional (Abramovaite et al., 2023; Piquero et al., 2011). The emphasis on investigation costs and the type of hearing reflects an effort by the MFDA to streamline enforcement processes, potentially enhancing the swiftness of sanctions and reinforcing the celerity dimension of deterrence theory. A more expedited enforcement process may enhance the perceived swiftness of punishment, thereby contributing to the deterrent effect. However, while these features reflect procedural efficiency, their relevance to deterrence theory lies in their influence on the timing and visibility of sanctions rather than in the direct severity or certainty of punishment. Similarly, Offender Experience, Offender Gender_Male, and Quasi-Criminal offenses significantly impact fines, implying that penalties are designed to discourage experienced offenders and address gender disparities in enforcement. These patterns align with the service management principle of fairness, ensuring that regulatory enforcement is predictable and transparent (Buckenmaier et al., 2021; Davies & Malik, 2022; Matsueda et al., 2006).

However, inconsistencies remain. Factors such as Total Lost, ICO, and Misconduct exhibit negligible influence, while interactions between Total Lost and Aggravating Factors, though statistically significant, have limited practical impact. The inconsistent application of these variables challenges the certainty and proportionality components of deterrence theory (Matsueda et al., 2006; Paternoster, 1989). These findings suggest that regulatory enforcement, as a service, does not uniformly apply penalties in ways that reflect the severity of financial misconduct.

One of the most pressing concerns is whether fines are severe enough to outweigh the perceived benefits of noncompliance. The significant role of Cost and Commission in fine determination suggests that penalties may be designed to recover losses rather than act as effective punitive measures. These findings align with critiques that regulatory sanctions often fail to impose adequate deterrents for serious violations (Davies & Malik, 2022; Earnhart & Friesen, 2023; Spalding, 2014). Quasi-Criminal offenses, which should result in the most severe penalties, had limited predictive power in fine determination, suggesting regulatory leniency and a failure to align enforcement with the principles of deterrence and proportionality (Anand, 2018; Boyle et al., 2024). The risk of offenders perceiving fines as manageable business costs rather than meaningful deterrents undermines the deterrent effect of financial penalties, raising concerns about whether fine structures require recalibration to ensure compliance and market stability.

These findings reinforce the need for data-driven, transparent, and consistent regulatory frameworks in the service management of financial oversight. Machine learning models provide a means to improve regulatory decision-making by identifying key predictors of fines, exposing inconsistencies, and enhancing fairness in enforcement practices. Regulatory bodies should leverage these models to refine their fine structures, ensuring penalties align with deterrence objectives and industry expectations. Implementing machine learning into enforcement decision-making can increase accountability and promote confidence in the regulatory system, reinforcing its role as an essential service in financial markets.

Conclusion and Future Directions

Through various experiments, we explored the application of machine learning algorithms to predict fines imposed by the MFDA. Specifically, we assessed the proportionality and severity of fines to determine their effectiveness as deterrents to securities law violations. The findings revealed that the statistical approach using WLS and the machine learning approach, particularly Random Forest, achieved high predictive accuracy, effectively identifying the costs incurred during the investigation, trial hearing, the total amount invested, and the number of clients affected as significant predictors of fines. These features appear to have influenced the amount of fines imposed across both the statistical and machine learning approaches. However, the findings also revealed discrepancies, with factors like Quasi-Criminal offenses and Aggravating Factors—typically associated with severity—having limited predictive power. These findings raise concerns about the proportionality of fines in certain cases and their ability to uphold deterrence principles.

The findings from this study provide critical insights into the imposition of fines by SROs in the securities industry, particularly regarding their deterrence objectives to hold offenders accountable to law and ethics standards. Deterrence theory underscores the need for penalties to be proportional, consistent, and sufficiently severe to discourage future misconduct. While features such as costs and the amount invested suggest a focus on restitution and punishment, the limited influence of offenses with criminal elements (i.e., Quasi-Criminal) raises questions about whether fines are applied in a manner that reflects the seriousness of violations. The inconsistencies in fines imposed by the MFDA’s tribunals may undermine the certainty and fairness of regulatory sanctions, thereby reducing their deterrent effect.

Limitations and Future Research

A key limitation of this study is the inability to assess the swiftness, or celerity, of punishment. Measuring celerity would require access to timestamped data such as the dates of offense, investigation, hearing, and final sanction, which were not available in the MFDA dataset. Without timestamped data, the analysis cannot capture how quickly sanctions follow violations, leaving a gap in understanding how timing affects deterrence. Future research should seek to obtain complete procedural records that include time intervals to examine whether delays in enforcement diminish the perceived certainty or effectiveness of sanctions.

A further limitation concerns the scope of the dataset, which includes only enforcement cases formally adjudicated by the MFDA. A substantial number of securities violations may go unreported or be resolved internally by firms or through informal settlements, leaving them absent from the available data. The absence of these cases constrains the generalizability of the findings and may result in an underestimation of the true extent of misconduct in the industry. Future research could address this limitation by incorporating data from other regulatory bodies and including non-adjudicated cases to capture a more complete picture of enforcement practices.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding was received from the Social Sciences and Humanities Research Council of Canada (200775).

ORCID iD

Mark E. Lokanan

References

Abramovaite

Bandyopadhyay

Bhattacharya

Cowen

(2023). Classical deterrence theory revisited: An empirical analysis of police force areas in England and wales. European Journal of Criminology, 20(5), 1663–1680. https://doi.org/10.1177/14773708211072415

Agranov

Buyalskaya

(2022). Deterrence effects of enforcement schemes: An experimental study. Management Science, 68(5), 3573–3589. https://doi.org/10.1287/mnsc.2021.4036

Alarie

Niblett

Yoon

(2016). Using machine learning to predict outcomes in tax law. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2855977

Anand

(2018). The enforcement of financial market crimes in Canada and the United Kingdom. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3333163

Auriol

Hjelmeng

Soreide

(2022). Corporate criminals in a market context: Enforcement and optimal sanctions. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4175886

Bhambhoria

Dahan

Zhu

(2021). Investigating the state-of-the-art performance and explainability of legal judgment prediction. In Canadian Artificial Intelligence Association (CAIAC). Canadian AI 2021. https://doi.org/10.21428/594757db.a66d81b6

Boyle

Green

Laing

Morris

Murray

Reichelt

Seida

Szirmak

(2024). Five notable Canadian securities Litigation decisions from 2023. Blakes. https://www.blakes.com/insights/five-notable-canadian-securities-litigation-decisions-from-2023/

Brockman

Mather

McEwen

C. A.

Maiman

R. J.

(2002). Divorce lawyers at work: Varieties of professionalism in practice. Canadian Journal of Law and Society, 17(2), 194–198. https://doi.org/10.1017/S0829320100007353

Buckenmaier

Dimant

Posten

A.-C.

Schmidt

(2021). Efficient institutions and effective deterrence: On timing and uncertainty of formal sanctions. Journal of Risk and Uncertainty, 62(2), 177–201. https://doi.org/10.1007/s11166-021-09352-x

10.

Canadian Broadcasting Corporation . (2020). Canadian securities violations led to $45M in fines, about 19 years jail time in past year. Canadian Broadcasting Corporation. https://www.cbc.ca/news/business/canadian-securities-administrator-fines-jail-1.5623462

11.

Canadian Broadcasting Corporation . (2021). Victims of Ponzi scheme call for reopening of probe into offshore money transfers. Canadian Broadcasting Corporation. https://www.cbc.ca/news/victims-fraud-reopen-inquiry-finance-committee-1.5999494

12.

Chikodili

N. B.

Abdulmalik

M. D.

Abisoye

O. A.

Bashir

S. A.

(2021). Outlier detection in multivariate time series data using a fusion of K-Medoid, standardized Euclidean distance and Z-score. In Misra

Muhammad-Bello

(Eds.), Information and communication technology and applications (1350, pp. 259–271). Springer International Publishing. https://doi.org/10.1007/978-3-030-69143-1_21

13.

CIRO . (2024). About Canadian investment regulatory organization. Canadian Investment Regulatory Organization. https://www.ciro.ca/about-ciro

14.

Cismondi, Fialho, A. S., Vieira, S. M., Reti, S. R., Sousa, J. M. C., & Finkelstein, S. N. (2013). Missing data in medical databases: Impute, delete or classify? Artificial Intelligence in Medicine, 58(1), 63–72. https://doi.org/10.1016/j.artmed.2013.01.003

15.

Davies & Malik (2022). Challenging existing regulatory approaches for white-collar and corporate crimes. Journal of White Collar and Corporate Crime, 3(1), 3–6. https://doi.org/10.1177/2631309X211056378

16.

Earnhart & Friesen (2023). Certainty of punishment versus severity of punishment: Enforcement of environmental protection laws. Land Economics, 99(2), 245–264. https://doi.org/10.3368/le.030521-0024R1

17.

Fonseca, de G. (2023). A framework proposal to preview pecuniary fines for cyber crime using Brazilian res judicata processes and XGBoost algorithm. Authorea. https://doi.org/10.36227/techrxiv.20866024.v1

18.

Funke, S. K. I., Sperling, & Karst (2021). Weighted linear regression improves accuracy of quantitative elemental bioimaging by means of LA-ICP-MS. Analytical Chemistry, 93(47), 15720–15727. https://doi.org/10.1021/acs.analchem.1c03630

19.

Ghafoor, Zainudin, & Mahdzan, N. S. (2022). Factors eliciting corporate fraud in emerging markets: Case of firms subject to enforcement actions in Malaysia. In Martin, Shilton, & Smith (Eds.), Business and the ethical implications of technology (pp. 281–302). Springer. https://doi.org/10.1007/978-3-031-18794-0_15

20.

Halunga, A. G., Orme, C. D., & Yamagata (2017). A heteroskedasticity robust Breusch–Pagan test for contemporaneous correlation in dynamic panel data models. Journal of Econometrics, 198(2), 209–230. https://doi.org/10.1016/j.jeconom.2016.12.005

21.

Heitkamp & Mowen, T. J. (2024). The influence of formal and informal sanctions on offending: The moderating role of legal cynicism. Crime & Delinquency, 70(9), 2488–2513. https://doi.org/10.1177/00111287231165214

22.

Homer, E. M., & Maume, M. O. (2024). The deterrent effect of federal corporate prosecution agreements: An exploratory analysis. Journal of White Collar and Corporate Crime, 5(1), 15–27. https://doi.org/10.1177/2631309X221120003

23.

Jacob de Menezes-Neto & Clementino, M. B. M. (2022). Using deep learning to predict outcomes of legal appeals better than human experts: A study with data from Brazilian federal courts. PLoS One, 17(7), Article e0272287. https://doi.org/10.1371/journal.pone.0272287

24.

Kahan, D. M. (1997). Social influence, social meaning, and deterrence. Virginia Law Review, 83(2), 349. https://doi.org/10.2307/1073780

25.

Kim, Voiklis, & Malle, B. F. (2019). Modern moral judgments show traces of both ancient and culturally recent sanctioning systems. https://doi.org/10.31219/osf.io/gqar9

26.

Lokanan & Masannagari (2021). Investigating aggravating & mitigating factors considered by IIROC in penalty imposition. International Review of Public Administration, 26(3), 270–290. https://doi.org/10.1080/12294659.2021.1966202

27.

Lokanan & Sharma (2024). The use of machine learning algorithms to predict financial statement fraud. The British Accounting Review, 56(6), Article 101441. https://doi.org/10.1016/j.bar.2024.101441

28.

Lokanan, M. E. (2023). Incorporating machine learning in dispute resolution and settlement process for financial fraud. Journal of Computational Social Science, 6(2), 515–539. https://doi.org/10.1007/s42001-023-00202-1

29.

Lokanan, M. E., & Sharma (2022). Fraud prediction using machine learning: The case of investment advisors in Canada. Machine Learning with Applications, 8, Article 100269. https://doi.org/10.1016/j.mlwa.2022.100269

30.

Makkai & Braithwaite (1994). The dialectics of corporate deterrence. Journal of Research in Crime and Delinquency, 31(4), 347–373. https://doi.org/10.1177/0022427894031004001

31.

Markou & Deakin, S. F. (2019). Ex machina lex: The limits of legal computability. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3407856

32.

Matsueda, R. L., Kreager, D. A., & Huizinga (2006). Deterring delinquents: A rational choice model of theft and violence. American Sociological Review, 71(1), 95–122. https://doi.org/10.1177/000312240607100105

33.

Medvedeva, Vols, & Wieling (2020). Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law, 28(2), 237–266. https://doi.org/10.1007/s10506-019-09255-y

34.

Metsker, Trofimov, Petrov, & Butakov (2019). Russian court decisions data analysis using distributed computing and machine learning to improve lawmaking and law enforcement. Procedia Computer Science, 156, 264–273. https://doi.org/10.1016/j.procs.2019.08.202

35.

Možina, Žabkar, Bench-Capon, & Bratko (2005). Argument based machine learning applied to law. Artificial Intelligence and Law, 13(1), 53–73. https://doi.org/10.1007/s10506-006-9002-4

36.

Mulder, L. B. (2018). When sanctions convey moral norms. European Journal of Law and Economics, 46(3), 331–342. https://doi.org/10.1007/s10657-016-9532-5

37.

Natarajan (Ed.). (2016). Crime opportunity theories: Routine activity, rational choice and their variants. Routledge. https://doi.org/10.4324/9781315095301

38.

Osei Bonsu (2020). Understanding the benefits, demerits and criticisms of the revolution of computational analysis and artificial intelligence in law. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3630932

39.

Ozkara, B. B., Karabacak, Kotha, Cristiano, B. C., Wintermark, & Yedavalli, V. S. (2023). Development of machine learning models for predicting outcome in patients with distal medium vessel occlusions: A retrospective study. Quantitative Imaging in Medicine and Surgery, 13(9), 5815–5830. https://doi.org/10.21037/qims-23-154

40.

Paternoster (1989). Decisions to participate in and desist from four types of common delinquency: Deterrence and the rational choice perspective. Law & Society Review, 23(1), 7–40. https://doi.org/10.2307/3053879

41.

Piliavin, Gartner, Thornton, & Matsueda, R. L. (1986). Crime, deterrence, and rational choice. American Sociological Review, 51(1), 101. https://doi.org/10.2307/2095480

42.

Piquero, A. R., Paternoster, Pogarsky, & Loughran (2011). Elaborating the individual difference component in deterrence theory. Annual Review of Law and Social Science, 7(1), 335–360. https://doi.org/10.1146/annurev-lawsocsci-102510-105404

43.

Roche, S. P., Wilson, & Pickett, J. T. (2020). Perceived control, severity, certainty, and emotional fear: Testing an expanded model of deterrence. Journal of Research in Crime and Delinquency, 57(4), 493–531. https://doi.org/10.1177/0022427819888249

44.

Romano, J. P., & Wolf (2017). Resurrecting weighted least squares. Journal of Econometrics, 197(1), 1–19. https://doi.org/10.1016/j.jeconom.2016.10.003

45.

Rorie & West (2022). Can “focused deterrence” produce more effective ethics codes? An experimental study. Journal of White Collar and Corporate Crime, 3(1), 33–45. https://doi.org/10.1177/2631309X20940664

46.

Rosili, N. A. K., Hassan, Zakaria, N. H., Kasim, Rose, F. Z. C., & Sutikno (2021). A systematic literature review of machine learning methods in predicting court decisions. IAES International Journal of Artificial Intelligence, 10(4), 1091. https://doi.org/10.11591/ijai.v10.i4.pp1091-1102

47.

Ruohonen & Hjerppe (2020). Predicting the amount of GDPR fines. https://doi.org/10.48550/ARXIV.2003.05151

48.

Russell & Cheng (2019). A critical analysis of securities crime in Canada. Canadian Journal of Criminology and Criminal Justice, 61(1), 86–104. https://doi.org/10.3138/cjccj.2017-0037

49.

Salmerón Gómez, R., et al. (2016). Collinearity diagnostic applied in ridge estimation through the variance inflation factor. Journal of Applied Statistics, 43(10), 1831–1849.

50.

Shelar & Moharir (2021). Predicting outcomes of court judgments—a machine learning approach. In 2021 International Conference on Intelligent Technologies (CONIT) (pp. 1–6). Hubli, India. https://doi.org/10.1109/CONIT51480.2021.9498385

51.

Sheldon, K. M., & Krieger, L. S. (2014). Service job lawyers are happier than money job lawyers, despite their lower income. The Journal of Positive Psychology, 9(3), 219–226. https://doi.org/10.1080/17439760.2014.888583

52.

Spalding, A. B. (2014). Restorative justice for multinational corporations. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2403930

53.

Surden (2021). Machine learning and law: An overview. In Vogl (Ed.), Research handbook on big data law. Edward Elgar Publishing. https://doi.org/10.4337/9781788972826.00014

54.

Tuch (2014). The self-regulation of investment bankers. Washington Law Review, 83.

55.

Wang & Song (2019). The deterrence effect of a penalty for environmental violation. Sustainability, 11(15), 4226. https://doi.org/10.3390/su11154226

56.

Wen (2024). A study of legal judgment prediction based on deep learning multi-fusion models—data from China. Sage Open, 14(3), Article 21582440241257680. https://doi.org/10.1177/21582440241257682

57.

Zafar & Aslam (2023). An adaptive weighted least squares ratio approach for estimation of heteroscedastic linear regression model in the presence of outliers. Communications in Statistics - Simulation and Computation, 52(6), 2365–2375. https://doi.org/10.1080/03610918.2021.1907408

Using Machine Learning to Predict Regulatory Fines: Enhancing Deterrence in White-Collar Financial Crime Enforcement
