Machine Learning and Artificial Intelligence in Modeling

Expert-defined terms from the Advanced Certificate in Model Risk Management (Germany) course at London School of Business and Administration. Free to read, free to share, paired with a professional course.

Algorithmic Bias – Unintended systematic error introduced by a model that…

Related terms: fairness, discrimination, ethical AI. Explanation: Bias can arise from skewed training data, inappropriate feature selection, or model design choices. Example: A credit‑scoring model that underestimates the creditworthiness of applicants from a specific region because the historical data reflects past lending discrimination. Practical application: Model risk managers must audit data sources, test for disparate impact, and apply mitigation techniques such as re‑weighting or adversarial debiasing. Challenges: Detecting subtle bias, balancing fairness with predictive performance, and complying with emerging regulations (e.g., EU AI Act).
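The disparate-impact test mentioned above can be sketched as a ratio of approval rates between two groups. This is a minimal illustration, not a prescribed method: the group names, the decision data, and the function names are hypothetical, and the threshold at which a ratio is treated as a concern is a policy choice that the glossary does not specify.

```python
def selection_rate(decisions):
    """Share of favourable outcomes (1 = approved, 0 = declined)."""
    return sum(decisions) / len(decisions)


def disparate_impact_ratio(protected, reference):
    """Approval rate of the protected group divided by that of the reference group."""
    return selection_rate(protected) / selection_rate(reference)


# Hypothetical approval decisions for two applicant regions (illustrative only).
region_a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 approved
region_b = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 7 of 10 approved

ratio = disparate_impact_ratio(region_a, region_b)
print(f"Disparate impact ratio: {ratio:.2f}")
```

A ratio well below 1 signals that the first group is approved less often, which would then prompt a closer look at the data and features behind the gap.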

AUC‑ROC – Area Under the Receiver Operating Characteristic curve, a scala…

Related terms: confusion matrix, precision‑recall, threshold analysis. Explanation: The ROC curve plots true‑positive rate against false‑positive rate at various threshold settings; the AUC summarises the overall performance regardless of a specific cutoff. Example: An AUC of 0.87 for an insurance fraud detection model indicates strong discrimination. Practical application: Used in model validation to compare alternative models; the underlying ROC curve supports the choice of decision thresholds. Challenges: AUC can be misleading with highly imbalanced data; alternative metrics such as precision‑recall may be required.
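One way to see why AUC is independent of any cutoff is to compute it as the share of positive/negative pairs in which the positive case receives the higher score. The sketch below uses that pairwise definition with made-up labels and scores; the function name `auc_roc` is illustrative, and the quadratic loop is only suitable for small samples.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical fraud labels (1 = fraud) and model scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.65, 0.90]

auc = auc_roc(labels, scores)
print(f"AUC = {auc:.4f}")  # 13 of 16 pairs are ranked correctly
```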

AutoML – Automated Machine Learning, a suite of tools that automate model…

Related terms: pipeline automation, Neural Architecture Search, model selection. Explanation: AutoML platforms generate multiple candidate models, evaluate them on validation data, and recommend the best performing pipeline. Example: A bank uses an AutoML service to quickly prototype a churn‑prediction model, reducing development time from weeks to days. Practical application: Accelerates experimentation in model risk management, especially for non‑technical stakeholders. Challenges: Black‑box nature of generated pipelines can hinder interpretability and governance; oversight is needed to ensure compliance with internal model policies.

Batch Learning – A learning paradigm where the model is trained on the en…

Related terms: offline training, retraining schedule, static model. Explanation: Batch learning assumes the data distribution is relatively stable during training; the model is retrained periodically to incorporate new information. Example: A loan‑approval scoring model is retrained quarterly using the latest credit bureau data. Practical application: Suitable for environments where data arrives in large, infrequent batches and the cost of continuous updates is prohibitive. Challenges: May lag behind rapid market changes, leading to model drift; requires robust version control and impact analysis before deployment.

Bayesian Networks – Probabilistic graphical models that represent variabl…

Related terms: probabilistic inference, directed acyclic graph (DAG), causal modeling. Explanation: Nodes correspond to random variables; edges encode conditional probability relationships, enabling the computation of joint distributions and posterior updates given evidence. Example: A risk model for operational losses uses a Bayesian network to capture dependencies between fraud incidents, internal controls, and audit findings. Practical application: Facilitates scenario analysis, sensitivity testing, and incorporation of expert judgment. Challenges: Structure learning can be computationally intensive; specifying accurate conditional probability tables often requires domain expertise.

Black‑Box Model – A model whose internal logic is opaque or difficult to…

Related terms: interpretability, model transparency, explainable AI. Explanation: While black‑box models may deliver high predictive accuracy, their lack of explainability raises concerns for regulatory scrutiny and stakeholder trust. Example: A deep‑learning model predicts market volatility but provides no insight into which features drive the forecast. Practical application: Used when performance outweighs interpretability, especially in high‑frequency trading. Challenges: Requires post‑hoc explanation techniques (e.g., SHAP, LIME) and rigorous validation to satisfy model risk policies.

Calibration – The process of aligning predicted probabilities with observ…

Related terms: probability scaling, reliability diagram, Platt scaling. Explanation: A well‑calibrated model outputs a 70% probability of default that, over many cases, corresponds to an actual default rate of roughly 70%. Example: A logistic regression model for credit risk is calibrated using isotonic regression to correct systematic under‑estimation of risk. Practical application: Critical for risk‑based pricing and capital allocation where probability estimates directly affect decisions. Challenges: Calibration may degrade predictive power; maintaining calibration over time requires monitoring and periodic recalibration.

Concept Drift – The phenomenon where the statistical properties of the ta…

Related terms: data drift, model decay, adaptive learning. Explanation: Drift can be gradual (e.g., shifting consumer preferences) or abrupt (e.g., regulatory changes). Detection methods include monitoring performance metrics and statistical tests on feature distributions. Example: A fraud detection model experiences a sudden increase in false negatives after a new payment method is introduced. Practical application: Model risk frameworks mandate drift detection triggers, scheduled re‑training, and impact assessments. Challenges: Distinguishing genuine drift from random noise, and deciding when to update versus when to retain the existing model.

Confidence Interval – A range of values, derived from sample statistics,… (e.g., 95%).

Related terms: standard error, coverage probability, bootstrap. Explanation: Confidence intervals provide a quantitative measure of uncertainty around point estimates such as model coefficients or predicted loss amounts. Example: The 95% confidence interval for the expected loss of a portfolio is €1.2 M to €1.5 M. Practical application: Used in model validation reports to express the precision of risk estimates and to set capital buffers. Challenges: Requires assumptions about data distribution; non‑parametric methods may be needed for heavy‑tailed financial data.
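The bootstrap listed among the related terms is one non-parametric way to obtain such an interval. The sketch below builds a percentile bootstrap interval for a mean loss; the loss figures, the number of resamples, and the function names are assumptions made for illustration, and the percentile method is only one of several bootstrap variants.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_ci(sample, stat, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on resamples drawn with replacement.
    estimates = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[min(int((1.0 - alpha) * n_resamples), n_resamples - 1)]
    return lower, upper


# Hypothetical loss observations in EUR millions (illustrative only).
losses = [1.1, 0.4, 2.7, 0.9, 5.3, 0.2, 1.8, 0.6, 3.9, 1.2]

lower, upper = bootstrap_ci(losses, mean)
print(f"Mean loss {mean(losses):.2f}, 95% bootstrap CI [{lower:.2f}, {upper:.2f}]")
```

With only ten observations the interval is wide, which is the point of reporting it alongside the point estimate.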

Cross‑Validation – A resampling technique that partitions data into compl…

Related terms: k‑fold, hold‑out set, validation strategy. Explanation: In k‑fold cross‑validation, the data is split into k equal parts; each part serves as a test set once while the remaining k‑1 parts form the training set. The average metric across folds estimates out‑of‑sample performance. Example: A credit‑risk model is assessed using 5‑fold cross‑validation, yielding an average AUC of 0.81. Practical application: Provides robust performance estimates for model selection and hyper‑parameter tuning. Challenges: Computationally expensive for large datasets; care needed to avoid leakage when time series data is involved.
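The fold mechanics described above can be sketched with a small index generator. This is an illustrative helper, not a library API: it splits the indices into contiguous folds without shuffling, and it lets fold sizes differ by one when the sample size is not divisible by k.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice the rows would usually be shuffled before splitting, except for time-ordered data, where shuffling would introduce the leakage the entry warns about.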

Data Augmentation – Techniques that artificially expand the training data…

Related terms: synthetic data, over‑sampling, SMOTE. Explanation: Augmentation helps mitigate data scarcity, improve model robustness, and reduce overfitting. Common methods include adding noise, resampling, or generating synthetic records using generative models. Example: A fraud detection model uses SMOTE to balance the minority class by creating synthetic fraudulent transactions. Practical application: Particularly valuable in domains with rare events (e.g., operational loss modeling). Challenges: Synthetic data may not capture complex real‑world dependencies, potentially introducing bias.

Data Leakage – The inadvertent inclusion of information in the training d…

Related terms: target leakage, information leakage, validation contamination. Explanation: Leakage can occur through features that are proxies for the target, or by using future data in the training set. Example: Including a “loan repayment status” variable when predicting loan default results in perfect accuracy during validation but fails in production. Practical application: Rigorous data pipeline design and strict separation of training, validation, and test sets are essential safeguards. Challenges: Leakage can be subtle, especially when derived features or time‑based aggregations are involved.

Decision Tree – A non‑parametric supervised learning algorithm that recur…

Related terms: splitting criterion, pruning, Gini impurity. Explanation: Each internal node represents a test on a feature, each branch a possible outcome, and each leaf node a predicted value or class. Example: A simple decision tree predicts loan approval based on debt‑to‑income ratio and credit score. Practical application: Provides intuitive visual explanations, often used as base learners in ensemble methods. Challenges: Prone to overfitting; depth control and pruning are necessary for stable performance.

Deep Learning – A subset of machine learning that employs multilayered ar…

Related terms: convolutional neural network, backpropagation, gradient descent. Explanation: Deep architectures can capture complex, non‑linear relationships, making them suitable for unstructured data such as images, text, and speech. Example: A recurrent neural network forecasts macro‑economic indicators using historical time‑series data. Practical application: Used for sentiment analysis of news feeds, scenario generation, and high‑frequency price prediction. Challenges: Requires large labeled datasets, extensive computational resources, and sophisticated interpretability tools for regulatory compliance.

Ensemble Methods – Techniques that combine multiple base models to produc…

Related terms: bagging, boosting, stacking. Explanation: By aggregating diverse predictions, ensembles can reduce variance, reduce bias, or improve robustness. Common ensembles include Random Forests (bagging) and Gradient Boosting Machines (boosting). Example: An insurance pricing model blends a linear regression, a decision tree, and a gradient‑boosted classifier to achieve higher accuracy than any single model. Practical application: Widely adopted in credit scoring, fraud detection, and market risk modeling. Challenges: Increased complexity, longer training times, and difficulty in interpreting the contribution of each component model.

Feature Engineering – The process of creating, transforming, and selectin…

Related terms: feature extraction, domain knowledge, dimensionality reduction. Explanation: Effective engineering can capture non‑linear relationships, encode business logic, and reduce noise. Techniques include binning, interaction terms, logarithmic scaling, and time‑series lag creation. Example: Converting raw transaction timestamps into “days since last transaction” improves churn prediction. Practical application: Critical in model risk management where transparency and controllability of inputs are required. Challenges: Time‑consuming, may introduce leakage if future information is embedded inadvertently, and requires continuous monitoring as data evolves.

Feature Selection – The identification of a subset of relevant variables…

Related terms: filter methods, wrapper methods, embedded methods. Explanation: Techniques range from simple statistical tests (e.g., chi‑square) to sophisticated algorithms such as recursive feature elimination or L1 regularisation. Example: A credit‑risk model retains only 12 out of 50 candidate variables after applying mutual information ranking. Practical application: Reduces model complexity, improves interpretability, and speeds up training. Challenges: Correlated features may mask importance; selection must be validated on out‑of‑sample data to avoid overfitting.

Gradient Boosting – An ensemble technique that builds models sequentially…

Related terms: learning rate, loss function, XGBoost. Explanation: By focusing on difficult cases, gradient boosting achieves high accuracy but can be prone to overfitting if not regularised. Example: A gradient‑boosted tree model predicts probability of default with an AUC of 0.84, outperforming a logistic regression baseline. Practical application: Preferred for tabular financial data due to its ability to handle mixed data types and missing values. Challenges: Requires careful tuning of hyper‑parameters (e.g., number of trees, depth, learning rate) and robust validation to ensure stability.

Hyperparameter Tuning – The optimisation of model configuration settings…

Related terms: grid search, random search, Bayesian optimisation. Explanation: Hyperparameters include learning rate, regularisation strength, tree depth, etc. Proper tuning balances bias‑variance trade‑off and maximises out‑of‑sample performance. Example: Using Bayesian optimisation, the optimal learning rate for a neural network is identified as 0.003 with a dropout rate of 0.25. Practical application: Integral to model development pipelines; automated tools can streamline the process. Challenges: Computationally expensive, risk of over‑optimising to validation data, and may require domain‑specific constraints.

Imbalanced Data – Datasets where the class distribution is heavily skewed…

Related terms: class weighting, undersampling, cost‑sensitive learning. Explanation: Standard accuracy metrics become misleading; alternative metrics such as AUC, F1‑score, or precision‑recall are preferred. Example: In a fraud detection dataset, fraudulent transactions represent 0.3% of all records; applying SMOTE balances the classes for training. Practical application: Essential in operational risk where rare loss events must be predicted accurately. Challenges: Synthetic balancing may distort underlying distributions; careful evaluation is required to avoid inflated performance.

Interpretability – The degree to which a human can understand the cause o…

Related terms: explainable AI, model transparency, post‑hoc explanation. Explanation: Interpretability is vital for regulatory approval, stakeholder trust, and debugging. Techniques include global methods (e.g., feature importance) and local methods (e.g., LIME, SHAP). Example: A SHAP summary plot shows that “debt‑to‑income ratio” contributes most to default risk predictions. Practical application: Model risk frameworks often require documented interpretability levels for each model tier. Challenges: Trade‑off between accuracy and explainability; deep models may need surrogate explainers that approximate behaviour.

K‑Fold Cross‑Validation – A specific form of cross‑validation that divide…

Related terms: stratified k‑fold, repeatable splits, validation robustness. Explanation: Stratification ensures each fold preserves the original class distribution, which is crucial for imbalanced datasets. Example: Using 10‑fold stratified cross‑validation, a logistic regression model achieves a mean AUC of 0.79 with low variance across folds. Practical application: Provides reliable performance estimates and aids hyper‑parameter selection. Challenges: Increased computational load; for time‑series data, forward‑chaining validation may be more appropriate.
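For the time-series case noted under Challenges, forward-chaining validation can be sketched as an expanding training window followed by a test block that always lies later in time. The function name and the `min_train` parameter are hypothetical; other windowing schemes (for example a fixed-length rolling window) are equally valid.

```python
def forward_chaining_splits(n_samples, n_splits, min_train):
    """Yield expanding-window (train_idx, test_idx) pairs in time order."""
    test_size = (n_samples - min_train) // n_splits
    if test_size < 1:
        raise ValueError("Not enough observations for the requested splits")
    for i in range(n_splits):
        train_end = min_train + i * test_size
        # Training uses everything before train_end; testing uses the next block.
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


# Twelve monthly observations, three splits, at least six months of training data.
splits = list(forward_chaining_splits(n_samples=12, n_splits=3, min_train=6))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

Because every test index is later than every training index, the model is never evaluated on data that precedes what it was trained on.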

LIME – Local Interpretable Model‑agnostic Explanations, a technique that… (e.g., linear model).

Related terms: model‑agnostic, local fidelity, explainability tool. Explanation: By perturbing input data around a specific instance and fitting a simple model, LIME reveals which features most influence that prediction. Example: LIME explains why a particular loan application was denied, highlighting high “outstanding debt” as the dominant factor. Practical application: Supports compliance audits and customer communication by providing case‑by‑case rationales. Challenges: Explanations depend on perturbation strategy; may be unstable for high‑dimensional data.

Model Governance – The set of policies, procedures, and controls that ove…

Related terms: model lifecycle, risk appetite, audit trail. Explanation: Governance ensures models align with regulatory expectations, internal risk limits, and business objectives. Components include documentation standards, approval hierarchies, version control, and periodic back‑testing. Example: A bank’s model risk committee reviews all new credit‑risk models, requiring a validation report, stress‑testing results, and a data lineage diagram before approval. Practical application: Provides a structured framework for managing model risk across the enterprise. Challenges: Balancing thorough oversight with agility, especially in fast‑moving AI environments; integrating governance into automated pipelines.

Model Validation – The independent assessment of a model’s performance, a…

Related terms: back‑testing, stress testing, independent review. Explanation: Validation examines predictive accuracy, stability, sensitivity, and compliance with documentation standards. Example: A validation team conducts a 12‑month back‑test of a market‑risk model, comparing predicted VaR against realized losses and confirming coverage at the 99% confidence level. Practical application: Required by regulatory bodies (e.g., BaFin, ECB) to certify model reliability before capital allocation. Challenges: Access to high‑quality out‑of‑sample data, managing model drift, and reconciling differing stakeholder expectations.

Overfitting – The condition where a model captures noise or random fluctu…

Related terms: underfitting, regularisation, model complexity. Explanation: Overfitted models exhibit high training accuracy but significantly lower validation or test performance. Example: A neural network with excessive layers predicts perfectly on the training set but yields an AUC drop from 0.95 to 0.62 on the hold‑out set. Practical application: Detected through cross‑validation, learning curves, and monitoring of validation metrics. Mitigation strategies include simplifying the architecture, applying dropout, and using regularisation penalties. Challenges: Determining the optimal complexity level, especially when data is limited.

Predictive Analytics – The use of statistical and machine learning techni…

Related terms: forecasting, time‑series analysis, scenario modelling. Explanation: Predictive models output probabilities, point estimates, or risk scores that inform decision‑making. Example: A predictive analytics platform estimates the probability of loan default for each applicant, feeding the scores into the underwriting engine. Practical application: Core to credit risk, fraud detection, market risk, and operational loss modelling. Challenges: Ensuring data quality, handling concept drift, and maintaining model interpretability for regulatory compliance.

Regularisation – Techniques that add a penalty term to the loss function…

Related terms: L1 penalty, L2 penalty, ridge regression. Explanation: L1 (lasso) encourages sparsity, potentially performing feature selection; L2 (ridge) shrinks coefficients towards zero, preserving all features. Example: Adding an L2 regularisation term to a logistic regression reduces coefficient variance and improves out‑of‑sample AUC from 0.78 to 0.81. Practical application: Standard in linear models, also incorporated in deep learning via weight decay. Challenges: Selecting the optimal regularisation strength (λ) often requires cross‑validation.
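Written out, the two penalties differ only in how they measure coefficient size. In the sketch below, the unpenalised loss \(\mathcal{L}(\beta)\), the coefficient vector \(\beta\) with \(p\) entries, and the objective names \(J\) are notation introduced here for illustration; only \(\lambda\) comes from the entry above.

```latex
\begin{aligned}
\text{L2 (ridge):}\quad & J_{\text{ridge}}(\beta) = \mathcal{L}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^{2} \\
\text{L1 (lasso):}\quad & J_{\text{lasso}}(\beta) = \mathcal{L}(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\end{aligned}
```

Setting \(\lambda = 0\) recovers the unpenalised fit, while larger values of \(\lambda\) shrink the coefficients more strongly.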

Reinforcement Learning – A learning paradigm where an agent interacts wit…

Related terms: policy, Markov Decision Process, Q‑learning. Explanation: The agent learns through trial‑and‑error, receiving feedback (rewards or penalties) for each action. Example: An algorithmic trading system learns optimal execution strategies by receiving reward based on execution cost and market impact. Practical application: Portfolio optimisation, dynamic hedging, and robo‑advisory. Challenges: Defining appropriate reward functions, ensuring stability and convergence, and managing exploration‑exploitation trade‑offs in a regulated financial setting.

Sampling Bias – Systematic error introduced when the sample used to train…

Related terms: selection bias, non‑random sampling, coverage error. Explanation: Bias can distort model predictions, leading to systematic mis‑estimation of risk. Example: Training a credit‑risk model on only high‑net‑worth customers causes under‑prediction of risk for mass‑market borrowers. Practical application: Model risk managers must assess data collection processes and may apply re‑weighting or stratified sampling to correct bias. Challenges: Detecting hidden bias, especially when external validation data is scarce.

Sensitivity Analysis – The systematic variation of model inputs to assess…

Related terms: what‑if analysis, scenario testing, elasticity. Explanation: Techniques range from one‑at‑a‑time perturbations to global methods like Sobol indices. Example: Varying the unemployment rate by ±2% changes the predicted loan‑loss provision by ±5%, indicating high sensitivity. Practical application: Supports model validation, stress testing, and regulatory reporting. Challenges: Computational cost for complex models; interactions among variables may require sophisticated multivariate approaches.
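A one-at-a-time perturbation can be sketched in a few lines. The provision formula below is a deliberately simple toy invented for this example (it is not a real provisioning model), so the resulting percentages differ from the figures in the entry above; the helper name `one_at_a_time` is also hypothetical.

```python
def one_at_a_time(model, baseline, name, shifts):
    """Shift one input at a time and report the relative change in model output."""
    base_out = model(**baseline)
    results = []
    for shift in shifts:
        shocked = dict(baseline)          # copy so the baseline stays untouched
        shocked[name] = baseline[name] + shift
        out = model(**shocked)
        results.append((shift, out, (out - base_out) / base_out))
    return results


def loan_loss_provision(unemployment_rate, exposure):
    """Toy model: default probability rises linearly with unemployment (assumed)."""
    pd_estimate = 0.01 + 0.5 * unemployment_rate
    return exposure * pd_estimate * 0.45  # 0.45 = assumed loss given default


baseline = {"unemployment_rate": 0.05, "exposure": 100.0}
results = one_at_a_time(loan_loss_provision, baseline, "unemployment_rate", [-0.02, 0.02])
for shift, out, rel in results:
    print(f"shift {shift:+.2f}: provision {out:.3f} ({rel:+.1%})")
```

Because each input is moved in isolation, this approach misses interaction effects, which is where the global methods mentioned above come in.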

Shapley Values – A game‑theoretic attribution method that distributes a m…

Related terms: cooperative game theory, feature importance, SHAP. Explanation: Shapley values satisfy fairness axioms, providing both global and local interpretability. Example: A SHAP analysis shows that “credit history length” contributes +0.12 to the default probability for a specific applicant. Practical application: Widely adopted for explaining black‑box models in finance, satisfying audit requirements. Challenges: Exact computation is exponential; approximations are used for high‑dimensional data, which may affect precision.
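The exponential cost of exact computation is easy to see in a brute-force sketch that averages each feature's marginal contribution over every possible ordering. The value function below is a hypothetical stand-in for "model output when only these features are known", with made-up numbers; production tools such as SHAP rely on approximations rather than this enumeration.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    contrib = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for f in order:
            coalition.add(f)
            cur = value_fn(frozenset(coalition))
            contrib[f] += cur - prev      # marginal contribution of f in this ordering
            prev = cur
    return {f: c / len(orderings) for f, c in contrib.items()}


def toy_value(coalition):
    """Hypothetical default probability given the features in the coalition."""
    v = 0.05                                   # baseline with no features known
    if "dti" in coalition:
        v += 0.10
    if "history" in coalition:
        v -= 0.03
    if "utilisation" in coalition:
        v += 0.02
    if "dti" in coalition and "utilisation" in coalition:
        v += 0.04                              # interaction shared between the two
    return v


features = ["dti", "history", "utilisation"]
phi = shapley_values(toy_value, features)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the full-coalition value and the baseline, and the interaction term is split equally between the two features involved.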

Supervised Learning – A machine‑learning paradigm where models are traine…

Related terms: classification, regression, labelled data. Explanation: The algorithm minimises a loss function that quantifies the difference between predicted and true outputs. Example: A supervised classifier predicts whether a transaction is fraudulent (binary label) based on transaction attributes. Practical application: Core to credit scoring, fraud detection, and loss‑frequency modelling. Challenges: Requires high‑quality labelled data; label noise can degrade performance and increase model risk.

Transfer Learning – The practice of leveraging knowledge from a pre‑train…

Related terms: fine‑tuning, pre‑trained model, domain adaptation. Explanation: Early layers capture generic patterns (e.g., edges in images) that can be repurposed, reducing the need for extensive training data. Example: A language model pre‑trained on general news articles is fine‑tuned on financial disclosures to extract sentiment for credit risk assessment. Practical application: Accelerates model development, especially when proprietary data is limited. Challenges: Risk of negative transfer if source and target domains differ significantly; requires careful validation.

Uncertainty Quantification – The process of characterising and communicat…

Related terms: probabilistic forecasting, prediction intervals, Monte Carlo simulation. Explanation: Methods include Bayesian inference, bootstrapping, and ensemble variance estimation. Example: A Bayesian neural network provides a posterior distribution for default probability, yielding a 95% credible interval of 3.2%–4.6%. Practical application: Informs capital allocation decisions and risk‑adjusted pricing by reflecting prediction uncertainty. Challenges: Computationally intensive; integrating uncertainty metrics into existing risk frameworks may require cultural and procedural changes.

Validation Set – A subset of data reserved for tuning model hyper‑paramet…

Related terms: hold‑out set, development set, train‑validation‑test split. Explanation: Distinct from the training set (used for learning) and test set (used for final unbiased evaluation). Example: After training a gradient‑boosted model on 70% of the data, a separate 15% serves as a validation set to select the optimal number of trees, with the final 15% held back as the test set. Practical application: Prevents information leakage and ensures that hyper‑parameter choices generalise to unseen data. Challenges: In time‑series contexts, validation must respect temporal order to avoid look‑ahead bias.
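A split that respects temporal order can be sketched by slicing time-sorted records rather than sampling them at random. The 70/15/15 proportions and the function name are assumptions chosen for illustration; the records are assumed to be sorted from oldest to newest before the call.

```python
def chronological_split(records, train_frac=0.70, val_frac=0.15):
    """Split time-ordered records into train, validation, and test segments."""
    n = len(records)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    # The oldest data trains the model; the newest data is kept for the final test.
    return records[:train_end], records[train_end:val_end], records[val_end:]


# Hypothetical monthly observations, already sorted from oldest to newest.
months = list(range(20))
train, validation, test = chronological_split(months)
print(len(train), len(validation), len(test))
```

Hyper-parameters are then chosen on the validation segment only, and the test segment is touched once for the final evaluation.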

XGBoost – Extreme Gradient Boosting, a highly efficient and scalable impl…

Related terms: tree boosting, regularised learning, parallel processing. Explanation: Incorporates regularisation, handling of missing values, and built‑in cross‑validation, making it popular for tabular financial data. Example: An XGBoost model predicts probability of loan default with an AUC of 0.86, outperforming a logistic regression baseline. Practical application: Frequently used in credit risk, fraud detection, and operational loss forecasting due to its speed and accuracy. Challenges: Hyper‑parameter tuning complexity, potential overfitting if depth and number of trees are not constrained, and the need for interpretability tools (e.g., SHAP) for regulatory acceptance.

Yield Curve Modeling – The statistical representation of interest rates a…

Related terms: Nelson‑Siegel, bootstrapping, term structure. Explanation: Machine‑learning techniques such as spline regression, Gaussian processes, or deep learning can capture non‑linear dynamics and regime shifts. Example: A neural network trained on historical yield data predicts future curve shapes, improving the valuation of interest‑rate derivatives. Practical application: Supports scenario generation for stress testing, valuation adjustments, and asset‑liability management. Challenges: Ensuring no arbitrage violations, maintaining economic interpretability, and handling limited data for long‑dated maturities.

Zero‑Inflated Models – Statistical models designed for count data with an… non‑zero) with a count component.

Related terms: hurdle model, Poisson regression, negative binomial. Explanation: The zero‑inflated part models the probability of structural zeros, while the count part models the frequency of events when they occur. Example: Modeling the number of operational loss events per month, where many months have zero losses, using a zero‑inflated Poisson model. Practical application: Improves fit for rare‑event data in operational risk and insurance claim frequency. Challenges: Parameter identification can be difficult; model selection must be justified with likelihood‑ratio tests and out‑of‑sample validation.
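The two-part structure can be made concrete with the probability mass function of a zero-inflated Poisson model. In this sketch, `pi` (the probability of a structural zero) and `lam` (the Poisson rate) are symbol names chosen here, and the parameter values are illustrative rather than estimated from data.

```python
import math


def zip_pmf(k, pi, lam):
    """P(K = k) for a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Illustrative parameters: 60% structural zeros, 1.5 events per month otherwise.
for k in range(4):
    print(k, round(zip_pmf(k, pi=0.6, lam=1.5), 4))
```

The probability of observing zero combines both sources of zeros, which is why a plain Poisson model tends to under-predict zero-loss months in such data.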

Adversarial Robustness – The ability of a model to maintain performance w…

Related terms: adversarial examples, defence mechanisms, security testing. Explanation: Small, often imperceptible changes to input features can cause large prediction errors in vulnerable models. Example: Adding subtle noise to transaction attributes leads a fraud detection model to misclassify fraudulent activity as legitimate. Practical application: Security‑oriented model risk assessments incorporate adversarial testing to gauge vulnerability. Challenges: Generating realistic adversarial scenarios, balancing robustness with model accuracy, and integrating defence strategies into production pipelines.

Bagging – Bootstrap Aggregating, an ensemble technique that builds multip…

Related terms: random forest, variance reduction, bootstrap sampling. Explanation: By averaging across diverse models, bagging reduces variance and improves stability without substantially increasing bias. Example: A random forest comprising 200 decision trees predicts loan default probabilities with lower variance than a single tree. Practical application: Preferred for high‑dimensional tabular data where overfitting is a concern. Challenges: Requires sufficient computational resources; interpretability may suffer compared to a single simple model.
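The bootstrap-then-average mechanism can be sketched without any library. To keep the example short, the base learner here is a one-dimensional nearest-neighbour predictor rather than a decision tree; that substitution, the data, and all function names are assumptions made for illustration.

```python
import random


def fit_1nn(xs, ys):
    """Return a 1-nearest-neighbour predictor (a deliberately high-variance learner)."""
    data = list(zip(xs, ys))

    def predict(x):
        return min(data, key=lambda pair: abs(pair[0] - x))[1]

    return predict


def bagged_predictor(xs, ys, n_models=50, seed=0):
    """Fit base learners on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return sum(m(x) for m in models) / len(models)

    return predict


# Hypothetical debt-to-income ratios and default flags (illustrative only).
dti = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
default = [0, 0, 0, 1, 0, 1, 1, 1]

bagged = bagged_predictor(dti, default)
estimate = bagged(0.45)
print(f"Bagged default estimate at DTI 0.45: {estimate:.2f}")
```

A single nearest-neighbour model returns only 0 or 1 at any point, whereas the bagged average produces a graded estimate that changes less when individual training rows are added or removed.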

Calibration Curve – A graphical tool that plots predicted probabilities a…

Related terms: reliability diagram, probability calibration, binning. Explanation: The curve reveals whether predicted probabilities are systematically over‑ or under‑confident. Example: A calibration curve for a credit‑risk model shows that predictions around 0.2 correspond to an actual default rate of 0.35, indicating under‑estimation. Practical application: Guides recalibration methods (e.g., isotonic regression) and informs risk‑adjusted pricing. Challenges: Requires sufficient data in each probability bin; sparse data can lead to noisy curves.
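The data behind such a curve is a small table of mean predicted probability against observed event rate per bin. The sketch below uses equal-width bins and made-up predictions; the function name and the choice of five bins are assumptions, and equal-frequency bins are a common alternative when data is sparse in some ranges.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted, observed rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            table.append((mean_pred, obs_rate, len(members)))
    return table


# Hypothetical predicted default probabilities and realised outcomes.
probs = [0.05, 0.15, 0.22, 0.25, 0.28, 0.55, 0.65, 0.85, 0.95, 0.90]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

table = calibration_table(probs, outcomes)
for mean_pred, obs_rate, count in table:
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={count}")
```

Plotting observed rate against mean prediction gives the curve; points above the diagonal indicate under-estimated risk. The tiny bin counts here also illustrate how noisy the curve becomes when data is sparse.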

Cross‑Entropy Loss – A loss function commonly used for classification tas…

Related terms: log‑loss, softmax, maximum likelihood. Explanation: For binary classification, the loss is –[y log(p) + (1–y) log(1–p)], where y is the true label and p the predicted probability. Example: Minimising cross‑entropy during training of a neural network improves the separation between fraudulent and legitimate transactions. Practical application: Provides a smooth gradient for optimisation algorithms, facilitating deep learning training. Challenges: Sensitive to outliers; class imbalance may require weighting or focal loss adjustments.
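The binary formula above translates directly into code. The sketch below averages the per-observation loss and clips probabilities away from 0 and 1 so the logarithm stays finite; the clipping constant `eps` and the function name are choices made for this example.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y log(p) + (1 - y) log(1 - p)] over all observations."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident_right = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
confident_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.2, 0.8])
print(f"good predictions: {confident_right:.3f}, bad predictions: {confident_wrong:.3f}")
```

Confidently wrong predictions are penalised far more heavily than confidently right ones are rewarded, which is why a few extreme errors can dominate the loss.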

Data Governance – The overarching framework that defines policies, standa…

Related terms: data stewardship, metadata management, compliance. Explanation: Effective governance ensures that data used for model development is accurate, consistent, and fit for purpose. Example: A data governance board enforces a master data schema for customer attributes, reducing discrepancies across credit‑risk models. Practical application: Supports model risk management by providing traceability, auditability, and risk‑adjusted data controls. Challenges: Aligning governance with agile development cycles, handling data from disparate sources, and maintaining up‑to‑date documentation.

Ensemble Averaging – The process of combining predictions from multiple m…

Related terms: model blending, stacked generalisation, diversity. Explanation: Averaging reduces variance and can improve overall accuracy when individual models have uncorrelated errors. Example: An ensemble of three models (logistic regression, random forest, gradient‑boosted tree) yields a lower mean squared error than any single component. Practical application: Common in competitions and production risk models where robustness is paramount. Challenges: Managing increased computational overhead and ensuring that ensemble components adhere to governance standards.

Feature Drift – A specific type of data drift where the distribution of i…

Related terms: covariate shift, population shift, monitoring. Explanation: Unlike target drift (concept drift), feature drift does not affect the relationship between inputs and output but can still cause mismatches if the model is sensitive to certain feature ranges. Example: A macro‑economic feature such as “inflation rate” shifts its distribution due to policy changes, affecting a credit‑risk model trained on pre‑shift data. Practical application: Continuous monitoring of feature statistics and automatic alerts trigger re‑training or model recalibration. Challenges: Distinguishing benign drift from harmful drift, and updating models without disrupting downstream systems.
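Monitoring of feature statistics is often implemented by comparing binned distributions between the training sample and recent data. The sketch below uses the Population Stability Index, a statistic the glossary does not name and which is brought in here as one common option; the bin edges, the floor for empty bins, and the data are all assumptions, and alert thresholds are left to internal policy.

```python
import math


def population_stability_index(expected, actual, edges, floor=1e-6):
    """PSI = sum over bins of (actual share - expected share) * ln(actual / expected)."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(1 for e in edges if v >= e)   # number of edges at or below v
            counts[idx] += 1
        # Floor empty bins so the logarithm stays defined.
        return [max(c / len(values), floor) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical inflation-rate observations (percent) before and after a policy change.
training = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
recent = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
edges = [2.0, 4.0]

psi_same = population_stability_index(training, training, edges)
psi_shift = population_stability_index(training, recent, edges)
print(f"PSI unchanged: {psi_same:.3f}, PSI after shift: {psi_shift:.3f}")
```

A value of zero means the binned distributions match; larger values indicate a bigger shift and would feed the alerts described above.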

Gaussian Process – A non‑parametric Bayesian approach to regression and c…

Related terms: kernel methods, kriging, covariance function. Explanation: Predictions are Gaussian‑distributed with mean and variance derived from the kernel, providing natural uncertainty estimates. Example: A Gaussian process models the term‑structure of interest rates, delivering both point forecasts and credible intervals. Practical application: Useful for small‑sample, high‑uncertainty problems such as scenario generation for stress testing. Challenges: Computational complexity scales cubically with data size; sparse approximations are needed for large datasets.

Hyperparameter – A configuration setting for a learning algorithm that is…

Related terms: model parameters, tuning, configuration space. Explanation: Examples include learning rate, number of hidden layers, regularisation strength, and tree depth. Hyperparameters control model capacity, convergence speed, and generalisation. Example: Selecting a learning rate of 0.01 for a gradient‑descent optimiser after a grid‑search. Practical application: Systematic hyperparameter optimisation is part of the model development lifecycle. Challenges: Large search spaces can be computationally expensive; automated methods (e.g., Bayesian optimisation) help mitigate cost.

Interpretability‑by‑Design – The practice of constructing models that are…

Related terms: transparent modeling, explainable AI, model simplicity. Explanation: By choosing algorithms with clear decision logic, organisations reduce reliance on post‑hoc explanation tools. Example: A credit‑scoring model uses a weighted sum of a limited set of features, allowing regulators to trace each input’s contribution directly. Practical application: Preferred for high‑impact models where regulatory scrutiny is intense. Challenges: May sacrifice predictive performance; trade‑offs must be documented and justified.

Kernel Trick – A technique that enables linear algorithms to learn non‑li…

Related terms: support vector machine, radial basis function, feature space. Explanation: The algorithm only requires inner products between data points, which kernels compute directly without explicit transformation. Example: An
