Unit 9: Research Methods in Neuropsychology
Expert-defined terms from the Professional Certificate in Neuropsychological Testing course at London School of Business and Administration. Free to read, free to share, paired with a professional course.
ANOVA (Analysis of Variance) – a statistical technique used to compare me…
related terms: F‑test, between‑subjects design. In neuropsychology, ANOVA can determine whether performance on a memory test differs among patients with Alzheimer’s disease, vascular dementia, and healthy controls. Practical application includes evaluating the impact of a cognitive rehabilitation program across multiple time points. Challenges involve meeting assumptions of normality and homogeneity of variance; violations may require transformation or non‑parametric alternatives.
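The Python below is a minimal sketch of what the omnibus test computes: it derives the one-way between-subjects F ratio from between-group and within-group sums of squares. The function name and the scores are hypothetical. A real analysis would use an established statistics package, check the assumptions described above, and follow a significant F with post-hoc comparisons to find which groups differ.

```python
# Illustrative sketch; function name and scores are hypothetical.
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Between-groups SS: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical memory-test scores for three diagnostic groups
alzheimers = [12, 14, 11, 13]
vascular = [15, 16, 14, 17]
controls = [20, 22, 21, 19]
f, df_b, df_w = one_way_anova_f([alzheimers, vascular, controls])
print(f"F({df_b}, {df_w}) = {f:.2f}")  # F(2, 9) = 39.20
```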
Artifact – any extraneous factor that contaminates data, leading to inacc…
related terms: confound, bias. For example, fatigue during a prolonged neuropsychological battery can produce lower scores unrelated to the construct being measured. Researchers must control artifacts by standardizing testing conditions, scheduling breaks, and monitoring participant state. The main challenge is identifying subtle artifacts that are not obvious from observation alone.
Blinding – the process of keeping participants, assessors, or analysts un…
related terms: single‑blind, double‑blind, masking. In a drug trial evaluating a new cholinesterase inhibitor, double‑blinding prevents expectancy effects from influencing test performance. Practical application includes using coded test forms and ensuring data analysts receive de‑identified datasets. Challenges arise when side‑effects reveal group status, potentially compromising blinding integrity.
Case‑Control Study – an observational design that compares individuals wi…
related terms: retrospective study, odds ratio. A neuropsychology example investigates whether a history of traumatic brain injury is more prevalent among patients with early‑onset dementia. Practical steps include matching controls on age, education, and sex. Limitations involve recall bias and difficulty establishing temporal relationships.
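The odds ratio can be computed from a 2 × 2 table of exposure by case status. This sketch uses Woolf's log-based 95 % confidence interval. The counts and the function name are hypothetical, and the sketch ignores the matching step; a matched design would normally call for a conditional (matched-pair) analysis.

```python
# Illustrative sketch; function name and counts are hypothetical.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI from an unmatched 2x2 table.

    a = cases exposed, b = cases unexposed,
    c = controls exposed, d = controls unexposed (all cells must be > 0).
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical: TBI history among early-onset dementia cases vs. controls
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=15, d=85)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```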
Cross‑Sectional Design – a study that assesses participants at a single p…
related terms: snapshot, prevalence. Researchers may administer a battery of executive function tests to a cohort of older adults to estimate the prevalence of frontal lobe deficits. This design is efficient for generating hypotheses and estimating population parameters. However, it cannot establish causality or track developmental trajectories, and cohort effects may confound age‑related findings.
Ecological Validity – the extent to which study findings generalize to re…
related terms: external validity, generalizability. A laboratory memory test may be highly reliable and tightly controlled yet have low ecological validity if it does not reflect everyday memory demands. To enhance ecological validity, neuropsychologists may employ virtual reality simulations of daily tasks. The challenge lies in balancing experimental control with realistic task demands.
Effect Size – a quantitative measure of the magnitude of a phenomenon, in…
related terms: Cohen’s d, η². In a pilot study comparing two rehabilitation protocols, a Cohen’s d of 0.8 indicates a large effect, suggesting clinical relevance even if statistical significance is marginal due to a small sample. Researchers must report effect sizes to aid meta‑analysis and clinical decision‑making. Small effect sizes may still be meaningful in high‑stakes contexts, complicating interpretation.
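A minimal sketch of Cohen's d for two independent groups, using the pooled standard deviation as the standardizer. The function name and scores are hypothetical. Other variants exist (for example Hedges' g, which adds a small-sample correction), so reports should state which one was used.

```python
# Illustrative sketch; function name and scores are hypothetical.
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d for two independent groups using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Hypothetical post-treatment scores for two rehabilitation protocols
protocol_a = [24, 27, 25, 29, 26]
protocol_b = [22, 24, 23, 25, 21]
print(round(cohens_d(protocol_a, protocol_b), 2))
```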
Factor Analysis – a multivariate technique that reduces a large set of va…
related terms: principal component analysis, construct validity. Neuropsychologists use factor analysis to determine whether a battery of language tasks loads onto distinct phonological and semantic factors. Practical application includes refining test batteries and informing theoretical models of cognition. Challenges include selecting the appropriate extraction method, determining the number of factors, and ensuring adequate sample size.
Frequency Distribution – a tabular representation of how often each score…
related terms: histogram, descriptive statistics. Plotting the distribution of reaction times on a Stroop task helps identify outliers and assess normality. Researchers can use this information to decide whether parametric tests are appropriate. A skewed distribution may require transformation or the use of non‑parametric tests, presenting analytical challenges.
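A frequency distribution can be tabulated by binning scores. The sketch below counts reaction times in fixed-width bins and prints a crude text histogram. The reaction times, the 100 ms bin width, and the function name are hypothetical choices for illustration; in this toy data the single 1450 ms trial stands apart as a candidate outlier.

```python
# Illustrative sketch; function name and reaction times are hypothetical.
from collections import Counter

def frequency_table(values, bin_width):
    """Count values per bin [start, start + bin_width), sorted by bin start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())

# Hypothetical Stroop reaction times in milliseconds
rts = [612, 645, 688, 701, 655, 690, 720, 1450, 633, 676]
for start, count in frequency_table(rts, bin_width=100):
    print(f"{start}-{start + 99} ms: {'*' * count}")
```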
General Linear Model (GLM) – a flexible statistical framework that includ…
related terms: predictors, covariates. In neuropsychology, a GLM can examine the effect of age (continuous predictor) and diagnosis (categorical predictor) on executive function scores while controlling for education (covariate). The GLM allows interaction terms to explore whether age-related decline differs by diagnosis. Complexity increases with multiple predictors, and multicollinearity can undermine model stability.
Hypothesis – a testable statement predicting a relationship between varia…
related terms: null hypothesis, alternative hypothesis. An example hypothesis: “Patients with frontal lobe lesions will perform worse on the Wisconsin Card Sorting Test than patients with temporal lobe lesions.” Formulating clear hypotheses guides statistical testing and interpretation. Poorly specified hypotheses may lead to ambiguous results and limit the study’s contribution to theory.
Informed Consent – the ethical process by which participants voluntarily…
related terms: assent, ethical approval. In neuropsychological studies involving vulnerable populations (e.g., dementia patients), researchers must assess decision‑making capacity and may obtain surrogate consent. Practical considerations include simplifying language, allowing time for questions, and documenting consent. Challenges involve balancing the need for scientific rigor with respect for autonomy.
Kurtosis – a statistical measure describing the “tailedness” of a distrib…
related terms: skewness, normality. High kurtosis indicates heavy tails and potential outliers, which can affect parametric test assumptions. When analyzing neuropsychological test scores, researchers should inspect kurtosis values to determine whether data transformation or robust statistical methods are needed. Kurtosis estimates are unstable in small samples, which makes them hard to interpret and poses a methodological hurdle.
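As a sketch of how these values are obtained, the function below computes skewness and excess kurtosis from the simple moment formulas, so a normal distribution has excess kurtosis near 0. The function name and scores are hypothetical. Statistical packages often apply small-sample corrections, so their values will differ somewhat, which reflects the small-sample instability mentioned above.

```python
# Illustrative sketch; function name and scores are hypothetical.
from statistics import mean

def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from uncorrected moment estimators."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # about 0 for normally distributed data
    return skew, excess_kurtosis

# Hypothetical test scores with one extreme value producing a heavy tail
scores = [10, 11, 10, 12, 11, 10, 11, 30]
skew, kurt = skewness_kurtosis(scores)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```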
Longitudinal Study – a design that follows the same participants over mul…
related terms: cohort study, repeated measures. An example involves tracking cognitive decline in individuals with mild cognitive impairment over five years using annual neuropsychological assessments. Longitudinal data allow researchers to model trajectories, identify predictors of rapid decline, and establish the temporal ordering that causal hypotheses require, although temporal precedence alone does not prove causation. Attrition, practice effects, and time‑consuming data collection are major challenges.
Meta‑Analysis – a quantitative synthesis that aggregates effect sizes acr…
related terms: systematic review, forest plot. A meta‑analysis of cognitive training interventions in stroke survivors can reveal the average improvement in processing speed. Practical steps include literature search, coding of study characteristics, and assessing heterogeneity. Publication bias and variability in methodological quality can threaten validity, requiring sensitivity analyses.
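The sketch below shows inverse-variance pooling of study effect sizes together with two common heterogeneity statistics, Cochran's Q and I². It assumes a fixed-effect model. Random-effects models, such as DerSimonian-Laird, are often preferred when studies differ substantively. The function name and the effect sizes and variances in the usage lines are hypothetical.

```python
# Illustrative sketch; function name and study values are hypothetical.
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, q, i2

# Hypothetical standardized mean differences in processing speed
effects = [0.35, 0.20, 0.50, 0.15]
variances = [0.04, 0.02, 0.05, 0.03]
pooled, se, q, i2 = fixed_effect_meta(effects, variances)
print(f"pooled = {pooled:.2f} (SE {se:.2f}), Q = {q:.2f}, I^2 = {i2:.0%}")
```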
Neuroimaging – techniques that visualize brain structure or function, oft…
related terms: fMRI, DTI, PET. Combining functional MRI with a verbal fluency task enables researchers to link activation patterns to performance scores. Practical applications include validating test constructs and exploring neural correlates of deficits. Limitations involve high cost, the need for specialized expertise, and the risk of over‑interpreting correlational findings.
Operational Definition – a precise description of how a variable will be…
related terms: construct, measurement. For “working memory capacity,” an operational definition might be the total correct items on the N‑back task at a 2‑back level. Clear operational definitions facilitate replication and reduce ambiguity. In neuropsychology, complex constructs often require multiple operationalizations, which can complicate data integration.
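An operational definition can be made fully explicit as a scoring rule. This sketch assumes that "total correct" means every scorable trial on which the participant's match/no-match response was right, counting both hits and correct rejections. That is one possible reading, and the function name and trial data are hypothetical.

```python
# Illustrative sketch; function name, scoring rule, and data are hypothetical.

def two_back_total_correct(stimuli, responses):
    """Total correct responses on a 2-back task.

    stimuli: sequence of presented items.
    responses: parallel list of booleans, True if "match" was pressed.
    A trial is a target when its item equals the item two trials earlier;
    the first two trials cannot be targets and are not scored.
    """
    correct = 0
    for i in range(2, len(stimuli)):
        is_target = stimuli[i] == stimuli[i - 2]
        # Correct = pressing "match" on targets and withholding on non-targets
        if responses[i] == is_target:
            correct += 1
    return correct

stimuli = ["A", "B", "A", "C", "A", "C", "B"]
responses = [False, False, True, False, True, False, False]
print(two_back_total_correct(stimuli, responses))
```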
Power Analysis – an a priori calculation that determines the sample size nee…
related terms: Type II error, effect size. Conducting a power analysis before a study on attentional deficits ensures that enough participants are recruited to detect clinically meaningful differences. Underpowered studies risk false negatives, while over‑powered designs may waste resources. Accurate power estimation depends on realistic assumptions about variance and effect magnitude.
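For the simplest case, a two-sided comparison of two independent group means, the required sample size per group can be approximated with the normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)² / d². This is a sketch, not a substitute for dedicated power software. It slightly underestimates the exact t-test requirement (which is 64 per group for d = 0.5), and the function name is hypothetical.

```python
# Illustrative sketch; function name is hypothetical.
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.5))  # 63 with the normal approximation
```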
Randomization – the process of assigning participants to experimental con…
related terms: allocation concealment, stratified randomization. Randomization minimizes systematic differences between groups, enhancing internal validity. In a trial of a new cognitive enhancer, participants are randomly allocated to drug or placebo arms. Practical considerations include using computer‑generated sequences and ensuring allocation concealment. Randomization can be compromised by attrition or protocol deviations, threatening validity.
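One common way to produce a computer-generated sequence is permuted-block randomization, which keeps arm sizes balanced throughout recruitment. This sketch makes several illustrative assumptions: the arm labels, the block size of 4, the seed, and the function name. In practice the sequence would be held by someone independent of recruitment to preserve allocation concealment. Stratified randomization would run the same procedure separately within each stratum.

```python
# Illustrative sketch; function name, arms, block size, and seed are hypothetical.
import random

def block_randomize(n_participants, arms=("drug", "placebo"), block_size=4, seed=None):
    """Permuted-block allocation: each block has equal numbers of each arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded so the sequence is reproducible and auditable
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]

allocation = block_randomize(12, seed=2024)
print(allocation)
```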
Reliability – the consistency of a measurement across time, items, or rat…
related terms: test‑retest reliability, inter‑rater reliability, internal consistency. The Cronbach’s alpha of a new executive function scale indicates internal consistency; a coefficient above .80 is generally acceptable. Reliable measures are essential for detecting true changes in longitudinal studies. Challenges include balancing reliability with breadth of construct coverage and accounting for practice effects.
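Cronbach's alpha compares the sum of item variances with the variance of participants' total scores: α = k/(k − 1) · (1 − Σ var(itemᵢ) / var(total)). The sketch below implements that formula. The function name and the item data are hypothetical.

```python
# Illustrative sketch; function name and item scores are hypothetical.
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, with participants in the same
    order in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each participant's total
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical scores on three scale items for five participants
items = [[2, 3, 3, 4, 5], [1, 3, 4, 4, 5], [2, 2, 3, 5, 4]]
print(round(cronbach_alpha(items), 2))
```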
Sensitivity – the ability of a test to correctly identify individuals who…
related terms: false negative, diagnostic accuracy. A neuropsychological screening tool with high sensitivity for early Alzheimer’s disease reduces missed diagnoses. Practical application involves selecting tests that capture subtle deficits. High sensitivity often comes at the cost of reduced specificity, leading to more false positives, which must be managed in clinical decision‑making.
Specificity – the capacity of a test to correctly identify individuals wh…
related terms: false positive, diagnostic accuracy. A memory test that distinguishes healthy aging from mild cognitive impairment with high specificity reduces unnecessary referrals. Enhancing specificity may involve tightening cut‑off scores, but this can lower sensitivity. Researchers must balance both metrics based on the intended use of the assessment.
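Sensitivity and specificity both come from the same confusion matrix, and the sketch below shows how moving a cut-off trades one against the other. It assumes that lower scores indicate impairment. The function names, the scores, and the cut-offs are hypothetical.

```python
# Illustrative sketch; function names, scores, and cut-offs are hypothetical.

def diagnostic_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # proportion of true cases the test flags
    specificity = tn / (tn + fp)  # proportion of non-cases the test clears
    return sensitivity, specificity

def accuracy_at_cutoff(patient_scores, control_scores, cutoff):
    """Classify 'impaired' when score <= cutoff (lower scores = worse)."""
    tp = sum(s <= cutoff for s in patient_scores)
    fn = len(patient_scores) - tp
    fp = sum(s <= cutoff for s in control_scores)
    tn = len(control_scores) - fp
    return diagnostic_accuracy(tp, fn, tn, fp)

sens, spec = diagnostic_accuracy(tp=45, fn=5, tn=80, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.90, 0.80

# Hypothetical memory scores: raising the cut-off catches more patients
# but misclassifies more healthy controls
patients = [18, 20, 21, 23, 25]
controls = [22, 24, 26, 27, 28]
for cutoff in (21, 24):
    print(cutoff, accuracy_at_cutoff(patients, controls, cutoff))
```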
Standardization – the process of administering and scoring a test under u…
related terms: normative data, test manual. Standardized administration of the Trail Making Test ensures comparability across sites. Practical benefits include reliable scoring and the ability to compare an individual’s performance to normative samples. Challenges arise when cultural or linguistic differences affect test performance, necessitating adaptation and re‑norming.
Statistical Significance – the probability that an observed effect is unl… .05).
related terms: p‑value, null hypothesis. A statistically significant difference in processing speed between two groups indicates that a difference this large would be unlikely if the null hypothesis were true, but it does not convey magnitude or clinical relevance. Over‑reliance on p‑values can encourage “p‑hacking” (analyzing the data in many ways until a significant result appears). Researchers should complement significance testing with effect sizes and confidence intervals.
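A permutation test makes the meaning of a p-value concrete. If the null hypothesis holds, group labels are interchangeable, so the p-value is estimated as the share of random relabellings that produce a mean difference at least as large as the observed one. This is a minimal sketch. The function name, the seed, and the scores are hypothetical, and it is one of several ways to obtain a p-value.

```python
# Illustrative sketch; function name, seed, and scores are hypothetical.
import random
from statistics import mean

def permutation_p_value(group1, group2, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, labels are exchangeable: shuffle them
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts includes the observed labelling and avoids p = 0
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical processing-speed scores (seconds) for two groups
patients = [48, 52, 55, 50, 57, 53]
controls = [41, 44, 46, 43, 47, 45]
print(permutation_p_value(patients, controls, n_permutations=5000))
```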
Test‑Retest Reliability – the stability of scores when the same test is a…
related terms: temporal stability, intraclass correlation. High test‑retest reliability for a visuospatial task indicates that scores are not substantially influenced by random error. In longitudinal neuropsychology, this property is crucial for tracking disease progression. Practice effects and participant fatigue can distort reliability estimates; for example, recalling specific items from the first session can inflate the agreement between sessions, posing interpretive challenges.
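The intraclass correlation can be computed from a two-way ANOVA decomposition of a participants × sessions table. This sketch implements ICC(2,1), which uses two-way random effects, absolute agreement, and a single measurement in the Shrout and Fleiss labelling, for two sessions. The function name and the data are hypothetical. The usage example adds a uniform 2-point practice gain to every participant. Scores still rank identically, yet absolute-agreement ICC falls below 1, which is one way practice effects show up in reliability estimates.

```python
# Illustrative sketch; function name and scores are hypothetical.
from statistics import mean

def icc_2_1(session1, session2):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    data = [list(pair) for pair in zip(session1, session2)]  # rows = participants
    n, k = len(data), 2
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(col) for col in zip(*data)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Hypothetical visuospatial scores with a uniform +2 practice gain at retest
time1 = [10, 12, 14, 16]
time2 = [12, 14, 16, 18]
print(round(icc_2_1(time1, time2), 3))  # 0.769
```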
Validity – the extent to which a test measures what it intends to measure
related terms: construct validity, criterion validity, face validity. Construct validity of a new attention measure is demonstrated when scores correlate with established attention tests and diverge from memory tests. Validity is multidimensional; establishing it requires convergent and discriminant evidence, as well as predictive data. A test may be reliable yet lack validity, limiting its scientific and clinical utility.
Within‑Subject Design – an experimental arrangement where the same partic…
related terms: repeated measures, counterbalancing. In a neurocognitive study, participants complete a working memory task under both drug and placebo conditions, allowing direct comparison while controlling for individual differences. Advantages include increased statistical power and reduced sample size requirements. However, order effects, carry‑over, and fatigue must be mitigated through appropriate counterbalancing and washout periods.
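Counterbalancing can be generated systematically with a balanced (Williams) Latin square. With two conditions, this reduces to the familiar AB/BA orders. With more conditions, it balances both the position of each condition and which condition precedes which. The sketch follows a commonly used construction. The function name is hypothetical, and washout periods are a scheduling matter that the code does not address.

```python
# Illustrative sketch; function name is hypothetical.

def balanced_latin_order(conditions, participant_index):
    """Condition order for one participant from a balanced Latin square.

    With an even number of conditions, len(conditions) consecutive
    participants give each condition once per position and each ordered
    pair of adjacent conditions exactly once. With an odd number, orders
    for odd-indexed participants are reversed, so 2 * n participants are
    needed for balance.
    """
    n = len(conditions)
    order = []
    j, h = 0, 0
    for i in range(n):
        # Base sequence 0, 1, n-1, 2, n-2, ... shifted by the participant index
        if i < 2 or i % 2 == 1:
            val = j
            j += 1
        else:
            val = n - h - 1
            h += 1
        order.append(conditions[(val + participant_index) % n])
    if n % 2 == 1 and participant_index % 2 == 1:
        order.reverse()
    return order

for p in range(2):
    print(p, balanced_latin_order(["drug", "placebo"], p))
```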
Z‑Score – a standardized score indicating how many standard deviations an…
related terms: normative scoring, percentile rank. Converting raw scores on the Rey Auditory Verbal Learning Test to Z‑scores relative to age‑adjusted norms facilitates comparison with same‑age peers. Practical use includes identifying individuals who fall below a clinical cut‑off (e.g., Z < −1.5). Limitations arise when normative samples are not representative of the target population, potentially biasing interpretation.
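A Z-score is the raw score minus the normative mean, divided by the normative standard deviation. Under an assumption of normally distributed norms, it maps to a percentile rank. The normative mean and SD below are hypothetical placeholders, not published RAVLT norms, and the function name is illustrative.

```python
# Illustrative sketch; function name and normative values are hypothetical.
from statistics import NormalDist

def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw score against a normative mean and SD."""
    return (raw - norm_mean) / norm_sd

# Hypothetical normative values for one age band (not real RAVLT norms)
z = z_score(raw=36, norm_mean=50, norm_sd=8)
percentile = NormalDist().cdf(z) * 100  # valid only if norms are roughly normal
print(f"z = {z:.2f}, percentile = {percentile:.1f}, below cut-off: {z < -1.5}")
```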
Attrition – the loss of participants over the course of a longitudinal or…
related terms: dropout, missing data. In a five‑year study of cognitive decline, 20 % attrition may bias results if those who drop out differ systematically (e.g., more severe impairment). Strategies to reduce attrition include maintaining contact, offering incentives, and flexible scheduling. When attrition occurs, researchers must employ appropriate statistical techniques (e.g., mixed‑effects models) to handle missing data.
Bias – systematic error that distorts the true relationship between varia…
related terms: selection bias, observer bias. In a study recruiting volunteers from a memory clinic, selection bias may over‑represent individuals with higher education, limiting generalizability. Identifying and minimizing bias involves careful study design, randomization, blinding, and transparent reporting. Some biases, such as confirmation bias, may infiltrate data interpretation and require vigilant peer review.
Construct Validity – the degree to which a test accurately measures the t…
related terms: convergent validity, discriminant validity. Demonstrating construct validity for a new executive function test involves showing strong correlations with established executive measures (convergent) and weak correlations with unrelated domains such as visual acuity (discriminant). Ongoing validation studies are required as the test is applied to diverse populations. Failure to establish construct validity limits the test’s scientific credibility.
Data Imputation – the statistical technique of estimating missing values…
related terms: multiple imputation, missing data mechanisms. When a subset of participants fails to complete the Digit Span test, researchers may apply multiple imputation to generate plausible scores based on observed data patterns. Imputation reduces bias from listwise deletion but introduces uncertainty; analyses should incorporate imputation variance and assess sensitivity to different imputation models.
Effect Modifier – a variable that changes the magnitude or direction of t…
related terms: interaction, moderator. In a study of cognitive training effects, education level may act as an effect modifier, with higher gains observed among participants with more schooling. Detecting effect modification involves testing interaction terms in regression models. Interpretation requires caution, as spurious interactions can arise from small sample sizes or model over‑fitting.
Factorial Design – an experimental framework that examines the effects of…
related terms: main effect, interaction effect. A 2 × 2 factorial design might assess the impact of medication (drug vs. placebo) and cognitive strategy training (present vs. absent) on processing speed. This design efficiently evaluates both individual and combined influences. Complexity increases with each added factor, and interpreting higher‑order interactions can be challenging.
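For a balanced 2 × 2 design, the main effects and the interaction can be read directly from the four cell means. The interaction here is expressed as the difference between the simple effects of training under drug and under placebo, which is one common scaling. The function name and cell means are hypothetical, and a full analysis would test these contrasts with a factorial ANOVA.

```python
# Illustrative sketch; function name and cell means are hypothetical.

def factorial_2x2_effects(cell_means):
    """Main effects and interaction contrast from a 2 x 2 table of cell means.

    cell_means[a][b]: a = medication (0 placebo, 1 drug),
                      b = strategy training (0 absent, 1 present).
    Assumes equal cell sizes (a balanced design).
    """
    (m00, m01), (m10, m11) = cell_means
    main_medication = (m10 + m11) / 2 - (m00 + m01) / 2
    main_training = (m01 + m11) / 2 - (m00 + m10) / 2
    # Interaction: does the training effect differ between drug and placebo?
    interaction = (m11 - m10) - (m01 - m00)
    return main_medication, main_training, interaction

# Hypothetical mean processing-speed scores per cell
print(factorial_2x2_effects([[40, 44], [43, 53]]))  # (6.0, 7.0, 6)
```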
Generalizability – the extent to which study findings apply to broader po…
related terms: external validity, transferability. Results from a neuropsychological investigation conducted in a tertiary care hospital may not generalize to community clinics with different demographic profiles. Researchers enhance generalizability by using diverse samples, multi‑site recruitment, and transparent reporting of inclusion criteria. Nonetheless, ecological constraints often limit complete external validity.
Hierarchical Linear Modeling (HLM) – a statistical method for analyzing d… (e.g., repeated measures within participants).
related terms: mixed‑effects model, multilevel analysis. In a longitudinal study of cognitive recovery after stroke, HLM can model individual trajectories while accounting for group‑level predictors such as rehabilitation intensity. Practical advantages include handling unequal time intervals and missing data. Challenges involve specifying appropriate random‑effects structures and ensuring sufficient Level‑2 units for reliable estimation.
Inter‑Rater Reliability – the degree of agreement among different observe…
related terms: Cohen’s κ, intraclass correlation. When multiple clinicians rate the severity of aphasia from a language sample, high inter‑rater reliability indicates consistent scoring criteria. Training raters, using detailed scoring rubrics, and conducting calibration sessions improve reliability. Low agreement may necessitate revising the scoring system or limiting assessments to a single trained rater.
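Cohen's κ corrects observed agreement between two raters for the agreement expected by chance from each rater's category proportions: κ = (pₒ − pₑ) / (1 − pₑ). The sketch below is unweighted, so it treats severity levels as unordered categories. A weighted κ, or an intraclass correlation for continuous ratings, may suit ordinal scales better. The function name and ratings are hypothetical.

```python
# Illustrative sketch; function name and ratings are hypothetical.
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal category proportions
    expected = sum(counts1[c] * counts2[c] for c in counts1) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical aphasia severity ratings from two clinicians
rater_a = ["mild", "mild", "moderate", "severe", "moderate", "mild"]
rater_b = ["mild", "moderate", "moderate", "severe", "moderate", "mild"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.739
```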
Longitudinal Mixed‑Effects Model – an advanced statistical approach that… (e.g., treatment) with random effects (e.g., individual variation) across time.
related terms: growth curve analysis, repeated measures ANOVA. This model allows researchers to examine how cognitive scores change over years while accounting for baseline differences and within‑subject correlation. It accommodates unbalanced data and missing observations, making it ideal for neuropsychological research where follow‑up intervals vary. Model complexity and convergence issues can pose analytical hurdles.
Manipulation Check – a test to verify that an experimental manipulation s…
related terms: fidelity, check measure. In a study examining the effect of stress on working memory, a manipulation check might involve measuring cortisol levels to confirm stress induction. Including manipulation checks strengthens internal validity by demonstrating that the independent variable was effectively implemented. Failure of a manipulation check may require redesigning the experimental protocol.
Neuropsychological Test Battery – a collection of standardized assessment…
related terms: assessment protocol, comprehensive evaluation. A typical battery may include measures of attention, memory, language, visuospatial skills, and executive function. Practical considerations involve selecting tests based on the study hypotheses, administration time, and participant fatigue. Administering tests in a fixed, standardized order keeps order effects constant across participants, and adequate breaks limit fatigue. Balancing thoroughness with practicality remains a core challenge.
Outcome Measure – the primary variable used to assess the effect of an in…
related terms: dependent variable, endpoint. In a cognitive rehabilitation trial, the primary outcome might be the change in Trail Making Test Part B time from baseline to 12 weeks. Selecting a sensitive, reliable, and clinically meaningful outcome is crucial for detecting true effects. Over‑reliance on a single outcome can obscure broader benefits; thus, researchers often include secondary measures.
Practice Effect – improvement in test performance due to repeated exposur…
related terms: learning effect, test‑retest improvement. In longitudinal assessments, participants may score higher on the Stroop test simply because they have become familiar with the task. Researchers mitigate practice effects by using alternate test forms, incorporating sufficient intervals between assessments, or statistically adjusting for expected gains. Failure to account for practice effects can inflate perceived treatment efficacy.
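One way to adjust statistically for expected gains is a practice-adjusted reliable change index (RCI). This index subtracts the mean practice gain seen in a reference sample from an individual's observed change, then scales the result by the standard error of the difference. The sketch uses one common formulation; several RCI variants exist. The function name and all input values are hypothetical.

```python
# Illustrative sketch; function name and input values are hypothetical.
import math

def practice_adjusted_rci(baseline, retest, norm_sd_baseline,
                          retest_reliability, mean_practice_gain):
    """Reliable change index adjusted for the mean practice effect.

    SEM = SD * sqrt(1 - r); S_diff = sqrt(2 * SEM^2);
    RCI = ((retest - baseline) - mean_practice_gain) / S_diff.
    |RCI| > 1.96 is often read as change beyond measurement error
    (two-sided, 95%).
    """
    sem = norm_sd_baseline * math.sqrt(1 - retest_reliability)
    s_diff = math.sqrt(2 * sem ** 2)
    return ((retest - baseline) - mean_practice_gain) / s_diff

# Hypothetical: a 10-point gain, of which 3 points is expected from practice
print(round(practice_adjusted_rci(40, 50, 10, 0.80, 3), 2))  # 1.11
```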
Psychometric Properties – characteristics that describe the quality of a…
related terms: measurement theory, test characteristics. Evaluating the psychometric properties of a new attention scale involves calculating internal consistency, test‑retest reliability, and convergent validity with existing scales. Robust psychometric evidence supports the instrument’s adoption in clinical and research settings. Inadequate psychometric data limit interpretability and may lead to misdiagnosis.
Quasi‑Experimental Design – a study that lacks random assignment but stil…
related terms: non‑randomized, natural experiment. An investigation comparing cognitive outcomes before and after a new hospital policy, without a control group, exemplifies a quasi‑experimental approach. While more feasible in real‑world settings, these designs are vulnerable to confounding variables. Employing techniques such as propensity‑score matching can strengthen causal inference.
Reliability Coefficient – a numerical index (often between 0 and 1) that…
related terms: Cronbach’s alpha, split‑half reliability. An alpha of .92 for a memory questionnaire indicates excellent internal consistency. Researchers report reliability coefficients to assure readers of measurement stability. Different coefficients apply to different reliability types; selecting the appropriate metric is essential for accurate reporting.
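Split-half reliability correlates scores on two halves of a test, then uses the Spearman-Brown formula to estimate the reliability of the full-length test. The sketch assumes an odd/even item split, which is only one of many possible splits. The function names and response data are hypothetical.

```python
# Illustrative sketch; function names and responses are hypothetical.
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half, length_factor=2):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)

def split_half_reliability(item_scores):
    """Odd/even split-half reliability, one item list per participant."""
    odd_totals = [sum(items[0::2]) for items in item_scores]
    even_totals = [sum(items[1::2]) for items in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))

# Hypothetical right/wrong (1/0) responses: 5 participants x 6 items
responses = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(responses), 2))
```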
Sample Size Determination – the process of calculating the number of part…
related terms: effect size, significance level. Using software to conduct a power analysis for a three‑group between‑subjects ANOVA with an expected medium effect size (f = 0.25), α = .05, and 80 % power suggests recruiting roughly 50 participants per group. Under‑powered studies risk Type II errors, while oversized samples may be wasteful or raise ethical concerns. Accurate sample size estimation depends on realistic assumptions about variability and anticipated effect magnitude.
Standard Deviation – a measure of dispersion indicating how much individu…
related terms: variance, spread. Reporting the mean and standard deviation for a digit symbol substitution test allows clinicians to gauge typical performance ranges. High standard deviation may reflect heterogeneous abilities within the sample, affecting the interpretability of group comparisons. When distributions are skewed, median and interquartile range may provide more informative summaries.
Statistical Power – the probability of correctly rejecting a false null h…
related terms: beta error, sensitivity. A study with 80 % power has a 20 % chance of missing an effect of the assumed size (a Type II error). Enhancing power can be achieved by increasing sample size, reducing measurement error, or using more sensitive statistical tests. Reporting achieved power post‑hoc is discouraged; prospective power analysis is preferred for study planning.
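Prospective power for a planned sample can be approximated for a two-sided, two-sample comparison of means using the normal distribution. This is a sketch. It ignores the t-distribution correction, so it slightly overstates power for small samples. The function name is hypothetical. Looping over candidate sample sizes shows how power rises with n for a fixed assumed effect.

```python
# Illustrative sketch; function name is hypothetical.
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * math.sqrt(n_per_group / 2)
    # Probability the test statistic lands in either rejection region
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)

for n in (20, 40, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```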
Test Administration Protocol – a standardized set of procedures governing…
related terms: administration guidelines, scoring manual. The protocol for the Boston Naming Test specifies stimulus presentation order, timing, and examiner prompts. Strict adherence ensures comparability across testers and sites. Deviations can introduce systematic error, reducing reliability and validity. Training and periodic fidelity checks help maintain protocol integrity.
Usability Testing – evaluation of how easily clinicians and participants…
related terms: user experience, interface design. Before launching an electronic version of a cognitive battery, researchers conduct usability testing to identify navigation issues and improve data entry efficiency. Practical benefits include reduced administration errors and increased adoption rates. Challenges include recruiting representative users and balancing functionality with simplicity.
Validity Evidence – the body of data supporting the intended interpretati…
related terms: construct validation, criterion-related evidence. Collecting validity evidence may involve correlating a new visuospatial test with established measures, demonstrating predictive ability for functional outcomes, and gathering expert judgments. Comprehensive validity evidence bolsters confidence in clinical decision‑making. Ongoing validation is required as test use expands to new populations or settings.
Weighted Composite Score – a single index created by combining multiple t…
related terms: factor‑derived score, index. A weighted composite of memory, attention, and executive function scores may be used to track overall cognitive health. Determining appropriate weights involves factor analysis, expert consensus, or regression modeling. While composites simplify data interpretation, they can obscure domain‑specific changes and depend on the reliability of constituent measures.
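A weighted composite can be formed as a weighted average of domain Z-scores. The weights below are illustrative placeholders, not values derived from factor analysis or regression, and the function name and scores are hypothetical. The result is not itself a Z-score: its spread depends on how strongly the domains correlate. For that reason composites are often re-standardized against a normative sample before interpretation.

```python
# Illustrative sketch; function name, scores, and weights are hypothetical.

def weighted_composite(domain_z, weights):
    """Weighted average of domain z-scores (dicts keyed by domain name).

    Weights need not sum to 1 because the result is divided by their total.
    """
    total_weight = sum(weights.values())
    return sum(weights[d] * domain_z[d] for d in weights) / total_weight

# Hypothetical domain z-scores and illustrative weights
scores = {"memory": -1.2, "attention": -0.4, "executive": -0.8}
weights = {"memory": 0.5, "attention": 0.2, "executive": 0.3}
print(round(weighted_composite(scores, weights), 2))  # -0.92
```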