Advanced Social Media Research Methods
Expert-defined terms from the Professional Certificate in Social Media Research Methods (United Kingdom) course at London School of Business and Administration.
Algorithmic Bias – systematic distortion introduced by computational mode… #
Algorithmic Bias – systematic distortion introduced by computational models that favor certain groups over others.
Explanation #
In social media research, bias can arise from training data that over‑represent dominant voices, leading to skewed sentiment scores or network metrics.
Example #
A sentiment classifier trained mostly on standard English‑language tweets may misread slang used by younger users as negative, inflating negative sentiment for that group.
Practical application #
Researchers audit model outputs against demographic benchmarks to detect disproportionate error rates.
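The audit described here can be sketched as a per‑group error‑rate comparison. This is a minimal illustration, assuming a small hand‑labelled sample in which each record carries a demographic group, the true label, and the model's label; the function name `error_rate_by_group` and the sample data are hypothetical.

```python
from collections import defaultdict

def error_rate_by_group(records):
    """Return {group: share of records where the model label differs from the true label}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += 1
        if actual != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical audit sample: (age band, true label, model label).
audit_sample = [
    ("18-24", "positive", "negative"),
    ("18-24", "positive", "positive"),
    ("18-24", "neutral", "negative"),
    ("18-24", "negative", "negative"),
    ("45-54", "positive", "positive"),
    ("45-54", "negative", "negative"),
    ("45-54", "neutral", "neutral"),
    ("45-54", "positive", "negative"),
]

rates = error_rate_by_group(audit_sample)
# Gap between the worst-served and best-served group.
gap = max(rates.values()) - min(rates.values())
```

A large gap between groups is a prompt for closer inspection rather than proof of bias on its own, since small audit samples produce noisy rates.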
Challenges #
Access to diverse training sets, transparency of proprietary algorithms, and balancing accuracy with fairness.
API (Application Programming Interface) – set of protocols and tools for… #
API (Application Programming Interface) – set of protocols and tools for building software and enabling machines to retrieve data from platforms.
Explanation #
Social media platforms expose APIs that allow researchers to programmatically collect posts, user metadata, and engagement metrics.
Example #
Using Twitter’s v2 API to pull tweet IDs, timestamps, and retweet counts for a hashtag campaign.
Practical application #
Automating longitudinal data collection for trend analysis.
Challenges #
Changing API policies, data caps, and the need for authentication handling.
Audience Segmentation – process of dividing a social media audience into… #
Audience Segmentation – process of dividing a social media audience into distinct groups based on characteristics or behavior.
Explanation #
Segmentation helps researchers target analyses to specific user subsets, improving relevance of findings.
Example #
Grouping Instagram followers by age, location, and engagement frequency to compare brand perception.
Practical application #
Tailoring content strategy recommendations for each segment.
Challenges #
Incomplete user profiles, privacy restrictions, and dynamic audience movement across segments.
Bot Detection – identification of automated accounts that generate conten… #
Bot Detection – identification of automated accounts that generate content without human intervention.
Explanation #
Bots can amplify or distort social signals; detecting and removing them helps protect data integrity.
Example #
Applying a random forest model to flag accounts with high posting frequency, low linguistic diversity, and disproportionate retweet ratios.
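The three signals named in this example can be computed per account before any classifier is trained. The sketch below covers only that feature step, not the random forest itself; the input layout (a list of dicts with `text` and `is_retweet`) and the function name `account_features` are assumptions made for illustration.

```python
def account_features(posts, days_observed):
    """Compute simple bot-detection features for one account.

    posts: list of dicts with 'text' (str) and 'is_retweet' (bool).
    days_observed: length of the observation window in days (must be > 0).
    """
    tokens = [token.lower() for post in posts for token in post["text"].split()]
    return {
        # Posting frequency.
        "posts_per_day": len(posts) / days_observed,
        # Type-token ratio: unique words / total words (low values = repetitive language).
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        # Share of posts that are retweets.
        "retweet_ratio": sum(post["is_retweet"] for post in posts) / len(posts) if posts else 0.0,
    }

# Hypothetical account that mostly retweets the same phrase.
sample_posts = [{"text": "buy now buy now", "is_retweet": True}] * 3 + [
    {"text": "buy now", "is_retweet": False}
]
features = account_features(sample_posts, days_observed=2)
```

Feature rows like this one, paired with labelled bot and human accounts, would be the input to whichever classifier the researcher chooses.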
Practical application #
Cleaning datasets before sentiment or network analysis.
Challenges #
Evolving bot behavior, false positives that remove genuine high‑activity users, limited ground‑truth data.
Cluster Analysis – statistical technique that groups objects (e.g., users… #
Cluster Analysis – statistical technique that groups objects (e.g., users, posts) based on similarity across multiple dimensions.
Explanation #
In social media research, clusters reveal communities, content themes, or behavioral archetypes.
Example #
Using k‑means on tweet vectors to discover topical clusters around a political event.
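To make the k‑means step concrete, here is a bare‑bones version of Lloyd's algorithm. It is a sketch only: the two‑dimensional points stand in for real tweet vectors, which would normally have hundreds of dimensions, and the starting centroids are supplied by hand to keep the result reproducible.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (cluster index per point, final centroids)."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Toy "tweet vectors" (hypothetical): the first three lean towards one topic, the rest towards another.
tweet_vectors = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0), (0.1, 0.9), (0.2, 0.8), (0.0, 1.0)]
labels, centres = kmeans(tweet_vectors, [tweet_vectors[0], tweet_vectors[3]])
```

In practice the number of clusters and the initialisation both affect the outcome, which is why the choice of k appears among the challenges for this method.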
Practical application #
Informing content recommendation engines or crisis‑communication plans.
Challenges #
Selecting the appropriate number of clusters, high dimensionality of text data, and interpretability of results.
Content Analysis – systematic coding and interpretation of textual, visua… #
Content Analysis – systematic coding and interpretation of textual, visual, or audio material to identify patterns.
Explanation #
Researchers assign categories to social media posts to quantify themes, emotions, or rhetorical strategies.
Example #
Coding Facebook comments for expressions of trust, fear, or anger during a public health campaign.
Practical application #
Measuring the impact of messaging on audience sentiment.
Challenges #
Subjectivity in coding, large volume of data, and maintaining reliability across coders.
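Reliability across coders is commonly checked with an agreement statistic. The sketch below computes Cohen's kappa for two coders, one widely used option; the category labels and coded comments are hypothetical, and the function assumes the coders do not agree purely by chance on every item (expected agreement below 1).

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same category independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two analysts to the same ten comments.
coder_a = ["trust", "trust", "trust", "trust", "fear", "fear", "fear", "anger", "anger", "anger"]
coder_b = ["trust", "trust", "trust", "fear", "fear", "fear", "anger", "anger", "anger", "anger"]

kappa = cohens_kappa(coder_a, coder_b)  # roughly 0.70 for this sample
```

The coders agree on 8 of 10 comments here, but kappa is lower than 0.8 because some of that agreement would be expected by chance.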
Cross‑Platform Analysis – comparative study of user behavior, content, or… #
Cross‑Platform Analysis – comparative study of user behavior, content, or network structures across multiple social media services.
Explanation #
Enables researchers to understand how platform affordances shape communication patterns.
Example #
Comparing the diffusion speed of a viral video on TikTok versus YouTube.
Practical application #
Advising brands on optimal platform mix for campaign rollout.
Challenges #
Differing data formats, API restrictions, and aligning metrics (e.g., likes vs. hearts).
Data Ethics – principles governing the responsible collection, storage, a… #
Data Ethics – principles governing the responsible collection, storage, analysis, and dissemination of social media data.
Explanation #
Researchers must balance scientific inquiry with the rights and expectations of platform users.
Example #
Anonymising user handles before publishing network diagrams.
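One way to approach this is to replace each handle with a keyed hash, so the same user maps to the same pseudonym within a study but the mapping cannot be rebuilt without the key. This is a sketch under assumptions: the function name `pseudonymise`, the `user_` prefix, and the edge list are illustrative, and pseudonymised handles alone do not remove re‑identification risk from network structure.

```python
import hashlib
import hmac
import secrets

def pseudonymise(handle, key):
    """Map a handle to a stable pseudonym using HMAC-SHA256 with a secret key."""
    digest = hmac.new(key, handle.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]

# The key is generated once per project and stored separately from the data (assumption).
key = secrets.token_bytes(32)

# Hypothetical reply network: (source handle, target handle).
edges = [("@alice", "@bob"), ("@bob", "@carol"), ("@Alice", "@carol")]
anon_edges = [(pseudonymise(s, key), pseudonymise(t, key)) for s, t in edges]
```

Because handles are lower‑cased before hashing, `@alice` and `@Alice` resolve to the same node in the published diagram.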
Practical application #
Designing research protocols that meet institutional review board (IRB) standards.
Challenges #
Ambiguity of public vs. private data, cross‑jurisdictional legal frameworks, and potential re‑identification risks.
Data Mining – extraction of useful patterns and knowledge from large data… #
Data Mining – extraction of useful patterns and knowledge from large datasets using computational techniques.
Explanation #
In social media contexts, data mining uncovers hidden trends such as emerging hashtags or coordinated campaigns.
Example #
Applying the Apriori algorithm to discover co‑occurring hashtags during a protest.
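The core Apriori idea is that a hashtag pair can only be frequent if each hashtag in it is frequent. The sketch below implements just the first two levels (single tags, then pairs) rather than the full algorithm; the posts and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_hashtag_pairs(posts, min_support):
    """Return {(tag_a, tag_b): support} for pairs appearing in at least min_support of posts."""
    n = len(posts)
    tag_sets = [set(tags) for tags in posts]
    # Level 1: keep only hashtags that are frequent on their own.
    single_counts = Counter(tag for tags in tag_sets for tag in tags)
    frequent_single = {t for t, c in single_counts.items() if c / n >= min_support}
    # Level 2: count pairs built only from frequent single hashtags.
    pair_counts = Counter()
    for tags in tag_sets:
        kept = sorted(tags & frequent_single)
        pair_counts.update(combinations(kept, 2))
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}

# Hypothetical hashtag lists, one per post.
posts = [
    ["#march", "#rights"],
    ["#march", "#rights", "#city"],
    ["#march", "#city"],
    ["#rights", "#march"],
    ["#weather"],
]
pairs = frequent_hashtag_pairs(posts, min_support=0.4)
```

With these posts, `#march` and `#rights` co‑occur in three of five posts, while `#weather` is pruned at the first level and never enters pair counting.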
Practical application #
Early detection of misinformation spikes.
Challenges #
High‑velocity data streams, storage costs, and algorithmic scalability.
Data Visualisation – graphical representation of data to communicate insi… #
Data Visualisation – graphical representation of data to communicate insights clearly and efficiently.
Explanation #
Visual tools help stakeholders grasp complex social media dynamics at a glance.
Example #
A Sankey diagram illustrating user flow from organic posts to paid advertisements.
Practical application #
Real‑time monitoring dashboards for crisis communication teams.
Challenges #
Over‑simplification, selection bias in what is visualised, and accessibility for non‑technical audiences.
Deep Learning – subset of machine learning employing neural networks with… #
Deep Learning – subset of machine learning employing neural networks with many layers to model complex patterns.
Explanation #
Deep models excel at processing unstructured social media content such as images, video, and natural language.
Example #
Using a transformer‑based model to detect sarcasm in Twitter replies.
Practical application #
Automated content moderation and sentiment detection at scale.
Challenges #
Need for large labelled datasets, computational expense, and opacity of model decisions.
Engagement Metrics – quantitative indicators of user interaction with soc… #
Engagement Metrics – quantitative indicators of user interaction with social media content.
Explanation #
Metrics capture the resonance of posts and inform effectiveness assessments.
Example #
Calculating average engagement per follower for a brand’s Instagram campaign.
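As a worked illustration, average engagement per follower can be computed as the mean number of interactions per post divided by the follower count. Which interactions count (here likes, comments, and shares) is an assumption; definitions vary between platforms and analytics tools.

```python
def engagement_rate(posts, followers):
    """Mean interactions per post, divided by follower count."""
    interactions = [p["likes"] + p["comments"] + p["shares"] for p in posts]
    return sum(interactions) / len(interactions) / followers

# Hypothetical campaign posts for an account with 10,000 followers.
campaign_posts = [
    {"likes": 120, "comments": 20, "shares": 10},
    {"likes": 200, "comments": 35, "shares": 15},
    {"likes": 170, "comments": 20, "shares": 10},
]
rate = engagement_rate(campaign_posts, followers=10_000)
```

The three posts draw 150, 250, and 200 interactions, a mean of 200, which gives a rate of about 2% of followers per post.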
Practical application #
Benchmarking performance against industry standards.
Challenges #
Metric manipulation (e.g., bought likes), platform algorithm changes, and differing meanings across platforms.
Ethnographic Listening – qualitative approach that treats social media as… #
Ethnographic Listening – qualitative approach that treats social media as a cultural field, observing conversations to understand lived experiences.
Explanation #
Researchers immerse themselves in online communities to capture nuanced meanings and norms.
Example #
Following a Reddit community over months to trace evolving attitudes toward remote work.
Practical application #
Informing policy recommendations with grassroots perspectives.
Challenges #
Maintaining researcher neutrality, managing large thread volumes, and ethical considerations of covert observation.
Geotagging – attaching geographic coordinates to social media content, ei… #
Geotagging – attaching geographic coordinates to social media content, either automatically or manually.
Explanation #
Provides spatial context for posts, enabling mapping of phenomena like disaster response or event attendance.
Example #
Mapping tweets with embedded coordinates to visualise evacuation routes during a flood.
Practical application #
Supporting emergency services with real‑time situational awareness.
Challenges #
Sparse geotagged data, privacy concerns, and accuracy of user‑provided locations.
Hashtag Mining – systematic extraction and analysis of hashtag usage to t… #
Hashtag Mining – systematic extraction and analysis of hashtag usage to track topics, movements, or brand conversations.
Explanation #
Hashtags act as user‑generated metadata that can be aggregated to reveal collective interests.
Example #
Identifying the rise of #MeToo across multiple platforms through frequency counts and co‑occurrence networks.
Practical application #
Real‑time monitoring of public sentiment during product launches.
Challenges #
Ambiguity (e.g., #Apple as fruit vs. company), spam hashtags, and multilingual variations.
Influencer Identification – process of locating individuals who wield dis… #
Influencer Identification – process of locating individuals who wield disproportionate sway over audience attitudes and behaviours.
Explanation #
Influencers are detected via network metrics (e.g., betweenness or eigenvector centrality) or content reach.
Example #
Using PageRank on a retweet network to surface users whose posts are retweeted by many accounts, especially by accounts that are themselves widely retweeted.
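A compact power‑iteration version of PageRank shows how scores flow through a retweet network. This is a sketch, assuming each edge points from the retweeting user to the original author; the damping factor of 0.85 is the conventional default, and the user names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            if out_links[node]:
                # Split this node's rank evenly among the users it retweeted.
                share = damping * rank[node] / len(out_links[node])
                for target in out_links[node]:
                    new_rank[target] += share
            else:
                # Dangling node (retweets nobody): spread its rank over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank

# Hypothetical retweet edges: (retweeter, original author).
retweets = [("ann", "dee"), ("bob", "dee"), ("cat", "dee"), ("cat", "bob")]
scores = pagerank(retweets)
top_user = max(scores, key=scores.get)
```

Note that PageRank rewards being retweeted by well‑ranked accounts; identifying users who bridge separate communities is usually a job for betweenness centrality instead.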
Practical application #
Selecting partnership candidates for marketing campaigns.
Challenges #
Distinguishing authentic influence from artificial amplification, and accounting for platform‑specific visibility algorithms.
Keyword Extraction – automated identification of salient terms from text… #
Keyword Extraction – automated identification of salient terms from text corpora.
Explanation #
Helps summarise large volumes of posts and feed topic‑based analyses.
Example #
Applying TF‑IDF to a set of YouTube comments to surface recurring concerns about product durability.
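TF‑IDF weights a term by how often it appears in one comment and how rare it is across all comments. The sketch below uses one common formulation (relative term frequency times the natural log of inverse document frequency); real libraries often add smoothing, and the comments here are hypothetical.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: score} dict per document, using tf * ln(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical YouTube comments about a product.
comments = [
    "the hinge broke after a week",
    "the battery is great",
    "the hinge feels loose and the hinge squeaks",
]
weights = tfidf(comments)
```

A word such as "the", which appears in every comment, scores zero, while "hinge" keeps a positive weight because it is concentrated in a subset of comments.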
Practical application #
Drafting FAQ sections based on frequent user queries.
Challenges #
Handling slang, emojis, and multilingual content; balancing precision and recall.
Latent Dirichlet Allocation (LDA) – probabilistic model for discovering a… #
Latent Dirichlet Allocation (LDA) – probabilistic model for discovering abstract topics within a collection of documents.
Explanation #
LDA treats each document as a mixture of topics, each represented by a distribution over words.
Example #
Running LDA on a corpus of brand‑related tweets to uncover themes such as “customer service,” “price,” and “innovation.”
Practical application #
Tracking shifts in public discourse over time.
Challenges #
Selecting the correct number of topics, interpreting vague topic labels, and computational intensity on large datasets.
Machine Learning (ML) – suite of algorithms that enable computers to lear… #
Machine Learning (ML) – suite of algorithms that enable computers to learn patterns from data without explicit programming.
Explanation #
In social media research, ML powers classification, prediction, and anomaly detection tasks.
Example #
Training a supervised classifier to label tweets as “spam” vs. “organic.”
Practical application #
Automating moderation pipelines for large communities.
Challenges #
Model drift as platform language evolves, data imbalance, and interpretability for stakeholders.
Network Centrality – quantitative measures that indicate the importance o… #
Network Centrality – quantitative measures that indicate the importance of nodes within a social network.
Explanation #
Centrality helps identify key actors, information brokers, or potential spreaders of content.
Example #
Calculating betweenness centrality to find users who connect otherwise separate discussion clusters on a political forum.
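Betweenness centrality counts how many shortest paths between other users pass through a given user. The sketch below follows Brandes' algorithm for an unweighted, undirected graph; the adjacency dictionary describes a hypothetical forum with two discussion clusters joined by a single user.

```python
from collections import deque

def betweenness(adjacency):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted undirected graph)."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths to every node.
        stack = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each undirected pair was counted once from each end, so halve the totals.
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical forum graph: two triangles linked through the user "bridge".
forum = {
    "a1": ["a2", "a3"], "a2": ["a1", "a3"], "a3": ["a1", "a2", "bridge"],
    "bridge": ["a3", "b1"],
    "b1": ["bridge", "b2", "b3"], "b2": ["b1", "b3"], "b3": ["b1", "b2"],
}
centrality = betweenness(forum)
broker = max(centrality, key=centrality.get)
```

Every shortest path between the two triangles runs through `bridge`, so it receives the highest score even though it has only two connections.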
Practical application #
Targeting interventions to curb misinformation diffusion.
Challenges #
Dynamic networks where centrality values shift rapidly, and computational cost for large graphs.
Natural Language Processing (NLP) – interdisciplinary field combining lin… #
Natural Language Processing (NLP) – interdisciplinary field combining linguistics and computer science to enable machines to understand human language.
Explanation #
NLP tools parse social media text to extract sentiment, topics, or entities.
Example #
Using named entity recognition (NER) to identify brand names mentioned in Instagram captions.
Practical application #
Building dashboards that track competitor mentions in real time.
Challenges #
Short, noisy text; prevalence of emojis and slang; multilingual posts.
Noise Filtering – removal or down‑weighting of irrelevant or low‑quality… #
Noise Filtering – removal or down‑weighting of irrelevant or low‑quality data points that can obscure true patterns.
Explanation #
Social media streams contain bots, duplicate posts, and off‑topic chatter that must be cleaned.
Example #
Applying a regex filter to discard tweets containing only URLs.
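A filter of this kind might look like the following. The pattern is an assumption about what counts as "only URLs": one or more http or https links separated by whitespace and nothing else; the sample tweets are hypothetical.

```python
import re

# Matches text made up solely of http(s) links and whitespace.
URL_ONLY = re.compile(r"^\s*(?:https?://\S+\s*)+$")

def drop_url_only(tweets):
    """Keep tweets that contain at least some text besides links."""
    return [tweet for tweet in tweets if not URL_ONLY.match(tweet)]

sample_tweets = [
    "https://t.co/abc123",
    "Great thread https://t.co/xyz",
    "  https://a.example/1 https://b.example/2 ",
    "No links here",
]
kept = drop_url_only(sample_tweets)
```

Only the first and third tweets are discarded; a tweet that mixes commentary with a link is kept because the commentary may carry sentiment.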
Practical application #
Improving the accuracy of sentiment models by eliminating non‑textual noise.
Challenges #
Balancing aggressive filtering (risking loss of genuine content) against under‑filtering (retaining noise that skews results).
Observational Study – research design that monitors social media behaviou… #
Observational Study – research design that monitors social media behaviour without manipulating variables.
Explanation #
Provides naturalistic insight into user interactions, trends, and network evolution.
Example #
Tracking hashtag usage over a six‑month period to assess campaign longevity.
Practical application #
Informing strategic decisions based on organic audience response.
Challenges #
Inability to infer causality, susceptibility to confounding events, and data access limits.
Participatory Research – collaborative approach where community members c… #
Participatory Research – collaborative approach where community members co‑design and co‑interpret studies.
Explanation #
Engages social media users as active contributors, enhancing relevance and trust.
Example #
Hosting a Twitter chat where participants help craft survey questions about digital well‑being.
Practical application #
Generating policy recommendations that reflect lived experiences.
Challenges #
Managing divergent viewpoints, ensuring methodological rigour, and safeguarding participant anonymity.
Platform Governance – policies and mechanisms that platforms employ to re… #
Platform Governance – policies and mechanisms that platforms employ to regulate content, user conduct, and data access.
Explanation #
Governance shapes the data landscape researchers can access and the behaviour they observe.
Example #
Understanding Instagram’s algorithmic feed changes to interpret fluctuations in post reach.
Practical application #
Adjusting data collection strategies to comply with new API restrictions.
Challenges #
Rapid policy shifts, opaque algorithmic decisions, and cross‑platform inconsistencies.
Privacy‑preserving Analytics – techniques that enable insight extraction… #
Privacy‑preserving Analytics – techniques that enable insight extraction while protecting individual identities.
Explanation #
Researchers add statistical noise or aggregate data to meet legal and ethical standards.
Example #
Reporting only median engagement rates for demographic groups to avoid re‑identification.
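Aggregate reporting is usually paired with a minimum group size, so that a median is never published for a handful of identifiable users. The sketch below illustrates that rule; the threshold of five and the data are assumptions, and the appropriate threshold depends on the study's ethics protocol.

```python
from statistics import median

def safe_group_medians(rows, min_group_size=5):
    """Return {group: median rate}, with None for groups too small to report safely."""
    groups = {}
    for group, rate in rows:
        groups.setdefault(group, []).append(rate)
    return {
        group: (median(rates) if len(rates) >= min_group_size else None)
        for group, rates in groups.items()
    }

# Hypothetical per-user engagement rates by age band.
user_rates = [
    ("18-24", 0.02), ("18-24", 0.05), ("18-24", 0.03), ("18-24", 0.04), ("18-24", 0.10),
    ("65+", 0.01), ("65+", 0.02),
]
report = safe_group_medians(user_rates)
```

The `65+` group has only two users, so its median is suppressed rather than published.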
Practical application #
Publishing findings that satisfy institutional review boards and GDPR.
Challenges #
Balancing data utility with privacy guarantees, and limited tool support for complex social media datasets.
Qualitative Coding – systematic assignment of textual segments to predefi… #
Qualitative Coding – systematic assignment of textual segments to predefined categories, often performed by human analysts.
Explanation #
Captures nuanced meanings that automated methods may miss, such as sarcasm or cultural references.
Example #
Coding Facebook comments for expressions of empowerment in a gender‑equality campaign.
Practical application #
Producing rich case studies that complement quantitative metrics.
Challenges #
Time‑intensive, scalability limits, and potential coder bias.
Real‑time Monitoring – continuous collection and analysis of social media… #
Real‑time Monitoring – continuous collection and analysis of social media data as events unfold.
Explanation #
Enables rapid detection of crises, viral trends, or sentiment shifts.
Example #
Using a streaming API to flag spikes in negative sentiment surrounding a product recall.
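Once negative mentions are counted per time interval, a simple way to flag a spike is to compare each new count with the mean and standard deviation of the preceding window. This is a sketch of that idea, not a production detector; the window length, threshold, and counts are hypothetical.

```python
from statistics import mean, stdev

def flag_spikes(counts, window=6, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by > threshold std devs."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines, where a z-score is undefined.
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical negative mentions per ten-minute interval during a product recall.
negative_mentions = [4, 5, 3, 6, 4, 5, 5, 4, 30, 28, 6]
alerts = flag_spikes(negative_mentions)
```

Only the first interval of the surge is flagged: once the spike enters the trailing window it inflates the baseline, which is one reason real systems often use more robust baselines.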
Practical application #
Activating crisis‑communication protocols within minutes of issue emergence.
Challenges #
High data velocity, need for automated anomaly detection, and risk of false alarms.
Sentiment Analysis – computational technique that determines the emotiona… #
Sentiment Analysis – computational technique that determines the emotional valence (positive, negative, neutral) of text.
Explanation #
Applied to posts, comments, or reviews to gauge public mood toward brands, policies, or events.
Example #
Scoring tweets about a new health policy to assess public acceptance.
Practical application #
Guiding messaging adjustments in real time.
Challenges #
Sarcasm detection, domain‑specific vocabularies, and language diversity.
Social Listening – systematic tracking of online conversations to extract… #
Social Listening – systematic tracking of online conversations to extract insights about brand perception, competitor activity, or emerging topics.
Explanation #
Combines keyword tracking, sentiment analysis, and volume metrics to create a holistic view of the digital landscape.
Example #
Monitoring mentions of a product across Twitter, Reddit, and forums during a launch week.
Practical application #
Informing product development cycles with user‑generated feedback.
Challenges #
Data overload, distinguishing signal from background chatter, and integrating cross‑platform data.
Social Network Analysis (SNA) – methodological framework for studying the… #
Social Network Analysis (SNA) – methodological framework for studying the structure and dynamics of social relations represented as graphs.
Explanation #
SNA uncovers how information, influence, and behaviours propagate through online communities.
Example #
Mapping retweet networks to identify echo chambers during an election.
Practical application #
Designing interventions that target bridge actors to reduce polarization.
Challenges #
Large‑scale graph computation, dynamic edge formation, and privacy‑preserving visualization.
Text Mining – extraction of useful information from unstructured textual… #
Text Mining – extraction of useful information from unstructured textual data using statistical and linguistic methods.
Explanation #
Enables large‑scale analysis of posts, comments, and messages without manual reading.
Example #
Mining YouTube video titles to detect emerging slang terms.
Practical application #
Updating brand lexicons for automated moderation tools.
Challenges #
Noisy data, multilingual content, and evolving language norms.
Topic Modeling – unsupervised learning technique that discovers latent th… #
Topic Modeling – unsupervised learning technique that discovers latent themes within a corpus of documents.
Explanation #
Groups words that frequently appear together, providing a high‑level overview of discourse.
Example #
Applying non‑negative matrix factorisation (NMF) to a set of Instagram captions to reveal themes such as “travel,” “food,” and “fitness.”
Practical application #
Tracking the rise or decline of specific topics over campaign phases.
Challenges #
Interpreting abstract topics, choosing the number of topics, and handling short‑text sparsity.
Trend Detection – identification of sudden increases or decreases in the… #
Trend Detection – identification of sudden increases or decreases in the frequency of specific terms, hashtags, or content types.
Explanation #
Helps researchers spot viral phenomena, emerging crises, or shifts in public interest.
Example #
Using Kleinberg’s burst detection algorithm to detect a spike in #BlackFriday mentions.
Practical application #
Allocating marketing resources to capitalize on emerging trends.
Challenges #
Distinguishing organic spikes from coordinated manipulation, and handling seasonal baseline fluctuations.
Twitter Spaces Analysis – study of live audio conversations hosted on the… #
Twitter Spaces Analysis – study of live audio conversations hosted on the Twitter platform.
Explanation #
Captures spontaneous discussions, speaker dynamics, and audience engagement in an emerging format.
Example #
Transcribing and coding a Space on climate policy to extract key arguments and sentiment.
Practical application #
Informing policy briefings with real‑world stakeholder positions.
Challenges #
Limited transcript availability, variable audio quality, and rapidly changing participant rosters.
User‑Generated Content (UGC) Analytics – systematic examination of conten… #
User‑Generated Content (UGC) Analytics – systematic examination of content created by platform users rather than brands or organisations.
Explanation #
UGC provides authentic insights into consumer preferences, experiences, and brand perception.
Example #
Analyzing TikTok videos featuring a product to assess feature usage patterns.
Practical application #
Guiding product design based on real‑world usage demonstrated by users.
Challenges #
Heterogeneous formats (video, text, images), copyright considerations, and volume management.
Video Analytics – extraction of metadata, visual features, and narrative… #
Video Analytics – extraction of metadata, visual features, and narrative elements from video content.
Explanation #
Enables researchers to study visual storytelling, brand placement, and audience reactions in moving images.
Example #
Detecting logo appearance frequency in livestreams using object detection models.
Practical application #
Measuring advertising exposure in user‑generated videos.
Challenges #
High computational load, varied video quality, and need for multimodal fusion (audio + visual).
Voice of the Customer (VoC) – systematic collection and analysis of custo… #
Voice of the Customer (VoC) – systematic collection and analysis of customer feedback expressed across social channels.
Explanation #
VoC aggregates complaints, praises, and suggestions to inform service improvement.
Example #
Aggregating Facebook comments about a new app feature to identify pain points.
Practical application #
Prioritising product backlog items based on frequency and sentiment weight.
Challenges #
Filtering out noise, aligning social feedback with internal metrics, and ensuring representative sampling.
Web Scraping – automated extraction of data from web pages when APIs are… #
Web Scraping – automated extraction of data from web pages when APIs are unavailable or insufficient.
Explanation #
Allows researchers to harvest publicly displayed content such as comments, profile bios, or embedded media.
Example #
Scraping public Instagram post captions to build a corpus for language trend analysis.
Practical application #
Compiling a dataset of user‑generated reviews where API access is restricted.
Challenges #
Legal compliance with terms of service, anti‑scraping defenses (CAPTCHAs), and data quality consistency.
Word Embeddings – vector representations of words that capture semantic r… #
Word Embeddings – vector representations of words that capture semantic relationships based on co‑occurrence patterns.
Explanation #
Embeddings enable similarity calculations, clustering, and downstream NLP tasks.
Example #
Using pre‑trained Word2Vec vectors to find synonyms for “sustainable” in a corpus of eco‑focused tweets.
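Finding near‑synonyms with embeddings comes down to ranking words by cosine similarity to the query word. The sketch below uses made‑up three‑dimensional vectors purely for illustration; real pre‑trained Word2Vec vectors typically have hundreds of dimensions and would be loaded from a model file.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors, top_n=2):
    """Return the top_n words most similar to `word`, most similar first."""
    ranked = sorted(
        ((cosine(vectors[word], vec), other) for other, vec in vectors.items() if other != word),
        reverse=True,
    )
    return [other for _, other in ranked[:top_n]]

# Toy vectors (hypothetical values, not taken from any trained model).
toy_vectors = {
    "sustainable": [0.9, 0.8, 0.1],
    "eco": [0.8, 0.9, 0.2],
    "green": [0.7, 0.6, 0.3],
    "cheap": [0.1, 0.2, 0.9],
}
neighbours = nearest("sustainable", toy_vectors)
```

Nearest neighbours in embedding space are related terms rather than strict synonyms, so candidate keywords still need a manual check before they are added to a monitoring list.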
Practical application #
Enhancing keyword expansion for brand monitoring.
Challenges #
Domain mismatch (generic embeddings vs. platform‑specific slang), and handling out‑of‑vocabulary tokens.
Zero‑Shot Classification – technique where a model assigns labels to data… #
Zero‑Shot Classification – technique where a model assigns labels to data from classes it saw no labelled examples of during training, guided by natural language descriptions of those classes.
Explanation #
Useful for rapidly categorising emerging topics without the need for labelled training data.
Example #
Prompting a transformer model to label Instagram comments as “complaint,” “praise,” or “question” based solely on textual definitions.
Practical application #
Deploying flexible classifiers during fast‑moving crisis events.
Challenges #
Model confidence calibration, prompt design nuances, and potential bias from pre‑training corpora.