Introduction

Module 01 GESIS Fall Seminar “Introduction to Computational Social Science”

Johannes B. Gruber

GESIS

John McLevey

University of Waterloo

Schedule: GESIS Fall Seminar in Computational Social Science

Course Schedule

Time     Session
Day 1    Introduction to Computational Social Science
Day 2    Obtaining Data
Day 3    Computational Network Analysis
Day 4    Computational Text Analysis
Day 5    Large Language Models in the Social Sciences

Who is Johannes?

  • Senior Researcher/Team Lead in Data Services for the Social Sciences @ GESIS
  • Previously PostDoc in Communication at VU Amsterdam, University of Amsterdam, and European New School of Digital Studies
  • Interested in:
    • Computational Social Science
    • Data Quality
    • Open and reproducible science
    • Automated Content Analysis
    • Hybrid Media Systems and Information Flows
    • Protest and Democracy
  • Experience:
    • R user since 2015
    • R package developer since 2017
    • Worked on several packages for text analysis, API access and web scraping (spacyr, quanteda.textmodels, LexisNexisTools, paperboy, traktok, rollama, amcat4-r, atrrr, rwhatsapp and more)

Contact:

  • @jbgruber.bsky.social
  • @JohannesBGruber

Who is John?

John McLevey (he / him)
Associate Professor, University of Waterloo
PI, Computational Social Science Lab


I first got into computational social science via natural language processing and probabilistic topic modelling as a PhD student around 2009.


Within computational social science, my interests and expertise are primarily in network science, computational text analysis, generative modelling, Bayesian data analysis, causal inference, and scientific computing. I also work in cognitive social science, political sociology, sociology of science, and environmental sociology.

What is computational social science?

“Anything that’s cool.”
- Matt Salganik (Author of Bit by Bit)


Computational social science is diverse and multi-paradigmatic

Photo by Super Straho on Unsplash

Exploratory Data Analysis

The process

Source: Wickham, Çetinkaya-Rundel, and Grolemund (2023)

  • “Data science is the process by which data becomes understanding, knowledge and insight” - Hadley Wickham
  • “You can’t do data science in a GUI” - Hadley Wickham

The steps

  1. Generate questions about your data.
  2. Search for answers by visualising, transforming, and modelling your data.
  3. Use what you learn to refine your questions and/or generate new questions.

CSS Examples

Sounding the alarm based on COVID numbers

Trump’s tweets

Data from http://varianceexplained.org:

Rows: 1,512
Columns: 16
$ text          <chr> "My economic policy speech will be carried live at 12:15 P.M. Enjoy!", "Join me in Fayetteville, North Carolina tomorrow evening at 6pm. Tickets n…
$ favorited     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ favoriteCount <dbl> 9214, 6981, 15724, 19837, 34051, 29831, 19223, 19543, 75488, 23661, 28069, 35205, 36936, 32716, 34109, 19436, 19330, 30869, 19431, 27568, 25860, 2…
$ replyToSN     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ created       <dttm> 2016-08-08 15:20:44, 2016-08-08 13:28:20, 2016-08-08 00:05:54, 2016-08-07 23:09:08, 2016-08-07 21:31:46, 2016-08-07 13:49:29, 2016-08-07 02:19:37…
$ truncated     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ replyToSID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ id            <chr> "762669882571980801", "762641595439190016", "762439658911338496", "762425371874557952", "762400869858115588", "762284533341417472", "7621109187213…
$ replyToUID    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ statusSource  <chr> "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>", "<a href=\"http://twitter.com/download/iphone\" rel=\"…
$ screenName    <chr> "realDonaldTrump", "realDonaldTrump", "realDonaldTrump", "realDonaldTrump", "realDonaldTrump", "realDonaldTrump", "realDonaldTrump", "realDonaldTr…
$ retweetCount  <dbl> 3107, 2390, 6691, 6402, 11717, 9892, 5784, 7930, 24663, 7903, 8561, 13129, 13250, 9356, 10385, 8066, 5418, 16786, 5681, 13869, 8989, 8674, 12351, …
$ isRetweet     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ retweeted     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ longitude     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ latitude      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
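The `statusSource` column stores the posting client as an HTML link, so the device has to be extracted before tweets can be compared by source. The following is a minimal Python sketch, not the tutorial's own code: the function name `extract_client` is hypothetical, and the two input strings are copied from the output above (the second is cut off there and is kept cut off here).

```python
import re

# Two statusSource values as shown in the output above.
# The second value is truncated in the output, so it stays truncated here.
status_sources = [
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="',
]


def extract_client(status_source):
    """Return the client named in the download URL, or None if there is none.

    Hypothetical helper: it reads the device from the href path
    (.../download/<client>), which is visible even in truncated values.
    """
    match = re.search(r"twitter\.com/download/(\w+)", status_source)
    return match.group(1) if match else None


clients = [extract_client(source) for source in status_sources]
print(clients)
```

Reading the device from the URL rather than from the link text is a design choice made here only because the link text is not fully visible in the truncated output.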

Time of Day

Part of tutorial: Robinson (2016)
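A time-of-day comparison starts from counting tweets per hour of the `created` timestamp. The sketch below uses only the seven timestamps that are fully visible in the data output, so it illustrates the counting step rather than the tutorial's result. It is written in Python, whereas the data output comes from R. It assumes the timestamps can be read as they are printed and applies no time-zone conversion, which a real analysis would need to consider.

```python
from collections import Counter
from datetime import datetime

# The first seven `created` values visible in the data output.
created = [
    "2016-08-08 15:20:44",
    "2016-08-08 13:28:20",
    "2016-08-08 00:05:54",
    "2016-08-07 23:09:08",
    "2016-08-07 21:31:46",
    "2016-08-07 13:49:29",
    "2016-08-07 02:19:37",
]

# Parse each timestamp and keep only the hour of the day (0-23).
hours = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in created]

# Count tweets per hour, then convert the counts to shares of all tweets.
tweets_per_hour = Counter(hours)
share_per_hour = {
    hour: count / len(hours) for hour, count in sorted(tweets_per_hour.items())
}

for hour, share in share_per_hour.items():
    print(f"{hour:02d}:00  {share:.0%}")
```

Grouping the same counts by posting client would give one curve per device, which is the kind of comparison the slide title refers to.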

Overrepresented words

Part of tutorial: Robinson (2016)

Sentiment analysis

Part of tutorial: Robinson (2016)

Reverse Engineering Chinese Social Media Censorship

Background

  • The authors collected a massive dataset of Chinese social media posts and blogs
  • Goal of the study: evaluate text-as-data methods for Chinese-language text
  • To validate the results, they went back to the posts to see them in context
  • But wait! The posts are gone!
  • Because the posts were scraped faster than they were deleted, the deletions became an opportunity to study traces of censorship

Source: King, Pan, and Roberts (2013); King, Pan, and Roberts (2014)
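The core idea, comparing what was scraped early with what is still online later, can be sketched as a set difference. This is an illustrative Python sketch only, not the authors' pipeline: the post IDs, the function name `find_deleted`, and the two-snapshot setup are all hypothetical.

```python
def find_deleted(first_scrape, later_check):
    """Return IDs that were present in the first scrape but are missing later.

    Hypothetical helper: both arguments are iterables of post IDs.
    """
    return sorted(set(first_scrape) - set(later_check))


# Hypothetical post IDs, invented for illustration only.
first_scrape = ["post_01", "post_02", "post_03", "post_04", "post_05"]
later_check = ["post_01", "post_03", "post_04"]

deleted = find_deleted(first_scrape, later_check)
deletion_rate = len(deleted) / len(first_scrape)

print(deleted)
print(f"{deletion_rate:.0%} of scraped posts are no longer available")
```

A missing post does not show by itself why it disappeared, so the interesting analytical step is comparing the content of deleted and surviving posts.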

Research Question: Which posts are gone?

  • Is criticism of the government censored?
    • No!
  • Is collective action censored?
    • Yes!

Source: King, Pan, and Roberts (2013)

Manterrupting in the German Bundestag

RQ: Are women interrupted more often than men in politics? Source: Och (2020)

References

Atteveldt, Wouter van, Damian Trilling, and Carlos Arcíla. 2021. Computational Analysis of Communication: A Practical Introduction to the Analysis of Texts, Networks, and Images with Code Examples in Python and R. Hoboken, NJ: John Wiley & Sons. https://cssbook.net.
Bail, Chris. 2022. Breaking the Social Media Prism: How to Make Our Platforms Less Polarizing. Princeton University Press.
Centola, Damon. 2018. How Behavior Spreads: The Science of Complex Contagions. Princeton University Press.
Edelmann, Achim, Tom Wolff, Danielle Montagne, and Christopher A Bail. 2020. “Computational Social Science and Sociology.” Annual Review of Sociology 46 (1): 61–81.
Grigoropoulou, Nikolitsa, and Mario Small. 2022. “The Data Revolution in Social Science Needs Qualitative Research.” Nature Human Behaviour 6 (7): 904–6.
Grimmer, Justin, Margaret Roberts, and Brandon Stewart. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review 107 (2): 326–43.
———. 2014. “Reverse-Engineering Censorship in China: Randomized Experimentation and Participant Observation.” Science 345 (6199): 1251722. http://www.sciencemag.org/content/345/6199/1251722.abstract.
Kitts, James, Helene Grogan, and Kevin Lewis. 2023. “Social Networks and Computational Social Science.” In The SAGE Handbook of Social Network Analysis, edited by John McLevey, John Scott, and Peter J. Carrington. Sage.
McElreath, Richard. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman and Hall/CRC.
McLevey, John. 2022. Doing Computational Social Science: A Practical Introduction. Sage.
McLevey, John, Pierson Browne, and Tyler Crick. 2021. “Reproducibility and Principled Data Processing.” In Handbook of Computational Social Science, Volume 2, 108–24. Routledge.
Nelson, Laura K. 2020. “Computational Grounded Theory: A Methodological Framework.” Sociological Methods & Research 49 (1): 3–42.
O’Neil, Cathy. 2017. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown.
Och, Malliga. 2020. “Manterrupting in the German Bundestag: Gendered Opposition to Female Members of Parliament?” Politics & Gender 16 (2): 388–408. https://doi.org/10.1017/S1743923X19000126.
Robinson, David. 2016. “Text Analysis of Trump’s Tweets Confirms He Writes Only the (Angrier) Android Half.” Variance Explained. http://varianceexplained.org/r/trump-tweets/.
Salganik, Matthew. 2019. Bit by Bit: Social Research in the Digital Age. Princeton University Press.
Smaldino, Paul. 2023. Modeling Social Behavior: Mathematical and Agent-Based Models of Social Dynamics and Cultural Evolution. Princeton University Press.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd edition. Sebastopol, CA: O’Reilly.