References

Acemoglu, Daron, Simon Johnson, and James A Robinson. 2001. “The Colonial Origins of Comparative Development: An Empirical Investigation.” American Economic Review 91 (5): 1369–1401.
Akerlof, George A. 1978. “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism.” In Uncertainty in Economics, 235–51. Elsevier.
Alexander, Monica. 2019. “Reproducibility in Demographic Research.” https://www.monicaalexander.com/posts/2019-10-20-reproducibility/.
Alexander, Rohan, and Monica Alexander. 2020. “The Increased Effect of Elections and Changing Prime Ministers on Topics Discussed in the Australian Federal Parliament Between 1901 and 2018.” https://rohanalexander.com/pdfs/AlexanderAlexander-EffectofElectionsandPrimeMinisters.pdf.
Allen, Eric J, Patricia M Dechow, Devin G Pope, and George Wu. 2017. “Reference-Dependent Preferences: Evidence from Marathon Runners.” Management Science 63 (6): 1657–72.
Alsan, Marcella, and Marianne Wanamaker. 2018. “Tuskegee and the Health of Black Men.” The Quarterly Journal of Economics 133 (1): 407–55.
Arel-Bundock, Vincent. 2020. Modelsummary: Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready. https://CRAN.R-project.org/package=modelsummary.
Aschwanden, Christie. 2020. Artificial Intelligence Makes Bad Medicine Even Worse. https://www.wired.com/story/artificial-intelligence-makes-bad-medicine-even-worse/.
Athey, Susan, and Guido W Imbens. 2017. “The State of Applied Econometrics: Causality and Policy Evaluation.” Journal of Economic Perspectives 31 (2): 3–32.
Barrett, Malcolm. 2021. Data Science as an Atomic Habit. https://malco.io/2021/01/04/data-science-as-an-atomic-habit/.
Bastian, Hilda. 2020. “A Timeline of the Oxford-AstraZeneca Covid-19 Vaccine Trials.” http://hildabastian.net/index.php/100.
Blair, Graeme, Jasper Cooper, Alexander Coppock, and Macartan Humphreys. 2019. “Declaring and Diagnosing Research Designs.” American Political Science Review 113: 838–59. https://declaredesign.org/paper.pdf.
Blei, David M. 2012. “Probabilistic Topic Models.” Communications of the ACM 55 (4): 77–84.
Blei, David M, and John D Lafferty. 2009. “Topic Models.” In Text Mining, 101–24. Chapman and Hall/CRC.
Blei, David M, Andrew Y Ng, and Michael I Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (Jan): 993–1022.
Bloom, Howard, Andrew Bell, and Kayla Reiman. 2020. “Using Data from Randomized Trials to Assess the Likely Generalizability of Educational Treatment-Effect Estimates from Regression Discontinuity Designs.” Journal of Research on Educational Effectiveness, 1–30. https://doi.org/10.1080/19345747.2019.1634169.
Brandt, Allan M. 1978. “Racism and Research: The Case of the Tuskegee Syphilis Study.” Hastings Center Report, 21–29.
Bronte, Charlotte. 1847. Jane Eyre. https://www.gutenberg.org/files/1260/1260-h/1260-h.htm.
Brook, Robert H, John E Ware, William H Rogers, Emmett B Keeler, Allyson Ross Davies, Cathy D Sherbourne, George A Goldberg, Kathleen N Lohr, Patricia Camp, and Joseph P Newhouse. 1984. “The Effect of Coinsurance on the Health of Adults: Results from the RAND Health Insurance Experiment.”
Bryan, Jennifer, and Jim Hester. 2020. What They Forgot to Teach You About R. https://rstats.wtf/index.html.
Bryan, Jennifer, Jim Hester, David Robinson, and Hadley Wickham. 2019. Reprex: Prepare Reproducible Example Code via the Clipboard. https://CRAN.R-project.org/package=reprex.
Bryan, Jenny. 2020. Happy Git and GitHub for the useR. https://happygitwithr.com.
Bürkner, Paul-Christian. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
Cohn, Alain. 2019. “Data and Code for: Civic Honesty Around the Globe.” Harvard Dataverse. https://doi.org/10.7910/DVN/YKBODN.
Cohn, Alain, Michel André Maréchal, David Tannenbaum, and Christian Lukas Zünd. 2019a. “Civic Honesty Around the Globe.” Science 365 (6448): 70–73.
———. 2019b. “Supplementary Materials for: Civic Honesty Around the Globe.” Science 365 (6448): 70–73.
Cooksey, Brian. 2014. “An Introduction to APIs.” Zapier. https://zapier.com/learn/apis/.
Cox, Murray. 2021. “Inside Airbnb - Toronto Data.” http://insideairbnb.com/get-the-data.html.
Cunningham, Scott. 2020. Causal Inference: The Mixtape. https://www.scunning.com/mixtape.html.
———. 2021. Causal Inference: The Mixtape. Yale University Press.
Dagan, Noa, Noam Barda, Eldad Kepten, Oren Miron, Shay Perchik, Mark A Katz, Miguel A Hernán, Marc Lipsitch, Ben Reis, and Ran D Balicer. 2021. “BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting.” New England Journal of Medicine.
Dahly, Darren. 2020. A Brief History of Medical Statistics and Its Impact on Reproducibility. https://statsepi.substack.com/p/a-brief-history-of-medical-statistics.
Darling, William M. 2011. “A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling.” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 642–47.
“Data Science Radar: How to Identify World-Class Data Science Capabilities.” 2020. Mango Solutions. https://www.mango-solutions.com/data-science-radar-how-to-identify-world-class-data-science-capabilities/.
Farrugia, Patricia, Bradley A Petrisor, Forough Farrokhyar, and Mohit Bhandari. 2010. “Research Questions, Hypotheses and Objectives.” Canadian Journal of Surgery 53 (4): 278.
Finkelstein, Amy, Sarah Taubman, Bill Wright, Mira Bernstein, Jonathan Gruber, Joseph P Newhouse, Heidi Allen, Katherine Baicker, and Oregon Health Study Group. 2012. “The Oregon Health Insurance Experiment: Evidence from the First Year.” The Quarterly Journal of Economics 127 (3): 1057–1106.
Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Fisher, Ronald. 1935. The Design of Experiments. Oliver and Boyd.
Fitts, Alexis Sobel. 2014. “The King of Content: How Upworthy Aims to Alter the Web, and Could End up Altering the World.” Columbia Journalism Review. https://archives.cjr.org/feature/the_king_of_content.php.
FT Visual & Data Journalism team. 2020. “Coronavirus Tracked: See How Your Country Compares.” https://ig.ft.com/coronavirus-chart/.
Gagolewski, Marek. 2020. R Package Stringi: Character String Processing Facilities. http://www.gagolewski.com/software/stringi/.
Gelfand, Sharla. 2020. Opendatatoronto: Access the City of Toronto Open Data Portal. https://CRAN.R-project.org/package=opendatatoronto.
Gelman, Andrew. 2016. “What Has Happened down Here Is the Winds Have Changed.” https://statmodeling.stat.columbia.edu/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge University Press.
Gertler, Paul J, Sebastian Martinez, Patrick Premand, Laura B Rawlings, and Christel MJ Vermeersch. 2016. Impact Evaluation in Practice. The World Bank.
Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “Rstanarm: Bayesian Applied Regression Modeling via Stan.” https://mc-stan.org/rstanarm.
Greenland, Sander, Stephen J Senn, Kenneth J Rothman, John B Carlin, Charles Poole, Steven N Goodman, and Douglas G Altman. 2016. “Statistical Tests, p Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50.
Griffiths, Thomas, and Mark Steyvers. 2004. “Finding Scientific Topics.” PNAS 101: 5228–35.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Grün, Bettina, and Kurt Hornik. 2011. “topicmodels: An R Package for Fitting Topic Models.” Journal of Statistical Software 40 (13): 1–30. https://doi.org/10.18637/jss.v040.i13.
Henry, Lionel, and Hadley Wickham. 2020. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Hernán, Miguel A, and James M Robins. 2020. What If. CRC Press.
Hugh-Jones, David. 2020. Huxtable: Easily Create and Style Tables for LaTeX, HTML and Other Formats. https://CRAN.R-project.org/package=huxtable.
Hulley, Stephen B. 2007. Designing Clinical Research. Lippincott Williams & Wilkins.
Iannone, Richard, Joe Cheng, and Barret Schloerke. 2020. Gt: Easily Create Presentation-Ready Display Tables. https://CRAN.R-project.org/package=gt.
Imai, Kosuke. 2017. Quantitative Social Science. Princeton University Press.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2017. An Introduction to Statistical Learning with Applications in R.
Kay, Matthew. 2020. tidybayes: Tidy Data and Geoms for Bayesian Models. https://doi.org/10.5281/zenodo.1308151.
Kearney, Michael W. 2019. “Rtweet: Collecting and Analyzing Twitter Data.” Journal of Open Source Software 4 (42): 1829. https://doi.org/10.21105/joss.01829.
Keyes, Os. 2019. “Counting the Countless.” Real Life. https://reallifemag.com/counting-the-countless/.
Kohavi, Ron, Diane Tang, and Ya Xu. 2020. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
Kuhn, Max, and Hadley Wickham. 2020. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. https://www.tidymodels.org.
Levitt, Steven D. 1997. “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime.” The American Economic Review 87 (3).
———. 2002. “Using Electoral Cycles in Police Hiring to Estimate the Effects of Police on Crime: Reply.” American Economic Review 92 (4): 1244–50.
Locke, Steph, and Lucy D’Agostino McGowan. 2018. datasauRus: Datasets from the Datasaurus Dozen. https://CRAN.R-project.org/package=datasauRus.
Lopp, Sean. 2017. “R for Enterprise: Understanding R’s Startup.” R Views. https://rviews.rstudio.com/2017/04/19/r-for-enterprise-understanding-r-s-startup/.
Lumley, Thomas. 2020. “Survey: Analysis of Complex Survey Samples.”
Matias, J. Nathan, Kevin Munger, Marianne Aubin Le Quere, and Charles Ebersole. 2019. “The Upworthy Research Archive.” https://upworthy.natematias.com.
Mattson, Greggor. 2017. “Artificial Intelligence Discovers Gayface. Sigh.” https://greggormattson.com/2017/09/09/artificial-intelligence-discovers-gayface/amp/.
McCrary, Justin. 2002. “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime: Comment.” American Economic Review 92 (4): 1236–43.
Meng, Xiao-Li, and others. 2018. “Statistical Paradises and Paradoxes in Big Data (i): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election.” The Annals of Applied Statistics 12 (2): 685–726.
Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science 366 (6464): 447–53.
Ooms, Jeroen. 2019a. Pdftools: Text Extraction, Rendering and Converting of PDF Documents. https://CRAN.R-project.org/package=pdftools.
———. 2019b. Tesseract: Open Source OCR Engine. https://CRAN.R-project.org/package=tesseract.
Oostrom, Tamar. 2021. “Funding of Clinical Trials and Reported Drug Efficacy.” https://drive.google.com/file/d/1EQLCH0ns99IxYBkxPNbagcZtGgE9a8MQ/view.
Oreopoulos, Philip, and Uros Petronijevic. 2018. “Student Coaching: How Far Can Technology Go?” Journal of Human Resources 53 (2): 299–329. https://doi.org/10.3368/jhr.53.2.1216-8439R.
Oxford-AstraZeneca. 2020. “AZD1222 Vaccine Met Primary Efficacy Endpoint in Preventing COVID-19.” https://www.astrazeneca.com/media-centre/press-releases/2020/azd1222hlr.html.
Pavlik, Kaylin. 2019. “Understanding + Classifying Genres Using Spotify Audio Features.” https://www.kaylinpavlik.com/classifying-songs-genres/.
Pedersen, Thomas Lin. 2020. Patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.
Pitman, Jim. 1993. Probability.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Raju, Tonse. 2005. “William Sealy Gosset and William A. Silverman: Two ‘Students’ of Science.” Pediatrics 116. https://doi.org/10.1542/peds.2005-1134.
Robinson, David, Alex Hayes, and Simon Couch. 2020. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Robinson, Emily, and Jacqueline Nolis. 2020. Build a Career in Data Science. https://livebook.manning.com/book/build-a-career-in-data-science?origin=product-look-inside.
Salganik, Matthew. 2018. Bit by Bit: Social Research in the Digital Age. Princeton University Press.
Shetty, Shravya, and Daniel Tse. 2020. Using AI to Improve Breast Cancer Screening. https://blog.google/technology/health/improving-breast-cancer-screening/.
Silge, Julia. 2018. Text Classification with Tidy Data Principles. https://juliasilge.com/blog/tidy-text-classification/.
Simpson, Dan. 2017. “It Seemed to Me That Most Destruction Was Being Done by Those Who Could Not Choose Between the Two.” https://statmodeling.stat.columbia.edu/2017/09/12/seemed-destruction-done-not-choose-two/.
Slowikowski, Kamil. 2021. Ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. https://CRAN.R-project.org/package=ggrepel.
Steyvers, Mark, and Tom Griffiths. 2006. “Probabilistic Topic Models.” In Latent Semantic Analysis: A Road to Meaning, edited by T. Landauer, D McNamara, S. Dennis, and W. Kintsch.
Stock, James H, and Francesco Trebbi. 2003. “Retrospectives: Who Invented Instrumental Variable Regression?” Journal of Economic Perspectives 17 (3): 177–94.
Taback, Nathan. 2020. Design of Experiments and Observational Studies. https://scidesign.github.io/designbook/.
Taddy, Matt. 2019. Business Data Science. McGraw Hill.
Thompson, Charlie, Josiah Parry, Donal Phipps, and Tom Wolff. 2020. Spotifyr: R Wrapper for the ‘Spotify’ Web API. http://github.com/charlie86/spotifyr.
Tierney, Nicholas. 2017. “Visdat: Visualising Whole Data Frames.” Journal of Open Source Software 2 (16): 355. https://doi.org/10.21105/joss.00355.
Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics 33 (1): 1–67.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2017. Tidyverse: Easily Install and Load the ‘Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
———. 2019a. Httr: Tools for Working with URLs and HTTP. https://CRAN.R-project.org/package=httr.
———. 2019b. Rvest: Easily Harvest (Scrape) Web Pages. https://CRAN.R-project.org/package=rvest.
———. 2019c. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2020a. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2020b. Tidyverse. https://www.tidyverse.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2020. Usethis: Automate Package and Project Setup. https://CRAN.R-project.org/package=usethis.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science. https://r4ds.had.co.nz/.
Wickham, Hadley, Jim Hester, and Winston Chang. 2020. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Wright, Philip G. 1928. The Tariff on Animal and Vegetable Oils. Macmillan Company.
Wu, Changbao, and Mary E Thompson. 2020. Sampling Theory and Practice. Springer.
Zhu, Hao. 2020. kableExtra: Construct Complex Table with ‘Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.