HCH 19

Quantitative approaches to discourse on social media

Tatjana Scheffler

Social media such as Twitter, Facebook, blogs, and forums are an abundant source of texts generated by a diverse range of users online. This provides unique opportunities and challenges for researchers in the humanities and social sciences working with textual data. In this workshop, we address some of the specific challenges posed by social media data. For one, the large amount of data necessitates automatic methods for collecting and storing texts, as well as quantitative approaches to analyzing the resulting corpora. In addition, the language of social media contains many non-standard features which, on the one hand, may prevent the use of established tools for natural language processing and, on the other hand, may themselves constitute exciting opportunities for research. In particular, the conversational nature of many kinds of social media draws attention to our lack of theoretical and practical knowledge about how to model dialog and discourse (as opposed to monological texts).

In this workshop, we will present methods from computational linguistics that enable the collection and analysis of large corpora of social media data, with a particular focus on interactive language. The workshop is aimed at young researchers who want to start working quantitatively with social media data. Since we do not assume programming abilities, the focus will be on available tools and methods for computational linguistic analysis that are approachable for researchers in the humanities and social sciences and can be immediately applied to your next research project. In addition, we will discuss state-of-the-art analyses of the nature and variability of language on social media and approaches to using social media data as a sensor for non-linguistic social data (e.g., health, human well-being, or politics).

Topics covered will include:

Collecting social media corpora
Working with non-standard language
Computational social science: detecting user properties
Discourse structure of social media multilogs
Available tools and methods

Hands-on analyses will be carried out by using and adapting existing Python scripts. (In preparation, you may want to install the Python 3 distribution through Anaconda.) Finally, we will give pointers to tutorials that allow you to implement even more powerful analyses.
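To give a sense of the kind of short Python script that can be adapted in such a session, here is a minimal sketch of a tokenizer for non-standard social media text. It is not taken from the workshop materials: the names `TOKEN_PATTERN`, `tokenize`, and `count_tokens` are hypothetical, the emoticon pattern covers only a handful of simple cases, and it uses nothing beyond the Python standard library.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption, not the workshop's own script).
# Order matters: earlier alternatives win, so URLs, mentions and hashtags
# are matched before the generic word pattern can split them apart.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+          # URLs
    | @\w+                # user mentions
    | \#\w+               # hashtags
    | [:;=][-']?[()DPp]   # a few simple western emoticons
    | \w+(?:'\w+)?        # words, optionally with an internal apostrophe
    | [^\w\s]             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(post):
    """Split one post into tokens, keeping social media items intact."""
    return TOKEN_PATTERN.findall(post)


def count_tokens(posts):
    """Count lower-cased tokens across a collection of posts."""
    counts = Counter()
    for post in posts:
        counts.update(token.lower() for token in tokenize(post))
    return counts


if __name__ == "__main__":
    example_posts = [
        "@anna omg sooo good :) #nlp https://example.org/x",
        "I love #NLP, don't you?",
    ]
    for token, freq in count_tokens(example_posts).most_common(5):
        print(token, freq)
```

A generic whitespace or punctuation splitter would break hashtags, mentions, and emoticons into pieces, which is one reason established tools can struggle with this kind of text. Listing these items explicitly keeps them available as units for later counting or annotation.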

Lecture slides

Workshop resources will be linked here in the future.