HCH 19

CLiC 2 – Digital humanities and the corpus linguistic study of fiction

Michaela Mahlberg

Along with developments in the digital humanities more widely, there is an increasing interest in the corpus linguistic study of fictional texts, sometimes referred to under the umbrella term ‘corpus stylistics’ (Semino and Short 2004). To account as fully as possible for the properties of literary texts, we need to create tools and develop methodologies that are tailored to the task at hand. Such tools also illustrate overlapping concerns in the digital humanities and in corpus linguistics. In this paper, I will illustrate key functionalities of the web application CLiC and its latest release, CLiC 2. CLiC has been specifically designed for the corpus linguistic study of narrative fiction. The case studies that I will present look at textual patterns that contribute to the creation of fictional characters. The examples will be drawn from the CLiC corpora, which comprise over 140 books and 16 million words across four subcorpora: the corpus of Dickens’s Novels, the 19th Century Reference Corpus (19C), the Corpus of 19th Century Children’s Literature (ChiLit) and the Corpus of Additional Requested Texts (ArTs). For all CLiC texts, direct speech and specific places around speech have been marked up (Mahlberg et al. 2016). Hence, CLiC can run searches across defined textual subsets and support the analysis of features of narrative fiction. An important question is how a range of features and patterns in fiction can be brought together in a coherent theoretical framework. My suggestions towards such a framework focus on a lexically driven approach to fictional speech and body language. They also raise more fundamental questions about how far corpus linguistics can change our theoretical perspective on fiction and connect with broader concerns in the digital humanities.
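
The idea of searching within defined textual subsets can be illustrated with a minimal sketch. It is not CLiC's implementation or API: the sample text, the regular expression that treats anything between double quotation marks as direct speech, and the function names `split_subsets` and `concordance` are all assumptions made for illustration only. The sketch splits a text into quoted and non-quoted stretches and then builds simple concordance lines for a word within each subset.

```python
import re

# Hypothetical sample text; CLiC's actual markup is not reproduced here.
TEXT = (
    '"Take my hand," said the old man, holding out his hand. '
    'She looked at his hand and shook her head. '
    '"I will not," she said.'
)

# Assumption: direct speech is whatever sits between double quotation marks.
QUOTE = re.compile(r'"[^"]*"')


def split_subsets(text):
    """Split text into (quoted stretches, non-quoted stretches)."""
    quotes, non_quotes = [], []
    pos = 0
    for match in QUOTE.finditer(text):
        if match.start() > pos:
            non_quotes.append(text[pos:match.start()])
        quotes.append(match.group(0))
        pos = match.end()
    if pos < len(text):
        non_quotes.append(text[pos:])
    return quotes, non_quotes


def concordance(spans, word, width=20):
    """Return (left context, node, right context) for each hit in the spans."""
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
    lines = []
    for span in spans:
        for match in pattern.finditer(span):
            left = span[max(0, match.start() - width):match.start()]
            right = span[match.end():match.end() + width]
            lines.append((left, match.group(0), right))
    return lines


quotes, non_quotes = split_subsets(TEXT)
in_quotes = concordance(quotes, "hand")
in_non_quotes = concordance(non_quotes, "hand")

for left, node, right in in_non_quotes:
    print(f"{left:>20} [{node}] {right}")
```

Comparing `in_quotes` with `in_non_quotes` shows how the same search word can be examined separately in character speech and in narration, which is the kind of subset comparison the abstract describes.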
