HCH 19

Providing new views on textual data with knowledge graphs

Leo Born, Juri Opitz

When (historical) text collections grow in scale, it becomes increasingly difficult to see the bigger picture without quantitative analyses. Computational methods and tools can help here. To that end, we distinguish between (i) an explorative methodology, which helps in formulating research questions by offering new and structured views on text collections, and (ii) a research-question-driven methodology, which uses computational methods and tools in a task-specific way to address one or more research questions.

Practically, the workshop covers two aspects:

FROM TEXT TO TRIPLES: a hands-on introduction to the Python NLP package spacy, using it to construct a simple knowledge graph from a corpus of historical documents.
FROM TRIPLES TO THESES: analyses of the resulting knowledge graph using the package networkx, together with visualizations of the knowledge graph in a dedicated web application.
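
To give a rough idea of what these two steps can look like in code, here is a minimal sketch. It is not the workshop's actual pipeline: the subject-verb-object heuristic, the helper names (extract_triples, build_graph, most_connected, run_demo), the example sentences and the choice of the en_core_web_sm model are all assumptions made for illustration.

```python
import networkx as nx


def extract_triples(doc):
    """Collect naive (subject, verb lemma, object) triples from a parsed spaCy Doc.

    Assumption: a verb with a nominal subject and a direct object among its
    dependency children is treated as one relation. Real corpora need more care.
    """
    triples = []
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
        for subj in subjects:
            for obj in objects:
                triples.append((subj.text, token.lemma_, obj.text))
    return triples


def build_graph(triples):
    """Store each triple as a labelled, directed edge; parallel edges are allowed."""
    graph = nx.MultiDiGraph()
    for subj, relation, obj in triples:
        graph.add_edge(subj, obj, relation=relation)
    return graph


def most_connected(graph, n=3):
    """Rank nodes by total degree (incoming plus outgoing edges)."""
    return sorted(graph.degree(), key=lambda pair: pair[1], reverse=True)[:n]


def run_demo(model_name="en_core_web_sm"):
    """Parse two made-up sentences and print the resulting edges.

    Assumption: the model has been installed beforehand, for example with
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported here so the graph helpers work without spaCy

    nlp = spacy.load(model_name)
    # Invented sentences, not taken from the workshop corpus.
    text = "The king signed the treaty. The bishop crowned the king."
    graph = build_graph(extract_triples(nlp(text)))
    for subj, obj, data in graph.edges(data=True):
        print(subj, data["relation"], obj)
    print(most_connected(graph))
    return graph
```

Calling run_demo() parses the two sentences and prints whatever edges the heuristic finds. Which triples come out depends on the parser's analysis, so the extracted graph reflects the model's errors and the heuristic's blind spots as much as the text itself.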

In the course of the workshop, particular emphasis will be put on the constraints that research questions pose on these methods, as well as the constraints that these methods pose on research questions. Furthermore, we will raise the following important questions: What do we want to model, and what can we model? How can we harness the strengths of computational statistical methods? How can we minimize or -- more fundamentally -- become aware of biases which may distort the perception of the extracted structures in unwanted ways?

Prerequisites: basic knowledge of the Python programming language

Lecture slides

Workshop resources will be linked here in the future.