In this talk, I will discuss UC Berkeley D-Lab's research on online hate speech, a prolific social disease. Hate speech directly harms vulnerable populations, including women, people of color, religious minorities, immigrants, and people with disabilities. Targets of hate speech can be driven away from public forums, sometimes being forced to delete social media accounts to avoid abuse. Hate speech can cause emotional trauma, including depression, fear, and isolation. It may culminate in offline violence through swatting and doxxing, the encouragement of suicide, or terror attacks. Niche communities form reservoirs of evolving hate speech. Self-reinforcing discussions lead to the normalization of abuse, celebrate the dehumanization of minority groups, and can foment violent radicalization.
Our scientific understanding is comparatively meager: we have little empirical knowledge of the problem's true scale or the causal mechanisms involved. Research is hampered by the complexity of defining the term; as a result, many available datasets are unreliable. Data and AI models to detect hate speech are often proprietary. Keyword searches and dictionary methods are imprecise, overly blunt tools for detecting the nuance and complexity of hate speech. Without the tools to identify, quantify, and classify hate speech, we cannot even begin to consider how to address its causes and consequences.
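A minimal sketch of why dictionary methods are such blunt instruments. The lexicon and comments here are hypothetical illustrations, not part of the study: a naive keyword matcher ignores context, so it misses slur-free abuse while flagging counter-speech that merely quotes a slur.

```python
# Hypothetical two-term lexicon for illustration only.
HATE_LEXICON = {"vermin", "subhuman"}

def keyword_flag(comment: str) -> bool:
    """Flag a comment if any lexicon term appears, ignoring all context."""
    tokens = {t.strip(".,!?\"'").lower() for t in comment.split()}
    return bool(tokens & HATE_LEXICON)

# A slur-free threat slips through (false negative) ...
print(keyword_flag("People like you don't deserve to exist."))     # False
# ... while counter-speech quoting a slur gets flagged (false positive).
print(keyword_flag('Calling refugees "vermin" is dehumanizing.'))  # True
```

Both failure modes stem from the same design choice: matching surface forms rather than modeling who is targeted and how, which is why context-aware labeling and models are needed.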
To overcome these challenges and others, this study develops a methodology for the identification and analysis of incidents of online hate speech. In partnership with Google Jigsaw, D-Lab's solution sets a new standard for the data science of hate speech: it 1) establishes a theoretically grounded definition of hate speech inclusive of research, policy, and practice, 2) develops and applies a multicomponent labeling instrument, 3) creates a new crowdsourcing tool to scalably label comments, 4) curates an open, reliable, multiplatform labeled hate speech corpus, 5) expands existing data and tool repositories under principles of replicable and reproducible research, enabling greater transparency and collaboration, 6) creates new knowledge through ethical online experimentation (and citizen science), and 7) refines AI models. Ultimately, we seek to understand the causal mechanisms well enough to design and evaluate interventions. All of these innovations are guided by an advisory group, our consortium, and a new open-source platform with tools that will make these resources available. Policy recommendations and partnerships with advocacy organizations will educate and grow the larger community.