New technology detects and measures online hate against women
ITU researchers have completed a large-scale collection, classification, and categorisation of misogyny in Danish social media. It is a potential game-changer in the fight against abusive online behaviour.
Written 27 July, 2021 07:15 by Theis Duelund Jensen
For the past ten months, Research Assistant in Computer Science Philine Zeinert, her colleagues, and their team of annotators have waded through a sea of abusive language in social media posts to carefully categorise the presence of misogyny online. The result is the first-ever Danish dataset on online misogyny, which can, among other things, help social media platforms detect and moderate hateful online behaviour.
Together with her advisor, ITU Associate Professor Leon Derczynski, and Villum Fellow Nanna Inie, Philine Zeinert presents the findings in a paper entitled “Annotating Online Misogyny”, which will be published in August at the ACL-IJCNLP 2021 conference.
Neosexism denies the existence of sexism
For the project, Philine Zeinert and her colleagues collected tens of thousands of posts from three social media platforms (Facebook, Twitter, and Reddit) and, with the aid of a team of external annotators, labelled the data and compiled a codebook. It was an enormous undertaking, with very few similar projects on which to model their approach.
“When we started, there were only two non-English datasets we were aware of: one in Spanish and one in Italian. We found just a handful of English-language taxonomies for labelling online misogyny that had been demonstrated on subsamples of collected posts. The distribution of posts between the categories revealed great differences between the three language areas. That was the initial observation that made us consider a local context,” says Philine Zeinert.
In their data analysis, the researcher and her colleagues discovered a category of misogyny that was not present in any similar project covering other languages. Applying a term from social science, the project found that Neosexism was the dominant form of misogyny in the Danish online posts it collected.
“After studying misogyny in the Danish context, we combined misogyny categorisations and added the concept of Neosexism,” says Philine Zeinert. “Neosexism is seen in cases where people deny the existence of discrimination or use straw man arguments to avoid the topic. They attack the very premise that discrimination exists and resent complaints of misogyny.”
For instance, in the codebook, a statement like “Please show me research that concludes women miss out on promotions because they take maternity leave?” is an example of the former, denial of inequity, while “Classic. If a woman has a problem, society is to blame. If a man has a problem, it is his own fault. Sexism thrives among the feminists” is an example of the latter, a straw man argument.
“That is not to say that Neosexism is the most prevalent form of misogyny in a Danish context. It just means that this form was prevalent in the data we were able to gather,” says Philine Zeinert.
How can this dataset help fight online sexism?
According to a 2017 survey by Amnesty International, almost one in four women surveyed had experienced abuse and harassment online. A recent Megafon survey commissioned by TV2 shows that 68 percent of Danes avoid online discourse because of the aggressive tone of the debate. Last year, Facebook settled a lawsuit brought by moderators tasked with identifying and removing abusive content on the platform, who said that constant exposure to such material had led them to develop PTSD and depression. In other words, there is a great need for systems that can automate the detection of online misogyny and other forms of online abuse.
“To detect misogyny online, you need three things: you need to know what you are trying to detect, you need to have examples of those things, and you need a model trained on those examples. The first, we achieve by creating an accurate labelling scheme, a taxonomy, which categorises forms of misogyny. The second, we present as a dataset of 2,000 carefully labelled instances of misogyny from a total dataset of 28,000 posts. The third, our model, performs so well that it can already automatically detect 85 percent of previously unseen misogynistic content,” explains Leon Derczynski, Associate Professor of Computer Science at ITU and co-author of the work.
“There are many efforts around the world to regulate abusive language online, but it is simply not possible for humans to process the amount of data. At the same time, new abusive language patterns and codes develop and spread at a rapid pace. To detect that type of content automatically, we need a thorough grasp of the task and the methods available,” says Philine Zeinert. “With this project, we are presenting universal, theoretically backed ways to do so.”
Theis Duelund Jensen, Press Officer, Tel: +45 2555 0447, email: thej@itu.dk