ITU-led project will make automated translation more reliable

According to associate professor at the IT University, Leon Derczynski, the Danish Gigaword Project has the potential to improve everything from automated translation to misinformation detection.


Written 2 June, 2021 10:32 by Theis Duelund Jensen

In today’s world, we do a great deal of text and language processing with computers, but compared to humans, computers need far more data to understand language. Here is a good example of the problem, and of why you should never rely solely on Google Translate:

[Image: Google Translate attempting to translate an idiom]

Rather than rendering “nutcase” as one of the many appropriate Danish terms, such as “galning” or “tosse,” Google Translate produces “nøddetaske” (literally “nut purse,” which is not a word in Danish). There is a good explanation, though. Google Translate works with a model (an algorithm trained on data to replicate a specific decision process, for instance deciding on the proper translation of a sentence) whose Danish-language dataset is very limited. This is where the IT University-led Danish Gigaword Project comes into play.

The research project, led by associate professor at ITU, Leon Derczynski, and Manuel R. Ciosici, a research scientist at the University of Southern California Information Sciences Institute and visiting scholar at ITU, has compiled the first gigaword dataset with over a billion Danish words, which has the potential to make a service like Google’s much more accurate when translating Danish.

- For a language like English, there was a billion-word dataset some thirty years ago. Even the 360,000 speakers of Icelandic have a gigaword project. Danish lags behind, and the gap is widening. It is important, because if you want to do any kind of language understanding for Danish, you need a large dataset to make proper tools, says Leon Derczynski.

And that is exactly the goal of the Danish Gigaword Project. In terms of Natural Language Processing, it takes Danish from a so-called low-resource language to a high-resource language, which ultimately means that we will see better machine translation quality, better speech recognition, and better search engine results once the dataset is in use.

Diverse input

But what exactly is a gigaword corpus? In short, it is a vast set of data on the Danish language as it appears in written sources. But to build a dataset that reflects all the nuances and complexities of written communication in a particular language, you need more than just a lot of data; you need a lot of data from a lot of different sources.

Mapping abusive language

The Danish Gigaword Project also has the potential to improve abusive language detection on various digital platforms. Recently, the data analytics agency, Analyse & Tal, published a report on mapping abusive language discourses on Facebook which was enabled by the Danish Gigaword Project.

The full report is available in Danish.


- If you train computers only on news text, then they can only understand news text, but in our day-to-day lives we do not really communicate like, for instance, DR or Weekendavisen. We use text in much more varied ways. I wanted to get as many different types of digital Danish as I possibly could, says Leon Derczynski, who started the project in 2019 and has since led a group of volunteers from all corners of the Danish tech and research spheres.

The paper Leon Derczynski and his co-authors have written on the Danish Gigaword project (“DAGW”), which they are presenting today at the Nordic Conference on Computational Linguistics, details the many sources from which data has been gathered, among them everything from the Danish Parliament’s records of meetings and speeches and a research project on spontaneous speech to Danish Wikipedia pages and a digitized version of the Bible.
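To make the idea of a “billion-word” count across many sources concrete, here is a minimal Python sketch. It is not the DAGW team’s tooling: it assumes a plain whitespace split as the definition of a word, and the source labels and sample sentences are hypothetical placeholders that only echo the kinds of sources named above. It tallies words per source and each source’s share of the total, the kind of bookkeeping behind a claim that a corpus draws on many different types of text.

```python
from collections import Counter


def count_words_by_source(documents):
    """Count whitespace-separated words per source.

    `documents` is an iterable of (source, text) pairs. Splitting on
    whitespace is a simplifying assumption; real corpora tokenise more carefully.
    """
    counts = Counter()
    for source, text in documents:
        counts[source] += len(text.split())
    return counts


def source_shares(counts):
    """Return each source's fraction of the total word count."""
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


# Hypothetical sample texts; the labels are placeholders, not DAGW section names.
sample = [
    ("parliament", "Mødet er åbnet"),
    ("wikipedia", "København er Danmarks hovedstad"),
    ("news", "Vejret bliver godt i morgen"),
    ("news", "Regeringen fremlægger et nyt forslag"),
]

counts = count_words_by_source(sample)
shares = source_shares(counts)
for source, n in counts.most_common():
    print(f"{source}: {n} words ({shares[source]:.0%})")
```

A corpus dominated by a single source would show one share close to 100%; the point of gathering parliamentary records, speech transcripts, Wikipedia and more is to keep that distribution broad.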

Copyright challenges

However, compiling a billion-word dataset comes with a set of unique challenges in a Danish context. For one thing, licensing is much more restrictive in Denmark than in, for instance, the USA.

- One of the big barriers to our work in Denmark is that people are very cautious about sharing data. In the USA, The New York Times, Associated Press, Xinhua News Agency, and Agence France-Presse donated a combined billion words’ worth of articles to become a public English corpus. Licensing is a significant issue in Denmark, so it has been harder to build this dataset and make it available – which is the core goal. It must be available to researchers as well as to companies, so they can develop new technologies, says Leon Derczynski.

Although copyright laws in Denmark are reasonable, they do present significant challenges to researchers, according to Leon Derczynski. However, he is in positive dialogue with major news outlets about data donations and TV2 Regionerne has already supplied DAGW with approximately 50,000 news articles published between 2010 and 2019.

Better data, better tools

Language technology is not always visible, but it is applied in almost every conceivable context, which is why a vast language corpus is necessary to develop good tools. The Danish billion-word corpus not only has the potential to improve translation services like Google’s or pave the way for automated grammar correction services like those that exist for English; it can also help us improve online discourse:

- A lot of my other research is about misinformation detection, online bullying, and harassment. This is hard to detect in a Danish context, because the models for Danish do not have adequate data. But now we have much more detailed information about Danish. This means we can have better misinformation detection and ultimately a better online discourse, says Leon Derczynski.

Ultimately, the DAGW is all about providing researchers and developers with more data to work from. As such, the billion-word corpus’ possibilities are endless and that is exactly what motivated Leon Derczynski to start the project in the first place:

- The big language models which occasionally make the news in relation to artificial intelligence only speak English; that sucks if the language you work in is Danish. Now that we have Danish Gigaword, we can train much more advanced models than before, and start to catch up.

More information:

Follow updates about the Danish Gigaword Project at gigaword.dk

Theis Duelund Jensen, Press Officer, Tel: +45 2555 0447, email: thej@itu.dk



