Teaching computers to understand human language

Despite major advances in machine learning, getting computers to understand language remains a big challenge. Daniel Varab and two classmates from Software Development trained a computer program to detect contradictions in texts – a technology that might eventually help us keep track of statements made by politicians and contradictions in the law.

What was your thesis about?

Inspired by the presidential election in the United States, we thought it would be fun if a computer program could automatically find contradictions in the things politicians say during the election race. For example, Donald Trump stated in 1999:

"Look, I’m very pro-choice. I hate the concept of abortion. I hate it. I hate everything it stands for. I cringe when I listen to people debating the subject, but you still — I just believe in choice."

In August 2015, however, he said: "I am very, very proud to say that I'm pro-life."

It would be great if a computer could help us find such contradictions.

So we immersed ourselves in Natural Language Processing (NLP), a field concerned with getting computers to understand human language. NLP is used, for instance, in the iPhone’s Siri, in Google Translate and in Word’s spell check. It is also used to analyze whether a text is positive or negative in tone.

More specifically, we worked with contradiction detection – that is, a method of getting computers to assess whether two sentences contradict each other.

How do you teach a computer to find contradictions?

By feeding it a ton of examples of sentence pairs that contradict each other and sentence pairs that do not. We trained a machine learning algorithm with a data set from Stanford University with 500,000 sentence pairs and then tested it on sentences it had never seen before.
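To make the idea of learning from labelled sentence pairs concrete, here is a minimal sketch in Python. It is not the thesis model: the sentence pairs are invented, the only feature is an assumed "negation mismatch" check, and the learner is a plain perceptron chosen for brevity. The names `TRAIN`, `UNSEEN`, `NEGATIONS`, `features`, `predict` and `train` are all hypothetical.

```python
import re

# Toy sentence pairs: label 1 = contradiction, 0 = no contradiction.
# Invented for illustration; they are not taken from the thesis data set.
TRAIN = [
    ("A man is sleeping", "A man is not sleeping", 1),
    ("A dog runs in the park", "A dog is outside", 0),
    ("The woman never smiles", "The woman smiles", 1),
    ("Two kids play football", "Children are playing a game", 0),
]

# Pairs the model never sees during training.
UNSEEN = [
    ("The shop is open", "The shop is not open", 1),
    ("A boy eats an apple", "A boy is eating fruit", 0),
]

NEGATIONS = {"not", "no", "never"}  # assumed, deliberately tiny word list


def features(premise, hypothesis):
    """Return [bias, negation_mismatch] for a sentence pair."""
    def negated(text):
        return bool(NEGATIONS & set(re.findall(r"[a-z']+", text.lower())))
    return [1, int(negated(premise) != negated(hypothesis))]


def predict(weights, x):
    """Predict 1 (contradiction) when the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def train(pairs, epochs=5):
    """Perceptron rule: adjust the weights only when a prediction is wrong."""
    weights = [0, 0]
    for _ in range(epochs):
        for premise, hypothesis, label in pairs:
            x = features(premise, hypothesis)
            error = label - predict(weights, x)
            weights = [w + error * xi for w, xi in zip(weights, x)]
    return weights


weights = train(TRAIN)
predictions = [predict(weights, features(p, h)) for p, h, _ in UNSEEN]
print(weights, predictions)
```

On this toy data the learner ends up relying entirely on the negation feature, which is far cruder than anything trained on hundreds of thousands of real sentence pairs. The point is only the workflow: learn from labelled pairs, then predict on pairs held back from training.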

We found that the model worked best when we provided it with information about how linguists define a contradiction. For example, two sentences probably contradict each other if they contain antonyms. There is much hype about machine learning algorithms finding patterns in information all by themselves, but in practice, you get much further if you help them.
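The antonym cue mentioned above could be turned into a feature along the following lines. This is a sketch under stated assumptions, not the thesis implementation: `ANTONYM_PAIRS` is a hypothetical three-entry lexicon, and `has_antonym_pair` is an illustrative helper name.

```python
import re

# Hypothetical mini-lexicon; a real system would need a far larger resource.
ANTONYM_PAIRS = [("open", "closed"), ("hot", "cold"), ("asleep", "awake")]

# Make the lookup symmetric: open -> closed and closed -> open.
ANTONYMS = {}
for a, b in ANTONYM_PAIRS:
    ANTONYMS.setdefault(a, set()).add(b)
    ANTONYMS.setdefault(b, set()).add(a)


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def has_antonym_pair(premise, hypothesis):
    """True if some word in one sentence has a listed antonym in the other."""
    hypothesis_words = tokens(hypothesis)
    return any(ANTONYMS.get(word, set()) & hypothesis_words
               for word in tokens(premise))


print(has_antonym_pair("The door is open", "The door is closed"))
print(has_antonym_pair("The soup is hot", "The soup is warm"))
```

A yes/no value like this would be passed to the learning algorithm alongside whatever else it sees, as a hint rather than a verdict: "The door is open" and "The window is closed" contain antonyms without contradicting each other.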

In the end, our model could detect with an accuracy of 86 percent whether two sentences contradicted each other. Funnily enough, only 87 percent of a control group of humans could agree on the same sentences.
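For readers unfamiliar with the term, accuracy here means the share of sentence pairs for which the prediction matches the correct label. A small sketch, with invented labels and predictions and a hypothetical `accuracy` helper:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the correct labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Invented example: 1 = contradiction, 0 = no contradiction.
labels      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(predictions, labels))  # 6 of 8 correct -> 0.75
```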

"Human language is largely about interpretation, and this is one of the reasons why teaching it to computers is so difficult."

Daniel Varab, MSc in Software Development

What can we use it for?

There is a huge amount of information out there, and it’s impossible for people to have an overview of everything that is said and written by, for example, media and politicians.

It would be useful to have a tool that could automatically find contradictions, for instance between what a politician said two months ago and what he is saying today. Such a tool could also be used to detect contradictions in legal texts or to spot fake news.

There are many exciting perspectives, but there is still some way to go before computers’ understanding of language is sophisticated enough.

Further information

Vibeke Arildsen, Press Officer, phone 2555 0447, email viar@itu.dk