Conversations with artificial intelligence: new research project to focus on multilingual question-answer systems


Billions of people use the internet every day, producing nonillions of bytes of data. Artificial intelligence (AI) is used to obtain structured findings from these huge volumes of data. This particularly benefits companies that need to make business-critical decisions based on data. The problem: even though data is available in many different languages today, there are few multilingual datasets such as knowledge graphs, which model information in a structured way and provide the basis for many AI applications. In a new research project, scientists from the specialized Data Science group in the Institute of Computer Science at Paderborn University, along with partners from industry, are working on an approach that will allow end users to retrieve large volumes of multilingual text data using knowledge graphs. This key component will improve the efficiency of AI-supported solutions in companies, for instance in question-answering (QA) systems in the form of chatbots, or in enterprise search, that is, in-house search engines.

The project, titled “Polylingual Hybrid Question Answering” (PORQUE), will receive a total of 1.2 million euros from the Federal Ministry of Education and Research (BMBF) over the next three years as part of the “Eurostars” support program. The project partners include Semantic Web Company (the consortium leader) and the software developer SiteFusion.

New platform collects multilingual data

“Our project aims to further develop polylingual, or multilingual, conversational AI so that users will have the ability to query many different multilingual data sources. That will allow companies to use data that is available worldwide and to make informed business-critical decisions,” says Professor Axel-Cyrille Ngonga Ngomo, head of the specialized Data Science group in the Institute of Computer Science at Paderborn University.

The challenge here is responding to complex questions across multiple languages, based on large volumes of heterogeneous data. “Our approach is innovative because it combines automatic machine translation and knowledge graphs,” explains the computer scientist. “Today, knowledge graphs form the essential basis for many AI applications and AI assistants to function – they are found in solutions for finding information and in QA systems,” says Artem Revenko, Director of Research, PoolParty Semantic Suite. For instance, knowledge graphs lie behind the information boxes that Google displays for search queries before a page is even opened, and behind the answers that Amazon's Alexa gives to questions. “In addition to the complication that human beings can ask a question in many different ways, there is a shortage of knowledge graphs in languages other than English, since nearly half of all information on the web is not available in English,” explains Ngonga Ngomo. “Even though a great effort has already been made to provide knowledge graphs across languages, the majority of the popular knowledge graphs, e.g. DBpedia, are most extensive in their English version. This lack of multilingual data sets restricts the transfer of machine-learning-based models – like QA systems – to certain languages,” continues the scientist.

Answering cross-lingual questions from the European market

The innovative platform for providing multilingual answers will be a hybrid solution, explains Ngonga Ngomo. “Our platform will include the translation and cross-lingual enrichment of knowledge graphs, coupled with information from texts from the web. Once a knowledge graph is enriched with multilingual content, we plan to use it as background knowledge to create and improve the quality of polylingual QA systems.” This is especially relevant in the European context, he adds, since data in this region is available in a large number of languages.

Until now, says Ngonga Ngomo, there have been very few solutions that combine entity names from texts (like the names of people or places) with polylingual domain knowledge in order to answer questions. He adds, “To date, commercial applications that allow multilingual QA have been heavily dependent on people to handle some of the data quality assurance, which is time-consuming and cost-intensive. By combining machine translation as an automated system with specific language processing technologies, we allow end users to ask multilingual questions and automatically receive precise answers.”

Further information on the Data Science group: en.cs.uni-paderborn.de/ds

Jennifer Strube, Press, Communications and Marketing

Photo (Judith Kraft): Prof. Axel-Cyrille Ngonga Ngomo heads the specialized Data Science group in the Institute of Computer Science at Paderborn University.
