Agreements and studies to promote minority languages

The first part of the project is taking place outside the lab. To obtain the data required to train the artificial intelligence models, as much material as possible must be compiled in Asturian, Aragonese and Aranese. "That's why this first phase focuses on securing agreements with regional governments, universities and publishers to provide the materials for creating the parallel corpora to train the neural system," said Oliver.

In this regard, an agreement was recently signed with the Government of Asturias on assigning the entire corpus of texts translated from Spanish into Asturian held by its Directorate General of Language Policy. The agreement also stipulates that the Government of Asturias can, on request, gain access to the technological and linguistic developments achieved by the TAN-IBE project for use in any machine translation projects of its own.

"Ultimately, our goal with this project is to help promote the use of these languages with fewer resources and foster more publishing in them," said Oliver. "For example, all laws could be published in two languages, quickly and efficiently, using fewer resources, although a human review would always be required. What's more, those who don't dare to use these languages because they don't feel confident enough can use these tools as support for improving their texts. Lastly, languages like Asturian, Aragonese and Aranese need to be included in digital technologies. If not, they may start disappearing and be forgotten."

This UOC research contributes to UN Sustainable Development Goal 4: Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all.

PID2021-124663OB-I00 project funded by MCIN/AEI/10.13039/501100011033/ERDF, EU.

