Lightweight neural translation technologies for low-resource languages (LiLowLa)
Research and innovation project PID2021-127999NB-I00, funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033), and the European Regional Development Fund, "A way to make Europe".
Period of activity: from 01/09/2022 to 31/08/2025.
LiLowLa aims to boost the performance of machine translation (MT) and translation memory (TM) technologies for low-resource language pairs through the following objectives:
- Improvement of the efficiency, robustness, and applicability of neural MT systems in scenarios involving low-resource language pairs
- Improvement of current web crawling methods to avoid downloading documents that turn out to be useless once processed
- Widening of the applicability of TMs in professional computer-aided translation (CAT) tools by allowing them to exploit monolingual corpora when MT is not a viable option or the database of existing translations is not sufficiently large
These objectives require advancing the state of the art in low-resource neural MT (NMT), corpus crawling, and TM-based CAT tools. We will investigate how to make NMT more robust and efficient by distilling the knowledge in large pre-trained neural models initially developed for high-resource language pairs, and we will research new lightweight data augmentation techniques to make the most of the scarce resources available. We will also study the use of reinforcement learning to improve current methods for corpus crawling, and the integration of cross-lingual sentence embeddings into CAT tools so that translation proposals can be searched for in monolingual corpora, which are easier to obtain than parallel corpora. Finally, the inner workings of the neural systems that we develop will be analysed with modern interpretability techniques in order to provide well-motivated feedback for improving them.
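As a rough illustration of the last idea, searching a monolingual corpus for translation proposals through cross-lingual sentence embeddings, the following minimal sketch ranks target-language sentences by cosine similarity to the embedding of a source sentence. It is not the project's implementation: the function names, the three-dimensional vectors, and the example sentences are all hypothetical, and a real system would obtain the embeddings from a multilingual sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_proposals(query_vec, corpus, top_k=2, threshold=0.0):
    """Return up to top_k (score, sentence) pairs, best match first.

    corpus is a list of (sentence, embedding) pairs in the target language;
    query_vec is the embedding of the source-language sentence. Both are
    assumed to live in the same cross-lingual embedding space.
    """
    scored = [(cosine(query_vec, vec), sentence) for sentence, vec in corpus]
    scored = [item for item in scored if item[0] >= threshold]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy target-language (Spanish) monolingual corpus with made-up embeddings.
corpus = [
    ("La casa es azul.", [0.9, 0.1, 0.0]),
    ("El gato duerme.", [0.0, 1.0, 0.1]),
    ("La casa es grande.", [0.8, 0.2, 0.1]),
]

# Hypothetical embedding of the English source sentence "The house is blue."
query_vec = [1.0, 0.0, 0.0]

proposals = retrieve_proposals(query_vec, corpus, top_k=2)
for score, sentence in proposals:
    print(f"{score:.3f}  {sentence}")
```

The threshold parameter plays a role similar to the fuzzy-match threshold of a conventional TM: proposals whose similarity falls below it are not shown to the translator.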
- Felipe Sánchez-Martínez (Principal investigator)
- Juan Antonio Pérez-Ortiz (Principal investigator)
- Mikel L. Forcada
- Miquel Esplà-Gomis
- Víctor M. Sánchez-Cartagena
- Cristian García-Romero
- Aarón Galano-Jiménez
- Leopoldo Pla
- See list of publications.