Author Archives: Felipe Sánchez Martínez

Opening for a PhD position at the Transducens research group in Alicante (Spain), working with LLMs for translating low-resource languages

The Transducens research group (https://transducens.dlsi.ua.es) at Universitat d’Alacant (https://www.ua.es) is excited to announce an opening for a PhD position in Alicante, Spain.

This is a fantastic opportunity to contribute to a research project funded by the Spanish Research Agency and the European Social Fund (“A way to make Europe”). The project focuses on three crucial areas:

  • Multilingual machine translation: Exploitation of unstructured linguistic information to improve the translation of low-resource language pairs with large language models (LLMs).
  • Large language models: Development of advanced chain-of-thought prompting techniques to enhance the translation process with LLMs.
  • Societal impact: Development of machine translation systems for low-resource and extremely low-resource language pairs with a large potential impact on their respective linguistic communities.

More information available here.

Project granted: AI-Driven Translation for Low-Resource Languages and Cultures

The Transducens research group at the Universitat d’Alacant has been granted a project from the “Proyectos de Generación de Conocimiento” (https://lnkd.in/e47ADqNY), funded by the Agencia Estatal de Investigación (Spanish Research Agency) and the European Social Fund.

This is a project that brings together the expertise of the Universitat d’Alacant, the Barcelona Supercomputing Center and the Universitat Oberta de Catalunya. The Universitat d’Alacant is the coordinating institution, with Víctor Manuel Sánchez Cartagena and myself serving as principal investigators.

The project is set to kick off in September 2025 and will focus on leveraging Large Language Models (LLMs) for low-resource and extremely low-resource language translation. In particular, we will (1) create high-quality datasets for target low-resource languages, including textual and image data, and refine existing parallel corpora using LLM-based cleaning techniques, (2) push the boundaries of linguistic integration into LLMs by leveraging resources like grammar books and dictionaries and by developing innovative chain-of-thought prompting techniques, and (3) explore pixel-based translation by adapting multimodal LLMs for low-resource scenarios, addressing visually ambiguous translations.

Beyond advancing machine translation, this project has profound implications for cultural preservation, facilitating the integration of migrants in vulnerable situations, and empowering underrepresented communities. We will demonstrate our methods by releasing LLMs tailored to low-resource languages of the Iberian Peninsula, extremely low-resource Mayan languages, and languages of migrants in vulnerable circumstances.

Join Our Team!
In addition to the project funding, we have also been granted a PhD contract! We will be looking to hire a talented individual once the project officially starts in September 2025. If you’re passionate about the future of language technology and want to contribute to a project with significant societal impact, stay tuned for more details on how to apply!

Starting the LiLowLa project

The LiLowLa project (Lightweight neural translation technologies for low-resource languages) started on 1st September 2022.

The main objectives of LiLowLa are: to develop a smart crawling method able to prioritize the most productive websites; to develop data augmentation techniques for training neural machine translation (NMT) systems for low-resource languages; to devise a method for distilling the translation knowledge encoded in large pre-trained models; to enable translation memory-based computer-aided translation tools to exploit target-language monolingual corpora; and to deepen the understanding of how NMT systems behave at prediction time and during training.