Project granted: AI-Driven Translation for Low-Resource Languages and Cultures

The Transducens research group at the Universitat d’Alacant has been granted a project from the “Proyectos de Generación de Conocimiento” call (https://lnkd.in/e47ADqNY), funded by the Agencia Estatal de Investigación (Spanish Research Agency) and the European Social Fund.

The project brings together the expertise of the Universitat d’Alacant, the Barcelona Supercomputing Center and the Universitat Oberta de Catalunya. The Universitat d’Alacant is the coordinating institution, with Víctor Manuel Sánchez Cartagena and myself serving as principal investigators.

The project is set to kick off in September 2025 and will focus on leveraging Large Language Models (LLMs) for low-resource and extremely low-resource language translation. In particular, we will (1) create high-quality datasets for the target low-resource languages, including textual and image data, and refine existing parallel corpora using LLM-based cleaning techniques, (2) push the boundaries of linguistic integration into LLMs by leveraging resources like grammar books and dictionaries and by developing innovative chain-of-thought prompting techniques, and (3) explore pixel-based translation by adapting multimodal LLMs to low-resource scenarios, addressing visually ambiguous translations.

Beyond advancing machine translation, this project has profound implications for cultural preservation, for the integration of migrants in vulnerable situations, and for the empowerment of underrepresented communities. We will demonstrate our methods by releasing LLMs tailored to low-resource languages of the Iberian Peninsula, extremely low-resource Mayan languages, and languages of migrants in vulnerable circumstances.

Join Our Team!
In addition to the project funding, we have also been granted a PhD contract! We will be looking to hire a talented individual once the project officially starts in September 2025. If you’re passionate about the future of language technology and want to contribute to a project with significant societal impact, stay tuned for more details on how to apply!

The International Association for Machine Translation Awards Professor Mikel L. Forcada

The International Association for Machine Translation (IAMT) has announced that it will give its Award of Honour to Professor Mikel L. Forcada, one of the founders of the Transducens research group, now retired from the University of Alicante. This prestigious award recognizes Professor Forcada’s extensive and outstanding career in the field of machine translation, as well as his continued support for the scientific community in this field.

Mikel L. Forcada’s nomination received unanimous support from the organizations that comprise IAMT in America (AMTA), Europe (EAMT), and Asia (AAMT), all of which highly value his long-standing and distinguished contribution to machine translation.

The award will be formally presented at the MT Summit 2025, which will take place in Geneva, Switzerland, in June 2025.

Starting the LiLowLa project

On 1st September 2022 the LiLowLa project (Lightweight neural translation technologies for low-resource languages) started.

The main objectives of LiLowLa are: to develop a smart crawling method able to prioritize the most productive websites; to develop data augmentation techniques for training neural machine translation (NMT) systems for low-resource languages; to devise a method for distilling the translation knowledge encoded in large pre-trained models; to enable translation memory-based computer-aided translation tools to exploit target-language monolingual corpora; and to deepen the understanding of how NMT systems behave at prediction time and during training.

Kick-off meeting of the GoURMET project held at Alacant

The kick-off meeting of the European project GoURMET was held at the Universitat d’Alacant on January 22-23. The Transducens Research Group is one of the partners in the project “GoURMET: Global Under-Resourced MEdia Translation”, which focuses on building machine translation systems to translate global news into under-resourced languages.

The project will last for three years. The consortium consists of the University of Edinburgh (coordinator), the Universitat d’Alacant, the University of Amsterdam, the British Broadcasting Corporation (BBC), and Deutsche Welle (DW). The project is funded in the framework of the European initiative Research & Innovation Action H2020-ICT-2029.

Starting ParaCrawl project

The 18-month ParaCrawl project (Action entitled “Provision of Web-Scale Parallel Corpora for Official European Languages”, Action No 2016-EU-IA-0114) started on September 15, 2017. The Transducens group is one of the members of the consortium, together with the University of Edinburgh (coordinator), TAUS, Prompsit and Johns Hopkins University (subcontractor).

ParaCrawl will create parallel corpora to/from English for all official EU languages through a broad web crawling effort. State-of-the-art methods will be applied to the entire processing chain, from identifying web sites with translated text all the way to collecting, cleaning and delivering parallel corpora that are ready to be used as training data for CEF.AT and as translation memories for DG Translation. It will also make the consortium partners’ open-source tools available to CEF Automated Translation and all other interested parties.

The Apertium project accepted for Google Summer of Code 2017

The free/open-source machine translation platform Apertium, originally created by the Transducens group, has once again been selected as one of the projects supported by Google in their Google Summer of Code program. Students from around the world have applied for one of the 10 projects granted to collaborate with this free/open-source machine translation platform. The complete list of ideas submitted to this edition of Google Summer of Code can be checked here. Chosen projects can be checked at https://summerofcode.withgoogle.com/archive/2017/organizations/6618812501721088/#projects.

Pedro Pernías’ group awarded by Google in the MOOC Focused Research Awards

The group led by Pedro A. Pernías Peco, one of the members of our group, has received one of the MOOC Focused Research Awards organised by Google. Their work, “Improvement of students’ interaction in MOOCs using participative networks”, is one of the 7 projects awarded, whose development will be supported by Google. This project will focus on studying the involvement of MOOC students on the UniMOOC platform.

The Apertium project accepted for Google Summer of Code 2014

The free/open-source machine translation platform Apertium, originally created by the Transducens group, has been selected again as one of the projects supported by Google in their Google Summer of Code program. Students from around the world can now apply for one of the 5,500 USD grants to work for three months on one of the ideas proposed for the project (check the list of ideas here). Proposals and new ideas can be discussed with the mentors in the IRC channel #apertium on freenode.net or through the mailing list apertium-stuff@lists.sourceforge.net. The next steps will be:

  • From February 24 to March 10: discussion of the ideas with project developers.
  • From March 10 to March 21: submission of proposals.

Interested students can contact Mikel Forcada.