Starting the LiLowLa project

On 1st September 2022 the LiLowLa project (Lightweight neural translation technologies for low-resource languages) started.

The main objectives of LiLowLa are: to develop a smart crawling method able to prioritize the most productive websites; to develop data augmentation techniques for training neural machine translation (NMT) systems for low-resource languages; to devise a method for distilling the translation knowledge encoded in large pre-trained models; to enable translation memory-based computer-aided translation tools to exploit target-language monolingual corpora; and to deepen the understanding of how NMT systems behave at prediction time and during training.

Kick-off meeting of the GoURMET project held in Alacant

The kick-off meeting of the European project GoURMET was held at the Universitat d’Alacant on January 22-23. The Transducens Research Group is one of the partners of the project “GoURMET: Global Under-Resourced MEdia Translation”, which focuses on building machine translation systems to translate global news into under-resourced languages.

The project will last for three years. The consortium consists of the University of Edinburgh (coordinator), the Universitat d’Alacant, the University of Amsterdam, the British Broadcasting Corporation (BBC), and Deutsche Welle (DW). The project is funded in the framework of the European initiative Research & Innovation Action H2020-ICT-2029.

Starting the ParaCrawl project

The 18-month ParaCrawl project (Action entitled “Provision of Web-Scale Parallel Corpora for Official European Languages”, Action No 2016-EU-IA-0114) started on September 15, 2017. The Transducens group is one of the members of the consortium, together with the University of Edinburgh (coordinator), TAUS, Prompsit and Johns Hopkins University (subcontractor).

ParaCrawl will create parallel corpora to/from English for all official EU languages by a broad web crawling effort. State-of-the-art methods will be applied for the entire processing chain from identifying web sites with translated text all the way to collecting, cleaning and delivering parallel corpora that are ready as training data for CEF.AT and translation memories for DG Translation. It will also make available consortium partners’ open-source tools to CEF Automated Translation and all other interested parties.

The Apertium project accepted for Google Summer of Code 2017

The free/open-source machine translation platform Apertium, originally created by the Transducens group, has once again been selected as one of the projects supported by Google in their Google Summer of Code program. Students from around the world have applied for one of the 10 projects granted to collaborate with this free/open-source machine translation platform. The complete list of ideas submitted to this edition of Google Summer of Code can be checked here. Chosen projects can be checked at https://summerofcode.withgoogle.com/archive/2017/organizations/6618812501721088/#projects.

Pedro Pernías’ group awarded by Google in the MOOC Focused Research Awards

The group led by Pedro A. Pernías Peco, one of the members of our group, has received one of the MOOC Focused Research Awards organised by Google. Their work, “Improvement of students’ interaction in MOOCs using participative networks”, is one of the 7 awarded projects, whose development will be supported by Google. This project will focus on studying the involvement of MOOC students in the platform UniMOOC.

The Apertium project accepted for Google Summer of Code 2014

The free/open-source machine translation platform Apertium, originally created by the Transducens group, has been selected again as one of the projects supported by Google in their Google Summer of Code program. Students from around the world can now apply for one of the 5,500 USD grants to work for three months on one of the ideas proposed for the project (check the list of ideas here). Proposals and new ideas can be discussed with the mentors in the IRC channel #apertium on freenode.net or through the mailing list apertium-stuff@lists.sourceforge.net. The next steps will be:

  • From February 24 to March 10: discussion of the ideas with project developers.
  • From March 10 to March 21: submission of proposals.

Interested students can contact Mikel Forcada.

The Apertium project in Google Code-In 2013

The Apertium project, a free/open-source rule-based machine translation platform in which the Transducens group has been strongly involved since its inception, is, for the fourth year in a row, one of the 10 free/open-source organizations selected by Google for the Google Code-In.

Google Code-In is a contest to introduce pre-university students (aged 13 to 17) to free/open-source software development. Students from all around the world can participate by tackling small tasks, which may include code writing, debugging, documentation, production of training material, etc. For every three tasks completed, students get a Google Code-In T-shirt. Each participating organization will select two winners of the Grand Prize: a trip to the Google headquarters for the student and a parent or guardian.

The Apertium project has proposed a wide variety of tasks, including the creation of documentation to help users and developers, the development of dictionaries and rules for new or existing languages, the development of programs to transform other existing free/open-source resources into Apertium format, the creation of extensions to ease the use of Apertium from third-party software, etc. There are also debugging and quality assessment tasks, and tasks in which texts are annotated so that they can be used to test and train Apertium modules.

The first students have already come by the Apertium IRC channel (#apertium at irc.freenode.net) to ask about the different tasks, even if the contest does not officially start until November 16, and mentors (approximately 20 for Apertium this year) have already started to guide them.

If you are a pre-university student interested in contributing to the development of our free/open-source machine translation system, come by and participate!

DATeCH 2014 – May 19-20, Madrid

DATeCH brings together researchers and practitioners looking for innovative approaches for the creation, transformation and exploitation of historical documents in digital form.

www.datech2014.info

Topics

OCR technology and tools for historical documents including:

  • Methods and tools for post-correction of OCR results.
  • Automated quality control for mass OCR data.
  • Innovative access methods for historical texts and corpora.
  • Natural language processing of ancient languages (Latin, Greek).
  • Visualization techniques and interfaces for search and research in digital humanities.
  • Crowdsourcing techniques for collecting and annotating data in digital humanities.
  • Enrichment of and metadata production for historical texts and corpora.

Important dates

  • January 7, 2014 – Paper submission deadline
  • February 28, 2014 – Decision notification
  • March 31, 2014 – Camera-ready papers due
  • May 19-20, 2014 – Conference

DATeCH 2014 is supported by the Succeed project and the Impact Centre of Competence in Digitisation.