Recent projects

  • ParaCrawl project (2017-2019): ParaCrawl will create and release large parallel corpora to and from English for all official EU languages through a broad web-crawling effort. State-of-the-art methods will be applied across the entire processing chain, from identifying websites with translated text all the way to collecting, cleaning and delivering parallel corpora that are ready for use as training data for CEF.AT and as translation memories for DG Translation. The project will also make the consortium partners’ open-source tools available to CEF Automated Translation and all other interested parties. Over the course of the project there will be four large parallel corpus releases and two software releases. Project website.
  • Effortune project (2015-2018): The Effortune project (Optimización de la Traducción Automática Estadística Guiada por el Esfuerzo, "Effort-Guided Optimization of Statistical Machine Translation") explores new evaluation metrics for machine translation that correlate better with post-editing effort. It is funded by the Spanish Government through project TIN2015-69632-R. Project website.