Starting ParaCrawl project

The 18-month ParaCrawl project (Action entitled “Provision of Web-Scale Parallel Corpora for Official European Languages”, Action No 2016-EU-IA-0114) started on September 15, 2017. The Transducens group is one of the consortium members, together with the University of Edinburgh (coordinator), TAUS, Prompsit and Johns Hopkins University (subcontractor).

ParaCrawl will create parallel corpora to and from English for all official EU languages through a broad web-crawling effort. State-of-the-art methods will be applied to the entire processing chain, from identifying web sites with translated text all the way to collecting, cleaning and delivering parallel corpora that are ready to serve as training data for CEF.AT and as translation memories for DG Translation. The project will also make the consortium partners’ open-source tools available to CEF Automated Translation and all other interested parties.