Resources: Data

 

Parallel corpora


These are sentence-level aligned corpora, produced through a combination of automatic and manual alignment techniques. The data was sourced from the South African government domain:

  • ENG-AFR (± 421 318 sentences)

  • ENG-NSO (± 44 980 sentences)

  • ENG-ZUL (± 35 489 sentences)

 

Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 2.5 South Africa
Attribute work to: CTexT (Centre for Text Technology, North-West University), South Africa; Department of Arts and Culture, South Africa.

 

Download corpora (Autshumato at SourceForge.net)

 

Translation memories


These translation memories are in the Translation Memory eXchange (TMX) format and can be used with the Autshumato ITE or any other TMX-enabled computer-assisted translation tool.

  • ENG-AFR (English to Afrikaans)

  • AFR-ENG (Afrikaans to English)

  • ENG-NSO (English to Sepedi)

  • NSO-ENG (Sepedi to English)

  • ENG-ZUL (English to IsiZulu)

  • ZUL-ENG (IsiZulu to English)
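
For readers who want to inspect a translation memory outside a translation tool, the following is a minimal sketch of reading source/target segment pairs from a TMX file with Python's standard library. It relies only on the generic TMX layout (`tu` translation units containing `tuv` variants, each with a `seg` element). The function name `read_tmx_pairs` is hypothetical, and the language codes passed to it are assumptions: the codes actually used inside the Autshumato files should be checked in the downloaded data.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source_text, target_text) pairs from a TMX file.

    `source` is a path or a binary file-like object. `src_lang` and
    `tgt_lang` are language codes as they appear in the file (assumed
    values; inspect the file to find the real ones).
    """
    src_key = src_lang.lower()
    tgt_key = tgt_lang.lower()
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older versions use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        # Skip units that lack either side of the requested language pair.
        if src_key in segments and tgt_key in segments:
            yield segments[src_key], segments[tgt_key]
```

A call such as `list(read_tmx_pairs("eng-afr.tmx", "en", "af"))` would then return the aligned segment pairs as a list of tuples; both the file name and the codes `en` and `af` are placeholders here.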

 

Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 2.5 South Africa
Attribute work to: CTexT (Centre for Text Technology, North-West University), South Africa; Department of Arts and Culture, South Africa.

 

Download translation memories (Autshumato at SourceForge.net)


