Bilingual Parallel Corpora for Linguistic Research

  1. Irene Doval Reixa
Book:
CILC2016: 8th International Conference on Corpus Linguistics
  1. Antonio Moreno Ortiz (ed.)
  2. Chantal Pérez-Hernández (ed.)

Publisher: EasyChair

Year of publication: 2016

Pages: 88-96

Conference: International Conference on Corpus Linguistics (8th, 2016, Málaga)

Type: Conference contribution

DOI: 10.29007/BCQD

Abstract

This paper reflects on the specific needs of linguistic research regarding the construction of bilingual parallel corpora, and primarily on the conclusions to be drawn for their design, compilation and domains. A research group at the university in Santiago is currently building a bilingual parallel corpus (Corpus PaGeS) consisting of original texts in German and Spanish together with their translations into the other language, as well as German and Spanish translations from a third language. The corpus was originally intended for linguistic research, specifically the analysis of the expression of spatial relations. First, some significant existing related corpora are briefly surveyed and their limitations for linguistic studies are outlined. The issues taken into account in the design of the corpus are then explained, such as text type, domains, regional language variety, and the quality and direction of translations. After a description of the manual preparation that makes the documents suitable for further processing, the manual and automatic annotation procedure is explained: the metadata and the automatic linguistic annotation. Finally, the process of sentence alignment and the manual review of the alignment are described, and the next steps of future work are outlined.