Obtaining computational resources for languages with scarce resources from closely related computationally-developed languages. The Galician and Portuguese case

Paulo Malvar Fernández; José Ramón Pichel Campos; Óscar Senra Gómez; Pablo Gamallo Otero; Alberto García

Paulo Malvar Fernández ¹
José Ramón Pichel Campos ¹
Óscar Senra Gómez ¹
Pablo Gamallo Otero ²
Alberto García ³

1 Area of Language Technology, imaxin|software, Santiago de Compostela
2 Universidade de Santiago de Compostela, Santiago de Compostela, Spain (ROR: https://ror.org/030eybx10)
3 Engineering Department of Igalia, A Coruña

Book:

Language Windowing through Corpora

Isabel Moskowich-Spiegel Fandiño (coord.)
Begoña Crespo García (coord.)
Inés Lareo Martín (coord.)
Paula Lojo Sandino (coord.)

Publisher: Servizo de Publicacións; Universidade da Coruña

ISBN: 978-84-9749-401-4

Year of publication: 2010

Volume Title: Part II, L-Z

Volume: 2

Pages: 529-536

Congress: International Conference on Corpus Linguistics (2. 2010. A Coruña)

Type: Conference paper

Abstract

In order to build many statistically-driven NLP tools, it is essential to use a significantly large amount of data. To overcome the limitation of the scarcity of computational resources for languages such as Galician, it is necessary to develop new strategies. In the case of Galician, well-known Romanists have theorized that Galician and Portuguese are two varieties of European Portuguese. From a pragmatic standpoint, this assumption could open up a new line of research to supply Galician with rich computational resources. Drawing from the English-Portuguese Europarl parallel corpus, imaxin|software has compiled an English-Galician parallel corpus that we used to build an English-Galician Statistical Machine Translation prototype whose performance is comparable to Google Translate. We contend that this strategy can be implemented to develop a great variety of computational tools for languages like Galician that are closely related to languages for which rich computational resources already exist.

Data source: Dialnet
