Uma utilidade para o reconhecimento de topónimos em documentos medievais [A tool for the recognition of toponyms in medieval documents]

  1. Canosa, Xavier
  2. Gamallo, Pablo [1]
  3. Varela, Xavier
  4. Taboada, José Ángel
  5. Martínez Lema, Paulo
  6. Garcia, Marcos

  [1] Universidade de Santiago de Compostela, Santiago de Compostela, Spain. ROR: https://ror.org/030eybx10

Journal: Linguamática

ISSN: 1647-0818

Year of publication: 2019

Volume: 11

Issue: 1

Pages: 3-15

Type: Article

DOI: 10.21814/LM.11.1.291

Access: Open access (publisher)

Abstract

This paper describes a method for building a tool that recognizes geographical named entities in medieval texts. The tool was developed from the corresponding modules for contemporary languages in LinguaKit, a suite of NLP tools. A collection of manually annotated corpora served as a resource for building a gazetteer of medieval toponyms and for finding patterns used to improve existing rules and implement new ones for recognizing place names. In addition to the gazetteer, a list of triggers was the most decisive factor in improving recall. Final adjustments took into account the most frequent terms of the lexicon and the grammatical contexts of geographical named entities. Although a model of medieval language and a specific lexicon are still being built, the available tool can already be used to annotate texts and shows a significant improvement over the previous modules. However, much work remains to be done in terms of adding specific gazetteers for entities other than