Automatic generation of nominal phrases for Portuguese and Galician

Authors:
  1. Domínguez Vázquez, María José (1)
  2. Simões, Alberto (3)
  3. Bardanca Outeiriño, Daniel (2)
  4. Caíña Hurtado, María (1)
  5. Iglesias Allones, José Luis (1)

Affiliations:
  1. Universidade de Santiago de Compostela, Instituto da Lingua Galega - ILG, Santiago de Compostela, Spain
  2. Universidade de Santiago de Compostela, CiTIUS, Santiago de Compostela, Spain
  3. 2Ai, School of Technology, IPCA, Barcelos, Portugal

Journal:
Natural Language Processing

ISSN: 2977-0424

Year of publication: 2024

Pages: 1-25

Type: Article

DOI: 10.1017/NLP.2024.32

WoS: WOS:001327415700001

Open access

Abstract

This paper presents XeraWord, an innovative tool for automatically generating nominal phrases. XeraWord can be used for a range of tasks, from language teaching to the creation of examples in lexicography and the development of resources for natural language processing. Its earlier version, Xera, was a first experiment in this area, allowing the automatic generation of nominal phrases in three languages: German, French and Spanish. The tool has since been extended to support two further languages, Portuguese and Galician.

We start by presenting the theory behind the development of Xera and its new version, XeraWord: the base methodology applied and the natural language processing resources used to support it. We then present TraduWord, a tool developed specifically to construct resources for new languages, which allows the semi-automatic translation of the data required for nominal phrase generation. We discuss the advantages and disadvantages of this approach, analysing the quality of the translated resources as well as the amount of manual work required to validate and correct them.
