Nos_MeteoGalicia-GL Data-to-Text

  1. Corbelle, Javier González 1
  2. Diz, Alberto Bugarín 1
  3. Moral, Jose María Alonso 1
  4. Taboada, Juan 2
  1. 1 Universidade de Santiago de Compostela (USC)
  2. 2 MeteoGalicia, Xunta de Galicia

Publisher: Zenodo

Year of publication: 2023

Type: Dataset

Abstract

MeteoGalicia-GL is the first known data-to-text dataset in the Galician language. It comprises 3,302 records, each pairing tabular meteorological forecast data with a handwritten textual description in Galician. All data were collected from the Galician Official Meteorological Agency (MeteoGalicia): the tables contain real forecast data, and the texts were written by expert meteorologists from MeteoGalicia. The texts were additionally curated by CiTIUS staff to check that each caption is consistent with its data table.

The dataset is stored in the "dataset" directory, which holds an individual file for each data table and each caption, in their respective folders. Each tabular data file represents the state of the sky for one day in Galicia (NW Spain) using categorical values in Galician, e.g., "despexado" ("sunny"), "nubrado" ("cloudy"), etc. Each data table is organized into 4 columns and 32 rows. The first column (<em>"Zona"</em>) identifies one of the 32 geographical zones of Galicia covered by the forecast, while the remaining columns give the state-of-the-sky value for each period of the day: morning (<em>"Mañá"</em>), afternoon (<em>"Tarde"</em>) and night (<em>"Noite"</em>).
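
The table layout described above can be checked with a short script. The following is a minimal Python sketch, not part of the dataset itself: it assumes that a data table is available as delimited text whose header row holds the four column names, which this record does not confirm. The function names `parse_table` and `count_states`, and the comma delimiter, are hypothetical and should be adapted to the actual files.

```python
import csv
import io
from collections import Counter

# Column names and row count as described in the abstract.
EXPECTED_COLUMNS = ["Zona", "Mañá", "Tarde", "Noite"]
EXPECTED_ROWS = 32


def parse_table(text, delimiter=","):
    """Parse one data table and check it against the described layout.

    Assumption: the table is delimited text with a header row. The real
    file format and delimiter may differ.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = list(reader)
    if len(rows) != EXPECTED_ROWS:
        raise ValueError(f"expected {EXPECTED_ROWS} rows, got {len(rows)}")
    return rows


def count_states(rows, period):
    """Count how often each state-of-the-sky value occurs in one period."""
    if period not in EXPECTED_COLUMNS[1:]:
        raise ValueError(f"unknown period: {period}")
    return Counter(row[period] for row in rows)
```

For a table read from disk, `parse_table(open(path, encoding="utf-8").read())` would return one dictionary per zone, and `count_states(rows, "Tarde")` would summarize the afternoon forecast across all zones.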