Efficient query over large datasets of analytical chemistry
- Luaces Cachaza, David
- José Ramón Ríos Viqueira Supervisor
- Tomás F. Pena Supervisor
Defense university: Universidade de Santiago de Compostela
Defense date: 14 July 2023
- Sergio Ilarri Artigas Chair
- José Manuel Cotos Yáñez Secretary
- Laura Po Committee member
Type: Thesis
Abstract
Efficient management of molecular data is among the technologies most in demand in industry, and substructure search is one of its most important query types. Molecular structures can be encoded as graphs whose vertices and edges represent atoms and bonds, respectively. In this thesis, a cutting-edge system for storing and querying molecular data was designed and implemented, with a focus on molecular substructure search. New filter-then-verify (FTV) methods that go beyond the state of the art were designed, implemented, and tested, achieving performance gains of over 75% in the filtering stage. A generic framework for implementing FTV techniques on a distributed architecture was also developed; it enables FTV methods to be applied to very large graph databases and yields substantial performance gains in both index building and query execution. Finally, the thesis presents a study of the use of different FTV solutions to obtain approximate results in an interactive search application.
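To make the filter-then-verify idea concrete, the following is a minimal Python sketch of the general FTV pattern, not the methods developed in the thesis. It assumes a toy encoding (a list of atom labels plus a list of bonds as index pairs) and a deliberately simple filter feature (counts of atom labels and of bonded label pairs); the function names, the feature choice, and the sample molecules are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def features(atoms, bonds):
    """Count atom labels and bonded label pairs (an illustrative filter feature)."""
    feats = Counter(("atom", a) for a in atoms)
    feats.update(("bond",) + tuple(sorted((atoms[u], atoms[v]))) for u, v in bonds)
    return feats

def passes_filter(query_feats, target_feats):
    """Cheap necessary condition: the target has at least the query's feature counts."""
    return all(target_feats[k] >= n for k, n in query_feats.items())

def is_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Exact check by backtracking: map query atoms to distinct target atoms
    so that labels match and every query bond lands on a target bond."""
    t_adj = {frozenset(b) for b in t_bonds}

    def consistent(mapping, i, cand):
        # Check every query bond between atom i and an already mapped atom.
        for u, v in q_bonds:
            if u == i and v < i:
                other = v
            elif v == i and u < i:
                other = u
            else:
                continue
            if frozenset((cand, mapping[other])) not in t_adj:
                return False
        return True

    def extend(mapping):
        i = len(mapping)
        if i == len(q_atoms):
            return True
        for cand in range(len(t_atoms)):
            if cand in mapping or t_atoms[cand] != q_atoms[i]:
                continue
            if consistent(mapping, i, cand) and extend(mapping + [cand]):
                return True
        return False

    return extend([])

def search(query, database, index):
    """Filter with the precomputed index, then verify only the survivors."""
    q_atoms, q_bonds = query
    q_feats = features(q_atoms, q_bonds)
    candidates = [name for name in database if passes_filter(q_feats, index[name])]
    answers = [name for name in candidates
               if is_substructure(q_atoms, q_bonds, *database[name])]
    return candidates, answers

# Hypothetical sample data: heavy atoms only, bonds as index pairs.
database = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "diethyl_ether": (["C", "C", "O", "C", "C"], [(0, 1), (1, 2), (2, 3), (3, 4)]),
    "propane": (["C", "C", "C"], [(0, 1), (1, 2)]),
}
index = {name: features(a, b) for name, (a, b) in database.items()}
query = (["C", "C", "C"], [(0, 1), (1, 2)])  # a three-carbon chain
candidates, answers = search(query, database, index)
```

In this sample, the filter discards ethanol without running the expensive check, while diethyl ether survives filtering (it has enough carbons and carbon-carbon bonds) and is rejected only at verification. This shows why the filter's pruning power largely determines overall query cost.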