Change-point detection methods for behavioral shift recognition in mental healthcare

  1. Romero Medrano, Lorena
Supervised by:
  1. Antonio Artés Rodríguez Director
  2. Pablo Martínez Olmos Co-director

Defence university: Universidad Carlos III de Madrid

Defence date: 20 January 2023

Committee:
  1. Fernando Pérez Cruz Chair
  2. Jorge López Castromán Secretary
  3. Vanessa Gómez Verdejo Committee member

Type: Thesis

Abstract

Human behavior analysis has been approached from different perspectives over time. In recent years, new technologies and advances in digitalization have emerged as an alternative tool for characterizing behavior and for detecting changes in it over time. In particular, the widespread use of smartphones and other electronic devices, which continuously collect data from the user, provides a representation of behavior in different areas of a person’s life, such as mobility, physical activity or social interactions. These devices also allow passive monitoring, that is, monitoring that does not require the user to interact directly with the device, so that information is collected unobtrusively and without altering the user's daily routine. Among other advantages, this means that the user does not subjectively influence the information collected, which yields objective representations of their behavior. This approach to the characterization and analysis of behavior and its changes has many applications, notably in medicine. In this work, we focus specifically on mental health, where the characterization and early detection of behavioral changes are important for preventing relapses in psychiatric patients and, in particular, for preventing possible suicide attempts or psychiatric emergency admissions in patients with a history of suicidal behavior. Our approach is based on the development and application of mathematical and statistical models that help detect these changes from passively collected data. However, despite these advantages, working with data collected through electronic devices, particularly in a clinical scenario, is challenging because of the characteristics of the data. The data have a very complex structure. First, they are irregularly sampled in time (samples may be stored every 5 minutes, when a specific activity starts, or daily).
Second, each observation can be heterogeneous, meaning that it is made up of several sources of different statistical types (continuous, discrete), or of the same type but with different marginal distributions. In addition, the number of sources and the sampling frequency mean that each day is represented by a high-dimensional vector, which calls for scalable algorithms. Lastly, these data sequences contain many missing values with very diverse patterns, caused, for example, by missing permissions on the phone, disconnection periods or simply the temporal irregularity already mentioned. Preprocessing data with these characteristics requires a large amount of effort and time, which is not feasible for a goal as demanding as the prediction and prevention of suicide attempts, where the information must be processed in real time because every minute is important. We therefore need methods that are fast, efficient, accurate and adapted to the complexity of the data. For this reason, instead of focusing our efforts on data mining, which is generally conditioned on a specific initial hypothesis and hinders reproducibility, we work on methods that can handle data sequences with the aforementioned characteristics and do so in an online manner, that is, algorithms capable of processing the samples as they are recorded. In this thesis, we focus on the development of probabilistic models for behavior change detection, proposing algorithms that can work on heterogeneous, multi-source, high-dimensional sequential data with missing values. In our scenario, we assume that the joint distribution of the data changes at a given moment, segmenting the sequence, and our goal is to detect this change with the least possible delay.
The research carried out during the thesis is organized in three main blocks, summarized below.

I Modelling Digital Phenotype for Medical Applications

We begin the thesis by describing the benefits of using digital phenotyping to characterize changes in human behavior, and we introduce a specific e-health monitoring system with which we have worked: the Evidence-Based Behavior (eB2) system, an e-health solution whose goal is to improve the quality of treatment of mental health patients by obtaining faster and more precise answers in the mental health service cycle. We also detail the collection and aggregation methods used by the platform and the subsequent summarization of the raw measurements as a necessary first step before modelling. As a second step, transforming the processed data into behavioral digital biomarkers yields valuable information that can be used as indicators for the prevention of suicide risk events using AI techniques. We present two works on data mining in medicine through digital phenotype modelling: the prediction of disability level in different domains of daily life (Disability Assessment Prediction) and the analysis of causal relationships between variables in order to detect negative effects caused by isolation during the Covid-19 pandemic in psychiatric patients.

- Disability Assessment Prediction. WHODAS 2.0 is a standardized assessment instrument developed by the World Health Organization for the measurement of health and disability in the population and in clinical practice. We provide a baseline analysis of the feasibility of using machine learning to predict patients’ WHODAS 2.0 disability scores from passively gathered data. These approaches are particularly important since they may enable the analysis of the evolution of individuals’ functioning and disability and provide a clinical tool to monitor the progression and efficacy of treatment.
In addition, they provide the opportunity to build targeted just-in-time adaptive interventions in a designated population.

- Analysis of Covid-19 lockdown effects. The Covid-19 pandemic raised the concern that the social and physical distancing measures implemented in response might negatively impact health in other areas, through both decreased physical activity and increased social isolation. Specifically, we investigate whether increased time spent using social media apps predicts the maintenance of higher physical activity levels before versus after the imposition of lockdown conditions. To address this question, we analyze passively sensed app use and physical activity (step count) data, and self-reported emotional state. This information is used to explore the idea that increased social media use may help protect against the negative effects of lockdown-induced isolation on mood, either directly or indirectly via increased physical activity.

II Change-Point Detection Models for Heterogeneous Data

After working with data-driven approaches, we move on to the second block of the thesis, which is the most extensive and technical one. In this part, we go a step further and change the focus of the previous chapters: from fully adapting our data to existing methods to proposing algorithms that are specific to heterogeneous, multi-source, high-dimensional sequential data with missing values. We focus on the development of change-point detection algorithms, present the benefits of using latent variable models to deal with high-dimensional data sets, and provide methods that are able to integrate data of different statistical types. Change-point detection (CPD) methods aim to identify abrupt transitions in sequences of observations, in both the univariate and multivariate cases. Typically, a change-point (CP) is only considered if there is a noticeable difference between the generative parameters of the data before and after the change-point event.
Since the identifiability of change-points is directly related to the discrepancy between the distributions governing each partition, we consider a Bayesian framework, which provides a reliable way to obtain uncertainty measures over both the parameters and the CP locations. In particular, we focus on the existing Bayesian Online CPD (BOCPD) algorithm, which uses this idea to derive a recursive exact inference method. However, when observations become high-dimensional and the number of parameters in the model grows exponentially, there is not enough evidence in the sequential data to obtain reliable estimates of the true generative parameters. Latent variable models are particularly well suited to overcoming the high-dimensionality issue. Under the assumption that change-points lie on a lower-dimensional manifold, one can extend the BOCPD algorithm to accept subsets of surrogate discrete latent variables. Each data point is therefore linked to a single assignment, as in mixture models. The main drawback is that the true latent class assignments are never observed but inferred, which requires introducing pseudo-observations. For this purpose, there are two main strategies: i) use the posterior probability vector as a continuous multivariate datum, i.e. as a Dirichlet-distributed variable, or ii) observe single point estimates of the discrete latent variable. Although the first idea has been explored in previous works outside the CPD problem, it still requires expensive approximate methods because of intractability. The second idea, in contrast, allows reliable detection, particularly when the posterior densities over the latent variables are certain enough. We consider the case in which poor inference of the point estimates over the latent variables leads to catastrophic results in the CPD.
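As an illustration of the recursive inference mentioned above, the following is a minimal sketch of a BOCPD-style run-length recursion applied to a sequence of discrete symbols, which could stand in for point estimates of a latent class. It is not the thesis implementation: the function name `bocpd_categorical`, the constant hazard rate, and the symmetric Dirichlet prior are assumptions chosen to keep the predictive probability in closed form.

```python
import numpy as np

def bocpd_categorical(z, K, hazard=0.02, alpha=1.0):
    """Run-length posterior for discrete symbols z[t] in {0, ..., K-1}.

    Assumed observation model: categorical likelihood with a symmetric
    Dirichlet(alpha) prior, so the predictive probability is closed form.
    Returns R where R[t, r] = p(run length = r | first t symbols).
    """
    T = len(z)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0                      # before any data the run length is 0
    counts = np.zeros((1, K))          # symbol counts, one row per run length
    for t, sym in enumerate(z):
        # Predictive probability of the new symbol under each run length.
        pred = (counts[:, sym] + alpha) / (counts.sum(axis=1) + K * alpha)
        joint = R[t, : t + 1] * pred
        R[t + 1, 1 : t + 2] = joint * (1.0 - hazard)   # the run continues
        R[t + 1, 0] = joint.sum() * hazard             # a change-point occurs
        R[t + 1] /= R[t + 1].sum()                     # normalise by the evidence
        # Update counts: prior-only row for run length 0, shifted rows otherwise.
        grown = counts.copy()
        grown[:, sym] += 1
        counts = np.vstack([np.zeros((1, K)), grown])
    return R

# Toy sequence (hypothetical data): the dominant symbol switches halfway.
R = bocpd_categorical([0] * 60 + [2] * 60, K=3)
most_likely_run_length = int(np.argmax(R[-1]))
```

The most probable run length at the end of the sequence indicates how long ago the last change is believed to have occurred. When the symbols are noisy point estimates drawn from a flat posterior, the counts carry little information, which is the failure case discussed above.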
Our contribution is a novel extension of the hierarchical CP model that improves the detection rate and reduces delay even under extremely flat posterior distributions with high variance. The proposed solution treats latent variable samples as multivariate observations, which we model as multinomially distributed. This preserves the original analytic simplicity of Bayesian CPD inference and keeps the computational cost low. Our method is validated through experiments on synthetic data, where we demonstrate the utility of the new inference mechanism in terms of detection precision and delay. We also provide insights into its applicability in real-world scenarios, such as change-point detection in monitored psychiatric patients in a human behavior study.

Multi-Source Change-Point Detection Over Local Observation Models

The hierarchical extension previously introduced relies on the assumption that there is a unique univariate latent representation that simultaneously summarizes the statistical information of every source. This approach solves the high-dimensional data problem. However, the latent variable modelling still implies the use of different likelihood functions and entails an optimization problem over a product of functions with different support, which leaves some variables underrepresented in favor of others and loses essential generative information for the global detection. This joint dimensionality reduction has an implicit smoothing effect, making the method insufficiently sensitive when dealing with interspersed changes of different intensity within the same sequence: only the high-intensity CPs are detected in these cases. Moreover, the presence of missing temporal data for just a subset of sources can increase this smoothing effect, motivating the search for a more sensitive way to fuse all the sources while taking into account the aforementioned features of the data.
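As an aside on the multinomial treatment of latent-variable samples described above, the sketch below shows one conjugate construction that keeps the predictive probability analytic: a Dirichlet prior over class probabilities combined with multinomial counts. This is an assumed illustration and not necessarily the exact model of the thesis. The function names and the idea of counting several latent samples per time step into a vector `counts` are hypothetical.

```python
from math import lgamma, exp

def log_dirichlet_multinomial(counts, alpha):
    """Log predictive probability of a count vector under a Dirichlet prior.

    counts[k] is the number of latent samples assigned to class k at one
    time step; alpha[k] are the Dirichlet pseudo-counts for the current run.
    """
    S = sum(counts)
    A = sum(alpha)
    out = lgamma(S + 1) - sum(lgamma(c + 1) for c in counts)  # multinomial coefficient
    out += lgamma(A) - lgamma(A + S)
    out += sum(lgamma(a + c) - lgamma(a) for a, c in zip(alpha, counts))
    return out

def update(alpha, counts):
    """Conjugate posterior update: add the observed counts to the pseudo-counts."""
    return [a + c for a, c in zip(alpha, counts)]

# Two samples, two classes, uniform prior: each split of the samples is equally likely.
p = exp(log_dirichlet_multinomial([1, 1], [1.0, 1.0]))
```

Because the update only adds counts, a predictive of this form could replace the categorical predictive inside a run-length recursion without changing its structure.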
To overcome the limitations of the described setting, we propose a Change-Point Detector based on Local Observation Models (LOM-based CPD) that generalizes and extends the use of latent variable models for change-point detection. The LOM-based CPD tackles the problem with a two-stage modelling method. In the first stage, we propose several Local Observation Models (LOMs) based on partitioning the feature space according to the contextual meaning and the multi-source, mixed-type nature of the data. This allows the dimensionality of the observations to be reduced and gives control over how the local CP information is transferred to homogeneous local spaces, which brings technical advantages in the inference process and solves the initial heterogeneity problem. In particular, we propose four observation models (OMs):

- Full joint representation OM, where we consider a univariate latent variable at each time instant, assuming that there is a unique latent representation that holds the generative characteristics of every source simultaneously. This is the approach followed in the development of the first hierarchical extension, and it implies working with heterogeneous likelihood functions.
- Independent source representation OM, where we define an observation model based on the assumption that there exists a latent representation for each data source. That is, the number of local sets is equal to the number of sources at each time instant. This proposal has the advantage of not only solving the high-dimensionality problem but also avoiding the product of mixed-type likelihoods that can bias the resulting posterior for the latent classes.
- Data-type based representation OM. This approach builds on the previous one and is motivated by the technical advantage of avoiding the product of mixed-type likelihoods. We propose a partition of the feature space based on the data type of the sources, assuming that there is a latent representation for each group.
- Prior knowledge based representation OM. In this approach, we propose grouping the sources using contextual information about the data, such as external relations between sources due to the collection method or contextual meaning. This approach makes sense in a more applied scenario where, for example, we want to define domains such as mobility, physical activity or social interactions and study behavioral changes over domains instead of over variables, which could be more informative in a health analysis context.

In the second stage, different Factorization Models for the CP detector are proposed to consider several weighting mechanisms for the homogeneous local latent representations obtained from the first stage, resulting in a generalized hierarchical CPD methodology that holds for any of the observation models previously introduced. We present three factorization models. The first model assumes that the contribution at each time step is independent for each of the local representations. The other two approaches, in contrast, are based on the assumption that each local representation contributes to the global detection to a different degree, leading to models that weight the contribution of each local set, with weights learnt from the data. In the experiments, the first approach (appropriately combined with different OMs) shows better performance metrics. On the other hand, the differences with respect to the weighting models are not large, and the weighting models provide explainability regarding the contribution of each source to the detection, which is useful mainly when the goal is to apply these models to real-world data sets. We evaluated and tested the proposed models on synthetic data, demonstrating an improvement in precision and a reduction in detection delay, as well as robustness to the presence of missing data.
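The two stages described above can be sketched, under strong simplifying assumptions, as a partition of the features into groups followed by a rule for combining per-group predictive probabilities. The abstract does not specify the form of the weighted factorizations, so the log-linear pooling below is only an assumed stand-in. The grouping, the feature names, and the convention of skipping groups with no data at a time step are likewise hypothetical.

```python
import math

# Hypothetical grouping of passively sensed features into behavioral domains
# (a prior-knowledge style partition; the assignment below is illustrative).
GROUPS = {
    "mobility": ["distance_walked", "time_at_home"],
    "physical_activity": ["steps"],
    "social_interaction": ["app_usage_time"],
}

def combine_independent(log_preds):
    """Independent factorization: sum the per-group log-predictives.

    A group whose value is None (no data at this time step) is skipped,
    which amounts to marginalising it out under the independence assumption.
    """
    return sum(lp for lp in log_preds.values() if lp is not None)

def combine_weighted(log_preds, weights):
    """Assumed weighted variant: scale each group's log-predictive by its weight."""
    return sum(weights[g] * lp for g, lp in log_preds.items() if lp is not None)

# Per-group log-predictive probabilities at one time step (toy values).
log_preds = {
    "mobility": math.log(0.5),
    "physical_activity": math.log(0.2),
    "social_interaction": None,   # this source is missing today
}
joint = combine_independent(log_preds)
weighted = combine_weighted(
    log_preds,
    {"mobility": 2.0, "physical_activity": 0.0, "social_interaction": 1.0},
)
```

In a sketch of this kind, the combined value would take the place of the single predictive term in a run-length recursion, and inspecting the weights gives a rough indication of which group drives a detection.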
We also apply some of these methods to a real data set within a study of behavioral change characterization in psychiatric patients with a history of suicide-related events. We present individualized models for change detection over data passively sensed via smartphones, and use suicide attempts and psychiatric emergency admissions as real labels with the aim of predicting them one week in advance.

III Behavioral Change Detection in Mental Healthcare

The third part of the thesis closes the loop and consists of the application of some of the developed change-point detection methods to a real medical study. The cohort was composed of psychiatric patients with a history of suicidal behavior and/or ideation, as part of the SmartCrisis study. In this study, participants were outpatients with any psychiatric diagnosis undergoing follow-up in the secondary suicide prevention program at the Fundacion Jimenez Diaz Mental Health Department. Inclusion criteria were age 18 years or older, a history of suicidal behavior and/or suicidal ideation according to the Columbia Suicide Severity Rating Scale (CSSRS), ability to understand and sign the informed consent form, and ownership of a smartphone connected to a WiFi network at least once a week. Patients were not compensated for their participation, and all of them downloaded to their smartphones the Evidence-Based Behavior (eB2) app presented in the first part of the thesis. The data were passively collected by their smartphones and included the distance walked, steps taken, time spent at home and time spent using applications. We developed individualized models in which daily activity profiles were constructed for each patient from these data. Change-point detection was then applied to the resulting data sequence to detect abrupt variations in the distribution of these profiles over time.
Such changes were considered critical periods separating behavioral patterns, and we tested their relationship with the recorded suicide events. The behavioral changes identified by the algorithm predicted suicide risk within a time frame of one week with good accuracy, specifically an Area Under the Curve (AUC) of 0.79.
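For readers unfamiliar with the metric, the sketch below shows how an AUC can be computed from a detector's scores and binary event labels, using the rank interpretation of the area under the ROC curve. The abstract does not describe the exact evaluation protocol, so the weekly scores and labels here are hypothetical toy values and the function name `auc` is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical weekly change-point scores and whether an event followed.
weekly_scores = [0.9, 0.8, 0.3, 0.1]
event_followed = [1, 0, 1, 0]
value = auc(weekly_scores, event_followed)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported value of 0.79.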