Metric multi-dimensional scaling for longitudinal data embeddings in pharmacometrics
Abstract
Longitudinal data in pharmacometrics typically involve multiple time-varying inputs and outputs for each subject in a population. Each subject can have a different number of observations at different time points, leading to irregular data structures that are difficult to analyze directly. Nonlinear mixed effects (NLME) models are the standard approach for modeling such data, but they can be computationally intensive and may not scale well to large datasets or complex models. In particular, when the number of input covariates and output biomarkers and endpoints is large, the computational cost of fitting NLME models can become prohibitive. Some machine learning (ML) methods can help eliminate uninformative covariates and biomarkers at a relatively low computational cost. However, many ML models require fixed-size inputs and outputs. Such a fixed-size, tabular representation of a (usually) more complex data structure is commonly known as an embedding. In this work, we generate dissimilarity-preserving embeddings for longitudinal data commonly used in pharmacometrics. We use metric multi-dimensional scaling (MMDS) together with dynamic time warping (DTW) to generate a fixed-size embedding for each time-varying variable of each subject in a population. An experiment on a synthetic pharmacokinetic dataset shows that the proposed procedure can generate useful embeddings that preserve neighborhood structures. This has potential applications in covariate and biomarker elimination as well as model evaluation, which will be investigated in future work.
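To make the DTW-plus-MMDS idea concrete, the following is a minimal Python sketch. It is not the implementation used in this work. It rests on several assumptions: DTW compares value sequences only (observation times are ignored) with an absolute-difference local cost; MMDS is solved with the SMACOF iteration using unit weights; and the ten synthetic curves are simple exponential decays with made-up elimination rates. All function names (dtw_distance, pairwise_dtw, metric_mds, stress) are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two 1-D series that may differ in length."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def pairwise_dtw(series):
    """Symmetric matrix of DTW dissimilarities between all pairs of series."""
    k = len(series)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return D

def stress(D, X):
    """Raw stress: squared mismatch between dissimilarities and embedded distances."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sum((D - dist) ** 2) / 2

def metric_mds(D, dim=2, n_iter=300, seed=0):
    """Metric MDS via SMACOF (Guttman transform) with unit weights."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(size=(n, dim))  # random initial configuration
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(dist > 0, D / dist, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n  # Guttman transform; stress does not increase
    return X

# Synthetic subjects: irregular sampling, two made-up elimination rates.
rng = np.random.default_rng(0)
series = []
for k_el in [0.1] * 5 + [0.5] * 5:
    t = np.sort(rng.uniform(0, 24, size=rng.integers(5, 12)))
    series.append(np.exp(-k_el * t))

D = pairwise_dtw(series)           # 10 x 10 dissimilarity matrix
embedding = metric_mds(D, dim=2)   # one fixed-size row per subject
```

Each row of embedding is a fixed-size vector for one subject's series, regardless of how many observations that subject had. In practice a library DTW and MDS implementation would likely be preferable, and the choice of embedding dimension would need to be checked against the residual stress.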