Developing a robust computational pipeline for model-based and data-driven phenotype clustering

Systems Pharmacology
Modeling & simulation


Develop a robust computational method for patient stratification that works under very challenging conditions, where the small number of subjects in the population pool or the high complexity of the disease prevents the application of standard statistical methods.

What we did

We defined an innovative method for phenotype classification that combines experimental data with a mathematical description of the disease biology. The methodology uses the mathematical model to infer additional subject features relevant to the classification. The algorithm then identifies the optimal number of clusters in an unsupervised manner and classifies the samples based on a subset of the features estimated during the model fit.
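The last step described above, choosing the number of clusters without supervision and then assigning subjects to them, can be illustrated with a minimal sketch. It is not the published pipeline: it assumes k-means clustering and the mean silhouette score as the selection criterion, which are generic stand-ins, and it runs on synthetic numbers in place of model-estimated features. The function names (`kmeans`, `silhouette`, `choose_k`) and the 12-subject data set are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_init=50, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns the labels of the best of n_init restarts."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _restart in range(n_init):
        # Start from k distinct, randomly chosen subjects.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Move each centroid to the mean of its members (keep it if empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        inertia = (dists[np.arange(len(X)), labels] ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def choose_k(X, k_range=range(2, 6)):
    """Pick the cluster count with the highest mean silhouette score."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in k_range}
    best_k = max(scores, key=scores.get)
    return best_k, kmeans(X, best_k), scores


# Synthetic stand-in for per-subject features estimated during a model fit:
# 12 subjects, 2 features, generated around three hypothetical phenotypes.
rng = np.random.default_rng(42)
group_means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([m + 0.3 * rng.standard_normal((4, 2)) for m in group_means])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # put features on a common scale

best_k, labels, scores = choose_k(X)
print("selected number of clusters:", best_k)
print("cluster label per subject:", labels.tolist())
```

With so few subjects, the repeated random restarts matter more than they would on a large data set, because a single unlucky initialisation can dominate the result. In a model-based setting, the rows of `X` would be the subset of fitted model parameters selected for classification rather than random draws.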



We applied the algorithm to a real clinical test case involving a rare lysosomal disorder, for which very little data was available. Our methodology inferred an additional phenotype division that the experimental data alone did not reveal.


Sanofi is dedicated to making a difference in patients' daily lives, wherever they live, and enabling them to enjoy a healthier life.


Simoni G, Kaddi C, Tao M, Reali F, Tomasoni D, Priami C, Azer K, Neves-Zaph S, Marchetti L. A robust computational pipeline for model-based and data-driven phenotype clustering. Bioinformatics 37(9):1269–1277, 2021.
