SymbolicR

Extending PBPK models through data-driven Symbolic Regression

SymbolicR is an open-source R framework for discovering non-linear formulas directly from data through an incremental, memorized symbolic regression workflow. Rather than forcing users into a one-shot black-box search, SymbolicR supports a progressive and human-in-the-loop process: researchers can explore candidate expressions randomly, refine them with genetic optimization, and explicitly test user-provided formulas through interoperable search modes. This makes formula discovery more transparent, auditable, and aligned with the way scientific hypotheses are developed and validated.
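As a rough illustration of two of the search modes described above (random exploration and testing a user-provided formula, with a memory of already evaluated candidates), here is a minimal base-R sketch. It does not use the SymbolicR API: the names `score_formula`, `random_search`, `candidate_terms`, and `cache`, as well as the toy data, are hypothetical and exist only for this example. The genetic refinement step is not shown.

```r
# Hypothetical base-R sketch; none of these names come from the SymbolicR API.
set.seed(1)

# Toy data with a known non-linear relationship: y = 2 * x1 / x2.
x1 <- runif(50, 1, 5)
x2 <- runif(50, 1, 5)
dat <- data.frame(x1 = x1, x2 = x2, y = 2 * x1 / x2)

# Candidate right-hand sides, written as R formula strings.
candidate_terms <- c("x1", "x2", "log(x1)", "log(x2)",
                     "I(x1 * x2)", "I(x1 / x2)", "I(x1^2)")

# Memory of already evaluated formulas, so later searches reuse earlier fits.
cache <- new.env()

# Fit y ~ rhs by least squares and return the RMSE, using the cache if possible.
score_formula <- function(rhs, data) {
  cached <- cache[[rhs]]
  if (!is.null(cached)) return(cached)
  fit <- lm(as.formula(paste("y ~", rhs)), data = data)
  rmse <- sqrt(mean(residuals(fit)^2))
  cache[[rhs]] <- rmse
  rmse
}

# Random exploration: draw n candidates at random and score each one.
random_search <- function(n, data) {
  picks <- sample(candidate_terms, n, replace = TRUE)
  scores <- vapply(picks, score_formula, numeric(1), data = data)
  scores[!duplicated(names(scores))]
}

random_scores <- random_search(5, dat)

# User-provided hypothesis: test one explicit formula against the same data.
user_rmse <- score_formula("I(x1 / x2)", dat)

# Rank everything evaluated so far, lowest error first.
ranking <- sort(unlist(as.list(cache)))
print(ranking)
```

Because every score is stored in `cache`, a later search pass can start from what earlier passes already evaluated, which is the spirit of an incremental, human-in-the-loop workflow.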

[Figure: graphical abstract]

The framework is designed for real-world modeling tasks where interpretability matters as much as predictive performance.

SymbolicR has already been used to extend physiologically based pharmacokinetic (PBPK) models: an ad hoc symbolic regression procedure identified analytical formulas linking in silico molecular properties to mechanistic model parameters.

In our study, the resulting extended PBPK model generalized to an independent validation set and achieved a median absolute average fold error of 1.18, supporting its practical value for de-risking aberrant pharmacokinetic behavior in drug development.
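For readers unfamiliar with the metric, the sketch below shows one common definition of the absolute average fold error (AAFE), 10 raised to the mean of |log10(predicted / observed)|, followed by a median across compounds. Treating the reported figure as a per-compound AAFE summarized by its median is an assumption made here, not a statement of the study's exact procedure. The function name `aafe` and all numbers are hypothetical and are not taken from the study.

```r
# Hypothetical sketch of the metric; values below are made up for illustration.

# Absolute average fold error: 10^(mean(|log10(pred / obs)|)).
# A value of 1 means perfect agreement; 2 means predictions are off
# by a factor of two on average (in either direction).
aafe <- function(pred, obs) {
  stopifnot(length(pred) == length(obs), all(pred > 0), all(obs > 0))
  10^mean(abs(log10(pred / obs)))
}

# Toy predicted vs. observed concentrations for three compounds.
toy <- data.frame(
  compound = c("A", "A", "B", "B", "C", "C"),
  pred     = c(10, 20, 5, 8, 10, 10),
  obs      = c(20, 10, 5, 8, 1, 1)
)

# AAFE per compound (assumed aggregation), then the median across compounds.
per_compound <- sapply(split(toy, toy$compound),
                       function(d) aafe(d$pred, d$obs))
median_aafe <- median(per_compound)
print(per_compound)
print(median_aafe)
```

Working on the log scale makes over- and under-prediction by the same factor count equally, which is why fold errors are usually preferred over raw differences for pharmacokinetic quantities spanning orders of magnitude.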

A key strength of SymbolicR is its lightweight adoption path. The package has a small core dependency set for its main functionality, with only a limited number of suggested packages for vignettes and visualization, making it easy to install and integrate into existing R-based scientific workflows.

SymbolicR is currently available on both GitHub and CRAN.

If you use or cite this software, please cite its DOI.

References

Tomasoni, D., Paris, A., Visintainer, R. et al. Predicting Aberrant Fc-fusion Protein Pharmacokinetics from In Silico Structural Properties and Physiologically Based Pharmacokinetic (PBPK) Modeling. AAPS J 28, 87 (2026). https://doi.org/10.1208/s12248-026-01232-z