VisualSHIELD

Downloads

VisualSHIELD source code (GitHub)

Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD

VisualSHIELD is an open-source, extensible web interface that provides both a standardized deployment of the DataSHIELD infrastructure and a graphical user interface to dsSwissKnife and other R packages, simplifying the definition of analysis workflows and the visualization of results.

It is implemented as a Shiny module, distributed as a graphical R package that can be embedded into any user-defined Shiny app to add federated analysis capability. Its open-source architecture makes it extensible and provides a clear framework for adding user-defined federated analyses.

The tool provides a simple graphical user interface that integrates DataSHIELD analysis methods such as

histograms
contour plots
heatmaps (Figure 1)
boxplots
correlation matrices
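
Plots such as histograms can be non-disclosive because each site returns only aggregated bin counts, and bins with too few records are suppressed before they leave the site. The following is a minimal Python sketch of that idea, assuming a hypothetical threshold MIN_CELL_COUNT = 3 and a simple suppress-to-zero rule. The actual DataSHIELD disclosure settings and suppression methods are configured server-side and may differ, and the function names are illustrative only.

```python
from typing import List, Sequence

# Hypothetical threshold: bins with fewer records than this are suppressed.
MIN_CELL_COUNT = 3


def local_histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin at one site, then suppress small cells.

    Bins are half-open [edges[i], edges[i+1]), except the last, which is closed.
    Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[i + 1]):
                counts[i] += 1
                break
    # Small-cell rule: only counts at or above the threshold leave the site.
    return [c if c >= MIN_CELL_COUNT else 0 for c in counts]


def federated_histogram(sites: Sequence[Sequence[float]], edges: Sequence[float]) -> List[int]:
    """Sum the already-suppressed bin counts returned by each site."""
    total = [0] * (len(edges) - 1)
    for values in sites:
        for i, c in enumerate(local_histogram(values, edges)):
            total[i] += c
    return total
```

For example, federated_histogram([[1.0, 1.5, 1.7, 2.2, 4.9], [1.1, 1.2, 1.3, 3.5, 3.6, 3.7]], [0, 2, 4, 6]) returns [6, 3, 0]: the single values 2.2 and 4.9 at the first site are suppressed locally and never reach the coordinator.
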

A novel interactive linear regression feature was implemented in VisualSHIELD (Figure 2), augmenting the DataSHIELD GLM functionality with statistics not available in DataSHIELD, such as R², adjusted R², and the F-statistic.
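
Statistics of this kind can be derived without individual-level data, using aggregates that can be summed across sites: the number of observations n, the sums of y and y² (which give the total sum of squares), and the residual sum of squares under the shared coefficients. The sketch below assumes each site returns such a summary. SiteSummary and pooled_fit_statistics are hypothetical names, and the code shows the standard formulas, not necessarily how VisualSHIELD computes them.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SiteSummary:
    """Hypothetical aggregates one site could return for a fitted linear model."""
    n: int          # number of observations
    sum_y: float    # sum of the outcome
    sum_y2: float   # sum of squared outcome values
    rss: float      # residual sum of squares under the shared coefficients


def pooled_fit_statistics(sites: Sequence[SiteSummary], p: int) -> Dict[str, float]:
    """Return R^2, adjusted R^2 and the overall F-statistic.

    p is the number of predictors, excluding the intercept.
    """
    n = sum(s.n for s in sites)
    sum_y = sum(s.sum_y for s in sites)
    sum_y2 = sum(s.sum_y2 for s in sites)
    rss = sum(s.rss for s in sites)
    tss = sum_y2 - sum_y ** 2 / n  # total sum of squares around the pooled mean
    df_resid = n - p - 1
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = ((tss - rss) / p) / (rss / df_resid)
    return {"r2": r2, "adj_r2": adj_r2, "f": f_stat}
```

Only sums cross site boundaries, so the coordinator never sees individual outcomes or residuals.
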
Variables can also be converted automatically to a different type by appending the target type to the variable name, separated by the ‘#’ sign.

For example: IMP3#num
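
The naming convention can be illustrated with a small parser that separates the variable name from its target type and converts the values. This is a hypothetical Python sketch. Only the num suffix comes from the example above; the chr code, the converter table and the function names are assumptions, not VisualSHIELD's actual list of supported types.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical suffix-to-converter table; only "num" appears in the example above.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "num": float,
    "chr": str,
}


def parse_variable_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'IMP3#num' into ('IMP3', 'num'); a name without '#' means no conversion."""
    name, sep, target = spec.partition("#")
    if not sep:
        return name, None
    if target not in CONVERTERS:
        raise ValueError(f"unknown target type {target!r} in {spec!r}")
    return name, target


def convert_column(spec: str, column: List[str]) -> Tuple[str, List[object]]:
    """Return the bare variable name and the column converted to the target type."""
    name, target = parse_variable_spec(spec)
    if target is None:
        return name, list(column)
    convert = CONVERTERS[target]
    return name, [convert(v) for v in column]
```

With this sketch, convert_column("IMP3#num", ["1.5", "2"]) yields ("IMP3", [1.5, 2.0]). An unknown suffix raises an error instead of silently keeping the original type.
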

VisualSHIELD also integrates dsSwissKnife analysis methods such as

k-nearest neighbors
principal component analysis
randomForest

and a custom feature selection method (Figure 3).
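
The internals of the dsSwissKnife methods are not described here. In general, though, methods such as principal component analysis can run non-disclosively because they need only aggregate statistics. The following is a minimal pure-Python sketch of a generic federated PCA, not the dsSwissKnife algorithm. It assumes each site returns its row count, column sums and cross-product matrix XᵀX. The coordinator pools these into a sample covariance matrix and extracts the first principal component by power iteration. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Moments = Tuple[int, List[float], Matrix]


def site_moments(rows: Sequence[Sequence[float]]) -> Moments:
    """Aggregates a site could share: n, column sums and X^T X."""
    d = len(rows[0])
    sums = [sum(r[j] for r in rows) for j in range(d)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    return len(rows), sums, xtx


def pooled_covariance(moments: Sequence[Moments]) -> Matrix:
    """Sample covariance (X^T X - n * mean mean^T) / (n - 1) from pooled aggregates."""
    n = sum(m[0] for m in moments)
    d = len(moments[0][1])
    sums = [sum(m[1][j] for m in moments) for j in range(d)]
    xtx = [[sum(m[2][i][j] for m in moments) for j in range(d)] for i in range(d)]
    return [[(xtx[i][j] - sums[i] * sums[j] / n) / (n - 1) for j in range(d)] for i in range(d)]


def leading_component(cov: Matrix, iters: int = 200) -> Tuple[float, List[float]]:
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    Assumes a nonzero covariance matrix and a start vector that is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v
```

Because XᵀX and the column sums are additive across sites, the pooled covariance equals the one computed on the combined data.
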

We used VisualSHIELD to compare traditional machine learning methods with equivalent methods implemented within DataSHIELD. Specifically, we trained the methods with unrestricted data access, compared the resulting classifiers with those obtained in DataSHIELD, and found that

the classifiers under consideration do not generalize well when applied to unseen data
logistic regression performed best on average, closely followed by random forest
some classifiers cannot be trained in DataSHIELD because doing so would disclose individual-level data
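
Logistic regression is one of the models that can still be trained non-disclosively. Federated GLMs of the kind DataSHIELD provides are typically fitted iteratively: for the current coefficients, each site returns only its local score vector and information matrix, and the coordinator sums them and takes a Newton-Raphson step. The following is a minimal pure-Python sketch of that scheme for one predictor plus an intercept. The function names are hypothetical, it uses a fixed number of iterations rather than a convergence test, and it illustrates the technique rather than the DataSHIELD or VisualSHIELD code or the paper's experiments.

```python
import math
from typing import Sequence, Tuple

Site = Tuple[Sequence[float], Sequence[int]]  # (feature values x, binary labels y)


def site_score_and_information(
    site: Site, beta: Tuple[float, float]
) -> Tuple[Tuple[float, float], Tuple[float, float, float]]:
    """Local score (gradient) and Fisher information for logit(p) = b0 + b1 * x."""
    xs, ys = site
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)


def federated_logistic_regression(sites: Sequence[Site], iters: int = 25) -> Tuple[float, float]:
    """Newton-Raphson fit in which the coordinator sees only per-site sums."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for site in sites:
            (s0, s1), (h00, h01, h11) = site_score_and_information(site, (b0, b1))
            g0, g1 = g0 + s0, g1 + s1
            i00, i01, i11 = i00 + h00, i01 + h01, i11 + h11
        # Solve the 2x2 system [[i00, i01], [i01, i11]] @ delta = [g0, g1].
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1
```

Because scores and information matrices are sums over observations, in exact arithmetic this federated fit gives the same coefficients as a fit on the pooled data. Models that cannot be decomposed into such aggregates are the ones that may be unavailable in a non-disclosive setting.
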

Based on these results, we conclude that the narrower choice of models trainable in a privacy-preserving environment has an acceptably low impact on performance, ideally compensated by the wider choice of federated datasets researchers might have access to.

If you use or cite this software, please cite the following reference:

References

Tomasoni D, Lombardo R and Lauria M (2024) Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD. Front. Genet. 15. doi: 10.3389/fgene.2024.1270387
