Catalog of Regulatory Science Tools to Help Assess New Medical Devices
This regulatory science tool presents a method (DRAGen) that identifies the types of errors that Artificial Intelligence (AI) / Machine Learning (ML) medical image classification algorithms might make when applied to new populations.
Technical Description
Decision Region Analysis for Generalizability (DRAGen) is a Python-based software tool designed to provide insight into an AI/ML model’s ability to generalize. Its inputs are imaging data, the corresponding subgroup and class information, and a trained binary classification AI/ML model in ONNX format. The tool creates virtual samples from the provided imaging data, and the AI/ML model’s inferences on these virtual samples are used to estimate the composition of the model’s decision space. This composition provides insight into the types of errors the model is likely to make when used on new populations. The outputs of the tool are summary plots and summarized decision region compositions stored in Hierarchical Data Format version 5 (HDF5) files.
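The following is a minimal NumPy sketch of the general idea described above: build virtual samples from real images, classify them, and report the fraction of the sampled space assigned to each class. It is not DRAGen's implementation. The triplet-plane interpolation scheme, the `toy_classifier` stand-in for ONNX inference, the 0.5 decision threshold, and every function name here are hypothetical choices made for illustration; the DRAGen repository documents the actual method.

```python
import numpy as np


def toy_classifier(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for ONNX inference: score = mean pixel intensity."""
    return batch.reshape(len(batch), -1).mean(axis=1)


def virtual_samples_from_triplet(a, b, c, steps=5):
    """Sample a grid of points on the plane spanned by images a, b and c.

    Assumption (illustrative only): each virtual sample is
    a + u * (b - a) + v * (c - a), with u and v on a regular grid in [0, 1].
    """
    grid = np.linspace(0.0, 1.0, steps)
    return np.stack([a + u * (b - a) + v * (c - a) for u in grid for v in grid])


def decision_region_composition(images, classifier, n_triplets=20, steps=5,
                                threshold=0.5, seed=0):
    """Estimate the share of the sampled decision space assigned to each class."""
    rng = np.random.default_rng(seed)
    n_positive, n_total = 0, 0
    for _ in range(n_triplets):
        # Draw three distinct real images and build virtual samples between them.
        a, b, c = images[rng.choice(len(images), size=3, replace=False)]
        scores = classifier(virtual_samples_from_triplet(a, b, c, steps))
        n_positive += int((scores >= threshold).sum())
        n_total += len(scores)
    positive = n_positive / n_total
    return {"positive": positive, "negative": 1.0 - positive}


# Usage: 30 random 8x8 "images" with intensities in [0, 1).
images = np.random.default_rng(1).random((30, 8, 8))
print(decision_region_composition(images, toy_classifier))
```

Sampling on a plane through three real images is one way to probe regions between data points that no real sample occupies; any other scheme for generating virtual samples could be substituted in this sketch.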
Additional information on the inputs and outputs, including example code snippets, can be found in the User Manual in the DRAGen GitHub repository [2].
Intended Purpose
DRAGen is intended to provide insight into the types of mistakes that an AI/ML model may be more likely to make (i.e., false positives versus false negatives) when used on samples not represented in the data used during development and evaluation. Using a user-provided imaging data set, DRAGen generates virtual samples to estimate the composition of the regions of decision space near that data set, providing insight into how the model may perform on samples that are similar to, but not represented in, the currently available data.
DRAGen is particularly applicable in situations with limited testing data, as it can provide insight into the types of errors that the model is likely to make on subgroups that are not well represented in the testing data [1].
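To look at subgroups separately, the predictions on virtual samples can be tallied by the subgroup and class of the real images they were generated from. The pure-Python sketch below assumes a hypothetical record format of `(subgroup, source_class, predicted_class)` per virtual sample; DRAGen's own input and HDF5 output formats are described in its User Manual.

```python
from collections import Counter, defaultdict


def composition_by_subgroup(records):
    """Per (subgroup, source class), the fraction of virtual samples in each predicted class.

    `records` is an iterable of hypothetical (subgroup, source_class, predicted_class)
    tuples, one per virtual sample.
    """
    counts = defaultdict(Counter)
    for subgroup, source_class, predicted in records:
        counts[(subgroup, source_class)][predicted] += 1
    return {
        key: {cls: n / sum(tally.values()) for cls, n in tally.items()}
        for key, tally in counts.items()
    }


# Hypothetical predictions on virtual samples from two subgroups.
records = [
    ("A", "negative", "negative"),
    ("A", "negative", "positive"),
    ("A", "negative", "positive"),
    ("A", "negative", "positive"),
    ("B", "positive", "positive"),
    ("B", "positive", "positive"),
]
print(composition_by_subgroup(records))
# {('A', 'negative'): {'negative': 0.25, 'positive': 0.75}, ('B', 'positive'): {'positive': 1.0}}
```

In this toy example, most virtual samples built from subgroup A's negative-class images fall on the positive side of the decision boundary, which under this reading would point toward false positives for samples resembling subgroup A.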
Testing
DRAGen was tested in a simulation study in which an AI/ML model was trained to classify disease status and demographic variables from clinical data [1, 3, 4]. The DRAGen analysis revealed a tendency for a model’s decision space to be dominated by a single “preferred” class for each classification task [5]. In Burgon et al. (2024) [1], the preferred class was shown to indicate which types of errors the classifier most often made when presented with patients on whom the model did not generalize well. The DRAGen GitHub repository [2] includes an example Jupyter notebook with an end-to-end case study.
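The “preferred” class idea can be expressed as a small helper: given the composition for one classification task, the preferred class is the one that occupies the largest share of the sampled decision space. Mapping a positive-dominated space to a false-positive tendency (and a negative-dominated one to a false-negative tendency) is an illustrative reading of the finding summarized above; the function names, the `positive_label` parameter, and the composition values are hypothetical.

```python
def preferred_class(composition):
    """Class with the largest share of the sampled decision space (first one wins on ties)."""
    return max(composition, key=composition.get)


def likely_error_type(composition, positive_label="positive"):
    """Illustrative reading for a binary task: if poorly generalized samples tend to be
    assigned to the preferred class, a positive-preferred model would mostly produce
    false positives and a negative-preferred model mostly false negatives."""
    if preferred_class(composition) == positive_label:
        return "false positive"
    return "false negative"


# Hypothetical composition for a disease-status task.
disease_task = {"positive": 0.82, "negative": 0.18}
print(preferred_class(disease_task))    # positive
print(likely_error_type(disease_task))  # false positive
```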
Limitations
The limitations of this tool include the following:
- The tool is designed only for binary image classification models.
- The tool requires input models to be saved in ONNX format. (A link to instructions on converting models to ONNX format is provided in the documentation.)
- It is currently unknown how many samples (per subgroup or overall) are needed to provide a reliable estimate of decision space composition.
- The tool has only been tested on Linux.
Supporting Documentation
Tool Website:
- Primary: https://github.com/DIDSR/DRAGen
References
[1] A. Burgon, N. Petrick, B. Sahiner, K. H. Cha and R. K. Samala, "Decision Region Analysis for Generalizability (DRAGen) of AI models: Estimating model generalizability in the case of cross-reactivity and population shift," Journal of Medical Imaging, vol. 11, no. 1, p. 014501, 2024. https://doi.org/10.1117/1.JMI.11.1.014501
[2] A. Burgon and R. Samala, "DRAGen," 2023. [Online]. Available: https://github.com/DIDSR/DRAGen
[3] A. Burgon, N. Petrick, B. Sahiner, G. Pennello, K. H. Cha and R. K. Samala, "A tool for the assessment of AI generalizability via decision space composition," Proc. SPIE 12927, Medical Imaging 2024: Computer-Aided Diagnosis, 129271H, 2024. https://doi.org/10.1117/12.3008580
[4] A. Burgon, N. Petrick, B. Sahiner, G. Pennello and R. K. Samala, "Predicting AI model behavior on unrepresented subgroups: A test-time approach to increase variability in a finite test set," FDA Science Forum, 2023. [Online]. Available: https://www.fda.gov/science-research/fda-science-forum/predicting-ai-model-behavior-unrepresented-subgroups-test-time-approach-increase-variability-finite
[5] A. Burgon, N. Petrick, B. Sahiner, G. Pennello and R. K. Samala, "Decision region analysis to deconstruct the subgroup influence on AI/ML predictions," Proc. SPIE 12465, Medical Imaging 2023: Computer-Aided Diagnosis, 124651H, 2023. https://doi.org/10.1117/12.2653963
Tool Reference
- RST Reference Number: RST24AI04.01
- Date of Publication: 09/19/2025
- Recommended Citation: U.S. Food and Drug Administration. (2025). DRAGen: Decision Region Analysis for Generalizability (RST24AI04.01). https://cdrh-rst.fda.gov/dragen-decision-region-analysis-generalizability