Leveraging LLMs to transform complex, unstructured medical texts into a standardized, validated, and machine-readable format.
Input: Complex German-language medical text.
Processing: Translation, Extraction, and Quality Control.
Output: Clean, validated, and machine-readable data.
A quick, targeted LLM call determines if complex entities (e.g., nodules) are present.
Only if detected, a second, focused LLM call extracts detailed characteristics based on a specific schema.
Result: The detailed extraction runs only when the entity is actually present, which improves accuracy and reduces the risk of the model "hallucinating" findings the report does not contain.
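The detect-then-extract gating described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual code: `llm` stands for any function that sends a prompt and returns the model's text reply, and `NODULE_SCHEMA`, `detect_entity`, `extract_entity`, and `gated_extract` are hypothetical names with illustrative fields.

```python
import json
from typing import Callable, Optional

# Assumption: any function that takes a prompt and returns the model's reply.
LLMCall = Callable[[str], str]

# Hypothetical schema for one nodule; the field names are illustrative only.
NODULE_SCHEMA = {
    "type": "object",
    "properties": {
        "size_mm": {"type": ["number", "null"]},
        "location": {"type": ["string", "null"]},
    },
}


def detect_entity(llm: LLMCall, report: str, entity: str) -> bool:
    """Stage 1: a quick yes/no question about whether the entity is mentioned."""
    prompt = f"Does the report mention any {entity}? Answer YES or NO.\n\n{report}"
    return llm(prompt).strip().upper().startswith("YES")


def extract_entity(llm: LLMCall, report: str, entity: str, schema: dict) -> dict:
    """Stage 2: request JSON for the schema and keep only the fields it defines."""
    prompt = (
        f"Extract the {entity} details as JSON matching this schema. "
        "Use null for anything the report does not state.\n"
        f"{json.dumps(schema)}\n\n{report}"
    )
    raw = json.loads(llm(prompt))
    # Fields outside the schema are dropped; missing ones become None.
    return {key: raw.get(key) for key in schema["properties"]}


def gated_extract(
    llm: LLMCall, report: str, entity: str, schema: dict
) -> Optional[dict]:
    """Run the detailed extraction only if stage 1 detected the entity."""
    if not detect_entity(llm, report, entity):
        return None  # nothing detected, so no second call is made
    return extract_entity(llm, report, entity, schema)
```

Returning `None` when nothing is detected means the detailed prompt is never sent for reports without the entity, so the model is not asked to describe something that is absent.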
The system asks the LLM to identify and fill in data points missed in the initial pass.
It then asks the LLM to review all filled data and correct any inaccuracies it finds.
Result: An automated "peer review" that catches omissions and errors from the first pass, improving data quality and reliability.
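The two quality-control passes above (fill in what was missed, then review and correct) might look roughly like this in Python. It is a sketch under assumptions: `llm` is any prompt-to-text function, the prompt wording is invented for illustration, and `fill_missing`, `review`, and `quality_control` are hypothetical names rather than the system's own.

```python
import json
from typing import Callable

# Assumption: any function that takes a prompt and returns the model's reply.
LLMCall = Callable[[str], str]


def fill_missing(llm: LLMCall, report: str, record: dict) -> dict:
    """Pass 1: ask only for the fields that are still empty (None)."""
    missing = [key for key, value in record.items() if value is None]
    if not missing:
        return dict(record)
    prompt = (
        "Fill in the missing fields from the report. Reply with a JSON object.\n"
        f"Fields: {json.dumps(missing)}\n\nReport:\n{report}"
    )
    found = json.loads(llm(prompt))
    filled = dict(record)
    for key in missing:
        # Values the first pass already extracted are never overwritten here.
        if found.get(key) is not None:
            filled[key] = found[key]
    return filled


def review(llm: LLMCall, report: str, record: dict) -> dict:
    """Pass 2: ask for corrections and apply them to known fields only."""
    prompt = (
        "Review the extracted values against the report. Reply with a JSON "
        "object containing only the fields that need correcting.\n"
        f"Values: {json.dumps(record)}\n\nReport:\n{report}"
    )
    corrections = json.loads(llm(prompt))
    reviewed = dict(record)
    for key, value in corrections.items():
        if key in reviewed:  # ignore fields the record does not define
            reviewed[key] = value
    return reviewed


def quality_control(llm: LLMCall, report: str, record: dict) -> dict:
    """Run the fill-in pass first, then the review pass on its result."""
    return review(llm, report, fill_missing(llm, report, record))
```

Both passes return a new dictionary instead of mutating the input, so the first-pass record stays available for comparison or auditing.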
The entire extraction process is driven by JSON schemas. This makes the system highly configurable, extensible, and easy to adapt to new report formats or data types without changing the core code.
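One way to picture a schema-driven design is a registry of JSON schemas from which both the prompt and the validation are derived, so that supporting a new data type means adding a schema entry and nothing else. The sketch below is illustrative only: `SCHEMAS`, `build_prompt`, and `validate` are hypothetical names, the fields are invented, and the validator covers just a small subset of JSON Schema (`properties`, `required`, and basic `type` keywords). A full validator library could be swapped in.

```python
import json
from typing import List

# Hypothetical registry: one JSON schema per entity type.
SCHEMAS = {
    "nodule": {
        "type": "object",
        "properties": {
            "size_mm": {"type": "number"},
            "location": {"type": "string"},
        },
        "required": ["size_mm"],
    },
}

# Python types for the basic JSON Schema type keywords used in this sketch.
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def build_prompt(entity: str, schema: dict, report: str) -> str:
    """Derive the extraction prompt from the schema instead of hard-coding it."""
    fields = ", ".join(schema["properties"])
    return (
        f"Extract the {entity} fields ({fields}) as JSON matching this schema:\n"
        f"{json.dumps(schema)}\n\nReport:\n{report}"
    )


def matches_type(value, type_name) -> bool:
    expected = JSON_TYPES.get(type_name)
    if expected is None:
        return True  # type keyword not covered by this sketch: not checked
    if isinstance(value, bool) and type_name != "boolean":
        return False  # bool is a subclass of int in Python, so reject it here
    return isinstance(value, expected)


def validate(record: dict, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, value in record.items():
        if name not in properties:
            errors.append(f"unexpected field: {name}")
        elif not matches_type(value, properties[name].get("type")):
            expected = properties[name].get("type")
            errors.append(f"wrong type for {name}: expected {expected}")
    return errors
```

Adding, for example, `SCHEMAS["effusion"] = {...}` would make the same `build_prompt` and `validate` functions work for that entity without touching the code above.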
Unlocks valuable information trapped in text-based reports, enabling large-scale studies, predictive models, and a deeper understanding of disease patterns.
Making clinical data more FAIR: Findable, Accessible, Interoperable, and Reusable.