Clinical Data Scientist (Clinical Informatics Team)
About QuantHealth
QuantHealth is a growing AI startup in the clinical trial space. We combine AI, biomedical data, knowledge graphs, and real-world patient data to simulate and optimize clinical trials for pharmaceutical companies.
Our platform helps customers reduce development risk and cost, shorten timelines, and improve the probability of clinical trial success.
About the Role
As a Clinical Data Scientist in the Clinical Informatics Team (R&D), you will play a key role in building production-grade, AI-powered data systems and intelligent clinical data workflows that transform complex real-world clinical data into structured assets for QuantHealth’s clinical simulation platform.
This role sits at the intersection of data engineering, applied AI, and biomedical data systems. You will work closely with clinical data scientists, Clinical Informaticians, DataOps, and engineering teams to develop and productionize intelligent workflows for large-scale clinical data processing, information extraction, data harmonization, and AI-assisted clinical data curation.
The role focuses heavily on working with large-scale structured and unstructured healthcare data, including clinical notes, longitudinal patient records, biomedical literature, and real-world evidence datasets. You will design and maintain production-grade pipelines and AI workflows that balance extraction quality, scalability, operational reliability, and computational cost.
The ideal candidate combines strong data science and data engineering capabilities with hands-on experience developing AI-powered data workflows using modern Python-based tooling and LLM technologies. Healthcare or biomedical data experience is a strong advantage, but we also welcome technically strong candidates with demonstrated ability to learn complex domains quickly.
You will own AI-powered clinical data workflows end-to-end, from early prototyping through production deployment, monitoring, and iterative improvement.
Responsibilities
- Design, develop, and maintain scalable AI/data pipelines and intelligent clinical data workflows for processing large-scale structured and unstructured clinical data.
- Build production-grade LLM-based systems for clinical note extraction, phenotype extraction, entity normalization, literature mining, and biomedical information retrieval.
- Develop automated clinical data curation systems and structured extraction workflows that transform highly heterogeneous, incomplete, and longitudinal real-world healthcare data into high-quality analytical assets.
- Integrate and harmonize heterogeneous clinical and biomedical datasets from multiple structured and unstructured sources.
- Develop and maintain reusable structured data assets supporting QuantHealth’s clinical simulation and retrospective analysis platform.
- Design and implement longitudinal clinical data representations and feature engineering workflows, including handling missingness, temporal relationships, data harmonization, and imputation strategies.
- Develop agentic AI workflows and human-in-the-loop systems for clinical data interpretation, validation, and quality assurance.
- Design and implement evaluation frameworks for AI extraction quality, structured outputs, reliability, and workflow performance.
- Build internal applications and lightweight tooling for clinical review, validation, QA workflows, dashboards, and AI-assisted data exploration.
- Develop scalable ETL and data transformation pipelines using Python, Spark, Databricks, SQL, and cloud-based environments.
- Collaborate closely with Clinical Informaticians, CDS researchers, DataOps, and engineering teams to productionize AI workflows and data systems.
- Optimize workflows for scalability, robustness, computational efficiency, latency, and operational cost.
- Write modular, maintainable, and production-quality code, including testing, monitoring, and validation mechanisms.
- Contribute to internal standards and best practices for AI systems, data engineering, LLM workflows, and clinical data infrastructure.
- Stay current with emerging technologies and methodologies in LLMs, agentic systems, biomedical AI, healthcare NLP, and scalable AI infrastructure.
Qualifications
- MSc or BSc in computer science, data science, computational biology, biomedical engineering, or a related quantitative field.
- 3+ years of hands-on experience in applied AI, data engineering, data science, machine learning engineering, or large-scale data systems development.
- Strong Python programming skills and experience building modular, production-quality data pipelines.
- Practical experience with SQL, Spark, Databricks, and large-scale distributed data processing environments.
- Experience building and maintaining production-grade AI systems, LLM workflows, or large-scale data pipelines, including testing, validation, monitoring, and performance optimization.
- Experience working with LLM-based systems, structured extraction pipelines, retrieval workflows, or AI-assisted automation systems.
- Strong understanding of software engineering best practices, including code organization, testing, reproducibility, and maintainability.
- Experience working with APIs, cloud-based environments, and scalable data infrastructure.
- Strong problem-solving skills and the ability to work effectively in ambiguous, evolving environments.
- Excellent communication and collaboration skills in cross-functional technical organizations.
Strong Advantages
- Experience working with healthcare, biomedical, clinical, or real-world patient data.
- Experience developing healthcare NLP or clinical note extraction systems.
- Experience with agentic AI systems, retrieval-augmented generation (RAG), or structured output pipelines.
- Experience building internal AI applications, QA tooling, dashboards, or lightweight full-stack data products.
- Experience with PyTorch, MLflow, FastAPI, Streamlit, Snowflake, or other modern AI engineering frameworks.
- Familiarity with healthcare ontologies, vocabularies, and data models such as ICD, SNOMED, RxNorm, LOINC, or OMOP.
- Experience working with messy, heterogeneous, longitudinal real-world data (RWD), such as electronic health records (EHR).
- Experience with temporal modeling, missing data handling, imputation methodologies, or longitudinal patient-level feature engineering.
- Experience in healthcare analytics, HMOs, biotech, pharma, computational biology, or biomedical AI environments.
- Experience balancing AI workflow quality, scalability, latency, and operational cost in production environments.
