Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI
Three hundred fifty million Americans rely on a healthcare system whose decision-making has become slow, costly, and adversarial - care delayed by prior authorization and paperwork, claims that misfire, clinical decisions made without the right information at the right moment, and patients who struggle to navigate or afford the care they need.
Deloitte has a new AI-first effort, backed by $1B in committed investment, that is building the reasoning models and agentic systems to rebuild how that system decides - across payers, providers, and life sciences, and for the patients they serve - so that care is faster, fairer, and far less wasteful.
This is not AI applied at the margins.
It is a ground-up rebuild of the decision-making machinery behind American healthcare, at national scale.
The effort is resourced to do real post-training at scale, with committed investment in GPU compute and training infrastructure, not toy fine-tunes.
As a Research Engineer on our post-training team, you will design, train, evaluate, and align the models that reason about healthcare - working across the full post-training lifecycle to shape model behavior for clinical and operational decisioning across the industry.
Healthcare decisioning is one of the cleanest verifiable-reward domains outside math and code: the problems are hard, but their answers can be checked.
We ground that reward in real signals - clinical policy and criteria, adjudicated outcomes, and clinical-expert judgment - so correctness is checkable rather than asserted.
You will own the post-training stack for our clinical reasoning models end to end - from data and reward design through trained, evaluated models that ship.
This is not a prompt-engineering role.
We are looking for people who understand not just how to use LLMs, but how to improve and shape model behavior through advanced post-training.
You do not need a healthcare background.
We pair every engineer with clinical and domain experts and teach you the domain - you bring the modeling depth.
We hire on demonstrated depth, not years of experience: your level is set through our interview process, based on the depth and judgment you show rather than your time in a title.
Work you'll do
Post-training & alignment
• Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
• Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
• Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.
Reward modeling & data
• Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
• Curate, clean, synthesize, and evaluate large-sca...
- Rate: Not Specified
- Location: Gilbert, US-AZ
- Type: Permanent
- Industry: Management
- Recruiter: Deloitte
- Contact: Not Specified
- Reference: 355692
- Posted: 2026-06-20 08:07:54