Sr Site Reliability Engineer
Business Unit:
STChealth is a company focused on vaccine intelligence and immunization data management — it connects public and private healthcare sources to deliver real-time immunization information.
Their platform is used by thousands of locations, and they emphasize data integrity, real-time analytics, and enabling better decision-making in public health.
Headquarters: Phoenix, Arizona (US).
Job Summary:
The Site Reliability Engineer (SRE) supports a U.S. public health SaaS platform processing protected health information (PHI) under HIPAA.
The role emphasizes automation, monitoring, and reliability engineering for regulated environments.
The SRE will partner closely with U.S.-based teams to enhance observability, CI/CD automation, and operational maturity in non-production and staging systems—maintaining compliance with HIPAA, SOC2, and corporate data protection standards.
Core Responsibilities
- Automate infrastructure provisioning, configuration, and maintenance using Terraform, Ansible, and Python.
- Build, enhance, and maintain CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline for continuous delivery and consistency across environments.
- Implement and optimize monitoring solutions using Datadog, Prometheus, Grafana, and ELK/EFK stacks to ensure high service reliability.
- Develop alerting strategies and escalation paths aligned to service-level objectives (SLOs) and key performance indicators (KPIs).
- Build custom scripts and automation for patching, validation, and system health checks.
- Partner with U.S. SREs and Engineering teams on environment management, change control, and incident response improvements.
- Analyze logs and performance metrics to identify stability issues, optimize cloud costs, and drive continuous improvement.
- Maintain detailed runbooks, SOPs, and documentation supporting operational readiness and knowledge transfer.
- Contribute to open-source or internal tooling that enhances automation, monitoring, or observability capabilities.
- Conduct periodic reliability reviews, performance tests, and failover simulations to validate readiness.
- Support adoption of infrastructure-as-code, immutable environments, and container orchestration (Docker/Kubernetes).
- Promote DevOps and SRE best practices across the engineering organization.
Tools & Technologies
AWS (EC2, S3, Lambda, CloudWatch, IAM, RDS, ECS/EKS), Terraform, Ansible, Python, Bash, Jenkins, GitHub Actions, Docker, Kubernetes, Prometheus, Grafana, ELK/EFK, Loki, Jira, Confluence.
Qualifications
- 5–7 years in SRE, DevOps, or Infrastructure Engineering.
- Bachelor’s degree in computer science or a related field preferred, or equivalent experience.
- Experience supporting U.S. healthcare or other regulated SaaS systems (HIPAA, SOC2, ISO27001).
- Strong scripting and automation skills (Ansible, Jenkins, Python, Bash, Terraform, CloudFormation).
- Understanding of CI/CD, networking, and secure cloud architecture.
- Prove...
- Rate: 92268
- Location: Mumbai, IN-MH
- Type: Permanent
- Industry: IT
- Recruiter: Bizmatics India Private Limited
- Contact: Not Specified
- Reference: R0032625
- Posted: 2026-03-23 07:24:41