Senior Site Reliability Engineer
Essential Functions:
* Partner with software developers, platform engineers, and IT staff to improve system design, operability, deployment safety, and production support readiness.
* Define and maintain operational standards, runbooks, support procedures, escalation paths, and service-level objectives.
* Evaluate system architecture and changes to ensure they balance functional requirements, service quality, reliability, security, and compliance needs.
* Drive continuous improvement in platform stability, maintenance, and availability.
* Provide advanced technical support and troubleshooting for complex platform and service issues affecting internal users and stakeholders.
Experience and Skills Required:
* 8+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, or related infrastructure roles supporting production services.
* Strong experience with Linux systems administration and troubleshooting in enterprise environments.
* Strong experience operating and maintaining on-prem Kubernetes platforms and all related components including CRI, CNI, and CSI plugins.
* Experience deploying and maintaining applications on Kubernetes using Helm, Kustomize, and similar tooling.
* Experience supporting DevOps tooling such as GitLab, Artifactory, Jira, and Confluence.
* Experience with GitOps tools such as FluxCD or ArgoCD.
* Proficiency scripting with at least one of Python, Go, or Bash.
* Strong experience designing, maintaining, and maturing observability tooling including monitoring, dashboards, logging and tracing, and supporting SLOs.
* Strong understanding of reliability engineering concepts:
+ Service health indicators
+ High availability design, failure reduction, and testing
+ Operational readiness practices, including developing documentation, runbooks, and architectural descriptions
+ Incident response, root cause analysis, remediation/recovery
* Ability to obtain a security clearance, which requires U.S. citizenship.
Preferred:
* Experience with multiple Linux distributions including Ubuntu.
* Experience with at least one of the following: Tanzu Kubernetes, Nutanix Kubernetes Platform, Canonical Kubernetes.
* Experience with cloud platforms such as AWS and Azure.
* Experience with infrastructure automation and configuration management.
* Experience managing AI tooling on Kubernetes, including MCP servers, LLM platforms (vLLM, Ollama), and Kubeflow.
* Experience with security and compliance considerations in regulated environments.
* DoD experience.
* Active or inactive Secret Security Clearance.
Education:
* Bachelor’s degree in Computer Science, Software Engineering, or another IT-related field, or equivalent experience.
REMOTE WORK NOTICE: This position may be performed fully remote, hybrid, or onsite at an ARA office.
Preference will be given to c...
- Rate: Not Specified
- Location: Albuquerque, US-NM
- Type: Permanent
- Industry: Management
- Recruiter: Applied Research Associates, Inc
- Contact: Nina Uka
- Reference: SENIO009685-00001
- Posted: 2026-04-04 07:53:32