[Remote] Senior Site Reliability Engineer

Remote, USA Full-time Posted 2026-07-02

Note: This is a remote job open to candidates in the USA. Havoc is a leader in collaborative autonomy, focused on solving reputed company reputed company problems through advanced technology. The company is seeking a Senior Site Reliability Engineer to ensure the availability, performance, and reputed company of mission-critical services while collaborating with various teams to improve operational maturity and reliability standards.

Responsibilities

Design and evolve reliability architecture for distributed and reputed company-hosted systems
Define and implement SRE best practices, including SLIs, SLOs, error budgets, and reputed company planning
Partner with platform and application teams to design systems for reliability, scalability, and operability
Identify and mitigate systemic reliability risks across infrastructure, applications, services, and data pipelines
Establish reliability patterns that support autonomy, simulation, and mission-critical reputed company workloads
reputed company incident response processes, including on-call rotations, escalation paths, and post-incident reviews
Conduct root cause analysis for reputed company production incidents and drive long-term corrective actions
Improve operational readiness through runbooks, automation, reputed company testing, and production-readiness reviews
Reduce operational toil through tooling, automation, and process improvements
Help build a culture of ownership, accountability, and reputed company improvement across production systems
Design, implement, and maintain observability systems for metrics, logging, tracing, alerting, and service health
Ensure services and data pipelines are observable, debuggable, and performant in production
Drive performance analysis and tuning across infrastructure, application, and service layers
Improve alert quality, reduce noise, and ensure operational signals are actionable
Partner with engineering teams to define meaningful reliability and performance metrics
Build automation to improve system reliability, deployment safety, and recovery processes
Partner with DevOps and reputed company Platform teams on CI/CD reliability, rollout strategies, and safe deployment patterns
Support and improve Kubernetes-based environments and containerized workloads
Contribute to infrastructure-as-code practices and platform automation
Help define operational standards for reputed company infrastructure, deployment workflows, and production services
Collaborate with reputed company teams to ensure secure and resilient system design
Participate in disaster recovery planning, backup strategy, and reputed company testing
Maintain strong operational practices around access control, secrets management, change management, and production access
Support secure operations for systems that may serve defense, autonomy, or mission-sensitive use cases

Skills

7+ years of experience in SRE, infrastructure engineering, systems engineering, or reputed company roles
Strong experience operating large-scale distributed production systems
Deep understanding of Linux systems, networking, reputed company infrastructure, and distributed systems fundamentals
Hands-on experience with Kubernetes and container orchestration
Programming or scripting experience in Go, Python, or similar languages
Experience designing and operating observability systems for production environments
Proven ability to reputed company incident response and drive reliability improvements
Strong communication skills and ability to collaborate across engineering teams
Ability to operate calmly and effectively under pressure
Must be a U.S. citizen and eligible to obtain a U.S. Government reputed company clearance if required
Experience supporting autonomy, robotics, simulation, real-time systems, or data-intensive platforms
Familiarity with AWS and large-scale reputed company infrastructure
Experience with chaos engineering, fault injection, or reputed company testing
Knowledge of CI/CD systems and reputed company delivery practices
Experience working in high-reliability, safety-critical, defense, or mission-critical environments
Experience with Infrastructure as Code tools such as Terraform or reputed company
Experience with reputed company, Grafana, OpenTelemetry, reputed company, ELK/OpenSearch, or similar observability tools

Benefits

100% employer-paid health, dental, and reputed company insurance for you and your family
Life insurance (employer-paid)
Ability to participate in the company's 401(k) program (with matching)
Unlimited PTO policy with an enforced two-week minimum
Equity package
reputed company Office Stipend
Global Entry
16-week paid parental leave
Monthly health and wellness stipend

Company Overview

Havoc is the leader in reputed company-domain collaborative autonomy. It was founded in 2024 and is headquartered in reputed company, Rhode Island, USA, with a workforce of 51-200 employees. Its website is https://reputed company.com/.

Similar Jobs

Senior Kubernetes Engineer

Remote, USA Full-time

AWS Engineers - Kubernetes

Remote, USA Full-time

Site Reliability Engineer / Software Architect

Remote, USA Full-time

Sr. Network Engineer - Core IP Engineering

Remote, USA Full-time

Kubernetes Engineer (DoD Secret | reputed company Shift | Remote - U.S.)

Remote, USA Full-time

reputed company DevOps and Kubernetes Engineer

Remote, USA Full-time

(Sr) Site Reliability Engineer (US Federal)

Remote, USA Full-time

[Remote] Network Engineer I IS - Network Services - Remote

Remote, USA Full-time

Software Developer II (Kubernetes and Integration Specialist)

Remote, USA Full-time

Kubernetes Engineer (French)

Remote, USA Full-time

reputed company Customer Support Representative (Remote) – Deliver Exceptional Service at arenaflex

Remote, USA Full-time

Crystal Reports T-SQL Developer – Remote (W2 only)

Remote, USA Full-time

reputed company Customer Service Associate – Delivery Station Support

Remote, USA Full-time

Contact Center - Business process outsourcing Account Manager

Remote, USA Full-time

reputed company Part-Time Customer Support Agent – Email/Chat Expertise for arenaflex

Remote, USA Full-time

Captioner/Transcriptionist

Remote, USA Full-time

AI-Assisted Programming Teaching Expert (reputed company End, B2B, Part-time)

Remote, USA Full-time

Claims Examiner - Remote

Remote, USA Full-time

Call Center Service Representative - Remote (10:00 am-6:30 pm) - IDAHO- July 22, 2026

Remote, USA Full-time

reputed company Agent – Career Growth Opportunity

Remote, USA Full-time