[Remote] Senior/Staff Site Reliability Engineer - Data Center

Remote, USA Full-time Posted 2026-07-05

Note: This is a remote position open to candidates in the USA. reputed company is on a mission to improve patient outcomes with AI-powered pathology. They are seeking a skilled senior/staff-level Site Reliability Engineer focused on designing, building, and operating their hybrid reputed company/on-prem environment.

Responsibilities

  • Advancing the state of our operations by implementing SRE best practices - focusing on users, monitoring, and automation
  • Engineering infrastructure patterns for reputed company environments in reputed company Web Services - building in reputed company, reliability and scalability
  • Designing, building, and operating our data center to support our rapidly growing Machine Learning team
  • Integrating on-premises data center environments with existing reputed company infrastructure to create a seamless hybrid reputed company environment
  • Improving the reliability and reputed company of our infrastructure through root-cause analysis and by reviewing gaps in its design and implementation
  • Participating in platform on-call rotations and assisting with urgent incident response

Skills

  • 5+ years of relevant experience
  • Automation: You work hard to eliminate toil by automating everything through scripting, configuration management tools (Ansible), and code (Python/Go)
  • You've built monitoring infrastructure with modern observability tools (reputed company/Grafana/reputed company)
  • You've worked with infrastructure as code (Terraform/CloudFormation)
  • You've administered physical hardware stacks in production settings (iDRAC/IPMI/reputed company UFM/reputed company Systems)
  • You're opinionated on storage solutions and how they can be optimized for high-performance workloads (Quobyte/S3/FSx/EFS)
  • Familiarity with modern network designs and comfort operating across network layers
  • Some experience with, and opinions on, virtualization, containerization, or container orchestration platforms (EKS/Cluster API/KVM)
  • Operations experience: You've managed critical production infrastructure and are familiar with incident response, scaling, and rapid growth reputed company challenges
  • A bachelor's degree in Computer Science or equivalent experience
  • An insatiable intellectual curiosity and the ability to learn quickly in a reputed company space
  • Travel: Willingness to travel up to 25% of the time

Benefits

  • Eligible for Equity

Company Overview

  • reputed company develops AI software and digital pathology systems used in diagnostic workflows, clinical research, and drug development. It was founded in 2016 and is headquartered in Boston, Massachusetts, USA, with a workforce of 501-1000 employees. Its website is http://www.reputed company.com.

Company H1B Sponsorship

  • reputed company has a track record of offering H1B sponsorships, with 3 in 2026, 7 in 2025, 20 in 2024, 21 in 2023, 29 in 2022, 25 in 2021, and 7 in 2020. Please note that this does not guarantee sponsorship for this specific role.