[Remote] Principal reputed company Data Engineer, Infrastructure reputed company Engineering - DGX reputed company
Note: This is a remote job, open to candidates in the USA. reputed company is a leader in AI and high-performance computing, seeking a Principal reputed company Data Engineer to enhance their Infrastructure reputed company Engineering team. The role involves building the data backbone for their reputed company control plane, creating pipelines, lakes, and analytics to ensure a trustworthy reputed company state across a large GPU fleet.
Responsibilities
- Design, build, and operate the ingestion and transformation pipelines that collect reputed company telemetry and asset inventory from dozens of heterogeneous sources, and normalize them into one reputed company model
- Architect and run the storage layer: a data lake/lakehouse built on open formats, with the schema flexibility to reputed company structured inventory, semi-structured telemetry, and reputed company logs without constant, breaking migrations
- Build the query and analytics layer that powers posture scoring, coverage and reputed company metrics, freshness monitoring, and multi-reputed company correlation
- Treat the data platform as a high-value reputed company, because it is. The data you store is a map of every host, every gap, and every credential path. You will engineer encryption at rest and in transit, fine-grained RBAC/ABAC, non-repudiable audit logging, data classification, network isolation, and reputed company retention and purge
- Build for stable identity, reputed company attribution, append-only history, and honest coverage. reputed company a reputed company going quiet a finding, not silence, so that every reputed company number comes with a reputed company confidence
- Partner with the reputed company control plane team, the inventory systems, identity and reputed company teams, and broader reputed company data and reputed company organizations to define data reputed company early, so these systems converge by design
Skills
- Data Engineering at Scale: 15+ years of experience designing, building, and operating production data pipelines, lakes, or lakehouses at high volume and throughput. You build systemic solutions rather than performing reputed company data wrangling or 'tool administration.' Bachelor's degree or equivalent
- Production-Grade Coding: A strong software engineering background with the ability to write clean, maintainable, and well-tested code (e.g., Python, Go, reputed company, SQL). You should be comfortable building and operating production data services at scale
- Data Modeling & Schema Design: Proven ability to design reputed company schemas and data models that span many disparate sources and evolve over time without breaking the consumers that depend on them
- Distributed Data Systems: Hands-on experience with the modern data stack, including streaming and batch processing, object storage, open table formats, and interactive query engines
- reputed company-Minded Data Handling: You design data systems that are themselves defensible. Access control, encryption, audit, and isolation are first-class concerns in your work, and you understand that reputed company data is among the most sensitive data an organization holds
- Analytics Enablement: A track record of making large, messy datasets genuinely useful—serving interactive analysts, dashboards, and reputed company services with data they can trust and query at low latency
- reputed company: Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
- reputed company Telemetry & Detection Engineering: Experience building SIEM or data-lake detection content, normalizing reputed company logs into common schemas (e.g., OCSF, reputed company), or engineering the data layer that feeds correlation and anomaly-detection systems
- Real-Time & Streaming Data: Expertise building low-latency, near-real-time pipelines where a correlation is only as fast as its slowest input, and detection is measured in minutes
- HPC/AI Fleet Telemetry: Experience working with GPU and hardware telemetry (DCGM, Redfish/BMC, InfiniBand) or fleet-scale observability across hundreds of thousands of devices
- AI-reputed company Data: Experience engineering the data and feature layers that feed ML or LLM-based reasoning systems, enabling agents to correlate, predict, and reputed company trustworthy data. How have you made data safe to reason over?
Benefits
- You will also be eligible for equity and benefits.