[Remote] QA Engineer, AI Products

Remote, USA | Full-time | Posted 2026-06-26

Note: This is a remote position open to candidates in the USA. MDCalc is a leading medical reference tool used by clinicians worldwide, and it is seeking a QA Engineer to strengthen its AI product team. This role focuses on ensuring the quality and reliability of AI-powered features, particularly by testing LLM-based systems, while collaborating with cross-functional teams to define quality metrics and testing strategies.

Responsibilities

Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost) and establish thresholds for release readiness
Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to define what 'good' looks like for AI responses
Investigate and triage AI failure modes, distinguishing model issues, prompt issues, retrieval issues, and integration bugs
Participate in team discussions, offering feedback on testability, risks, prompt design, and guardrails
Help develop QA strategies to expand future testing capacity, automation, and evaluation coverage as the AI product surface grows

Skills

5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
Strong understanding of QA principles, test case creation/documentation, and best practices for both deterministic and non-deterministic systems
Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
Proficiency with test automation tools, with a focus on Playwright
Strong SQL skills for data validation, test data creation, and verifying data integrity across systems
Familiarity with token usage, latency profiling, and cost monitoring as quality signals
Eagerness to learn quickly and a positive, solutions-oriented attitude
Clear and concise communicator who can surface issues, blockers, and risks effectively, including when the failures are ambiguous or probabilistic
Self-motivated, proactive, and able to manage time and priorities independently

Benefits

Medical, dental, and vision coverage, with the option to extend to your dependents
Company-sponsored short-term insurance
Fully paid 8-week parental leave after 6 months of employment
Company-sponsored 401(k) after 3 months of employment
Unlimited vacation for salaried roles - we trust you to take the time you need
Bi-annual company offsites to connect, reflect, and plan together
Monthly work-from-home stipend
A culture of fun and motivated team members who believe in a greater mission here at MDCalc

Company Overview

MDCalc is used by over two-thirds of US physicians and provides free access to 800+ medical scores, calculations, and algorithms. It was founded in 2005 and is headquartered in New York, New York, USA, with a workforce of 11-50 employees. Its website is https://www.mdcalc.com.
