Research Engineer

Talentpluto
Published: May 17, 2026
Location: San Francisco, CA

Research Engineer: our view in three lines

  • The Role: An engineer who builds and scales QA systems and tooling to ensure training datasets are reliable for reinforcement learning model training and evaluation.
  • The Person: Someone who will design and enforce dataset quality standards, build validation and audit tooling (including model-assisted and human-in-the-loop workflows), and partner with suppliers to debug and improve dataset quality.
  • Requirements: Proficiency with Python, experience working in Linux environments, experience with Docker, and experience working with large-scale datasets.

Job Description

Location: San Francisco Bay Area
Work model: On-site (some team members are remote, but this role is currently on-site)
Industry: AI infrastructure / Reinforcement Learning (RL) training data & evaluations
Compensation: Competitive (range not provided) + benefits (medical/dental/vision coverage, meals, 401(k), commuter benefits, wellness perk)

About the Company (our partner)

Our partner is a fast-growing, venture-backed AI infrastructure company building the tooling and workflows that power reinforcement learning (RL) training data and evaluation for frontier AI agents. Their platform is used by advanced AI teams across large enterprises and high-growth startups, and they’re scaling quickly to meet strong customer demand. The team is small, highly technical, and execution-focused, with a culture that values ownership, speed, and craftsmanship.

The Opportunity

Our partner is hiring a Research Engineer to help scale the quality assurance (QA) systems behind training data generated through their infrastructure. This role sits at the intersection of data quality, tooling, and applied ML operations: you’ll build the standards, pipelines, and feedback loops that ensure datasets are reliable, consistent, and ready for training and evaluation.

You’ll work closely with internal stakeholders and external data suppliers to diagnose quality issues, improve workflows, and continuously fold QA learnings back into the platform. If you enjoy building systems that make high-quality data scalable—and want to do it in a high-ownership, fast-paced environment—this role is a strong fit.

Responsibilities

  • Define and enforce quality standards for datasets used in RL training and evaluation
  • Build tooling and workflows to audit supplier-generated datasets, including sampling strategies, validation pipelines (rule-based and model-assisted), and feedback loops
  • Evaluate and implement human-in-the-loop review workflows where they improve quality or efficiency
  • Partner with external data suppliers to debug quality issues, provide actionable feedback, and improve their data generation processes
  • Integrate QA learnings into internal tools and supplier portals to reduce anomalies, inconsistencies, and edge cases over time
  • Track QA outcomes and continuously improve processes, metrics, and documentation

Requirements

  • Proficiency with Python and experience working in Linux environments
  • Experience with Docker and reproducible development/deployment workflows
  • Experience working with large-scale datasets (validation, transformation, or analysis)
  • Strong problem-solving skills and evidence of rapid learning in technical environments
  • Ability to operate independently and deliver results in an early-stage, fast-moving setting
  • Clear written and verbal communication skills (including collaborating across time zones)

Nice to have

  • Experience building data validation pipelines and/or human-in-the-loop review systems
  • Familiarity with common training-data failure modes and techniques to detect subtle inconsistencies
  • Comfort designing QA metrics, experiments, and processes—not just executing predefined checks
  • Familiarity with modern AI tooling and LLM capabilities

Equal Opportunity & Accessibility

Our partner is an Equal Opportunity Employer and is committed to building an inclusive workplace. They consider all qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. Reasonable accommodations are available throughout the hiring process.

Key Skills
Data Validation Pipelines, Large-scale Dataset Processing, Human-in-the-loop Systems, QA Metrics and Experimentation, ML, LLM, Accessibility, Quality Assurance, Python, Docker, Linux