- One Robot
- San Francisco, CA
- Full-Time
- 1 week ago
- $150k - $275k
Founding Machine Learning – Eval Layer: our view in three lines
- The Role: Build and own evaluation and world-modeling work that validates robot manipulation policies and identifies policy failure modes before deployment.
- The Person: Design, train, and improve evaluation models and confidence layers, enhance model grounding for physical scenes, and build a self-improving data engine for the eval layer.
- Requirements: Very strong coding skills in Python and PyTorch, a track record of training VLMs or LLMs, and experience developing and shipping evals for VLMs or LLMs.
Job Description
One Robot builds task-specific world models and an evaluation platform for robot manipulation policies.
Training end-to-end policies for robots is vibes-based today. Teams collect data, train, deploy on a real robot, find out what fails, collect more data, and retry. We replace this trial and error with rigorous validation that tells you where your policy will fail and what data to collect to fix it.
Robotics can't industrialize without an evaluation layer. We're building it.
We're solving challenging technical problems around long-horizon autoregressive generation, world model controllability, and closing the sim-to-real gap. We work with real customer data, real failures, and real deployment pressure.
We're based in San Francisco, backed by Accel, YC, several exited founders, and engineering leaders at leading AI companies.
We're small and deliberately so. Everyone is an IC with deep ownership of a wide surface area. The culture is fast iteration and direct responsibility.
Hemanth Sarabu and Elton Shon co-founded One Robot after leading robot learning together at Industrial Next (YC W22), bringing experience from Google, NASA JPL, and Tesla.
We're building the evaluation layer to understand policy failure modes before they hit production. You'll own modeling work that makes the eval trustworthy.
What you'll do:
- Train evaluation models: Develop VLMs that classify and verify policy behavior.
- Build confidence layers: Convert model outputs into trustworthy signals the customer can act on.
- Improve model grounding: Make the eval models reason accurately about physical and spatial scenes.
- Build a self-improving eval layer: Develop a data engine that makes the eval models sharper with each customer deployment and correction.
Requirements:
- Very strong coding skills in Python and PyTorch.
- VLM/LLM training: Track record of training VLMs or LLMs.
- Evals experience: Have developed and shipped evals for VLMs or LLMs.
