- Apple
- Cupertino, CA
- Full-Time
- Today
Software Engineer, Agentic Evaluation: our view in 3 lines...
- The Role: Build end-to-end software for Siri and AI-powered experiences, working with engineers, designers, and researchers to evaluate and measure product quality.
- The Person: An engineer who takes software from prototype to production on iOS/macOS devices and helps define and implement the systems that evaluate and measure quality.
- Requirements: 3+ years of software engineering experience, proficiency in Swift, Objective-C, Python, or another modern language, and expert use of generative AI coding tools; scripting languages such as Ruby or Bash are preferred.
Job Description
We're a team at Apple building software that helps shape the next generation of Siri and AI-powered experiences. The work spans frameworks, tooling, and infrastructure — including a strong focus on how we evaluate and measure the quality of what we ship. We can't say much about specifics, but the problems are new, the surface area is large, and the reach is enormous. We're a collaborative, humble, and curious group that learns from each other and builds together.
Description
You'll work alongside engineers, designers, and researchers to design and build software end-to-end — from early prototypes to production systems running on real devices. You'll have meaningful autonomy in how you get there, and the opportunity to shape both what we build and how we know it's working. The work is hard enough to stretch you, and the team is generous enough to support you while you grow.
Minimum Qualifications
3+ years of software engineering experience with strong CS fundamentals
Proficiency in Swift, Objective-C, Python, or another modern language — strong engineers in adjacent stacks will pick up the rest
You've shipped software that people used, and you're ready to own bigger pieces end-to-end
Expert in using generative AI models for coding — you've integrated tools like Claude, Cursor, or Codex deeply into how you work, and have a point of view on where they help and where they don't
An interest in software evaluation and quality — you care about whether what you build actually works, and want to be on a team that takes measurement seriously
Comfortable with ambiguity; when you're stuck, you dig in
Strong communication and a track record of working well across teams
BS in Computer Science or equivalent experience
Preferred Qualifications
Experience in one or more iOS/macOS domains: system services, UI frameworks, concurrent application architecture, or performance
Background building developer tools, test infrastructure, evaluation systems, or data pipelines
Familiarity with how AI systems are evaluated: offline eval, human eval, A/B testing, or model-graded approaches
Proficiency with one or more scripting languages (Python, Ruby, Bash)
You seek out feedback and learn fast from those around you
You stay close to the frontier: you're curious about new models and techniques, and you have a point of view on where human-AI interaction is headed
