Software Engineer, Agentic Evaluation

  • Apple
  • Cupertino, CA
  • Full-Time
  • Today
Published
May 21, 2026
Location
Cupertino, CA

Software Engineer, Agentic Evaluation: our view in 3 lines...

  • The Role: Build end-to-end software for Siri and AI-powered experiences, working with engineers, designers, and researchers to evaluate and measure product quality.
  • The Person: Design and build software from prototypes to production on iOS/macOS devices and help define and implement evaluation and measurement systems for quality.
  • Requirements: 3+ years of software engineering experience; proficiency in Swift, Objective-C, Python, or another modern language; and expert use of generative AI coding tools. Proficiency with a scripting language such as Python, Ruby, or Bash is preferred.

Job Description

We're a team at Apple building software that helps shape the next generation of Siri and AI-powered experiences. The work spans frameworks, tooling, and infrastructure — including a strong focus on how we evaluate and measure the quality of what we ship. We can't say much about specifics, but the problems are new, the surface area is large, and the reach is enormous. We're a collaborative, humble, and curious group that learns from each other and builds together.

Description

You'll work alongside engineers, designers, and researchers to design and build software end-to-end — from early prototypes to production systems running on real devices. You'll have meaningful autonomy in how you get there, and the opportunity to shape both what we build and how we know it's working. The work is hard enough to stretch you, and the team is generous enough to support you while you grow.

Minimum Qualifications

3+ years of software engineering experience with strong CS fundamentals
Proficiency in Swift, Objective-C, Python, or another modern language — strong engineers in adjacent stacks will pick up the rest
You've shipped software that people used, and you're ready to own bigger pieces end-to-end
Expert in using generative AI models for coding — you've integrated tools like Claude, Cursor, or Codex deeply into how you work, and have a point of view on where they help and where they don't
An interest in software evaluation and quality — you care about whether what you build actually works, and want to be on a team that takes measurement seriously
Comfortable with ambiguity; when you're stuck, you dig in
Strong communication and a track record of working well across teams
BS in Computer Science or equivalent experience

Preferred Qualifications

Experience in one or more iOS/macOS domains: system services, UI frameworks, concurrent application architecture, or performance
Background building developer tools, test infrastructure, evaluation systems, or data pipelines
Familiarity with how AI systems are evaluated — offline eval, human eval, A/B, or model-graded approaches
Proficiency with one or more scripting languages (Python, Ruby, Bash)
You seek out feedback and learn fast from those around you
Close to the frontier — curious about new models and techniques, and have a point of view on where human-AI interaction is headed

Key Skills
iOS/macOS, System Services, UI Frameworks, Concurrent Application Architecture, Test Infrastructure, Evaluation Systems, Bash, Ruby, REST, UI, C, Swift, Objective-C, Python, iOS
