Research Engineer, FAIR Media Data – Meta Superintelligence Labs

  • Meta
  • Menlo Park, CA
  • Full-Time
  • 2 weeks ago
  • $154,003/year to $217,000/year
Published: May 5, 2026
Location: Menlo Park, CA

Research Engineer, FAIR Media Data – Meta Superintelligence Labs: our view in 3 lines...

  • The Role: An AI research engineer role focused on building and curating large-scale multimodal data for Meta's foundational language and media models.
  • The Person: Someone who can design and build scalable data curation systems and tooling, execute pre-/mid-/post-training data projects for multimodal LLMs/LMMs, and lead complex technical projects end-to-end.
  • Requirements: Experience with multimodal pre-training or mid-training data curation and published research (ACL/NeurIPS/ICML/ICLR/AAAI/KDD/CVPR/ICCV). Familiarity with SQL, formats such as Hive, Iceberg, and Parquet, and Python, PyTorch, and Spark is preferred.

Job Description

Meta is seeking AI research engineers to help build the data foundation for Meta's large language and media models. We're looking for people with LLM/LMM expertise to join us in working with data at scale and pushing beyond the data ceiling. Our team contributes to data curation across all stages of LLM/LMM development (pre-training, mid-training, post-training) and all domains/modalities (image, video, audio, agent, media perception and generation). We are tackling complex challenges at trillion-scale, including organic data curation, synthetic data generation, agent and interaction data, and frontier paradigms in AI research. Based in Meta Superintelligence Labs (MSL) within the Fundamental AI Research Organization (FAIR), you'll contribute directly to Meta's frontier models and have the chance to collaborate with researchers and engineers across MSL.

Responsibilities

  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Improve data processing speed and throughput across workflows and projects by building and enhancing data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end

Minimum Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 1+ years of industry research experience in LLM/LMM, computer vision, or related AI/ML models
  • Experience owning and/or driving complex technical projects end-to-end
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
  • Demonstrated data infrastructure and software background, and experience building data tooling and services
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI

Preferred Qualifications

  • Familiarity with SQL and file formats such as Hive, Iceberg, Parquet, etc.
  • Master's degree or PhD in Computer Science or a related technical field
  • Experience working on frontier-quality/state-of-the-art Large Language or Large Media Models
  • Programming experience in Python and hands-on experience with frameworks like PyTorch or Spark, or related distributed computing frameworks (Ray, DataFlow)

$154,003/year to $217,000/year + bonus + equity + benefits

Key Skills
  • Apache Spark
  • Distributed Data Pipelines
  • Parquet / Iceberg / Hive File Formats
  • Data Curation For Pre-/mid-/post-training
  • Large Language / Media Model Data Scaling
  • ML
  • Hive
  • Dataflow
  • Computer Vision
  • LLM
  • Python
  • Spark
  • SQL
  • Pytorch
