TechCrunch AI
June 17, 2026
General AI

Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it.

Overview

If physical AI is going to match the accomplishments of LLMs, there's a data problem that needs to be solved. Two weeks ago, OpenAI said it would relaunch the robotics program it shuttered in 2021 - the latest signal that the biggest AI labs are racing to teach machines to operate in the physical world. But building capable robots requires something the AI industry doesn't yet have: training data on the scale of what was used to train language models.

Key Takeaways

  • That gap is creating a new kind of infrastructure business.

    Unlike LLMs that were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists.

  • Co-founder and CEO Philipp Wu says XDOF, which has about 60 employees, is already working with 20 customers, including several frontier AI labs, but cannot name them.

    "All of the top labs are trying to pursue robotics," Wu said.

  • "We didn't have large-scale data to work with," he told TechCrunch.

    "There was this chicken-and-egg problem - we first needed to actually collect data before we could even ask how to train a foundation model for robotics."

  • Mindful that data provision alone can be a dead-end business, the company is also focused on data cleaning, tooling, and annotation - creating a self-reinforcing feedback loop for robot trainers.

    As a starting point, the company is partnering with UC Berkeley's AI Research lab to release what it believes is the largest collection of high-quality robot training data ever assembled, dubbed ABC.

  • The company plans to work across three tiers of a data pyramid.

Stats & Key Facts

  • The startup aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can't easily build themselves - and has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it.
  • Spotting the opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 to provide a data ecosystem for companies pursuing robotics models.

That gap is creating a new kind of infrastructure business. Unlike LLMs that were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the physical world.

XDOF (pronounced "ecks-doff"), emerging from stealth today, is betting that the next great bottleneck in AI isn't models or chips, but the data feedback loop needed to teach robots how to interact with the physical world. The startup aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can't easily build themselves - and has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it. Co-founder and CEO Philipp Wu says XDOF, which has about 60 employees, is already working with 20 customers, including several frontier AI labs, but cannot name them.

"All of the top labs are trying to pursue robotics," Wu said. "We've already seen some of the downfalls of falling a little bit behind in the language model race ... you don't want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier."

For more details, please read the original article at TechCrunch AI.

Originally published by TechCrunch AI