
Real-world training data for frontier AI labs and robotics.
Hub is a real-world data infrastructure company. We capture the lived richness of human experience (voice, language, nuance, environment, and physical motion) and turn it into the original, high-fidelity datasets that AI models and robots need to keep learning.
Public web data is finite and increasingly contaminated by AI's own output. The next phase of artificial intelligence requires access to the vast footprint of human experience that was never digitized. Hub is the infrastructure pipeline that brings it in.
Incubated by Y Combinator (P26 batch) and headquartered in Palo Alto, CA, Hub operates a distributed global contributor network spanning more than 150 countries and 100 languages. We work directly with frontier AI labs, Fortune 500 enterprises, and leading robotics companies to scope, collect, and deliver custom multimodal training data from scratch.
We are a fast-moving, deeply technical team of builders, data scientists, and ML engineers working to solve the data ceiling problem for foundation models and embodied AI. If you want to build cutting-edge distributed infrastructure that connects global human capability to frontier AI development, we want to hear from you.