
Real-world training data for frontier AI labs and robotics.
None of Hub's data gets collected until the equipment is in someone's hands.
Getting it there, across a country the size of Brazil, is the whole job.
Hub is a real-world data infrastructure company. We collect original data that does not exist online: the voice, motion, and first-person experience that AI and robots need to understand how people actually live. One of our fastest-growing categories is egocentric video, first-person footage people record through a head-mounted phone. Real people record it, in real homes, doing real tasks. That only works if they have the gear. Backed by Y Combinator. Clients are leading AI labs and Fortune 500 companies. Now scaling on the ground in São Paulo.
We are putting recording equipment, mainly head straps, into the hands of contributors across Brazil. That means importing hardware, clearing customs, managing suppliers, and shipping to people in cities all over the country. Right now nobody owns this. It is unglamorous, and it is the bottleneck the entire Brazil operation runs through.
Week 1. Map the operation as it stands. Where gear comes from, how it moves, where it breaks.
Weeks 2 to 3. Take ownership of suppliers and shipping. Fix the worst bottleneck first. Start sourcing better partners.
Week 4. A working pipeline. Equipment reaching contributors on a predictable timeline, costs you can report, and a plan to scale it.
Not for you if you want a desk, a fixed process, and someone handing you the steps.
For you if you like owning a real operation in the physical world and watching it get faster because of you. No hard feelings either way.
We use a short application form. Give it fifteen to twenty minutes. We read every answer. We want specific over generic, real stories over theory, and clear thinking in clean English. "Here is exactly what I did and what happened" beats "I am a hard worker" every time.
Hub is a real-world data infrastructure company. We capture humanity's lived richness (voice, language, nuance, environment, and physical motion) and turn it into the original, high-fidelity datasets that AI models and robotics need to keep learning.
Public web data is finite and increasingly contaminated by AI's own output. The next phase of artificial intelligence requires access to the vast footprint of human experience that was never digitized. Hub is the infrastructure pipeline that brings it in.
Incubated by Y Combinator (P26 batch) and headquartered in Palo Alto, CA, Hub operates a distributed global contributor network spanning over 150 countries and 100+ languages. We work directly with frontier AI labs, Fortune 500 enterprises, and top robotics companies to scope, collect, and deliver custom multimodal training data from scratch.
We are a fast-moving, deeply technical team of builders, data scientists, and ML engineers, working to solve the data ceiling problem for foundation models and embodied AI. If you want to build cutting-edge distributed infrastructure that connects global human capability to frontier AI development, we want to hear from you.