The future of robotics hinges on vast amounts of data, making its management a monumental challenge. As
robotics stands on the verge of its next great leap, the sheer volume and complexity of sensor data have
become the primary bottleneck, holding back innovation.
Traditional data architectures are simply not equipped for the unique demands of robotics, forcing brilliant minds
to solve data management problems instead of pioneering the next generation of algorithms. Mosaico is here to change
that.
We are building the foundational open-source data management suite for the robotics ecosystem.
Our Mission
Our roadmap is clear: to release a powerful, comprehensive suite of open-source software that empowers
roboticists to manage, orchestrate, and operate with data effortlessly. We believe developers should be free
to focus on what they do best, creating revolutionary algorithms, not on building and maintaining data
infrastructure.
We have spent over a decade at the forefront of the industry, developing algorithms for autonomous driving and
managing massive, petabyte-scale datasets. Through this experience at leading companies, we witnessed firsthand
the critical lack of robust, specialized data management tools in the robotics ecosystem.
Mosaico is the solution we wished we had. We are channeling our deep expertise in managing large-scale, high-frequency
sensor data into a suite of high-quality, open-source tools. We aim to share our experience with research institutions,
startups, and major corporations, establishing a new industry standard for robotics data and accelerating the pace
of innovation for everyone.
A New Paradigm
Current tools, often adapted from IT or machine learning, fall short because they are not built for the unique
nature of robotics development.
Beyond Dataset
Managing data in robotics is not about handling static datasets. Unlike traditional AI development, where the focus
is often on training models with discrete batches of data, advanced robotics deals with continuous, multi-sensor
time-series data streams where every piece of information is interconnected. The concept of a simple data batch
is insufficient for developers who need to capture, replay, and understand the dynamic state of a system over time.
Beyond Visualization
While visualizers can flag obvious malfunctions, they are not true debugging tools. Developers need to move beyond
observing symptoms to perform precise, data-oriented debugging to find the root cause of an issue. Mosaico enables
this through a robust data lineage system that creates a complete, auditable trail for every piece of data. The
system actively tracks all interdependencies and transformations, providing a granular view of data provenance
at two crucial levels: at the record level within a single dataset, and at the interconnection level between different
datasets. This ensures that every relationship and transformation from source to target is captured. Furthermore,
it introduces the Artifact, a versioned entity that goes beyond data to include every component influencing an
outcome, such as algorithm parameters, configurations, and model weights. This combination of deep lineage and
comprehensive artifact versioning ensures perfect reproducibility, allowing developers to pinpoint the exact source
of any change in performance.
Team
The Mosaico team is composed of scientists who have collaborated for over 10 years in developing advanced algorithms and software. Their collective experience includes senior and lead research roles, focusing on autonomous driving, robotics, and large-scale data management.