EGOINFINITY
001 / Curated, processed, ready

EgoInfinity

A Web-Scale Data Engine for Video-to-Action Robot Learning through Egocentric Views

EgoInfinity is a data engine that automatically curates video clips captured from any viewpoint, processes them into high-quality 4D hand-object-interaction (HOI) sequences, and transforms them into any other view, including egocentric. It also provides a framework for retargeting the processed human motions to any robot.

Click any clip in the Browse section below to load it into the 3D viewer and inspect panels.

source: action100m
extract: pose · contact · sam3d_mesh
retarget: any embodiment
01 / Duration: 14.6+ years*
02 / Pipeline: Modular & Upgradable
03 / Scope: Data Engine for Any Video
04 / Enabling: Retargeting to Any Robot
* total duration, sourced from Action100M
002 / 3D Viewer · viser

Same scene as the pipeline.

Click any clip in the Browse section below to load its 3D scene here. The viewer embeds a viser session with the same renderer, GUI controls, and layers as exo_pipeline.py --load-cache. Only one viser process runs at a time; selecting another episode replaces the scene currently loaded.

Default view: Egocentric
003 / Inspect

Episode detail.

Click any clip in the Browse section below to populate these panels. They show the annotated stream (hand skeleton), the raw clip, MoGe-2 depth, MEMFOF flow, and the per-frame contact / grasp / motion / trust signals from the 6-DoF tracker.

raw · YouTube
mask · sam-tracked · skeleton overlay
depth · moge2
flow · memfof
signals (per-frame): contact_l · contact_r · grasp · motion (px) · trust (any)
object state: static · grasp L · grasp R · both · moving
clip metadata: duration · frames · fps · video uid · start · end · description
objects in scene
tracking summary
004 / Browse

Episode-level dataset preview.

Search, filter, and sort all favorites. Switch between thumbnail and table view. Click any clip to load it into the inspect and viser panels above.

Summary counters: filtered episodes (ep) · filtered duration (s) · SAM3D objects (obj) · with grasp (%)
Table columns: Clip · Actor · Source · Objects · Frames · Duration · Grasp
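
The four summary counters shown for a filtered selection (episodes, duration, SAM3D objects, share with grasp) can be pictured with a small sketch. This is an illustration only, not the viewer's code: the Episode record and the summarize function are hypothetical names, and it assumes the Objects column is a per-clip object count and the Grasp column is a yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One row of the browse table (field names are illustrative)."""
    clip: str
    actor: str
    source: str
    objects: int       # assumed: number of SAM3D objects in the clip
    frames: int
    duration_s: float  # clip length in seconds
    grasp: bool        # assumed: True if any grasp was detected


def summarize(episodes):
    """Compute the four summary counters for a filtered list of episodes."""
    n = len(episodes)
    with_grasp = sum(1 for ep in episodes if ep.grasp)
    return {
        "episodes": n,
        "duration_s": sum(ep.duration_s for ep in episodes),
        "objects": sum(ep.objects for ep in episodes),
        # Guard against an empty filter result to avoid dividing by zero.
        "with_grasp_pct": 100.0 * with_grasp / n if n else 0.0,
    }
```

Filtering is then an ordinary list comprehension over Episode records, with summarize called on the result.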
005 / Statistics

Inside the preview corpus.

Statistics from Action100M annotations.

Word cloud of Action100M task titles and action labels
01 · word cloud

Action & task vocabulary at corpus scale.

Sunburst breakdowns of the four most-common verbs (add / stir / demonstrate / place) with their inner-ring objects and outer-ring qualifiers
02 · category sunbursts

Top-verb co-occurrence rings.

Three-column frequency chart: unigrams, bigrams, and trigrams in the Action100M annotation text
03 · n-grams

Unigram / bigram / trigram frequency.

Source

Action100M annotations.

The vocabulary, sunburst rings, and n-gram frequencies above are from Action100M's human-written action labels and detailed descriptions. Each EgoInfinity clip carries its source action_brief, action_detailed, and summary fields verbatim under action100m_metadata in scene.json.
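
As a rough illustration, the sketch below reads those three annotation fields from a clip's scene.json. It assumes action100m_metadata is a top-level key of the JSON object and that missing fields should simply come back empty; the function name is hypothetical and not part of the released pipeline.

```python
import json
from pathlib import Path

# Field names taken from the text above; their exact types are an assumption.
ANNOTATION_FIELDS = ("action_brief", "action_detailed", "summary")


def load_action100m_annotations(scene_path):
    """Return the Action100M annotation fields stored in one scene.json.

    Assumes `action100m_metadata` is a top-level key (hypothetical layout).
    Missing fields come back as None rather than raising.
    """
    scene = json.loads(Path(scene_path).read_text(encoding="utf-8"))
    metadata = scene.get("action100m_metadata") or {}
    return {field: metadata.get(field) for field in ANNOTATION_FIELDS}
```

Called on the path of a clip's scene.json, it returns a dict with exactly those three keys.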

Please cite Action100M alongside EgoInfinity if you use these statistics:

@article{chen2026action100m,
  title  = {Action100M: A Large-scale Video Action Dataset},
  author = {Chen, Delong and Kasarla, Tejaswi and Bang, Yejin
            and Shukor, Mustafa and Chung, Willy and Yu, Jade
            and Bolourchi, Allen and Moutakanni, Théo
            and Fung, Pascale},
  journal = {arXiv preprint arXiv:2601.10592},
  year   = {2026}
}
006 / Roadmap

What we're working on next.

Public-facing milestones for the EgoInfinity dataset and pipeline. Status: shipped · active · queued.

  1. Shipped

    Preview release v1.0 · shipped

    • 106-clip demo: public HF Dataset of 106 curated Action100M clips with full 4D HOI scenes; browsable via the viewer Space you're on.
    • Embodiment retargeting: each clip ships with Shadow / dexterous-hand retargets so downstream policies can train on any-embodiment trajectories.
    • Open-source pipeline code: full pipeline on GitHub; modular stages (filter / depth / hands / flow / segmentation / 3D mesh / tracking) swappable stage-by-stage.
    • HF dataset
    • viewer space
    • shadow retarget
    • pipeline code
  2. Now

    Action100M-scale processing · active

    Run the pipeline across the full Action100M corpus, growing the curated subset from 106 clips toward the 100M-clip headline number.

    • batch scale-up
    • v0.2 release
  3. Future

    Beyond v1.0 · queued

    • Dynamic-camera support: adopt Depth Anything V3 for depth and switch to a V-SLAM based scene reconstruction, unlocking all clips with moving cameras, including egocentric capture.
    • Component refinement: iterate on individual modules (filter, curation, contact / grasp signals, hand smoothing) to lift quality and recall across the whole corpus.
    • depth-anything v3
    • v-slam
    • egocentric
    • filter
    • curation
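
The "swappable stage-by-stage" design listed under Shipped could be organized roughly as follows. This is a sketch under stated assumptions, not the EgoInfinity implementation: the stage functions are stubs, the stage names only mirror the roadmap list, and make_stage and run_pipeline are hypothetical helpers.

```python
from typing import Callable, Dict

Stage = Callable[[dict], dict]


def make_stage(name: str, tag: str) -> Stage:
    """Build a placeholder stage that records which implementation ran."""
    def run(clip: dict) -> dict:
        out = dict(clip)   # copy so the caller's dict is not mutated
        out[name] = tag
        return out
    return run


# Stage order follows the roadmap list; the implementations are stubs.
pipeline: Dict[str, Stage] = {
    name: make_stage(name, "v1")
    for name in ("filter", "depth", "hands", "flow",
                 "segmentation", "mesh", "tracking")
}


def run_pipeline(stages: Dict[str, Stage], clip: dict) -> dict:
    """Apply every stage in insertion order."""
    for stage in stages.values():
        clip = stage(clip)
    return clip


# Swapping one stage leaves the others, and the stage order, untouched.
pipeline["depth"] = make_stage("depth", "v2")
result = run_pipeline(pipeline, {"uid": "example"})
```

Keeping stages in an ordered mapping means an upgrade such as a new depth model only replaces one entry, which is one simple way to get the modularity the roadmap describes.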
007 / Waitlist

Get the next dataset drop before the public mirror.

We'll email you when there's an update. Unsubscribe anytime.

Request early access

We usually respond within a few business days.

008 / Team

Authors.

Built by the Rice RobotPI Lab together with collaborators at the Robotics and AI Institute.

Gaotian Wang · Rice University
Kejia Ren · Rice University
Howard Qian · Rice University
Andrew S. Morgan · Robotics and AI Institute
Yiting Chen · Rice University
Podshara Chanrungmaneekul · Rice University
Kaiyu Hang · Rice University
release: v0.1 · favorites
license: CC BY-SA 4.0
source: action100m · curated
retargeting: planned
pipeline stages: moge2_depth · geocalib · wilor_hand · memfof_flow · sam3_track · sam3d_mesh