001 / Curated, processed, ready

EgoInfinity

A Web-Scale Data Engine for Video-to-Action Robot Learning through Egocentric Views

EgoInfinity is a data engine that automatically curates video clips captured from any viewpoint, processes them into high-quality 4D hand-object-interaction (HOI) sequences, and transforms those sequences into other views, including egocentric ones. EgoInfinity also provides a framework for retargeting the processed human motions to any robot.
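Transforming a processed 4D HOI sequence into another view amounts to re-expressing its 3D geometry in a new camera frame and projecting it to pixels. The sketch below is a minimal illustration of that geometric step only. It assumes each frame's hand joints and object vertices are already world-frame 3D points, and that the target view (for example, a virtual egocentric camera) is a pinhole camera with per-frame extrinsics (R, t) and fixed intrinsics (fx, fy, cx, cy). The function names are hypothetical; this is not EgoInfinity's implementation.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pixel = Tuple[float, float]


def world_to_camera(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Rigid world-to-camera transform: x_cam = R @ x_world + t."""
    return (
        R[0][0] * p[0] + R[0][1] * p[1] + R[0][2] * p[2] + t[0],
        R[1][0] * p[0] + R[1][1] * p[1] + R[1][2] * p[2] + t[1],
        R[2][0] * p[0] + R[2][1] * p[1] + R[2][2] * p[2] + t[2],
    )


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection to pixel coordinates; None for points at or behind the camera."""
    x, y, z = p_cam
    if z <= 1e-6:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def reproject_sequence(
    frames: Sequence[Sequence[Vec3]],
    extrinsics: Sequence[Tuple[Mat3, Vec3]],
    intrinsics: Tuple[float, float, float, float],
) -> List[List[Optional[Pixel]]]:
    """Project each frame's world-frame points into the target camera for that frame.

    Per-frame extrinsics allow a moving viewpoint, such as a head-mounted camera.
    """
    fx, fy, cx, cy = intrinsics
    out = []
    for pts, (R, t) in zip(frames, extrinsics):
        out.append([project(world_to_camera(p, R, t), fx, fy, cx, cy) for p in pts])
    return out
```

Points that land behind the new camera come back as None, so a caller can drop or mask them rather than draw them mirrored.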

Click any clip in the Browse section below to load it into the 3D viewer and inspect panels.


source: action100m → extract: pose · contact · sam3d_mesh → retarget: any embodiment
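Retargeting can take many forms; one of the simplest is direct joint-angle mapping, where each robot joint copies a scaled and offset human joint angle and is then clamped to its limits. The sketch assumes hand poses have already been converted to named per-joint angles in radians. The joint names, the JointMap correspondence, and the limits are hypothetical, and EgoInfinity's retargeting framework may use a different formulation (dexterous-hand retargeting is often posed as an optimization over fingertip positions instead).

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JointMap:
    """Hypothetical link from one robot joint back to a source human joint."""
    human_joint: str
    scale: float = 1.0
    offset: float = 0.0      # radians
    lower: float = -math.pi  # robot joint limits, radians
    upper: float = math.pi


def retarget_frame(human: Dict[str, float], mapping: Dict[str, JointMap]) -> Dict[str, float]:
    """Map one frame of human joint angles onto robot joints, clamped to joint limits."""
    robot = {}
    for robot_joint, m in mapping.items():
        q = m.scale * human[m.human_joint] + m.offset
        robot[robot_joint] = min(max(q, m.lower), m.upper)
    return robot


def retarget_sequence(frames: List[Dict[str, float]],
                      mapping: Dict[str, JointMap]) -> List[Dict[str, float]]:
    """Apply the same per-joint mapping to every frame of a processed sequence."""
    return [retarget_frame(f, mapping) for f in frames]
```

Supporting a new embodiment in this scheme only means writing a new mapping table; the per-frame loop stays the same.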
    

01 / Duration: 14.6+ years*
02 / Pipeline: modular and upgradable
03 / Scope: data engine for any video
04 / Enabling: retargeting to any robot

* Total duration, sourced from Action100M.

Dataset ↗ GitHub ↗ Project page ↗ arXiv ↗

002 / 3D Viewer · viser

Same scene as the pipeline.

Click any clip in the Browse section below to load its pre-recorded 3D scene here. The viewer is an embedded viser session with the same renderer, GUI controls, and layers as exo_pipeline.py --load-cache. Only one viser process runs at a time, so selecting a new episode replaces the current scene.

Default view: Egocentric
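The one-process-at-a-time behavior can be pictured as a single slot that stops the running viewer before launching the next one. This is a hypothetical sketch using Python's standard subprocess module, not the site's actual backend; the SingleViewerSlot class and its command factory are assumptions.

```python
import subprocess
from typing import Callable, List, Optional


class SingleViewerSlot:
    """Keep at most one viewer process alive; selecting a new episode stops the old one."""

    def __init__(self, make_cmd: Callable[[str], List[str]]):
        self.make_cmd = make_cmd  # hypothetical: episode id -> command line
        self.proc: Optional[subprocess.Popen] = None
        self.episode: Optional[str] = None

    def stop(self) -> None:
        """Terminate the current viewer, escalating to kill if it does not exit."""
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            try:
                self.proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()
        self.proc, self.episode = None, None

    def select(self, episode: str) -> subprocess.Popen:
        """Swap the slot to `episode`, reusing the process if it is already shown."""
        if self.proc is not None and self.episode == episode and self.proc.poll() is None:
            return self.proc
        self.stop()
        self.proc = subprocess.Popen(self.make_cmd(episode))
        self.episode = episode
        return self.proc
```

In practice make_cmd would build a command that runs exo_pipeline.py with --load-cache for the chosen episode; the exact arguments that flag expects are not described here and are left as an assumption.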


Selecting a clip below also shows its robot-embodiment retargets.

003 / Inspect

Episode detail.

Click any clip in the Browse section below to populate these panels: the annotated stream (hand skeleton), the raw clip, MoGe-2 depth, MEMFOF optical flow, and per-frame contact, grasp, motion, and trust signals from the 6-DoF tracker.
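The motion (px) signal is reported by the 6-DoF tracker, and its exact definition is not given here. As an assumed, illustrative proxy, the sketch below measures per-frame object motion as the mean optical-flow magnitude (for example, from MEMFOF) inside the tracked object mask. Nested lists stand in for image arrays, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Flow = Sequence[Sequence[Tuple[float, float]]]  # H x W grid of (dx, dy) in pixels
Mask = Sequence[Sequence[bool]]                 # H x W object mask


def masked_motion_px(flow: Flow, mask: Mask) -> float:
    """Mean flow magnitude (pixels per frame) inside the mask; 0.0 if the mask is empty."""
    total, count = 0.0, 0
    for flow_row, mask_row in zip(flow, mask):
        for (dx, dy), inside in zip(flow_row, mask_row):
            if inside:
                total += math.hypot(dx, dy)
                count += 1
    return total / count if count else 0.0


def motion_signal(flows: Sequence[Flow], masks: Sequence[Mask]) -> List[float]:
    """One motion value per frame, pairing each flow field with that frame's mask."""
    return [masked_motion_px(f, m) for f, m in zip(flows, masks)]
```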

Panels:
- raw · YouTube
- mask · sam-tracked · skeleton overlay
- depth · moge2
- flow · memfof
- signals (per-frame): contact_l · contact_r · grasp · motion (px) · trust (any)
- object state: static · grasp L · grasp R · both · moving
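The object-state legend can be read as a per-frame labeling rule over the contact and motion signals. This is a minimal sketch that assumes per-hand contact flags indicate a grasp and that a hypothetical pixel threshold separates moving from static objects; the priority order (contacts checked before motion) is also an assumption.

```python
from typing import List

MOTION_THRESHOLD_PX = 1.0  # hypothetical threshold, pixels per frame


def object_state(contact_l: bool, contact_r: bool, motion_px: float,
                 motion_threshold_px: float = MOTION_THRESHOLD_PX) -> str:
    """Label one frame with a state from the legend."""
    if contact_l and contact_r:
        return "both"
    if contact_l:
        return "grasp L"
    if contact_r:
        return "grasp R"
    if motion_px > motion_threshold_px:
        return "moving"
    return "static"


def object_states(contact_l: List[bool], contact_r: List[bool],
                  motion_px: List[float]) -> List[str]:
    """Label every frame of a clip from its per-frame signals."""
    return [object_state(l, r, m) for l, r, m in zip(contact_l, contact_r, motion_px)]
```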

Episode metadata: duration · frames · fps · video uid · start · end
Text panels: description · full summary · objects in scene · tracking summary

004 / Browse

Episode-level dataset preview.

Search, filter, and sort all favorited clips, and switch between thumbnail and table views. Click any clip to load it into the Inspect and 3D Viewer panels above.

Summary counters: filtered episodes (ep) · filtered duration (s) · SAM3D objects (obj) · share of episodes with grasp (%)

Filters: has grasp · has tracking · multi-object


Table view columns: Clip · Actor · Source · Objects · Frames · Duration · Grasp
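The filter chips and summary counters can be reproduced offline with a few lines of Python. This sketch assumes each episode record exposes its duration, SAM3D object count, and grasp/tracking flags, and that "multi-object" means more than one SAM3D object. The Episode fields are hypothetical and are not the dataset's schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Episode:
    clip: str
    duration_s: float
    n_objects: int      # SAM3D objects in the scene
    has_grasp: bool
    has_tracking: bool


def filter_episodes(episodes: Iterable[Episode], *, has_grasp: bool = False,
                    has_tracking: bool = False, multi_object: bool = False) -> List[Episode]:
    """Keep episodes satisfying every enabled chip; disabled chips are ignored."""
    return [
        e for e in episodes
        if (not has_grasp or e.has_grasp)
        and (not has_tracking or e.has_tracking)
        and (not multi_object or e.n_objects > 1)
    ]


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Counters matching the header cards: count, total seconds, objects, % with grasp."""
    n = len(episodes)
    return {
        "episodes": n,
        "duration_s": sum(e.duration_s for e in episodes),
        "sam3d_objects": sum(e.n_objects for e in episodes),
        "pct_with_grasp": 100.0 * sum(e.has_grasp for e in episodes) / n if n else 0.0,
    }
```

Sorting for the table view is then a plain sorted(...) call with a key such as lambda e: e.duration_s.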

005 / Statistics

Inside the preview corpus.

Statistics from Action100M annotations.

01 · word cloud. Action & task vocabulary at corpus scale: word cloud of Action100M task titles and action labels.

02 · category sunbursts. Top-verb co-occurrence rings: sunburst breakdowns of the four most common verbs (add / stir / demonstrate / place), with their objects on the inner ring and qualifiers on the outer ring.

03 · n-grams. Unigram / bigram / trigram frequency: a three-column chart of unigram, bigram, and trigram frequencies in the Action100M annotation text.
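Unigram, bigram, and trigram frequencies like those in the chart come from sliding an n-token window over each annotation and counting the windows. This sketch lowercases the text and uses a simple regex tokenizer; it is not necessarily how the published chart was produced, and any stop-word filtering used there is not reproduced.

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter, digit, or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(texts: Iterable[str], n: int) -> Counter:
    """Count every contiguous n-token window across all texts."""
    counts: Counter = Counter()
    for text in texts:
        toks = tokenize(text)
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts
```

For example, ngram_counts(labels, 2).most_common(20) gives the top bigrams for one column of such a chart.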

Source

Action100M annotations.

The vocabulary, sunburst rings, and n-gram frequencies above are computed from Action100M's human-written action labels and detailed descriptions. Each EgoInfinity clip stores its source action_brief, action_detailed, and summary fields verbatim in scene.json, under action100m_metadata.
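Because those fields travel with every clip, the annotation text can be gathered directly from a downloaded copy. This is a minimal sketch assuming scene.json is a JSON object whose action100m_metadata entry holds the three fields as strings, and that clips sit in folders somewhere under a common root; the directory layout and function names are assumptions.

```python
import json
from pathlib import Path
from typing import Dict, Iterator

FIELDS = ("action_brief", "action_detailed", "summary")


def load_action100m_text(scene_path) -> Dict[str, str]:
    """Return the Action100M text fields stored in one clip's scene.json."""
    with open(scene_path, encoding="utf-8") as f:
        scene = json.load(f)
    meta = scene.get("action100m_metadata", {})
    return {k: meta.get(k, "") for k in FIELDS}  # missing fields become ""


def iter_corpus_text(root) -> Iterator[Dict[str, str]]:
    """Yield the text fields of every scene.json found under `root`, in path order."""
    for path in sorted(Path(root).rglob("scene.json")):
        yield load_action100m_text(path)
```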

Please cite Action100M alongside EgoInfinity if you use these statistics:

@article{chen2026action100m,
  title  = {Action100M: A Large-scale Video Action Dataset},
  author = {Chen, Delong and Kasarla, Tejaswi and Bang, Yejin
            and Shukor, Mustafa and Chung, Willy and Yu, Jade
            and Bolourchi, Allen and Moutakanni, Théo
            and Fung, Pascale},
  journal = {arXiv preprint arXiv:2601.10592},
  year   = {2026}
}

006 / Roadmap

What we're working on next.

Public-facing milestones for the EgoInfinity dataset and pipeline. Status: shipped · active · queued.

Shipped
Preview release v1.0 · shipped
- 106-clip demo: a public HF Dataset of 106 curated Action100M clips with full 4D HOI scenes, browsable in this viewer Space.
- Embodiment retargeting: each clip ships with Shadow / dexterous-hand retargets so downstream policies can train on any-embodiment trajectories.
- Open-source pipeline code: full pipeline on GitHub; modular stages (filter / depth / hands / flow / segmentation / 3D mesh / tracking) swappable stage-by-stage.
Tags: HF dataset · viewer space · shadow retarget · pipeline code
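Stage-by-stage swappability can be pictured as an ordered registry of callables, where re-registering a stage name replaces just that stage (for instance, swapping the depth model). This is a hypothetical sketch, not the API of the EgoInfinity repository; the stage names simply mirror the list above, with "3D mesh" written as mesh3d, and the "rejected" key is an assumed convention for the filter stage.

```python
from typing import Any, Callable, Dict

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]

STAGE_ORDER = ["filter", "depth", "hands", "flow", "segmentation", "mesh3d", "tracking"]


class Pipeline:
    def __init__(self) -> None:
        self.stages: Dict[str, Stage] = {}

    def register(self, name: str, fn: Stage) -> None:
        """Install or replace one stage; other stages are untouched."""
        if name not in STAGE_ORDER:
            raise KeyError(f"unknown stage: {name}")
        self.stages[name] = fn

    def run(self, clip: Dict[str, Any]) -> Dict[str, Any]:
        """Run stages in fixed order, stopping early if a stage rejects the clip."""
        state = dict(clip)
        for name in STAGE_ORDER:
            if name not in self.stages:
                raise RuntimeError(f"stage not registered: {name}")
            state = self.stages[name](state)
            if state.get("rejected"):
                break
        return state
```

Keeping the order fixed while the implementations vary is what lets one module be upgraded without touching the rest.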
Now
Action100M-scale processing · active

Run the pipeline across the full Action100M corpus, growing the curated subset from 106 clips toward the 100M-clip headline number.
Tags: batch scale-up · v0.2 release
Future
Beyond v1.0 · queued
- Dynamic-camera support: adopt Depth Anything V3 for depth estimation and switch to V-SLAM-based scene reconstruction, unlocking all clips with moving cameras, including egocentric footage.
- Component refinement: iterate on individual modules (filter, curation, contact / grasp signals, hand smoothing) to lift quality and recall across the whole corpus.
Tags: depth-anything v3 · v-slam · ego-centric · filter · curation
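The roadmap suggests the current release is limited to clips without camera motion. One common heuristic for flagging moving-camera clips, assumed here rather than taken from EgoInfinity's filter stage, is to check whether background pixels outside the hand and object masks still carry significant optical flow. The thresholds and function names below are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Flow = Sequence[Sequence[Tuple[float, float]]]  # H x W grid of (dx, dy) in pixels
Mask = Sequence[Sequence[bool]]                 # True = hand/object (foreground)


def background_motion_px(flow: Flow, fg_mask: Mask) -> float:
    """Median flow magnitude over background pixels; 0.0 if none remain."""
    mags = [
        math.hypot(dx, dy)
        for flow_row, mask_row in zip(flow, fg_mask)
        for (dx, dy), is_fg in zip(flow_row, mask_row)
        if not is_fg
    ]
    return statistics.median(mags) if mags else 0.0


def is_static_camera(flows: Sequence[Flow], masks: Sequence[Mask],
                     threshold_px: float = 0.5, max_moving_fraction: float = 0.1) -> bool:
    """Static if only a small fraction of frames show background motion above threshold."""
    per_frame = [background_motion_px(f, m) for f, m in zip(flows, masks)]
    if not per_frame:
        return True
    moving = sum(v > threshold_px for v in per_frame)
    return moving / len(per_frame) <= max_moving_fraction
```

Using the median rather than the mean keeps a few mislabeled foreground pixels from tipping a static clip into the moving-camera bucket.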

007 / Waitlist

Get the next dataset drop before the public mirror.

We'll email you when there's an update. Unsubscribe anytime.

New EgoInfinity releases delivered directly to your inbox.

Request early access

We usually respond within a few business days.

Open the waitlist form →

008 / Team

Authors.

Built by the Rice RobotPI Lab together with collaborators at the Robotics and AI Institute.

Gaotian Wang · Rice University
Kejia Ren · Rice University
Howard Qian · Rice University
Andrew S. Morgan · Robotics and AI Institute
Yiting Chen · Rice University
Podshara Chanrungmaneekul · Rice University
Kaiyu Hang · Rice University
