{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibs2bwvwxx23g6vlyulipeyk3yaaeglkgx6bxiw7axm3psclm2ig4",
    "uri": "at://did:plc:4rgrdigiftglskeax4wvmsev/app.bsky.feed.post/3mekggpak4nx2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiflo6xt7is6b2iafwghkjahlgggocme5jwjsbeuqqwcywuvjhmszm"
    },
    "mimeType": "image/png",
    "size": 24783
  },
  "path": "/abs/2602.08961v1",
  "publishedAt": "2026-02-10T01:00:00.000Z",
  "site": "https://arxiv.org",
  "tags": [
    "Ruijie Zhu",
    "Jiahao Lu",
    "Wenbo Hu",
    "Xiaoguang Han",
    "Jianfei Cai",
    "Ying Shan",
    "Chuanxia Zheng"
  ],
  "textContent": "**Authors:** Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng\n\nWe introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, and a novel 4D VAE to effectively learn this representation. Unlike prior work that forces the 3D values and latents to align strictly with RGB VAE latents (despite their fundamentally different distributions), we show that such alignment is unnecessary and leads to suboptimal performance. Instead, we introduce a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments across multiple datasets demonstrate that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion reconstruction, respectively, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page",
  "title": "MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE"
}