Publications

* Equal contribution. § Core contribution. † Equal advising.

2026

  1. MolmoAct2: Action Reasoning Models for Real-world Deployment
    arXiv, 2026
  2. WildDet3D: Scaling Promptable 3D Detection in the Wild
    arXiv, 2026
  3. MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
    arXiv, 2026
  4. VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
    arXiv, 2026
  5. TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
    arXiv, 2026
  6. RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
    In HiGen/VideoWorldModel workshops @ CVPR, 2026
  7. Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
    Oral (0.7% acceptance rate)

2024

  1. Apple Intelligence Foundation Language Models
    Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, and 145 more authors
    arXiv, 2024
  2. NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  3. GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  4. PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
    Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, and Shenlong Wang
    In European Conference on Computer Vision (ECCV), 2024

2023

  1. Occupancy Planes for Single-view RGB-D Human Reconstruction
    In AAAI Conference on Artificial Intelligence (AAAI), 2023
  2. StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
    arXiv, 2023

2022

  1. CASA: Category-agnostic Skeletal Animal Reconstruction
    In Neural Information Processing Systems (NeurIPS), 2022
  2. Total Variation Optimization Layers for Computer Vision
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
  3. Neural Volumetric Object Selection
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

2021

  1. Class-agnostic Reconstruction of Dynamic Objects from Videos
    Zhongzheng Ren, Xiaoming Zhao, and Alexander Schwing
    In Neural Information Processing Systems (NeurIPS), 2021
  2. Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning
    In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
  3. 3D Spatial Recognition without Spatially Labeled 3D
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

2020

  1. Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning
    Zhongzheng Ren, Raymond A. Yeh, and Alexander Schwing
    In Neural Information Processing Systems (NeurIPS), 2020
  2. UFO2: A Unified Framework towards Omni-supervised Object Detection
    In European Conference on Computer Vision (ECCV), 2020
  3. Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

2018

  1. Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery
    Zhongzheng Ren and Yong Jae Lee
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018
  2. Learning to Anonymize Faces for Privacy Preserving Action Detection
    Zhongzheng Ren, Yong Jae Lee, and Michael S. Ryoo
    In European Conference on Computer Vision (ECCV), 2018

2017

  1. Who moved my cheese? Automatic Annotation of Rodent Behaviors with Convolutional Neural Networks
    Zhongzheng Ren, Adriana Noronha, Annie Vogel Ciernia, and Yong Jae Lee
    In Winter Conference on Applications of Computer Vision (WACV), 2017