Anurag Bagchi

Hello! I am a Research Associate III in the Robotics Institute at Carnegie Mellon University, where I work on Video Diffusion Models for Multi-Modal Robot Perception, under the supervision of Prof. Martial Hebert. I also collaborate closely with Dr. Pavel Tokmakov at Toyota Research Institute and Prof. Yuxiong Wang at UIUC.

Before this, I worked as a Computer Vision Engineer in Prof. Song Bai's team at ByteDance AI Lab Singapore, where I developed multi-modal (vision, language, and audio) models for TikTok's Brand Safety Policies. Prior to that, I was fortunate to be among the youngest Machine Learning Engineers to join the stellar Recommendation Team at TikTok R&D Singapore, where I built end-to-end ML systems for TikTok's Video & Push Recommendation. During this period, I also collaborated closely with Prof. Ravi Kiran and Prof. Makarand Tapaswi in the Computer Vision group at IIIT Hyderabad, working on Temporal Action Localisation in Videos.

I received my Bachelor's Degree in Electronics and Tele-Communication Engineering from Jadavpur University, India, where I worked under Prof. Amit Konar at the intersection of Computer Vision and Brain–Computer Interfaces (BCI). As an undergrad, I spent a wonderful summer in the Imaging R&D team at Samsung Research, leveraging ToF depth data for Samsung's camera applications.

Email | CV | Scholar | GitHub

Research

	Egocentric Action-Conditioned Video World Models. Anurag Bagchi, Zhipeng Bao, Homanga Bharadhwaj, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert. Submitted to CVPR 2026. Cool Results! | Paper (Coming Soon)
	ReferEverything: Towards Segmenting Everything We Can Speak of in Videos. Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert. ICCV 2025. Twitter Thread | Project Page | arXiv
	Video Diffusion Models Learn the Structure of the Dynamic World. Zhipeng Bao, Anurag Bagchi, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert. Paper
	Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization. Anurag Bagchi, Jazib Mahmood, Dolton Fernandes, Ravi Kiran Sarvadevabhatla. VISIGRAPP 2022 (Oral). arXiv | Code
	Autonomous grasping of 3-D objects by a vision-actuated robot arm using Brain–Computer Interface. Arnab Rakshit, Shraman Pramanick^, Anurag Bagchi^, Saugat Bhattacharyya. Biomedical Signal Processing and Control, Elsevier. Paper | ScienceDirect