Ashutosh Chaubey

About Me

I am a CS PhD student at the Institute for Creative Technologies, University of Southern California, where I am advised by Prof. Mohammad Soleymani at the Intelligent Human Perception Lab. I am a Bronze Medallist from the 2021 batch of Indian Institute of Technology, Roorkee.

Prior to this, I was a Founding Research Engineer at Anoki AI, where I worked on multimodal content understanding and retrieval. I have also worked at LG Ad Solutions on speaker recognition, audio-based automatic content recognition, and voice cloning. In the past, I interned at Adobe Research, where I worked with Dr. Sumit Shekhar on active learning for content labelling in documents. I also interned at the Video Analytics Lab, IISc Bengaluru, where I worked with Prof. R. Venkatesh Babu on human pose estimation from a single RGB image. Back at my undergraduate college, I worked with Prof. R. Balasubramanian on automatic evaluation of machine synthesized speech.

I am always eager to collaborate on research with people in academia as well as industry. Please reach out to achaubey@usc.edu to discuss potential collaborations.

Masters/Undergrad Students

If you are a student and would like to discuss my papers or how to apply for a PhD program in the US, please email me at achaubey at usc dot edu.

For students who wish to join our lab, please check our lab's open positions.

Email Resume Google Scholar LinkedIn GitHub

Areas of Interest

Multimodal understanding and generation of human behaviour
Speech and audio processing

News

Oct '24 - One paper accepted at WACV 2025! Congrats to my previous team at Anoki AI!
Aug '24 - I joined the Intelligent Human Perception Lab @ Institute for Creative Technologies, USC.
Apr '24 - I will be starting my PhD @ University of Southern California this Fall. Fight on!
Sep '23 - One paper has been accepted at ASRU 2023. See you in Taipei!
Apr '23 - I have joined Anoki AI as a Founding Research Engineer.
Sep '22 - I will be at Interspeech 2022 in Incheon, Korea.
Jun '22 - One paper has been accepted at Interspeech 2022!
Jul '21 - I have started my industry career by joining LG Ad Solutions as a Data Scientist. On to new challenges!

Research & Publications

Preprints

Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning

Ashutosh Chaubey, Xulang Guan, Mohammad Soleymani

Under review

Preprint / Project Page

Proposed a novel MLLM with a visual encoder to encode rich face region features based on face landmarks. Collected the FaceInstruct-1M dataset with over one million samples for instruction tuning an MLLM.

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov*, Di Chang*, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani

Under review

Preprint / Project Page

Proposed a diffusion transformer (DiT) with causal multimodal attention to generate listener behaviors from audio-visual signals in speaker videos.

Published Work

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising

Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agrawal, Susmita Ghose

IEEE/CVF WACV, 2025

Paper / Poster / Cite

Proposed ContextIQ, a novel framework for video retrieval for contextual advertising using a mixture of multimodal experts.

Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

IEEE ASRU, 2023

Paper / Poster / Cite

Proposed two novel approaches for imposter identification in unseen speaker recognition, including speaker-specific thresholding and a meta-learning approach.

Improved Relation Networks for End-to-End Speaker Verification and Identification

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

Interspeech, 2022

Paper / Poster / Cite

Enhanced speaker recognition using relation networks inspired by computer vision, with global supervision and faster training.

OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis

Sumit Shekhar, Bhanu Prakash Reddy Guda, Ashutosh Chaubey, Ishan Jindal, Avneet Jain

CVPR Workshops, 2022

Paper / Patent / Cite

A reinforcement policy-based active learning approach for document content labeling tasks such as object detection and named entity recognition.

Universal Adversarial Perturbations: A Survey

Ashutosh Chaubey*, Nikhil Agrawal*, Kavya Barnwal, Keerat K. Guliani, Pramod Mehta

Survey paper, arXiv 2020

Paper / Cite

A comprehensive survey on universal adversarial perturbations, covering both attacks and defenses.

A Generative Adversarial Network Based Ensemble Technique for Automatic Evaluation of Machine Synthesized Speech

Ashutosh Chaubey*, Jaynil Jaiswal*, Sasi Kiran Reddy Bhimvarapu, Shashank Kashyap, Puneet Kumar, Balasubramanian Raman, Partha Pratim Roy

ACPR, 2019

Paper / Cite

Proposed a technique leveraging the discriminator from a GAN-based TTS model for automatic evaluation of machine synthesized speech.

Education

University of Southern California – PhD, Computer Science (2024 - Present)

Graduate Researcher – Intelligent Human Perception Lab, Institute for Creative Technologies

Indian Institute of Technology Roorkee – BS, Computer Science (2017 - 2021), GPA: 9.718/10

Chair - ACM IIT Roorkee Chapter | Co-President - Vision and Language Group