I am a Senior AI Researcher at Dolby Laboratories, India.
My core interest lies in video analytics, specifically exploring state-of-the-art models to design scalable solutions for long-form video analysis. Motivated by my research, I wish to design a framework that mimics human intelligence for sequential modeling. I am also interested in exploring human behavior and neuroscience to demystify human intelligence.
Prior to my current role, I was a postdoctoral researcher at the University of Maryland (UMD), College Park, with Dr. Abhinav Shrivastava. My research at UMD revolved around advanced video understanding, particularly the unsupervised extraction of grammar from videos and generic event boundary detection.
I completed my Ph.D. at IIIT Delhi, India, under the guidance of Dr. Chetan Arora (IIT Delhi). My doctoral dissertation, titled "Analyzing Day-Long Egocentric Videos," presents deep learning and theory-based approaches for analyzing massively long sequences (up to 60k time steps) derived from real-life egocentric videos and photostreams. It addresses fundamental video analysis tasks while tackling the overarching challenges of egocentric lifelogs: scalability, privacy, and the use of unlabeled data. I have demonstrated that state-of-the-art deep learning frameworks, viz. RNN, LSTM, IndRNN, and the recently introduced Transformer network, fail to model massively long sequences. We were the first to work on the Disney dataset (video samples of 4 to 8 hours) and the UTE dataset (video samples of 3 to 5 hours) for summarization and temporal segmentation. All the solutions scale to thousands of time steps and are completely unsupervised or self-supervised. Currently, we are working on the EgoRoutine photostream dataset (which scales up to 20 days).
You can contact me via email at pnagar (at) umd (dot) edu.