About Me
I am a Research Software Engineer at Google India. I specialize in Reinforcement Learning algorithms, and my current research interest lies at the intersection of Robotics and Accessibility. I received my Ph.D. in Machine Learning from the Department of Computer Science and Engineering at the Indian Institute of Technology (IIT) Kharagpur in 2020, co-advised by Prof. Pabitra Mitra (IIT Kharagpur) and Prof. Balaraman Ravindran (IIT Madras). I was a recipient of the Google Ph.D. Fellowship Award in 2016. During my Ph.D., I worked on real-world robotics problems as an intern at Intel Labs (2017) and Google Brain (2018, 2019). I have also worked as a Machine Learning Consultant with the Indian Space Research Organisation (ISRO), TATA Steel, Intel, and the Ministry of Health and Family Welfare, Govt. of India. I obtained my bachelor's degree (B.Tech.) in Electronics and Electrical Communication Engineering from IIT Kharagpur in 2015.
Research Projects I Have Led
Characteristics of the galloping behavior learned by different imitators with different physical properties from the same set of demonstrations for the Cheetah-Run task.
In this work, we address the practical problem of training MPC policies with parameterized cost functions when the demonstrator and imitator agents have non-identical state-action spaces and transition dynamics. We propose a novel approach that uses a generative adversarial network (GAN) to match the state-trajectory distributions of the demonstrator and the imitator, and demonstrate its efficacy on MuJoCo tasks from the DeepMind Control Suite.
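The trajectory-matching idea can be sketched as a standard GAN discriminator loss over flattened state trajectories. This is a toy linear discriminator for illustration only; the actual architecture, features, and training procedure in the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(w, demo_trajs, imit_trajs):
    """Binary cross-entropy loss for a toy linear discriminator that labels
    demonstrator state-trajectories 1 and imitator state-trajectories 0.
    Each row of demo_trajs / imit_trajs is a flattened state trajectory."""
    d_demo = sigmoid(demo_trajs @ w)  # want scores close to 1
    d_imit = sigmoid(imit_trajs @ w)  # want scores close to 0
    eps = 1e-8                        # numerical safety for the logs
    return -(np.log(d_demo + eps).mean() + np.log(1.0 - d_imit + eps).mean())
```

The imitator's parameterized MPC cost would then be trained so that its own rollouts raise the discriminator's score, driving the two trajectory distributions together even though the agents' dynamics differ.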
Timeline: 2022-Present
Venue: Google Research, Indian Institute of Technology Madras
Publication: GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts, Under Review.
Picture of our robot (from Everyday Robots) and the target objects studied in our experiment
In this work, we propose a modular framework that efficiently searches real indoor environments for objects (e.g., fruits, glasses, phones) that frequently change position due to human interaction. We train a Contextual Bandit agent for exploration and leverage Weighted Minimum Latency Solvers and learnable MPC motion planners for high sample efficiency, safety, and reliability.
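The contextual-bandit component can be sketched with a minimal epsilon-greedy agent that learns which candidate location to search next. Everything here (linear reward model, epsilon-greedy exploration, the class and parameter names) is an illustrative assumption, not the paper's agent.

```python
import numpy as np

rng = np.random.default_rng(0)

class EpsilonGreedyContextualBandit:
    """Toy contextual bandit: one linear reward model per candidate search
    location ("arm"), fit by online gradient steps, with epsilon-greedy
    exploration over arms."""

    def __init__(self, n_arms, dim, epsilon=0.1, lr=0.1):
        self.weights = np.zeros((n_arms, dim))
        self.epsilon = epsilon
        self.lr = lr

    def select(self, context):
        # Explore a random arm with probability epsilon, else act greedily.
        if rng.random() < self.epsilon:
            return int(rng.integers(len(self.weights)))
        return int(np.argmax(self.weights @ context))

    def update(self, arm, context, reward):
        # SGD step on the squared error between predicted and observed reward.
        pred = self.weights[arm] @ context
        self.weights[arm] += self.lr * (reward - pred) * context
```

In a search loop, the reward would be 1 when the object is found at the chosen location and 0 otherwise, so the agent gradually concentrates its searches on locations that pay off for the current context.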
Timeline: 2021-Present
Venue: Google Research, Robotics@Google, Everyday Robots
Publication: A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations, appearing in ICRA 2023, London.
An illustration of the problem statement with an example in which the agent plans a sequence of 4 actions: {x2, x3, x1, x6}. The next action is played only if all the preceding actions in the sequence fail. Each trial has a cost. The agent is allowed to give up, and the later it gives up in the retry sequence, the higher the penalty it incurs. Thus the agent is incentivized to succeed sooner.
Motivated by practical problems of ranking with partial information, we introduce a variant of the cascading bandit model that considers flexible-length sequences with varying rewards and losses. Our analysis delivers tight regret bounds which, when specialized to standard cascading bandits, yield sharper guarantees than previously available in the literature.
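The expected cost of a planned retry sequence in this cascade model can be worked out in a few lines. This is a simplified sketch: the costs and penalty values are toy numbers, and the fixed give-up penalty stands in for the paper's penalty that grows the later the agent gives up.

```python
def cascade_cost(success_probs, trial_cost=1.0, giveup_penalty=5.0):
    """Expected cost of a planned retry sequence in a simplified cascade model.

    Action i is attempted only if all preceding actions failed; each attempt
    costs `trial_cost`. If every planned action fails, the agent gives up and
    pays `giveup_penalty`."""
    p_all_failed = 1.0  # probability that every preceding action failed
    expected = 0.0
    for p in success_probs:
        expected += p_all_failed * trial_cost  # this trial is attempted
        p_all_failed *= 1.0 - p
    return expected + p_all_failed * giveup_penalty
```

For example, two actions that each succeed with probability 0.5 give an expected cost of 1 + 0.5 + 0.25 x 5 = 2.75, which shows why the agent prefers placing its most promising actions early in the sequence.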
Timeline: 2020-2022
Venue: Google Research
Publication: Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback, AISTATS 2022
Ph.D. Thesis at IIT Kharagpur
We built the world’s first open-source multi-agent simulator for autonomous driving.
Timeline: 2019-2021
Venue: Intel Labs, IIT Kharagpur, IIT Madras, IIIT Hyderabad
Publication: MADRaS : Multi Agent Driving Simulator, JAIR 2021
We developed a Conditional Value-at-Risk (CVaR) based objective function for imitation learning agents and showed its efficacy in reducing the occurrence of tail-end events of catastrophic failure in risk-sensitive applications such as autonomous driving.
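CVaR itself is simple to state: at level alpha, it is the expected loss over the worst alpha-fraction of outcomes. A minimal empirical estimator (illustrative; the paper optimizes this quantity over trajectory costs rather than computing it on a fixed sample):

```python
import numpy as np

def cvar(losses, alpha=0.1):
    """Empirical Conditional Value-at-Risk at level alpha: the mean of the
    largest alpha-fraction of the observed losses (the tail expectation)."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))            # tail size
    return losses[:k].mean()
```

Minimizing CVaR rather than the plain expectation penalizes rare catastrophic rollouts that averaging would otherwise wash out, which is the point in risk-sensitive settings like driving.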
Timeline: 2018-2019
Venue: Intel Labs, IIT Kharagpur, IIT Madras
Publication: RAIL: Risk-Averse Imitation Learning, AAMAS 2018
We formulate a novel transfer-guided exploration method, ExTra, based on the theory of bisimulation-based policy transfer in MDPs.
Timeline: 2018-2020
Venue: IIT Kharagpur, IIT Madras, Intel Labs
Publication: ExTra: Transfer-guided exploration, AAMAS 2020
We developed a novel deep neural network architecture that achieved state-of-the-art results on Hyperspectral Image classification and leveraged it to develop a positive-unlabeled classification framework for information retrieval in Hyperspectral Image datasets.
Timeline: 2018-2019
Venue: IIT Kharagpur, Space Application Center - ISRO
Publications:
We were among the first to propose deep neural networks for the detection of Diabetic Retinopathy in fundus images.
Timeline: 2014-2016
Venue: IIT Kharagpur
Publications:
Copyright 2024. Anirban Santara.