Home - Atmadeep Banerjee

Publications

Papers I have published

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Paul S. Scotti*, Atmadeep Banerjee*, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, Tanishq Mathew Abraham
NeurIPS 2023, Spotlight
[arxiv] [paper] [website]

CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-label Classification

Siddhant Kharbanda, Atmadeep Banerjee, Erik Schultheis, Rohit Babbar
NeurIPS 2022
[arxiv] [paper]

Revisiting RCAN: Improved Training for Image Super-Resolution

Zudi Lin, Prateek Garg, Atmadeep Banerjee, Salma Abdel Magid, Deqing Sun, Yulun Zhang, L. Gool, D. Wei, H. Pfister
[arxiv]

InceptionXML: A Lightweight Framework with Synchronized Negative Sampling for Short Text Extreme Classification

Siddhant Kharbanda, Atmadeep Banerjee, Akash Palrecha, Devaansh Gupta, Rohit Babbar
SIGIR 2023
[arxiv]

Meta-DRN: Meta-Learning for 1-Shot Image Segmentation

Atmadeep Banerjee
2020 IEEE 17th India Council International Conference (INDICON)
[arxiv] [paper]

MXR-U-Nets for Real Time Hyperspectral Reconstruction

Atmadeep Banerjee, Akash Palrecha
Submitted to Computer Vision and Pattern Recognition Workshops (CVPRW), 2020
[arxiv] [challenge paper]
Rank 12 in NTIRE 2020 Spectral Reconstruction Challenge

Combinets v2: Improving Conceptual Expansion using SGD

Atmadeep Banerjee
CODS COMAD 2021: 8th ACM IKDD CODS and 26th COMAD
[paper]

Experience

Things I have formally worked on and positions I have held

INDUSTRY

MedARC | Open Source Researcher

Nov, 2022 - Present

Working on projects related to neuro AI.

My work on fMRI-to-image reconstruction was published at NeurIPS 2023. I also worked on the Algonauts 2023 challenge for predicting fMRI responses from a given image. I am currently working on EEG-to-image reconstruction.

Morphle Labs | ML Engineer

Sep, 2021 - Oct, 2022

Morphle is a biomedical robotics company that builds automated microscopes.

My work was centered around computer vision for pathology. I built two sets of AI models that integrate with the microscope. The first set enables the robot to optimally capture critical areas of a slide. This is achieved by rapidly taking high FoV images at low resolution and detecting important areas using deep learning models. This is followed by optimal path calculation using algorithms like DFS and rescanning these important areas at high resolution. The second set of models works on top of high resolution images to detect cells and other inclusions. Some examples include detection of malarial parasites, blast cells (leukemia) and brain tumours.

Visual One | ML Intern

Jan, 2021 - Jul, 2022

VisualOne was a Computer Vision startup building a few-shot learning framework, specifically for event detection using security cameras.

I worked on increasing the robustness of object detection. The final approach consisted of 3 modifications to the training step:

Distillation to transfer knowledge from a larger, more robust model to a smaller deployable model.
Mixup to smooth the decision boundaries and reduce strong variations in output due to small changes in input.
A variant of focal loss with a specialized gamma schedule that helps in model calibration.
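
As a rough illustration of two of these ingredients, the sketch below shows mixup and a focal loss with a scheduled gamma in plain Python. It is a simplified classification-style version, not the detection pipeline described above: the function names, the Beta parameter `alpha`, and the linear `gamma_at` schedule are all assumptions made for the example, and the specialized schedule actually used is not described here. Distillation is left out.

```python
import math
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def focal_loss(probs, target, gamma):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def gamma_at(step, total_steps, start=2.0, end=0.5):
    """Hypothetical linear gamma schedule, used here only for illustration."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + frac * (end - start)
```

With `gamma = 0` the focal loss reduces to ordinary cross-entropy, so a schedule over gamma controls how strongly easy examples are down-weighted at each stage of training.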

Pixxel | AI Research Intern

Jan, 2020 - Aug, 2020

Pixxel is a Remote Sensing startup working towards building a health monitor for the planet. It aims to launch a constellation of nanosatellites for real time satellite imagery and analytics. After Pixxel became a startup I was formally employed as an intern, working alongside my coursework.

Trained a model to segment objects of interest from satellite imagery using few-shot learning. The classes in the XView dataset were used.
Trained an image2image model to synthesize multispectral imagery using radar satellite data. Achieved a validation PSNR of 28.9.
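
For reference, PSNR is computed from the mean squared error between a reference image and a reconstruction. The snippet below is a minimal sketch of the standard formula on flat lists of pixel values; the function name and the `max_value` default (pixels scaled to [0, 1]) are assumptions for the example, not details of the original evaluation code.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the generated image is closer to the reference; every 10 dB corresponds to a tenfold reduction in mean squared error.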

Example of synthetic multispectral imagery. The model takes source information consisting of a radar (SAR) image, a multispectral image and a timestamp, and generates a multispectral image of the same area for a query timestamp. Using multiple query timestamps for the same source allows one to visualize the effect of the change of seasons on a piece of land.

ACADEMIA

Aalto University | Research Assistant

June, 2021 - July, 2022

My research was in the domain of extreme classification — classification problems with millions of labels.

My research led to 3 publications:

CascadeXML: A novel Tree-in-Transformer model that first clusters the label space and successively refines it. Published at NeurIPS 2022, this approach is the current state of the art for extreme classification of text.
InceptionXML: A lightweight CNN model that applies convolutions in a novel way to outperform transformers like BERT for short-text (text queries with ≤ 10 words) extreme classification.
Gandalf: A data augmentation technique that improves performance of other extreme classification approaches by leveraging label features in a novel way.

Visual Computing Group, Harvard University | Research Intern/Assistant

July, 2020 - November, 2021

Worked under the guidance of Prof. Hanspeter Pfister. Worked on Instance Segmentation of natural images using metric learning, Connectomic Segmentation from 3D electron microscope imagery, and Single Image Super Resolution.

Example of a 3D neuron mesh segmented from EM using my trained model (flood-filling network). Left is GT and right is Pred. The model is unable to segment the smaller dendrons automatically. Fortunately this is a human-in-the-loop model which allows errors to be fixed with human intervention.

AT UNIVERSITY

Pixxel | AI Team Lead

May, 2018 - Dec, 2019

Before it was a startup, Pixxel started off as a student team at BITS Pilani. I was a member of the Pixxel AI Team during its inception, and a few months later became the AI lead. During my time at Pixxel I worked on computer vision models for extracting information from satellite imagery. The primary focus of my work was to build a high-accuracy model for extracting Indian road networks as graphs from satellite imagery. I also worked on models for crop yield prediction and building segmentation.

MapMyIndia | Study Project

September, 2018 - November, 2018

I worked on a project for detecting and classifying various kinds of road signs appearing on Indian roads (advertisements, traffic signs, etc.). I trained various single-shot and region-based object detection algorithms on a dataset provided by MapMyIndia, with around 16,000 annotated images across 30 classes, and compared their speed and accuracy. For the final submission I trained a network based on YOLO-v3 that achieved an mAP score of 89.71 and an F1-score of 0.94.

BITS Pilani Coding Club | Apogee Joint Coordinator

April, 2019 - July, 2020

I was a member of the Game Development and ML Teams of BITS Pilani Coding Club from August 2017 till my graduation in 2021. I was the club's Joint Coordinator for Apogee, BITS Pilani's Tech Fest, from 2019 to 2020.

As the Joint Coordinator for Apogee 2020, I co-led the creation of a gamified (Pokemon Go-like) AR app for crowd control during the fest.

Prior to this, as a part of the club's game development team, I was responsible for designing and building original games to be played by students attending the various fests at BITS. I have worked on Kinect and VR based games.

Apart from this, I helped organize various workshops and hackathons. I helped organize a workshop for teaching 3D modelling and game programming. As a member of the ML team, I organised a Machine Learning hackathon, sponsored by Yes Bank, during Apogee 2019.

Projects

Things I have independently worked on

GI Tract Segmentation

My submission for the UW-Madison GI Tract Image Segmentation Kaggle competition. The final submission is an ensemble of 3 different types of models, with test-time augmentation for each.

2x 3D segmentation models using MONAI
5x 2.5D UNet models with efficientnetv2s backbone
2x 2.5D UNet models with swin tiny backbone

My submission won a silver medal.
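
The general idea of ensembling with test-time augmentation can be sketched as follows. This is a toy 2D version in plain Python, assuming each model is a callable that maps an image (a list of rows) to per-pixel probabilities of the same shape; the function names, the single horizontal-flip augmentation, and the 0.5 threshold are illustrative assumptions, not the competition code.

```python
def hflip(image):
    """Flip a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def predict_with_tta(model, image):
    """Average the prediction on the image with the un-flipped prediction on its mirror."""
    direct = model(image)
    flipped_back = hflip(model(hflip(image)))
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(direct, flipped_back)
    ]


def ensemble(models, image, threshold=0.5):
    """Average TTA predictions over all models, then threshold into a binary mask."""
    preds = [predict_with_tta(m, image) for m in models]
    height, width = len(image), len(image[0])
    mean = [
        [sum(p[i][j] for p in preds) / len(preds) for j in range(width)]
        for i in range(height)
    ]
    mask = [[1 if v >= threshold else 0 for v in row] for row in mean]
    return mean, mask
```

Averaging probabilities before thresholding lets models that are uncertain in different places compensate for one another.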

Q-Learning

This project explores Q-Learning to train an agent for the game DotsNBoxes. Several agents were trained using settings like a random opponent, a heuristic opponent and adversarial self-play. Our observations on agent performance are compiled in a report at the link given.

The code is written from scratch in Java.
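
The core of tabular Q-Learning is a single update rule. The snippet below is a minimal Python sketch of that rule together with epsilon-greedy action selection; it is not a translation of the Java code, and the state and action encodings, function names, and hyperparameter defaults are assumptions for illustration. In a two-player game such as DotsNBoxes, `next_state` would typically be the position the agent sees after the opponent has replied.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # A terminal state has no legal actions, so its future value is 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Unseen (state, action) pairs default to a value of 0.0.
q_table = defaultdict(float)
```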

Diabetic Retinopathy Diagnosis

This model was made for the Kaggle APTOS 2019 Blindness Detection competition. I trained a CNN model to detect the occurrence of diabetic retinopathy from fundus photography. The model outputs an integer from 0 to 4, with 0 indicating no DR and 4 indicating proliferative DR.

My submission won a silver medal in the competition.

PokeGAN

I trained a GAN (Wasserstein GAN with Gradient Penalty) to generate new images of (fire) Pokemon. For training, I scraped images from DuckDuckGo Image Search to make a custom dataset. 2,109 images were downloaded and augmented to make a dataset of 37,962 images. Head over to the GitHub link for more information.

MetaAI

This is a deep learning library specialised for meta-learning. It is built on top of fastai v1 and PyTorch.

Supports MAML, Meta-SGD and Reptile algorithms.
Native support for all torchvision ResNet models. Also allows use of pretrained ImageNet weights.
Provides functional versions of nn.Conv2d, nn.BatchNorm2d, nn.Linear and nn.Sequential to allow users to easily create their own models for few-shot learning.
Supports fastai’s callback system.

NumPyML

Most Deep Learning libraries depend on code written in a language more performant than Python, and this code is not easily accessible to users. A new learner is not able to see how DL algorithms are implemented, essentially turning these algorithms into black boxes. So I built a fully functional CNN library from scratch using only NumPy. The code is designed to be simple, easy to read, and easily extensible by the user.

Sentiment Analysis

This project enables users to stream tweets and news articles in real time depending on a search query. The streamed text corpora are tokenized, stemmed and then vectorized using word2vec embeddings. The vectorized sentences are then read by a CNN model which outputs the mean sentiment associated with the text corpus on a scale of 0 to 1, with 1 being highly positive and 0 being highly negative.

The aim of this project is to enable users to assess the public sentiment related to a topic or organization at any point of time.

Firewall with Prolog

This project allows users to simulate a firewall using the logic programming language Prolog. Users can create a rule set dictating what kinds of packets to allow through the firewall. Different rules can be set for adapter and ethernet clauses. The system determines if an incoming packet is to be allowed or blocked. The GitHub link has a more detailed explanation.

The project was completed in partial fulfilment of the course Logic in Computer Science at BITS Pilani.