Suvodeep Majumder
Applied Scientist @ AWS
Ph.D. CS · NCSU
Multimodal AI
🏆 ACM SIGSOFT Award

Applied Scientist & AI Researcher

I build multimodal AI systems at Amazon AWS and hold a Ph.D. from NCSU. I work where machine learning, software engineering, and generative AI intersect, with a focus on fairness, explainability, and real-world impact.

Bridging AI Research & Real-World Impact

I'm an Applied Scientist at Amazon AWS in New York, previously a Post-Doctoral Research Scientist at Columbia University. My Ph.D. (NCSU, 2023) focused on making ML systems fair, efficient, and reliable.

Today, I design and build systems that combine language, vision, and structured data — from bidirectional multimodal LLM frameworks to model-based evaluation pipelines that handle documents, images, and video.

🤖

Generative AI

Multimodal LLMs, visual question answering, LLM-as-a-judge evaluation

⚖️

AI Fairness

Causal inference for bias reduction, fair oversampling, sufficient fairness measures

🔬

Software Analytics

Defect prediction, socio-technical graph mining, semi-supervised LLM fine-tuning

🏥

Medical AI

Explainable tumor segmentation failure detection, model-agnostic radiomics

Research Interests

Core domains where I've published, built systems, and driven impact

01

Multimodal AI & LLMs

Bidirectional interaction frameworks for LLMs. Visual question answering over rich documents. LLM-as-a-judge for numerical reasoning.

GPT · BERT · VQA · RAG

02

AI Fairness & Ethics

Causal inference for bias-aware data selection. Fairness-aware minority oversampling. Rigorous evaluation of fairness metrics.

Fairness · Causal ML · SMOTE

03

Neural Machine Translation

Context selection for NMT. Knowledge distillation for contextual translation. Pivot NMT with linguistic context.

NMT · Seq2Seq · KD · BERT

04

Software Analytics

Defect prediction across large project samples. Socio-technical graph mining. Transfer learning with bellwether method.

MSR · Graph Mining · Transfer

05

Computer Vision

Explainable tumor segmentation failure detection. Model-agnostic radiomics feature extraction. Failure reasoning frameworks.

Segmentation · XAI · Radiomics

06

Semi-Supervised Learning

Co-training strategies for defect prediction. Fair ML with limited labeled data. SSL for software engineering tasks.

SSL · Co-training · BERT

Experience

Applied Scientist

Amazon AWS · New York, NY

Jun 2024 – Present
🔗 Multimodal Interaction: Bidirectional multimodal interaction framework for LLMs, bridging text, images, and structured content.
📊 Multimodal Evaluation: Model-based evaluation pipelines for documents, HTML, images, and audio/video.
👁️ Visual Language Models: Features enabling LLMs to answer visual questions from rich documents.
⚖️ LLM as a Judge: Evaluation frameworks for complex numerical reasoning Q&A.

Post-Doctoral Research Scientist

Columbia University · New York, NY

Aug 2023 – Jun 2024
🏥 Image Segmentation: Model-agnostic framework that extracts image, model, and tumor properties to identify and explain tumor segmentation failures.

Research Assistant

NC State University · Raleigh, NC

Aug 2019 – Aug 2023
🔀 Transfer Learning: Scalable hierarchical transfer learner using the bellwether method.
⚖️ AI Fairness: Causal inference for bias-aware data selection and fairness-aware oversampling.
🤖 Semi-supervised LLM Fine-tuning: Semi-supervised strategies for BERT on classification and generative tasks.

Applied Scientist II Intern × 3 Summers

Amazon AWS · New York, NY

2020 – 2022
🌐 2022: Context selection for NMT using BERT-based downstream tasks.
📐 2021: Model architecture analysis for contextual NMT; knowledge distillation for shallow-to-deep model transfer.
🔄 2020: Pivot NMT improvement with linguistic context; automated contrastive test-set creation.

Data Scientist Intern

IBM · Raleigh, NC

Jun – Aug 2018
📈 ML-based insights for developer teams via GitHub analysis; large-scale Spark pipelines on terabytes of software project data.

Test Analyst & Test Engineer

Infosys Ltd. · Bhubaneswar, India

Mar 2013 – Jun 2017
🧪 Created test scenarios and automation scripts for web portals, mobile applications, and IVR systems, spanning automated, manual, and performance testing.

Selected Publications

Full list on Google Scholar ↗

EMSE 2023 Journal

When Less is More: On the Value of "Co-training" for Semi-Supervised Software Defect Predictors

S. Majumder, J. Chakraborty, T. Menzies

TOSEM 2023 Journal

Fair Enough: Searching for Sufficient Measures of Fairness

S. Majumder, J. Chakraborty, G. R. Bai, K. T. Stolee, T. Menzies

EMSE 2022 Journal

Revisiting Process versus Product Metrics: A Large Scale Analysis

S. Majumder, P. Mody, T. Menzies

MSR 2022 Conference

Methods for Stabilizing Models across Large Samples of Projects

S. Majumder, T. Xia, R. Krishna, T. Menzies

FSE 2022 Conference

Fair-SSL: Building Fair ML Software with Less Data

J. Chakraborty, S. Majumder, H. Tu, T. Menzies

ICSE 2021 Conference

Early Life Cycle Software Defect Prediction, Why? How?

N. C. Shrikanth, S. Majumder, T. Menzies

FSE 2020 Conference

Fairway: A Way to Build Fair ML Software

J. Chakraborty, S. Majumder, Z. Yu, T. Menzies

MSR 2018 Conference

500+ Times Faster than Deep Learning: Text Mining StackOverflow

S. Majumder, N. Balaji, K. Brey, W. Fu, T. Menzies

arXiv Preprint

A Baseline Revisited: Multi-Segment Models for Context-Aware Translation

S. Majumder, S. Lauly, M. Nadejde, M. Federico, G. Dinu

arXiv Preprint

Communication and Code Dependency Effects on Software Code Quality

S. Majumder, J. Chakraborty, T. Menzies

arXiv Preprint

Causality-Based Testing for Software Fairness

J. Chakraborty, S. Majumder, T. Menzies

arXiv Preprint

Can We Achieve Fairness Using Semi-Supervised Learning?

J. Chakraborty, S. Majumder, H. Tu, T. Menzies

Education

Ph.D. Computer Science
North Carolina State University
2019 – 2023  ·  GPA 4.02 / 4.0

M.S. Computer Science
North Carolina State University
2017 – 2019  ·  GPA 4.1 / 4.0

B.E. Computer Science & Engineering
West Bengal University of Technology
2008 – 2012  ·  GPA 8.21 / 10.0

Honors & Awards

ACM SIGSOFT Distinguished Paper Award  ·  ESEC/FSE 2021
Making the Difference Award  ·  Infosys 2017
Milestone Award for Domain Champion  ·  Infosys 2015
Laurel Best Debutant Award  ·  Infosys 2014

Skills

Languages
Python · R · Java

ML Frameworks
PyTorch · TensorFlow · Keras · scikit-learn · MXNet · Sockeye · Torchvision · pyradiomics

Domains
LLMs · Multimodal AI · NLP / NMT · AI Fairness · Computer Vision · Software Analytics · MLOps

Suvodeep Majumder
Applied Scientist · Amazon AWS
suvodeep.majumder90@gmail.com
New York, NY
+1 (774) 208-663