Research
The overarching goal of my research is to develop next-generation machine learning methods that advance diagnostic and therapeutic technologies.
Below are selected projects and publications. Please refer to my CV or Google Scholar for a complete list.
Innovating contextual AI for precision medicine

Michelle M. Li, Yepeng Huang, Marissa Sumathipala, Man Qing Liang, Alberto Valdeolivas, Ashwin Ananthakrishnan, Katherine Liao, Daniel Marbach, Marinka Zitnik
Nature Methods 2024
PINNACLE: Contextual AI Model for Single-Cell Protein Biology
Understanding protein function and discovering molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across diverse biological contexts, such as tissues and cell types, remains a significant challenge for existing algorithms. We introduce PINNACLE, a flexible geometric deep learning approach that is trained on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a human multi-organ single-cell transcriptomic atlas, PINNACLE provides 394,760 protein representations split across 156 cell type contexts from 24 tissues and organs. PINNACLE’s contextualized representations of proteins reflect cellular and tissue organization. Pretrained PINNACLE protein representations can be adapted for downstream tasks: to enhance 3D structure-based protein representations at cellular resolution and to study the genomic effects of drugs across cellular contexts. PINNACLE outperforms state-of-the-art, yet context-free, models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and it can pinpoint cell type contexts with higher predictive capability than context-free models. PINNACLE is a network-based contextual AI model that dynamically adjusts its outputs based on the biological contexts in which it operates.
Links: Paper, GitHub Repository, Interactive Demo, Data, Model Checkpoints
Press/Highlights: Cell Systems, HMS 2024 State of the School Address (skip to 35:57), Nature Methods Editorial, HMS News, Deeper Learning (Kempner Institute), Bio-IT World
Related Co-authored Papers:
* Contextual AI models for context-specific prediction in biology (Nature Methods Research Briefing, 2024)
* Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics (NeurIPS Workshop on AI for New Drug Modalities, 2024; Selected for spotlight presentation)
* Multi-Scale Graph Neural Network for Alzheimer’s Disease (Machine Learning for Health Findings Track, 2024)
* Deep Contextual Learners for Protein Networks (ICML Workshop on Computational Biology, 2021; Selected for oral spotlight presentation and received Best Poster Award)

Michelle M. Li, Kevin Li, Yasha Ektefaie, Shvat Messica, Marinka Zitnik
In Review 2025
CLEF: Controllable Sequence Editing for Counterfactual Generation
Counterfactual thinking is a fundamental objective in biology and medicine. “What if” scenarios are critical for reasoning about the underlying mechanisms of cells, patients, diseases, and drugs: “What if we treat the cells with the drug every one or 24 hours?” and “What if we perform the surgery on the patient today or next year?” We should reason about both the choice and timing of the counterfactual condition. Thus, counterfactual generation requires precise and context-specific edits that adhere to temporal and structural constraints. Sequence models generate counterfactuals by modifying parts of a sequence based on a given condition, enabling reasoning about “what if” scenarios. While these models excel at conditional generation, they lack fine-grained control over when and where edits occur. We develop CLEF, a controllable sequence editing approach for instance-wise counterfactual generation. CLEF learns temporal concepts that represent the trajectories of the sequences to enable accurate counterfactual generation guided by a given condition. We demonstrate that CLEF achieves state-of-the-art performance on four novel benchmark datasets in cellular reprogramming and patient immune dynamics.
Links: Paper, GitHub Repository

Emily Alsentzer*, Michelle M. Li*, Shilpa N. Kobren, Undiagnosed Diseases Network, Isaac S. Kohane, Marinka Zitnik
In Review 2024
* Equal contribution (alphabetical)
SHEPHERD: Deep Learning for Diagnosing Patients with Rare Genetic Diseases
There are over 7,000 unique rare diseases, some of which affect 3,500 or fewer patients in the US. Due to clinicians’ limited experience with such diseases and the considerable heterogeneity of their clinical presentations, many patients with rare genetic diseases remain undiagnosed. While artificial intelligence has demonstrated success in assisting diagnosis, its success is usually contingent on the availability of large annotated datasets. Here, we present SHEPHERD, a deep learning approach for multi-faceted rare disease diagnosis. To overcome the limitations of supervised learning, SHEPHERD performs label-efficient training by (1) training exclusively on simulated rare disease patients without the use of any real labeled data and (2) incorporating external knowledge of known phenotype, gene, and disease associations via knowledge-guided deep learning.
Links: Paper, GitHub Repository, Interactive Demo, Data, Model Checkpoints
Awards: Best Oral Presentation (ISMB 2023 TransMed; selected from 130 abstracts for long talk)
Press: Washington Post (Opinion)
Related Co-authored Papers:
* Simulation of undiagnosed patients with novel genetic conditions (Nature Communications, 2023; GitHub Repository, Data)
* Subgraph Neural Networks (NeurIPS, 2020; Project Website, GitHub Repository, Data)

Michelle M. Li, Kexin Huang, Marinka Zitnik
Nature Biomedical Engineering 2022
Perspective: Graph Representation Learning in Biomedicine and Healthcare
Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. With the remarkable success of representation learning in providing powerful predictions and insights, we have witnessed a rapid expansion of representation learning techniques into modeling, analyzing, and learning with such networks. Here, we put forward an observation that long-standing principles of networks in biology and medicine, while often unspoken in ML research, can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances.
Links: ISMB 2022 Tutorial Materials
Related Co-authored Papers:
* Graph Artificial Intelligence in Medicine (Annual Review of Biomedical Data Science, 2024)
* Current and future directions in network biology (Bioinformatics Advances, 2024)
Past Research
I have been involved in algorithmic innovation for other areas of precision medicine research (e.g., antibiotic resistance, skin/gut/vaginal microbiomes). While I am no longer actively working in these areas, they remain fascinating and exciting to me as a lifelong learner.
Identifying mechanisms of antibiotic resistance

Matthew G. Durrant, Michelle M. Li, Benjamin A. Siranosian, Stephen B. Montgomery, Ami S. Bhatt
Cell Host & Microbe 2020
MGEfinder: A Bioinformatic Analysis of Integrative Mobile Genetic Elements Highlights Their Role in Bacterial Adaptation
Mobile genetic elements (MGEs) contribute to bacterial adaptation and evolution; however, high-throughput, unbiased MGE detection remains challenging. We describe MGEfinder, a bioinformatic toolbox that identifies integrative MGEs and their insertion sites by using short-read sequencing data. MGEfinder identifies the genomic site of each MGE insertion and infers the identity of the inserted sequence. We apply MGEfinder to 12,374 sequenced isolates of 9 prevalent bacterial pathogens, including Mycobacterium tuberculosis, Staphylococcus aureus, and Escherichia coli, and identify thousands of MGEs, including candidate insertion sequences, conjugative transposons, and prophage elements. The MGE repertoire and insertion rates vary across species, and integration sites often cluster near genes related to antibiotic resistance, virulence, and pathogenicity. MGE insertions likely contribute to antibiotic resistance in laboratory experiments and clinical isolates.
Links: GitHub Repository

Xi Yang, Marjan M. Hashemi, Nadya Andini, Michelle M. Li, …, Samuel Yang
Journal of Antimicrobial Chemotherapy 2020
RNA markers for ultra-rapid molecular antimicrobial susceptibility testing in fluoroquinolone-treated Klebsiella pneumoniae
Traditional antimicrobial susceptibility testing (AST) is growth dependent and time-consuming. With rising rates of drug-resistant infections, a novel diagnostic method that can rapidly reveal a pathogen’s antimicrobial susceptibility to guide appropriate treatment is critically needed. We used RNA sequencing to investigate the potential of RNA markers for rapid molecular AST, using Klebsiella pneumoniae and ciprofloxacin as a model. We identified RNA signatures that were induced or suppressed following exposure to ciprofloxacin. Significant shifts at the transcript level were observed as early as 10 min after antibiotic exposure. Our results suggest that RNA signatures are a promising approach to AST development that could enable faster clinical diagnosis and treatment of infectious disease. This approach is potentially applicable to other models, including other pathogens exposed to different classes of antibiotics.
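The abstract above describes labeling transcripts as induced or suppressed after ciprofloxacin exposure. As a minimal sketch of that idea only, assuming normalized expression values per transcript before and after exposure, the Python below flags transcripts with a simple log2 fold-change cutoff. The function name, the two-fold threshold, the pseudocount, and the gene IDs are all hypothetical; this is not the differential-expression analysis used in the study.

```python
import math


def classify_transcripts(before, after, threshold=1.0, pseudocount=1.0):
    """Label each transcript 'induced', 'suppressed', or 'unchanged'.

    before/after: dicts mapping a transcript ID to normalized expression
    (hypothetical inputs). The pseudocount avoids log(0); threshold is the
    assumed minimum absolute log2 fold change (1.0 means two-fold).
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    labels = {}
    # Only transcripts measured at both time points can be compared.
    for tid in sorted(before.keys() & after.keys()):
        lfc = math.log2((after[tid] + pseudocount) / (before[tid] + pseudocount))
        if lfc >= threshold:
            labels[tid] = "induced"
        elif lfc <= -threshold:
            labels[tid] = "suppressed"
        else:
            labels[tid] = "unchanged"
    return labels


# Hypothetical toy expression values before and after antibiotic exposure.
before = {"geneA": 10.0, "geneB": 50.0, "geneC": 20.0}
after = {"geneA": 45.0, "geneB": 10.0, "geneC": 22.0}
print(classify_transcripts(before, after))
# {'geneA': 'induced', 'geneB': 'suppressed', 'geneC': 'unchanged'}
```

A fixed fold-change cutoff ignores replicate variance; a real analysis would also test statistical significance across replicates.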
Characterizing microbiomes to assess effects of intervention

Jennifer Dawkins*, Joyce B. Kang*, Michelle M. Li*, Aditya Misra*, Jon Arizti Sanz*, Tami Lieberman
Recruitment Closed
* Equal contribution (alphabetical)
Assessment of clothing breathability on vaginal microbiome and yeast-bacterial interactions
There is much “conventional wisdom” about the adverse effects of frequently wearing tight clothing and non-breathable undergarments; however, no controlled scientific study has yet assessed whether non-breathable undergarments and clothing adversely affect the microbiome. A healthy vaginal microbiome is important because bacterial vaginosis (BV) and yeast infections commonly arise when the microbiome is disrupted. These conditions are not only uncomfortable; BV in particular is associated with higher rates of preterm birth and reduced protection against STDs. Still, little is known about why BV or fungal overgrowth (yeast infections) develops. One possible way the microbiome could be affected is through changes in temperature, humidity, and acidity due to clothing choice, which we plan to investigate in this study. Specifically, we would like to compare the effects of breathable and non-breathable clothing/undergarments on the vaginal microbiome and mycobiome (fungi), and use the data gathered to look at interactions between bacteria and fungi (e.g., Candida).
Links: Project website

Christopher J. Severyn, Benjamin A. Siranosian, Sandra Tian-Jiao Kong, Angel Moreno, Michelle M. Li, …, Ami S. Bhatt, Jennifer S. Whangbo
JCI Insight 2022
Microbiota dynamics in a randomized trial of gut decontamination during allogeneic hematopoietic cell transplantation
Gut decontamination (GD) can decrease the incidence and severity of acute graft-versus-host disease (aGVHD) in murine models of allogeneic hematopoietic cell transplantation (HCT). Several HCT centers routinely practice GD with different antibiotic regimens. In this pilot study, we examined the impact of GD on gut microbiome composition and the incidence of aGVHD in HCT patients. We randomized 20 pediatric patients undergoing allogeneic HCT to receive (GD) or not receive (no-GD) oral vancomycin-polymyxin B from day -5 through neutrophil engraftment. We used shotgun metagenomic sequencing of serial stool samples to compare the composition and diversity of the gut microbiome between study arms. We assessed clinical outcomes in the two arms and performed strain-specific analyses of pathogens that caused bloodstream infections (BSI). While GD did not differentially impact Shannon diversity or clinical outcomes, our findings suggest that GD may protect against gut-derived BSI in HCT patients by decreasing the prevalence or abundance of gut microbial pathogens.
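The trial above compared Shannon diversity between study arms. For reference, the Shannon diversity of one sample is H = -sum_i p_i ln(p_i), where p_i is the relative abundance of taxon i; higher values mean abundance is spread more evenly across more taxa. The Python below is a minimal sketch of the index itself, assuming per-taxon abundances for a single sample and the natural logarithm (some tools default to log base 2). The function name and toy counts are hypothetical, and this is not the study's analysis pipeline.

```python
import math
from collections.abc import Sequence


def shannon_diversity(counts: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) for one sample.

    counts: per-taxon abundances (e.g., read counts), normalized here to
    relative abundances p_i. Zero-abundance taxa are skipped, following the
    convention that p * ln(p) -> 0 as p -> 0.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must have positive total abundance")
    h = 0.0
    for c in counts:
        if c < 0:
            raise ValueError("abundances must be non-negative")
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical samples: one spread evenly over 4 taxa, one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]
print(round(shannon_diversity(even_sample), 4))  # 1.3863, i.e. ln(4)
print(round(shannon_diversity(dominated_sample), 4))
```

The dominated sample scores far lower than the even one even though both contain four taxa, which is why the index captures evenness as well as richness.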