Research

The overarching goal of my research is to develop the next generation of machine learning to advance diagnostic and therapeutic technologies.

Below are selected projects and publications. Please refer to my CV or Google Scholar for a complete list.


Learning patient representations for novel disease diagnosis and treatment

Michelle M. Li, Yepeng Huang, Marissa Sumathipala, Man Qing Liang, Alberto Valdeolivas, Ashwin Ananthakrishnan, Katherine Liao, Daniel Marbach, Marinka Zitnik
Nature Methods (in press) 2024

PINNACLE: Contextual AI Model for Single-Cell Protein Biology

Understanding protein function and discovering molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across diverse biological contexts, such as tissues and cell types, remains a significant challenge for existing algorithms. We introduce PINNACLE, a flexible geometric deep learning approach that is trained on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a human multi-organ single-cell transcriptomic atlas, PINNACLE provides 394,760 protein representations split across 156 cell type contexts from 24 tissues and organs. PINNACLE’s contextualized representations of proteins reflect cellular and tissue organization. Pretrained PINNACLE protein representations can be adapted for downstream tasks: to enhance 3D structure-based protein representations at cellular resolution and to study the genomic effects of drugs across cellular contexts. PINNACLE outperforms state-of-the-art, yet context-free, models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and can pinpoint cell type contexts that are more predictive of therapeutic targets than context-free models. PINNACLE is a network-based contextual AI model that dynamically adjusts its outputs based on the biological contexts in which it operates.

Links: Paper, GitHub Repository, Interactive Demo, Data, Model Checkpoints

Emily Alsentzer*, Michelle M. Li*, Shilpa N. Kobben, Undiagnosed Diseases Network, Isaac S. Kohane, Marinka Zitnik
In Review 2022
* Equal contribution (alphabetical)

SHEPHERD: Deep Learning for Diagnosing Patients with Rare Genetic Diseases

There are over 7,000 unique rare diseases, some of which affect 3,500 or fewer patients in the US. Due to clinicians’ limited experience with such diseases and the considerable heterogeneity of their clinical presentations, many patients with rare genetic diseases remain undiagnosed. While artificial intelligence has demonstrated success in assisting diagnosis, that success usually depends on the availability of large annotated datasets. Here, we present SHEPHERD, a deep learning approach for multi-faceted rare disease diagnosis. To overcome the limitations of supervised learning, SHEPHERD performs label-efficient training by (1) training exclusively on simulated rare disease patients without the use of any real labeled data and (2) incorporating external knowledge of known phenotype, gene, and disease associations via knowledge-guided deep learning.

Links: Paper, GitHub Repository, Interactive Demo, Data, Model Checkpoints
Awards: Best Oral Presentation (ISMB 2023 TransMed; selected from 130 abstracts for long talk)
Press: Washington Post (Opinion)
Related Papers: Simulation of undiagnosed patients with novel genetic conditions (Co-author; Paper, GitHub Repository, Data)

Michelle M. Li, Kexin Huang, Marinka Zitnik
Nature Biomedical Engineering 2022

Graph Representation Learning in Biomedicine and Healthcare

Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. With the remarkable success of representation learning in providing powerful predictions and insights, we have witnessed a rapid expansion of representation learning techniques into modeling, analyzing, and learning with such networks. In this review, we put forward an observation that long-standing principles of networks in biology and medicine—while often unspoken in machine learning research—can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances.

Links: ISMB 2022 Tutorial Materials

Michelle M. Li, Marinka Zitnik
International Conference on Machine Learning (ICML) Workshop on Computational Biology 2021

Deep Contextual Learners for Protein Networks

Spatial context is central to understanding health and disease. Yet reference protein interaction networks lack such contextualization, thereby limiting the study of where protein interactions likely occur in the human body and how they may be altered in disease. Contextualized protein interactions could better characterize genes with disease-specific interactions and elucidate diseases’ manifestation in specific cell types. Here, we introduce AWARE, a graph neural message passing approach to inject cellular and tissue context into protein embeddings. We construct a multi-scale network of the Human Cell Atlas and apply AWARE to learn protein, cell type, and tissue embeddings that uphold cell type and tissue hierarchies. We demonstrate AWARE’s utility on the novel task of predicting whether a protein is altered in disease and where that association most likely manifests in the human body.

Selected for oral spotlight presentation and received Best Poster Award.
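AWARE is described above as a graph neural message passing approach that injects cellular and tissue context into protein embeddings. For readers unfamiliar with message passing, here is a minimal sketch of one round of mean-aggregation message passing in plain Python. It is a generic illustration, not AWARE's architecture: the toy network, node names, and embedding values are hypothetical, and the trainable weight matrices a real model would learn are omitted.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def message_passing_round(
    embeddings: Dict[str, Vector],
    adjacency: Dict[str, List[str]],
) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing followed by a ReLU.

    A trained GNN would also multiply by learned weight matrices; they are
    omitted so the aggregation step itself is easy to follow.
    """
    updated = {}
    for node, own in embeddings.items():
        # Messages are read from the previous round's embeddings (synchronous update).
        messages = [embeddings[nbr] for nbr in adjacency.get(node, [])]
        pooled = mean_vector([own] + messages)
        updated[node] = [max(0.0, x) for x in pooled]
    return updated


# Hypothetical toy network: two proteins, each linked to one cell type node.
embeddings = {
    "protein_A": [1.0, 0.0],
    "protein_B": [0.0, 1.0],
    "cell_type_T": [2.0, -1.0],
}
adjacency = {
    "protein_A": ["cell_type_T"],
    "protein_B": ["cell_type_T"],
    "cell_type_T": ["protein_A", "protein_B"],
}
updated = message_passing_round(embeddings, adjacency)
print(updated["protein_A"])  # [1.5, 0.0]
```

After one round, `protein_A` is the mean of its own embedding and the cell type node's embedding, with the ReLU clipping the negative component, so its representation now carries some of that cell type context. Stacking several rounds, each with learned weights, lets context propagate further through the network.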

Emily Alsentzer*, Samuel Finlayson*, Michelle M. Li, Marinka Zitnik
NeurIPS 2020
* Equal contribution (alphabetical)

Subgraph Neural Networks

Deep learning methods for graphs achieve remarkable performance on many node-level and graph-level prediction tasks. However, despite the proliferation of these methods and their success, prevailing Graph Neural Networks (GNNs) neglect subgraphs, making subgraph prediction tasks challenging to tackle in many impactful applications. Further, subgraph prediction tasks present several unique challenges: subgraphs can have non-trivial internal topology, but they also carry a notion of position and external connectivity information relative to the underlying graph in which they exist. Here, we introduce SubGNN, a subgraph neural network to learn disentangled subgraph representations. We propose a novel subgraph routing mechanism that propagates neural messages between the subgraph’s components and randomly sampled anchor patches from the underlying graph, yielding highly accurate subgraph representations.

Links: Project Website, GitHub Repository, Data
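The idea of routing messages from randomly sampled anchors in the underlying graph can be illustrated with a deliberately simplified sketch. Every detail here is a hypothetical assumption, not SubGNN's implementation: anchors are single nodes rather than patches, each anchor's message is weighted by 1 / (1 + d), where d is its shortest-path hop distance to the subgraph, and messages are combined by a weighted mean. SubGNN itself uses several channels and learned components; this only shows how random anchors can encode where a subgraph sits in the larger graph.

```python
import random
from collections import deque
from typing import Dict, List, Set

Vector = List[float]
Graph = Dict[str, List[str]]


def hop_distances(sources: Set[str], graph: Graph) -> Dict[str, int]:
    """Multi-source BFS: hop distance from each reachable node to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def anchor_message(
    subgraph: Set[str],
    graph: Graph,
    embeddings: Dict[str, Vector],
    num_anchors: int,
    rng: random.Random,
) -> Vector:
    """Weighted mean of randomly sampled anchor embeddings (hypothetical scheme)."""
    candidates = sorted(n for n in graph if n not in subgraph)
    anchors = rng.sample(candidates, num_anchors)
    dist = hop_distances(subgraph, graph)
    # Closer anchors get larger weights; unreachable anchors get weight 0.
    weights = [1.0 / (1.0 + dist[a]) if a in dist else 0.0 for a in anchors]
    total = sum(weights)
    dim = len(next(iter(embeddings.values())))
    if total == 0.0:
        return [0.0] * dim
    return [
        sum(w * embeddings[a][i] for w, a in zip(weights, anchors)) / total
        for i in range(dim)
    ]


# Hypothetical path graph a-b-c-d-e; the subgraph of interest is {a, b}.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
embeddings = {
    "a": [0.0, 0.0],
    "b": [0.0, 0.0],
    "c": [1.0, 0.0],
    "d": [0.0, 1.0],
    "e": [0.0, 0.0],
}
message = anchor_message({"a", "b"}, graph, embeddings, num_anchors=3, rng=random.Random(0))
print(message)
```

With all three outside nodes sampled, c, d, and e lie 1, 2, and 3 hops from the subgraph, so the message is 6/13 and 4/13 (roughly 0.46 and 0.31): the nearby anchor c dominates. Sampling fewer anchors than candidates makes the message stochastic, which is why a fixed `random.Random` seed is passed in.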


Identifying mechanisms of antibiotic resistance

Matthew G. Durrant, Michelle M. Li, Benjamin A. Siranosian, Stephen B. Montgomery, Ami S. Bhatt
Cell Host & Microbe 2020

A Bioinformatic Analysis of Integrative Mobile Genetic Elements Highlights Their Role in Bacterial Adaptation

Mobile genetic elements (MGEs) contribute to bacterial adaptation and evolution; however, high-throughput, unbiased MGE detection remains challenging. We describe MGEfinder, a bioinformatic toolbox that identifies integrative MGEs and their insertion sites by using short-read sequencing data. MGEfinder identifies the genomic site of each MGE insertion and infers the identity of the inserted sequence. We apply MGEfinder to 12,374 sequenced isolates of 9 prevalent bacterial pathogens, including Mycobacterium tuberculosis, Staphylococcus aureus, and Escherichia coli, and identify thousands of MGEs, including candidate insertion sequences, conjugative transposons, and prophage elements. The MGE repertoire and insertion rates vary across species, and integration sites often cluster near genes related to antibiotic resistance, virulence, and pathogenicity. MGE insertions likely contribute to antibiotic resistance in laboratory experiments and clinical isolates.

Links: GitHub Repository

Xi Yang, Marjan M. Hashemi, Nadya Andini, Michelle M. Li, …, Samuel Yang
Journal of Antimicrobial Chemotherapy 2020

RNA markers for ultra-rapid molecular antimicrobial susceptibility testing in fluoroquinolone-treated Klebsiella pneumoniae

Traditional antimicrobial susceptibility testing (AST) is growth-dependent and time-consuming. With rising rates of drug-resistant infections, a novel diagnostic method is critically needed that can rapidly reveal a pathogen’s antimicrobial susceptibility to guide appropriate treatment. We used RNA sequencing to investigate the potential of RNA markers for rapid molecular AST, using Klebsiella pneumoniae and ciprofloxacin as a model. We identified RNA signatures that were induced or suppressed following exposure to ciprofloxacin. Significant shifts at the transcript level were observed as early as 10 min after antibiotic exposure. Our results suggest that RNA signatures are a promising approach to AST development that could enable faster clinical diagnosis and treatment of infectious disease. This approach is potentially applicable to other models, including other pathogens exposed to different classes of antibiotics.
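As a rough illustration of what it means for a transcript to be "induced or suppressed" after antibiotic exposure, the sketch below computes a log2 fold change between exposed and unexposed samples and labels each transcript. This is a simplified assumption, not the study's pipeline: the gene names and counts are hypothetical, counts are assumed to be already normalized, the pseudocount of 1 and the |log2 fold change| ≥ 1 cutoff are arbitrary choices, and a real analysis would use replicates and a statistical test.

```python
import math
from typing import Dict


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of exposed to unexposed abundance; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify_transcripts(
    treated: Dict[str, float],
    control: Dict[str, float],
    threshold: float = 1.0,
) -> Dict[str, str]:
    """Label each transcript as induced, suppressed, or unchanged (hypothetical cutoff)."""
    calls = {}
    for gene in treated:
        lfc = log2_fold_change(treated[gene], control[gene])
        if lfc >= threshold:
            calls[gene] = "induced"
        elif lfc <= -threshold:
            calls[gene] = "suppressed"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical normalized counts after vs. before antibiotic exposure.
treated = {"gene_x": 63.0, "gene_y": 7.0, "gene_z": 10.0}
control = {"gene_x": 15.0, "gene_y": 31.0, "gene_z": 11.0}
calls = classify_transcripts(treated, control)
print(calls)  # {'gene_x': 'induced', 'gene_y': 'suppressed', 'gene_z': 'unchanged'}
```

Here `gene_x` rises fourfold (log2 fold change of 2), `gene_y` falls fourfold (log2 fold change of -2), and `gene_z` barely moves; a panel of such early-responding transcripts is the kind of signature the abstract describes.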


Characterizing gut and vaginal microbiomes to assess effects of intervention

Jennifer Dawkins*, Joyce B. Kang*, Michelle M. Li*, Aditya Misra*, Jon Arizti Sanz*, Tami Lieberman
Recruitment Closed
* Equal contribution (alphabetical)

Assessment of clothing breathability on vaginal microbiome and yeast-bacterial interactions

There is much “conventional wisdom” about the adverse effects of frequently wearing tight clothing and non-breathable undergarments; however, no controlled scientific study has yet assessed whether non-breathable undergarments and clothing adversely affect the microbiome. A healthy vaginal microbiome is important because bacterial vaginosis (BV) and yeast infections commonly arise when the microbiome is thrown off balance. These conditions are not only uncomfortable; BV in particular is associated with higher rates of preterm birth and reduced protection against STDs. Still, little is known about why BV or fungal overgrowth (yeast infections) develops. One possible way clothing choice could affect the microbiome is by changing temperature, humidity, and acidity, which we plan to investigate through this study. Specifically, we would like to compare the effects of breathable and non-breathable clothing/undergarments on the vaginal microbiome and mycobiome (fungi), and use the data gathered to look at interactions between bacteria and fungi (e.g., Candida).

Links: Project website

Christopher J. Severyn, Benjamin A. Siranosian, Sandra Tian-Jiao Kong, Angel Moreno, Michelle M. Li, …, Ami S. Bhatt, Jennifer S. Whangbo
JCI Insight 2022

Microbiota dynamics in a randomized trial of gut decontamination during allogeneic hematopoietic cell transplantation

Gut decontamination (GD) can decrease the incidence and severity of acute graft-versus-host disease (aGVHD) in murine models of allogeneic hematopoietic cell transplantation (HCT). Several HCT centers routinely practice GD with different antibiotic regimens. In this pilot study, we examined the impact of GD on gut microbiome composition and the incidence of aGVHD in HCT patients. We randomized 20 pediatric patients undergoing allogeneic HCT to receive (GD) or not receive (no-GD) oral vancomycin-polymyxin B from day -5 through neutrophil engraftment. We used shotgun metagenomic sequencing of serial stool samples to compare the composition and diversity of the gut microbiome between study arms. We assessed clinical outcomes in the 2 arms and performed strain-specific analyses of pathogens that caused bloodstream infections (BSI). While GD did not differentially impact Shannon diversity or clinical outcomes, our findings suggest that GD may protect against gut-derived BSI in HCT patients by decreasing the prevalence or abundance of gut microbial pathogens.
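Shannon diversity, the measure used above to compare study arms, is commonly computed as H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i in a sample. The sketch below assumes natural logarithms and hypothetical per-taxon abundances; the study's exact profiling tool and settings are not reproduced here, and some tools use log base 2 instead. Higher H means abundance is spread more evenly across more taxa.

```python
import math
from typing import Sequence


def shannon_diversity(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:  # zero-abundance taxa contribute nothing (0 * ln 0 is taken as 0)
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical samples: four equally abundant taxa vs. one dominant taxon.
even = shannon_diversity([25, 25, 25, 25])  # equals ln 4, about 1.386
dominated = shannon_diversity([97, 1, 1, 1])
print(round(even, 3), round(dominated, 3))
```

Both hypothetical samples contain the same four taxa, but the one dominated by a single taxon scores far lower, which is why a treatment that lets a few organisms take over would be expected to reduce Shannon diversity.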