T-/B-Cell Receptor (TCR/BCR) Embedding

Heewook Lee, Pengfei Zhang (2024-25).

Background

T-cell and B-cell receptors (TCRs and BCRs) determine what a T or B cell binds to and, by extension, what the immune system recognizes and combats. Language models can be a useful tool for predicting which epitopes TCRs and BCRs may bind.

Research Goals

We are interested in devising new language model architectures that embed TCR/BCR sequences more effectively. Students will build and test various embedding architectures using popular models such as GRU, LSTM, and Transformer. Specifically, students will (1) identify testable hypotheses that vary the training objective (masked language modeling vs. next-token prediction) and the model architecture, (2) implement embedding models, (3) train embedding models, and (4) evaluate the embeddings by training and testing downstream task models.
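
As a rough illustration of how the two training objectives differ, the sketch below builds (input, label) pairs from a receptor amino-acid sequence using only the Python standard library. Everything in it is an assumption made for illustration rather than part of the project: the special-token names, the 15% masking rate, the helper names (encode, next_token_example, masked_lm_example), and the use of -100 as the "ignore" label, which is chosen to match the default ignore_index of PyTorch's cross-entropy loss.

```python
import random

# The 20 standard amino acids plus special tokens (token names are assumptions).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK, BOS, EOS = "<pad>", "<mask>", "<bos>", "<eos>"
VOCAB = {tok: i for i, tok in enumerate([PAD, MASK, BOS, EOS, *AMINO_ACIDS])}
IGNORE = -100  # label meaning "do not score this position" (assumed convention)


def encode(seq):
    """Map an amino-acid string to token ids, wrapped in <bos> and <eos>."""
    return [VOCAB[BOS]] + [VOCAB[aa] for aa in seq] + [VOCAB[EOS]]


def next_token_example(ids):
    """Next-token objective: the label at position t is the token at t + 1."""
    return ids[:-1], ids[1:]


def masked_lm_example(ids, mask_prob=0.15, rng=random):
    """Masked-LM objective: hide some residues and score only the hidden ones."""
    inputs, labels = list(ids), [IGNORE] * len(ids)
    special = {VOCAB[BOS], VOCAB[EOS]}
    for i, tok in enumerate(ids):
        # Never mask the boundary tokens; mask each residue with prob. mask_prob.
        if tok not in special and rng.random() < mask_prob:
            inputs[i] = VOCAB[MASK]
            labels[i] = tok
    return inputs, labels


if __name__ == "__main__":
    # A made-up, CDR3-like string used only for illustration (not real data).
    ids = encode("CASSLGQAYEQYF")
    print(next_token_example(ids))
    print(masked_lm_example(ids, rng=random.Random(0)))
```

Either kind of pair could then be fed to a GRU, LSTM, or Transformer encoder. Keeping the data pipeline identical and changing only the objective or the architecture is one way to keep the comparison in step (1) controlled.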

Skills Needed

Linux, Python

Skills Gained

Python, PyTorch, ML model training/testing/validation