Background
T-cell and B-cell receptors define what they will bind to and, by extension, what an immune system recognizes and combats.
Language models are a promising tool for predicting which epitopes TCRs and BCRs bind.
Research Goals
We are interested in devising new language model architectures to more effectively embed TCR/BCR sequences.
Students will build and test various embedding architectures using popular model families such as GRUs, LSTMs, and Transformers.
Specifically, students will (1) identify testable hypotheses by varying the model objective (masked language modeling vs. next-token prediction) as well as the model architecture, (2) implement embedding models, (3) train embedding models, and (4) evaluate them by training/testing downstream task models.
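As a concrete illustration of steps (2) and (3), the sketch below shows one candidate embedding architecture trained with a masked-language objective. All names, dimensions, and the toy CDR3-like sequences are hypothetical choices for illustration, not a prescribed design; students would compare alternatives such as LSTM or Transformer encoders and a next-token objective.

```python
# Minimal sketch (hypothetical): a bidirectional GRU encoder over amino-acid
# tokens, trained with a masked-language objective, that also produces a
# fixed-size sequence embedding for downstream tasks.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 0, 1  # special token ids (assumed convention)
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB_SIZE = len(VOCAB) + 2

class GRUEmbedder(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        self.gru = nn.GRU(d_model, d_model, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * d_model, VOCAB_SIZE)  # per-token logits for MLM

    def forward(self, tokens):
        h, _ = self.gru(self.embed(tokens))
        return self.head(h)  # (batch, seq_len, vocab)

    def sequence_embedding(self, tokens):
        # Mean-pool hidden states over non-pad positions -> fixed-size vector
        h, _ = self.gru(self.embed(tokens))
        mask = (tokens != PAD).unsqueeze(-1).float()
        return (h * mask).sum(1) / mask.sum(1).clamp(min=1)

def encode(seq, max_len=20):
    ids = [VOCAB[aa] for aa in seq][:max_len]
    return torch.tensor(ids + [PAD] * (max_len - len(ids)))

# One masked-LM training step on a toy batch of CDR3-like sequences
model = GRUEmbedder()
batch = torch.stack([encode("CASSLGQAYEQYF"), encode("CASSIRSSYEQYF")])
masked = batch.clone()
masked[:, 3] = MASK  # hide one residue; the model must reconstruct it
logits = model(masked)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), batch.reshape(-1), ignore_index=PAD
)
loss.backward()  # gradients for an optimizer step
emb = model.sequence_embedding(batch)  # (2, 128) embeddings for downstream models
```

Evaluation (step 4) would then feed `sequence_embedding` outputs into a downstream classifier, e.g. predicting epitope binding, and compare architectures and objectives on held-out data.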
Skills Needed
Linux, Python
Skills Gained
Python, PyTorch, ML model training/testing/validation