AI Campus @ Cedars-Sinai

Project Details

Summary:


Natural Language Processing in Cancer: Extracting Diagnostic Insights from Pathology Reports

This project applies natural language processing (NLP) techniques to extract critical information from pathology reports. Participants will gain hands-on experience with a range of text classification methods, from traditional TF-IDF analysis to large language models (LLMs). The Cancer Genome Atlas (TCGA) pathology report corpus will be used to develop NLP methods that can ultimately improve patient diagnosis, treatment selection, and cancer care.

Description:


This project applies natural language processing (NLP) techniques to extract critical information from pathology reports. Participants will gain hands-on experience with a range of text classification methods, from traditional TF-IDF analysis to large language models (LLMs). The Cancer Genome Atlas (TCGA) pathology report corpus used in this project offers an opportunity to develop NLP methods that can ultimately improve patient diagnosis, treatment selection, and many other aspects of cancer care.
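
To give a feel for the traditional end of that range, here is a minimal sketch of TF-IDF text classification using only the Python standard library. It is an illustration under stated assumptions, not the project's actual pipeline: the four training snippets are invented stand-ins for report text (they are not TCGA data), the labels "breast" and "lung" are hypothetical, and the nearest-centroid classifier is just one simple choice among many.

```python
import math
from collections import Counter

# Invented toy snippets standing in for report text (NOT TCGA data).
TRAIN = [
    ("invasive ductal carcinoma of the breast with lymph node metastasis", "breast"),
    ("breast lumpectomy specimen showing ductal carcinoma in situ", "breast"),
    ("adenocarcinoma of the lung in the right upper lobe", "lung"),
    ("lung wedge resection with squamous cell carcinoma", "lung"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vector(tokens, idf):
    """Unit-length TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: count * idf[t] for t, count in tf.items()})


def train_centroids(examples):
    """Return the idf table and one unit-length centroid per label."""
    docs = [tokenize(text) for text, _ in examples]
    idf = fit_idf(docs)
    sums = {}
    for tokens, (_, label) in zip(docs, examples):
        total = sums.setdefault(label, Counter())
        for t, w in tfidf_vector(tokens, idf).items():
            total[t] += w
    return idf, {label: normalize(total) for label, total in sums.items()}


def predict(text, idf, centroids):
    """Pick the label whose centroid has the highest cosine similarity."""
    vec = tfidf_vector(tokenize(text), idf)

    def cosine(centroid):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


idf, centroids = train_centroids(TRAIN)
print(predict("squamous cell carcinoma of the lung", idf, centroids))
```

On real reports one would typically reach for an established library implementation and a held-out evaluation set; the point here is only to show how term weighting turns free text into vectors that a classifier can compare.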

 

Datasets

 

Code

 

Publications

  • Kefeli, J., Berkowitz, J., Acitores Cortina, J.M. et al. Generalizable and automated classification of TNM stage from pathology reports with external validation. Nat Commun 15, 8916 (2024). https://doi.org/10.1038/s41467-024-53190-9
  • Kefeli, J., Tatonetti, N. TCGA-Reports: A machine-readable pathology report resource for benchmarking text-based AI models. Patterns 5(3), 100933 (2024). https://doi.org/10.1016/j.patter.2024.100933

 

Project Prepared By:
Guillermo Lopez Garcia – Guillermo.LopezGarcia@cshs.org
Takeshi Onishi – Takeshi.Onishi@cshs.org
