AI Campus

Summary:

Graph Neural Networks for Social and Biological Network Analysis

This project introduces graph neural networks (GNNs), with a focus on embedding actors within social or biological/cellular networks. The aim is to understand how GNNs capture and represent complex relationships in non-Euclidean data such as social networks. We will explore how well these embeddings represent the connections and interactions between individuals or entities for predictive analytics.

Description:

This project introduces graph neural networks (GNNs), with a focus on embedding actors within social or biological/cellular networks. The aim is to understand how GNNs capture and represent complex relationships in non-Euclidean data such as social networks. We will explore how well these embeddings represent the connections and interactions between individuals or entities for predictive analytics. GNNs will also be compared with other state-of-the-art community detection and social influence models.

Project Goals

Introduce students to the basics and advanced concepts of Graph Neural Networks (GNNs).
Analyze and model relationships in both social and biological networks using GNNs.
Compare the effectiveness of GNNs with traditional approaches, such as community detection algorithms and logistic regression.
Provide hands-on experience with various GNN models, including Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs).

Summary of Learning Objectives

Graph theory fundamentals.
Python programming for network analysis.
Machine learning techniques for network data.
Traditional and deep learning network analysis methods.
Practical application of GNNs to real-world datasets in social and biological contexts.

ML/Data Science Methods

Graph Theory: Learn the basics of nodes, edges, and attributes, and how they represent real-world entities and their relationships.
Traditional Network Analysis: Apply methods like centrality measures, community detection, and network visualization.
Deep Learning Network Analysis: Implement and compare Graph Neural Networks such as GCNs and GATs.
Comparison with Traditional Methods: Use logistic regression and other algorithms that do not account for network structure to highlight the benefits of GNNs.
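As a starting point for the traditional network analysis methods listed above, here is a minimal sketch that computes a centrality measure and detects communities with NetworkX. It uses Zachary's karate club graph, a small social network bundled with NetworkX, as a stand-in; it is not one of the project datasets, and greedy modularity maximization is only one of several community detection algorithms you could try.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Zachary's karate club: a small social network that ships with NetworkX.
G = nx.karate_club_graph()

# Degree centrality: fraction of the other nodes each node is directly connected to.
degree_centrality = nx.degree_centrality(G)
top_nodes = sorted(degree_centrality, key=degree_centrality.get, reverse=True)[:3]

# Community detection by greedy modularity maximization.
communities = greedy_modularity_communities(G)
modularity_score = modularity(G, communities)

print("Most central nodes:", top_nodes)
print("Number of communities:", len(communities))
print("Modularity:", round(modularity_score, 3))
```

Centrality scores and community memberships computed this way can also serve as hand-crafted node features for the non-GNN baselines.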

Datasets

Physician Sharing Network: Two different networks. The tutorial focuses on physicians connected based on survey data describing their professional and personal interactions. See https://schochastics.github.io/networkdata/reference/physicians.html .
GNNs in Neuroscience: BrainGNN
Lymphocyte Detection in Tissue Slides: Contains tissue histology images with localized immune cells tagged based on their imaging/immunofluorescence features. See https://proceedings.mlr.press/v194/reddy22a.html .
Fake News Detection on Twitter: Contains data on fake news propagation patterns on Twitter. See https://github.com/safe-graph/GNN-FakeNews and https://www.sciencedirect.com/science/article/pii/S1568494623002533 .
Neuroscience Connectome Analysis: Extract time-dependent correlations in fMRI to understand how connectivity evolves over time.

Getting Started Guide

Prerequisites

Basic understanding of Python, machine learning, and network analysis. R is optional.
See the paper linked from: https://github.com/jlevy44/GCN4R

Environment Setup

Install Python and necessary libraries:

pip install networkx pandas numpy torch opencv-python matplotlib seaborn scikit-learn plotly rpy2 cdlib libpysal spreg captum pysnooper fire

Install PyTorch Geometric (the torch_geometric package).
Recommended IDE: Jupyter Notebook or VS Code.

Data Preparation and Tasks

Physician Innovation Network:

Download the dataset from Physician Sharing Network Dataset.
Load and preprocess the data using R/pandas. Already provided in
Construct the physician network graph using NetworkX.
Task: Predict the year physicians adopted the new technology. Compare GNNs with classical logistic regression and other non-graph based methods.
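The sketch below illustrates one way to go from an adjacency matrix to a NetworkX graph and to build two feature tables for comparison: one that ignores the network and one that appends neighbor information. All values here are made up for illustration; the adjacency matrix, the single covariate, and the adoption labels are hypothetical and do not reflect the layout of the actual physician data.

```python
import numpy as np
import networkx as nx

# Hypothetical toy data: 5 physicians, symmetric 0/1 adjacency, one covariate each.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
X = np.array([[0.2], [0.5], [0.9], [0.4], [0.1]])  # made-up node features
adoption_period = np.array([1, 1, 2, 3, 3])         # made-up labels

# Build the graph and attach features and labels as node attributes.
G = nx.from_numpy_array(A)
for node in G.nodes:
    G.nodes[node]["x"] = X[node]
    G.nodes[node]["y"] = int(adoption_period[node])

# Graph-free baseline input: each physician's own covariates only.
X_baseline = X

# Graph-aware input: append the mean covariate of each physician's neighbors.
degree = A.sum(axis=1, keepdims=True)
X_neighbors = (A @ X) / np.maximum(degree, 1)  # guard against isolated nodes
X_graph = np.hstack([X, X_neighbors])
```

Fitting the same classifier (for example, logistic regression) on X_baseline and on X_graph gives a quick check of whether network structure adds predictive signal before moving to a full GNN.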

Lymphocyte Detection in Tissue Slides:

Download the dataset from Lymphocyte Detection Dataset.
Load and preprocess the images using OpenCV.
Construct the spatial network of cells using NetworkX.
Description: Nodes: Cells, Attributes: CNN features, Edges: K-nearest neighboring cells.
Task: Predict whether a cell is immune or non-immune based on its spatial features. Compare GNNs with traditional image classification models, MLPs, and similar baselines.
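The K-nearest-neighbor edges described above can be sketched with plain NumPy as follows. The function name knn_edges and the centroid coordinates are hypothetical; real cell positions would come from the dataset's annotations.

```python
import numpy as np

def knn_edges(coords, k):
    """Return directed (i, j) pairs linking each point to its k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(coords)) for j in nearest[i]]

# Hypothetical cell centroids in pixel coordinates.
centroids = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
edges = knn_edges(centroids, k=2)
print(edges)
```

The resulting pairs can be passed to a NetworkX graph with add_edges_from. The dense distance matrix grows quadratically with the number of cells, so for whole slides a spatial index (for example, scikit-learn's kneighbors_graph) is likely a better choice.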

Fake News Detection on Twitter:

Download the dataset from Fake News Detection Dataset.
Load and preprocess the data using pandas.
Construct the social network graph using NetworkX.
Description: Nodes: Users, Edges: Retweets, Attributes: User profile and/or BERT-derived embeddings of tweets.
Task: Determine whether news is fake based on the tweet text content and information from the retweeting pattern. Does the retweeting pattern inform whether news is fake? Compare GNNs with traditional text classification models applied to the original tweet.
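One simple way to probe whether the retweeting pattern carries signal, before training a GNN, is to summarize each cascade with a few structural features and feed them to a standard classifier. The sketch below is an illustration under assumptions: the function cascade_features, the (source_user, retweeting_user) edge format, and the choice of features are all hypothetical and are not taken from the dataset's documentation.

```python
from collections import defaultdict, deque

def cascade_features(root, retweet_edges):
    """Summarize a retweet cascade given (source_user, retweeting_user) pairs."""
    children = defaultdict(list)
    for source, retweeter in retweet_edges:
        children[source].append(retweeter)

    # Breadth-first search from the original poster to get each user's depth.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for child in children[user]:
            if child not in depth:
                depth[child] = depth[user] + 1
                queue.append(child)

    return {
        "size": len(depth),                      # users reached, including the root
        "max_depth": max(depth.values()),        # longest retweet chain
        "root_out_degree": len(children[root]),  # direct retweets of the original post
    }

# Hypothetical cascade: "a" posts, "b" and "c" retweet "a", "d" retweets "b".
features = cascade_features("a", [("a", "b"), ("a", "c"), ("b", "d")])
print(features)
```

If a classifier built on features like these already separates fake from real news, that suggests the propagation structure is informative, and a GNN over the full graph is worth comparing against the text-only baseline.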

GNNs in Neuroscience:

Download the dataset from GNNs in Neuroscience
Run through the notebook above; you may need to alter the code for the latest package versions.
Description: Nodes are local brain regions of interest, each carrying its fMRI signal. Edges are defined by the correlation between the fMRI signals within a fixed time period. The graph evolves over time as the correlations within fixed intervals change.
Goal: Predict the age of the individual based on the evolving brain connectivity patterns. Compare GNNs with traditional time-series analysis methods, image analysis methods, average correlations, etc. Interpret models with Captum, GNNExplainer, etc. What are the important time-points/connections?
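The time-evolving graph described above can be sketched as a sequence of correlation matrices computed over sliding windows. This is a minimal illustration on synthetic signals; the function name, the window length, the step, and the rule of thresholding the absolute correlation to obtain edges are all assumptions, not the method used in the referenced notebook.

```python
import numpy as np

def windowed_correlation_graphs(signals, window, step, threshold):
    """Build one 0/1 adjacency matrix per time window.

    signals: array of shape (n_timepoints, n_regions).
    Returns an array of shape (n_windows, n_regions, n_regions).
    """
    signals = np.asarray(signals, dtype=float)
    graphs = []
    for start in range(0, signals.shape[0] - window + 1, step):
        segment = signals[start:start + window]
        corr = np.corrcoef(segment, rowvar=False)  # region-by-region Pearson correlation
        adj = (np.abs(corr) >= threshold).astype(int)
        np.fill_diagonal(adj, 0)                   # drop self-loops
        graphs.append(adj)
    return np.stack(graphs)

# Synthetic stand-in for region-of-interest time series: 120 timepoints, 4 regions.
rng = np.random.default_rng(0)
toy_signals = rng.standard_normal((120, 4))
# Make regions 0 and 1 co-vary so that they share an edge in every window.
toy_signals[:, 1] = toy_signals[:, 0] + 0.1 * rng.standard_normal(120)

graphs = windowed_correlation_graphs(toy_signals, window=30, step=30, threshold=0.8)
print(graphs.shape)
```

Averaging the correlation matrices over all windows gives the "average correlations" baseline mentioned above, while the per-window graphs are the input a temporal GNN would consume.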

Model Implementation

Graph Neural Networks:
- Implement basic GNN models using PyTorch Geometric.
- Compare GCNs and GATs in terms of performance and interpretability.
Analysis and Visualization:
- Apply network analysis, compare the predictive techniques, and visualize the results using Matplotlib and Seaborn.
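To make the GCN models above less of a black box, the sketch below writes out a single graph convolution in plain NumPy, following the propagation rule from Kipf and Welling's GCN paper: ReLU(D^-1/2 (A + I) D^-1/2 H W). PyTorch Geometric's GCNConv layer is built around the same rule, with learned weights; the toy graph, features, and random weights here are placeholders for illustration only.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees are >= 1 after self-loops
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy path graph 0 - 1 - 2 with 2 input features and 3 hidden units.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))                    # random stand-in for learned weights

out = gcn_layer(A, H, W)
print(out.shape)
```

Each output row mixes a node's own features with those of its neighbors, weighted by fixed degree-based coefficients. A GAT differs mainly in replacing those fixed coefficients with attention weights learned from the node features, which is what makes its edge weights a candidate for interpretation.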

Data Download Links

Physician Innovation Network Dataset, see X/A_physician.csv or load in R and convert to networkx via R Data
Lymphocyte Detection Dataset
Fake News Detection Dataset

Project Files

Download Project Files (ZIP)

Getting Started

Check out tutorial notebooks here: Tutorial Notebooks

Project Prepared By:

Joshua Levy - GitHub | LinkedIn | LevyLab

Tags:

deep learning
