Using gene and phenotype ontologies to uncover hidden drug side effects that standard safety reporting systems miss.
Think of it like this: When you take a medication, it's designed to hit one specific target in your body. But drugs are imprecise: they can also interact with other proteins and biological systems, causing unexpected side effects. These are called off-target effects. Our project builds a system that can predict which side effects are likely and why they happen, using biology as the explanation.
The FDA's FAERS database tracks which side effects get reported after a drug is approved — but it only tells us what happened, not why. A drug causing 500 liver injury reports doesn't tell us which biological pathway caused it.
Some dangerous side effects are rarely reported, not simply because they don't happen, but because they're hard to connect to a drug, or simply overlooked. Our framework uses biological pathway knowledge to surface these hidden signals before they become clinical problems.
Can we use structured knowledge about genes and biological processes to predict drug side effects, and to explain why they happen, beyond what frequency-based reporting alone can reveal?
In simpler terms: instead of just counting how often side effects are reported, can we trace the molecular chain of events, from a drug binding to a protein, to a biological process going wrong, to a patient experiencing harm?
We built a knowledge graph that maps the full chain from a drug's molecular interactions to real-world side effects. Instead of treating these as unrelated data points, we trace the biological route that connects them.
The map analogy: Imagine navigating a city. You could guess traffic patterns by counting cars on a road (FAERS frequency). Or you could use an actual map of streets, highways, and traffic lights (our knowledge graph). The map tells you why the traffic is there — not just that it is.
Each arrow represents a known biological relationship. By chaining them together, we can reason about why a drug might cause a particular side effect, not just correlate the two.
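The chaining idea can be sketched in a few lines of Python. This is a hypothetical miniature of the knowledge graph, not the project's actual schema: node names, relation labels, and the fixed drug-to-protein-to-process-to-theme depth are illustrative assumptions.

```python
# Hypothetical miniature knowledge graph: adjacency lists of (relation, target)
# pairs. All entity names and relation labels are illustrative placeholders.
EDGES = {
    "Pralsetinib": [("binds", "RET"), ("binds", "JAK2")],
    "RET": [("participates_in", "GO:apoptotic_process")],
    "JAK2": [("participates_in", "GO:cytokine_signaling")],
    "GO:apoptotic_process": [("maps_to", "Cell Death/Apoptosis")],
    "GO:cytokine_signaling": [("maps_to", "Immune Toxicity")],
}

def mechanistic_paths(drug):
    """Enumerate drug -> protein -> GO process -> toxicity theme chains."""
    paths = []
    for _rel1, protein in EDGES.get(drug, []):
        for _rel2, process in EDGES.get(protein, []):
            for _rel3, theme in EDGES.get(process, []):
                paths.append((drug, protein, process, theme))
    return paths

for path in mechanistic_paths("Pralsetinib"):
    print(" -> ".join(path))
```

Each returned tuple is one candidate explanation: the drug binds a protein, that protein participates in a biological process, and that process maps to an observable toxicity theme.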
We combined four publicly available databases to build our reasoning network and validate our findings, each contributing a different layer of knowledge.
Tells us which proteins Pralsetinib binds to inside the body. Think of proteins as locks: DrugBank maps which locks this drug's key fits into, including ones it wasn't designed for. We identified 11 protein targets for Pralsetinib.
A standardized biological dictionary that categorizes what each protein does, whether it's involved in cell death, immune signaling, or DNA repair. We parsed 38,736 GO terms and 13,909 parent-child relationships to build the knowledge graph.
A database that maps individual human proteins to specific GO biological process terms. While GO defines what processes exist, GOA tells us which proteins participate in each process. Used for both the knowledge graph (159 GO terms across 3 primary targets) and as the statistical background for the mechanistic enrichment analysis (34,737 annotated human proteins).
The FDA's real-world side effect database. After a drug hits the market, doctors, patients, and companies submit reports of observed adverse events. We extracted 2,196 reports across 200 distinct adverse events for Pralsetinib; this is our ground-truth frequency signal.
This interactive visualization shows Pralsetinib's full web of connections: which proteins it touches, which biological processes those proteins control, and which side effects emerge at the end of those chains. Click any node to explore its neighborhood. Colors represent different types of entities in the network.
We built a progressive pipeline starting from a simple frequency count, adding biological reasoning, then directly measuring how independent those two signals are. A fourth model uses the full knowledge graph topology to predict drug-target relationships using graph neural network embeddings.
"What side effects are reported most often?"
We rank side effect themes purely by how many reports appear in the FDA database. This is the standard approach: useful, but ultimately limited. High counts may reflect reporting bias, not actual biological risk.
"What does biology say should be risky?"
We combine FAERS report frequency with the number of biological pathway connections in our knowledge graph. A side effect with few reports but many biological links gets elevated, revealing potentially underreported signals.
"Are biology and reporting measuring different things?"
We fit two logistic regression models: one using only knowledge graph features, one adding FAERS frequency. We evaluated both with leave-one-out cross-validation (LOOCV). The near-chance LOOCV AUC of the KG-only model showed that the scientific value of KG features lies in their structural independence from FAERS, not in replacing it.
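A minimal sketch of the LOOCV setup, assuming scikit-learn is available. The features and labels below are random stand-ins for the real KG features and seriousness labels; with only 13 themes, each fold trains on 12 and predicts the held-out one, and the AUC is computed over the pooled held-out predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

# Synthetic stand-ins: 13 themes x 3 KG features (the real pipeline uses
# features like path_count and go_overlap_ratio; these are random placeholders).
rng = np.random.default_rng(0)
X = rng.random((13, 3))
y = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1])  # placeholder labels

# Hold out each theme in turn, train on the other 12, score the held-out one.
probs = np.zeros(len(y))
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    probs[test_idx] = model.predict_proba(X[test_idx])[:, 1]

auc = roc_auc_score(y, probs)  # near 0.5 expected for uninformative features
print(f"LOOCV AUC: {auc:.3f}")
```

With random features, the pooled AUC hovers near chance, which is exactly the diagnostic behavior the KG-only model exhibited.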
"Can the full graph topology predict drug-target relationships?"
A graph convolutional network is trained on the full knowledge graph, learning node embeddings from the entire topology of drug–protein–pathway–disease connections. It predicts drug-target relationships using the structure of the graph itself, rather than hand-crafted path features.
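The core GCN propagation rule can be illustrated without a deep learning framework. This NumPy sketch performs one symmetrically normalized message-passing step (in the style of Kipf and Welling) over a toy four-node graph and scores a candidate drug-protein link by an embedding dot product; the graph, feature sizes, and untrained random weights are all illustrative assumptions.

```python
import numpy as np

# Toy 4-node graph (0: drug, 1: protein, 2: GO term, 3: theme), undirected.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
A_hat = A + np.eye(4)                       # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
H = rng.random((4, 8))                      # initial node features
W = rng.random((8, 2))                      # weights (random, untrained here)
H_next = np.maximum(norm @ H @ W, 0.0)      # propagate + ReLU

# Link-prediction score for the drug-protein pair: embedding dot product.
score = float(H_next[0] @ H_next[1])
print(f"drug-protein link score: {score:.3f}")
```

In the real model the weights are learned by gradient descent on observed edges; the trade-off noted above is that these dense embeddings, unlike explicit paths, do not directly read out as a mechanistic explanation.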
Independent validation via Mechanistic Enrichment Analysis (Fisher's Exact Test against 34,737 GOA proteins):
The knowledge graph encodes four node types: Drug (Pralsetinib, CHEMBL4582651), Protein targets (JAK2, FLT3, RET — 11 total from DrugBank), GO Biological Process terms (159 terms), and Toxicity Themes (13 categories). Model 2 uses a Bayesian noisy-OR framework: posterior ∝ prior(FAERS) × noisy-OR(KG paths), with base_prob=0.18 and alpha_prior=1.5. Model 3A uses KG-only features (path_count, go_overlap_ratio, max_path_score, mean_path_score, theme_specificity, n_proteins, has_direct_maps_to); Model 3B adds log_faers. LOOCV holds out each of the 13 themes in turn. Model 4 applies a graph convolutional network over the full knowledge graph topology to learn node embeddings and predict drug-target relationships; interpretability is reduced compared to path-based approaches. Independent mechanistic validation applies one-sided Fisher's Exact Test against 34,737 annotated human proteins from the GO Annotation (GOA) database, with Benjamini-Hochberg FDR correction (threshold: FDR < 0.05).
Report frequency and biological pathway evidence are nearly uncorrelated (Spearman ρ ≈ 0.18) across 13 toxicity themes. This is the project's central finding: our biology-based approach isn't replicating FAERS; it's discovering genuinely new information.
A 2×2 quadrant framework classifies every theme by KG evidence vs. FAERS frequency:
Cell Death/Apoptosis was nominated by KG path scoring (rank 5, 12 paths, score 0.55) despite ranking 13th in FAERS. A completely separate Fisher's Exact Test enrichment analysis, testing Pralsetinib's 11 protein targets against 34,737 human proteins in the GO background, independently confirmed it:
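The enrichment test amounts to a one-sided Fisher's Exact Test on a 2x2 table per GO process, followed by Benjamini-Hochberg correction across processes. This sketch assumes SciPy; the contingency counts are illustrative placeholders, not the project's actual results, and the BH correction is implemented directly rather than via a library.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table for one GO process (counts illustrative only):
#                       in process        not in process
# drug targets (11)          6                   5
# other proteins        k_in_proc - 6    34737 - 11 - (k_in_proc - 6)
k_in_proc = 800                 # assumed: background proteins in the process
table = [[6, 5],
         [k_in_proc - 6, 34737 - 11 - (k_in_proc - 6)]]
odds_ratio, p = fisher_exact(table, alternative="greater")

def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values: q_i = min_{j>=i} p_(j) * m / j."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank_from_end, i in enumerate(reversed(order)):
        rank = m - rank_from_end        # 1-based rank of this p-value
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

qs = bh_qvalues([0.0004, 0.03, 0.2, 0.6])   # p-values across GO processes
print(odds_ratio, p, qs)
```

A process survives if its q-value stays below the FDR < 0.05 threshold; because the test only uses GOA annotations, it is independent of the knowledge graph's path structure.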
The LOOCV logistic regression produced near-chance AUC on KG-only features, which is expected with only 13 themes from a single drug. But this is a diagnostic finding, not a failure: when FAERS frequency is added as a feature, it captures over 95% of the model weight, completely masking the KG signal.
KG-derived mechanistic features and post-marketing report frequency are near-uncorrelated (ρ ≈ 0.18). Two independent analyses, KG path scoring and GO enrichment testing, both nominate Cell Death/Apoptosis as an underreported risk despite its rank-13 FAERS position. Together, these findings demonstrate that ontology integration surfaces complementary signal that frequency-based pharmacovigilance structurally cannot.
Most AI tools in healthcare produce a probability such as "70% chance of side effect X." This framework produces an explanation: "this drug causes side effect X because it binds protein Y, which disrupts biological process Z."
That distinction matters for clinicians, regulators, and patients who need to understand and trust safety predictions and not just receive them.
Known constraints of this study:
A reference for anyone new to the biology or data science concepts in this project.
A side effect caused by a drug interacting with proteins it wasn't designed to interact with. Like a key accidentally opening the wrong lock.
FDA Adverse Event Reporting System. A database of post-market side effect reports submitted by doctors, patients, and pharmaceutical companies after a drug is approved.
A network of connected facts represented as nodes (entities) and edges (relationships). Enables structured reasoning across large datasets by following chains of relationships.
A standardized biological dictionary that classifies what proteins do — grouping them into categories like "cell death," "immune response," or "DNA repair."
A sequence of molecular events inside a cell — like a chain reaction where protein A activates protein B, which triggers effect C. Disrupting one step can cause downstream harm.
An AI approach combining neural networks (which find patterns in data) with symbolic reasoning (which follows logical rules). It gains interpretability from the symbolic component.
Area Under the Receiver Operating Characteristic curve. A measure of classifier performance: 0.5 = random guessing, 1.0 = perfect. An AUC of 0.644 is meaningfully better than chance.
A statistical measure of how closely two rankings agree. Values near 0 mean they're essentially independent signals; near 1 means they move together perfectly.
False Discovery Rate. When running many statistical tests, FDR adjustment controls how often we'd expect a "significant" result by pure chance. A q-value < 0.05 is considered reliable.
A targeted cancer drug approved for certain types of lung and thyroid cancer. It works by blocking a specific mutated protein (RET kinase). Our case study drug.
A machine learning model that predicts a binary outcome (serious / not serious) by finding which input features best separate the two groups. Interpretable and well-suited for medical data.
Leave-one-out cross-validation holds out one data point at a time as the test case and trains on the rest. The most exhaustive form of cross-validation, used here because we only had 13 toxicity themes to work with.
A statistical test that checks whether a group of proteins is over-represented in a biological process compared to all human proteins. An odds ratio of 9.1 means Pralsetinib targets have roughly 9× higher odds of being involved in cell death than a random set of proteins.
A network where arrows flow in one direction and never loop back. The Gene Ontology is structured as a DAG — "apoptosis" is a child of "cell death," which is a child of broader biological processes. This lets us traverse the hierarchy to collect all related terms.
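Traversing the GO hierarchy to collect all related terms is a simple upward walk over the DAG. This sketch uses a hypothetical fragment of the is_a hierarchy stored as a child-to-parents map; real GO term IDs and the full parent table are omitted.

```python
# Hypothetical fragment of the GO is_a hierarchy (child -> list of parents).
PARENTS = {
    "apoptotic process": ["cell death"],
    "cell death": ["cellular process"],
    "cellular process": ["biological_process"],
}

def ancestors(term):
    """Collect every ancestor of a GO term by walking is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:      # guard against revisiting shared parents
                seen.add(parent)
                stack.append(parent)
    return seen

print(ancestors("apoptotic process"))
```

Because GO is acyclic, the walk always terminates, and the `seen` set handles the diamond shapes a DAG allows (one term reachable through multiple parents).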
A statistical analysis testing whether a drug's protein targets cluster within specific biological processes more than expected by chance. Provides independent validation that a predicted biological mechanism is real — not just an artifact of the knowledge graph structure.
DSC 180B Capstone · UC San Diego · 2025–2026