AI Liquid Biopsy Detects Cancer from Blood at Vanishingly Low Tumor Fractions
Fate-AI integrates DNA fragmentation and methylation signals to detect cancer in blood across 10 cancer types, with tissue-of-origin AUCs up to 0.97.
Summary
Researchers developed Fate-AI, a machine learning framework that analyzes cell-free DNA from blood samples to detect cancer at very low tumor fractions. The system combines two types of genomic data, DNA fragmentation patterns and methylation signals, obtained from low-pass whole-genome sequencing and cfMeDIP-seq. Tested across 1,219 plasma samples spanning 10 cancer types and multiple labs, Fate-AI outperformed the published methods it was benchmarked against. In dilution experiments it detected tumor signals at fractions as low as 1 in 100,000 DNA molecules. It also tracked disease progression longitudinally and classified tissue of origin with AUCs ranging from 0.84 to 0.97 across six cancer types. A key innovation is per-sample normalization that reduces batch effects between different labs and sequencing platforms.
Detailed Summary
Cancer liquid biopsy — detecting tumor signals in blood — holds enormous promise for early detection and monitoring, but current methods struggle when tumor-derived DNA makes up only a tiny fraction of circulating cell-free DNA (cfDNA). Tumor-informed targeted approaches offer high specificity but miss tumor evolution and acquired resistance. Tumor-naive genome-wide methods increase sensitivity but sacrifice specificity, especially at low tumor fractions. A robust method that generalizes across labs and sequencing platforms has remained elusive.
The authors developed Fate-AI (Fragmentomics Analysis for Tumor Evaluation with AI), a multimodal framework integrating fragmentomic and methylation-derived features from two complementary assays: low-pass whole-genome sequencing (LPWGS) and cell-free methylated DNA immunoprecipitation sequencing (cfMeDIP-seq). One key innovation is a knowledge-informed feature selection strategy that focuses on genomic regions recurrently altered in cancer, including copy number alteration (CNA) hotspots and tissue-specific methylation loci, rather than scanning the entire genome indiscriminately. This approach aims to combine the specificity of tumor-informed approaches with the breadth of tumor-naive methods.
A central methodological advance is a per-sample Fragment Differential Distribution (FDD) normalization, which computes the difference in fragment length distributions between frequently amplified versus frequently deleted genomic regions within the same sample. In cancer samples, FDD showed pronounced opposite-phase modulation: a positive deviation peaking at ~140 bp (mono-nucleosomal DNA without linker) and a negative deviation at ~170 bp. Healthy controls showed nearly flat profiles. Because this normalization is internal to each sample, it largely cancels out batch effects arising from different sequencing centers, reagent batches, and protocols — a persistent problem that has undermined the cross-cohort generalizability of prior methods.
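The idea behind FDD can be illustrated with a minimal Python sketch. It assumes that fragment lengths have already been extracted separately for frequently amplified and frequently deleted regions of a single sample. The function names, the 100 to 220 bp window, and the per-length normalization are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def length_distribution(lengths, min_len=100, max_len=220):
    """Fraction of fragments at each length in [min_len, max_len] (assumed window)."""
    kept = [n for n in lengths if min_len <= n <= max_len]
    if not kept:
        raise ValueError("no fragments within the length window")
    counts = Counter(kept)
    total = len(kept)
    return {n: counts.get(n, 0) / total for n in range(min_len, max_len + 1)}


def fragment_differential_distribution(amplified_lengths, deleted_lengths,
                                       min_len=100, max_len=220):
    """Per-length difference: amplified-region distribution minus deleted-region one.

    Both inputs come from the SAME sample, so sample-wide technical shifts
    affect both distributions and tend to cancel in the difference.
    """
    amp = length_distribution(amplified_lengths, min_len, max_len)
    dele = length_distribution(deleted_lengths, min_len, max_len)
    return {n: amp[n] - dele[n] for n in range(min_len, max_len + 1)}


# Toy data (invented for illustration): amplified regions carry relatively more
# short ~140 bp fragments than deleted regions do.
cancer_like = fragment_differential_distribution(
    amplified_lengths=[140] * 30 + [170] * 70,
    deleted_lengths=[140] * 10 + [170] * 90,
)
# Toy healthy-like sample: both region sets share the same length profile.
healthy_like = fragment_differential_distribution(
    amplified_lengths=[140] * 10 + [170] * 90,
    deleted_lengths=[140] * 10 + [170] * 90,
)

print(round(cancer_like[140], 3), round(cancer_like[170], 3))
print(healthy_like[140], healthy_like[170])
```

In this toy case the cancer-like profile is positive at 140 bp and negative at 170 bp, while the healthy-like profile is flat. Because each distribution is normalized to sum to one, the differences across all lengths sum to zero.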
The study evaluated Fate-AI on 1,219 plasma samples: 432 newly profiled cases (280 with both assays performed) plus 787 samples from four independent published datasets. The cohorts span colorectal, lung, breast, prostate, and pancreatic cancers, renal cell carcinoma, melanoma, Ewing sarcoma, mesothelioma, and multiple myeloma. In serial dilution experiments, Fate-AI detected tumor-derived signals at tumor fractions as low as 10⁻⁵ (one tumor DNA molecule in 100,000), substantially lower than state-of-the-art comparators. In tissue-of-origin classification, AUCs ranged from 0.84 to 0.97 across six cancer types. In longitudinal tracking, Fate-AI scores correlated with disease stage and anticipated clinical relapse months before progression was apparent on imaging or clinically.
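To give a sense of scale for a tumor fraction of 10⁻⁵, the following Python sketch works through the arithmetic. The fragment counts are hypothetical round numbers chosen for illustration, and the calculation assumes fragments are sampled independently; neither comes from the study.

```python
def expected_tumor_fragments(total_fragments, tumor_fraction):
    """Expected number of tumor-derived fragments among those sequenced."""
    return total_fragments * tumor_fraction


def prob_at_least_one(n_fragments, tumor_fraction):
    """Chance that n independently sampled fragments include a tumor-derived one."""
    return 1.0 - (1.0 - tumor_fraction) ** n_fragments


tumor_fraction = 1e-5               # 1 in 100,000, the lowest fraction reported
genome_wide_fragments = 10_000_000  # hypothetical sequencing yield (assumption)
single_locus_fragments = 50         # hypothetical coverage at one locus (assumption)

genome_wide_expected = expected_tumor_fragments(genome_wide_fragments, tumor_fraction)
single_locus_chance = prob_at_least_one(single_locus_fragments, tumor_fraction)

print(f"expected tumor fragments genome-wide: {genome_wide_expected:.1f}")
print(f"chance of any tumor fragment at one locus: {single_locus_chance:.4%}")
```

Under these assumed numbers, roughly 100 tumor-derived fragments would be present genome-wide, while any single locus has about a 0.05% chance of containing even one. This is consistent with the motivation for aggregating weak signals across many regions rather than relying on individual sites.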
The clinical implications are significant. Early-stage cancer detection currently relies on imaging and invasive biopsies; a blood test with this sensitivity could enable population screening and minimal residual disease (MRD) monitoring after treatment. The cross-cohort generalizability, validated on samples from multiple independent labs in Italy and the United States, is particularly important for real-world deployment. Limitations include the preprint status of this work (not yet peer-reviewed), the retrospective nature of most cohorts, the lack of paired assay data for some samples, and the need for prospective validation studies before clinical adoption.
Key Findings
- Fate-AI detected tumor-derived cfDNA signals at tumor fractions as low as 10⁻⁵ (1-in-100,000 molecules) in experimental dilution series
- Tissue-of-origin classification achieved AUCs ranging from 0.84 to 0.97 across six cancer types
- Evaluated on 1,219 total plasma samples spanning 10 cancer types from multiple independent labs across Italy and the USA
- 432 newly profiled cases included, with 280 having both cfMeDIP-seq and LPWGS performed in parallel
- Fragment Differential Distribution (FDD) showed a positive deviation peaking at ~140 bp and negative deviation at ~170 bp in cancer vs. near-flat profiles in healthy controls
- Longitudinal Fate-AI scores tracked disease progression and anticipated clinical relapse months before imaging-confirmed progression
- Recurrent CNA threshold of 25% captured CNAs in ≥90% of patients for most cancer types analyzed, with exceptions in Ewing sarcoma, pleural mesothelioma, and pancreatic cancer
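One plausible reading of the recurrent CNA threshold in the last finding is sketched below in Python: a region counts as recurrent when it is altered in at least 25% of patients in a reference cohort, and coverage is the share of patients carrying at least one recurrent region. This interpretation, the function names, and the toy cohort are assumptions for illustration, not the authors' procedure or data.

```python
from collections import Counter


def recurrent_regions(patient_cnas, threshold=0.25):
    """Regions altered in at least `threshold` of patients (assumed definition)."""
    n_patients = len(patient_cnas)
    counts = Counter(
        region for regions in patient_cnas.values() for region in regions
    )
    return {r for r, c in counts.items() if c / n_patients >= threshold}


def patient_coverage(patient_cnas, regions):
    """Fraction of patients with at least one alteration in `regions`."""
    covered = sum(1 for altered in patient_cnas.values() if altered & regions)
    return covered / len(patient_cnas)


# Invented toy cohort: patient id -> set of altered regions.
toy_cohort = {
    "p1": {"8q_gain", "17p_loss"},
    "p2": {"8q_gain"},
    "p3": {"17p_loss", "1q_gain"},
    "p4": {"20q_gain"},
    "p5": {"8q_gain", "17p_loss"},
}

selected = recurrent_regions(toy_cohort, threshold=0.25)
coverage = patient_coverage(toy_cohort, selected)
print(sorted(selected), coverage)
```

In the toy cohort, two regions pass the 25% threshold and together cover four of five patients. A cancer type whose alterations are spread thinly across many regions would show lower coverage at the same threshold.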
Methodology
Fate-AI integrates LPWGS and cfMeDIP-seq data using knowledge-informed feature selection targeting recurrently altered genomic regions (copy number alteration hotspots and tissue-specific methylation loci). A per-sample Fragment Differential Distribution normalization compares fragment lengths in amplified vs. deleted regions within the same sample to mitigate cross-cohort batch effects. The model was trained and externally validated on 1,219 plasma samples from multiple independent cohorts across Italy and the United States, with performance benchmarked against published state-of-the-art liquid biopsy methods using AUC as the primary metric.
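For readers less familiar with AUC, the metric can be computed directly as the probability that a randomly chosen cancer sample scores higher than a randomly chosen control, with ties counted as one half. The Python sketch below uses that pairwise definition on invented scores; it is a generic illustration, not the study's evaluation code.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented scores for illustration only.
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.5, 0.3, 0.2]
toy_auc = auc(toy_cases, toy_controls)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. In the toy example, eight of the nine case/control pairs are ranked correctly.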
Study Limitations
This is a preprint and has not yet undergone formal peer review. Most cohorts are retrospective, and the model requires prospective validation in screening populations to establish real-world performance. Not all samples had both LPWGS and cfMeDIP-seq performed, which limits some multimodal analyses. No explicit conflicts of interest were disclosed in the available text.
