Reference-free reconstruction and quantification of transcriptomes from long-read sequencing

Ivan de la Rubia, Joel A Indi, Silvia Carbonell, Julien Lagarde, M. Mar Albà, Eduardo Eyras (2020): Reference-free reconstruction and quantification of transcriptomes from long-read sequencing. In: bioRxiv, 2020.

Abstract

Single-molecule long-read sequencing provides an unprecedented opportunity to measure the transcriptome from any sample. However, current methods for the analysis of transcriptomes from long reads rely on the comparison with a genome or transcriptome reference, or use multiple sequencing technologies. These approaches preclude the cost-effective study of species with no reference available, and the discovery of new genes and transcripts in individuals underrepresented in the reference. Methods for the assembly of DNA long-reads cannot be directly transferred to transcriptomes since their consensus sequences lack the interpretability as genes with multiple transcript isoforms. To address these challenges, we have developed RATTLE, the first method for the reference-free reconstruction and quantification of transcripts from long reads. Using simulated data, transcript isoform spike-ins, and sequencing data from human and mouse tissues, we demonstrate that RATTLE accurately performs read clustering and error-correction. Furthermore, RATTLE predicts transcript sequences and their abundances with accuracy comparable to reference-based methods. RATTLE enables rapid and cost-effective long-read transcriptomics in any sample and any species, without the need of a genome or annotation reference and without using additional technologies.

BibTeX

@article{delaRubia2020,
title = {Reference-free reconstruction and quantification of transcriptomes from long-read sequencing},
author = {Ivan de la Rubia and Joel A Indi and Silvia Carbonell and Julien Lagarde and M. Mar Albà and Eduardo Eyras},
url = {https://www.biorxiv.org/content/10.1101/2020.02.08.939942v1},
year = {2020},
date = {2020-02-09},
journal = {bioRxiv},
abstract = {Single-molecule long-read sequencing provides an unprecedented opportunity to measure the transcriptome from any sample. However, current methods for the analysis of transcriptomes from long reads rely on the comparison with a genome or transcriptome reference, or use multiple sequencing technologies. These approaches preclude the cost-effective study of species with no reference available, and the discovery of new genes and transcripts in individuals underrepresented in the reference. Methods for the assembly of DNA long-reads cannot be directly transferred to transcriptomes since their consensus sequences lack the interpretability as genes with multiple transcript isoforms. To address these challenges, we have developed RATTLE, the first method for the reference-free reconstruction and quantification of transcripts from long reads. Using simulated data, transcript isoform spike-ins, and sequencing data from human and mouse tissues, we demonstrate that RATTLE accurately performs read clustering and error-correction. Furthermore, RATTLE predicts transcript sequences and their abundances with accuracy comparable to reference-based methods. RATTLE enables rapid and cost-effective long-read transcriptomics in any sample and any species, without the need of a genome or annotation reference and without using additional technologies.},
keywords = {Long read, Nanopore, transcriptome}
}