FastqPuri: high-performance preprocessing of RNA-seq data
In: BMC Bioinformatics. BioMed Central: London. e-ISSN 1471-2105
Author keywords
fastq; RNA-seq; Quality control; Preprocessing; Sequence data
Authors
- Pérez-Rubio, P.
- Lottaz, C.
- Engelmann, J.C.
Abstract
Pérez-Rubio et al., BMC Bioinformatics (2019) 20:226, https://doi.org/10.1186/s12859-019-2799-0. Software, Open Access. Paula Pérez-Rubio, Claudio Lottaz and Julia C. Engelmann.

Background: RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression in high throughput. Sequence alignment used to be a time-consuming step, but fast alignment methods, and even more so transcript counting methods that avoid mapping and quantify gene and transcript expression by evaluating whether a read is compatible with a transcript, have led to significant speed-ups in data analysis. Now the most time-consuming step in the analysis of RNA-seq data is preprocessing the raw sequence data, such as running quality control and adapter, contamination and quality filtering before transcript or gene quantification. To do so, many researchers chain different tools, but a comprehensive, flexible and fast software tool that covers all preprocessing steps is currently missing.

Results: We here present FastqPuri, a lightweight and highly efficient preprocessing tool for fastq data. FastqPuri provides sequence quality reports at the sample and dataset level, with new plots that facilitate decision making for subsequent quality filtering. Moreover, FastqPuri efficiently removes adapter sequences and sequences from biological contamination from the data. It accepts both single- and paired-end data in uncompressed or compressed fastq files. FastqPuri can be run stand-alone and is suitable for running within pipelines. We benchmarked FastqPuri against existing tools and found that FastqPuri is superior in terms of speed, memory usage, versatility and comprehensiveness.

Conclusions: FastqPuri is a new tool which covers all aspects of short read sequence data preprocessing.
It was designed for RNA-seq data to meet the need for fast preprocessing of fastq data ahead of transcript and gene counting, but it is suitable for processing any short read sequencing data that requires high sequence quality, such as for genome assembly or SNV (single nucleotide variant) detection. FastqPuri is most flexible in filtering undesired biological sequences, offering two approaches that optimize speed and memory usage depending on the total size of the potential contaminating sequences. FastqPuri is available at https://github.com/jengelmann/FastqPuri. It is implemented in C and R and licensed under GPL v3.
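The abstract names quality filtering and the removal of contaminating sequences as core preprocessing steps but does not describe how they are computed. The Python sketch below shows one generic way such filters can work; it is not FastqPuri's algorithm, API or command-line interface (FastqPuri itself is written in C and R). Assumptions: unwrapped four-line FASTQ records, Phred+33 quality encoding, and the hypothetical function names and thresholds `min_q=20`, `max_low_frac=0.1`, k-mer length `k` and `min_hit_frac=0.5`. Contamination is detected here with an exact k-mer set, a common technique chosen only for illustration.

```python
import io
from typing import Iterable, Iterator, NamedTuple, Set, TextIO


class FastqRecord(NamedTuple):
    header: str
    seq: str
    plus: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records (assumes no line wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header, seq, plus, qual)


def passes_quality(rec: FastqRecord, min_q: int = 20,
                   max_low_frac: float = 0.1, offset: int = 33) -> bool:
    """Hypothetical rule: keep a read if at most max_low_frac of its bases
    have a Phred score below min_q (Phred+33 encoding assumed)."""
    if not rec.qual:
        return False
    low = sum(1 for c in rec.qual if ord(c) - offset < min_q)
    return low / len(rec.qual) <= max_low_frac


def build_kmer_index(contaminants: Iterable[str], k: int) -> Set[str]:
    """Store every k-mer of the contaminant sequences in an exact set.
    Memory grows with the total contaminant length."""
    kmers: Set[str] = set()
    for s in contaminants:
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])
    return kmers


def is_contaminated(rec: FastqRecord, kmers: Set[str], k: int,
                    min_hit_frac: float = 0.5) -> bool:
    """Flag a read if at least min_hit_frac of its k-mers occur in the index."""
    n = len(rec.seq) - k + 1
    if n <= 0:
        return False
    hits = sum(1 for i in range(n) if rec.seq[i:i + k] in kmers)
    return hits / n >= min_hit_frac


def filter_reads(handle: TextIO, contaminants: Iterable[str], k: int = 8) -> Iterator[FastqRecord]:
    """Keep reads that pass the quality rule and are not flagged as contamination."""
    kmers = build_kmer_index(contaminants, k)
    for rec in read_fastq(handle):
        if passes_quality(rec) and not is_contaminated(rec, kmers, k):
            yield rec


# Usage: r2 fails the quality rule ('#' = Phred 2); r3 matches the contaminant.
data = "@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n" \
       "@r2\nACGTACGTACGT\n+\n############\n" \
       "@r3\nTTTTGGGGCCCCAAAA\n+\nIIIIIIIIIIIIIIII\n"
kept = [r.header for r in filter_reads(io.StringIO(data), ["TTTTGGGGCCCCAAAA"], k=8)]
print(kept)  # ['@r1']
```

Because the exact set holds every contaminant k-mer, its memory footprint scales with the size of the contaminant collection. This illustrates the kind of speed and memory trade-off the abstract refers to when it mentions two filtering approaches chosen by contaminant size; which data structures FastqPuri actually uses is not stated here.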
IMIS is developed and hosted by VLIZ.