
This page lists bioinformatics tools and software installed across several BioCommons infrastructure partner systems, including NCI's Gadi, the Australian BioCommons Tools and Workflows repository at NCI (project if89), Pawsey's Setonix, QRIScloud/UQ-RCC's Bunya, and Galaxy Australia.
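In the table, the HPC availability entries appear to follow module-style naming. Each entry is a version, often followed by an EasyBuild-style toolchain suffix such as `gcc-11.3.0` or `foss-2022a`. A trailing `(D)` presumably marks the default version, as Lmod's `module avail` does. Entries such as `2.30.0--h468198e_3` resemble BioContainers image tags, where a version is followed by `--` and a Bioconda build string. The sketch below splits one such entry into these parts. It is a minimal illustration that relies on those assumed conventions. The `ModuleVersion` type and the `parse_module_version` function are hypothetical helpers and are not provided by any of the listed systems.

```python
import re
from typing import NamedTuple, Optional


class ModuleVersion(NamedTuple):
    """One availability entry, e.g. "1.15.1-gcc-11.3.0 (D)" (hypothetical helper type)."""
    version: str
    suffix: Optional[str]  # text after the first "-", e.g. an EasyBuild toolchain such as "gcc-11.3.0"
    build: Optional[str]   # text after "--", assumed to be a Bioconda/BioContainers build string
    is_default: bool       # True when the entry carries the "(D)" default marker


# Version = everything up to the first hyphen; the optional remainder is the suffix.
_PLAIN = re.compile(r"^(?P<version>[^-]+)(?:-(?P<suffix>.+))?$")


def parse_module_version(entry: str) -> ModuleVersion:
    """Split a single version entry into its parts (assumed naming conventions)."""
    text = entry.strip()
    is_default = text.endswith("(D)")
    if is_default:
        text = text[: -len("(D)")].rstrip()
    # Container-style tags use a double hyphen: "<version>--<build>".
    if "--" in text:
        version, build = text.split("--", 1)
        return ModuleVersion(version, None, build, is_default)
    match = _PLAIN.match(text)
    if match is None:
        raise ValueError(f"unrecognised version entry: {entry!r}")
    return ModuleVersion(match.group("version"), match.group("suffix"), None, is_default)


if __name__ == "__main__":
    for sample in ["1.15.1-gcc-11.3.0 (D)", "3.5.0-foss-2022a", "2.30.0--h468198e_3", "1.9"]:
        print(sample, "->", parse_module_version(sample))
```

For example, `parse_module_version("1.15.1-gcc-11.3.0 (D)")` returns a `ModuleVersion` with version `1.15.1`, suffix `gcc-11.3.0` and `is_default=True`. Where a cell lists several versions, split it into individual entries before parsing each one.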

Please let us know if you have any feedback.

Tool metadata Availability on Australian compute infrastructures
Tool Name Description Registry link Tool identifier (e.g. module name) Topic(s) Publications Containers available? (BioContainers) License Resources / documentation Galaxy Australia NCI (Gadi) NCI (if89) Pawsey (Setonix) QRIScloud / UQ-RCC (Bunya)
3D de novo assembly (3D-DNA) is a pipeline for de novo assembly using Hi-C. 3d-dna De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds MIT 201008
Mass screening of contigs for antimicrobial resistance or virulence genes. abricate ABRicate GPL-2.0 3 tools 1.0.0-gompi-2021a
De novo genome sequence assembler using short reads. abyss 2 publications ABySS GPL-3.0 ABySS 2.3.7+galaxy0 2.2.3
Another Gff Analysis Toolkit (AGAT) Suite of tools to handle gene annotations in any GTF/GFF format. agat 10.5281/zenodo.3552717 GPL-3.0 AGAT 1.4.0+galaxy0 1.4.0
The Assisted Model Building with Energy Refinement tool refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos. amber 2 publications Generate MD topologies for small molecules 21.10+galaxy0 19.19.12 20 20-tools21 22
Consists of several independently developed packages that work well by themselves, and with Amber (Assisted Model Building with Energy Refinement) itself. The suite can also be used to carry out complete (non-periodic) molecular dynamics simulations (using NAB), with generalized Born solvent models. ambertools The Amber biomolecular simulation programs MMPBSA/MMGBSA 21.10+galaxy0
Software package specially developed for the study of genes’ primary structure. It uses gene sequences downloaded from public databases, as FASTA and GenBank, and it applies a set of statistical and visualization methods in different ways, to reveal information about codon context, codon usage, nucleotide repeats within open reading frames (ORFeome) and others. anaconda Statistical, computational and visualization methodologies to unveil gene primary structure features 2022.05
From https://anndata.readthedocs.io/en/latest/ "Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray." anndata 10.1101/2021.12.16.473007 BSD-3-Clause 4 tools
This tool retrieves annotations for a generic set of IDs using the Bioconductor annotation data packages. Supported organisms are human, mouse, rat, fruit fly and zebrafish. The org.db packages used here are primarily based on mapping with Entrez Gene identifiers. More information on the annotation packages, for example the human annotation package (org.Hs.eg.db), can be found on the Bioconductor website. annotatemyids MIT annotateMyIDs 3.18.0+galaxy0
Rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes. It integrates and cross-links with a large number of in silico secondary metabolite analysis tools that have been published earlier. antismash 5 publications antiSMASH Antismash 6.1.1+galaxy1
Convert various sequence formats to FASTA any2fasta GPL-3.0 0.4.2-gcccore-10.3.0
Apollo is a genome annotation viewer and editor. Apollo allows researchers to explore genomic annotations at many levels of detail, and to perform expert annotation curation, all in a graphical environment. apollo Apollo: a sequence annotation editor.
ARAGORN detects tRNA, mtRNA and tmRNA genes. aragorn ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences Not licensed tRNA and tmRNA 0.6
argtable 2.13
Arriba is a command-line tool to detect gene fusions from RNA-Seq data based on the STAR aligner. In addition to fusions, it can detect exon duplications/inversions and truncations of genes (i.e., breakpoints in introns and intergenic regions). Arriba is the winner of the DREAM SMC-RNA Challenge. arriba MIT 3 tools
AUGUSTUS is a eukaryotic gene prediction tool. It can integrate evidence, e.g. from RNA-Seq, ESTs or proteomics, but can also predict genes ab initio. The PPX extension to AUGUSTUS can take a protein multiple sequence alignment as input to find new members of the family in a genome. It can be run through a web interface (see https://bio.tools/webaugustus), or downloaded and run locally. augustus 9 publications Augustus Artistic-1.0 2 tools 3.4.0, 3.5.0 3.4.0, 3.5.0 3.4.0-foss-2021a, 3.5.0-foss-2022a (D) 3.4.0-foss-2021a, 3.5.0-foss-2022a (D)
AutoDock Vina is a new open-source program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance and enhanced accuracy and ease of use. autodock_vina 10.1002/jcc.21334 4 tools
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids bakta 10.1099/mgen.0.000685 GPL-3.0 Bakta 1.9.2+galaxy0
BamTools provides a fast, flexible C++ API & toolkit for reading, writing, and managing BAM files. bamtools Bamtools: A C++ API and toolkit for analyzing and managing BAM files bamtools MIT 5 tools 2.5.2 2.5.1--hd03093a_10 2.5.2-gcc-10.3.0, 2.5.2-gcc-11.3.0 (D) 2.5.2-gcc-10.3.0, 2.5.2-gcc-11.3.0 (D)
Bamutil provides a series of programs to work on SAM/BAM files. bamutil_diff An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data GPL-3.0 BamUtil diff 1.0.15+galaxy1
GUI program that allows users to interact with the assembly graphs made by de novo assemblers such as Velvet, SPAdes, MEGAHIT and others. It visualises assembly graphs, with connections, using graph layout algorithms. bandage 10.1093/bioinformatics/btv383 bandage GPL-3.0 2 tools
Predict the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S). barrnap barrnap GPL-3.0 barrnap 1.2.2
basespace 1.5.3 matlab
BBMap is a fast splice-aware aligner for RNA and DNA. It is faster than almost all short-read aligners, yet retains unrivaled sensitivity and specificity, particularly for reads with many errors and indels. bbmap bbmap BSD-3-Clause BBTools: BBduk 39.08+galaxy0 38.93 38.96--h5c4e2a8_0 38.96-gcc-10.3.0, 39.01-gcc-11.3.0 (D) 38.96-gcc-10.3.0, 39.01-gcc-11.3.0 (D)
A tool for filling the gap created by genomic data processing/analysis by rebasing some analysis results against the parent features which were originally analysed. bcbiogff Biopython: Freely available Python tools for computational molecular biology and bioinformatics 3 tools
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. bcftools 2 publications BCFtools MIT 30 tools 1.9 1.12 1.15--haf5b3da_0 1.12-gcc-10.3.0, 1.15.1-gcc-11.3.0 (D) 1.12-gcc-10.3.0, 1.15.1-gcc-11.3.0 (D)
bcl2fastq2 2.20.0-gcc-11.3.0
Beagle is a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection. beagle 4 publications Beagle GPL-3.0 5.4.22jul22.46e-java-11
BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. beagle-lib BEAGLE: An application programming interface and high-performance computing library for statistical phylogenetics GPL-3.0 3.1.2 3.1.2-gcc-11.3.0, 4.0.0-gcc-11.3.0 (D) 3.1.2-gcc-11.3.0, 4.0.0-gcc-11.3.0 (D)
The Bayesian Evolutionary Analysis Sampling Trees is a cross-platform program for Bayesian analysis of molecular sequences using MCMC (Markov chain Monte Carlo). It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. beast 3 publications BEAST 1.10.4 1.10.4
Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. It uses Markov chain Monte Carlo (MCMC) to average over tree space, so that each tree is weighted proportional to its posterior probability. It includes a graphical user interface for setting up standard analyses and a suite of programs for analysing the results. beast2 BEAST 2: A Software Platform for Bayesian Evolutionary Analysis 2.6.7
Convert a BED format file of the proteins from a proteomics search database into a tabular format for the Multiomics Visualization Platform (MVP). bed_to_protein_map Not licensed bed to protein map 0.2.0
BEDTools is an extensive suite of utilities for comparing genomic features in BED format. bedtools BEDTools: A flexible suite of utilities for comparing genomic features BEDTools GPL-2.0 39 tools 2.28.0 2.30.0--h468198e_3 2.30.0-gcc-10.3.0, 2.30.0-gcc-11.3.0 (D) 2.30.0-gcc-10.3.0, 2.30.0-gcc-11.3.0 (D)
The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras. Bellerophon is a pipeline created to remove falsely assembled chimeric transcripts in de novo transcriptome assemblies. The pipeline can be downloaded as a Vagrant virtual machine (https://app.vagrantup.com/bellerophon/boxes/bellerophon). This is recommended, as it avoids backwards-compatibility problems with TransRate. bellerophon The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras Filter and merge 1.0+galaxy1
Trim, circularise, orient and filter long read bacterial genome assemblies berokka GPL-3.0 Berokka 0.2.3
bftools 2 tools
bio-db-hts 3.01-gcc-11.3.0
bio-searchio-hmmer 1.7.3-gcc-10.3.0, 1.7.3-gcc-11.3.0 (D) 1.7.3-gcc-10.3.0, 1.7.3-gcc-11.3.0 (D)
Bio3D is an R package containing utilities for the analysis of protein structure, sequence and trajectory data. bio3d The Bio3D packages for structural bioinformatics GPL-3.0 4 tools
biobakery_workflows 3.1
Tools for early stage NGS alignment file processing including fast sorting and duplicate marking. biobambam Biobambam: Tools for read pair collation based algorithms on BAM files GPL-3.0 2.0.182
This package includes basic tools for reading biom-format files, accessing and subsetting data tables from a biom object, as well as limited support for writing a biom-object back to a biom-format file. The design of this API is intended to match the python API and other tools included with the biom-format project, but with a decidedly "R flavor" that should be familiar to R users. This includes S4 classes and methods, as well as extensions of common core functions/methods. biom-format Orchestrating high-throughput genomic analysis with Bioconductor biom-format GPL-2.0 2 tools
bionano_solve Bionano Hybrid Scaffold 3.7.0+galaxy3
A collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It provides software modules for many of the typical tasks of bioinformatics programming. bioperl An introduction to BioPerl bioperl 1.7.8-gcccore-10.3.0, 1.7.8-gcccore-11.3.0 (D) 1.7.8-gcccore-10.3.0, 1.7.8-gcccore-11.3.0 (D)
Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. biopython Biopython: Freely available Python tools for computational molecular biology and bioinformatics MIT 3 tools 1.79 1.79-foss-2021a, 1.79-foss-2022a (D) 1.79-foss-2021a, 1.79-foss-2022a (D)
BioTransformer is a freely available web server that supports accurate, rapid and comprehensive in silico metabolism prediction. biotransformer BioTransformer 3.0 - a web server for accurately predicting metabolic transformation products LGPL-3.0 BioTransformer 3.0.20230403+galaxy3
Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion. bismark Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications bismark GPL-3.0 4 tools
A tool that finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. blast 4 publications 2.10.1 2.11.0 2.13.0, 2.14.1 2.13.0, 2.14.1 2.12.0--pl5262h3289130_0 2.11.0-linux_x86_64, 2.13.0--hf3cf87c_0 2.11.0-linux_x86_64, 2.13.0--hf3cf87c_0
A tool that finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. blast+ 4 publications 16 tools 2.11.0-gompi-2021a, 2.13.0-gompi-2022a (D) 2.11.0-gompi-2021a, 2.13.0-gompi-2022a (D)
Fast, accurate spliced alignment of DNA sequences. blat BLAT - The BLAST-like alignment tool 37 3.7-gcc-11.3.0
Detect blocks of overlapping reads using a Gaussian-distribution approach. Blockbuster Evidence for human microRNA-offset RNAs in small RNA sequencing data blockbuster 0.1.2
BlockClust BlockClust 1.1.1
bolt-lmm 2.4.1-intel-2022a
Boost is a set of libraries for the C++ programming language that provides support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing. boost Other 1.71.0 1.72.0 1.77.0 1.79.0 1.80.0 1.76.0-gcc-10.3.0, 1.79.0-gcc-11.3.0 (D) 1.76.0-gcc-10.3.0, 1.79.0-gcc-11.3.0 (D)
Bowtie is an ultrafast, memory-efficient short read aligner. bowtie 3 publications Bowtie 2 tools 1.3.1-gcc-11.3.0
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes. bowtie2 6 publications Bowtie2 GPL-3.0 2.3.5.1 2.4.5--py36hd4290be_0 2.4.4-gcc-10.3.0, 2.4.5-gcc-11.3.0 (D) 2.4.4-gcc-10.3.0, 2.4.5-gcc-11.3.0 (D)
Statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. bracken Bracken: Estimating species abundance in metagenomics data GPL-3.0 Bracken 3.0+galaxy0
Pipeline for unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. braker BRAKER1: Unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS BRAKER3 3.0.6+galaxy2 3.0.3
Runs Breseq software on a set of fastq files. breseq 3 publications breseq 0.35.5+0
Provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs. busco 4 publications BUSCO Busco 5.5.0+galaxy0 5.2.1, 5.4.0 5.2.1, 5.4.0 5.4.2-foss-2021a, 5.4.5-foss-2022a (D) 5.4.2-foss-2021a, 5.4.5-foss-2022a (D)
Accelerated nanopore basecalling with SLOW5 data format. buttery-eel Accelerated nanopore basecalling with SLOW5 data format MIT 0.3.1+guppy6.4.2, 0.4.1+guppy6.5.7, 0.4.2+dorado7.2.13, 0.4.2+guppy6.5.7 0.3.1+guppy6.4.2, 0.4.1+guppy6.5.7, 0.4.2+dorado7.2.13, 0.4.2+guppy6.5.7 0.3.1+guppy6.4.2, 0.4.1+guppy6.5.7, 0.4.2+dorado7.2.13, 0.4.2+guppy6.5.7 0.3.1+guppy6.4.2, 0.4.1+guppy6.5.7, 0.4.2+dorado7.2.13, 0.4.2+guppy6.5.7
bwa
Fast, accurate, memory-efficient aligner for short and long sequencing reads bwa 6 publications bwa MIT 2 tools 0.7.17 0.7.17--h7132678_9 0.7.17-gcc-10.3.0, 0.7.17-gcccore-11.3.0 (D) 0.7.17-gcc-10.3.0, 0.7.17-gcccore-11.3.0 (D)
BWA-meth bwameth 0.2.7+galaxy0
Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to those of bwa and is ~1.3-3.1x faster, depending on the use case, dataset and machine. bwa-mem2 Efficient architecture-aware acceleration of BWA-MEM for multicore systems MIT BWA-MEM2 2.2.1+galaxy1 2.2.1 2.2.1--hd03093a_2
bwakit 0.7.11, 0.7.17 0.7.11, 0.7.17
Tools for manipulating biological data, particularly multiple sequence alignments. bx-python MIT 13 tools
c3s
c3s Copernicus Climate Data Store 0.1.0
Cactus is a reference-free whole-genome multiple alignment program. cactus 3 publications 2 tools 2.0.3
Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments. camera CAMERA: An integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets GPL-2.0 2 tools
De novo assembly tool for long-read data such as Nanopore and PacBio. canu Canu: Scalable and accurate long-read assembly via adaptive κ-mer weighting and repeat separation Canu Canu assembler 2.2+galaxy0 2.0 2.0t 1.9 2.1.1 2.2--ha47f30e_0 2.2-gcccore-10.3.0, 2.2-gcccore-11.3.0 (D) 2.2-gcccore-10.3.0, 2.2-gcccore-11.3.0 (D)
Web-based contig assembly. cap3 CAP3: A DNA sequence assembly program cap3 2.0.0
Implements statistical and computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification. cardinal 10.1093/bioinformatics/btv146 Artistic-2.0 9 tools
CAT
Contig Annotation Tool (CAT) and Bin Annotation Tool (BAT) are pipelines for the taxonomic classification of long DNA sequences and metagenome assembled genomes (MAGs/bins) of both known and (highly) unknown microorganisms, as generated by contemporary metagenomics studies. The core algorithm of both programs involves gene calling, mapping of predicted ORFs against the nr protein database, and voting-based classification of the entire contig / MAG based on classification of the individual ORFs. cat_bins Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT MIT 2 tools
Cluster a nucleotide dataset into representative sequences. cd-hit 5 publications cd-hit 2 tools 4.8.1 4.8.1-gcc-10.3.0, 4.8.1-gcc-11.3.0 (D) 4.8.1-gcc-10.3.0, 4.8.1-gcc-11.3.0 (D)
celseq2 is a Python framework for generating the UMI count matrix from CEL-Seq2 sequencing data. celseq2 CEL-Seq2: Sensitive highly-multiplexed single-cell RNA-Seq BSD-2-Clause 0.5.3
Tool for quantifying data from biological images, particularly in high-throughput experiments. CellProfiler 2 publications CellProfiler BSD-3-Clause 23 tools
cellranger 6.1.2 7.1.0
cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell transcriptomics datasets, such as those coming from the Human Cell Atlas. cellxgene 10.1101/2021.04.05.438318 MIT Interactive CellXgene Environment 1.1.1
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. checkm CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes GPL-3.0 1.1.3-foss-2021a, 1.2.2-foss-2022a (D) 1.1.3-foss-2021a, 1.2.2-foss-2022a (D)
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. checkm-database CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes GPL-3.0 2015_01_16
Database of bioactive compounds, their quantitative properties and bioactivities (binding constants, pharmacology and ADMET, etc). The data is abstracted and curated from the primary scientific literature. chembl 2 publications 2 tools
Fast cheminformatics fingerprint search, at your fingertips. Chemfp is a set of command-line tools and a Python library for fingerprint generation and high-performance similarity search. There are two ways to try out chemfp: from the download page you can request an evaluation copy of the most recent version of chemfp, or you can download an earlier version for no cost under the MIT license. chemfp 2 publications 4 tools
ChemicalToolbox is a publicly available web server for performing cheminformatics analysis. The ChemicalToolbox provides an intuitive, graphical interface for common tools for downloading, filtering, visualizing and simulating small molecules and proteins. The ChemicalToolbox is based on Galaxy, an open-source web-based platform which enables accessible and reproducible data analysis. There is already an active Galaxy cheminformatics community using and developing tools. Based on their work, we provide four example workflows which illustrate the capabilities of the ChemicalToolbox, covering assembly of a compound library, hole filling, protein-ligand docking, and construction of a quantitative structure-activity relationship (QSAR) model. chemicaltoolbox The ChemicalToolbox: Reproducible, user-friendly cheminformatics analysis on the Galaxy platform 3 tools
This package implements functions to retrieve the nearest genes around a peak and annotate the genomic region of the peak, statistical methods for estimating the significance of overlap among ChIP peak data sets, and an interface to the GEO database for comparing one's own dataset with those deposited there. The comparison can be used to infer cooperative regulation and thus to generate hypotheses. chipseeker ChIP seeker: An R/Bioconductor package for ChIP peak annotation, comparison and visualization chipseeker Artistic-2.0 ChIPseeker 1.28.3+galaxy0
ChiRA is a tool suite to analyze RNA-RNA interactome experimental data such as CLASH, CLEAR-CLIP, PARIS, SPLASH, etc. chira GPL-3.0 5 tools
An ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons (dotplot). chromeister Ultra-fast genome comparison for large-scale genomic experiments GPL-3.0 Chromeister 1.5.a+galaxy1
circexplorer CIRCexplorer 1.1.9.0
circlator 1.5.5--py_3
Circos is a tool for visualizing data in a circular format. It was developed for genomic data but can work for many other kinds of data as well. circos 2 publications circos 12 tools
clifinder CLIFinder 0.5.1
climate_stripes climate stripes 1.0.1
Automatic generation of gene cluster comparison figures. clinker is a pipeline for easily generating publication-quality gene cluster comparison figures. Given a set of GenBank files, clinker will automatically extract protein translations, perform global alignments between sequences in each cluster, determine the optimal display order based on cluster similarity, and generate an interactive visualisation (using clustermap.js) that can be extensively tweaked before being exported as an SVG file. clustermap.js is an interactive, reusable d3 chart designed to visualise homology between multiple gene clusters. clinker 10.1101/2020.11.08.370650 MIT clinker 0.0.23+galaxy0
clipkit ClipKIT. Alignment trimming software for phylogenetics. 0.1.0
Multiple sequence alignment software. The name is occasionally spelled as ClustalOmega, Clustal Ω, ClustalΩ, Clustal O, ClustalO. clustalo 3 publications Clustal Omega GPL-2.0 1.2.4 1.2.4--h87f3376_5
Multiple sequence alignment software. Old deprecated versions. Even older versions were CLUSTAL and CLUSTAL V (ClustalV). clustalw 5 publications clustalw ClustalW 2.1+galaxy1 2.1
ColabFold databases are MMseqs2 expandable profile databases to generate diverse multiple sequence alignments to predict protein structures. colabfold_batch 2 publications MIT 1.4.0, 1.5.2 1.4.0, 1.5.2
compose_text_param Compose text parameter value 0.1.1
cookiecutter 2.4.0
Count1 Count 1.0.3
CPAT (Coding-Potential Assessment Tool) is a logistic regression model-based classifier that can accurately and quickly distinguish protein-coding and noncoding RNAs using pure linguistic features calculated from the RNA sequences. CPAT takes as input the nucleotides sequences or genomic coordinates of RNAs and outputs the probabilities p (0 ≤ p ≤ 1), which measure the likelihood of protein coding. cpat RNA Coding Potential Prediction Using Alignment-Free Logistic Regression Model CPAT (Coding-Potential Assessment Tool) GPL-3.0 CPAT 3.0.5+galaxy1
Software to generate CRISPR guide RNAs against genomes annotated with individual variation. crisflash Crisflash: Open-source software to generate CRISPR guide RNAs against genomes annotated with individual variation GPL-3.0 1.2.0 star-ccm+
ctsm_fates CTSM/FATES-EMERALD 2.0.1
cuda 10.1 11.0.3 11.2.2 11.4.1 11.6.1 11.7.0
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cudnn 7.6.5-cuda10.1 8.1.1-cuda11 8.2.2-cuda11.4 8.6.0-cuda11
Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. cufflinks 5 publications Cufflinks BSL-1.0 5 tools
Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations. cummeRbund Erratum: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Nature Protocols (2012) 7 (562-578)) cummeRbund Artistic-2.0 cummeRbund 2.16.0+galaxy1
Generate customized protein sequence database from RNA-Seq data for proteomics search. customprodb CustomProDB: An R package to generate customized protein databases from RNA-Seq data for proteomics search Artistic-2.0 CustomProDB 1.22.0
Find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. cutadapt 2 publications MIT Cutadapt 4.9+galaxy1 3.7 3.7--py38hbff2b2d_0 3.4-gcccore-10.3.0, 4.2-gcccore-11.3.0 (D) 3.4-gcccore-10.3.0, 4.2-gcccore-11.3.0 (D)
Long Read based Human Genomic Structural Variation Detection with cuteSV | Long-read sequencing technologies enable comprehensive discovery of structural variations (SVs). However, it is still non-trivial for state-of-the-art approaches to detect SVs with high sensitivity, high performance, or both. Herein, we propose cuteSV, a sensitive, fast and lightweight SV detection approach. cuteSV uses tailored methods to comprehensively collect various types of SV signatures, and a clustering-and-refinement method to implement stepwise SV detection, which achieves high sensitivity without loss of accuracy. Benchmark results demonstrate that cuteSV has better yields on real datasets. Further, its speed and scalability are outstanding and promising for large-scale data analysis. cutesv 2 publications MIT cuteSV 1.0.8+galaxy0 1.0.13
This package infers exact sequence variants (SVs) from amplicon data, replacing the commonly used and coarser OTU clustering approach. This pipeline inputs demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier. dada2 10.1038/nmeth.3869 GPL-3.0 10 tools
dadi 2.0.5
datamash Datamash 1.8+galaxy0
dbbuilder Protein Database Downloader 0.3.4
dcm2niix 1.0.20220720, 1.0.20230411 1.0.20220720, 1.0.20230411
deepconsensus-cpu 1.0.0
deepconsensus-gpu 1.2.0
User-friendly tools for the normalization and visualization of deep-sequencing data. deeptools DeepTools: A flexible platform for exploring deep-sequencing data DeepTools GPL-3.0 17 tools 3.5.0-foss-2021a, 3.5.2-foss-2022a (D) 3.5.0-foss-2021a, 3.5.2-foss-2022a (D)
DendroPy is a Python library for phylogenetic computing. It provides classes and functions for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats. dendropy DendroPy: A Python library for phylogenetic computing dendropy BSD-3-Clause 4.5.2-gcccore-10.3.0, 4.5.2-gcccore-11.3.0 (D) 4.5.2-gcccore-10.3.0, 4.5.2-gcccore-11.3.0 (D)
R/Bioconductor package for differential gene expression analysis based on the negative binomial distribution. Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution. deseq2 Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 LGPL-2.1 DESeq2 2.11.40.8+galaxy0
The package is focused on finding differential exon usage using RNA-seq exon counts between samples with different experimental designs. It provides functions that allow the user to make the necessary statistical tests based on a model that uses the negative binomial distribution to estimate the variance between biological replicates and generalized linear models for testing. The package also provides functions for the visualization and exploration of the results. DEXSeq Drift and conservation of differential exon usage across tissues in primate species DEXSeq GPL-3.0 3 tools
The Dfam database is an open collection of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations. dfam 10.21203/RS.3.RS-76062/V1 3.3--hdfd78af_0
Neural networks and interference correction enable deep proteome coverage in high throughput. DIA-NN - a fast and easy to use tool for processing data independent acquisition (DIA) proteomics data. None required (for .raw, .mzML and .dia processing). Two executables are provided: DiaNN.exe (a command-line tool) and DIA-NN.exe (a GUI implemented as a wrapper for DiaNN.exe) diann DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput DIA-NN 1.8.1+galaxy3
DIA-Umpire is an open source Java program for computational analysis of data independent acquisition (DIA) mass spectrometry-based proteomics data. It enables untargeted peptide and protein identification and quantitation using DIA data, and also incorporates targeted extraction to reduce the number of cases of missing quantitation. diaumpire DIA-Umpire: Comprehensive computational framework for data-independent acquisition proteomics Apache-2.0 DIA_Umpire_SE 2.1.3.0
Sequence aligner for protein and translated DNA searches that functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup over BLAST of up to 20,000x. diamond Fast and sensitive protein alignment using DIAMOND AGPL-3.0 3 tools 2.1.9 2.0.14--hdcc8f71_0 2.0.13-gcc-10.3.0, 2.1.0-gcc-11.3.0 (D), 2.1.7 2.0.13-gcc-10.3.0, 2.1.0-gcc-11.3.0 (D), 2.1.7 2.0.13-gcc-10.3.0, 2.1.0-gcc-11.3.0 (D), 2.1.7
diaPASEF is an approach for parallel accumulation-serial fragmentation combined with data-independent acquisition. diapysef diaPASEF: parallel accumulation–serial fragmentation combined with data-independent acquisition diapysef library generation 0.3.5.0
Compute differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) data. Also enables occupancy (overlap) analysis and plotting functions. diffbind VAV3 mediates resistance to breast cancer endocrine therapy Artistic-2.0 DiffBind 3.12.0+galaxy0
Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads. dorado 2 tools
orad 2.6.1
drishti 3.0 3.0.1
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix. dropletutils 2 publications GPL-3.0 DropletUtils 1.10.0+galaxy2
dwt
dwt 5 tools
Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool. eagleimp 10.1101/2022.01.11.475810 GPL-3.0 1.10
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way. easybuild GPL-2.0 4.8.0
Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, SAGE and CAGE. edger 3 publications edger GPL-2.0 edgeR 3.36.0+galaxy4
Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, SAGE and CAGE. edger-repenrich 3 publications GPL-2.0 edgeR-repenrich 1.5.2
Entrez Direct (EDirect) is a command-line tool for Entrez databases. EDirect connects to Entrez through the Entrez Programming Utilities interface. It supports searching by indexed terms, looking up precomputed neighbors or links, filtering results by date or category, and downloading record summaries or reports. edirect Freeware 16.2
Tool for fast functional annotation of novel sequences. It uses precomputed orthologous groups and phylogenies from the eggNOG database to transfer functional information from fine-grained orthologs only. Its common uses include the annotation of novel genomes, transcriptomes or even metagenomic gene catalogs. The use of orthology predictions for functional annotation permits a higher precision than traditional homology searches, as it avoids transferring annotations from close paralogs. eggnog-mapper 3 publications GPL-3.0 3 tools
This package implements the Ensemble of Gene Set Enrichment Analyses method for gene set testing. egsea Combining multiple tools outperforms individual methods in gene set enrichment analyses egsea EGSEA 1.20.0
Evaluation of an Open Source Registration Package for Automatic Contour Propagation in Online Adaptive Intensity-Modulated Proton Therapy of Prostate Cancer. elastix is an open source toolbox for rigid and nonrigid registration of images, based on the well-known Insight Segmentation and Registration Toolkit (ITK). The software consists of a collection of algorithms that are commonly used to solve (medical) image registration problems. The modular design of elastix allows the user to quickly configure, test, and compare different registration methods for a specific application. A command-line interface enables automated processing of large numbers of data sets by means of scripting. elastix is accompanied by SimpleElastix, making it available in languages such as C++, Python, Java, R, Ruby, C# and Lua. elastix 3 publications 4.9.0 5.1.0 4.9.0 5.1.0
Diverse suite of tools for sequence analysis; many programs are analogous to those in GCG, and context-sensitive help is available for each tool. emboss EMBOSS: The European Molecular Biology Open Software Suite EMBOSS (European Molecular Biology Open Software Suite) 107 tools
A globally comprehensive data resource for nucleotide sequence, spanning raw data, alignments and assemblies, functional and taxonomic annotation and rich contextual data relating to sequenced samples and experimental design. Serving both as the database of record for the output of the world's sequencing activity and as a platform for the management, sharing and publication of sequence data. ena_upload 2 publications 2 tools
EncyclopeDIA is a library search engine composed of several algorithms for DIA data analysis; it can search for peptides using either DDA-based spectrum libraries or DIA-based chromatogram libraries. encyclopedia Chromatogram libraries improve peptide detection and quantification by data independent acquisition mass spectrometry Apache-2.0 4 tools
enrichm 0.6.5
ensembl-vep 106.1
The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. ETE v3 features numerous improvements in the underlying library of methods and provides a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org ete3 3.1.3
ethercalc EtherCalc 0.1
Integrated database covering the eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. The database portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera. eupathdb EuPathDB: A portal to eukaryotic pathogen databases EuPathDB 1.0.0
ExaBayes is a software package for Bayesian phylogenetic tree inference. It is particularly suitable for large-scale analyses on computer clusters. exabayes GPL-3.0 1.5.1
Tool for phylogenomic analyses on supercomputers. examl ExaML version 3: A tool for phylogenomic analyses on supercomputers GPL-3.0 3.0.22
A tool for pairwise sequence alignment. It supports DNA-DNA and DNA-protein alignment, both gapped and ungapped. exonerate Automated generation of heuristics for biological sequence comparison Exonerate GPL-3.0 Exonerate 2.4.0+galaxy2 2.2.0 2.4.0 2.2.0 2.4.0 2.4.0--hf34a1b8_7
export2graphlan is a conversion tool that produces both the annotation file and the tree file for GraPhlAn. In particular, the annotation file highlights specific sub-trees, automatically deriving from the input file which nodes are important. export2graphlan Compact graphical representation of phylogenetic data and metadata with GraPhlAn MIT Export to GraPhlAn 0.20+galaxy0
export_remote Export datasets 0.1.0
Streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific/haplotype expression analysis (from RNA-Seq), transcription factor binding quantification in ChIP-Seq, and analysis of metagenomic data. It can be used to resolve ambiguous mappings in other high-throughput sequencing based applications. eXpress 2 publications Apache-2.0 eXpress 1.1.1
sdf_to_tab Extract values from an SD-file 2020.03.4+galaxy0
f5c
An optimised re-implementation of the call-methylation module in Nanopolish, re-engineered around GPU-accelerated adaptive banded event alignment for rapid nanopore signal analysis. Given a set of basecalled Nanopore reads and the raw signals, f5c detects methylated cytosine bases. f5c can optionally use NVIDIA graphics cards (CUDA) for acceleration. f5c 2 publications MIT 1.3 1.1--h0326b38_1
Experimental PacBio diploid assembler. pb-assembly 10.5281/zenodo.35745 0.0.8--hdfd78af_1
Add the sequence length to the FASTA header. fasta_compute_length 2 publications Compute sequence length 1.0.3
fastahack 1.0.0-gcccore-10.3.0
FastANI is developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. FastANI supports pairwise comparison of both complete and draft genome assemblies. fastani Apache-2.0 1.33-gcc-10.3.0
Read huge FastQ and FastA files (both normal and gzipped) and manipulate them. fastool MIT 0.1.4--h7132678_6
A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance. fastp 10.1093/bioinformatics/bty560 fastp MIT fastp 0.23.4+galaxy1 0.20.0 0.23.2-gcc-11.3.0
This tool aims to provide a QC report which can spot problems or biases which originate either in the sequencer or in the starting library material. It can be run in one of two modes. It can either run as a stand alone interactive application for the immediate analysis of small numbers of FastQ files, or it can be run in a non-interactive mode where it would be suitable for integrating into a larger analysis pipeline for the systematic processing of large numbers of files. fastqc 10.7490/f1000research.1114334.1 FASTQC GPL-3.0 FastQC 0.74+galaxy1 0.11.7 0.12.1 0.11.9--hdfd78af_1 0.11.9-java-11
Compute quality stats for FASTQ files and print those stats as emoji... for some reason. fastqe FASTQE 0.3.1+galaxy0
Infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. fasttree 2 publications FastTree FASTTREE 2.1.10+galaxy1 2.1.11 2.1.11-gcccore-10.3.0
Collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. fastx Comparison of DNA sequences with protein sequences FASTX-Toolkit AGPL-3.0 9 tools
Collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. fastx_toolkit Comparison of DNA sequences with protein sequences AGPL-3.0 5 tools
featureCounts is a very efficient read quantifier. It can be used to summarize RNA-seq reads and gDNA-seq reads to a variety of genomic features such as genes, exons, promoters, gene bodies and genomic bins. It is included in the Bioconductor Rsubread package and also in the SourceForge Subread package. featurecounts FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features GPL-3.0 featureCounts 2.0.3+galaxy2
A tool to annotate long non-coding RNAs from RNA-seq assembled transcripts. feelnc FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome GPL-3.0 FEELnc 0.2.1+galaxy0
fermi-lite 20190320-gcccore-10.3.0
HMM-based gene structure prediction (multiple genes, both chains); Program for predicting multiple genes in genomic DNA sequences. fgenesh 10.1186/gb-2006-7-s1-s10 FGENESH get protein 1.0.0+galaxy0
The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it feasible to run more permutations and obtain more fine-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction to be used. fgsea 10.1101/060012 MIT fgsea 1.8.0+galaxy1
filter_transcripts_via_tracking Filter Combined Transcripts 0.1
pileup_parser Filter pileup 1.0.2
sam_bitwise_flag_filter Filter SAM 1.0.0
Filtlong is a tool for filtering long reads by quality. It can take a set of long reads and produce a smaller, better subset. It uses both read length (longer is better) and read identity (higher is better) when choosing which reads pass the filter. filtlong GPL-3.0 filtlong 0.2.1+galaxy0
FlashLFQ is an ultrafast label-free quantification algorithm for mass-spectrometry proteomics. flashlfq LGPL-3.0 FlashLFQ 1.0.3.1
flex 2.6.4-gcccore-10.3.0 2.6.4-gcccore-11.3.0 2.6.4-gcccore-12.3.0 (D) 2.6.4-gcccore-10.3.0 2.6.4-gcccore-11.3.0 2.6.4-gcccore-12.3.0 (D) 2.6.4-gcccore-10.3.0 2.6.4-gcccore-11.3.0 2.6.4-gcccore-12.3.0 (D)
Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. flye 3 publications Flye 2.9.4+galaxy0 2.9 2.9.1 2.9.3 2.9 2.9.1 2.9.3 2.9 2.9.1 2.9.3 2.9-gcc-10.3.0
An integrated database for Drosophila and Anopheles genomics. flymine 2 publications LGPL-2.1 Flymine 1.0.0
Foldseek enables fast and sensitive comparisons of large structure sets. It reaches sensitivities similar to state-of-the-art structural aligners while being at least 20,000 times faster. foldseek 2 publications GPL-3.0 3-915ef7d
Web server which detects small molecule pockets by relying on the geometric alpha sphere theory. It also tracks pockets during molecular dynamics so as to provide insight on pocket dynamics (mdpocket) and transposes mdpocket to the combined analysis of homologous structures (hpocket). fpocket 2 publications Freeware 2 tools
Application for finding (fragmented) genes in short reads fraggenescan 2 publications GPL-3.0 FragGeneScan 1.30.0
Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, multi-nucleotide polymorphisms, and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. freebayes freebayes MIT 2 tools 1.3.6-foss-2021a-r-4.1.0
fsom 20141119-gcccore-10.3.0
funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes). funannotate BSD-2-Clause 5 tools
Gene Annotation EVAluation. gaeval Not licensed 4 tools
galaxy_genomic_intervals 2 tools
text_processing 19 tools
galaxy_collection_operations 19 tools
Galaxy CONVERTER 75 tools
galaxy_data_sources 11 tools
galaxy_fetch_alignments_sequences 12 tools
galaxy_filter_and_sort 10 tools
galaxy_graph_display 11 tools
galaxy_join_subtract_and_group 5 tools
galaxy_sequence_utils 21 tools
galaxy_statistics 8 tools
galaxy_text_manipulation 36 tools
The Genome Analysis Toolkit (GATK) is a set of bioinformatic tools for analyzing high-throughput sequencing (HTS) and variant call format (VCF) data. The toolkit is well established for germline short variant discovery from whole genome and exome sequencing data. GATK4 expands functionality into copy number and somatic analyses and offers pipeline scripts for workflows. Version 4 (GATK4) is open-source at https://github.com/broadinstitute/gatk. gatk The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data GATK 4.1.4.0 4.2.1.0 4.1.8.1 4.2.5.0 4.2.5.0--hdfd78af_0 4.3.0.0-gcccore-11.3.0-java-11
Cleaning aligned sequences. gblock 2 publications Gblocks 0.91b
Genome-wide Complex Trait Analysis. GCTA estimates the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits (the GREML method), and has subsequently been extended to many other analyses to better understand the genetic architecture of complex traits. gcta GCTA: A tool for genome-wide complex trait analysis gcta MIT 1.94.0beta-gfbf-2022a 1.94.1--h9ee0642_0 1.94.0beta-gfbf-2022a 1.94.1--h9ee0642_0
Software aimed at pairwise sequence comparison generating high quality results (equivalent to MUMmer) with controlled memory consumption and comparable or faster execution times particularly with long sequences. gecko Breaking the computational barriers of pairwise genome comparison GPL-3.0 Gecko 1.2
GEMINI (GEnome MINIng) is a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample phenotypes and genotypes, as well as genome annotations into an integrated database framework, GEMINI provides a simple, flexible, and powerful system for exploring genetic variation for disease and population genetics. gemini 10.1371/journal.pcbi.1003153 MIT 24 tools
Gene Model Mapper is a homology-based gene prediction program. GeMoMa uses the annotation of protein-coding genes in a reference genome to infer the annotation of protein-coding genes in a target genome. To do so, it uses amino acid sequence and intron position conservation. In addition, it can incorporate RNA-seq evidence for splice site prediction. gemoma Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi GPL-3.0 1.8 1.9 1.8 1.9
An interactive web tool for versatile, clinically-driven variant interrogation and prioritization. IOBIO is a suite of web apps for visually driven real-time analysis of genomic data. geneiobio 10.1101/2020.11.05.20224865 gene.iobio visualisation 4.7.1+galaxy1
generate_count_matrix Generate count matrix 1.0
generate_pc_lda_matrix Generate A Matrix 1.0.0
generode 0.5.1
Reference-free profiling of polyploid genomes. GenomeScope 2.0 applies classical insights from combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. It takes k-mer counting results from Jellyfish or KMC as input, optionally together with the average k-mer coverage of the polyploid genome. genomescope 10.1101/747568 Apache-2.0 GenomeScope 2.0.1+galaxy0 1.0.0 1.0.0 1.0.0 1.0.0
Free collection of bioinformatics tools for genome informatics. genometools Genome tools: A comprehensive software library for efficient processing of structured genome annotations GenomeTools BSD-3-Clause 1.6.2
genrich Genrich 0.5+galaxy2
get_pdb_file Get PDB file 0.1.0
A fast and versatile toolkit for accurate de novo assembly of organelle genomes. It assembles organelle genomes from genome skimming data. getorganelle GetOrganelle: A fast and versatile toolkit for accurate de novo assembly of organelle genomes GPL-3.0 2 tools
gfa_to_fa GFA to FASTA 0.1.2
gfastats is a single, fast and exhaustive tool for generating summary statistics and simultaneously manipulating genome assembly files. gfastats also allows seamless FASTA/FASTQ/GFA conversion. gfastats MIT gfastats 1.3.6+galaxy0 1.3.6
gff2bed1 GFF-to-BED 1.0.1
gff3sort 0.1.a1a2bc9--hdfd78af_2
Program for comparing, annotating, merging and tracking transcripts in GFF files. gffcompare MIT GffCompare 0.12.6+galaxy0 0.12.2-gcc-10.3.0
gffcompare_to_bed Convert gffCompare annotated GTF to BED 0.2.1
Program for filtering, converting and manipulating GFF files. gffread MIT gffread 2.2.1.4+galaxy0 0.12.7 0.12.7-gcccore-10.3.0
Plotting system for R, based on the grammar of graphics. ggplot2 10.1007/978-3-319-24277-4 6 tools
ghostscript 9.54.0-gcccore-10.3.0 9.56.1-gcccore-11.3.0 (D) 9.54.0-gcccore-10.3.0 9.56.1-gcccore-11.3.0 (D)
GMAJ GMAJ 2.0.1
Genomic Mapping and Alignment Program for mRNA and EST Sequences. gmap 2 publications gmap 2023.04.28
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel. parallel GPL-3.0 20191022 20210622-gcccore-10.3.0 20220722-gcccore-11.3.0 (D) 20210622-gcccore-10.3.0 20220722-gcccore-11.3.0 (D)
The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. gsl GPL-3.0 2.6 2.7-gcc-10.3.0 2.7-gcc-11.3.0 (D) 2.7-gcc-10.3.0 2.7-gcc-11.3.0 (D)
GOEnrichment is a tool for performing GO enrichment analysis of gene sets, such as those obtained from RNA-seq or Microarray experiments, to help characterize them at the functional level. It is available in Galaxy Europe and as a stand-alone tool. GOEnrichment is flexible in that it allows the user to use any version of the Gene Ontology and any GO annotation file they desire. To enable the use of GO slims, it is accompanied by a sister tool GOSlimmer, which can convert annotation files from full GO to any specified GO slim. The tool features an optional graph clustering algorithm to reduce the redundancy in the set of enriched GO terms and simplify its output. It was developed by the BioData.pt / ELIXIR-PT team at the Instituto Gulbenkian de Ciência. goenrichment Apache-2.0 2 tools
Detect Gene Ontology and/or other user defined categories which are over/under represented in RNA-seq data. goseq 10.1186/gb-2010-11-2-r14 goseq GPL-2.0 goseq 1.50.0+galaxy0
gramenemart GrameneMart 1.0.1
GraPhlAn is a software tool for producing high-quality circular representations of taxonomic and phylogenetic trees. GraPhlAn focuses on concise, integrative, informative, and publication-ready representations of phylogenetically- and taxonomically-driven investigation. graphlan Compact graphical representation of phylogenetic data and metadata with GraPhlAn MIT 2 tools
Versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since it is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers. gromacs 10 publications LGPL-2.1 8 tools 2019.3 2019.3-plumed 2019.3-gpuvolta 2018.3 2018.3-plumed 2020.1 2020.1-plumed 2020.1-gpuvolta 2020.3 2020.3-gpuvolta 2021-gpuvolta 2021 2021.2 2021.2-gpuvolta 2021.4 2021.4-gpuvolta 2022 2022-gpuvolta 2020.3-gpuampere 2019.3-gpuampere 2021-gpuampere 2020.1-gpuampere 2021.2-gpuampere 2021.4-gpuampere 2022-gpuampere 2021.3-foss-2021a
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. GTDB-Tk is open source and released under the GNU General Public License (Version 3). gtdb-tk GTDB-Tk: A toolkit to classify genomes with the genome taxonomy database GPL-3.0 GTDB-Tk Classify genomes 2.3.2+galaxy1 2.0.0-foss-2021a 2.2.6 2.0.0-foss-2021a 2.2.6
gtf2bedgraph GTF-to-BEDGraph 1.0.0
gtf2gene_list GTF2GeneList 1.52.0+galaxy0
Gubbins is a tool for rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences. gubbins 10.1093/nar/gku1196 GPL-2.0 Gubbins 3.2.1+galaxy0
gzip Compress file(s) 0.1.0 1.10-gcccore-10.3.0 1.12-gcccore-11.3.0 (D) 1.10-gcccore-10.3.0 1.12-gcccore-11.3.0 (D)
HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads, designed to "just work" with excellent speed and accuracy across a range of long- and short-read sequencing technologies. The output is in Haplotype block format described here: https://github.com/vibansal/HapCUT2/blob/master/outputformat.md hapcut2 Hapcut2 1.3.3+galaxy0+ga1
hbvar HbVar 2.0.0
Tool for single-species active module discovery. heinz XHeinz: An algorithm for mining cross-species network modules under a flexible conservation model 4 tools
Uses deep learning to predict gene annotations. helixer Helixer: Cross-species gene annotation of large eukaryotic genomes using deep learning GPL-3.0 Helixer 0.3.2
This tool provides functional annotation for a list of genes by connecting with DAVID database. hgv_david 3 publications DAVID 1.0.1
This tool can be used to analyze the patterns of linkage disequilibrium (LD) between polymorphic sites in a locus. hgv_ldtools 3 publications LD 1.0.0
This tool creates a link to the g:GOSt tool (Gene Group Functional Profiling), which provides functional profiling of gene lists. hgv_linkToGProfile 3 publications g:Profiler 1.0.0
A web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. hicexplorer Galaxy HiCExplorer 3: A web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization 57 tools
Remove CCS reads with remnant PacBio adapter sequences and convert outputs to a compressed .fastq (.fastq.gz). hifiadapterfilt HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly GPL-3.0 HiFi Adapter Filter 2.0.0+galaxy0 2.0.0
Hifiasm: a haplotype-resolved assembler for accurate HiFi reads hifiasm Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm MIT Hifiasm 0.19.9+galaxy0 0.16.1 0.18.9 0.19.6 0.19.8 0.19.9 0.16.1 0.18.9 0.19.6 0.19.8 0.19.9 0.16.1 0.18.9 0.19.6 0.19.8 0.19.9 0.16.1 0.18.9 0.19.6 0.19.8 0.19.9 0.16.1 0.18.9 0.19.6 0.19.8 0.19.9 0.16.1-gcccore-10.3.0
Hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio Hifi reads. hifiasm_meta MIT Hifiasm_meta 0.3.1+galaxy0
Alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). hisat2 3 publications HISAT2 GPL-3.0 HISAT2 2.2.1+galaxy1 2.2.1 2.2.1-gompi-2021a 2.2.1-gompi-2022a 2.2.1--h87f3376_4 2.2.1-gompi-2021a 2.2.1-gompi-2022a 2.2.1--h87f3376_4 2.2.1-gompi-2021a 2.2.1-gompi-2022a 2.2.1--h87f3376_4
hmmcleaner 0.180750
This tool is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models. With the HMMER3 project, HMMER is now as fast as BLAST for protein search. hmmer 4 publications Other 12 tools 3.3.2 3.3.2-gompi-2021a 3.3.2-gompi-2022a (D) 3.3.2-gompi-2021a 3.3.2-gompi-2022a (D)
Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. horovod Apache-2.0 0.19.0 0.22.1
Python framework to process and analyse high-throughput sequencing (HTS) data htseq HTSeq-A Python framework to work with high-throughput sequencing data HTSeq GPL-3.0 htseq-count 2.0.5+galaxy0 2.0.2-foss-2022a
The main purpose of HTSlib is to provide access to genomic information files, both alignment data (SAM, BAM, and CRAM formats) and variant data (VCF and BCF formats). The library also provides interfaces to access and index genome reference data in FASTA format and tab-delimited files with genomic coordinates. It is used by, and incorporated into, both SAMtools and BCFtools. htslib HTSlib: C library for reading/writing high-Throughput sequencing data HTSlib MIT 1.9 1.12 1.16 1.19.1 1.20 1.19.1 1.20 1.12-gcc-10.3.0 1.15.1-gcc-11.3.0 (D) 1.12-gcc-10.3.0 1.15.1-gcc-11.3.0 (D)
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question “What are the microbes in my community-of-interest doing (or are capable of doing)?” humann Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with biobakery 3 MIT 12 tools 3.6-foss-2022a
HUMAnN 2.0 is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question “What are the microbes in my community-of-interest doing (or capable of doing)?” humann2 Species-level functional profiling of metagenomes and metatranscriptomes 7 tools
Paralogs and off-target sequences improve phylogenetic resolution in a densely-sampled study of the breadfruit genus (Artocarpus, Moraceae). HybPiper recovers genes from targeted sequence capture data, in which DNA sequencing libraries are enriched for gene regions of interest, especially for phylogenetics. It is a suite of Python scripts that wrap and connect bioinformatics tools in order to extract target sequences from high-throughput DNA sequencing reads. hybpiper 10.1101/854232 GPL-3.0 HybPiper 2.1.6+galaxy0
Software package for the analysis of genetic sequences using techniques in phylogenetics, molecular evolution, and machine learning. HyPhy 2 publications Unlicense 2 tools
IDL
idl 8.6 8.8
imagemagick 7.0.11 7.0.11-14-gcccore-10.3.0 7.1.0-37-gcccore-11.3.0 (D) 7.0.11-14-gcccore-10.3.0 7.1.0-37-gcccore-11.3.0 (D)
Improved Phased Assembler (IPA) is the official PacBio software for HiFi genome assembly. IPA was designed to utilize the accuracy of PacBio HiFi reads to produce high-quality phased genome assemblies. pbipa 1.5.0 1.8.0 1.5.0 1.8.0
Infernal ("INFERence of RNA ALignment") is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence. infernal Infernal 1.1: 100-fold faster RNA homology searches BSD-3-Clause 6 tools
insight-toolkit 5.2.1
High-performance visualization tool for interactive exploration of large, integrated datasets. It supports a wide variety of data types and formats, including short-read alignments in the SAM/BAM format. Data can be viewed from local files or over the web via HTTP. igv 3 publications Integrated Genomics Viewer LGPL-2.1 2.13.1
A tool to detect integrons in DNA sequences. integron_finder 2 publications GPL-3.0 Integron Finder 2.0.5+galaxy0
Open source data warehouse built specifically for the integration and analysis of complex biological data. It enables the creation of biological databases accessed by sophisticated web query tools. Parsers are provided for integrating data from many common biological data sources and formats, and there is a framework for adding your own data. intermine 2 publications LGPL-2.1 InterMine 1.0.0
Scan sequences against the InterPro protein signature databases. interproscan 2 publications InterProScan 5.59-91.0+galaxy3 5.55-88.0-foss-2021a
ipyrad is an interactive toolkit for assembly and analysis of restriction-site associated genomic data sets (e.g., RAD, ddRAD, GBS) for population genetic and phylogenetic studies. ipyrad Ipyrad: Interactive assembly and analysis of RADseq datasets GPL-3.0 0.9.84 0.9.93
Very efficient phylogenetic software for reconstructing maximum-likelihood trees and assessing branch supports with the ultrafast bootstrap approximation. It is based on the IQPNNI algorithm, with a 10-fold speedup and substantial additional features. iq-tree W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis IQ-TREE 2.3.6+galaxy0 2.1.2 2.2.2.3--h2202e69_2
Provides functions for creating an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. Particular attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results. isee iSEE: Interactive SummarizedExperiment Explorer [version 1; referees: 2 approved] iSEE MIT iSEE 1.0.0
Automated identification of insertion sequence elements in prokaryotic genomes. isescan ISEScan: automated identification of insertion sequence elements in prokaryotic genomes ISEScan 1.7.2.3+galaxy1
Enables identification of isoform switches with predicted functional consequences from RNA-seq data. Consequences can be chosen from a long list, including gain/loss of protein domains and changes in NMD sensitivity. It directly supports import of data from Cufflinks/Cuffdiff, Kallisto, Salmon and RSEM, and output from other transcript quantification tools is also easy to import. isoformswitchanalyzer The landscape of isoform switches in human cancers GPL-2.0 IsoformSwitchAnalyzeR 1.20.0+galaxy5
IsoSeq v3 contains the newest tools to identify transcripts in PacBio single-molecule sequencing data. Starting in SMRT Link v6.0.0, those tools power the IsoSeq GUI-based analysis application. A composable workflow of existing tools and algorithms, combined with a new clustering technique. isoseq3 BSD-3-Clause-Clear 4.0.0--h9ee0642_0
Interpretation-oriented tool to manage the update and revision of variant annotation and classification. iVar - DataBase of Genomics Variants. ivar 10.22541/AU.160610419.99549785/V1 AGPL-3.0 6 tools
Implementation of the Interval-Wise Testing (IWT) for omics data. This inferential procedure tests for differences in "Omics" data between two groups of genomic regions (or between a group of genomic regions and a reference center of symmetry), and does not require fixing location and scale at the outset. iwtomics IWTomics: Testing high-resolution sequence-based 'Omics' data at multiple locations and scales GPL-2.0 3 tools
JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation. jags GPL-2.0 4.3.0-foss-2021a
JASMINE (Jointly Accurate Sv Merging with Intersample Network Edges) is an automated pipeline for alignment and SV calling in long-read datasets. The tool is used to merge structural variants (SVs) across samples. Each sample has a number of SV calls, consisting of position information (chromosome, start, end, length), type and strand information, and a number of other values. Jasmine represents the set of all SVs across samples as a network, and uses a modified minimum spanning forest algorithm to determine the best way of merging the variants such that each merged variant represents a set of analogous variants occurring in different samples. jasminesv 10.1101/2021.05.27.445886 MIT 1.1.4
jdk 11.0.2
jbigkit 2.1-gcccore-10.3.0, 2.1-gcccore-11.3.0 (D) 2.1-gcccore-10.3.0, 2.1-gcccore-11.3.0 (D)
Slick, speedy genome browser with a responsive and dynamic AJAX interface for visualization of genome data. Developed by the GMOD project as a successor to GBrowse. jbrowse 10.1101/gr.094607.109 JBrowse 2 tools
jcvi Genome annotation statistics 0.8.4
A command-line algorithm for counting k-mers in DNA sequence. jellyfish 10.1093/bioinformatics/btr011 jellyfish GPL-3.0 jellyfish 2.3.0+galaxy1 2.3.0 2.3.0-gcc-10.3.0, 2.3.0-gcc-11.3.0 (D) 2.3.0-gcc-10.3.0, 2.3.0-gcc-11.3.0 (D)
jq
jq JQ 1.0
Juicer is a platform for analyzing kilobase resolution Hi-C data. In this distribution, we include the pipeline for generating Hi-C maps from fastq raw data files and command line tools for feature annotation on the Hi-C maps. juicer Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments MIT 1.6
jupyterlab 3.4.3-py3.9 3.5.0-gcccore-11.3.0
A program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. kallisto Near-optimal probabilistic RNA-seq quantification BSD-2-Clause 2 tools 0.48.0-gompi-2021a, 0.48.0-gompi-2022a, 0.48.0--h15996b6_2 0.48.0-gompi-2021a, 0.48.0-gompi-2022a, 0.48.0--h15996b6_2 0.48.0-gompi-2021a, 0.48.0-gompi-2022a, 0.48.0--h15996b6_2
KAT
Suite of tools that generate, analyse and compare k-mer spectra produced from sequence files kat KAT: A K-mer analysis toolkit to quality control NGS datasets and genome assemblies GPL-3.0 2.4.2
kentutils 0.0
khmer is a set of command-line tools for working with DNA shotgun sequencing data from genomes, transcriptomes, metagenomes, and single cells. khmer can make de novo assemblies faster, and sometimes better. khmer can also identify (and fix) problems with shotgun data. khmer 4 publications khmer BSD-3-Clause 8 tools
KMC
KMC is a utility designed for counting k-mers (sequences of consecutive k symbols) in a set of reads from genome sequencing projects. kmc KMC 2: Fast and resource-frugal k-mer counting KMC 3.2.1, 3.2.4 3.2.1, 3.2.4
KofamScan is a gene function annotation tool based on KEGG Orthology and hidden Markov models. It requires the KOfam database. kofamscan MIT 1.3.0--hdfd78af_2
System for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. It aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm. kraken Kraken: Ultrafast metagenomic sequence classification using exact alignments kraken GFDL-1.3 9 tools
Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm. kraken2 10.1101/762302 MIT Kraken2 2.1.3+galaxy1 2.1.2-gompi-2021a, 2.1.2-gompi-2022a (D) 2.1.2-gompi-2021a, 2.1.2-gompi-2022a (D)
KrakenTools provides individual scripts to analyze Kraken/Kraken2/Bracken/KrakenUniq output files krakentools krakentools GPL-3.0 4 tools
Krona creates interactive HTML5 charts of hierarchical data (such as taxonomic abundance in a metagenome). krona Interactive metagenomic visualization in a Web browser krona Proprietary 2 tools
kronatools 2.8.1-gcccore-11.3.0
Automated image analysis for developmental phenotyping of mouse embryos. LAMA (Lightweight Analysis of Morphological Abnormalities) is an open-source pipeline to automatically identify embryo dysmorphology from 3D volumetric images. lama 10.1101/2020.05.04.075853 0.9.100, 1.0.0, 1.0.1, 1.0.2 0.9.100, 1.0.0, 1.0.1, 1.0.2 0.9.100, 1.0.0, 1.0.1, 1.0.2 0.9.100, 1.0.0, 1.0.1, 1.0.2
A tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically. lastz 3 tools 1.04.15
length_and_gc_content Gene length and GC content 0.1.2
LFQ-Analyst is an easy-to-use, interactive web application (https://bioinformatics.erc.monash.edu/apps/LFQ-Analyst/) developed to perform differential expression analysis with “one click” and to visualize label-free quantitative proteomics datasets preprocessed with MaxQuant. It provides a wealth of user-analytic features and offers numerous publication-quality output graphics and tables to facilitate statistical and exploratory analysis of label-free quantitative datasets. LFQ-Analyst LFQ-Analyst: An easy-to-use interactive web platform to analyze and visualize label-free proteomics data preprocessed with MaxQuant GPL-3.0 LFQ Analyst 1.2.6+galaxy0
liftOver1 Convert genome coordinates 1.0.6
Data analysis, linear models and differential expression for microarray data. limma Limma powers differential expression analyses for RNA-sequencing and microarray studies limma GPL-2.0 limma 3.58.1+galaxy0
LINKS (Long Interval Nucleotide K-mer Scaffolder) is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g. ONT reads, PacBio reads, other draft genomes). It is also used to scaffold contig pairs linked by ARCS/ARKS. links LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads GPL-3.0 LINKS 2.0.1+galaxy+1
LoFreq* (i.e. LoFreq version 2) is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or base/indel alignment uncertainty), which are usually ignored by other methods or only used for filtering. lofreq LoFreq: A sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets MIT 5 tools
Longshot is a variant calling tool for diploid genomes using long, error-prone reads such as Pacific Biosciences (PacBio) SMRT and Oxford Nanopore Technologies (ONT). It takes as input an aligned BAM file and outputs a phased VCF file with variants and haplotype information. It can also genotype and phase input VCF files. It can output haplotype-separated BAM files that can be used for downstream analysis. Currently, it only calls single nucleotide variants (SNVs), but it can genotype indels if they are given in an input VCF. longshot Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing MIT 0.4.1
lpsolve 5.5.2.11 5.5.2.11-gcc-10.3.0, 5.5.2.11-gcc-11.3.0 (D) 5.5.2.11-gcc-10.3.0, 5.5.2.11-gcc-11.3.0 (D)
LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; the LTR Assembly Index (LAI) is also included in this package. ltr_retriever LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons GPL-3.0 2.9.4
Lua
lua 5.4.3-gcccore-10.3.0, 5.4.4-gcccore-11.3.0 (D) 5.4.3-gcccore-10.3.0, 5.4.4-gcccore-11.3.0 (D)
Model-based Analysis of ChIP-seq data. macs2 Model-based analysis of ChIP-Seq (MACS) Artistic-2.0 10 tools 2.2.9.1
maeparser 1.3.0-gompi-2021a, 1.3.0-gompi-2022a (D) 1.3.0-gompi-2021a, 1.3.0-gompi-2022a (D)
MAFFT (Multiple Alignment using Fast Fourier Transform) is a high speed multiple sequence alignment program. mafft 6 publications BSD-Source-Code 2 tools 7.505, 7.525 7.505, 7.525 7.490-gcc-10.3.0-with-extensions
Computational tool to identify important genes from genome-scale CRISPR-Cas9 knockout screens. MAGeCK Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR 5 tools
Portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. maker 2 publications MAKER Artistic-2.0 2 tools 3.01.04 3.01.03--pl526hb8757ab_0 3.01.03--pl5262h8f1cd36_2
MALDIquant is a complete analysis pipeline for matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) and other two-dimensional mass spectrometry data. In addition to commonly used plotting and processing methods it includes distinctive features, namely baseline subtraction methods such as morphological filters (TopHat) or the statistics-sensitive non-linear iterative peak-clipping algorithm (SNIP), peak alignment using warping functions, handling of replicated measurements as well as allowing spectra with different resolutions. maldi_quant Maldiquant: A versatile R package for the analysis of mass spectrometry data GPL-3.0 2 tools
music_manipulate_eset Manipulate Expression Set Object 0.1.1+galaxy4
Fast genome and metagenome distance estimation using MinHash. mash 10.1186/s13059-016-0997-x mash CC-BY-4.0 2 tools 2.3-gcc-10.3.0
master2pgSnp MasterVar to pgSnp 1.0.0
Whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454). masurca The MaSuRCA genome assembler MaSuRCA simple 4.0.6+galaxy0
Tool to import, process, clean, and compare mass spectrometry data. matchms Apache-2.0 11 tools
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. matplotlib MIT 3.4.2-foss-2021a, 3.5.2-foss-2022a (D) 3.4.2-foss-2021a, 3.5.2-foss-2022a (D)
Software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. maxbin 26515820 MaxBin2 2.2.7+galaxy2
Quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. maxquant 4 publications MaxQuant 3 tools 2.2.0.0-gcccore-11.3.0
mcl
MCL is a clustering algorithm widely used in bioinformatics and gaining traction in other fields. mcl 10.1007/978-1-61779-361-5_15 mcl GPL-3.0 14-137
mcquant MCQUANT 1.5.3+galaxy1
MDAnalysis is an object-oriented Python toolkit to analyze molecular dynamics trajectories generated by CHARMM, Gromacs, NAMD, LAMMPS, Amber or DL_POLY; it also reads other formats (e.g. PDB files and XYZ format trajectories; see the supported coordinate formats for the full list). It can write most of these formats, too, together with atom selections for use in Gromacs, CHARMM, VMD and PyMOL. mdanalysis MDAnalysis: A toolkit for the analysis of molecular dynamics simulations GPL-2.0 Cosine Content 1.0.0+galaxy0
MDTraj 2 tools
medaka is a tool to create consensus sequences and variant calls from nanopore sequencing data. This task is performed using neural networks applied to a pileup of individual sequencing reads against a draft assembly. medaka MPL-2.0 4 tools 1.9.1
Single-node assembler for large and complex metagenomics NGS reads, such as those from soil. It uses a succinct de Bruijn graph to achieve low memory usage, although its goal is not to make memory usage as low as possible. megahit MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph 2 tools 1.2.9-gcccore-10.3.0, 1.2.9-gcccore-11.3.0 (D) 1.2.9-gcccore-10.3.0, 1.2.9-gcccore-11.3.0 (D)
merge_cols Merge Columns 1.0.3
Reference-free quality, completeness, and phasing assessment for genome assemblies using k-mers. Genome assembly projects often have Illumina whole-genome sequencing reads available for the assembled individual, and Merqury provides a set of tools for evaluating assemblies with these reads. merqury 10.1101/2020.03.15.992941 2 tools 1.3
Meryl is a tool for counting and working with sets of k-mers that was originally developed for use in the Celera Assembler and has since been migrated and maintained as part of Canu. meryl 10.1186/s13059-020-02134-9 Freeware Meryl 1.3+galaxy6 1.4.1
An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. MetaBAT2 clusters metagenomic contigs into different "bins", each of which should correspond to a putative genome. MetaBAT2 uses nucleotide composition information and source strain abundance (measured by depth of coverage from aligning the reads to the contigs) to perform binning. metabat 2 publications MetaBAT2 2.15+galaxy3
Galaxy workflow for differential abundance analysis of 16s metagenomic data. MetaDEGalaxy MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16s metagenomic data 9 tools
MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics metaeuk 2 publications GPL-3.0 5-34c21f2 5-gcc-10.3.0, 6-gcc-11.3.0 (D) 5-gcc-10.3.0, 6-gcc-11.3.0 (D)
Computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. metaphlan Metagenomic microbial community profiling using unique clade-specific marker genes MIT 4 tools
The metaQuantome software suite analyzes the state of a microbiome by leveraging complex taxonomic and functional hierarchies to summarize peptide-level quantitative information. metaQuantome offers differential abundance analysis, principal components analysis, and clustered heat map visualizations, as well as exploratory analysis for a single sample or experimental condition. metaQuantome 2 publications 6 tools
Genome assembler for metagenomics datasets. metaspades 3 publications metaSPAdes 3.15.5+galaxy2
MetaWRAP aims to be an easy-to-use metagenomic wrapper suite that accomplishes the core tasks of metagenomic analysis from start to finish: read quality control, assembly, visualization, taxonomic profiling, extracting draft genomes (binning), and functional annotation. metawrap MetaWRAP - A flexible pipeline for genome-resolved metagenomic data analysis MIT MetaWRAP 1.3.0+galaxy1
A (mostly) universal methylation extractor for BS-seq experiments. MethylDackel MIT MethylDackel 0.5.2+galaxy0
metilene metilene 0.2.6.1
Estimates effective population sizes, past migration rates between n populations assuming a migration matrix model with asymmetric migration rates and different subpopulation sizes, and population divergences or admixture. MIGRATE 2 publications MIT
A lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus. mikado Leveraging multiple transcriptome assembly methods for improved gene structure annotation LGPL-3.0 2.2.4--py39h70b41aa_0
mimodd 14 tools
Short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. minia Using cascading Bloom filters to improve the memory usage for de Brujin graphs minia CECILL-2.0 Minia 3.2.6
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. miniasm Minimap and miniasm: Fast mapping and de novo assembly for noisy long sequences MIT miniasm 0.3_r179+galaxy1 0.3
miniconda 4.12.0 4.12.0
minigraph 0.20
Pairwise aligner for genomic and spliced nucleotide sequences minimap2 Minimap2: Pairwise alignment for nucleotide sequences minimap2 MIT Map with minimap2 2.28+galaxy0 2.17 2.22 2.24 2.26 2.24-gcccore-11.3.0
Miniprot aligns a protein sequence against a genome with affine gap penalty, splicing and frameshift. It is primarily intended for annotating protein-coding genes in a new species using known genes from other species. miniprot MIT 0.5
MIRA 3 - Whole Genome Shotgun and EST Sequence Assembler mira Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs mira GPL-3.0 4.9.6--1
mircounts miRcounts 1.4.0
miRDeep2 discovers active known or novel miRNAs from deep sequencing data. mirdeep2 GPL-3.0 3 tools
mitobim MITObim 1.9.1
Find, circularise and annotate mitogenomes from PacBio assemblies. mitohifi MitoFinder: Efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics MIT MitoHiFi 3+galaxy0
De novo metazoan mitochondrial genome annotation. mitos2 10.1016/j.ympev.2012.08.023 MITOS2 2.1.9+galaxy0
Multi Locus Sequence Typing from an assembled genome or from a set of reads. mlst Multilocus sequence typing of total-genome-sequenced bacteria MLST Other 2 tools 2.23.0--hdfd78af_1
MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open-source GPL-licensed software implemented in C++ for Linux, macOS, and (as a beta version, via Cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run 10000 times faster than BLAST; at 100 times its speed it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed. MMseqs2 includes Linclust, the first clustering algorithm whose runtime scales linearly with input set size. With Linclust, 1.6 billion metagenomic sequence fragments were clustered in 10 h on a single server to 50% sequence identity. mmseqs2 6 publications GPL-3.0 13-45111
MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Plasmids are mobile genetic elements (MGEs), which allow for rapid evolution and adaptation of bacteria to new niches through horizontal transmission of novel traits to different genetic backgrounds. The MOB-suite is designed to be a modular set of tools for the typing and reconstruction of plasmid sequences from WGS assemblies. mob-suite Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance 2 tools
A modest Feature Finder to extract features in MS1 Data. moff MoFF: A robust and automated approach to extract peptide ion intensities moFF 2.0.3.0
monailabel 0.6.0, 0.7.0, 0.8.0 0.6.0, 0.7.0, 0.8.0 0.6.0, 0.7.0, 0.8.0
A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. morpheus A proteomics search algorithm specifically designed for high-resolution tandem mass spectra MIT Morpheus 288+galaxy0
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing. mosdepth Mosdepth: Quick coverage calculation for genomes and exomes mosdepth MIT mosdepth 0.3.8+galaxy0
Open-source, platform-independent, community-supported software for describing and comparing microbial communities mothur 10.1128/AEM.01541-09 GPL-3.0 131 tools
Data warehouse for accessing mouse data from Mouse Genome Informatics (MGI). Supports powerful query, reporting, and analysis capabilities, the ability to save and combine results from different queries, easy integration into larger workflows, and a comprehensive Web Services layer. mousemine MouseMine: a new data warehouse for MGI MouseMine 1.0.0
Program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. It uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. mrbayes 3 publications GPL-3.0 3.2.7--h19cf415_2 3.2.7a-foss-2022a
A fast, flexible and open software framework for medical image processing and visualisation. MRtrix3 is an open-source, cross-platform software package for medical image processing, analysis and visualisation, with a particular emphasis on the investigation of the brain using diffusion MRI. It is implemented using a fast, modular and flexible general-purpose code framework for image data access and manipulation, enabling efficient development of new applications whilst retaining high computational performance and a consistent command-line interface between applications. It also provides a set of general-purpose image processing applications. mrtrix MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation 3.0.3-foss-2021a
msConvert is a command-line utility for converting between various mass spectrometry data formats, including raw data from several commercial vendors (using vendor libraries, Windows-only). For Windows users, there is also a GUI, msConvertGUI. msconvert A cross-platform toolkit for mass spectrometry and proteomics Apache-2.0 msconvert 3.0.20287.4
Tool for mass spectra metadata annotation. msmetaenhancer 10.21105/joss.04494 MIT MSMetaEnhancer 0.4.0+galaxy1
Statistical tool for quantitative mass spectrometry-based proteomics. msstats MSstats: An R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments MSstats MSstats 4.0.0+galaxy1
Tools for detecting differentially abundant peptides and proteins in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling bioconductor-msstatstmt MSstatsTMT 2.0.0+galaxy1
mtag 20230414
MultiQC aggregates results from multiple bioinformatics analyses across many samples into a single report. It searches a given directory for analysis logs and compiles an HTML report. It is a general-purpose tool, well suited to summarising the output from numerous bioinformatics tools. multiqc 10.1093/bioinformatics/btw354 MultiQC GPL-3.0 MultiQC 1.11+galaxy1 1.9 1.11-foss-2021a, 1.14-foss-2022a (D) 1.11-foss-2021a, 1.14-foss-2022a (D)
MUMmer is a modular system for the rapid whole-genome alignment of finished or draft sequence. It performs ultra-fast alignment of large-scale DNA and protein sequences. mummer 4 publications MUMmer Artistic-2.0 6 tools 3.23--pl5321h1b792b2_13
MuSiC is a suite of programs that evaluate the biophysical effects of amino acid mutations in proteins. They require the experimental or modeled three-dimensional protein structure as input, and predict the impact of specific single-site mutations requested by the user or of all possible single-site mutations. PoPMuSiC and HoTMuSiC predict the changes in thermodynamic and thermal stability, respectively, upon mutation. They are helpful for the rational design of modified proteins with controlled stability properties. SNPMuSiC predicts whether protein variants are deleterious or benign due to stability issues, thus providing a molecular-level interpretation of disease phenotype. music_compare 6 publications MuSiC Compare 0.1.1+galaxy4
music_deconvolution 3 tools
Convert proteomics data files into a SQLite database mztosqlite mz to sqlite 2.1.1+galaxy0
nag
nag nll6i27dbl nll6i27dbl-i8 nll6i27dbl-mkl fll6i26dcl fll6i26dcl-mkl
nvc Naive Variant Caller (NVC) 0.0.4
Detection of RNA modifications by comparative Nanopore direct RNA sequencing (dRNA-Seq). Nanocompore identifies differences in the raw ONT nanopore sequencing signal corresponding to RNA modifications by comparing two ONT direct RNA sequencing datasets from different experimental conditions expected to have a significant impact on RNA modifications. At least two replicates per condition are recommended. For example, one can use a control condition with a significantly reduced number of modifications, such as a cell line in which a modification-writing enzyme was knocked down or knocked out. Alternatively, on a smaller scale, transcripts of interest could be synthesized in vitro. nanocompore 10.1101/843136 GPL-3.0 SampComp 1.0.0rc3.post2+galaxy1 1.0.4--pyhdfd78af_0
nanofilt NanoFilt 0.1.0
NanoPlot is a tool with various visualizations of sequencing data in bam, cram, fastq, fasta or platform-specific TSV summaries, mainly intended for long-read sequencing from Oxford Nanopore Technologies and Pacific Biosciences nanoplot NanoPack: Visualizing and processing long-read sequencing data GPL-3.0 NanoPlot 1.43.0+galaxy0
A package for detecting cytosine methylations and genetic variations from nanopore MinION sequencing data. nanopolish Detecting DNA cytosine methylation using nanopore sequencing Nanopolish MIT 4 tools 0.14.0--hb24e783_1
nanosv 1.2.4
natural_product_likeness_scorer Natural Product likeness calculator 2.1
NCBI Datasets is a resource for gathering data from across NCBI databases. Sequence, annotation, and metadata for genes and genomes can be found and downloaded using its command-line tools or web interface. ncbi-datasets-cli NCBI Datasets 14.2.2, 14.13.0, 14.29.1, 16.6.0 14.2.2, 14.13.0, 14.29.1, 16.6.0 14.2.2, 14.13.0, 14.29.1, 16.6.0 14.2.2, 14.13.0, 14.29.1, 16.6.0
ncbi-vdb 2.10.9-gompi-2021a, 3.0.2-gompi-2022a (D) 2.10.9-gompi-2021a, 3.0.2-gompi-2022a (D)
The National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. ncbi_acc_download 14 publications NCBI Accession Download 0.2.8+galaxy0
nccmp compares two NetCDF files bitwise, semantically or with a user defined tolerance (absolute or relative percentage). Parallel comparisons are done in local memory without requiring temporary files. Highly recommended for regression testing scientific models or datasets in a test-driven development environment. nccmp GPL-2.0 1.8.5.0
NCL
ncl 6.6.2
NCO
The NCO toolkit manipulates and analyzes data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5. nco BSD-3-Clause 4.7.7 4.9.2 5.0.5
Ncview is a netCDF visual browser. ncview GPL-1.0 2.1.7
NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. netcdf 4.7.1 4.7.1p 4.6.3 4.6.3p 4.7.3 4.7.3p 4.7.4 4.7.4p 4.6.3-i8r8 4.8.0 4.8.0p 4.9.0 4.9.0p 4.8.0-gompi-2021a, 4.9.0-gompi-2022a (D) 4.8.0-gompi-2021a, 4.9.0-gompi-2022a (D)
The Newick Utilities are a set of command-line tools for processing phylogenetic trees. They can process arbitrarily large amounts of data and do not require user interaction, which makes them suitable for automating phylogeny processing tasks. newick_utils 10.1093/bioinformatics/btq243 Newick Display 1.6+galaxy1
Nextclade is an open-source project for viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement. nextclade MIT Nextclade 2.7.0+galaxy0
Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages. nextflow Nextflow enables reproducible computational workflows Apache-2.0 21.04.3 22.04.3 22.10.1
A fast and efficient genome polishing tool for long-read assembly. NextPolish fixes base errors (SNVs/indels) in genomes generated from noisy long reads, and can be used with short-read data only, long-read data only, or a combination of both. It contains two core modules and corrects errors in the reference genome in a stepwise fashion. To correct raw third-generation sequencing (TGS) long reads with approximately 10-15% sequencing errors, use NextDenovo. nextpolish NextPolish: A fast and efficient genome polishing tool for long-read assembly 1.4.1--py311he4a0461_1
A fast and efficient genome polishing tool for long-read assembly. NextPolish fixes base errors (SNVs/indels) in genomes generated from noisy long reads, and can be used with short-read data only, long-read data only, or a combination of both. It contains two core modules and corrects errors in the reference genome in a stepwise fashion. To correct raw third-generation sequencing (TGS) long reads with approximately 10-15% sequencing errors, use NextDenovo. nextpolish2 NextPolish: A fast and efficient genome polishing tool for long-read assembly 0.1.0--hd03093a_0
NGS
ngs 2.10.9-gcccore-10.3.0
NGSUtils is a suite of software tools for working with next-generation sequencing datasets ngsutils NGSUtils: A software suite for analyzing and manipulating next-generation sequencing datasets ngsutils GPL-3.0 BAM filter 0.5.9
Nearly Infinite Neighbor Joining Application ninja MIT 0.98-cluster_only 1.10.2-gcccore-10.3.0 1.10.2-gcccore-11.3.0 (D) 1.10.2-gcccore-10.3.0 1.10.2-gcccore-11.3.0 (D)
NOVOPlasty NOVOplasty 4.3.1+galaxy0
nseg 1.0.1
bg_find_subsequences Nucleotide subsequence search 0.2
Set of Python programs developed to simplify the manipulation of sequence files. They were mainly designed to help analyze next-generation sequencer outputs (454 or Illumina) in the context of DNA metabarcoding. obitools obitools: A unix-inspired software package for DNA metabarcoding OBITools 10 tools
ont-fast5-api 4.1.1--pyhdfd78af_0
Open-source library and collection of tools and interfaces for the analysis of mass spectrometry data. Includes over 200 standalone (TOPP) tools that can be combined into a workflow with the integrated workflow editor TOPPAS. Raw and intermediate mass spectrometry data can be visualised with the included viewer TOPPView. openms 2 publications OpenMS BSD-3-Clause 35 tools
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics. It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees. It also infers a rooted species tree for the species being analysed and maps the gene duplication events from the gene trees to branches in the species tree. OrthoFinder also provides comprehensive statistics for comparative genomic analyses. orthofinder 2 publications GPL-3.0 OrthoFinder 2.5.5+galaxy0
pampa 5 tools
pandoc 3.1.2
Pangolin is a deep-learning based method for predicting splice site strengths (for details, see Zeng and Li, Genome Biology 2022). It is available as a command-line tool that can be run on a VCF or CSV file containing variants of interest; Pangolin will predict changes in splice site strength due to each variant, and return a file of the same format. Pangolin's models can also be used with custom sequences. pangolin Predicting RNA splicing from DNA sequence using Pangolin GPL-3.0 Pangolin 4.3+galaxy2
ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView’s batch processing capabilities. paraview BSD-3-Clause 5.8.0 5.8.0-mesa 5.8.0-gpu 5.9.1 5.9.1-mesa 5.9.1-gpu 5.10.1 5.9.1-mesa 5.10.1-mesa 5.9.1-gpu
parse_mito_blast Parse mitochondrial blast 1.0.2+galaxy0
param_value_from_file Parse parameter value 0.1.0
Parsnp is a command-line tool for efficient microbial core genome alignment and SNP detection. Parsnp was designed to work in tandem with Gingr, a flexible platform for visualizing genome alignments and phylogenetic trees. parsnp BSD-3-Clause 1.7.4--hdcf5f25_2
Tool set for pathway-based data integration and visualization that maps and renders a wide variety of biological data on relevant pathway graphs. It downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, it integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis. pathview Pathview: An R/Bioconductor package for pathway-based data integration and visualization Pathview GPL-3.0 Pathview 1.34.0+galaxy0
Web application for exploring metagenomics classification results, with a special focus on infectious disease diagnosis. Pinpointing pathogens in metagenomics classification results is often complicated by host and laboratory contaminants as well as many non-pathogenic microbiota. Researchers can analyze, display and transform results from the Kraken and Centrifuge classifiers using interactive tables, heatmaps and flow diagrams. pavian 10.1101/084715 GPL-3.0 Pavian 1.0
Multithreaded BLAT algorithm that speeds up the alignment of sequences to genomes. pblat pblat: A multithread blat algorithm speeding up aligning sequences to genomes Unlicense 2.5
Productive visualization of high-throughput sequencing data using the SeqCode open portable platform. pe_histogram Productive visualization of high-throughput sequencing data using the SeqCode open portable platform GPL-3.0 Paired-end histogram 1.0.1
Paired-end read merger. PEAR evaluates all possible paired-end read overlaps without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. pear PEAR: A fast and accurate Illumina Paired-End reAd mergeR CC-BY-NC-1.0 Pear 0.9.6.3 0.9.6--h9d449c0_10
PepPointer PepPointer 0.1.3+galaxy1
peptide_genomic_coordinate Peptide Genomic Coordinate 1.0.0
PeptideShaker is a search-engine-independent platform for interpretation of proteomics identification results from multiple search engines, currently supporting X!Tandem, MS-GF+, MS Amanda, OMSSA, MyriMatch, Comet, Tide, Mascot, Andromeda and mzIdentML. By combining the results from multiple search engines, while re-calculating PTM localization scores and redoing the protein inference, PeptideShaker attempts to give you the best possible understanding of your proteomics data. peptideshaker PeptideShaker enables reanalysis of MS-derived proteomics data sets: To the editor Apache-2.0 4 tools
Semi-supervised learning for peptide identification from MS/MS data. percolator Semi-supervised learning for peptide identification from shotgun proteomics datasets Percolator 4 tools
perllib v5.26.3
This tool is used to search a FASTA sequence against a library of Pfam HMMs. pfamscan PfamScan 1.6+galaxy0
Pharokka is a rapid standardised annotation tool for bacteriophage genomes and metagenomes. pharokka Pharokka: a fast scalable bacteriophage annotation tool MIT pharokka 1.3.2+galaxy0
phinch Phinch Visualisation 0.1
PHP
php 5.6.40
Provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data. phyloseq Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data Phyloseq GPL-3.0 2 tools
Phylogenetic estimation software using Maximum Likelihood phyml 5 publications PhyML GPL-2.0 PhyML 3.3.20220408+galaxy0
A set of command-line tools for manipulating high-throughput sequencing (HTS) data in formats such as SAM/BAM/CRAM and VCF. Available as a standalone program or within the GATK4 program. picard PICARD MIT 31 tools 2.27.4 3.1.1 2.27.4 3.1.1 2.25.1-java-11
PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes. picrust Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences 6 tools
PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is software for predicting functional abundances based only on marker gene sequences. picrust2 10.1038/s41587-020-0548-6 GPL-3.0 7 tools
pileup_interval Pileup-to-Interval 1.0.3
Read alignment analysis to diagnose, report, and automatically improve de novo genome assemblies. pilon Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement Pilon pilon 1.20.1
PIPE-T is a Galaxy workflow for processing and analyzing miR expression profiles by RT-qPCR. It offers several state-of-the-art options for parsing, filtering, normalizing, imputing and analyzing RT-qPCR expression data. Integration of PIPE-T into Galaxy allows experimentalists with strong bioinformatic background, as well as those without any programming or development expertise, to perform complex analyses in a simple-to-use, transparent, accessible, reproducible and user-friendly environment. pipe_t PIPE-T: a new Galaxy tool for the analysis of RT-qPCR expression data MIT PIPE-T 1.0
PlasFlow is a set of scripts used for prediction of plasmid sequences in metagenomic contigs. PlasFlow GPL-3.0 PlasFlow 1.1.0+galaxy0
PlasmidFinder is a tool for the identification and typing of Plasmid Replicons in Whole-Genome Sequencing (WGS). plasmidfinder PlasmidFinder and In Silico pMLST: Identification and Typing of Plasmid Replicons in Whole-Genome Sequencing (WGS) PlasmidFinder 2.1.6+galaxy1
Free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. plink PLINK: A tool set for whole-genome association and population-based linkage analyses plink GPL-2.0 v2.00a3.7 2.00a3.6-gcc-11.3.0
PnetCDF: A Parallel I/O Library for NetCDF File Access pnetcdf 10.1109/SC.2003.10053 Freeware 1.11.2
poisson2test Poisson two-sample test 1.0.0
porechop Porechop 0.2.4+galaxy0
Flexible toolkit for exploring datasets generated by nanopore sequencing devices from MinION for the purposes of quality control and downstream analysis. poretools Poretools: A toolkit for analyzing nanopore sequence data poretools 13 tools
Tools for performing taxonomic assignment based on phylogeny using pplacer and clst. pplacer Orchestrating high-throughput genomic analysis with Bioconductor pplacer GPL-3.0 1.1.alpha19
pretext_map PretextMap 0.1.9+galaxy1
Pretext is an OpenGL-powered viewer for Pretext contact maps. pretextview MIT Pretext Snapshot 0.0.3+galaxy2
The pipeline runs PRODIGAL gene predictions on all genomes, runs pan-reciprocal BLAST, and identifies ortholog sets. For a set of orthologous genes, if the positions of the PRODIGAL selected starts coincide in a multiple sequence alignment, they are accepted. If they do not coincide, a consistent start position is sought where a majority of the highest-scoring PRODIGAL selected sites coincide. If such a position is found, it is accepted, and the predictions are changed for the outlying genes. prodigal Genome majority vote improves gene predictions 2.6.3 2.6.3-gcccore-10.3.0 2.6.3-gcccore-11.3.0 (D) 2.6.3-gcccore-10.3.0 2.6.3-gcccore-11.3.0 (D)
Flow injection analysis coupled to high-resolution mass spectrometry (FIA-HRMS) is a promising approach for high-throughput metabolomics. However, FIA-HRMS data cannot be pre-processed with existing software tools that rely on liquid chromatography separation or that handle only low-resolution data. proFIA implements a new methodology to pre-process FIA-HRMS raw data (netCDF, mzData, mzXML and mzML) and generates the peak table. profia Orchestrating high-throughput genomic analysis with Bioconductor CECILL-2.1 proFIA 3.1.0
progressbar2 4.2.0
Software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files. prokka 10.1093/bioinformatics/btu153 Prokka Prokka 1.14.6+galaxy1 1.14.5-gompi-2021a 1.14.5-gompi-2022a (D) 1.14.5-gompi-2021a 1.14.5-gompi-2022a (D)
pslcdnafilter 0
Identifying and removing haplotypic duplication in primary genome assemblies | haplotypic duplication identification tool | scripts/pd_config.py: script to generate a configuration file used by run_purge_dups.py | purge haplotigs and overlaps in an assembly based on read depth | Given a primary assembly pri_asm and an alternative assembly hap_asm (optional), a purge_dups pipeline is built from a series of numbered steps; steps with the same number can be run simultaneously. Although step 4 is optional, it is highly recommended, because assemblers may produce overrepresented sequences; in such cases, the final step 4 removes those sequences. purge_dups 10.1101/729962 MIT Purge overlaps 1.2.6+galaxy0
pybigwig 0.3.18-foss-2021a 0.3.18-foss-2022a (D) 0.3.18-foss-2021a 0.3.18-foss-2022a (D)
PycoQC computes metrics and generates interactive QC plots for Oxford Nanopore technologies sequencing data. pycoqc GPL-3.0 Pycoqc 2.5.2+galaxy0 2.5.2-foss-2021a
Reproducible plots for multivariate genomic datasets. Standalone program and library to plot beautiful genome browser tracks. pyGenomeTracks aims to produce high-quality genome browser tracks that are highly customizable. Currently, it is possible to plot:. pygenometracks pyGenomeTracks: reproducible plots for multivariate genomic datasets GPL-3.0 pyGenomeTracks 3.8+galaxy2
pyprophet 4 tools
A Python module for reading and manipulating SAM/BAM/VCF/BCF files. pysam The Sequence Alignment/Map format and SAMtools MIT 0.16.0.1-gcc-10.3.0 0.19.1-gcc-11.3.0 (D) 0.16.0.1-gcc-10.3.0 0.19.1-gcc-11.3.0 (D)
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. pytorch BSD-3-Clause 1.4.0a0 1.5.1 1.9.0 1.10.0 1.12.1
QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed. qiime2 Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2 BSD-3-Clause 161 tools 2022.8
Platform-independent application written in Java and R that provides both a graphical user interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data. qualimap 22914218 qualimap 4 tools
A collection of tools for viral quasispecies analysis | quasitools is a collection of newly developed, open-source tools for analyzing viral quasispecies data. The application suite includes tools to create consensus sequences; call nucleotide, codon and amino acid variants; calculate the complexity of a quasispecies; and measure the genetic distance between two similar quasispecies. These tools may be run independently or in user-created workflows. The quasitools suite is freely available under the Apache License, Version 2.0. The source code, documentation and file specifications are available at https://phac-nml.github.io/quasitools. Contact: gary.vandomselaar@canada.ca quasitools 10.1101/733238 Apache-2.0 12 tools
QUAST stands for QUality ASsessment Tool. It evaluates the quality of genome assemblies by computing various metrics and generating reports. quast QUAST: Quality assessment tool for genome assemblies GPL-2.0 Quast 5.2.0+galaxy1 5.1.0rc1 5.2.0 5.1.0rc1 5.2.0 5.0.2-foss-2021a 5.2.0-foss-2022a (D) 5.0.2-foss-2021a 5.2.0-foss-2022a (D)
Query Tabular is a Galaxy-based tool that manipulates tabular files. Query Tabular automatically creates a SQLite database directly from a tabular file within a Galaxy workflow. The SQLite database can be saved to the Galaxy history and further processed to generate tabular outputs containing the desired information and formatting. query_tabular Improve your Galaxy text life: The Query Tabular Tool CC-BY-4.0 Query Tabular 3.3.2
R
Free software environment for statistical computing and graphics. r 10.11120/msor.2001.01010023 3.6.1 4.0.0 4.1.0 4.2.1 4.1.0-foss-2021a 4.2.1-foss-2021a 4.2.1-foss-2022a (D) 4.1.0-foss-2021a 4.2.1-foss-2021a 4.2.1-foss-2022a (D) 4.1.0-foss-2021a 4.2.1-foss-2021a 4.2.1-foss-2022a (D)
r-raceid 5 tools
Consensus module for raw de novo DNA assembly of long uncorrected reads. Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods that do not include a consensus step. The goal of Racon is to generate a genomic consensus of similar or better quality compared with the output of assembly methods that employ both error correction and consensus steps, while running several times faster than those methods. It supports data produced by both Pacific Biosciences and Oxford Nanopore Technologies. racon Constructing a reference genome in a single lab: The possibility to use oxford nanopore technology MIT Racon 1.5.0+galaxy1 1.4.3
Fast and accurate reference-guided scaffolding of draft genomes. A tool to order and orient genome assembly contigs via minimap2 alignments to a reference genome. Alonge, Michael, et al. "RaGOO: fast and accurate reference-guided scaffolding of draft genomes." Genome Biology 20.1 (2019): 1-17. Contigs and reference FASTA files may be gzipped. RaGOO is a tool for coalescing genome assembly contigs into pseudochromosomes via minimap2 alignments to a closely related reference genome. The tool focuses on practicality. ragoo RaGOO: Fast and accurate reference-guided scaffolding of draft genomes MIT RaGOO 1.0
RagTag is a collection of software tools for scaffolding and improving modern genome assemblies. ragtag RaGOO: Fast and accurate reference-guided scaffolding of draft genomes MIT RagTag 2.1.0+galaxy1
A feature clustering algorithm for non-targeted mass spectrometric metabolomics data. ramclustr RAMClust: A novel feature clustering method enables spectral-matching-based annotation for metabolomics data GPL-2.0 2 tools
Ratatosk – Hybrid error correction of long reads enables accurate variant calling and assembly. Phased hybrid error correction of long reads using colored de Bruijn graphs. Ratatosk is a phased error correction tool for erroneous long reads based on compacted and colored de Bruijn graphs built from accurate short reads. ratatosk 10.1101/2020.07.15.204925 BSD-2-Clause 0.7.6.3--h43eeafb_2
Raven is a de novo genome assembler for long uncorrected reads. raven 10.1101/2020.08.07.242461 MIT Raven 1.8.3+galaxy0
A standalone tool for extracting data directly from raw files generated by Thermo Orbitrap family instruments. rawtools RawTools: Rapid and Dynamic Interrogation of Orbitrap Data Files for Mass Spectrometer System Management Apache-2.0 Raw Tools 1.4.2.0
A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. raxml 2 publications RAxML RAxML 8.2.12+galaxy1 8.2.12
RDKit is open-source cheminformatics software. Fast, Efficient Fragment-Based Coordinate Generation for Open Babel. rdkit 10.26434/CHEMRXIV.7791947.V2 2 tools
rdock Create Frankenstein ligand 2013.1-0+galaxy0
recetox-aplcms is a tool for peak detection in mass spectrometry data. The tool performs (1) noise removal, (2) peak detection, (3) retention time drift correction, (4) peak alignment and (5) weaker signal recovery as well as (6) suspect screening. recetox-aplcms GPL-2.0 8 tools
Tool for calculating the probability of nucleosome formation along a DNA sequence input by the user. recon RECON: A program for prediction of nucleosome formation potential 1.08
RED
A program to detect and visualize RNA editing events at genomic scale using next-generation sequencing data. red RED: A Java-MySQL software for identifying and visualizing RNA editing sites using rule-based and statistical filters Red 2018.09.10+galaxy1
regenie 3.2.9
A program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). repeatmasker OSL-2.1 RepeatMasker 4.1.5+galaxy0 4.1.2-p1 4.1.5 4.1.2-p1 4.1.5 4.1.5--pl5321hdfd78af_0
RepeatModeler is a de novo transposable element (TE) family identification and modeling package. At the heart of RepeatModeler are three de novo repeat-finding programs (RECON, RepeatScout and LtrHarvest/Ltr_retriever), which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. repeatmodeler 10.1101/856591 RepeatModeler 2.0.4+galaxy1 2.0.3 2.0.4 2.0.4-conda 2.0.3 2.0.4 2.0.4-conda 2.0.3 2.0.4 2.0.4-conda 2.0.4--pl5321hdfd78af_0
RepeatScout is a tool to discover repetitive substrings in DNA. repeatscout 10.1093/bioinformatics/bti1018 1.0.6
repenrich RepEnrich 1.6.1
The rjags package provides an interface from R to the JAGS library for Bayesian data analysis. JAGS uses Markov Chain Monte Carlo (MCMC) to generate a sequence of dependent samples from the posterior distribution of the parameters. rjags GPL-2.0 4-10-foss-2021a-r-4.1.0
Workflow to process tandem MS files and build MassBank records. Functions include automated extraction of tandem MS spectra, formula assignment to tandem MS fragments, recalibration of tandem MS spectra with assigned fragments, spectrum cleanup, automated retrieval of compound information from Internet databases, and export to MassBank records. rmassbank Automatic recalibration and processing of tandem mass spectra using formula annotation Artistic-2.0 RMassBank 3.0.0+galaxy3
rmats-turbo 4.1.2
RMBlast is a RepeatMasker-compatible version of the standard NCBI blastn program. The primary difference between this distribution and the NCBI distribution is the addition of a new program, "rmblastn", for use with RepeatMasker and RepeatModeler. rmblast OSL-2.1 2.11.0 2.14.0 2.11.0 2.14.0
rnachipintegrator 2 tools
Quality assessment tool for de novo transcriptome assemblies. rnaquast RnaQUAST: A quality assessment tool for de novo transcriptome assemblies Unlicense rnaQUAST 2.2.3+galaxy0
A high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. roary 10.1093/bioinformatics/btv421 roary Roary 3.13.0+galaxy3
RSEM uses a generative statistical model and associated inference methods that handle read-mapping uncertainty in a principled manner. In simulations parameterized by real RNA-seq data, its authors showed it to be more accurate than previous methods. The improved accuracy results from handling read-mapping uncertainty with a statistical model and from estimating gene expression levels as the sum of isoform expression levels. rsem RNA-Seq gene expression estimation with read mapping uncertainty RSEM 1.3.3--pl5321ha04fe3b_5
Provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, transcript level RNA integrity etc. rseqc 2 publications rseqc 22 tools 5.0.1
Integrated development environment (IDE) for the R programming language. rstudio RSTUDIO: A platform-independent IDE for R and sweave RStudio 0.3
RTG Core: Software for alignment and analysis of next-gen sequencing data. rtg-tools Other 3.12.1
rxdock 2 tools
s3segmenter s3segmenter 1.3.12+galaxy0
A software tool that implements a novel alignment-free algorithm for estimating isoform abundances directly from a set of reference sequences and RNA-seq reads. sailfish Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms Sailfish 0.10.1.1
A tool for transcript expression quantification from RNA-seq data. salmon Salmon provides fast and bias-aware quantification of transcript expression GPL-3.0 Salmon quant 1.10.1+galaxy2 1.1.0 1.4.0-gompi-2021a 1.9.0-gcc-11.3.0 (D) 1.4.0-gompi-2021a 1.9.0-gcc-11.3.0 (D)
salmon_kallisto_mtx_to_10x salmonKallistoMtxTo10x 0.0.1+galaxy6
Integrating Hi-C links with assembly graphs for chromosome-scale assembly | SALSA: A tool to scaffold long read assemblies with Hi-C data | This code is used to scaffold assemblies using Hi-C data. This version implements some improvements to the original SALSA algorithm; the old version can be found in the old_salsa branch. salsa Integrating Hi-C links with assembly graphs for chromosome-scale assembly MIT 2.3
sam2interval Convert SAM 1.0.2
sam_pileup Generate pileup 1.1.3
A high-performance, robust and fast tool (and library), written in the D programming language, for working with SAM, BAM and CRAM formats. sambamba Sambamba: Fast processing of NGS alignment formats Sample, Slice or Filter BAM 0.7.1+galaxy1 0.8.1 0.8.1--h41abebc_0
A tool to mark duplicates and extract discordant and split reads from SAM files. samblaster SAMBLASTER: Fast duplicate marking and structural variant read extraction samblaster 0.1.24
SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. samtools 3 publications SAMTools MIT 23 tools 1.9 1.10 1.12 1.18 1.19.2 1.18 1.19.2 1.15--h3843a85_0 1.13-gcc-10.3.0 1.13-gcc-11.3.0 1.16.1-gcc-11.3.0 (D) 1.13-gcc-10.3.0 1.13-gcc-11.3.0 1.16.1-gcc-11.3.0 (D) 1.13-gcc-10.3.0 1.13-gcc-11.3.0 1.16.1-gcc-11.3.0 (D)
Scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells. scanpy SCANPY: Large-scale single-cell gene expression data analysis BSD-3-Clause 34 tools
Pre-processing, quality control, normalization and visualization of single-cell RNA-seq data. scater Scater: Pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R 6 tools
scikit-build 0.11.1-gcccore-10.3.0
Scikit-image contains image processing algorithms for SciPy, including IO, morphology, filtering, warping, color manipulation, object detection, etc. scikit-image 10.7287/PEERJ.PREPRINTS.336V2 BSD-3-Clause 7 tools
scikit 14 tools
scipio 1.4
send_to_cloud Send to cloud 0.1.0
SEPP stands for SATé-Enabled Phylogenetic Placement and addresses the problem of phylogenetic placement of metagenomic short reads. sepp 10.1142/9789814366496_0024 sepp GPL-3.0 4.5.1 4.5.0-foss-2021a 4.5.1-foss-2022a (D) 4.5.0-foss-2021a 4.5.1-foss-2022a (D)
seq_select_by_id Select sequences by ID 0.0.14
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user-friendly. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. seqkit 10.1371/journal.pone.0163962 2 tools 2.2.0 2.3.1 2.5.1 2.2.0 2.3.1 2.5.1 2.2.0 2.3.1 2.5.1
seqlib 1.2.0-gcc-10.3.0
A tool for processing sequences in the FASTA or FASTQ format. It parses both FASTA and FASTQ files, which can optionally be compressed with gzip. seqtk FastQ-brew: Module for analysis, preprocessing, and reformatting of FASTQ sequence data seqtk MIT 15 tools 1.3 1.4 1.3-gcc-10.3.0 1.3-gcc-11.3.0 (D) 1.3-gcc-10.3.0 1.3-gcc-11.3.0 (D)
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and to integrate diverse types of single-cell data. seurat Integrated analysis of multimodal single-cell data MIT 16 tools
De novo assembly from Oxford Nanopore reads. shasta Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes MIT Shasta 0.6.0+galaxy0
Shovill is a pipeline for assembly of bacterial isolate genomes from Illumina paired-end reads. Shovill uses SPAdes at its core, but alters the steps before and after the primary assembly step to get similar results in less time. Shovill also supports other assemblers like SKESA, Velvet and Megahit, so you can take advantage of the pre- and post-processing that Shovill provides with those too. shovill shovill GPL-3.0 Shovill 1.1.0+galaxy2 1.1.0
A clustering approach for identification of enriched domains from histone modification ChIP-seq data. sicer A clustering approach for identification of enriched domains from histone modification ChIP-Seq data SICER 1.1
A text-mining framework for interactive analysis and visualization of similarities among biomedical entities. For each search query, PMIDs or abstracts from PubMed are saved. $ git clone https://github.com/dlal-group/simtext. For all PMIDs in each row of a table, the corresponding abstracts are saved in additional columns. simtext 10.1101/2020.07.06.190629 2 tools
SIP
sip 6.7.9
The Salmonella In Silico Typing Resource (SISTR) is an open-source and freely available web application for rapid in silico typing and serovar prediction from Salmonella genome assemblies using cgMLST and O and H antigen gene searching. sistr Performance and accuracy of four open-source tools for in silico serotyping of salmonella spp. Based on whole-genome short-read sequencing data Apache-2.0 sistr_cmd 1.1.1+galaxy1
slow5-dorado 0.2.1
slow5-guppy 6.0.1
Slow5tools is a simple toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format. About SLOW5 format: SLOW5 is a new file format for storing signal data from Oxford Nanopore Technologies (ONT) devices. SLOW5 was developed to overcome inherent limitations in the standard FAST5 signal data format that prevent efficient, scalable analysis and cause many headaches for developers. SLOW5 can be encoded in human-readable ASCII format, or a more compact and efficient binary format (BLOW5); this is analogous to the seminal SAM/BAM format for storing DNA sequence alignments. The BLOW5 binary format supports zlib (DEFLATE) compression, or other compression methods, thereby minimising the data storage footprint while still permitting efficient parallel access. Detailed benchmarking experiments have shown that the SLOW5 format is an order of magnitude faster and significantly smaller than FAST5. slow5tools 0.8.0 0.3.0 1.0.0 1.1.0 0.3.0 1.0.0 1.1.0 0.3.0 1.0.0 1.1.0
smithwaterman 20160702-gcccore-10.3.0
Reference-free profiling of polyploid genomes | Inference of ploidy and heterozygosity structure using whole genome sequencing data | Smudgeplots are computed from raw or even better from trimmed reads and show the haplotype structure using heterozygous kmer pairs. For example: | This tool extracts heterozygous kmer pairs from kmer dump files and performs gymnastics with them. We are able to disentangle genome structure by comparing the sum of kmer pair coverages (CovA + CovB) to their relative coverage (CovA / (CovA + CovB)). Such an approach also allows us to analyze obscure genomes with duplications, various ploidy levels, etc | GenomeScope 2.0 and Smudgeplots: Reference-free profiling of polyploid genomes Timothy Rhyker Ranallo-Benavidez, Kamil S. Jaron, Michael C. Schatz bioRxiv 747568; doi: https://doi.org/10.1101/747568 Smudgeplots 10.1101/747568 Apache-2.0 Smudgeplot 0.2.5+galaxy3
Workflow engine and language. It aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern Python-style domain-specific language (DSL). snakemake Snakemake-a scalable bioinformatics workflow engine Snakemake 7.18.2 6.6.1-foss-2021a 7.22.0-foss-2022a (D) 6.6.1-foss-2021a 7.22.0-foss-2022a (D)
The Semi-HMM-based Nucleic Acid Parser is a gene prediction tool. snap 10.1186/1471-2105-5-59 SNAP Train SNAP 2013_11_29+galaxy1 2006 2013_11_29
SnapATAC (Single Nucleus Analysis Pipeline for ATAC-seq) is a fast, accurate and comprehensive method for analyzing single cell ATAC-seq datasets. snapatac Comprehensive analysis of single cell ATAC-seq data with SnapATAC GPL-3.0 4 tools
A Snakemake pipeline for scalable HIV-1 subtyping by phylogenetic pairing. SNAPPy allows high-throughput HIV-1 subtyping locally while being resource-efficient and scalable. The pipeline was built with Snakemake and uses MAFFT for multiple sequence alignment, BLAST for similarity queries, IQ-TREE for phylogenetic inference, and several Biopython modules for data parsing and analysis. SNAPPy was designed for Linux-based operating systems; for in-depth information on how the tool works, see the SNAPPy documentation. snappy SNAPPy: A snakemake pipeline for scalable HIV-1 subtyping by phylogenetic pairing MIT 1.1.8-gcccore-10.3.0, 1.1.9-gcccore-11.3.0 (D) 1.1.8-gcccore-10.3.0, 1.1.9-gcccore-11.3.0 (D)
An algorithm for structural variation detection from third-generation sequencing alignments. sniffles Accurate detection of complex structural variations using single-molecule sequencing sniffles MIT sniffles 1.0.12+galaxy0 2.0.2, 2.3.3, 2.4 2.0.2, 2.3.3, 2.4 2.0.2, 2.3.3, 2.4
Rapid haploid variant calling and core SNP phylogeny generation. snippy snippy GPL-3.0 3 tools
snp-dists SNP distance matrix 0.8.2+galaxy0
snp_sites Finds SNP sites 2.5.1+galaxy0
Variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes and proteins (such as amino acid changes). snpeff 22728672 snpeff 6 tools
snpfreq snpFreq 1.0.1
snpfreqplot Variant Frequency Plot 1.0+galaxy3
Toolbox that allows you to filter and manipulate annotated vcf files. snpsift Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift LGPL-3.0 8 tools
SOAPdenovo2 is a de novo assembler for next-generation sequencing reads. soapdenovo2 SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler GPL-3.0 2.41
Sequence analysis tool for filtering, mapping and OTU-picking NGS reads. sortmerna SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data sortmerna Filter with SortMeRNA 4.3.6+galaxy0 4.3.6--h9ee0642_0
spaceranger 2.0.1-gcc-11.3.0
St. Petersburg genome assembler, intended for both standard isolates and single-cell MDA bacterial assemblies. SPAdes 3.9 works with Illumina or IonTorrent reads and can provide hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. Additional contigs can also be supplied and used as long reads. spades 2 publications SPAdes GPL-2.0 8 tools 3.15.4--h95f258a_0 3.15.3-gcc-10.3.0, 3.15.5-gcc-11.3.0 (D) 3.15.3-gcc-10.3.0, 3.15.5-gcc-11.3.0 (D)
spectra 1.0.1-gcccore-11.3.0
Spectral Repeat Finder (SRF) is a program to find repeats through an analysis of the power spectrum of a given DNA sequence. srf Spectral repeat finders (SRF): Identification of repetitive sequences using Fourier transformation 2022.11.22
sqlite 3.36
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. sra-tools Database resources of the National Center for Biotechnology Information. 3 tools 3.0.2 3.0.3-gompi-2022a, 3.0.3--h87f3376_0 3.0.3-gompi-2022a, 3.0.3--h87f3376_0
srst2 0.2.0--py_4
ssw
A fast implementation of the Smith-Waterman algorithm with an API that can be flexibly used by programs written in C, C++ and other languages. ssw SSW library: An SIMD Smith-Waterman C/C++ library for use in genomic applications 1.1-gcccore-10.3.0
Developed to work with restriction enzyme based sequence data, such as RADseq, for building genetic maps and conducting population genomics and phylogeography analysis. stacks Stacks: An analysis tool set for population genomics Stacks GPL-3.0 25 tools
Ultrafast universal RNA-seq data aligner star 3 publications star GPL-3.0 2 tools 2.7.10a 2.7.10a--h9ee0642_0 2.7.9a-gcc-10.3.0, 2.7.10b-gcc-11.3.0 (D), 2.7.10a--h9ee0642_0 2.7.9a-gcc-10.3.0, 2.7.10b-gcc-11.3.0 (D), 2.7.10a--h9ee0642_0 2.7.9a-gcc-10.3.0, 2.7.10b-gcc-11.3.0 (D), 2.7.10a--h9ee0642_0
STAR-Fusion, a method that is both fast and accurate in identifying fusion transcripts from RNA-Seq data star_fusion STAR-Fusion 0.5.4-3+galaxy1
staramr (*AMR) scans bacterial genome contigs against the ResFinder, PointFinder, and PlasmidFinder databases (used by the ResFinder webservice and other webservices offered by the Center for Genomic Epidemiology) and compiles a summary report of detected antimicrobial resistance genes. The star (*) in staramr indicates that it can handle all of the ResFinder, PointFinder, and PlasmidFinder databases. staramr staramr 0.10.0+galaxy1
Fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. stringtie StringTie enables improved reconstruction of a transcriptome from RNA-seq reads Artistic-2.0 2 tools 2.1.7-gcc-10.3.0
Subread is a general-purpose read aligner which can be used to map both genomic DNA-seq reads and RNA-seq reads. It uses a new mapping paradigm called "seed-and-vote" to achieve fast, accurate and scalable read mapping. It automatically determines whether a read should be globally or locally aligned, which makes it particularly powerful for mapping RNA-seq reads. It supports indel detection and can map reads with both fixed and variable lengths. subread 2 publications subread GPL-3.0 2.0.3, 2.0.6 2.0.3, 2.0.6
An agile homology-based approach using a reduced SEED database to report the subsystems present in metagenomic samples and profile their abundances. superfocus SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data 1.4.1
This tool generates Alternative Splicing (AS) events from an annotation and calculates the PSI ("Percentage Spliced In") value for each event exploiting fast quantification of transcript abundances from multiple samples. suppa Leveraging transcript quantification for fast computation of alternative splicing profiles MIT 2.3--py_2
Structural variant detection from haploid and diploid genome assemblies. SVIM-asm - Structural variant identification method (Assembly edition). SVIM-asm (pronounced SWIM-assem) is a structural variant caller for haploid or diploid genome-genome alignments. It analyzes a given sorted BAM file (preferably from minimap2) and detects five different variant classes between the query assembly and the reference: deletions, insertions, tandem and interspersed duplications and inversions. svim-asm 10.1101/2020.10.27.356907 GPL-3.0 1.0.3
SyRI is a tool for finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genomic differences range from single nucleotide differences to complex structural variations. Current methods typically annotate sequence differences ranging from SNPs to large indels accurately but do not unravel the full complexity of structural rearrangements, including inversions, translocations, and duplications, where highly similar sequence changes in location, orientation, or copy number. SyRI is a pairwise whole-genome comparison tool for chromosome-level assemblies. It starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished by whether they reside in syntenic or rearranged regions. This distinction is important because rearranged regions are inherited differently from syntenic regions. syri SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies 1.6
Targetfinder.org provides a web-based resource that allows users to find genes that have a similar expression to a query gene signature. targetfinder Targetfinder.org: A resource for systematic discovery of transcription factor target genes TargetFinder 1.7.0+galaxy1
tb_variant_filter TB Variant Filter 0.4.0+galaxy0
Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Genome Workbench but is generally driven by data files. Tbl2asn generates .sqn files for submission to GenBank. tbl2asn 20220427-linux64, 20230119-linux64 (D) 20220427-linux64, 20230119-linux64 (D)
A tool for drug resistance prediction from _M. tuberculosis_ genomic data (sequencing reads, alignments or variants). tbprofiler Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences GPL-3.0 TB-Profiler Profile 6.2.1+galaxy1
The COMBAT-TB Workbench is an IRIDA-based, modular workbench for M. tuberculosis bioinformatics. It is designed to be easily deployed on a single server. tbvcfreport 10.1101/2021.09.23.21263983 Apache-2.0 TB Variant Report 1.0.1+galaxy0
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications. tensorflow Prediction of cognitive impairment via deep learning trained with multi-center neuropsychological test data 2.0.0 2.1.0 2.3.0 2.4.1 2.6.0 2.8.0
Lineage-level classification of transposable elements using conserved protein domains. Note: do not move or hard-link TEsorter.py on its own to another location, as it relies on the database/ and bin/ directories. You can add its directory to PATH or soft-link TEsorter.py into a directory on PATH. tesorter 10.1101/800177 1.4.6
TEtranscripts TEtranscripts 2.2.3+galaxy0
Open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and to the HUPO-PSI standard file format mzML ThermoRawFileParser 10.1101/622852 Apache-2.0 Thermo 1.3.4+galaxy0
tmt-analyst TMT Analyst 0.11+galaxy0
Program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie. A stable SAMtools version is now packaged with the program. tophat 2 publications tophat 2 tools 2.1.1
TotalView is a debugger for High Performance Computing applications. totalview Proprietary 2019.3.14 2020.1.13
Institute for Systems Biology "Trans-Proteomic Pipeline" tpp 4 publications 5 tools
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks. transdecoder TransDecoder 5.5.0+galaxy2 5.5.0--pl5321hdfd78af_5
A tool for the analysis of Tn-Seq data. It provides an easy-to-use graphical interface and access to three different analysis methods that allow the user to determine essentiality in a single condition as well as between conditions. transit TRANSIT - A Software Tool for Himar1 TnSeq Analysis 5 tools
transvar 2.4.0
treeshrink 1.3.9
trf
Tandem Repeats Finder. Finds tandem repeats in DNA sequences without the need to specify either the pattern or the pattern size. It uses k-tuple matching to avoid full-scale alignment matrix computations. It requires no a priori knowledge of the pattern, pattern size or number of copies, and there are no restrictions on the size of the repeats that can be detected. It determines a consensus pattern for the smallest repetitive unit in the tandem repeat. trf Tandem repeats finder: A program to analyze DNA sequences Other 4.09.1
trf-mod 4.10.0
A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries. trim_galore Trim Galore! 0.6.7+galaxy0
Tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment. trimal trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses 1.4.1
A flexible read trimming tool for Illumina NGS data trimmomatic RobiNA: A user-friendly, integrated software solution for RNA-Seq-based transcriptomics Trimmomatic Trimmomatic 0.36.6 0.39 0.39--hdfd78af_2 0.39-java-11
Trinity is a transcriptome assembler that relies on three tools: Inchworm, an assembler; Chrysalis, which pools contigs; and Butterfly, which, among other things, compacts the graph produced by Chrysalis with the help of the reads. trinity 2 publications 13 tools 2.9.1 2.12.0 2.13.2--ha140323_0 2.9.1-foss-2021a, 2.15.1--h6ab5fc9_2 2.9.1-foss-2021a, 2.15.1--h6ab5fc9_2
Comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. trinotate A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors Trinotate 3.2.2+galaxy0 3.2.2--pl5321hdfd78af_1
A program for improved detection of transfer RNA genes in genomic sequence. trnascan-se 2 publications tRNA prediction 0.4
Trycycler: consensus long-read assemblies for bacterial genomes trycycler Trycycler: consensus long-read assemblies for bacterial genomes GPL-3.0 5 tools
Utilities for handling sequences and assemblies from the UCSC Genome Browser project. ucsc Other 6 tools
Tools for handling Unique Molecular Identifiers in NGS data sets. umi_tools UMI-tools: Modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy MIT 5 tools
A tool for assembling bacterial genomes from a combination of short (2nd generation) and long (3rd generation) sequencing reads. unicycler Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads Unicycler GPL-3.0 Create assemblies with Unicycler 0.5.1+galaxy0
Metaproteomics data analysis with a focus on interactive data visualizations. unipept 2 publications MIT Unipept 4.5.1
The universal protein knowledgebase (UniProt). UniProt_Downloader 14 publications CC-BY-4.0 UniProt 2.4.0
The universal protein knowledgebase (UniProt). uniprot_rest_interface 14 publications CC-BY-4.0 UniProt 0.6
unzip Unzip 6.0+galaxy0 6.0-gcccore-10.3.0, 6.0-gcccore-11.3.0, 6.0-gcccore-12.3.0 (D) 6.0-gcccore-10.3.0, 6.0-gcccore-11.3.0, 6.0-gcccore-12.3.0 (D) 6.0-gcccore-10.3.0, 6.0-gcccore-11.3.0, 6.0-gcccore-12.3.0 (D)
Tool for predicting effects of variants for any genome in Ensembl or with genome annotation (via GFF). This includes vertebrates and also plants, fungi, protists, metazoa and bacteria. There is a web and a REST API version but the most powerful is the Perl script version. See McLaren et al., 2016, Genome Biology vep The Ensembl Variant Effect Predictor Apache-2.0 107-gcc-11.3.0
VarScan, an open-source tool for variant detection that is compatible with several short-read aligners. varscan 2 publications VarScan 4 tools
API and command line utilities for the manipulation of VCF files. vcflib 10.1101/023754 MIT 23 tools 1.0.3-foss-2021a-r-4.1.0
Provides easily accessible methods for working with complex genetic variation data in the form of VCF files. vcftools The variant call format and VCFtools VCFTools GPL-3.0 2 tools 0.1.16 0.1.16--pl5321h9a82719_6 0.1.16-gcc-10.3.0, 0.1.16-gcc-11.3.0 (D) 0.1.16-gcc-10.3.0, 0.1.16-gcc-11.3.0 (D)
A de novo genomic assembler specially designed for short-read sequencing technologies, such as Solexa, 454 or SOLiD. velvet 10.1101/gr.074492.107 Velvet GPL-3.0 3 tools 1.2.10 1.2.10--h7132678_5
verkko 1.1
visit 3.1.2
High-throughput search and clustering sequence analysis tool. It supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering and conversion. vsearch 10.7717/peerj.2584 vsearch GPL-3.0 8 tools
weblogo3 Sequence Logo 3.5.0
Software for phasing genomic variants using DNA sequencing reads, also called haplotype assembly. It is especially suitable for long reads, but also works well with short reads. whatshap 2 publications MIT 1.7, 2.3 1.7, 2.3
windowmasker identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. windowmasker 2 tools
Winnowmap is a long-read mapping algorithm optimized for mapping ONT and PacBio reads to repetitive reference sequences. Its development began on top of the minimap2 codebase and has since incorporated two ideas to improve mapping accuracy within repeats. winnowmap 2 publications Not licensed 2.03
First fully open-source and collaborative online platform for computational metabolomics. It includes preprocessing, normalization, quality control, statistical analysis of LC/MS, FIA-MS, GC/MS and NMR data. workflow4metabolomics 2 publications 19 tools
Caenorhabditis elegans genome database. International consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes. Founded in 2000, the Consortium is led by Paul Sternberg of Caltech, Paul Kersey of the EBI, Matt Berriman of the Wellcome Trust Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research. wormbase WormBase 2014: New views of curated biology WormBase 1.0.1
xarray 2 tools
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. The package enables imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files and preprocesses data for high-throughput, untargeted analyte profiling. xcms Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data GPL-2.0 10 tools
xml4ena
Detection of differential RNA modifications from direct RNA sequencing of human cell lines. Python package for detection of differential RNA modifications from direct RNA sequencing. xpore 10.1101/2020.06.18.160010 xpore MIT 2.1--pyh5e36f6f_0
Matches tandem mass spectra with peptide sequences. xtandem TANDEM: Matching proteins with tandem mass spectra xtandem 2 tools
YaHS is a scaffolding tool using Hi-C data. It relies on a new algorithm for contig joining detection which considers the topological distribution of Hi-C signals, aiming to distinguish real interaction signals from mapping noise. yahs 10.1101/2022.06.09.495093 MIT YAHS 1.2a.2+galaxy2 1.1
Search and retrieve S. cerevisiae data, populated by SGD and powered by InterMine yeastmine InterMine: A flexible data warehouse system for the integration and analysis of heterogeneous biological data LGPL-2.1 YeastMine 1.0.0
ZebrafishMine is powered by the InterMine data warehouse system, and integrates biological data sets from multiple sources. It currently includes updates of data from ZFIN, the zebrafish model organism database. There is also data from the Panther database. zebrafishmine InterMine: A flexible data warehouse system for the integration and analysis of heterogeneous biological data LGPL-2.1 ZebrafishMine 1.0.0
  • The Tool identifier column (hidden by default) contains an identifier for the tool / workflow: typically the module name (used for matching to HPC lists).
  • The Topic(s) column categorises the tools by purpose, using an EDAM concept where possible.
  • More information about a tool can be found by following the bio.tools links.
  • When a tool has been containerised to allow for easier installation on any compute infrastructure, a link to the containerised software that can be downloaded from BioContainers is shown in the Containers available? (BioContainers) column.
  • The primary source material for the table is manually curated, and while we endeavour to keep the information as current as possible, there is a natural limit to the volume of information maintained here. Production of this information will be automated over time, and tools that are not relevant to bioinformatics analyses will be removed.
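The Smudgeplot entry in the table describes comparing the sum of k-mer pair coverages (CovA + CovB) with their relative coverage (CovA / (CovA + CovB)). The Python sketch below computes those two quantities and counts them on a coarse grid, assuming the heterozygous k-mer pairs have already been extracted as (CovA, CovB) integer tuples. The helper names `smudge_coordinates` and `bin_coordinates` and the bin sizes are hypothetical illustrations, not Smudgeplot's own implementation.

```python
from collections import Counter


def smudge_coordinates(pairs):
    """Map (cov_a, cov_b) k-mer pair coverages to (total, relative) coordinates.

    total    = CovA + CovB
    relative = CovA / (CovA + CovB)
    """
    coords = []
    for cov_a, cov_b in pairs:
        total = cov_a + cov_b
        if total == 0:
            continue  # skip uncovered pairs to avoid division by zero
        coords.append((total, cov_a / total))
    return coords


def bin_coordinates(coords, total_bin=10, rel_bins=20):
    """Count coordinates on a coarse 2D grid (a text stand-in for the plot).

    Hypothetical binning: total coverage in steps of `total_bin`,
    relative coverage split into `rel_bins` equal-width bins on [0, 1].
    """
    counts = Counter()
    for total, rel in coords:
        counts[(total // total_bin, int(rel * rel_bins))] += 1
    return counts


if __name__ == "__main__":
    pairs = [(20, 20), (19, 21), (10, 30), (0, 0)]
    coords = smudge_coordinates(pairs)
    print(coords)
    print(bin_coordinates(coords))
```

Pairs with balanced coverage cluster near a relative coverage of 0.5, while pairs with unequal coverage fall at lower values; grouping these points by total coverage is what produces the "smudges".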
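The `smithwaterman` and `ssw` entries in the table provide implementations of the Smith-Waterman local alignment algorithm. To illustrate only the underlying dynamic programme, here is a plain quadratic-time Python sketch with a linear gap penalty. The scoring values are arbitrary assumptions, and the function name is hypothetical; the SSW library itself accelerates this computation with SIMD instructions and uses affine (gap open/extension) penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a vs b and its end cell (i, j).

    Linear gap penalty; scores are illustrative defaults, not SSW's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # DP matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap
            left = h[i][j - 1] + gap
            # Flooring at 0 lets a new local alignment start anywhere.
            h[i][j] = max(0, diag, up, left)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "ACG"))
```

Tracing back from `best_pos` until a zero cell is reached would recover the aligned substrings; that step is omitted here to keep the sketch short.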
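The SRF entry in the table finds repeats by analysing the power spectrum of a DNA sequence. The sketch below shows the general idea with a naive discrete Fourier transform over the four nucleotide indicator sequences, assuming that a tandem repeat with period p in a sequence of length N produces a spectral peak at index N/p. It is a generic illustration, not SRF's actual algorithm; the function names are hypothetical, and the O(N^2) loop is only suitable for short sequences.

```python
import cmath


def power_spectrum(seq):
    """Summed DFT power spectrum of the A/C/G/T indicator sequences."""
    seq = seq.upper()
    n = len(seq)
    power = [0.0] * n
    for base in "ACGT":
        x = [1.0 if c == base else 0.0 for c in seq]  # 1 where the base occurs
        for k in range(n):
            s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            power[k] += abs(s) ** 2
    return power


def dominant_period(seq):
    """Estimate the dominant repeat period as N / k at the strongest peak."""
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short for a spectrum")
    power = power_spectrum(seq)
    # Skip k = 0 (overall base composition) and search up to the Nyquist index.
    # Note: harmonics can sometimes outweigh the fundamental in real data.
    k_best = max(range(1, n // 2 + 1), key=lambda k: power[k])
    return n / k_best


if __name__ == "__main__":
    print(dominant_period("AAACCC" * 10))
```

A real repeat finder would scan windows along a long sequence and use a fast Fourier transform; the whole-sequence, quadratic version here keeps the idea visible.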