Changes in deepTools 2.0

Major changes

Note

The major changes encompass features for increased efficiency, new sequencing data types, and additional plots, particularly for QC.

Moreover, deepTools modules can now be used by other Python programs. An example of using the deepTools API is part of the new documentation.

Accommodating additional data types

  • Correlations and comparisons can now be calculated for bigWig files (in addition to BAM files) using multiBigwigSummary and bigwigCompare.
  • RNA-seq: split reads are now natively supported.
  • MNase-seq: using the new option --MNase in bamCoverage, one can now compute read coverage taking only the 2 central base pairs of each mapped fragment into account.
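
To make the MNase-seq idea concrete, here is a minimal sketch of how coverage restricted to the central base pairs of each fragment could be computed. It is an illustration only, not the bamCoverage implementation: the function names (central_interval, central_coverage), the half-open coordinate convention, and the rounding used for odd-length fragments are all assumptions made for this example.

```python
def central_interval(frag_start, frag_end, width=2):
    """Return the half-open interval covering the `width` central base pairs
    of a fragment given as the half-open interval [frag_start, frag_end).

    Hypothetical helper; rounding for odd-length fragments is an assumption.
    """
    length = frag_end - frag_start
    if length <= width:
        # Fragment is no longer than the window: keep it whole.
        return frag_start, frag_end
    start = frag_start + (length - width) // 2
    return start, start + width


def central_coverage(fragments, region_length, width=2):
    """Count, per position, how many fragments have that position among
    their central `width` base pairs."""
    coverage = [0] * region_length
    for frag_start, frag_end in fragments:
        start, end = central_interval(frag_start, frag_end, width)
        # Clip to the region so fragments near the edges do not overflow.
        for pos in range(max(start, 0), min(end, region_length)):
            coverage[pos] += 1
    return coverage


if __name__ == "__main__":
    # Two overlapping 10-bp fragments contribute only their 2 central bases.
    print(central_coverage([(0, 10), (2, 12)], region_length=12))
```

Compared with counting every base of every fragment, this kind of signal is sharply peaked around fragment midpoints, which is what one usually wants when looking at nucleosome positions.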

Structural updates

  • All modules have comprehensive and automatic tests that evaluate proper functioning after any modification of the code.
  • Virtualization for stability: we now provide a Docker image and enable the easy deployment of deepTools via the Galaxy Tool Shed.
  • Our documentation is now version-aware thanks to Read the Docs and Sphinx.
  • The API is public and documented.

Renamed tools

Increased efficiency

  • We dramatically improved the speed of bigWig-related tools (multiBigwigSummary and computeMatrix) by using the new pyBigWig module.
  • It is now possible to generate one composite heatmap and/or meta-gene image based on multiple bigWig files in one go (see computeMatrix, plotHeatmap, and plotProfile for examples).
  • computeMatrix now also accepts multiple input BED files. Each is treated as a group within a sample and is plotted independently.
  • We added additional filtering options for handling BAM files, decreasing the need for prior filtering using tools other than deepTools: The --samFlagInclude and --samFlagExclude parameters can, for example, be used to only include (or exclude) forward reads in an analysis.
  • We separated the generation of read count tables from the calculation of pairwise correlations, both of which were previously handled by bamCorrelate. Read counts are now calculated first using multiBamSummary or multiBigwigSummary. The resulting output file can then be used to calculate and plot pairwise correlations with plotCorrelation, or to run a principal component analysis with plotPCA.
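
The flag-based filtering mentioned above relies on the bitwise FLAG field of the SAM format. The sketch below shows the usual include/exclude semantics (all "include" bits must be set, no "exclude" bit may be set), which is how samtools -f/-F behaves; that --samFlagInclude and --samFlagExclude follow exactly this rule is an assumption here, and the function names are hypothetical.

```python
# SAM flag bits as defined in the SAM format specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def passes_flag_filters(flag, include=0, exclude=0):
    """Keep a read only if every bit in `include` is set in `flag`
    and no bit in `exclude` is set (assumed semantics, see lead-in)."""
    if (flag & include) != include:
        return False
    if (flag & exclude) != 0:
        return False
    return True


def filter_flags(flags, include=0, exclude=0):
    """Return the flags of the reads that survive both filters."""
    return [f for f in flags if passes_flag_filters(f, include, exclude)]


if __name__ == "__main__":
    flags = [0, 16, 99, 147, 4]
    # Forward-strand reads only: exclude anything with the reverse bit set.
    print(filter_flags(flags, exclude=FLAG_REVERSE))
    # Reverse-strand reads only: require the reverse bit.
    print(filter_flags(flags, include=FLAG_REVERSE))
```

With these semantics, a value of 16 passed as the exclude mask keeps forward reads, while the same value passed as the include mask keeps reverse reads.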

New features and tools

  • Correlation analyses are no longer limited to BAM files: bigWig files are possible, too (see multiBigwigSummary).
  • Correlation coefficients can now be computed even if the data contains NaNs.
  • Added new quality control tools:
    • use plotCoverage to plot the coverage over base pairs
    • use plotPCA for principal component analysis
    • bamPEFragmentSize can be used to calculate the average fragment size for paired-end read data
  • Added hierarchical clustering as an alternative to k-means clustering in plotProfile and plotHeatmap.
  • plotProfile has many more options to make compelling summary plots
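
One common way to compute a correlation coefficient when the data contain NaNs is to drop every position where either sample is NaN and correlate the rest. The sketch below does this for the Pearson coefficient using only the standard library. Whether deepTools handles NaNs in exactly this way is an assumption, and pearson_ignore_nan is a hypothetical name used for illustration.

```python
import math


def pearson_ignore_nan(x, y):
    """Pearson correlation of two equal-length sequences, using only the
    positions where both values are not NaN. Returns NaN if fewer than two
    usable pairs remain or if either sample is constant."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant sample.
        return math.nan
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    nan = math.nan
    # Bins 3 and 4 are dropped because one of the two samples is NaN there.
    print(pearson_ignore_nan([1, 2, nan, 4, 5], [2, 4, 6, nan, 10]))
```

Dropping pairs rather than whole samples keeps as many bins as possible in each pairwise comparison, at the cost that different sample pairs may be correlated over slightly different sets of bins.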

Minor changes

Changed parameter names and settings

  • computeMatrix can now read files with DOS newline characters.
  • --missingDataAsZero was renamed to --skipNonCoveredRegions for clarity in bamCoverage and bamCompare.
  • Read extension was made optional and we removed the need to specify a default fragment length for most of the tools: --fragmentLength was thus replaced by the new optional parameter --extendReads.
  • Added option --skipChromosomes to multiBigwigSummary, which can be used to, for example, skip all ‘random’ chromosomes.
  • Added the option for adding titles to QC plots.

Bug fixes

  • Resolved an error introduced by numpy version 1.10 in computeMatrix.
  • Improved plotting in plotProfile for the plot types ‘overlapped_lines’ and ‘heatmap’.
  • Fixed problem with BED intervals in multiBigwigSummary and multiBamSummary that returned wrongly labeled raw counts.
  • multiBigwigSummary now also treats chromosomes as identical when their names differ between samples only by a ‘chr’ prefix, e.g. chr1 vs. 1.
  • Fixed problem with wrongly labeled proper read pairs in a BAM file. We now have additional checks to determine if a read pair is a proper pair: the reads must face each other and are not allowed to be farther apart than 4x the mean fragment length.
  • For bamCoverage and bamCompare, the behavior of --scaleFactor was updated: if it is given in combination with a normalization option (--normalizeTo1x or --normalizeUsingRPKM), the given scaling factor is now multiplied by the factor computed by the respective normalization method.
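
The additional proper-pair checks described above can be sketched as follows. This is an illustration under stated assumptions, not the deepTools code: the function name and signature are hypothetical, both reads are assumed to have the same length, positions are leftmost alignment coordinates, and "farther apart" is interpreted as the span from the start of the forward read to the end of the reverse read.

```python
def is_proper_pair(read_start, read_is_reverse, mate_start, mate_is_reverse,
                   read_length, mean_fragment_length, max_factor=4):
    """Return True if two mates look like a proper pair.

    Assumed rules (see lead-in): the mates lie on opposite strands, the
    forward read starts at or before the reverse read (so they face each
    other), and the span they cover is at most
    `max_factor` * `mean_fragment_length`.
    """
    # Mates on the same strand cannot face each other.
    if read_is_reverse == mate_is_reverse:
        return False

    # Work out which mate is the forward one.
    if read_is_reverse:
        fwd_start, rev_start = mate_start, read_start
    else:
        fwd_start, rev_start = read_start, mate_start

    # Facing each other: the forward read must not start after the reverse read.
    if fwd_start > rev_start:
        return False

    # Span from the start of the forward read to the end of the reverse read.
    span = rev_start + read_length - fwd_start
    return span <= max_factor * mean_fragment_length


if __name__ == "__main__":
    # 200-bp span with a mean fragment length of 200: accepted.
    print(is_proper_pair(100, False, 250, True, 50, 200))
    # 950-bp span exceeds 4 x 200: rejected.
    print(is_proper_pair(100, False, 1000, True, 50, 200))
```

Checks of this kind guard against relying solely on the proper-pair bit written by the aligner, which is the situation the fix above addresses.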