multiBamCoverage

multiBamCoverage computes the read coverages for genomic regions for two or more BAM files. The analysis can be performed for the entire genome by running the program in ‘bins’ mode. If you want to count the read coverage for specific regions only, use the ‘BED-file’ mode instead. The standard output of bamCorrelate is a compressed numpy array. It can be directly used to calculate and visualize pairwise correlation values between the read coverages using the tool ‘plotCorrelation’. Similarly, ‘plotPCA’ can be used for principal component analysis of the read coverages using the .npz file.

A detailed sub-commands help is available by typing:

multiBamCoverage bins -h

multiBamCoverage BED-file -h

usage: multiBamCoverage [-h] [--version]  ...
optional arguments
--version show program’s version number and exit
commands

subcommands

Possible choices: bins, BED-file

Sub-commands:
bins

The coverage calculation is done for consecutive bins of equal size (10 kilobases by default). This mode is useful to assess the genome-wide similarity of BAM files. The bin size and distance between bins can be adjusted.

usage: multiBamCoverage bins --bamfiles file1.bam file2.bam -out results.npz
Required arguments
--bamfiles, -b List of indexed bam files separated by spaces.
--outFileName, -out
 File name to save the coverage matrix. This matrix can be subsequently plotted using plotCorrelation or or plotPCA.
Optional arguments
--labels, -l User defined labels instead of default labels from file names. Multiple labels have to be separated by a space, e.g. –labels sample1 sample2 sample3
--binSize=10000, -bs=10000
 Length in bases of the window used to sample the genome.
--distanceBetweenBins=0, -n=0
 By default, multiBamCoverage considers consecutive bins of the specified –binSize. However, to reduce the computation time, a larger distance between bins can by given. Larger distances result in fewer bins considered.
--version show program’s version number and exit
--region, -r Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example –region chr10 or –region chr10:456700:891000.
--numberOfProcessors=max/2, -p=max/2
 Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.
--verbose=False, -v=False
 Set to see processing messages.
Output optional options
--outRawCounts Save the counts per region to a tab-delimited file.
Read processing options
--extendReads=False, -e=False
 This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. *NOTE*: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. *Single-end*: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended. *Paired-end*: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like singe-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads).
--ignoreDuplicates=False
 If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate’s position also has to coincide to ignore a read.
--minMappingQuality
 If set, only reads that have a mapping quality score of at least this are considered.
--centerReads=False
 By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions.
--samFlagInclude
 Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will be also considered for the coverage.
--samFlagExclude
 Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use –samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand.
BED-file

The user provides a BED file that contains all regions that should be considered for the coverage analysis. A common use is to compare ChIP-seq coverages between two different samples for a set of peak regions.

usage: multiBamCoverage BED-file --BED selection.bed --bamfiles file1.bam file2.bam -out results.npz
Required arguments
--bamfiles, -b List of indexed bam files separated by spaces.
--outFileName, -out
 File name to save the coverage matrix. This matrix can be subsequently plotted using plotCorrelation or or plotPCA.
--BED Limits the coverage analysis to the regions specified in this file.
Optional arguments
--labels, -l User defined labels instead of default labels from file names. Multiple labels have to be separated by a space, e.g. –labels sample1 sample2 sample3
--version show program’s version number and exit
--region, -r Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example –region chr10 or –region chr10:456700:891000.
--numberOfProcessors=max/2, -p=max/2
 Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.
--verbose=False, -v=False
 Set to see processing messages.
Output optional options
--outRawCounts Save the counts per region to a tab-delimited file.
Read processing options
--extendReads=False, -e=False
 This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. *NOTE*: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. *Single-end*: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended. *Paired-end*: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like singe-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads).
--ignoreDuplicates=False
 If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate’s position also has to coincide to ignore a read.
--minMappingQuality
 If set, only reads that have a mapping quality score of at least this are considered.
--centerReads=False
 By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions.
--samFlagInclude
 Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will be also considered for the coverage.
--samFlagExclude
 Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use –samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand.

example usages: multiBamCoverage bins –bamfiles file1.bam file2.bam -out results.npz

multiBamCoverage BED-file –BED selection.bed –bamfiles file1.bam file2.bam -out results.npz

Example

The default output of multiBamCoverage (a compressed numpy array: *.npz) can be visualized using plotCorrelation or plotPCA.

The optional output (--outRawCounts) is a simple tab-delimited file that can be used with any other program. The first three columns define the region of the genome for which the reads were summarized.

$ deepTools2.0/bin/multiBamCoverage bins \
 --bamfiles testFiles/*bam \ # using all BAM files in the folder
 --minMappingQuality 30 \
 --region 19 \ # limiting the binning of the genome to chromosome 19
 --labels H3K27me3 H3K4me1 H3K4me3 HeK9me3 input \
 -out readCounts.npz --outRawCounts readCounts.tab

 $ head readCounts.tab
 #'chr'     'start' 'end'   'H3K27me3'      'H3K4me1'       'H3K4me3'       'HeK9me3'       'input'
 19 10000   20000   0.0     0.0     0.0     0.0     0.0
 19 20000   30000   0.0     0.0     0.0     0.0     0.0
 19 30000   40000   0.0     0.0     0.0     0.0     0.0
 19 40000   50000   0.0     0.0     0.0     0.0     0.0
 19 50000   60000   0.0     0.0     0.0     0.0     0.0
 19 60000   70000   1.0     1.0     0.0     0.0     1.0
 19 70000   80000   0.0     1.0     7.0     0.0     1.0
 19 80000   90000   15.0    0.0     0.0     6.0     4.0
 19 90000   100000  73.0    7.0     4.0     16.0    5.0