# multiBamSummary

multiBamSummary computes the read coverages over genomic regions for, typically, two or more BAM files. The analysis can be performed for the entire genome by running the program in ‘bins’ mode. If you want to count the read coverage for specific regions only, use the ‘BED-file’ mode instead. The standard output of multiBamSummary is a compressed numpy array (.npz). It can be used directly to calculate and visualize pairwise correlation values between the read coverages with the tool ‘plotCorrelation’. Similarly, plotPCA can use the .npz file for a principal component analysis of the read coverages. Note that supplying a single BAM file is only recommended if you want to produce a bedGraph-like table of counts (i.e., with the --outRawCounts option). If only a single file is supplied, the default output file cannot be used by ANY deepTools program.
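The following sketch gives a rough idea of what a principal component analysis of the read coverages involves. It assumes the coverage matrix has one row per region and one column per BAM file, applies a log2(x + 1) transform, and projects each sample onto principal components with an SVD. The function name `pca_of_coverages` and the preprocessing are hypothetical. plotPCA's own preprocessing, such as which rows it keeps and how it transforms counts, may differ.

```python
import numpy as np


def pca_of_coverages(matrix):
    """Hypothetical helper: PCA of samples from a regions x samples count matrix.

    Returns (scores, explained_variance_ratio), where scores[i, k] is the
    coordinate of sample i on principal component k.
    """
    # Assumption: log2(count + 1) to damp the influence of very high-count regions
    logged = np.log2(np.asarray(matrix, dtype=float) + 1.0)
    # Samples become observations (rows), regions become variables (columns)
    x = logged.T
    # Center every region across samples so the PCs describe differences between samples
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s  # same as projecting x onto the right singular vectors
    variance = s ** 2
    return scores, variance / variance.sum()


# Toy example: samples 0 and 1 are identical, sample 2 differs
counts = np.array([[10, 10, 0], [0, 0, 12], [5, 5, 5], [30, 30, 2]])
scores, ratio = pca_of_coverages(counts)
print(ratio)
```

In this toy matrix, samples 0 and 1 receive the same coordinates, and almost all of the variance falls on the first component.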

Detailed help for each sub-command is available by typing:

```
multiBamSummary bins -h

multiBamSummary BED-file -h
```

```
usage: multiBamSummary [-h] [--version]  ...
```


## Named Arguments

- `--version`: show program’s version number and exit

## commands

Possible choices: bins, BED-file

## Sub-commands:

### bins

The coverage calculation is done for consecutive bins of equal size (10 kilobases by default). This mode is useful to assess the genome-wide similarity of BAM files. The bin size and distance between bins can be adjusted.

```
multiBamSummary bins --bamfiles file1.bam file2.bam -o results.npz
```
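The sketch below shows how `--binSize` and `--distanceBetweenBins` affect which intervals get counted. It is a simplified model, not deepTools' code. The helper name is hypothetical, bins are 0-based half-open intervals, and the last bin is truncated at the chromosome end, which may not match how deepTools handles a final partial bin.

```python
def tile_chromosome(chrom_length, bin_size=10000, distance_between_bins=0):
    """Hypothetical helper: (start, end) bins of `bin_size` bp along a chromosome.

    Consecutive bins are separated by `distance_between_bins` bp that are not
    counted; the last bin is truncated at the chromosome end.
    """
    if bin_size <= 0 or distance_between_bins < 0:
        raise ValueError("bin_size must be positive and the distance non-negative")
    step = bin_size + distance_between_bins
    return [(start, min(start + bin_size, chrom_length))
            for start in range(0, chrom_length, step)]


print(tile_chromosome(25000))  # [(0, 10000), (10000, 20000), (20000, 25000)]
# A gap as large as the bin size roughly halves the number of bins to count
print(len(tile_chromosome(1_000_000)), len(tile_chromosome(1_000_000, 10000, 10000)))
```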


#### Required arguments

- `--bamfiles`, `-b`: List of indexed BAM files separated by spaces.
- `--outFileName`, `-out`, `-o`: File name to save the coverage matrix. This matrix can subsequently be plotted using plotCorrelation or plotPCA.

#### Optional arguments

- `--labels`, `-l`: User-defined labels to use instead of the default labels taken from the file names. Multiple labels must be separated by spaces, e.g. `--labels sample1 sample2 sample3`.
- `--smartLabels`: Instead of manually specifying labels for the input BAM files, use each file name after removing the path and extension.
- `--genomeChunkSize`: Manually specify the size of the genome chunk provided to each processor. The default value of None means that this is determined by the read density of the BAM file.
- `--binSize`, `-bs`: Length in bases of the window used to sample the genome. (Default: 10000)
- `--distanceBetweenBins`, `-n`: By default, multiBamSummary considers consecutive bins of the specified `--binSize`. To reduce computation time, a larger distance between bins can be given; larger distances result in fewer bins being considered. (Default: 0)
- `--version`: show program’s version number and exit
- `--region`, `-r`: Region of the genome to limit the operation to. This is useful when testing parameters to reduce computing time. The format is chr or chr:start:end, for example `--region chr10` or `--region chr10:456700:891000`.
- `--blackListFileName`, `-bl`: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
- `--numberOfProcessors`, `-p`: Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors. (Default: 1)
- `--verbose`, `-v`: Set to see processing messages.
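The two string formats described above can be illustrated as follows. The sketch parses a `--region` value of the form chr or chr:start:end and derives a label the way `--smartLabels` is described (file name without path and extension). Both helpers are hypothetical. deepTools' own parser may accept more forms or validate differently. `os.path.splitext` strips only the last extension, so `sample.sorted.bam` would become `sample.sorted`.

```python
import os


def parse_region(region):
    """Hypothetical parser for --region values of the form 'chr' or 'chr:start:end'.

    Returns (chrom, start, end); start and end are None for a whole chromosome.
    """
    parts = region.split(":")
    if len(parts) == 1:
        return parts[0], None, None
    if len(parts) == 3:
        chrom, start, end = parts[0], int(parts[1]), int(parts[2])
        if start >= end:
            raise ValueError(f"start must be smaller than end in {region!r}")
        return chrom, start, end
    raise ValueError(f"expected 'chr' or 'chr:start:end', got {region!r}")


def smart_label(path):
    """Label in the spirit of --smartLabels: file name without directory and extension."""
    return os.path.splitext(os.path.basename(path))[0]


print(parse_region("chr10:456700:891000"))  # ('chr10', 456700, 891000)
print(smart_label("/data/run1/H3K4me3.bam"))  # H3K4me3
```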

#### Output optional options

- `--outRawCounts`: Save the counts per region to a tab-delimited file.
- `--scalingFactors`: Compute scaling factors (in the DESeq2 manner) compatible for use with bamCoverage and write them to a file. The file has tab-separated columns “sample” and “scalingFactor”.
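DESeq2-style normalization uses the "median-of-ratios" method. Each sample's size factor is the median, over regions, of its count divided by that region's geometric mean across samples. The sketch below computes these size factors from a regions x samples count matrix. The helper name is hypothetical. The final step, taking the reciprocal so that multiplying coverage by it normalizes the samples, is an assumption about how such factors would be used with bamCoverage. It is not a statement of exactly what deepTools writes to the file.

```python
import numpy as np


def deseq2_size_factors(counts):
    """Median-of-ratios size factors (the DESeq2 approach) for a regions x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    # log(0) is undefined, so only regions with a positive count in every sample are used
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("no region has a positive count in every sample")
    log_counts = np.log(counts[usable])
    # Log of the per-region geometric mean across samples (the pseudo-reference sample)
    log_geo_means = log_counts.mean(axis=1)
    # Per sample: median over regions of log(count / geometric mean), back-transformed
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))


counts = np.array([[10, 20], [30, 60], [5, 10], [0, 4]])
size_factors = deseq2_size_factors(counts)
# Assumption: coverage is multiplied by the scaling factor, so use the reciprocal
scaling_factors = 1.0 / size_factors
print(size_factors, scaling_factors)
```

Here the second sample has exactly twice the reads of the first in every usable region. Its size factor is therefore twice as large, and the region with a zero count is ignored.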

### BED-file

The user provides a BED file that contains all regions that should be considered for the coverage analysis. A common use is to compare ChIP-seq coverages between two different samples for a set of peak regions.

```
multiBamSummary BED-file --BED selection.bed --bamfiles file1.bam file2.bam -o results.npz
```
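The following sketch shows the basic idea of BED-file mode: read the regions from a BED file, then count the reads that overlap each region. The helper names and the fallback name format are hypothetical. The counting is a naive overlap count. It does not model deepTools' rules for read filtering, fragment extension, or reads spanning region edges.

```python
def read_bed_regions(lines):
    """Return (chrom, start, end, name) tuples from BED-formatted lines.

    BED coordinates are 0-based and half-open. Blank, '#', 'track' and 'browser'
    lines are skipped. The name is column 4 if present; otherwise a
    'chrom:start-end' string built from the interval bounds (hypothetical format).
    """
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        regions.append((chrom, start, end, name))
    return regions


def count_overlapping_reads(regions, reads):
    """Naive count of reads (chrom, start, end) overlapping each region."""
    return [
        sum(1 for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start < end and start < r_end)
        for chrom, start, end, _name in regions
    ]


bed = ["track name=peaks", "chr1\t100\t200\tpeak1", "chr1\t500\t600"]
regions = read_bed_regions(bed)
reads = [("chr1", 150, 190), ("chr1", 190, 260), ("chr1", 300, 350), ("chr2", 120, 180)]
print(regions, count_overlapping_reads(regions, reads))
```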


#### Required arguments

- `--bamfiles`, `-b`: List of indexed BAM files separated by spaces.
- `--outFileName`, `-out`, `-o`: File name to save the coverage matrix. This matrix can subsequently be plotted using plotCorrelation or plotPCA.
- `--BED`: Limits the coverage analysis to the regions specified in these files.

#### Optional arguments

- `--labels`, `-l`: User-defined labels to use instead of the default labels taken from the file names. Multiple labels must be separated by spaces, e.g. `--labels sample1 sample2 sample3`.
- `--smartLabels`: Instead of manually specifying labels for the input BAM files, use each file name after removing the path and extension.
- `--genomeChunkSize`: Manually specify the size of the genome chunk provided to each processor. The default value of None means that this is determined by the read density of the BAM file.
- `--version`: show program’s version number and exit
- `--region`, `-r`: Region of the genome to limit the operation to. This is useful when testing parameters to reduce computing time. The format is chr or chr:start:end, for example `--region chr10` or `--region chr10:456700:891000`.
- `--blackListFileName`, `-bl`: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
- `--numberOfProcessors`, `-p`: Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors. (Default: 1)
- `--verbose`, `-v`: Set to see processing messages.
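The chunk-rejection behavior described for `--blackListFileName` can be modeled as a simple overlap filter. Any chunk that overlaps a blacklist entry is dropped as a whole, and reads themselves are not trimmed. This is a simplified sketch with hypothetical helper names. It uses a naive scan over all blacklist entries for every chunk and is not deepTools' implementation.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def drop_blacklisted_chunks(chunks, blacklist):
    """Keep only (chrom, start, end) chunks that overlap no blacklist entry."""
    kept = []
    for chrom, start, end in chunks:
        hit = any(
            chrom == b_chrom and overlaps(start, end, b_start, b_end)
            for b_chrom, b_start, b_end in blacklist
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept


chunks = [("chr1", 0, 100), ("chr1", 100, 200), ("chr1", 200, 300)]
blacklist = [("chr1", 150, 160)]
print(drop_blacklisted_chunks(chunks, blacklist))  # [('chr1', 0, 100), ('chr1', 200, 300)]
```

Suppose reads are assigned to the chunk in which they start (an assumption of this model). A read spanning chr1:95-155 would then still be counted with the kept chunk chr1:0-100, even though it overlaps the blacklisted interval chr1:150-160. This is the caveat described above.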

#### Output optional options

- `--outRawCounts`: Save the counts per region to a tab-delimited file.
- `--scalingFactors`: Compute scaling factors (in the DESeq2 manner) compatible for use with bamCoverage and write them to a file. The file has tab-separated columns “sample” and “scalingFactor”.

#### GTF/BED12 options

- `--metagene`: When either a BED12 or GTF file is used to provide regions, perform the computation on the merged exons rather than on the genomic interval defined by the 5'-most and 3'-most transcript bounds (i.e., columns 2 and 3 of a BED file). If a BED3 or BED6 file is used as input, columns 2 and 3 are used as an exon. (Default: False)
- `--transcriptID`: When a GTF file is used to provide regions, only entries with this value as their feature (column 3) are processed as transcripts. (Default: “transcript”)
- `--exonID`: When a GTF file is used to provide regions, only entries with this value as their feature (column 3) are processed as exons. CDS would be another common value for this. (Default: “exon”)
- `--transcript_id_designator`: Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files, the ID is instead stored in the last column as a key-value pair (e.g., as ‘transcript_id “ACTB”’, for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key. (Default: “transcript_id”)
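With `--metagene`, a BED12 record contributes its exons (blocks) rather than the whole interval from column 2 to column 3. `--transcript_id_designator` chooses which GTF attribute key supplies a region's ID. The sketch below shows both lookups under the standard BED12 and GTF column layouts. The helper names are hypothetical, and this is not deepTools' parser; for example, it does not merge overlapping exons from several transcripts.

```python
import re

# GTF attributes look like: gene_id "G1"; transcript_id "ACTB";
_GTF_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def bed12_exons(fields):
    """Exon (block) intervals of a BED12 record given as a list of 12 columns.

    blockSizes (column 11) and blockStarts (column 12) are comma-separated;
    blockStarts are relative to chromStart (column 2).
    """
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    return [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]


def gtf_attribute(attribute_column, key="transcript_id"):
    """Value of `key` in a GTF attribute column, or None if the key is absent."""
    for k, value in _GTF_ATTRIBUTE.findall(attribute_column):
        if k == key:
            return value
    return None


record = "chr1 1000 5000 tx1 0 + 1000 5000 0 3 100,200,300, 0,1500,3700,".split()
exons = bed12_exons(record)
print(exons, sum(end - start for start, end in exons))
print(gtf_attribute('gene_id "G1"; transcript_id "ACTB";'))  # ACTB
```

The toy transcript spans 4000 bp in columns 2 and 3, but only its 600 exonic bases would count in metagene mode.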

Example usages:

```
multiBamSummary bins --bamfiles file1.bam file2.bam -o results.npz

multiBamSummary BED-file --BED selection.bed --bamfiles file1.bam file2.bam -o results.npz
```

## Example

The default output of multiBamSummary (a compressed numpy array, `*.npz`) can be visualized using plotCorrelation or plotPCA.
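The sketch below loads the `.npz` file outside deepTools and computes pairwise Pearson or Spearman correlations between samples. It assumes, as an unverified detail of the file format, that the archive holds a regions x samples array under the key `matrix` and the sample labels under `labels`. The helper names are hypothetical. plotCorrelation offers more options than this; the sketch covers only the core computation, and the demo archive is synthetic.

```python
import os
import tempfile

import numpy as np


def load_coverage_npz(path):
    """Load a coverage matrix and labels (assumed keys: 'matrix', 'labels')."""
    with np.load(path) as data:
        return np.asarray(data["matrix"], dtype=float), [str(x) for x in data["labels"]]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse = np.unique(values, return_inverse=True)
    return (np.bincount(inverse, weights=ranks) / np.bincount(inverse))[inverse]


def pairwise_correlation(matrix, method="pearson"):
    """Samples x samples correlation matrix computed over the regions (rows)."""
    m = np.asarray(matrix, dtype=float)
    if method == "spearman":
        # Spearman = Pearson correlation of the ranks within each sample
        m = np.column_stack([average_ranks(col) for col in m.T])
    elif method != "pearson":
        raise ValueError(f"unknown method: {method!r}")
    return np.corrcoef(m, rowvar=False)


# Demo with a synthetic archive (not real multiBamSummary output)
path = os.path.join(tempfile.mkdtemp(), "results.npz")
np.savez_compressed(path, matrix=np.array([[0, 0, 1], [1, 2, 0], [2, 4, 5], [3, 6, 2]]),
                    labels=np.array(["s1", "s2", "s3"]))
matrix, labels = load_coverage_npz(path)
print(labels)
print(pairwise_correlation(matrix, "spearman"))
```

Ties are averaged rather than ranked arbitrarily because coverage matrices usually contain many bins with identical (often zero) counts.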

The optional output (--outRawCounts) is a simple tab-delimited file that can be used with any other program. The first three columns define the region of the genome for which the reads were summarized.

```
# Use all BAM files in the folder and limit the binning of the genome to chromosome 19
$ deepTools2.0/bin/multiBamSummary bins \
    --bamfiles testFiles/*bam \
    --minMappingQuality 30 \
    --region 19 \
    --labels H3K27me3 H3K4me1 H3K4me3 H3K9me3 input \
    -out readCounts.npz --outRawCounts readCounts.tab

$ head readCounts.tab
#'chr'	'start'	'end'	'H3K27me3'	'H3K4me1'	'H3K4me3'	'H3K9me3'	'input'
19	10000	20000	0.0	0.0	0.0	0.0	0.0
19	20000	30000	0.0	0.0	0.0	0.0	0.0
19	30000	40000	0.0	0.0	0.0	0.0	0.0
19	40000	50000	0.0	0.0	0.0	0.0	0.0
19	50000	60000	0.0	0.0	0.0	0.0	0.0
19	60000	70000	1.0	1.0	0.0	0.0	1.0
19	70000	80000	0.0	1.0	7.0	0.0	1.0
19	80000	90000	15.0	0.0	0.0	6.0	4.0
19	90000	100000	73.0	7.0	4.0	16.0	5.0
```
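Because the --outRawCounts table is plain tab-delimited text, it can be loaded without deepTools. This sketch assumes the layout shown above: a header line starting with # and quoted column names, then chrom, start, end, and one count column per sample. The function name is hypothetical, and the example input reuses a few rows and columns from the output above.

```python
import io

import numpy as np


def read_raw_counts(handle):
    """Parse an --outRawCounts table: returns (sample_labels, regions, counts).

    Assumes a tab-delimited header such as #'chr'<TAB>'start'<TAB>'end'<TAB>'label1'...
    followed by one row per region: chrom, start, end, then one count per sample.
    """
    header = handle.readline().rstrip("\n").lstrip("#").split("\t")
    labels = [h.strip().strip("'") for h in header[3:]]
    regions, counts = [], []
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
        counts.append([float(x) for x in fields[3:]])
    return labels, regions, np.array(counts).reshape(len(regions), len(labels))


example = io.StringIO(
    "#'chr'\t'start'\t'end'\t'H3K27me3'\t'H3K4me1'\n"
    "19\t50000\t60000\t0.0\t0.0\n"
    "19\t60000\t70000\t1.0\t1.0\n"
    "19\t80000\t90000\t15.0\t0.0\n"
)
labels, regions, counts = read_raw_counts(example)
# Keep only bins with at least one read in some sample
nonzero = counts.sum(axis=1) > 0
print(labels, [r for r, keep in zip(regions, nonzero) if keep])
```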