# bamCompare

bamCompare can be used to generate a bigWig or bedGraph file based on two BAM files that are compared to each other while being simultaneously normalized for sequencing depth.

If you are not familiar with the BAM, bedGraph, and bigWig formats, you can read up on them in our Glossary of NGS terms.

This tool compares two BAM files based on the number of mapped reads. To compare the files, the genome is partitioned into bins of equal size, the number of reads found in each bin is counted per file, and a summary value is reported for each bin. This value can be the ratio of the number of reads per bin, the log2 of the ratio, or the difference.

The tool can normalize the number of reads in each BAM file using the SES method proposed by Diaz et al. (2012), "Normalization, bias correction, and peak calling for ChIP-seq", Statistical Applications in Genetics and Molecular Biology, 11(3). Normalization based on read counts is also available.

The output is either a bedGraph or bigWig file containing the bin locations and the resulting comparison values. By default, if reads are paired, the fragment length reported in the BAM file is used. Each mate, however, is treated independently to avoid a bias when a mixture of concordant and discordant pairs is present; this means that each end is extended to match the fragment length.
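The per-bin comparison described above can be illustrated with a minimal Python sketch. The counts, the `compare_bins` helper, and the pseudocount handling are illustrative assumptions, not deepTools internals:

```python
import math

def compare_bins(counts1, counts2, method="log2", pseudocount=1):
    """Combine two per-bin read counts into one summary value per bin."""
    out = []
    for t, c in zip(counts1, counts2):
        if method == "log2":
            # Pseudocount avoids log2(0) and division by zero in empty bins.
            out.append(math.log2((t + pseudocount) / (c + pseudocount)))
        elif method == "ratio":
            out.append((t + pseudocount) / (c + pseudocount))
        elif method == "subtract":
            out.append(t - c)
        elif method == "add":
            out.append(t + c)
    return out

# Toy per-bin read counts for treatment and control
treatment = [10, 3, 0]
control = [5, 3, 0]
print(compare_bins(treatment, control, method="log2"))
```

In the real tool, these counts would first be scaled for sequencing depth (see `--scaleFactorsMethod`) before the comparison is computed.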

Usage:

```
bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw
```

## Required arguments

- `--bamfile1`, `-b1`: Sorted BAM file 1. Usually the BAM file for the treatment.
- `--bamfile2`, `-b2`: Sorted BAM file 2. Usually the BAM file for the control.
## Output

- `--outFileName`, `-o`: Output file name.
- `--outFileFormat`, `-of`: Output file type (default: `bigwig`). Possible choices: `bigwig`, `bedgraph`.
## Optional arguments

- `--scaleFactorsMethod`: Method used to scale the samples (default: `readCount`). Possible choices: `readCount`, `SES`.
- `--sampleLength`, `-l`: *Only relevant when SES is chosen as the `--scaleFactorsMethod`.* To compute the SES, specify the length (in bases) of the regions (see `--numberOfSamples`) that will be randomly sampled to calculate the scaling factors (default: 1000). If you do not have good sequencing depth for your samples, consider increasing the sampled regions' size to minimize the probability that zero-coverage regions are used.
- `--numberOfSamples`, `-n`: *Only relevant when SES is chosen as the `--scaleFactorsMethod`.* Number of samplings taken from the genome to compute the scaling factors (default: 100000).
- `--scaleFactors`: Set this parameter manually to avoid the computation of scale factors. The format is `scaleFactor1:scaleFactor2`. For example, `--scaleFactors 0.7:1` will multiply the first BAM file by 0.7 while leaving the second BAM file unscaled (multiplication by 1).
- `--ratio`: The comparison to output (default: `log2`, i.e. the log2 ratio of the two samples). Possible choices: `log2`, `ratio`, `subtract`, `add`, `reciprocal_ratio`. The reciprocal ratio returns the negative of the inverse of the ratio if the ratio is less than 1; the resulting values are interpreted as negative fold changes. *NOTE*: `--normalizeTo1x` and `--normalizeUsingRPKM` can only be used with `--ratio subtract`.
- `--pseudocount`: Small number added to avoid division by zero (default: 1). Only useful together with `--ratio log2` or `--ratio ratio`.
- `--version`: Show the program's version number and exit.
- `--binSize`, `-bs`: Size of the bins, in bases, for the output bigWig/bedGraph file (default: 50).
- `--region`, `-r`: Region of the genome to limit the operation to; this is useful when testing parameters to reduce the computing time. The format is `chr:start:end`, for example `--region chr10` or `--region chr10:456700:891000`.
- `--numberOfProcessors`, `-p`: Number of processors to use (default: `max/2`). Type `max/2` to use half the maximum number of processors or `max` to use all available processors.
- `--verbose`, `-v`: Set to see processing messages (default: False).
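The `scaleFactor1:scaleFactor2` format described above can be sketched in a few lines of Python. The `apply_scale_factors` helper and the toy counts are hypothetical, not part of deepTools:

```python
def apply_scale_factors(spec, counts1, counts2):
    """Parse a 'f1:f2' spec and scale each sample's per-bin counts."""
    f1, f2 = (float(x) for x in spec.split(":"))
    return [c * f1 for c in counts1], [c * f2 for c in counts2]

# Mimics --scaleFactors 0.7:1 on toy per-bin counts
scaled1, scaled2 = apply_scale_factors("0.7:1", [10, 20], [10, 20])
print(scaled1)
print(scaled2)
```

Scaling happens before the per-bin comparison, so manually chosen factors replace the values that `readCount` or `SES` normalization would otherwise compute.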
If you expect your files to be strongly affected by the kind of filtering you would like to apply (e.g., removing duplicates with `--ignoreDuplicates` or ignoring low-quality reads), consider removing those reads beforehand.