deeptools package modules

deeptools.SES_scaleFactor module

class deeptools.SES_scaleFactor.Tester[source]

Bases: object

deeptools.SES_scaleFactor.estimateScaleFactor(bamFilesList, binLength, numberOfSamples, normalizationLength, avg_method='median', blackListFileName=None, numberOfProcessors=1, verbose=False, chrsToSkip=[], mappingStatsList=[])[source]

Subdivides the genome into chunks to be analyzed in parallel using several processors. The code handles the creation of workers that compute fragment counts (coverage) for different regions and then collect and integrates the results.

bamFilesList : list
list of bam files to normalize
binLength : int
the window size in bp, where reads are going to be counted.
numberOfSamples : int
number of sites to sample from the genome. For more info see the documentation of the CountReadsPerBin class
normalizationLength : int
length, in bp, to normalize the data. For a value of 1, on average 1 read per base pair is found
avg_method : str
defines how the different values are to be summarized. The options are ‘mean’ and ‘median’
chrsToSkip : list
name of the chromosomes to be excluded from the scale estimation. Usually the chrX is included.
blackListFileName : str
BED file containing blacklisted regions
mappingStatsList : list
List of the number of mapped reads per file
dict
Dictionary with the following keys::
‘size_factors’ ‘size_factors_based_on_mapped_reads’ ‘size_factors_SES’ ‘size_factors_based_on_mean’ ‘size_factors_based_on_median’ ‘mean’ ‘meanSES’ ‘median’ ‘reads_per_bin’ ‘std’ ‘sites_sampled’
>>> test = Tester()
>>> bin_length = 50
>>> num_samples = 4
>>> _dict = estimateScaleFactor([test.bamFile1, test.bamFile2], bin_length, num_samples,  1)
>>> _dict['size_factors']
array([ 1. ,  0.5])
>>> _dict['size_factors_based_on_mean']
array([ 1. ,  0.5])

deeptools.bamHandler module

deeptools.bamHandler.countReadsInInterval(args)[source]
deeptools.bamHandler.getMappingStats(bam, nThreads)[source]

This is used for CRAM files, since idxstats() and .mapped/.unmapped are meaningless

This requires pysam > 0.13.0

deeptools.bamHandler.openBam(bamFile, returnStats=False, nThreads=1, minimalDecoding=True)[source]

A wrapper for opening BAM/CRAM files.

bamFile: str
A BAM/CRAM file name
returnStats: bool
Return a tuple of (file_handle, nMappedReads, nUnmappedReads, statsDict). These additional values are needed by some downstream functions, since one can’t use file_handle.mapped on CRAM files (or idxstats())
nThreads: int
If returnStats is True, number of threads to use for computing statistics
minimalDecoding: Bool
For CRAM files, don’t decode the read name, sequence, qual, or auxiliary tag fields (these aren’t used by most functions).

Returns either the file handle or a tuple as described in returnStats

deeptools.correlation module

deeptools.correlation_heatmap module

deeptools.correlation_heatmap.plot_correlation(corr_matrix, labels, plotFileName, vmax=None, vmin=None, colormap='jet', image_format=None, plot_numbers=False, plot_title='')[source]

deeptools.countReadsPerBin module

class deeptools.countReadsPerBin.CountReadsPerBin(bamFilesList, binLength=50, numberOfSamples=None, numberOfProcessors=1, verbose=False, region=None, bedFile=None, extendReads=False, blackListFileName=None, minMappingQuality=None, ignoreDuplicates=False, chrsToSkip=[], stepSize=None, center_read=False, samFlag_include=None, samFlag_exclude=None, zerosToNans=False, smoothLength=0, minFragmentLength=0, maxFragmentLength=0, out_file_for_raw_data=None, statsList=[], mappedList=[])[source]

Bases: object

Collects coverage over multiple bam files using multiprocessing

This function collects read counts (coverage) from several bam files and returns an numpy array with the results. This class uses multiprocessing to compute the coverage.

bamFilesList : list
List containing the names of indexed bam files. E.g. [‘file1.bam’, ‘file2.bam’]
binLength : int
Length of the window/bin. This value is overruled by bedFile if present.
numberOfSamples : int
Total number of samples. The genome is divided into numberOfSamples, each with a window/bin length equal to binLength. This value is overruled by stepSize in case such value is present and by bedFile in which case the number of samples and bins are defined in the bed file
numberOfProcessors : int
Number of processors to use. Default is 4
verbose : bool
Output messages. Default: False
region : str
Region to limit the computation in the form chrom:start:end.
bedFile : list of file_handles.
Each file handle corresponds to a bed file containing the regions for which to compute the coverage. This option overrules binLength, numberOfSamples and stepSize.
blackListFileName : str
A string containing a BED file with blacklist regions.

extendReads : bool, int

Whether coverage should be computed for the extended read length (i.e. the region covered by the two mates or the regions expected to be covered by single-reads). If the value is ‘int’, then then this is interpreted as the fragment length to extend reads that are not paired. For Illumina reads, usual values are around 300. This value can be determined using the peak caller MACS2 or can be approximated by the fragment lengths computed when preparing the library for sequencing. If the value is of the variable is true and not value is given, the fragment size is sampled from the library but only if the library is paired-end. Default: False
minMappingQuality : int
Reads of a mapping quality less than the give value are not considered. Default: None
ignoreDuplicates : bool
Whether read duplicates (same start, end position. If paired-end, same start-end for mates) are to be excluded. Default: false
chrToSkip: list
List with names of chromosomes that do not want to be included in the coverage computation. This is useful to remove unwanted chromosomes (e.g. ‘random’ or ‘Het’).
stepSize : int
the positions for which the coverage is computed are defined as follows: range(start, end, stepSize). Thus, a stepSize of 1, will compute the coverage at each base pair. If the stepSize is equal to the binLength then the coverage is computed for consecutive bins. If seepSize is smaller than the binLength, then teh bins will overlap.
center_read : bool
Determines if reads should be centered with respect to the fragment length.
samFlag_include : int
Extracts only those reads having the SAM flag. For example, to get only reads that are the first mates a samFlag of 64 could be used. Similarly, the samFlag_include can be used to select only reads mapping on the reverse strand or to get only properly paired reads.
samFlag_exclude : int
Removes reads that match the SAM flag. For example to get all reads that map to the forward strand a samFlag_exlude 16 should be used. Which translates into exclude all reads that map to the reverse strand.
zerosToNans : bool
If true, zero values encountered are transformed to Nans. Default false.
minFragmentLength : int
If greater than 0, fragments below this size are excluded.
maxFragmentLength : int
If greater than 0, fragments above this size are excluded.
out_file_for_raw_data : str
File name to save the raw counts computed
statsList : list
For each BAM file in bamFilesList, the associated per-chromosome statistics returned by openBam
mappedList : list
For each BAM file in bamFilesList, the number of mapped reads in the file.

numpy array

Each row correspond to each bin/bed region and each column correspond to each of the bamFiles.

The test data contains reads for 200 bp.

>>> test = Tester()

The transpose function is used to get a nicer looking output. The first line corresponds to the number of reads per bin in bam file 1

>>> c = CountReadsPerBin([test.bamFile1, test.bamFile2], 50, 4)
>>> np.transpose(c.run())
array([[ 0.,  0.,  1.,  1.],
       [ 0.,  1.,  1.,  2.]])
count_reads_in_region(chrom, start, end, bed_regions_list=None)[source]

Counts the reads in each bam file at each ‘stepSize’ position within the interval (start, end) for a window or bin of size binLength.

The stepSize controls the distance between bins. For example, a step size of 20 and a bin size of 20 will create bins next to each other. If the step size is smaller than the bin size the bins will overlap.

If a list of bedRegions is given, then the number of reads that overlaps with each region is counted.

chrom : str
Chrom name
start : int
start coordinate
end : int
end coordinate
bed_regions_list: list
List of list of tuples of the form (start, end) corresponding to bed regions to be processed. If not bed file was passed to the object constructor then this list is empty.
numpy array
The result is a numpy array that as rows each bin and as columns each bam file.

Initialize some useful values

>>> test = Tester()
>>> c = CountReadsPerBin([test.bamFile1, test.bamFile2], 25, 0, stepSize=50)

The transpose is used to get better looking numbers. The first line corresponds to the number of reads per bin in the first bamfile.

>>> _array, __ = c.count_reads_in_region(test.chrom, 0, 200)
>>> _array
array([[ 0.,  0.],
       [ 0.,  1.],
       [ 1.,  1.],
       [ 1.,  2.]])
getReadLength(read)[source]
getSmoothRange(tileIndex, tileSize, smoothRange, maxPosition)[source]

Given a tile index position and a tile size (length), return the a new indices over a larger range, called the smoothRange. This region is centered in the tileIndex an spans on both sizes to cover the smoothRange. The smoothRange is trimmed in case it is less than zero or greater than maxPosition

---------------|==================|------------------
           tileStart
      |--------------------------------------|
      |    <--      smoothRange     -->      |
      |
tileStart - (smoothRange-tileSize)/2

Test for a smooth range that spans 3 tiles.

>>> c = CountReadsPerBin([], 1, 1, 1, 0)
>>> c.getSmoothRange(5, 1, 3, 10)
(4, 7)

Test smooth range truncated on start.

>>> c.getSmoothRange(0, 10, 30, 200)
(0, 2)

Test smooth range truncated on start.

>>> c.getSmoothRange(1, 10, 30, 4)
(0, 3)

Test smooth range truncated on end.

>>> c.getSmoothRange(5, 1, 3, 5)
(4, 5)

Test smooth range not multiple of tileSize.

>>> c.getSmoothRange(5, 10, 24, 10)
(4, 6)
get_chunk_length(bamFilesHandles, genomeSize, chromSizes, chrLengths)[source]
get_coverage_of_region(bamHandle, chrom, regions, fragmentFromRead_func=None)[source]

Returns a numpy array that corresponds to the number of reads that overlap with each tile.

>>> test = Tester()
>>> import pysam
>>> c = CountReadsPerBin([], stepSize=1, extendReads=300)

For this case the reads are length 36. The number of overlapping read fragments is 4 and 5 for the positions tested.

>>> c.get_coverage_of_region(pysam.AlignmentFile(test.bamFile_PE), 'chr2',
... [(5000833, 5000834), (5000834, 5000835)])
array([ 4.,  5.])

In the following example a paired read is extended to the fragment length which is 100 The first mate starts at 5000000 and the second at 5000064. Each mate is extended to the fragment length independently At position 500090-500100 one fragment of length 100 overlap, and after position 5000101 there should be zero reads.

>>> c.zerosToNans = True
>>> c.get_coverage_of_region(pysam.AlignmentFile(test.bamFile_PE), 'chr2',
... [(5000090, 5000100), (5000100, 5000110)])
array([  1.,  nan])

In the following case the reads length is 50. Reads are not extended.

>>> c.extendReads=False
>>> c.get_coverage_of_region(pysam.AlignmentFile(test.bamFile2), '3R', [(148, 150), (150, 152), (152, 154)])
array([ 1.,  2.,  2.])
get_fragment_from_read(read)[source]

Get read start and end position of a read. If given, the reads are extended as follows: If reads are paired end, each read mate is extended to match the fragment length, otherwise, a default fragment length is used. If reads are split (give by the CIGAR string) then the multiple positions of the read are returned. When reads are extended the cigar information is skipped.

read: pysam object.

The following values are defined (for forward reads):

     |--          -- read.tlen --              --|
     |-- read.alen --|
-----|===============>------------<==============|----
     |               |            |
read.reference_start
            read.reference_end  read.pnext

  and for reverse reads


     |--             -- read.tlen --           --|
                                 |-- read.alen --|
-----|===============>-----------<===============|----
     |                           |               |
  read.pnext           read.reference_start  read.reference_end

this is a sketch of a pair-end reads

The function returns the fragment start and end, either using the paired end information (if available) or extending the read in the appropriate direction if this is single-end.

read : pysam read object

list of tuples
[(fragment start, fragment end)]
>>> test = Tester()
>>> c = CountReadsPerBin([], 1, 1, 200, extendReads=True)
>>> c.defaultFragmentLength=100
>>> c.get_fragment_from_read(test.getRead("paired-forward"))
[(5000000, 5000100)]
>>> c.get_fragment_from_read(test.getRead("paired-reverse"))
[(5000000, 5000100)]
>>> c.defaultFragmentLength = 200
>>> c.get_fragment_from_read(test.getRead("single-forward"))
[(5001491, 5001691)]
>>> c.get_fragment_from_read(test.getRead("single-reverse"))
[(5001536, 5001736)]
>>> c.defaultFragmentLength = 'read length'
>>> c.get_fragment_from_read(test.getRead("single-forward"))
[(5001491, 5001527)]
>>> c.defaultFragmentLength = 'read length'
>>> c.extendReads = False
>>> c.get_fragment_from_read(test.getRead("paired-forward"))
[(5000000, 5000036)]

Tests for read centering.

>>> c = CountReadsPerBin([], 1, 1, 200, extendReads=True, center_read=True)
>>> c.defaultFragmentLength = 100
>>> assert(c.get_fragment_from_read(test.getRead("paired-forward")) == [(5000032, 5000068)])
>>> c.defaultFragmentLength = 200
>>> assert(c.get_fragment_from_read(test.getRead("single-reverse")) == [(5001618, 5001654)])
static is_proper_pair(read, maxPairedFragmentLength)[source]

Checks if a read is proper pair meaning that both mates are facing each other and are in the same chromosome and are not to far away. The sam flag for proper pair can not always be trusted. Note that if the fragment size is > maxPairedFragmentLength (~2kb usually) that False will be returned. :return: bool

>>> import pysam
>>> import os
>>> from deeptools.countReadsPerBin import CountReadsPerBin as cr
>>> root = os.path.dirname(os.path.abspath(__file__)) + "/test/test_data/"
>>> bam = pysam.AlignmentFile("{}/test_proper_pair_filtering.bam".format(root))
>>> iter = bam.fetch()
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "keep" read
True
>>> cr.is_proper_pair(read, 200) # "keep" read, but maxPairedFragmentLength is too short
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "improper pair"
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "mismatch chr"
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "same orientation1"
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "same orientation2"
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "rev first"
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "rev first OK"
True
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "for first"
False
>>> read = next(iter)
>>> cr.is_proper_pair(read, 1000) # "for first"
True
run(allArgs=None)[source]
class deeptools.countReadsPerBin.Tester[source]

Bases: object

getRead(readType)[source]

prepare arguments for test

deeptools.countReadsPerBin.countReadsInRegions_wrapper(args)[source]

Passes the arguments to countReadsInRegions_worker. This is a step required given the constrains from the multiprocessing module. The args var, contains as first element the ‘self’ value from the countReadsPerBin object

deeptools.countReadsPerBin.remove_row_of_zeros(matrix)[source]

deeptools.deepBlue

class deeptools.deepBlue.deepBlue(sample, url='http://deepblue.mpi-inf.mpg.de/xmlrpc', userKey='anonymous_key')[source]

Bases: object

chroms(chrom=None)[source]

Like the chroms() function in pyBigWig, returns either chromsDict (chrom is None) or the length of a given chromosome

close()[source]
getChroms()[source]

Determines and sets the chromosome names/sizes for a given sample. On error, this raises a runtime exception.

self.chroms is then a dictionary of chromosome:length pairs

getEID()[source]

Given a sample name, return its associated experiment ID (or None on error).

self.experimentID is then the internal ID (e.g., e52525)

getGenome()[source]

Determines and sets the genome assigned to a given sample. On error, this raises a runtime exception.

self.genome is then the internal genome ID.

preload(regions, tmpDir=None)[source]

Given a sample and a set of regions, write a bigWig file containing the underlying signal.

This function returns the file name, which needs to be deleted by the calling function at some point.

This sends queries one chromosome at a time, due to memory limits on deepBlue

deeptools.deepBlue.isDeepBlue(fname)[source]

Returns true if the file ends in .wig, .wiggle, or .bedgraph, since these indicate a file on the deepBlue server

deeptools.deepBlue.makeChromTiles(db)[source]

Make a region for each chromosome

deeptools.deepBlue.makeRegions(BED, args)[source]

Given a list of BED/GTF files, make a list of regions. These are vaguely extended as appropriate. For simplicity, the maximum of –beforeRegionStartLength and –afterRegionStartLength are tacked on to each end and transcripts are used for GTF files.

deeptools.deepBlue.makeTiles(db, args)[source]

Given a deepBlue object, return a list of regions that will be queried

deeptools.deepBlue.mergeRegions(regions)[source]

Given a list of [(chrom, start, end), …], merge all overlapping regions

This returns a dict, where values are sorted lists of [start, end].

deeptools.deepBlue.preloadWrapper(foo)[source]

This is a wrapper around the preload function for multiprocessing

deeptools.getFragmentAndReadSize module

deeptools.getFragmentAndReadSize.getFragmentLength_worker(chrom, start, end, bamFile, distanceBetweenBins)[source]

Queries the reads at the given region for the distance between reads and the read length

chrom : str
chromosome name
start : int
region start
end : int
region end
bamFile : str
BAM file name
distanceBetweenBins : int
the number of bases at the end of each bin to ignore
np.array
an np.array, where first column is fragment length, the second is for read length
deeptools.getFragmentAndReadSize.getFragmentLength_wrapper(args)[source]
deeptools.getFragmentAndReadSize.get_read_and_fragment_length(bamFile, return_lengths=False, blackListFileName=None, binSize=50000, distanceBetweenBins=1000000, numberOfProcessors=None, verbose=False)[source]

Estimates the fragment length and read length through sampling

bamFile : str
BAM file name

return_lengths : bool numberOfProcessors : int verbose : bool binSize : int distanceBetweenBins : int

d : dict
tuple of two dictionaries, one for the fragment length and the other for the read length. The dictionaries summarise the mean, median etc. values

deeptools.getRatio module

deeptools.getRatio.compute_ratio(value1, value2, args)[source]
deeptools.getRatio.getRatio(tileCoverage, args)[source]

The mapreduce method calls this function for each tile. The parameters (args) are fixed in the main method.

>>> funcArgs= {'valueType': 'ratio', 'scaleFactors': (1,1), 'pseudocount': 1}
>>> getRatio([9, 19], funcArgs)
0.5
>>> getRatio([0, 0], funcArgs)
1.0
>>> getRatio([np.nan, np.nan], funcArgs)
nan
>>> getRatio([np.nan, 1.0], funcArgs)
nan
>>> funcArgs['valueType'] ='subtract'
>>> getRatio([20, 10], funcArgs)
10
>>> funcArgs['scaleFactors'] = (1, 0.5)
>>> getRatio([10, 20], funcArgs)
0.0

The reciprocal ratio is of a and b is: is a/b if a/b > 1 else -1* b/a >>> funcArgs[‘valueType’] =’reciprocal_ratio’ >>> funcArgs[‘scaleFactors’] = (1, 1) >>> funcArgs[‘pseudocount’] = 0 >>> getRatio([2, 1], funcArgs) 2.0 >>> getRatio([1, 2], funcArgs) -2.0 >>> getRatio([1, 1], funcArgs) 1.0

deeptools.getScorePerBigWigBin module

class deeptools.getScorePerBigWigBin.Tester[source]

Bases: object

deeptools.getScorePerBigWigBin.countFragmentsInRegions_worker(chrom, start, end, bigWigFiles, stepSize, binLength, save_data, bedRegions=None)[source]

returns the average score in each bigwig file at each ‘stepSize’ position within the interval start, end for a ‘binLength’ window. Because the idea is to get counts for window positions at different positions for sampling the bins are equally spaced and not adjacent.

If a list of bedRegions is given, then the number of reads that overlaps with each region is counted.

Test dataset with two samples covering 200 bp. >>> test = Tester()

Fragment coverage. >>> np.transpose(countFragmentsInRegions_worker(test.chrom, 0, 200, [test.bwFile1, test.bwFile2], 50, 25, False)[0]) array([[ 1., 1., 2., 2.],

[ 1., 1., 1., 3.]])
>>> np.transpose(countFragmentsInRegions_worker(test.chrom, 0, 200, [test.bwFile1, test.bwFile2], 200, 200, False)[0])
array([[ 1.5],
       [ 1.5]])

BED regions: >>> bedRegions = [[test.chrom, [(45, 55)]], [test.chrom, [(95, 105)]], [test.chrom, [(145, 155)]]] >>> np.transpose(countFragmentsInRegions_worker(test.chrom, 0, 200,[test.bwFile1, test.bwFile2], 200, 200, False, … bedRegions=bedRegions)[0]) array([[ 1. , 1.5, 2. ],

[ 1. , 1. , 2. ]])
deeptools.getScorePerBigWigBin.countReadsInRegions_wrapper(args)[source]
deeptools.getScorePerBigWigBin.getChromSizes(bigwigFilesList)[source]

Get chromosome sizes from bigWig file with pyBigWig

Test dataset with two samples covering 200 bp. >>> test = Tester()

Chromosome name(s) and size(s). >>> assert(getChromSizes([test.bwFile1, test.bwFile2]) == ([(‘3R’, 200)], set([])))

deeptools.getScorePerBigWigBin.getScorePerBin(bigWigFiles, binLength, numberOfProcessors=1, verbose=False, region=None, bedFile=None, blackListFileName=None, stepSize=None, chrsToSkip=[], out_file_for_raw_data=None, allArgs=None)[source]

This function returns a matrix containing scores (median) for the coverage of fragments within a region. Each row corresponds to a sampled region. Likewise, each column corresponds to a bigwig file.

Test dataset with two samples covering 200 bp. >>> test = Tester() >>> np.transpose(getScorePerBin([test.bwFile1, test.bwFile2], 50, 3)) array([[ 1., 1., 2., 2.],

[ 1., 1., 1., 3.]])

deeptools.heatmapper module

deeptools.heatmapper_utilities module

deeptools.mapReduce module

deeptools.mapReduce.blSubtract(t, chrom, chunk)[source]

If a genomic region overlaps with a blacklisted region, then subtract that region out

returns a list of lists

deeptools.mapReduce.getUserRegion(chrom_sizes, region_string, max_chunk_size=1000000.0)[source]

Verifies if a given region argument, given by the user is valid. The format of the region_string is chrom:start:end:tileSize where start, end and tileSize are optional.

Parameters:
  • chrom_sizes – dictionary of chromosome/scaffold size. Key=chromosome name
  • region_string – a string of the form chr:start:end
  • max_chunk_size – upper limit for the chunk size
Returns:

tuple chrom_size for the region start, region end, chunk size

#>>> data = getUserRegion({‘chr2’: 1000}, “chr1:10:10”) #Traceback (most recent call last): # … #NameError: Unknown chromosome: chr1 #Known chromosomes are: [‘chr2’]

If the region end is biger than the chromosome size, this value is used instead >>> getUserRegion({‘chr2’: 1000}, “chr2:10:1001”) ([(‘chr2’, 1000)], 10, 1000, 990)

Test chunk and regions size reduction to match tile size >>> getUserRegion({‘chr2’: 200000}, “chr2:10:123344:3”) ([(‘chr2’, 123344)], 9, 123345, 123336)

Test chromosome name mismatch >>> getUserRegion({‘2’: 200000}, “chr2:10:123344:3”) ([(‘2’, 123344)], 9, 123345, 123336) >>> getUserRegion({‘chrM’: 200000}, “MT:10:123344:3”) ([(‘chrM’, 123344)], 9, 123345, 123336)

deeptools.mapReduce.mapReduce(staticArgs, func, chromSize, genomeChunkLength=None, region=None, bedFile=None, blackListFileName=None, numberOfProcessors=4, verbose=False, includeLabels=False, keepExons=False, transcriptID='transcriptID', exonID='exonID', transcript_id_designator='transcript_id', self_=None)[source]

Split the genome into parts that are sent to workers using a defined number of procesors. Results are collected and returned.

For each genomic region the given ‘func’ is called using the following parameters:

chrom, start, end, staticArgs

The arg are static, pickable variables that need to be sent to workers.

The genome chunk length corresponds to a fraction of the genome, in bp, that is send to each of the workers for processing.

Depending on the type of process a larger or shorter regions may be preferred

Parameters:
  • chromSize – A list of duples containing the chromosome name and its length
  • region – The format is chr:start:end:tileSize (see function getUserRegion)
  • staticArgs – tuple of arguments that are sent to the given ‘func’
  • func – function to call. The function is called using the following parameters (chrom, start, end, staticArgs)
  • bedFile – Is a bed file is given, the args to the func to be called are extended to include a list of bed defined regions.
  • blackListFileName – A list of regions to exclude from all computations. Note that this has genomeChunkLength resolution…
  • self – In case mapreduce should make a call to an object the self variable has to be passed.
  • includeLabels – Pass group and transcript labels into the calling function. These are added to the static args (groupLabel and transcriptName).

If “includeLabels” is true, a tuple of (results, labels) is returned

deeptools.sumCoveragePerBin

deeptools.utilities module

deeptools.utilities.bam_blacklisted_reads(bam_handle, chroms_to_ignore, blackListFileName=None, numberOfProcessors=1)[source]
deeptools.utilities.bam_blacklisted_worker(args)[source]
deeptools.utilities.bam_total_reads(bam_handle, chroms_to_ignore, stats)[source]

Count the total number of mapped reads in a BAM file, filtering the chromosome given in chroms_to_ignore list

deeptools.utilities.convertCmap(c, vmin=0, vmax=1)[source]
deeptools.utilities.copyFileInMemory(filePath, suffix='')[source]

copies a file into the special /dev/shm device which moves the file into memory. This process speeds ups the multiprocessor access to such files

deeptools.utilities.getCommonChrNames(bamFileHandles, verbose=True)[source]

Compares the names and lengths of a list of bam file handles. The input is list of pysam file handles.

The function returns a duple containing the common chromosome names and the common chromome lengths.

Hopefully, only _random and chrM are not common.

deeptools.utilities.getGC_content(tb, chrom, fragStart, fragEnd, fraction=True)[source]
deeptools.utilities.getTLen(read, notAbs=False)[source]

Get the observed template length of a read. For a paired-end read, this is normally just the TLEN field. For SE reads this is the observed coverage of the genome (excluding splicing).

deeptools.utilities.getTempFileName(suffix='')[source]

Return a temporary file name. The calling function is responsible for deleting this upon completion.

deeptools.utilities.gtfOptions(allArgs=None)[source]

This is used a couple places to setup arguments to mapReduce

deeptools.utilities.mungeChromosome(chrom, chromList)[source]

A generic chromosome munging function. “chrom” is munged by adding/removing “chr” such that it appears in chromList

On error, None is returned, but a common chromosome list should be used beforehand to avoid this possibility

deeptools.utilities.smartLabel(label)[source]

Given a file name, likely with a path, return the file name without the path and with the file extension removed. Thus, something like /path/to/some.special.file should return some.special, since only the first extension (if present) should be stripped.

deeptools.utilities.smartLabels(labels)[source]
deeptools.utilities.tbitToBamChrName(tbitNames, bamNames)[source]

checks if the chromosome names from the two-bit and bam file coincide. In case they do not coincide, a fix is tried. If successful, then a mapping table is returned. tbitNames and bamNames should be lists

deeptools.utilities.toBytes(s)[source]

Like toString, but for functions requiring bytes in python3

deeptools.utilities.toString(s)[source]

This takes care of python2/3 differences

deeptools.writeBedGraph module

class deeptools.writeBedGraph.WriteBedGraph(bamFilesList, binLength=50, numberOfSamples=None, numberOfProcessors=1, verbose=False, region=None, bedFile=None, extendReads=False, blackListFileName=None, minMappingQuality=None, ignoreDuplicates=False, chrsToSkip=[], stepSize=None, center_read=False, samFlag_include=None, samFlag_exclude=None, zerosToNans=False, smoothLength=0, minFragmentLength=0, maxFragmentLength=0, out_file_for_raw_data=None, statsList=[], mappedList=[])[source]

Bases: deeptools.countReadsPerBin.CountReadsPerBin

Reads bam files coverages and writes a bedgraph or bigwig file

Extends the CountReadsPerBin object such that the coverage of bam files is writen to multiple bedgraph files at once.

The bedgraph files are later merge into one and converted into a bigwig file if necessary.

The constructor arguments are the same as for CountReadsPerBin. However, when calling the run method, the following parameters have to be passed

Given the following distribution of reads that cover 200 on a chromosome named ‘3R’:

  0                              100                           200
  |------------------------------------------------------------|
A                                ===============
                                                ===============


B                 ===============               ===============
                                 ===============
                                                ===============
>>> import tempfile
>>> test_path = os.path.dirname(os.path.abspath(__file__)) + "/test/test_data/"
>>> outFile = tempfile.NamedTemporaryFile()
>>> bam_file = test_path +  "testA.bam"

For the example a simple scaling function is going to be used. This function takes the coverage found at each region and multiplies it to the scaling factor. In this case the scaling factor is 1.5

>>> function_to_call = scaleCoverage
>>> funcArgs = {'scaleFactor': 1.5}

Restrict process to a region between positions 0 and 200 of chromosome 3R

>>> region = '3R:0:200'

Set up such that coverage is computed for consecutive bins of length 25 bp >>> bin_length = 25 >>> step_size = 25

>>> num_sample_sites = 0 #overruled by step_size
>>> c = WriteBedGraph([bam_file], binLength=bin_length, region=region, stepSize=step_size)
>>> c.run(function_to_call, funcArgs, outFile.name)
>>> f = open(outFile.name, 'r')
>>> f.readlines()
['3R\t0\t100\t0\n', '3R\t100\t200\t1.5\n']
>>> f.close()
>>> outFile.close()
run(func_to_call, func_args, out_file_name, blackListFileName=None, format='bedgraph', smoothLength=0)[source]

Given a list of bamfiles, a function and a function arguments, this method writes a bedgraph file (or bigwig) file for a partition of the genome into tiles of given size and a value for each tile that corresponds to the given function and that is related to the coverage underlying the tile.

func_to_call : str
function name to be called to convert the list of coverages computed for each bam file at each position into a single value. An example is a function that takes the ratio between the coverage of two bam files.
func_args : dict
dict of arguments to pass to func. E.g. {‘scaleFactor’:1.0}
out_file_name : str
name of the file to save the resulting data.
smoothLength : int
Distance in bp for smoothing the coverage per tile.
writeBedGraph_worker(chrom, start, end, func_to_call, func_args, bed_regions_list=None)[source]

Writes a bedgraph based on the read coverage found on bamFiles

The given func is called to compute the desired bedgraph value using the funcArgs

chrom : str
Chrom name
start : int
start coordinate
end : int
end coordinate
func_to_call : str
function name to be called to convert the list of coverages computed for each bam file at each position into a single value. An example is a function that takes the ratio between the coverage of two bam files.
func_args : dict
dict of arguments to pass to func.
smoothLength : int
Distance in bp for smoothing the coverage per tile.
bed_regions_list: list
List of tuples of the form (chrom, start, end) corresponding to bed regions to be processed. If not bed file was passed to the object constructor then this list is empty.

A list of [chromosome, start, end, temporary file], where the temporary file contains the bedgraph results for the region queried.

>>> test_path = os.path.dirname(os.path.abspath(__file__)) + "/test/test_data/"
>>> bamFile1 = test_path +  "testA.bam"
>>> bin_length = 50
>>> number_of_samples = 0 # overruled by step_size
>>> func_to_call = scaleCoverage
>>> funcArgs = {'scaleFactor': 1.0}
>>> c = WriteBedGraph([bamFile1], bin_length, number_of_samples, stepSize=50)
>>> tempFile = c.writeBedGraph_worker( '3R', 0, 200, func_to_call, funcArgs)
>>> f = open(tempFile[3], 'r')
>>> f.readlines()
['3R\t0\t100\t0\n', '3R\t100\t200\t1\n']
>>> f.close()
>>> os.remove(tempFile[3])
deeptools.writeBedGraph.bedGraphToBigWig(chromSizes, bedGraphFiles, bigWigPath)[source]

Takes a sorted list of bedgraph files and write them to a single bigWig file using pyBigWig. The order of bedGraphFiles must match that of chromSizes!

deeptools.writeBedGraph.getGenomeChunkLength(bamHandles, tile_size, mappedList)[source]

Tries to estimate the length of the genome sent to the workers based on the density of reads per bam file and the number of bam files.

The chunk length should be a multiple of the tileSize

deeptools.writeBedGraph.ratio(tile_coverage, args)[source]

tileCoverage should be an list of two elements

deeptools.writeBedGraph.scaleCoverage(tile_coverage, args)[source]

tileCoverage should be an list with only one element

deeptools.writeBedGraph.writeBedGraph_wrapper(args)[source]

Passes the arguments to writeBedGraph_worker. This is a step required given the constrains from the multiprocessing module. The args var, contains as first element the ‘self’ value from the WriteBedGraph object

deeptools.writeBedGraph_bam_and_bw module

deeptools.writeBedGraph_bam_and_bw.getCoverageFromBigwig(bigwigHandle, chrom, start, end, tileSize, missingDataAsZero=False)[source]
deeptools.writeBedGraph_bam_and_bw.writeBedGraph(bamOrBwFileList, outputFileName, fragmentLength, func, funcArgs, tileSize=25, region=None, blackListFileName=None, numberOfProcessors=1, format='bedgraph', extendPairedEnds=True, missingDataAsZero=False, smoothLength=0, fixed_step=False, verbose=False)[source]

Given a list of bamfiles, a function and a function arguments, this method writes a bedgraph file (or bigwig) file for a partition of the genome into tiles of given size and a value for each tile that corresponds to the given function and that is related to the coverage underlying the tile.

deeptools.writeBedGraph_bam_and_bw.writeBedGraph_worker(chrom, start, end, tileSize, defaultFragmentLength, bamOrBwFileList, func, funcArgs, extendPairedEnds=True, smoothLength=0, missingDataAsZero=False, fixed_step=False)[source]

Writes a bedgraph having as base a number of bam files.

The given func is called to compute the desired bedgraph value using the funcArgs

tileSize

deeptools.writeBedGraph_bam_and_bw.writeBedGraph_wrapper(args)[source]

Module contents

deepTools Galaxy. code @ github.