alignmentSieve

This tool filters alignments in a BAM/CRAM file according to the specified parameters. It can optionally output to BEDPE format.

Example usage: alignmentSieve.py -b sample1.bam -o sample1.filtered.bam --minMappingQuality 10 --filterMetrics log.txt

Required arguments

--bam, -b An indexed BAM file.
--outFile, -o The file to write results to. These are the alignments or fragments that pass the filtering criteria.

General arguments

--numberOfProcessors, -p
 Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.
--filterMetrics
 The total number of entries and the number remaining after filtering are saved to this file.
--filteredOutReads
 If desired, all reads NOT passing the filtering criteria can be written to this file.
--label, -l User defined label instead of the default label (file name).
--smartLabels Instead of manually specifying a label for the input file, this causes deepTools to use the file name after removing the path and extension.
--verbose, -v Set to see processing messages.
--version show program’s version number and exit
--shift Shift the left and right ends of a read (for BAM files) or a fragment (for BED files). A positive value shifts an end to the right (on the + strand) and a negative value shifts it to the left. Either 2 or 4 integers can be provided. For example, “2 -3” will shift the left-most fragment end two bases to the right and the right-most end three bases to the left. If 4 integers are provided, then the first two and the last two refer to fragments whose read 1 is on the left or right, respectively. Consequently, it is possible to take strand into consideration for strand-specific protocols. A fragment whose length falls below 1 due to shifting will not be written to the output. See the online documentation for graphical examples. Note that reads that are not properly paired will be filtered out.
--ATACshift Shift the produced BAM file or BEDPE regions as commonly done for ATAC-seq. This is equivalent to --shift 4 -5 5 -4.
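As a rough illustration of how the --numberOfProcessors values map to a worker count, here is a minimal Python sketch. The helper name resolve_processors is hypothetical, and capping explicit values at the number of available CPUs is an assumption rather than documented behavior.

```python
import os


def resolve_processors(value: str) -> int:
    """Hypothetical helper: map a --numberOfProcessors value to a worker count."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    if value == "max":
        return available
    if value == "max/2":
        return max(1, available // 2)  # half the CPUs, but never fewer than one
    requested = int(value)  # raises ValueError for anything else
    # Assumption: explicit requests are clamped to [1, available].
    return max(1, min(requested, available))


print(resolve_processors("max/2"))
```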

Output arguments

--BED Instead of producing BAM files, write output in BEDPE format (as defined by MACS2). Note that only reads/fragments passing the filtering criteria are written in BEDPE format.

Optional arguments

--filterRNAstrand
 Selects RNA-seq reads (single-end or paired-end) in the given strand. Possible choices: forward, reverse.

--ignoreDuplicates
 If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate’s position also has to coincide to ignore a read.
--minMappingQuality
 If set, only reads that have a mapping quality score of at least this value are considered.
--samFlagInclude
 Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage.
--samFlagExclude
 Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand.
--blackListFileName, -bl
 A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
--minFragmentLength
 The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments.
--maxFragmentLength
 The maximum fragment length allowed for read/pair inclusion.
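The read-level criteria above can be thought of as a single predicate. The Python sketch below is illustrative only: the Read record and passes_filters function are hypothetical stand-ins (not deepTools or pysam objects), it assumes that --samFlagInclude requires all of the given bits to be set while --samFlagExclude rejects a read if any of them is set, and it takes the absolute template length as the fragment length, which is a simplification that does not hold for single-end reads.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical stand-in for one aligned read (not a pysam object)."""
    flag: int  # SAM FLAG field
    mapq: int  # SAM MAPQ field
    tlen: int  # SAM TLEN field (signed template length)


def passes_filters(read: Read, min_mapq: int = 0, flag_include: int = 0,
                   flag_exclude: int = 0, min_frag: int = 0,
                   max_frag: int = 0) -> bool:
    """Return True if the read survives all criteria; 0 disables a criterion."""
    if read.mapq < min_mapq:  # --minMappingQuality
        return False
    # --samFlagInclude (assumed): every requested bit must be set
    if flag_include and (read.flag & flag_include) != flag_include:
        return False
    # --samFlagExclude: any requested bit being set rejects the read
    if flag_exclude and (read.flag & flag_exclude):
        return False
    frag_len = abs(read.tlen)  # simplification: fragment length from TLEN
    if min_frag and frag_len < min_frag:  # --minFragmentLength
        return False
    if max_frag and frag_len > max_frag:  # --maxFragmentLength
        return False
    return True


# FLAG 99 = paired (1) + proper pair (2) + mate reverse (32) + first mate (64)
read = Read(flag=99, mapq=30, tlen=180)
print(passes_filters(read, min_mapq=10, flag_include=64, flag_exclude=16))
```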

Background

This tool filters alignments in a BAM/CRAM file according to the specified parameters. It can optionally output to BEDPE format, possibly with the fragment ends shifted in a custom manner.

Usage example

alignmentSieve needs a sorted and indexed BAM file and the desired filtering criteria.

$ alignmentSieve -b paired_chr2L.bam \
--minMappingQuality 5 --samFlagInclude 16 \
--samFlagExclude 256 --ignoreDuplicates \
-o filtered.bam --filterMetrics metrics.txt
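The flag values in this command are bit fields defined by the SAM specification: 16 (0x10) marks a read aligned to the reverse strand and 256 (0x100) marks a secondary alignment, so the command keeps reverse-strand reads and drops secondary alignments. The small decoder below is a sketch for checking such values; the decode_flag name and the wording of the bit descriptions are our own.

```python
from typing import List

# Bit meanings from the SAM specification (descriptions paraphrased).
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> List[str]:
    """List the descriptions of every bit set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(16))   # --samFlagInclude 16
print(decode_flag(256))  # --samFlagExclude 256
```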

The alignments passing the filtering criteria are then written to the file specified by -o. You can additionally save alignments NOT passing the filtering criteria with the --filteredOutReads option. If you would like to store metrics about the number of reads seen and the number remaining after filtering, specify the file for that with --filterMetrics. An example metrics file is below:

#bamFilterReads --filterMetrics
#File	Reads Remaining	Total Initial Reads
paired_chr2L.bam	8440	12644
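If you want to inspect the metrics programmatically, a minimal Python sketch follows. It assumes the file is tab-separated with the layout shown above (two comment lines starting with #, then one row per file); parse_filter_metrics is a hypothetical helper, not part of deepTools.

```python
from typing import Dict, Tuple


def parse_filter_metrics(text: str) -> Dict[str, Tuple[int, int, float]]:
    """Hypothetical parser: file name -> (remaining, total, fraction kept)."""
    results = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        name, remaining, total = line.split("\t")[:3]
        kept, seen = int(remaining), int(total)
        results[name] = (kept, seen, kept / seen if seen else 0.0)
    return results


example = (
    "#bamFilterReads --filterMetrics\n"
    "#File\tReads Remaining\tTotal Initial Reads\n"
    "paired_chr2L.bam\t8440\t12644\n"
)
kept, seen, frac = parse_filter_metrics(example)["paired_chr2L.bam"]
print(f"{kept} of {seen} reads kept ({frac:.1%})")
```

For the example metrics above, this reports that roughly two thirds of the reads survived filtering.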

Instead of a BAM file, a BEDPE file (suitable for input into MACS2) can be produced. Like the BAM/CRAM output, BEDPE also allows shifting of fragment ends, as is often desirable in ATAC-seq and related protocols:

$ alignmentSieve -b paired_chr2L.bam \
--minFragmentLength 140 --BED \
--shift -5 3 -o fragments.bedpe

The --shift option can take either 2 or 4 integers. If two integers are given, then the first value shifts the left-most end of a fragment and the second the right-most end. Positive values shift to the right and negative values to the left. See below for how the above settings would shift a single fragment:

     ----> read 1
                 read 2 <----

     ------------------------ fragment

-------------------------------- shifted fragment
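The two-integer case amounts to adding each value to one end of the fragment. The sketch below assumes 0-based, half-open coordinates on the + strand and drops fragments shorter than 1 base, as described for --shift; the function name shift_fragment is hypothetical, and it does not clip coordinates at chromosome boundaries.

```python
from typing import Optional, Tuple


def shift_fragment(start: int, end: int, left_shift: int,
                   right_shift: int) -> Optional[Tuple[int, int]]:
    """Shift the ends of a + strand fragment [start, end) (hypothetical helper).

    Positive values move an end to the right, negative values to the left.
    Returns None when the shifted fragment would be shorter than 1 base.
    """
    new_start = start + left_shift
    new_end = end + right_shift
    if new_end - new_start < 1:
        return None
    return new_start, new_end


# The fragment from the diagram: 24 bases starting at position 5, --shift -5 3
print(shift_fragment(5, 29, -5, 3))
```

With the diagram's coordinates this returns (0, 32): the 24-base fragment grows to 32 bases, five on the left and three on the right.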

The same results will be produced if read 1 and read 2 are swapped. If, instead, the protocol is strand-specific, four integers can be given: the first pair is applied to fragments where read 1 precedes read 2, and the second pair to fragments where read 2 precedes read 1. In this case, the first value in each pair is applied to the end of read 1 and the second to the end of read 2. Take the following command as an example:

$ alignmentSieve -b paired_chr2L.bam \
--minFragmentLength 140 --BED \
--shift -5 3 -1 4 -o fragments.bedpe

Given that, the -5 3 set would produce the following:

     ----> read 1
                 read 2 <----

     ------------------------ fragment

-------------------------------- shifted fragment

and the -1 4 set would produce the following:

----> read 2
            read 1 <----

------------------------ fragment

    --------------------- shifted fragment

As can be seen, such fragments are considered to be on the - strand, so negative values shift to the left in their own frame of reference (and thus to the right relative to the + strand).
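The four-integer rule can be modeled in Python as follows, using the sign convention stated above: for fragments where read 1 lies on the right, the values act in that fragment's own frame, so they are subtracted in + strand coordinates. This is an illustrative model written from the description in this section, not the deepTools implementation; shift_paired_fragment is a hypothetical name, coordinates are assumed 0-based and half-open, and proper pairing is assumed to have been checked already.

```python
from typing import List, Optional, Sequence, Tuple


def shift_paired_fragment(start: int, end: int, read1_on_left: bool,
                          shift: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Model of --shift with 2 or 4 integers (hypothetical helper).

    start, end: fragment coordinates on the + strand, 0-based and half-open.
    read1_on_left: True if read 1 precedes read 2, i.e. a + strand fragment.
    """
    values: List[int] = list(shift)
    if len(values) == 2:
        # Two values give the same left/right shift for both orientations;
        # here they are rewritten as the equivalent four-value form.
        values = [values[0], values[1], -values[1], -values[0]]
    elif len(values) != 4:
        raise ValueError("--shift takes 2 or 4 integers")
    if read1_on_left:
        # First value -> read 1 end (left), second -> read 2 end (right).
        new_start = start + values[0]
        new_end = end + values[1]
    else:
        # - strand fragment: third value -> read 1 end (right), fourth ->
        # read 2 end (left); signs flip because they act in the fragment's frame.
        new_end = end - values[2]
        new_start = start - values[3]
    if new_end - new_start < 1:
        return None  # fragments shorter than 1 base are dropped
    return new_start, new_end


atac = [4, -5, 5, -4]  # the values --ATACshift is documented to use
print(shift_paired_fragment(1000, 1200, True, atac))
print(shift_paired_fragment(1000, 1200, False, atac))
```

Under this model, the --ATACshift values give the same result for both orientations, (1000, 1200) becoming (1004, 1195), which corresponds to the familiar +4/-5 adjustment of ATAC-seq read ends.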

Note

If the --shift or --ATACshift options are used, then only properly-paired reads will be used.
