computeMatrixOperations

This tool performs a variety of operations on files produced by computeMatrix.

Detailed help for each subcommand:

  computeMatrixOperations info -h
  computeMatrixOperations relabel -h
  computeMatrixOperations subset -h
  computeMatrixOperations filterStrand -h
  computeMatrixOperations filterValues -h
  computeMatrixOperations rbind -h
  computeMatrixOperations cbind -h
  computeMatrixOperations sort -h
  computeMatrixOperations dataRange -h

usage: computeMatrixOperations [-h] [--version]  ...

Named Arguments

--version

show program’s version number and exit

Commands

Possible choices: info, relabel, subset, filterStrand, filterValues, rbind, cbind, sort, dataRange

Sub-commands

info

Print group and sample information

An example usage is:
  computeMatrixOperations info -m input.mat.gz

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.
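The info subcommand only needs the header of the matrix file. The sketch below reproduces that listing under one assumption: the gzipped file starts with a single line made of `@` followed by a JSON object with `group_labels` and `sample_labels` keys. That is our reading of deepTools 3.x output, not a documented interface. `read_matrix_header` and `print_info` are hypothetical names, and the toy file is built on the spot.

```python
import gzip
import json
import os
import tempfile


def read_matrix_header(path):
    """Return (group_labels, sample_labels) from a computeMatrix-style file.

    Assumes line 1 is '@' + JSON (our reading of deepTools 3.x output);
    this is a sketch, not deepTools' own parser.
    """
    with gzip.open(path, "rt") as handle:
        first = handle.readline()
    if not first.startswith("@"):
        raise ValueError("missing '@' header line")
    header = json.loads(first[1:])
    return header["group_labels"], header["sample_labels"]


def print_info(path):
    """Print groups and samples in file order, like the info output."""
    groups, samples = read_matrix_header(path)
    print("Groups:")
    for group in groups:
        print("    " + group)
    print("Samples:")
    for sample in samples:
        print("    " + sample)


# Write a tiny hypothetical matrix file so the parser has something to read.
toy_header = {
    "group_labels": ["genes"],
    "group_boundaries": [0, 1],
    "sample_labels": ["SRR648667.forward", "SRR648667.reverse"],
    "sample_boundaries": [0, 2, 4],
}
path = os.path.join(tempfile.mkdtemp(), "toy.mat.gz")
with gzip.open(path, "wt") as handle:
    handle.write("@" + json.dumps(toy_header) + "\n")
    handle.write("chr1\t100\t200\tACTB\t0\t+\t1.0\t2.0\t3.0\t4.0\n")

print_info(path)
```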

relabel

Change sample and/or group label information

An example usage is:
  computeMatrixOperations relabel -m input.mat.gz -o output.mat.gz --sampleLabels "sample 1" "sample 2"

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

Optional arguments

--groupLabels

Group labels. If none are specified, the current labels are kept.

--sampleLabels

Sample labels. If none are specified, the current labels are kept.
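The relabel rule is simple: a label list you pass replaces the current one, and anything you omit is kept. Here is a minimal sketch on a header held as a plain dict (a hypothetical layout). The "one label per existing entry" check is our assumption; this page does not say how deepTools handles a count mismatch.

```python
def relabel(header, group_labels=None, sample_labels=None):
    """Return a copy of a toy header with new labels.

    Omitted label lists are kept unchanged. The length checks are an
    assumption of this sketch, not documented deepTools behaviour.
    """
    new = dict(header)  # shallow copy; lists are replaced, never mutated
    if group_labels is not None:
        if len(group_labels) != len(header["group_labels"]):
            raise ValueError("need one label per group")
        new["group_labels"] = list(group_labels)
    if sample_labels is not None:
        if len(sample_labels) != len(header["sample_labels"]):
            raise ValueError("need one label per sample")
        new["sample_labels"] = list(sample_labels)
    return new


header = {"group_labels": ["genes"], "sample_labels": ["s1", "s2"]}
renamed = relabel(header, sample_labels=["sample 1", "sample 2"])
print(renamed["group_labels"], renamed["sample_labels"])
# ['genes'] ['sample 1', 'sample 2']
```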

subset

Subset the matrix by group and/or sample. The order in which groups and samples are given is honored, so this can also be used to reorder them.

An example usage is:
  computeMatrixOperations subset -m input.mat.gz -o output.mat.gz --groups "group 1" "group 2" --samples "sample 3" "sample 10"

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

Optional arguments

--groups

Groups to include. If none are specified then all will be included.

--samples

Samples to include. If none are specified then all will be included.
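To show how subsetting and reordering interact, the sketch below uses a hypothetical in-memory matrix. Rows are `(bed6_tuple, values)` pairs. `group_boundaries` gives the row range of each group and `sample_boundaries` gives the column range of each sample. This layout is modelled loosely on the computeMatrix header and is an assumption, as is the function name `subset`.

```python
def subset(matrix, groups=None, samples=None):
    """Keep (and reorder) groups and samples of a toy in-memory matrix.

    Assumed keys: group_labels, group_boundaries (row ranges),
    sample_labels, sample_boundaries (column ranges), and rows as a
    list of (bed6_tuple, list_of_values). None means "keep all".
    """
    groups = groups or matrix["group_labels"]
    samples = samples or matrix["sample_labels"]
    gb, sb = matrix["group_boundaries"], matrix["sample_boundaries"]

    # Column range of every requested sample, in the requested order.
    col_ranges = []
    for s in samples:
        i = matrix["sample_labels"].index(s)
        col_ranges.append(range(sb[i], sb[i + 1]))
    new_sb = [0]
    for cols in col_ranges:
        new_sb.append(new_sb[-1] + len(cols))

    # Rows of every requested group, in the requested order.
    rows, new_gb = [], [0]
    for g in groups:
        i = matrix["group_labels"].index(g)
        for bed, values in matrix["rows"][gb[i]:gb[i + 1]]:
            rows.append((bed, [values[c] for cols in col_ranges for c in cols]))
        new_gb.append(len(rows))

    return {"group_labels": list(groups), "group_boundaries": new_gb,
            "sample_labels": list(samples), "sample_boundaries": new_sb,
            "rows": rows}


m = {
    "group_labels": ["group 1", "group 2"],
    "group_boundaries": [0, 1, 2],
    "sample_labels": ["A", "B"],
    "sample_boundaries": [0, 2, 4],
    "rows": [
        (("chr1", 0, 100, "r1", 0, "+"), [1, 2, 10, 20]),
        (("chr1", 500, 600, "r2", 0, "-"), [3, 4, 30, 40]),
    ],
}
out = subset(m, groups=["group 2", "group 1"], samples=["B"])
for bed, values in out["rows"]:
    print(bed[3], values)
# r2 [30, 40]
# r1 [10, 20]
```

Because the boundaries are rebuilt from the requested order, passing the same groups or samples in a different order is enough to reorder the matrix.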

filterStrand

Filter entries by strand.

Example usage:
  computeMatrixOperations filterStrand -m input.mat.gz -o output.mat.gz --strand +

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

--strand, -s

Possible choices: +, -, .

Strand of the regions to keep.
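Filtering by strand drops rows, so each group's row range has to be recomputed. A sketch on the same kind of hypothetical toy matrix (strand is the 6th BED column). Keeping groups that end up empty is a choice of this sketch; the page does not cover that case.

```python
def filter_strand(matrix, strand):
    """Keep only rows whose BED strand (6th column) equals `strand`.

    Group boundaries are recomputed because rows may vanish from any
    group; empty groups are kept here (an assumption of this sketch).
    """
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    gb = matrix["group_boundaries"]
    rows, new_gb = [], [0]
    for start, end in zip(gb, gb[1:]):
        rows.extend(r for r in matrix["rows"][start:end] if r[0][5] == strand)
        new_gb.append(len(rows))
    return {**matrix, "rows": rows, "group_boundaries": new_gb}


m = {
    "group_labels": ["genes"],
    "group_boundaries": [0, 3],
    "sample_labels": ["A"],
    "sample_boundaries": [0, 1],
    "rows": [
        (("chr1", 0, 10, "a", 0, "+"), [1.0]),
        (("chr1", 20, 30, "b", 0, "-"), [2.0]),
        (("chr2", 5, 15, "c", 0, "-"), [3.0]),
    ],
}
out = filter_strand(m, "-")
print([bed[3] for bed, _ in out["rows"]], out["group_boundaries"])
# ['b', 'c'] [0, 2]
```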

filterValues

Filter entries by min/max value.

Example usage:
  computeMatrixOperations filterValues -m input.mat.gz -o output.mat.gz --min 10 --max 1000

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

Optional arguments

--min

Minimum value. Any row containing at least one value below this will be excluded. By default there is no minimum.

--max

Maximum value. Any row containing at least one value above this will be excluded. By default there is no maximum.
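The key point of filterValues is that a single out-of-range value removes the whole row. A sketch on a hypothetical toy matrix, where `None` stands for "no bound" as in the defaults above. Skipping NaNs during the comparison is our assumption; the page does not say how missing values are treated.

```python
import math


def filter_values(matrix, min_value=None, max_value=None):
    """Drop every row with any value below min_value or above max_value.

    None means no bound. NaNs are ignored when testing bounds, which
    is an assumption of this sketch.
    """
    def keep(values):
        finite = [v for v in values if not math.isnan(v)]
        if min_value is not None and any(v < min_value for v in finite):
            return False
        if max_value is not None and any(v > max_value for v in finite):
            return False
        return True

    gb = matrix["group_boundaries"]
    rows, new_gb = [], [0]
    for start, end in zip(gb, gb[1:]):
        rows.extend(r for r in matrix["rows"][start:end] if keep(r[1]))
        new_gb.append(len(rows))
    return {**matrix, "rows": rows, "group_boundaries": new_gb}


m = {
    "group_labels": ["genes"],
    "group_boundaries": [0, 4],
    "sample_labels": ["A"],
    "sample_boundaries": [0, 2],
    "rows": [
        (("chr1", 0, 10, "a", 0, "+"), [12.0, 50.0]),
        (("chr1", 20, 30, "b", 0, "+"), [5.0, 40.0]),
        (("chr1", 40, 50, "c", 0, "-"), [20.0, 2000.0]),
        (("chr2", 0, 10, "d", 0, "-"), [15.0, math.nan]),
    ],
}
out = filter_values(m, min_value=10, max_value=1000)
print([bed[3] for bed, _ in out["rows"]])
# ['a', 'd']
```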

rbind

Merge multiple matrices by concatenating them head to tail (i.e., stacking their rows). This assumes that each matrix contains the same samples in the same order.

Example usage:
  computeMatrixOperations rbind -m input1.mat.gz input2.mat.gz -o output.mat.gz

Required arguments

--matrixFile, -m

Matrix files from the computeMatrix tool.

--outFileName, -o

Output file name
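Stacking rows only makes sense if every input has the same column layout. The sketch below checks that layout (`sample_boundaries`) and takes sample labels from the first matrix. It works on a hypothetical toy matrix. Merging groups that share a label into one group is our assumption about how duplicate group names should be handled, not something this page states.

```python
def rbind(matrices):
    """Stack toy matrices top to bottom.

    Every input must share the same column layout; sample labels come
    from the first matrix. Groups with the same label are merged,
    which is an assumption of this sketch.
    """
    first = matrices[0]
    for m in matrices[1:]:
        if m["sample_boundaries"] != first["sample_boundaries"]:
            raise ValueError("matrices must have the same sample layout")

    groups = {}  # label -> rows; dicts keep first-appearance order
    for m in matrices:
        gb = m["group_boundaries"]
        for label, start, end in zip(m["group_labels"], gb, gb[1:]):
            groups.setdefault(label, []).extend(m["rows"][start:end])

    rows, new_gb = [], [0]
    for group_rows in groups.values():
        rows.extend(group_rows)
        new_gb.append(len(rows))
    return {**first, "group_labels": list(groups),
            "group_boundaries": new_gb, "rows": rows}


fwd = {"group_labels": ["genes"], "group_boundaries": [0, 1],
       "sample_labels": ["S1.forward"], "sample_boundaries": [0, 2],
       "rows": [(("chr1", 0, 10, "a", 0, "-"), [1, 2])]}
rev = {"group_labels": ["genes", "lncRNA"], "group_boundaries": [0, 1, 2],
       "sample_labels": ["S1.reverse"], "sample_boundaries": [0, 2],
       "rows": [(("chr1", 50, 60, "b", 0, "+"), [3, 4]),
                (("chr2", 0, 10, "c", 0, "+"), [5, 6])]}
out = rbind([fwd, rev])
print(out["group_labels"], out["group_boundaries"], [bed[3] for bed, _ in out["rows"]])
# ['genes', 'lncRNA'] [0, 2, 3] ['a', 'b', 'c']
```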

cbind

Merge multiple matrices by concatenating them left to right. No assumptions are made about the row order. Regions not present in the first file specified are ignored, and regions missing from subsequent files result in NAs. Regions are matched based on the first 6 columns of the computeMatrix output (essentially the columns of a BED file).

Example usage:
  computeMatrixOperations cbind -m input1.mat.gz input2.mat.gz -o output.mat.gz

Required arguments

--matrixFile, -m

Matrix files from the computeMatrix tool.

--outFileName, -o

Output file name
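The cbind rules above amount to a left join keyed on the six BED columns. A sketch on hypothetical toy matrices: row order and groups come from the first matrix, regions found only in later matrices are dropped, and missing regions are padded with NaN. If a later matrix lists the same region twice, the last copy wins here, which is a choice of this sketch.

```python
import math


def cbind(matrices):
    """Join toy matrices left to right, matching rows on their BED6 key.

    Row order and groups come from the first matrix; regions only in
    later matrices are ignored; regions missing from a later matrix get
    NaN for that matrix's columns.
    """
    first = matrices[0]
    rows = [(bed, list(values)) for bed, values in first["rows"]]  # copies
    labels = list(first["sample_labels"])
    bounds = list(first["sample_boundaries"])

    for m in matrices[1:]:
        width = m["sample_boundaries"][-1]  # number of value columns
        lookup = {bed: values for bed, values in m["rows"]}
        for bed, values in rows:
            values.extend(lookup.get(bed, [math.nan] * width))
        offset = bounds[-1]
        bounds.extend(offset + b for b in m["sample_boundaries"][1:])
        labels.extend(m["sample_labels"])

    return {**first, "sample_labels": labels,
            "sample_boundaries": bounds, "rows": rows}


a = {"group_labels": ["genes"], "group_boundaries": [0, 2],
     "sample_labels": ["A"], "sample_boundaries": [0, 2],
     "rows": [(("chr1", 0, 10, "x", 0, "+"), [1.0, 2.0]),
              (("chr1", 20, 30, "y", 0, "-"), [3.0, 4.0])]}
b = {"group_labels": ["genes"], "group_boundaries": [0, 2],
     "sample_labels": ["B"], "sample_boundaries": [0, 2],
     "rows": [(("chr1", 20, 30, "y", 0, "-"), [7.0, 8.0]),
              (("chr9", 0, 5, "z", 0, "+"), [9.0, 9.0])]}
out = cbind([a, b])
for bed, values in out["rows"]:
    print(bed[3], values)
# x [1.0, 2.0, nan, nan]
# y [3.0, 4.0, 7.0, 8.0]
print(out["sample_labels"], out["sample_boundaries"])
# ['A', 'B'] [0, 2, 4]
```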

sort

Sort a matrix file so that its regions follow the order of entries in the given region file(s). The groups of regions defined by these files must appear in the same order as in the computeMatrix output (otherwise, run the subset command first). Note that this subcommand can also be used to remove unwanted regions, since regions not present in the input file(s) are omitted from the output.

Example usage:
  computeMatrixOperations sort -m input.mat.gz -R regions1.bed regions2.bed regions3.gtf -o input.sorted.mat.gz

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

--regionsFileName, -R

File name(s), in BED or GTF format, containing the regions. If multiple BED files are given, each one is considered a group that can be plotted separately. Also, adding a “#” symbol in the BED file causes all the regions up to the previous “#” to be considered one group. Alternatively, for BED files, putting deepTools_group in the header can be used to indicate a column with group labels. Note that these should be sorted so that all entries of a group are together.

Optional arguments

--transcriptID

When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as transcripts. (Default: “transcript”)

--transcript_id_designator

Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files this is instead stored in the last column as a key:value pair (e.g., as ‘transcript_id “ACTB”’, for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key. (Default: “transcript_id”)
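Sorting works group by group: each group's rows are put in the order of one region list, and rows not listed are dropped. A sketch on a hypothetical toy matrix. Matching on the BED name column (the region ID described above) and silently skipping listed IDs that are absent from the matrix are both assumptions of this sketch.

```python
def sort_matrix(matrix, region_ids_per_group):
    """Reorder each group's rows to follow a list of region IDs.

    One ID list per group, in the matrix's group order (e.g. one list
    per BED file). Rows match on the BED name column (an assumption);
    unlisted rows are dropped and unknown IDs are skipped.
    """
    gb = matrix["group_boundaries"]
    if len(region_ids_per_group) != len(gb) - 1:
        raise ValueError("need one region list per group")
    rows, new_gb = [], [0]
    for ids, start, end in zip(region_ids_per_group, gb, gb[1:]):
        by_name = {bed[3]: (bed, values) for bed, values in matrix["rows"][start:end]}
        rows.extend(by_name[i] for i in ids if i in by_name)
        new_gb.append(len(rows))
    return {**matrix, "rows": rows, "group_boundaries": new_gb}


m = {
    "group_labels": ["genes"],
    "group_boundaries": [0, 3],
    "sample_labels": ["A"],
    "sample_boundaries": [0, 1],
    "rows": [
        (("chr1", 0, 10, "ACTB", 0, "-"), [1.0]),
        (("chr1", 20, 30, "GAPDH", 0, "+"), [2.0]),
        (("chr1", 40, 50, "MYC", 0, "+"), [3.0]),
    ],
}
out = sort_matrix(m, [["MYC", "ACTB", "TP53"]])
print([bed[3] for bed, _ in out["rows"]], out["group_boundaries"])
# ['MYC', 'ACTB'] [0, 2]
```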

dataRange

Returns the minimum, maximum, median, and 10th and 90th percentiles of the matrix values for each sample.

Example usage:
  computeMatrixOperations dataRange -m input.mat.gz

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.
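Each dataRange statistic is computed over all values in one sample's columns, pooled across every row. A sketch on a hypothetical toy matrix. It uses linear-interpolation percentiles (NumPy's default method) and drops NaNs. Both are assumptions, since this page does not say which percentile method deepTools uses or how it handles missing values.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (NumPy's default method)."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def data_range(matrix):
    """Map each sample to (min, max, median, 10th pct, 90th pct).

    Values from all rows in the sample's columns are pooled; NaNs are
    ignored (an assumption of this sketch).
    """
    sb = matrix["sample_boundaries"]
    result = {}
    for label, start, end in zip(matrix["sample_labels"], sb, sb[1:]):
        values = sorted(v for _, row in matrix["rows"]
                        for v in row[start:end] if not math.isnan(v))
        result[label] = (values[0], values[-1], statistics.median(values),
                         percentile(values, 10), percentile(values, 90))
    return result


m = {
    "group_labels": ["genes"],
    "group_boundaries": [0, 3],
    "sample_labels": ["A", "B"],
    "sample_boundaries": [0, 2, 4],
    "rows": [
        (("chr1", 0, 10, "r1", 0, "+"), [1.0, 2.0, 10.0, math.nan]),
        (("chr1", 20, 30, "r2", 0, "+"), [3.0, 4.0, 20.0, 30.0]),
        (("chr1", 40, 50, "r3", 0, "-"), [5.0, 6.0, 40.0, 50.0]),
    ],
}
for sample, stats in data_range(m).items():
    print(sample, stats)
```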

Example usage:
  computeMatrixOperations subset -m input.mat.gz -o output.mat.gz --groups "group 1" "group 2" --samples "sample 3" "sample 10"

Details

computeMatrixOperations can perform a variety of operations on one or more files produced by computeMatrix (N.B., the output is always written to a new file):

  Subcommand      What it does
  info            Prints out the sample and region group names in the order in which they appear.
  subset          Subsets a file by the desired samples/region group names. This can also change the order of these samples/region groups.
  filterStrand    Filters the file to only include regions annotated as being on a particular strand.
  rbind           Concatenates multiple matrices together, top to bottom.
  cbind           Merges multiple matrices, left to right.
  sort            Sorts the given file so regions are in the order of occurrence in the input BED/GTF file(s).

These operations are useful when you want to run computeMatrix on multiple files (thereby keeping all of the values together) and later exclude regions/samples or add new ones. Another common use is sorting the output of computeMatrix to match the order of regions in the input file.

Attention

As of version 3.0, computeMatrix (and therefore also computeMatrixOperations) produces output with labels present for each sample. If you run any operation on a matrix produced by an older version, the result will be converted to conform to the new output format, which is not backward compatible!

Examples

Suppose that we have a strand-specific RNAseq dataset and would like to plot only the strand-specific signal across spliced transcripts. The general steps would be as follows:

  1. Run bamCoverage on each sample twice, once with --filterRNAstrand forward and again with --filterRNAstrand reverse. This will result in twice as many bigWig files as samples.

  2. Run computeMatrix scale-regions with all of these bigWig files, including the --metagene option and a BED12 and/or a GTF file. This produces a file containing the signal separated by strand for each transcript.

  3. Get the list of sample names stored in the matrix file:

$ computeMatrixOperations info -m foo.mat.gz
Groups:
    genes
Samples:
    SRR648667.forward
    SRR648668.forward
    SRR648669.forward
    SRR648670.forward
    SRR648667.reverse
    SRR648668.reverse
    SRR648669.reverse
    SRR648670.reverse
  4. Create two new files, each containing only the samples with signal from a given strand.

$ computeMatrixOperations subset -m foo.mat.gz -o forward.mat.gz --samples SRR648667.forward SRR648668.forward SRR648669.forward SRR648670.forward
$ computeMatrixOperations subset -m foo.mat.gz -o reverse.mat.gz --samples SRR648667.reverse SRR648668.reverse SRR648669.reverse SRR648670.reverse
  5. These files can then be subset to contain only transcripts on a particular strand. Note that it's best to double-check that the --strand - setting produces the intended results. There are many peculiar variants of RNA-seq library preparation, and the settings for one type may not be appropriate for another. To check this, run filterStrand with each --strand option on the same matrix and then run plotHeatmap on the results: one will obviously be correct and the other largely blank.

$ computeMatrixOperations filterStrand -m forward.mat.gz -o forward.subset.mat.gz --strand -
$ computeMatrixOperations filterStrand -m reverse.mat.gz -o reverse.subset.mat.gz --strand +
  6. Finally, the files can be merged back together, head to tail. The samples are already in the correct order, as indicated by step 3.

$ computeMatrixOperations rbind -m forward.subset.mat.gz reverse.subset.mat.gz -o merged.mat.gz
  7. If desired, the transcripts can then be re-sorted to match the order of the input GTF file.

$ computeMatrixOperations sort -m merged.mat.gz -o sorted.mat.gz -R genes.gtf

The resulting file can then be used with plotHeatmap or plotProfile. Note that we could have skipped the subset step and run computeMatrix independently on the forward and reverse bigWig files.
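Typing every sample label into the subset commands is error-prone. The sketch below derives the two --samples lists from the labels printed by info and checks that they pair up sample for sample, since rbind assumes the same samples in the same order. It assumes the `.forward`/`.reverse` suffixes used in this example. `split_by_strand` and `base_names` are hypothetical helpers, and the printed commands use only flags documented above.

```python
def split_by_strand(sample_labels):
    """Split labels like 'SRR648667.forward' into forward and reverse lists.

    Relies on the '.forward'/'.reverse' suffixes of this example; order
    within each list follows the input order.
    """
    forward = [s for s in sample_labels if s.endswith(".forward")]
    reverse = [s for s in sample_labels if s.endswith(".reverse")]
    return forward, reverse


def base_names(labels):
    """Strip the final '.suffix' so forward/reverse labels can be compared."""
    return [s.rsplit(".", 1)[0] for s in labels]


labels = [
    "SRR648667.forward", "SRR648668.forward", "SRR648669.forward", "SRR648670.forward",
    "SRR648667.reverse", "SRR648668.reverse", "SRR648669.reverse", "SRR648670.reverse",
]
forward, reverse = split_by_strand(labels)
if base_names(forward) != base_names(reverse):
    raise ValueError("forward and reverse samples are not in the same order")

# Print the subset commands used earlier in this walkthrough.
for name, samples in (("forward", forward), ("reverse", reverse)):
    print("computeMatrixOperations subset -m foo.mat.gz -o {}.mat.gz --samples {}".format(
        name, " ".join(samples)))
```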
