Effective Genome Size

A number of tools can accept an “effective genome size”. This is defined as the length of the “mappable” genome. There are two common ways to calculate it:

1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).

Option 1 can be computed using faCount from Kent's tools. The effective genome sizes of a number of genomes, computed with this method, are given below:

Genome           Effective size
GRCh37           2864785220
GRCh38           2913022398
T2T/CHM13CAT_v2  3117292070
GRCm37           2620345972
GRCm38           2652783500
GRCm39           2654621783
dm3              162367812
dm6              142573017
GRCz10           1369631918
GRCz11           1368780147
WBcel235         100286401
TAIR10           119482012
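As an illustration of option 1, the following is a minimal sketch of counting non-N bases in FASTA-formatted text. It is not how faCount is implemented; the function name `count_non_n_bases` and the tiny in-memory example are hypothetical, and the sketch assumes that every character other than N/n on a sequence line counts as a base (so IUPAC ambiguity codes are included).

```python
import io


def count_non_n_bases(fasta_lines):
    """Count bases that are not N/n across all FASTA records.

    Assumption: any non-N character on a sequence line is a base.
    """
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        total += len(line) - line.upper().count("N")
    return total


# Hypothetical two-record example; for a real genome, pass an open file handle.
example = io.StringIO(">chr1\nACGTNNNNacgt\n>chr2\nNNACGT\n")
print(count_non_n_bases(example))
```

For a real genome, the same function can be given an open file handle, for example `with open("genome.fa") as fh: count_non_n_bases(fh)`, where `genome.fa` is a placeholder path.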

These values are only appropriate if multimapping reads are included. If they are excluded (or if any MAPQ filter is applied), then values derived from option 2 are more appropriate. Those values depend on the read length. We can approximate them for various read lengths using the khmer program, and unique-kmers.py in particular. A table of effective genome sizes for a given read length using this method is provided below:

Read length  GRCh37      GRCh38      T2T/CHM13CAT_v2  GRCm37      GRCm38      GRCm39      dm3        dm6        GRCz10      GRCz11      WBcel235   TAIR10
50           2685511454  2701495711  2725240337       2304947876  2308125299  2309746861  130428510  125464678  1195445541  1197575653  95159402   114339094
75           2736124898  2747877702  2786136059       2404646149  2407883243  2410055689  135004387  127324557  1251132611  1250812288  96945370   115317469
100          2776919708  2805636231  2814334875       2462480910  2467481008  2468088461  139647132  129789773  1280188944  1280354977  98259898   118459858
150          2827436883  2862010428  2931551487       2489384085  2494787038  2495461690  144307658  129940985  1312207019  1311832909  98721103   118504138
200          2855463800  2887553103  2936403235       2513019076  2520868989  2521902382  148523810  132508963  1321355041  1322366338  98672558   117723393
250          2855044784  2898802627  2960856300       2528988583  2538590322  2538633971  151901455  132900923  1339205109  1342093482  101271756  119585546
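To make the idea behind option 2 concrete, here is a minimal sketch that counts the distinct k-mers in a sequence, with k standing in for the read length. It is an exact, set-based count for toy inputs, not the estimator used by unique-kmers.py, and it will not scale to whole genomes. The function name `count_distinct_kmers` is hypothetical, and the sketch assumes that k-mers containing N are skipped and that only the forward strand is considered.

```python
def count_distinct_kmers(sequence, k):
    """Exact count of distinct k-mers that contain no N.

    Assumptions: forward strand only; k-mers overlapping an N are ignored.
    """
    seq = sequence.upper()
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            seen.add(kmer)
    return len(seen)


# Toy sequence: the repeated ACGT is counted once, windows touching N are skipped.
print(count_distinct_kmers("ACGTACGTNNACGA", 4))
```

Repeated sequence contributes only once to the count, which is why the values in the table above are smaller than the non-N base counts and tend to grow with read length.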
