Effective Genome Size
A number of tools can accept an “effective genome size”. This is defined as the length of the “mappable” genome. There are two common alternative ways to calculate this:
1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).
Option 1 can be computed using faCount from Kent's tools.
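For illustration, the count behind option 1 can be sketched in plain Python. This is not faCount itself, only a minimal equivalent of the non-N total; the function name `count_non_n_bases` and the toy FASTA string are hypothetical, and the sketch assumes that only `N`/`n` mark unknown bases (other IUPAC ambiguity codes are counted as bases).

```python
import io


def count_non_n_bases(fasta_handle):
    """Return the number of bases other than N/n in a FASTA stream."""
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            # Header line: carries the sequence name, not sequence data.
            continue
        seq = line.strip()
        # Assumption: only N/n denote unknown bases.
        total += len(seq) - seq.count("N") - seq.count("n")
    return total


# Toy input standing in for a genome FASTA file (hypothetical data).
example = io.StringIO(">chr1\nACGTNNNN\nacgtnn\n>chr2\nNNACGT\n")
print(count_non_n_bases(example))
```

For a real genome, pass an open file handle instead of the `io.StringIO` object.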
The effective genome size for a number of genomes using this method is given below:
| Genome | Effective size |
|---|---|
| GRCh37 | 2864785220 |
| GRCh38 | 2913022398 |
| T2T/CHM13CAT_v2 | 3117292070 |
| GRCm37 | 2620345972 |
| GRCm38 | 2652783500 |
| GRCm39 | 2654621783 |
| dm3 | 162367812 |
| dm6 | 142573017 |
| GRCz10 | 1369631918 |
| GRCz11 | 1368780147 |
| WBcel235 | 100286401 |
| TAIR10 | 119482012 |
These values are only appropriate if multimapping reads are included. If they are excluded (or any MAPQ filter is applied), then values derived from option 2 are more appropriate. Those values depend on the read length.
We can approximate these values for various read lengths using the khmer program, and its unique-kmers.py script in particular.
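The idea behind this approximation is to treat the read length as a k-mer size and count how many distinct k-mers the genome contains. The sketch below does this exactly with a Python set, which is only practical for small sequences; it is not khmer's implementation, which estimates the count for whole genomes. The function name `count_distinct_kmers` is hypothetical, and skipping k-mers that contain `N` and counting only the forward strand are assumptions of this sketch.

```python
def count_distinct_kmers(sequence, k):
    """Return the number of distinct k-mers of length k in sequence."""
    seq = sequence.upper()
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: k-mers overlapping unknown bases are not counted.
        if "N" not in kmer:
            seen.add(kmer)
    return len(seen)


# Toy sequence (hypothetical data); k plays the role of the read length.
print(count_distinct_kmers("ACGTACGTNNACGA", 4))
```

Larger values of `k` make repeated sequence easier to tell apart, which is why the effective size generally grows with read length.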
A table of effective genome sizes given a read length using this method is provided below:
| Read length | GRCh37 | GRCh38 | T2T/CHM13CAT_v2 | GRCm37 | GRCm38 | GRCm39 | dm3 | dm6 | GRCz10 | GRCz11 | WBcel235 | TAIR10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 2685511454 | 2701495711 | 2725240337 | 2304947876 | 2308125299 | 2309746861 | 130428510 | 125464678 | 1195445541 | 1197575653 | 95159402 | 114339094 |
| 75 | 2736124898 | 2747877702 | 2786136059 | 2404646149 | 2407883243 | 2410055689 | 135004387 | 127324557 | 1251132611 | 1250812288 | 96945370 | 115317469 |
| 100 | 2776919708 | 2805636231 | 2814334875 | 2462480910 | 2467481008 | 2468088461 | 139647132 | 129789773 | 1280188944 | 1280354977 | 98259898 | 118459858 |
| 150 | 2827436883 | 2862010428 | 2931551487 | 2489384085 | 2494787038 | 2495461690 | 144307658 | 129940985 | 1312207019 | 1311832909 | 98721103 | 118504138 |
| 200 | 2855463800 | 2887553103 | 2936403235 | 2513019076 | 2520868989 | 2521902382 | 148523810 | 132508963 | 1321355041 | 1322366338 | 98672558 | 117723393 |
| 250 | 2855044784 | 2898802627 | 2960856300 | 2528988583 | 2538590322 | 2538633971 | 151901455 | 132900923 | 1339205109 | 1342093482 | 101271756 | 119585546 |