Effective Genome Size

A number of tools can accept an “effective genome size”. This is defined as the length of the “mappable” genome. There are two common ways to calculate it:

1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).

Option 1 can be computed using faCount from Kent's tools. The effective genome size for a number of genomes using this method is given below:

Genome	Effective size
GRCh37	2864785220
GRCh38	2913022398
T2T/CHM13CAT_v2	3117292070
GRCm37	2620345972
GRCm38	2652783500
GRCm39	2654621783
dm3	162367812
dm6	142573017
GRCz10	1369631918
GRCz11	1368780147
WBcel235	100286401
TAIR10	119482012
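
To make option 1 concrete, the following is a minimal Python sketch that counts non-N bases in a FASTA stream. It is not how faCount is implemented; the function name count_non_n_bases and the toy sequences are made up for illustration, and it assumes that only N/n mark unknown bases.

```python
import io


def count_non_n_bases(fasta_handle):
    """Count bases other than N/n across all records of a FASTA stream."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        # Soft-masked (lowercase) bases still count; only N/n are excluded.
        total += len(line) - line.count("N") - line.count("n")
    return total


# Toy input (hypothetical): 18 bases in total, 8 of which are N/n.
example = io.StringIO(">chr1\nACGTNNNNacgt\nNNAC\n>chr2\nnnGG\n")
print(count_non_n_bases(example))
```

For a real genome, the same function could be given an open file handle instead of the in-memory example.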

These values are only appropriate if multimapping reads are included. If they are excluded (or any MAPQ filter is applied), then values derived from option 2 are more appropriate. These depend on the read length. We can approximate them for various read lengths using the khmer program, and unique-kmers.py in particular. A table of effective genome sizes for each read length using this method is provided below:

Read length	GRCh37	GRCh38	T2T/CHM13CAT_v2	GRCm37	GRCm38	GRCm39	dm3	dm6	GRCz10	GRCz11	WBcel235	TAIR10
50	2685511454	2701495711	2725240337	2304947876	2308125299	2309746861	130428510	125464678	1195445541	1197575653	95159402	114339094
75	2736124898	2747877702	2786136059	2404646149	2407883243	2410055689	135004387	127324557	1251132611	1250812288	96945370	115317469
100	2776919708	2805636231	2814334875	2462480910	2467481008	2468088461	139647132	129789773	1280188944	1280354977	98259898	118459858
150	2827436883	2862010428	2931551487	2489384085	2494787038	2495461690	144307658	129940985	1312207019	1311832909	98721103	118504138
200	2855463800	2887553103	2936403235	2513019076	2520868989	2521902382	148523810	132508963	1321355041	1322366338	98672558	117723393
250	2855044784	2898802627	2960856300	2528988583	2538590322	2538633971	151901455	132900923	1339205109	1342093482	101271756	119585546
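
The idea behind the read-length-dependent values can be sketched as counting the distinct k-mers in the genome, with k set to the read length. The Python below is an illustrative sketch only: it counts exactly using an in-memory set, which is feasible only for small inputs, whereas khmer estimates the count probabilistically, so results will not match the table exactly. The function names are hypothetical, and the sketch assumes that k-mers containing N are skipped and that reverse complements are not merged.

```python
import io


def read_fasta(handle):
    """Yield one upper-cased sequence string per FASTA record."""
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks).upper()
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def count_distinct_kmers(handle, k):
    """Count distinct k-mers (exact; memory grows with genome size)."""
    seen = set()
    for seq in read_fasta(handle):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # assumption: k-mers with unknown bases are skipped
                seen.add(kmer)
    return len(seen)


# Toy input (hypothetical); a real run would use k = read length, e.g. 50.
example = io.StringIO(">a\nACGTACGT\n>b\nNNACG\n")
print(count_distinct_kmers(example, 3))
```

K-mers are counted within each record separately, so no k-mer spans the boundary between two chromosomes or contigs.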
