Table 2

Statistics of the datasets used in this study. Columns, from left to right: (1) the name of the organism; (2) the number of protein-coding genes in its genome, Ngenes; (3) the number of proteins for which we found at least one paralogous partner; (4) the percentage of proteins with at least one paralog; (5) the total number of distinct BLAST hits generated before we applied subsequent filtering; (6) the number of paralogous pairs included in Na(p); and (7) the number of paralogous pairs included in Nd(p).

| Organism | Proteome size | Number of proteins with paralogs | % of proteins with paralogs | BLASTP hits | Number of pairs in Na(p) | Number of pairs in Nd(p) |
|---|---|---|---|---|---|---|
| H. pylori | 1590 | 230 | 14% | 3228 | 260 | 148 |
| E. coli | 4288 | 1428 | 33% | 16768 | 2614 | 1013 |
| S. cerevisiae | 5885 | 1689 | 29% | 43915 | 2297 | 1025 |
| C. elegans | 19099 | 6894 | 36% | 204398 | 46463 | 5545 |
| D. melanogaster | 14015 | 4153 | 30% | 557047 | 17621 | 3238 |
| H. sapiens | 25319 | 9252 | 37% | 1330721 | 31078 | 6595 |
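
The percentage column can be reproduced from the two count columns (proteins with paralogs divided by proteome size). The following is a minimal sketch in Python, not the authors' code: the names `table2` and `percent_with_paralogs` are hypothetical, and rounding to the nearest integer is an assumption, since the table does not state a rounding rule.

```python
# Values copied from Table 2: organism -> (proteome size, proteins with paralogs).
table2 = {
    "H. pylori": (1590, 230),
    "E. coli": (4288, 1428),
    "S. cerevisiae": (5885, 1689),
    "C. elegans": (19099, 6894),
    "D. melanogaster": (14015, 4153),
    "H. sapiens": (25319, 9252),
}


def percent_with_paralogs(n_genes: int, n_with_paralogs: int) -> float:
    """Return the share of proteins with at least one paralog, in percent."""
    return 100.0 * n_with_paralogs / n_genes


# Assumption: the published percentages are rounded to the nearest integer.
percentages = {
    organism: round(percent_with_paralogs(n_genes, n_with_paralogs))
    for organism, (n_genes, n_with_paralogs) in table2.items()
}

for organism, pct in percentages.items():
    print(f"{organism}: {pct}%")
```

Under that rounding assumption, the computed values agree with the percentages listed in the table for all six organisms.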

Axelsen et al. Biology Direct 2007 2:32   doi:10.1186/1745-6150-2-32
