Research

Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

1 Center for Models of Life, Niels Bohr Institute, Blegdamsvej 17, DK-2100, Copenhagen Ø, Denmark
2 Department of Condensed Matter Physics and Materials Science, Brookhaven National Laboratory, Upton, New York 11973, USA
3 Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA
Biology Direct 2007, 2:32 doi:10.1186/1745-6150-2-32
Additional files

Additional file 1: The overall shape of the PID histogram is independent of the alignment algorithm and the E-value cutoff. The PID histogram N_a(p) in the fly (D. melanogaster) genome when pairs of paralogous proteins were detected using the blastp algorithm [1] with an E-value cutoff of 10⁻¹⁰ (filled circles) and 10⁻³⁰ (open diamonds). The inset shows the ratio of these two histograms, which is very close to 1 for p > 40%. Thus the overall shape of N_a(p) in most of Region II (Fig. 1) is nearly independent of the E-value cutoff. N_a(p) is also insensitive to the particular algorithm used to align the pairs: when paralogous pairs detected by blastp with an E-value cutoff of 10⁻¹⁰ (filled circles) were realigned using the Smith-Waterman algorithm [28], the resulting distribution (blue stars) changed very little.

Additional file 2: The quadratic scaling of the total number of paralogous pairs with the number of genes in the genome. The total number of paralogous pairs, ∑_p N_a(p), generated by the all-to-all alignment of all protein sequences encoded in the genome (the y-axis) scales as the square of the total number N_genes of protein-coding genes in the genome. Solid symbols are the six model organisms used in our study. The solid line has slope 2 on this log-log plot.
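Both supplementary figures rest on simple computations: binning per-pair percent identities into a histogram N_a(p), comparing two such histograms bin by bin (as in the inset of Additional file 1), and reading a power-law exponent off a log-log plot (as in Additional file 2). The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes percent identities have already been computed for every paralogous pair, and it does not reproduce the blastp or Smith-Waterman alignment step. The function names, the integer-percent binning and the ordinary least-squares fit are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def pid_histogram(identities):
    """Count paralogous pairs per percent-identity bin, N_a(p).

    `identities` holds one percent identity (0-100) per aligned pair.
    Binning by the integer part of p is an illustrative assumption.
    """
    return Counter(int(math.floor(p)) for p in identities)


def histogram_ratio(numerator, denominator):
    """Bin-by-bin ratio of two histograms.

    Bins that are empty in the denominator are skipped to avoid
    division by zero; a Counter returns 0 for bins missing from the numerator.
    """
    return {p: numerator[p] / denominator[p]
            for p in denominator if denominator[p] > 0}


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) versus log(x).

    A slope of 2 corresponds to y growing as the square of x.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var
```

As a check on synthetic (not measured) data, pair counts that grow exactly as the square of the gene number, e.g. `pairs = [0.01 * n**2 for n in (5000, 10000, 20000)]`, give `loglog_slope` a value of 2 up to floating-point rounding. This matches the slope of the solid line in Additional file 2.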