In plants, expression breadth and expression level distinctly and non-linearly correlate with gene structure
Correspondence: Hangxing Yang hxyang@fudan.edu.cn
T-Life Research Center, Department of Physics, Fudan University, 220 Handan Road, Shanghai 200433, PR China
Biology Direct 2009, 4:45 doi:10.1186/1745-6150-4-45
Published: 21 November 2009
Additional files
Additional file 1:
Table S1.pdf. Spearman's rank correlations between expression pattern (microarray data) and structural parameters for Arabidopsis and rice genes. For each structural parameter, the first line gives the correlations with expression pattern and the second line gives the partial correlations. The controlled variable for the Expavg columns is expression breadth, and that for the Width columns is average expression level. Exptot, total expression level; Expavg, average expression level; Width, expression breadth; CDS, Coding Sequence; UTR, Untranslated Region. ***, P < 1e-10; **, 1e-10 < P < 1e-2; *, 0.01 < P < 0.05.
Format: PDF; Size: 23KB
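Each partial correlation in Table S1 measures the association between a structural parameter and one expression measure after removing the effect of the other (expression breadth for Expavg, average expression level for Width). The legend does not give the formula, so the sketch below assumes the common approach of applying the first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)), to Spearman rank correlations, with tied values given their average rank. It is a minimal pure-Python illustration, not the authors' code, and it does not compute P-values.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def partial_spearman(x, y, z):
    """Spearman correlation between x and y with z held constant."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

With hypothetical per-gene lists `intron_len`, `expavg` and `width`, `partial_spearman(intron_len, expavg, width)` would give the Expavg partial correlation controlled for expression breadth.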
Additional file 2:
Fig S1.pdf. Principal component analysis of the correlations between sequence structural parameters and gene expression. Points represent genes and arrows represent variables. In each graph, two variables whose arrows form an angle > 90° are negatively correlated, while two variables whose arrows form an angle < 90° are positively correlated. These figures were produced using expression data from MPSS experiments; microarray data give similar pictures.
Format: PDF; Size: 280KB
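The 90° rule in the Fig S1 legend rests on a standard identity: for mean-centered variables, the cosine of the angle between the two data vectors equals their Pearson correlation, so the angle is below 90° exactly when the correlation is positive. A PCA biplot projects these variable vectors onto the first two principal components, so angles between the plotted arrows only approximate this relationship, most closely when those two components capture most of the variance. The sketch below computes the angle in the full sample space; the function names and the toy values are hypothetical.

```python
import math

def centered(values):
    """Subtract the mean so that each variable is a vector through the origin."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def angle_between_variables(u, v):
    """Angle in degrees between two centered variables; cos(angle) equals Pearson r."""
    cu, cv = centered(u), centered(v)
    dot = sum(a * b for a, b in zip(cu, cv))
    norms = math.sqrt(sum(a * a for a in cu)) * math.sqrt(sum(b * b for b in cv))
    cos_angle = max(-1.0, min(1.0, dot / norms))  # clamp floating-point rounding
    return math.degrees(math.acos(cos_angle))

# Toy per-gene values (hypothetical): intron number rises with CDS length,
# while the expression values fall.
cds_length = [900, 1200, 1500, 2100, 3000]
intron_number = [2, 3, 3, 6, 9]
expression = [80, 60, 55, 30, 10]
print(angle_between_variables(cds_length, intron_number))  # below 90 degrees
print(angle_between_variables(cds_length, expression))     # above 90 degrees
```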
Additional file 3:
Fig S2.pdf. Boxplots of structural characteristics versus expression level (microarray data) for Arabidopsis and rice genes. Boxes summarize the distribution of each parameter within each gene group: bold central lines mark the medians, the lower and upper boundaries mark the first and third quartiles, and whiskers extend to the most extreme points within 1.5× the interquartile range from the box. Red curves show the mean value of each parameter for each gene group, and horizontal dark violet lines indicate the population median of each structural parameter. Parameters shown: CDS length in (a) Arabidopsis and (b) rice; total intron length per gene in (c) Arabidopsis and (d) rice; number of introns per gene in (e) Arabidopsis and (f) rice. Differences in structural parameters between expression groups are statistically significant (Kruskal-Wallis rank sum test, all P < 2e-16).
Format: PDF; Size: 50KB
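The box-and-whisker convention in the Fig S2 legend (quartile box, median line, whiskers to the most extreme points within 1.5× the interquartile range) can be sketched as follows. The legend does not say which quantile rule was used; this sketch assumes linear interpolation between order statistics (the rule of R's default `quantile`, type 7). R's `boxplot` itself uses Tukey's hinges, so box edges may differ slightly for small groups. Function names are illustrative.

```python
import math

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = (len(sorted_vals) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def box_stats(values, coef=1.5):
    """Quartiles, median, whisker ends and outliers for one gene group."""
    s = sorted(values)
    q1, median, q3 = quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [v for v in s if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in s if v < low_fence or v > high_fence],
    }

print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```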
Additional file 4:
Fig S3.pdf. Boxplots of structural characteristics versus expression breadth for Arabidopsis and rice genes. Boxes summarize the distribution of each parameter within each gene group: bold central lines mark the medians, the lower and upper boundaries mark the first and third quartiles, and whiskers extend to the most extreme points within 1.5× the interquartile range from the box. Red curves show the mean value of each parameter for each gene group, and horizontal dotted lines indicate the population median of each parameter. Parameters shown: number of introns per gene in (a) Arabidopsis and (b) rice; total intron length per gene in (c) Arabidopsis and (d) rice; CDS length in (e) Arabidopsis and (f) rice. Differences in structural parameters between expression groups are statistically significant (Kruskal-Wallis rank sum test, all P < 2e-16).
Format: PDF; Size: 34KB
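Both boxplot legends report Kruskal-Wallis rank sum tests across expression groups. The statistic is H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where N is the total number of genes, n_i the size of group i and R_i the sum of that group's ranks in the pooled sample; H is then divided by the tie correction 1 - sum(t^3 - t) / (N^3 - N), summed over groups of t tied values. Under the null hypothesis H is approximately chi-square distributed with k - 1 degrees of freedom for k groups. The sketch below computes H only; `scipy.stats.kruskal` returns both H and its P-value.

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        # Ranks of this group's members sit in a contiguous slice of `ranks`.
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_term / (n_total ** 3 - n_total))
```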
Additional file 5:
Fig S4.pdf. Extreme transcript lengths versus expression level (microarray data) for plant genes. (a) Extreme transcript lengths of Arabidopsis genes scale as a power law of average expression level; (b)-(f) extreme transcript lengths of Arabidopsis and rice genes scale as logarithmic functions of expression level. In each panel, points represent the whole dataset, triangles represent the data subset used to fit the dark violet regression line, the dashed red curve represents the extreme energy cost of transcription, and the dotted vertical line marks the maximum of the energy-cost curve. Equations give the functional form of the corresponding curves. Panels on the left show Arabidopsis genes and panels on the right show rice genes. Adjusted R² values for the linear regressions range from 0.80 to 0.91, and analyses of variance indicate high statistical significance (all P < 2e-16). Similar trends are observed for other structural parameters, such as total intron length per gene and number of introns per gene.
Format: PDF; Size: 9.8MB
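The fits in Fig S4 amount to ordinary least-squares regressions on transformed axes: a power law L = c * E^k is a straight line in log-log coordinates, and a logarithmic relation L = a + b * ln(E) is a straight line against ln(E). The sketch below, with hypothetical function names, fits both forms and reports the adjusted R² for a single predictor, 1 - (1 - R²)(n - 1)/(n - 2). The legend does not describe how the triangle subset of extreme points was selected, so the sketch simply fits whatever (E, L) pairs it is given.

```python
import math

def fit_line(x, y):
    """OLS fit y = intercept + slope * x; returns (intercept, slope, adjusted R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return intercept, slope, adj_r2

def fit_logarithmic(expr, length):
    """Fit length = a + b * ln(expr); returns (a, b, adjusted R^2)."""
    return fit_line([math.log(e) for e in expr], length)

def fit_power_law(expr, length):
    """Fit length = c * expr**k via ln(length) = ln(c) + k * ln(expr)."""
    ln_c, k, adj_r2 = fit_line([math.log(e) for e in expr],
                               [math.log(v) for v in length])
    return math.exp(ln_c), k, adj_r2
```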
Additional file 6:
Fig S5.pdf. Extreme energy and time costs of plant gene expression as functions of expression level (microarray data). Assuming that extreme sequence lengths scale as logarithmic functions of expression level, the black solid curve shows how the extreme energy cost changes with expression level, while the other curves show trends in time cost, which is assumed to scale as a sublinear function of expression level (with α being the scaling factor). Smaller α implies stronger efficiency requirements for highly expressed genes. The y-axis shows the scale of the energy cost; time-cost values have been rescaled to the same range for ease of comparison. a = 66094 and b = 3494, taken from the case of extreme transcript length versus total expression level for Arabidopsis genes. The other cases behave essentially the same way.
Format: PDF; Size: 18KB
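The legend does not spell out the functional forms, so the sketch below rests on two labeled assumptions: the extreme transcript length falls logarithmically with expression level, L(E) = a - b * ln(E), and the extreme energy cost is proportional to C(E) = E * L(E). Under these assumptions dC/dE = a - b * ln(E) - b, so the cost peaks at E* = exp(a/b - 1), where the extreme length equals b. The constants are the a and b quoted in the legend; the time-cost curves are not modelled, and the function names are hypothetical.

```python
import math

A, B = 66094.0, 3494.0  # a and b as quoted in the Fig S5 legend

def extreme_length(expr, a=A, b=B):
    """ASSUMED form: extreme length decreases logarithmically with expression."""
    return a - b * math.log(expr)

def energy_cost(expr, a=A, b=B):
    """ASSUMED: extreme energy cost proportional to expression x extreme length."""
    return expr * extreme_length(expr, a, b)

def peak_expression(a=A, b=B):
    """Root of dC/dE = a - b*ln(E) - b, i.e. E* = exp(a/b - 1)."""
    return math.exp(a / b - 1)

e_star = peak_expression()
print(e_star, extreme_length(e_star), energy_cost(e_star))
```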
Additional file 7:
Table S2.pdf. Library information for MPSS expression data.
Format: PDF; Size: 15KB
Additional file 8:
Table S3.pdf. Sample information for Arabidopsis microarray data.
Format: PDF; Size: 23KB
Additional file 9:
Table S4.pdf. Sample information for rice microarray data.
Format: PDF; Size: 13KB
Additional file 10:
Table S5.pdf. Correlations between expression pattern and sequence structural parameters for Arabidopsis genes, using the microarray data analysed by Ren et al. (2006). For each structural parameter, ρs denotes Spearman's rank correlation coefficient between expression pattern and the structural parameter, and partial ρs denotes Spearman's partial correlation. The controlled variable for Expavg is expression breadth, and that for Width is average expression level. Exptot, total expression level; Expavg, average expression level; Width, expression breadth; CDS, Coding Sequence; UTR, Untranslated Region. Significance levels: *, P > 0.05; **, 0.001 < P < 0.05; ***, 1e-10 < P < 1e-3; no asterisk, P < 1e-10. Numbers in bold indicate highly significant partial correlations (P < 1e-10).
Format: PDF; Size: 22KB
Additional file 11:
Table S6.pdf. Correlations between expression pattern and sequence structural parameters for Arabidopsis and rice genes. Genes were sorted by expression level separately in each library, and each gene's ranks were then averaged to give Expavg. In each library, a gene was considered expressed only when ≥ 5 tags mapped to it. For each structural parameter, the first line shows Spearman's rank correlations with expression pattern and the second line shows Spearman's partial correlations. The controlled variable for the Expavg columns is expression breadth, and that for the Width columns is average expression level. CDS, Coding Sequence; UTR, Untranslated Region. Significance levels: *, P > 0.05; **, 0.001 < P < 0.05; ***, 1e-10 < P < 1e-3; no asterisk, P < 1e-10. Numbers in bold indicate highly significant partial correlations (P < 1e-10).
Format: PDF; Size: 22KB
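The Table S6 procedure can be sketched as follows, under hypothetical choices for details the legend leaves open: within each library only genes with at least 5 mapped tags are ranked (higher tag count, higher rank; ties share the mean rank), Expavg is the mean of a gene's ranks over the libraries in which it is expressed, and Width is the number of such libraries. The function names and the dictionary-of-tag-counts input are illustrative only.

```python
from collections import defaultdict

def rank_expressed(tag_counts, min_tags=5):
    """Rank genes with >= min_tags tags in one library; ties share the mean rank."""
    expressed = {g: c for g, c in tag_counts.items() if c >= min_tags}
    ordered = sorted(expressed, key=expressed.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and expressed[ordered[j + 1]] == expressed[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def expavg_and_width(libraries, min_tags=5):
    """Map each gene to (mean rank where expressed, number of libraries where expressed)."""
    ranks_per_gene = defaultdict(list)
    for tag_counts in libraries:
        for gene, rank in rank_expressed(tag_counts, min_tags).items():
            ranks_per_gene[gene].append(rank)
    return {gene: (sum(r) / len(r), len(r)) for gene, r in ranks_per_gene.items()}

libraries = [
    {"geneA": 10, "geneB": 3, "geneC": 20},  # geneB is below the 5-tag cutoff here
    {"geneA": 7, "geneB": 8, "geneC": 5},
]
print(expavg_and_width(libraries))
```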
