Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution
* Corresponding author: Jaroslav P Novak jaroslav.novak@mail.mcgill.ca
1 McGill University and Genome Québec Innovation Centre, 740 Docteur Penfield Avenue, Montreal, Québec, H3A 1A4, Canada
2 Human Genomics Laboratory, Genome Research Center, 52 Eoeun-dong, Yuseong-gu, Daejon, 305-333, Korea
3 Transcriptional Genomics Core, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
4 Institut für Onkologische Chemie, Heinrich Heine Universität Düsseldorf, Moorenstr. 5, D-40225 Düsseldorf, Germany
5 St. Luke's-Roosevelt Hospital Center and Columbia University, Molecular Virology Division, 432 West 58th Street, Antenucci Building, Room 709, New York, NY 10019, USA
6 Institute of Experimental Botany AS CR, Rozvojová 135, CZ-165 02, Praha 6, Czech Republic and Charles University in Prague, Department of Plant Physiology, Viničná 5, 12844, Praha 2, Czech Republic
7 Department of Biology, Higley Hall, 202 N. College Dr., Kenyon College, Gambier, OH 43022, USA
8 Environmental Genomics Section, C3-03, PO Box 12233, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
9 Department of Genetics, 425 Henry Mall, University of Wisconsin, Madison, WI 53706, USA
10 Department of Plant Sciences, University of California, One Shields Ave, Davis, CA 95616, USA
11 Department of Pharmaceutical Sciences, University of Arkansas for Medical Sciences, 4301 West Markham, Slot 522-3, Little Rock AR 72205, USA
12 Respiratory Division, Department of Medicine, McGill University, Montreal, Quebec, Canada
13 Department of Pathology, Creighton University School of Medicine, 601 North 30th Street, Omaha, NE, 68131-2197, USA
14 Laboratory of Experimental Medicine, Department of Pediatrics, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Puskinova 6, 775 20 Olomouc, Czech Republic
15 Institute of Neurosciences and Department of Cellular Biology, Physiology and Immunology, Animal Physiology unit, Faculty of Sciences, Autonomous University of Barcelona, Bellaterra, Barcelona, 08193, Spain
16 Programs in Genetics and Developmental Biology, The Research Institute, The Hospital for Sick Children, Toronto, Canada M5G 1X8; Departments of Molecular and Medical Genetics and Pediatrics, University of Toronto, Toronto, M5S 1A1, Canada
17 Environmental Genomics Section, C3-03, PO Box 12233, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
18 Section of Neuroprotection, Centre of Inflammation and Metabolism, The Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen Denmark
19 Arthritis and Inflammation Research Program, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst NSW 2010, Australia
20 Department of Plant Sciences, University of California, One Shields Ave, Davis, CA 95616, USA
21 Genetics Unit, Shriners Hospital for Children and Departments of Surgery and Human Genetics, McGill University, Montréal H3A 2T5, Québec, Canada
22 Programs in Genetics and Developmental Biology, The Research Institute, The Hospital for Sick Children, Toronto, Canada M5G 1X8; Departments of Molecular and Medical Genetics, University of Toronto, Toronto, M5S 1A1, Canada
23 Department of Biology, University of Leicester, LE1 7RH Leicester, UK
24 Department of Medicine, Cedars-Sinai Medical Center, David Geffen School of Medicine, UCLA, Los Angeles, CA 90048, USA
Biology Direct 2006, 1:27 doi:10.1186/1745-6150-1-27
Published: 7 September 2006
Abstract
Background
DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide range of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of observed differences in gene expression. However, until now little attention has been given to characterizing the dispersion of DNA microarray data.
Results
Here we examine expression data obtained from 682 Affymetrix GeneChips® of 22 different types and demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, we show that the frequency distributions of the differences in expression within subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is used to calculate the characteristic function approximating the standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution.
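The consecutive sampling idea can be illustrated with a minimal sketch. It assumes that genes in a replicate pair are ordered by their mean expression level and then cut into consecutive, non-overlapping samples of fixed size, with the standard deviation of the between-replicate differences computed per sample. The function name `consecutive_sample_sd`, the sample size `window`, the ordering criterion and the simulated data are all illustrative assumptions, not the procedure or data used in the study.

```python
import random
import statistics

def consecutive_sample_sd(rep_a, rep_b, window=100):
    """Hypothetical helper: order genes by mean expression, cut them into
    consecutive non-overlapping samples of `window` genes, and return a list
    of (mean expression level, SD of replicate differences) per sample."""
    # Pair the two replicate measurements of each gene and sort by their mean.
    pairs = sorted(zip(rep_a, rep_b), key=lambda p: (p[0] + p[1]) / 2)
    profile = []
    for start in range(0, len(pairs) - window + 1, window):
        chunk = pairs[start:start + window]
        diffs = [a - b for a, b in chunk]
        level = statistics.fmean([(a + b) / 2 for a, b in chunk])
        profile.append((level, statistics.stdev(diffs)))
    return profile

# Simulated replicate pair (assumption: log-scale values with Gaussian noise).
rng = random.Random(0)
true_expr = [rng.uniform(6.0, 14.0) for _ in range(2000)]
noise_sd = 0.2
rep_a = [x + rng.gauss(0.0, noise_sd) for x in true_expr]
rep_b = [x + rng.gauss(0.0, noise_sd) for x in true_expr]

# One (level, SD) point per consecutive sample; together these points trace
# an empirical SD-versus-expression curve for this replicate pair.
profile = consecutive_sample_sd(rep_a, rep_b, window=100)
for level, sd in profile[:3]:
    print(f"mean expression {level:.2f}: SD of differences {sd:.3f}")
```

In this simulation the noise is the same at every expression level, so the per-sample standard deviations scatter around 0.2 × √2 ≈ 0.28. With real replicate data the same curve would show how dispersion changes with expression level.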
Conclusion
In this study we ascertained that the non-systematic variations follow a Gaussian distribution, determined the probability intervals and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of the dispersion of data. The fact that the Kα distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance.
Reviewers
This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).