Open Access Research

Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution

Jaroslav P Novak1, Seon-Young Kim2, Jun Xu3, Olga Modlich4, David J Volsky5, David Honys6, Joan L Slonczewski7, Douglas A Bell8, Fred R Blattner9, Eduardo Blumwald10, Marjan Boerma11, Manuel Cosio12, Zoran Gatalica13, Marian Hajduch14, Juan Hidalgo15, Roderick R McInnes16, Merrill C Miller III17, Milena Penkowa18, Michael S Rolph19, Jordan Sottosanto20, Rene St-Arnaud21, Michael J Szego22, David Twell23 and Charles Wang3,24

1  McGill University and Genome Québec Innovation Centre, 740 Docteur Penfield Avenue, Montreal, Québec, H3A 1A4, Canada

2  Human Genomics Laboratory, Genome Research Center, 52 Eoeun-dong, Yuseong-gu, Daejon, 305-333, Korea

3  Transcriptional Genomics Core, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA

4  Institut fur Onkologische Chemie, Heinrich Heine Universitat Dusseldorf, Moorenstr. 5, D-40225 Dusseldorf, Germany

5  St. Luke's-Roosevelt Hospital Center and Columbia University, Molecular Virology Division, 432 West 58th Street, Antenucci Building, Room 709, New York, NY 10019, USA

6  Institute of Experimental Botany AS CR, Rozvojová 135, CZ-165 02, Praha 6, Czech Republic and Charles University in Prague, Department of Plant Physiology, Viničná 5, 12844, Praha 2, Czech Republic

7  Department of Biology, Higley Hall, 202 N. College Dr., Kenyon College, Gambier, OH 43022, USA

8  Environmental Genomics Section, C3-03, PO Box 12233, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA

9  Department of Genetics, 425 Henry Mall, University of Wisconsin, Madison, WI 53706, USA

10  Department of Plant Sciences, University of California, One Shields Ave, Davis, CA 95616, USA

11  Department of Pharmaceutical Sciences, University of Arkansas for Medical Sciences, 4301 West Markham, Slot 522-3, Little Rock AR 72205, USA

12  Respiratory Division, Department of Medicine, McGill University, Montreal, Quebec, Canada

13  Department of Pathology, Creighton University School of Medicine, 601 North 30th Street, Omaha, NE, 68131-2197, USA

14  Laboratory of Experimental Medicine, Department of Pediatrics, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Puskinova 6, 775 20 Olomouc, Czech Republic

15  Institute of Neurosciences and Department of Cellular Biology, Physiology and Immunology, Animal Physiology unit, Faculty of Sciences, Autonomous University of Barcelona, Bellaterra, Barcelona, 08193, Spain

16  Programs in Genetics and Developmental Biology, The Research Institute, The Hospital for Sick Children, Toronto, Canada M5G 1X8; Departments of Molecular and Medical Genetics and Pediatrics, University of Toronto, Toronto, M5S 1A1, Canada

17  Environmental Genomics Section, C3-03, PO Box 12233, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA

18  Section of Neuroprotection, Centre of Inflammation and Metabolism, The Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen Denmark

19  Arthritis and Inflammation Research Program, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst NSW 2010, Australia

20  Department of Plant Sciences, University of California, One Shields Ave, Davis, CA 95616, USA

21  Genetics Unit, Shriners Hospital for Children and Departments of Surgery and Human Genetics, McGill University, Montréal H3A 2T5, Québec, Canada

22  Programs in Genetics and Developmental Biology, The Research Institute, The Hospital for Sick Children, Toronto, Canada M5G 1X8; Departments of Molecular and Medical Genetics, University of Toronto, Toronto, M5S 1A1, Canada

23  Department of Biology, University of Leicester, LE1 7RH Leicester, UK

24  Department of Medicine, Cedars-Sinai Medical Center, David Geffen School of Medicine, UCLA, Los Angeles, CA 90048, USA


Biology Direct 2006, 1:27 doi:10.1186/1745-6150-1-27

Published: 7 September 2006

Abstract

Background

DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide range of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of observed differences in gene expression. However, until now little attention has been given to characterizing the dispersion of DNA microarray data.

Results

Here we examine expression data obtained from 682 Affymetrix GeneChips® of 22 different types and demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, we show that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is used to calculate the characteristic function approximating the standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution.
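
The consecutive sampling idea described above can be illustrated with a minimal sketch on simulated data. The abstract does not state the subset size, the ordering key, or the definition of the interval coefficients, so everything specific here is an assumption made for illustration: genes are ordered by their mean expression in the two replicates, cut into non-overlapping runs of 50 genes, and the normal quantile 1.96 stands in for an interval coefficient. The function names `consecutive_sample_sds` and `simulate_replicates` are hypothetical and are not taken from the article.

```python
import math
import random
import statistics


def consecutive_sample_sds(rep_a, rep_b, window=50):
    """Return (mean expression, SD of differences) for each consecutive sample.

    Genes are ordered by mean expression across the two replicates, then cut
    into non-overlapping runs of `window` genes (assumed subset size).
    """
    pairs = sorted(zip(rep_a, rep_b), key=lambda p: (p[0] + p[1]) / 2)
    out = []
    for start in range(0, len(pairs) - window + 1, window):
        chunk = pairs[start:start + window]
        diffs = [a - b for a, b in chunk]
        level = statistics.fmean([(a + b) / 2 for a, b in chunk])
        out.append((level, statistics.stdev(diffs)))
    return out


def simulate_replicates(n_genes=5000, noise_sd=0.2, seed=0):
    """Hypothetical data: log-scale expression plus independent Gaussian noise."""
    rng = random.Random(seed)
    true_level = [rng.uniform(4.0, 14.0) for _ in range(n_genes)]
    rep_a = [t + rng.gauss(0.0, noise_sd) for t in true_level]
    rep_b = [t + rng.gauss(0.0, noise_sd) for t in true_level]
    return rep_a, rep_b


rep_a, rep_b = simulate_replicates()
samples = consecutive_sample_sds(rep_a, rep_b, window=50)

# The difference of two independent N(0, s) errors has SD s * sqrt(2).
expected_sd = 0.2 * math.sqrt(2)
mean_sd = statistics.fmean([sd for _, sd in samples])

# Fraction of all gene-wise differences inside +/- 1.96 * SD
# (1.96 is the two-sided 95% normal quantile, used here as a stand-in).
diffs = [a - b for a, b in zip(rep_a, rep_b)]
coverage = sum(abs(d) <= 1.96 * mean_sd for d in diffs) / len(diffs)

print(f"{len(samples)} consecutive samples, mean SD {mean_sd:.3f}, "
      f"expected {expected_sd:.3f}, coverage {coverage:.3f}")
```

On real replicate arrays the noise level would generally vary with expression level, which is why the sketch reports one standard deviation per consecutive sample rather than a single global value; the simulation uses a constant noise level only to keep the check simple.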

Conclusion

In this study we ascertained that the non-systematic variations follow a Gaussian distribution, determined the probability intervals, and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of the dispersion of data. The fact that the Kα distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance.

Reviewers

This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).

