An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects
Sergio Arciniegas-Alarcón ; Marisol García-Peña ; Wojtek Janusz Krzanowski ; Carlos Tadeu dos Santos Dias
Biometrical Letters, Volume 51 (2014), p. 75-88 / Harvested from The Polish Digital Mathematics Library

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
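To make the idea of imputation by lower-rank approximation concrete, the following is a minimal sketch of a generic iterative SVD imputation together with a normalised root mean squared error. It is not the authors' algorithm: it omits the regression step, the cross-validated weights and multiple imputation described in the abstract. The function names `svd_impute` and `nrmse`, the column-mean starting values, the stopping rule and the choice of normalising by the standard deviation of the true values are all assumptions made for illustration.

```python
import numpy as np


def svd_impute(X, rank=1, n_iter=200, tol=1e-8):
    """Fill NaN cells of X by iterating a rank-`rank` SVD approximation.

    Generic illustrative scheme (assumption), not the method of the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumed starting values: column (environment) means of the observed cells.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Lower-rank approximation built from the leading singular triplets.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing cells are updated; observed cells stay fixed.
        new_filled = np.where(missing, low_rank, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return filled


def nrmse(imputed, true):
    """RMSE between two arrays divided by the std of the true values (assumed normalisation)."""
    imputed = np.asarray(imputed, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.sqrt(np.mean((imputed - true) ** 2)) / np.std(true)


if __name__ == "__main__":
    # Hypothetical 4 genotypes x 3 environments table with exact rank-1 structure.
    true = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
    observed = true.copy()
    observed[1, 2] = np.nan  # delete one genotype-environment cell
    completed = svd_impute(observed, rank=1)
    print("imputed value:", completed[1, 2])
    print("NRMSE over the table:", nrmse(completed, true))
```

In a simulation of the kind described, one would delete cells from a complete table at several rates, impute them, and compare the imputed and true values; here the error is computed over the whole table for simplicity, which is again an illustrative choice.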

Published: 2014-01-01
EUDML-ID : urn:eudml:doc:268748
@article{bwmeta1.element.doi-10_2478_bile-2014-0006,
     author = {Sergio Arciniegas-Alarc\'on and Marisol Garc\'\i a-Pe\~na and Wojtek Janusz Krzanowski and Carlos Tadeu dos Santos Dias},
     title = {An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects},
     journal = {Biometrical Letters},
     volume = {51},
     year = {2014},
     pages = {75-88},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.doi-10_2478_bile-2014-0006}
}
Sergio Arciniegas-Alarcón; Marisol García-Peña; Wojtek Janusz Krzanowski; Carlos Tadeu dos Santos Dias. An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects. Biometrical Letters, Volume 51 (2014) pp. 75-88. http://gdmltest.u-ga.fr/item/bwmeta1.element.doi-10_2478_bile-2014-0006/

Arciniegas-Alarcón S., García-Peña M., Dias C.T.S. (2011): Data imputation in trials with genotype×environment interaction. Interciencia 36(6): 444-449.

Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010): An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47(1): 1-14.

Bergamo G.C., Dias C.T.S., Krzanowski W.J. (2008): Distribution-free multiple imputation in an interaction matrix through singular value decomposition. Scientia Agricola 65(4): 422-427.

Calinski T., Czajka S., Kaczmarek Z., Krajewski P., Pilarczyk W. (2009): Analyzing the Genotype-by-Environment Interactions Under a Randomization-Derived Mixed Model. Journal of Agricultural, Biological and Environmental Statistics 14(2): 224-241. | Zbl 1306.62254

Ching W., Li L., Tsing N., Tai C., Ng T. (2010): A weighted local least squares imputation method for missing value estimation in microarray gene expression data. International Journal of Data Mining and Bioinformatics 4(3): 331-347.

Denis J.B., Baril C.P. (1992): Sophisticated models with numerous missing values: the multiplicative interaction model as an example. Biuletyn Oceny Odmian 24-25: 33-45.

Di Ciaccio A. (2011): Bootstrap and nonparametric predictors to impute missing data. In: B. Fichet et al. (eds.), Classification and Multivariate Analysis for Complex Data Structures, Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag Berlin Heidelberg.

Dias C.T.S., Krzanowski W.J. (2003): Model selection and cross validation in additive main effect and multiplicative interaction models. Crop Science 43: 865-873.

Gabriel K.R. (2002): Le biplot - outil d’exploration de données multidimensionelles. Journal de la Société Française de Statistique 143(3-4): 5-55.

García-Peña M., Dias C.T.S. (2009): Analysis of bivariate additive models with multiplicative interaction (AMMI). Biometric Brazilian Journal 27(4): 586-602.

Gauch H.G. (2013): A simple protocol for AMMI analysis of yield trials. Crop Science 53: 1860-1869.

Gauch H.G., Zobel R.W. (1990): Imputing missing yield trial data. Theoretical and Applied Genetics 79: 753-761.

Josse J., Pagès J., Husson F. (2011): Multiple imputation in PCA. Advances in data analysis and classification 5(3): 231-246. | Zbl 1274.62409

Josse J., Husson F. (2012): Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique 153(2): 79-99. | Zbl 1316.62006

Krzanowski W.J. (1988): Missing value imputation in multivariate data using the singular value decomposition of a matrix. Biometrical Letters XXV(1-2): 31-39.

Krzanowski W.J. (2000): Principles of multivariate analysis: A user’s perspective. Oxford: University Press. | Zbl 0678.62001

Kroonenberg P.M. (2008): Applied multiway data analysis. John Wiley & Sons. | Zbl 1160.62002

Kumar A., Verulkar S.B., Mandal N.P., Variar M., Shukla V.D., Dwivedi J.L., Singh B.N., Singh O.N., Swain P., Mall A.K., Robin S., Chandrababu R., Jain A., Haefele S.M., Piepho H.P., Raman A. (2012): High-yielding, drought-tolerant, stable rice genotypes for the shallow rainfed lowland drought-prone ecosystem. Field Crops Research 133: 37-47.

Little R., Rubin D. (2002): Statistical analysis with missing data. 2nd ed. John Wiley & Sons, New York, NY. | Zbl 1011.62004

Paderewski J., Rodrigues P.C. (2014): The usefulness of EM-AMMI to study the influence of missing data pattern and application to Polish post-registration winter wheat data. Australian Journal of Crop Science 8: 640-645.

Piepho H.P. (1995): Methods for estimating missing genotype-location combinations in multilocation trials - an empirical comparison. Informatik Biometrie und Epidemiologie in Medizin und Biologie 26: 335-349.

Piepho H.P., Möhring J. (2006): Selection in cultivar trials - Is it ignorable? Crop Science 46: 192-201.

R Development Core Team (2013): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/

Rodrigues P., Pereira D.G.S., Mexia J.T. (2011): A comparison between joint regression analysis and the additive main and multiplicative interaction model: the robustness with increasing amounts of missing data. Scientia Agricola 68(6): 679-686.

Rubin D.B. (1978): Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Survey Research Methods Section Of The American Statistical Association. Proceedings: 20-34.

Sabaghnia N., Karimizadeh R., Mohammadi M. (2012): Model selection in additive main effect and multiplicative interaction model in durum wheat. Genetika 44(2): 325-339.

Schafer J.L., Graham J.W. (2002): Missing data: our view of the state of the art. Psychological Methods 7(2): 147-177.

van Buuren S. (2012): Flexible imputation of missing data. CRC press. | Zbl 1256.62005

Wright K. (2012): agridat: Agricultural datasets. R package version 1.4. http://CRAN.R-project.org/package=agridat

Yan W., Pageau D., Frégeau-Reid J., Durand J. (2011): Assessing the representativeness and repeatability of test locations for genotype evaluation. Crop Science 51: 1603-1610.

Yan W. (2013): Biplot analysis of incomplete two-way data. Crop Science 53(1): 48-57.