A rainfall forecasting method using machine learning models and its application to the Fukuoka city case
S. Monira Sumi ; M. Faisal Zaman ; Hideo Hirose
International Journal of Applied Mathematics and Computer Science, Volume 22 (2012), pp. 841-854 / Harvested from The Polish Digital Mathematics Library

In the present article, an attempt is made to derive optimal data-driven machine learning methods for forecasting the average daily and monthly rainfall of Fukuoka city, Japan. This comparative study concentrates on three aspects: modelling inputs, modelling methods and pre-processing techniques. Linear correlation analysis and average mutual information are compared to find an optimal input selection technique. For the modelling of the rainfall, a novel hybrid multi-model method is proposed and compared with its constituent models. The models include the artificial neural network, multivariate adaptive regression splines, the k-nearest neighbour, and radial basis support vector regression. Each of these methods is applied to model the daily and monthly rainfall, coupled with pre-processing techniques including moving average and principal component analysis. In the first stage of the hybrid method, sub-models from each of the above methods are constructed with different parameter settings. In the second stage, the sub-models are ranked with a variable selection technique and the higher ranked models are selected based on the leave-one-out cross-validation error. The forecast of the hybrid model is the weighted combination of the finally selected models.
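
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: only one model family (k-nearest neighbour) is used, the sub-models differ only in `k`, the ranking step uses the leave-one-out error directly in place of the variable selection technique the abstract mentions, the weights are taken as inverse leave-one-out errors, and the series is synthetic. All function names (`knn_predict`, `loo_cv_error`, `build_hybrid`, `hybrid_predict`) are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training inputs closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_cv_error(x, y, k):
    """Leave-one-out mean squared error of a k-NN sub-model."""
    total = 0.0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        total += (knn_predict(rest_x, rest_y, x[i], k) - y[i]) ** 2
    return total / len(x)


def build_hybrid(x, y, k_values, n_keep):
    """Stage 1: one sub-model per k. Stage 2: keep the n_keep lowest-error ones.

    Returns a list of (k, weight) pairs; the weights sum to 1.
    Assumption: weights are proportional to 1 / leave-one-out error.
    """
    scored = sorted((loo_cv_error(x, y, k), k) for k in k_values)
    kept = scored[:n_keep]
    inverse = [1.0 / (err + 1e-12) for err, _ in kept]  # guard against zero error
    total = sum(inverse)
    return [(k, w / total) for (_, k), w in zip(kept, inverse)]


def hybrid_predict(x, y, model, query):
    """Weighted combination of the selected sub-models' forecasts."""
    return sum(w * knn_predict(x, y, query, k) for k, w in model)


# Synthetic seasonal series (NOT Fukuoka data), with a lag-1 input.
series = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(48)]
x = series[:-1]  # value at time t
y = series[1:]   # value at time t + 1

model = build_hybrid(x, y, k_values=[1, 3, 5, 7], n_keep=2)
forecast = hybrid_predict(x, y, model, series[-1])
print(model, forecast)
```

In a fuller version, the candidate pool would also hold neural network, regression spline and support vector regression sub-models, and the inputs would first pass through the moving average or principal component pre-processing step.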

Published: 2012-01-01
EUDML-ID : urn:eudml:doc:244573
@article{bwmeta1.element.bwnjournal-article-amcv22z4p841bwm,
     author = {S. Monira Sumi and M. Faisal Zaman and Hideo Hirose},
     title = {A rainfall forecasting method using machine learning models and its application to the Fukuoka city case},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {22},
     year = {2012},
     pages = {841-854},
     zbl = {1283.68305},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv22z4p841bwm}
}
S. Monira Sumi; M. Faisal Zaman; Hideo Hirose. A rainfall forecasting method using machine learning models and its application to the Fukuoka city case. International Journal of Applied Mathematics and Computer Science, Volume 22 (2012), pp. 841-854. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv22z4p841bwm/
