Evolutionary learning of rich neural networks in the Bayesian model selection framework

Matteucci, Matteo; Spadoni, Dario

International Journal of Applied Mathematics and Computer Science, Volume 14 (2004), pp. 423-440 / Harvested from The Polish Digital Mathematics Library

Abstract

This paper addresses the use of a genetic algorithm for model selection within a Bayesian framework. We propose reducing the model selection problem to a search problem, solved by using evolutionary computation to explore a posterior distribution over the model space. As a case study, we introduce ELeaRNT (Evolutionary Learning of Rich Neural Network Topologies), a genetic algorithm that evolves a particular class of models, namely Rich Neural Networks (RNN), in order to find an optimal domain-specific non-linear function approximator with good generalization capability. To evolve this kind of neural network, ELeaRNT uses a Bayesian fitness function. The experimental results show that ELeaRNT with a Bayesian fitness function finds, in a completely automated way, networks that are well matched to the analysed problem and of acceptable complexity.
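
The idea of treating Bayesian model selection as an evolutionary search can be illustrated with a toy sketch. This is not ELeaRNT and does not use neural networks: as an assumption made purely for illustration, the "models" are piecewise-constant fits whose structure is a bitmask over candidate cut points, and the Bayesian fitness is a BIC-style approximation to the log evidence log p(D | M) with a known noise level. All names (`make_data`, `segments`, `fitness`, `evolve`) and all parameter values are hypothetical.

```python
import math
import random


def make_data(rng, n=60, sigma=0.5):
    """Noisy step function with levels 0, 3, 1 on three equal thirds."""
    levels = [0.0, 3.0, 1.0]
    return [levels[i * 3 // n] + rng.gauss(0.0, sigma) for i in range(n)]


def segments(genome, n, step):
    """Map a bitmask over candidate cut points to (start, end) index pairs."""
    cuts = [step * (i + 1) for i, bit in enumerate(genome) if bit]
    bounds = [0] + cuts + [n]
    return list(zip(bounds[:-1], bounds[1:]))


def fitness(genome, y, step, sigma):
    """BIC-style approximation to log p(D | M), up to a model-independent constant."""
    n = len(y)
    segs = segments(genome, n, step)
    rss = 0.0
    for a, b in segs:
        mean = sum(y[a:b]) / (b - a)  # maximum-likelihood level of this segment
        rss += sum((v - mean) ** 2 for v in y[a:b])
    log_lik = -rss / (2.0 * sigma ** 2)  # Gaussian noise with known sigma
    return log_lik - 0.5 * len(segs) * math.log(n)  # penalty: one parameter per segment


def evolve(y, step=5, sigma=0.5, pop_size=20, generations=40, seed=0):
    """Genetic search over model structures, scored by the approximate evidence."""
    rng = random.Random(seed)
    n_bits = len(y) // step - 1

    def score(g):
        return fitness(g, y, step, sigma)

    # Start from the simplest model (no cuts) plus random structures.
    pop = [[0] * n_bits]
    pop += [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]   # truncation selection
        children = [parents[0][:]]       # elitism: keep the current best
        while len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]               # uniform crossover
            child = [bit ^ (rng.random() < 1.0 / n_bits) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=score)


if __name__ == "__main__":
    y = make_data(random.Random(1))
    best = evolve(y)
    print("best structure:", best)
    print("segments:", segments(best, len(y), 5))
```

The complexity penalty in `fitness` plays the role that the evidence plays in a full Bayesian treatment: extra segments are kept only when they improve the fit by more than the penalty, which is the trade-off between fit and complexity that the abstract refers to.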

Published: 2004-01-01

Zbl 1138.62343

EUDML-ID : urn:eudml:doc:207708

@article{bwmeta1.element.bwnjournal-article-amcv14i3p423bwm,
     author = {Matteucci, Matteo and Spadoni, Dario},
     title = {Evolutionary learning of rich neural networks in the Bayesian model selection framework},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {14},
     year = {2004},
     pages = {423-440},
     zbl = {1138.62343},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv14i3p423bwm}
}

Matteucci, Matteo; Spadoni, Dario. Evolutionary learning of rich neural networks in the Bayesian model selection framework. International Journal of Applied Mathematics and Computer Science, Volume 14 (2004), pp. 423-440. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv14i3p423bwm/

Bibliography

[000] Angeline P.J. (1994): Genetic programming and emergent intelligence, In: Advances in Genetic Programming (K.E. Kinnear, Jr., Ed.). - Cambridge, MA: MIT Press, pp. 75-98.

[001] Bebis G., Georgiopoulos M. and Kasparis T. (1997): Coupling weight elimination with genetic algorithms to reduce network size and preserve generalization. - Neurocomput., Vol. 17, No. 3-4, pp. 167-194.

[002] Bernardo J.M. and Smith A.F.M. (1994): Bayesian Theory. - New York: Wiley.

[003] Bishop C.M. (1995): Neural Networks for Pattern Recognition. - Oxford: Oxford University Press. | Zbl 0868.68096

[004] Castellano G., Fanelli A.M. and Pelillo M. (1997): An iterative pruning algorithm for feedforward neural networks. - IEEE Trans. Neural Netw., Vol. 8, No. 3, pp. 519-531.

[005] Chib S. and Greenberg E. (1995): Understanding the Metropolis-Hastings algorithm. - Amer. Stat., Vol. 49, No. 4, pp. 327-335.

[006] Denison D.G.T., Holmes C.C., Mallick B.K. and Smith A.F.M. (2002): Bayesian Methods for Nonlinear Classification and Regression. - New York: Wiley. | Zbl 0994.62019

[007] Dudzinski M.L. and Mykytowycz R. (1961): The eye lens as an indicator of age in the wild rabbit in Australia. - CSIRO Wildlife Res., Vol. 6, No. 1, pp. 156-159.

[008] Flake G.W. (1993): Nonmonotonic activation functions in multilayer perceptrons. - Ph.D. thesis, Dept. Comput. Sci., University of Maryland, College Park, MD.

[009] Fletcher R. (1987): Practical Methods of Optimization. - New York: Wiley. | Zbl 0905.65002

[010] Goldberg D.E. (1989): Genetic Algorithms in Search, Optimization, and Machine Learning. - Reading, MA: Addison-Wesley. | Zbl 0721.68056

[011] Gull S.F. (1989): Developments in maximum entropy data analysis, In: Maximum Entropy and Bayesian Methods, Cambridge 1988 (J. Skilling, Ed.). - Dordrecht: Kluwer, pp. 53-71. | Zbl 0701.62015

[012] Hancock P.J.B. (1992): Genetic algorithms and permutation problems: A comparison of recombination operators for neural net structure specification. - Proc. COGANN Workshop, Int. Joint Conf. Neural Networks, Piscataway, NJ, IEEE Computer Press, pp. 108-122.

[013] Hashem S. (1997): Optimal linear combinations of neural networks. - Neural Netw., Vol. 10, No. 4, pp. 599-614.

[014] Hassibi B. and Stork D.G. (1992): Second order derivatives for network pruning: Optimal Brain Surgeon, In: Advances in Neural Information Processing Systems (S.J. Hanson, J.D. Cowan and C. Lee Giles, Eds.). - San Mateo, CA: Morgan Kaufmann, Vol. 5, pp. 164-171.

[015] Hastings W.K. (1970): Monte Carlo sampling methods using Markov chains and their applications. - Biometrika, Vol. 57, pp. 97-109. | Zbl 0219.65008

[016] Haykin S. (1999): Neural Networks. A Comprehensive Foundation (2nd Edition). - New Jersey: Prentice Hall. | Zbl 0934.68076

[017] Hoeting J., Madigan D., Raftery A. and Volinsky C. (1998): Bayesian model averaging. - Tech. Rep. No. 9814, Department of Statistics, Colorado State University. | Zbl 1059.62525

[018] Hornik K.M., Stinchcombe M. and White H. (1989): Multilayer feedforward networks are universal approximators. - Neural Netw., Vol. 2, No. 5, pp. 359-366.

[019] Liu Y. and Yao X. (1996): A population-based learning algorithm which learns both architectures and weights of neural networks. - Chinese J. Adv. Softw. Res., Vol. 3, No. 1, pp. 54-65.

[020] Lovell D. and Tsoi A. (1992): The performance of the neocognitron with various s-cell and c-cell transfer functions. - Tech. Rep., Intelligent Machines Laboratory, Department of Electrical Engineering, University of Queensland.

[021] MacKay D.J.C. (1992): A practical Bayesian framework for back propagation networks. - Neural Comput., Vol. 4, No. 3, pp. 448-472.

[022] MacKay D.J.C. (1995): Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks. - Netw. Comput. Neural Syst., Vol. 6, No. 3, pp. 469-505. | Zbl 0834.68098

[023] MacKay D.J.C. (1999): Comparison of approximate methods for handling hyperparameters. - Neural Comput., Vol. 11, No. 5, pp. 1035-1068.

[024] Mani G. (1990): Learning by gradient descent in function space. - Tech. Rep. No. WI 52703, Computer Sciences Department, University of Wisconsin, Madison, WI.

[025] Matteucci M. (2002a): ELeaRNT: Evolutionary learning of rich neural network topologies. - Tech. Rep. No. CMU-CALD-02-103, Carnegie Mellon University, Pittsburgh, PA.

[026] Matteucci M. (2002b): Evolutionary learning of adaptive models within a Bayesian framework. - Ph.D. thesis, Dipartimento di Elettronica e Informazione, Politecnico di Milano.

[027] Montana D.J. and Davis L. (1989): Training feedforward neural networks using genetic algorithms. - Proc. 3rd Int. Conf. Genetic Algorithms, San Francisco, CA, USA, pp. 762-767. | Zbl 0709.68060

[028] Pearlmutter B.A. (1994): Fast exact multiplication by the Hessian. - Neural Comput., Vol. 6, No. 1, pp. 147-160.

[029] Press W.H., Teukolsky S.A., Vetterling W.T. and Flannery B.P. (1992): Numerical Recipes in C: The Art of Scientific Computing. - Cambridge, UK: Cambridge University Press. | Zbl 0845.65001

[030] Ronald E. and Schoenauer M. (1994): Genetic lander: An experiment in accurate neuro-genetic control. - Proc. 3rd Conf. Parallel Problem Solving from Nature, Berlin, Germany, pp. 452-461.

[031] Rumelhart D.E., Hinton G.E. and Williams R.J. (1986): Learning representations by back-propagating errors. - Nature, Vol. 323, pp. 533-536.

[032] Stone M. (1974): Cross-validation choice and assessment of statistical procedures. - J. Royal Stat. Soc., Series B, Vol. 36, pp. 111-147. | Zbl 0308.62063

[033] Tierney L. and Kadane J.B. (1986): Accurate approximations for posterior moments and marginal densities. - J. Amer. Stat. Assoc., Vol. 81, pp. 82-86. | Zbl 0587.62067

[034] Tikhonov A.N. (1963): Solution of incorrectly formulated problems and the regularization method. - Soviet Math. Dokl., Vol. 4, pp. 1035-1038. | Zbl 0141.11001

[035] Wasserman L. (1999): Bayesian model selection and model averaging. - J. Math. Psych., Vol. 44, No. 1, pp. 92-107. | Zbl 0946.62032

[036] Weigend A.S., Rumelhart D.E. and Huberman B.A. (1991): Generalization by weight elimination with application to forecasting, In: Advances in Neural Information Processing Systems, Vol. 3 (R. Lippmann, J. Moody and D. Touretzky, Eds.). - San Francisco, CA: Morgan Kaufmann, pp. 875-882.

[037] Williams P.M. (1995): Bayesian regularization and pruning using a Laplace prior. - Neural Comput., Vol. 7, No. 1, pp. 117-143.