Imitation learning of car driving skills with decision trees and random forests

Paweł Cichosz; Łukasz Pawełczak

International Journal of Applied Mathematics and Computer Science, Volume 24 (2014), pp. 579-597 / Harvested from The Polish Digital Mathematics Library

Abstract

Machine learning is an appealing and useful approach to creating vehicle control algorithms, both for simulated and real vehicles. One common and widely applicable learning scenario is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high-quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.

Published: 2014-01-01

Zbl 1322.68149

EUDML-ID : urn:eudml:doc:271897

@article{bwmeta1.element.bwnjournal-article-amcv24i3p579bwm,
     author = {Pawe{\l} Cichosz and {\L}ukasz Pawe{\l}czak},
     title = {Imitation learning of car driving skills with decision trees and random forests},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {24},
     year = {2014},
     pages = {579-597},
     zbl = {1322.68149},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv24i3p579bwm}
}

Paweł Cichosz; Łukasz Pawełczak. Imitation learning of car driving skills with decision trees and random forests. International Journal of Applied Mathematics and Computer Science, Volume 24 (2014), pp. 579-597. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv24i3p579bwm/
