Performance evaluation of MapReduce using full virtualisation on a departmental cloud
Horacio González-Vélez ; Maryam Kontagora
International Journal of Applied Mathematics and Computer Science, Volume 21 (2011), pp. 275-284 / Harvested from The Polish Digital Mathematics Library

This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing on a virtualisation environment comprising 1 + 16 nodes running the VMware Workstation software. A set of experiments using the standard Hadoop benchmarks has been designed to determine whether significant reductions in the execution time of computations are experienced when using Hadoop on this virtualisation platform on a departmental cloud. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment hinder the possibility of achieving the maximum (peak) performance.

Published: 2011-01-01
EUDML-ID : urn:eudml:doc:208046
@article{bwmeta1.element.bwnjournal-article-amcv21i2p275bwm,
     author = {Horacio Gonz\'alez-V\'elez and Maryam Kontagora},
     title = {Performance evaluation of MapReduce using full virtualisation on a departmental cloud},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {21},
     year = {2011},
     pages = {275-284},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv21i2p275bwm}
}
Horacio González-Vélez; Maryam Kontagora. Performance evaluation of MapReduce using full virtualisation on a departmental cloud. International Journal of Applied Mathematics and Computer Science, Volume 21 (2011), pp. 275-284. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv21i2p275bwm/
