Energy consumption in a Chip MultiProcessor (CMP) is one of the most important costs. It is related to design aspects such as thermal and power constraints. Besides efficient on-chip processing elements, a well-designed Processor Allocator (PA) and a Network-on-Chip (NoC) are also important factors in the energy budget of novel CMPs. In this paper, the authors propose an energy model for NoCs with 2D-mesh and 2D-torus topologies. All important NoC architectures are described and discussed. Energy estimation is presented for PAs. The estimation is based on synthesis results for PAs targeting FPGA. The PAs are driven by allocation algorithms that are studied as well. The proposed energy model is employed in a simulation environment, where exhaustive experiments are performed. Simulation results show that a PA with an IFF allocation algorithm for mesh systems and a torus-based NoC with express-virtual-channel flow control are very energy efficient. The combination of these two solutions is a clear choice for modern CMPs.
@article{bwmeta1.element.bwnjournal-article-amcv21i2p385bwm, author = {Dawid Zydek and Henry Selvaraj and Grzegorz Borowik and Tadeusz \L uba}, title = {Energy characteristic of a processor allocator and a network-on-chip}, journal = {International Journal of Applied Mathematics and Computer Science}, volume = {21}, year = {2011}, pages = {385-399}, language = {en}, url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv21i2p385bwm} }
Dawid Zydek; Henry Selvaraj; Grzegorz Borowik; Tadeusz Łuba. Energy characteristic of a processor allocator and a network-on-chip. International Journal of Applied Mathematics and Computer Science, Vol. 21 (2011) pp. 385-399. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv21i2p385bwm/
[000] Ababneh, I. (2006). An efficient free-list submesh allocation scheme for two-dimensional mesh-connected multicomputers, Journal of Systems and Software 79(8): 1168-1179, DOI: 10.1016/j.jss.2006.01.019.
[001] Altera Corporation (2009). Quartus II 9.1 Handbook, Vol. 3, Altera, San Jose, CA.
[002] Boura, Y. and Das, C. R. (1994). Efficient fully adaptive wormhole routing in n-dimensional meshes, 14th International Conference on Distributed Computing Systems, Poznań, Poland, pp. 589-596, DOI: 10.1109/ICDCS.1994.302473.
[003] Cardarilli, G., Re, A. D., Nannarelli, A. and Re, M. (2002). Power characterization of digital filters implemented on FPGA, IEEE International Symposium on Circuits and Systems (ISCAS 2002), Vol. 5, pp. 801-804, DOI: 10.1109/ISCAS.2002.1010825.
[004] Chmaj, G., Zydek, D. and Koszalka, L. (2004). Comparison of task allocation algorithms for mesh-structured systems, Computer Systems Engineering, Theory & Applications, 4th Polish-British Workshop, Szklarska Poręba, Poland, pp. 39-50.
[005] Dally, W. (1990). Performance analysis of k-ary n-cube interconnection networks, IEEE Transactions on Computers 39(6): 775-785, DOI: 10.1109/12.53599.
[006] Dally, W. (1992). Virtual-channel flow control, IEEE Transactions on Parallel and Distributed Systems 3(2): 194-205.
[007] Dally, W. and Seitz, C.L. (1987). Deadlock-free message routing in multiprocessor interconnection networks, IEEE Transactions on Computers 36(5): 547-553, DOI: 10.1109/TC.1987.1676939. | Zbl 0617.68037
[008] Dally, W. and Towles, B. (2001). Route packets, not wires: On-chip interconnection networks, 38th Annual Design Automation Conference, pp. 684-689, DOI: 10.1109/DAC.2001.156225.
[009] Dally, W. and Towles, B. (2004). Principles and Practices of Interconnection Networks, Morgan Kaufmann, San Francisco, CA.
[010] Ding, J. and Bhuyan, L. N. (1993). An adaptive submesh allocation strategy for two-dimensional mesh connected systems, International Conference on Parallel Processing, Syracuse, NY, USA, Vol. 2, pp. 193-200, DOI: 10.1109/ICPP.1993.39.
[011] Duato, J., Yalamanchili, S. and Ni, L. (2003). Interconnection Networks, Morgan Kaufmann, San Francisco, CA.
[012] Jayasimha, D., Zafar, B. and Hoskote, Y. (2006). On-chip interconnection networks: Why they are different and how to compare them, Technical report, Intel Corporation, Santa Clara, CA.
[013] Kavaldjiev, N., Smit, G.J.M. and Jansen, P.G. (2004). A virtual channel router for on-chip networks, IEEE International System-on-Chip Conference, pp. 289-293, DOI: 10.1109/SOCC.2004.1362438.
[014] Kim, J., Park, D., Theocharides, T., Vijaykrishnan, N. and Das, C.R. (2005). A low latency router supporting adaptivity for on-chip interconnects, 42nd Annual Design Automation Conference, pp. 559-564, DOI: 10.1145/1065579.1065726.
[015] Krishna, T., Kumar, A., Chiang, P., Erez, M. and Peh, L.S. (2008). NoC with near-ideal express virtual channels using global-line communication, 16th IEEE Symposium on High Performance Interconnects, pp. 11-20, DOI: 10.1109/HOTI.2008.22.
[016] Krueger, P., Lai, T. H. and Dixit-Radiya, V. A. (1994). Job scheduling is more important than processor allocation for hypercube computers, IEEE Transactions on Parallel and Distributed Systems 5(5): 488-497, DOI: 10.1109/71.282559.
[017] Kumar, A., Peh, L.S., Kundu, P. and Jha, N.K. (2007). Express virtual channels: Towards the ideal interconnection fabric, ACM SIGARCH Computer Architecture News 35(2): 150-161, DOI: 10.1145/1273440.1250681.
[018] Mohapatra, P., Yu, C., Das, C.R. and Kim, J. (1993). A lazy scheduling for improving hypercube performance, The 1993 International Conference on Parallel Processing (ICPP '93), Vol. 1, pp. 110-117, DOI: 10.1109/ICPP.1993.26.
[019] Rezazad, M. and Sarbazi-Azad, H. (2005). The effect of virtual channel organization on the performance of interconnection networks, 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), Vol. 15, DOI: 10.1109/IPDPS.2005.427.
[020] Rose, C., Heiss, H.U. and Linnert, B. (2007). Distributed dynamic processor allocation for multicomputers, Parallel Computing 33(3): 145-158, DOI: 10.1016/j.parco.2006.11.010.
[021] Su, C. and Shin, K.G. (1993). Adaptive deadlock-free routing in multicomputers using only one extra virtual channel, 1993 International Conference on Parallel Processing, Vol. 1, pp. 227-231, DOI: 10.1109/ICPP.1993.37.
[022] Taylor, M., Kim, J., Miller, J., Wentzlaff, D., Ghodrat, F., Greenwald, B., Hoffman, H., Johnson, P., Lee, J.W., Lee, W., Ma, A., Saraf, A., Seneski, M., Shnidman, N., Strumpen, V., Frank, M., Amarasinghe, S. and Agarwal, A. (2002). The raw microprocessor: A computational fabric for software circuits and general-purpose programs, IEEE Micro 22(2): 25-35, DOI: 10.1109/MM.2002.997877.
[023] Upadhyay, J., Varavithya, V. and Mohapatra, P. (1997). A traffic-balanced adaptive wormhole routing scheme for two-dimensional meshes, IEEE Transactions on Computers 46(2): 190-197, DOI: 10.1109/12.565594.
[024] Valiant, L. and Brebner, G.J. (1981). Universal schemes for parallel communication, 13th Annual ACM Symposium on Theory of Computing, pp. 263-277, DOI: 10.1145/800076.802479.
[025] Vangal, S., Howard, J., Ruhl, G., Dighe, S., Wilson, H., Tschanz, J., Finan, D., Iyer, P., Singh, A., Jacob, T., Jain, S., Venkataraman, S., Hoskote, Y. and Borkar, N. (2007). An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS, IEEE International Solid-State Circuits Conference (ISSCC 2007), San Francisco, CA, USA, pp. 98-589, DOI: 10.1109/ISSCC.2007.373606.
[026] Wolkotte, P., Smit, G. and Becker, J. (2005a). Energy efficient NoC for best effort communication, 15th International Conference on Field Programmable Logic and Applications, pp. 197-202, DOI: 10.1109/FPL.2005.1515722.
[027] Wolkotte, P., Smit, G.J.M., Kavaldjiev, N., Becker, J.E. and Becker, J. (2005b). Energy model of networks-on-chip and a bus, 2005 International Symposium on System-on-Chip, Tampere, Finland, pp. 82-85, DOI: 10.1109/ISSOC.2005.1595650.
[028] Ye, T., Benini, L. and Micheli, G.D. (2002). Analysis of power consumption on switch fabrics in network routers, 39th Annual Design Automation Conference, pp. 524-529, DOI: 10.1145/513918.514051.
[029] Yoo, B. and Das, C.R. (2002). A fast and efficient processor allocation scheme for mesh-connected multicomputers, IEEE Transactions on Computers 51(1): 46-60, DOI: 10.1109/12.980016.
[030] Zhu, Y. (1992). Efficient processor allocation strategies for mesh-connected parallel computers, Journal of Parallel and Distributed Computing 16(4): 328-337, DOI: 10.1016/0743-7315(92)90016-G. | Zbl 0786.68016
[031] Zydek, D. and Selvaraj, H. (2009). Processor allocation problem for NoC-based chip multiprocessors, 6th International Conference on Information Technology: New Generations (ITNG 2009), Las Vegas, NV, USA, pp. 96-101, DOI: 10.1109/ITNG.2009.182.
[032] Zydek, D. and Selvaraj, H. (2011). Fast and efficient processor allocation algorithm for torus-based chip multiprocessors, Journal of Computers & Electrical Engineering 37(1): 91-105, DOI: 10.1016/j.compeleceng.2010.10.001. | Zbl 1214.68116
[033] Zydek, D. and Selvaraj, H. (2010). Hardware implementation of processor allocation schemes for mesh-based chip multiprocessors, Microprocessors and Microsystems 34(1): 39-48, DOI: 10.1016/j.micpro.2009.11.003.
[034] Zydek, D., Selvaraj, H. and Gewali, L. (2010). Synthesis of processor allocator for torus-based chip multiprocessors, 7th International Conference on Information Technology: New Generations (ITNG 2010), Las Vegas, NV, USA, pp. 13-18, DOI: 10.1109/ITNG.2010.145.
[035] Zydek, D., Shlayan, N., Regentova, E. and Selvaraj, H. (2008). Review of packet switching technologies for future NoC, 19th International Conference on Systems Engineering (ICSEng 2008), Las Vegas, NV, USA, pp. 306-311, DOI: 10.1109/ICSEng.2008.47.