EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS
Jacques Chassin de Kergommeaux ; Michiel Ronsse ; Koen De Bosschere
Computing and Informatics, Volume 28 (2012) no. 1 / Harvested from Computing and Informatics
Clusters of shared-memory symmetric multiprocessors are increasingly used for high-performance computing. To conveniently exploit both the parallelism within nodes and the parallelism between nodes, programming models for communicating threads are being developed. However, most of these models result in programs that exhibit non-deterministic behavior. This makes cyclic debugging impossible unless an efficient execution replay system is provided. This article describes such an execution replay system for distributed thread programming that combines synchronization primitives for threads sharing the same node with communication primitives for threads on different nodes. The execution replay system combines the most efficient trace size reduction technique for shared memory, based on the use of logical clocks, with a very efficient compression technique for trace data that originates from the test functions used in non-blocking communications.
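To illustrate the last point, the outcomes of a non-blocking test function can be traced compactly by storing how many tests failed before each success, rather than one entry per call. The following is a minimal sketch under that assumption; it is not taken from the article, and the names compress_test_outcomes and replay_test_outcomes are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def compress_test_outcomes(outcomes: Iterable[bool]) -> Tuple[List[int], int]:
    """Record phase (assumed scheme): collapse test results into failure counts.

    Returns (counts, trailing), where counts[i] is the number of failed tests
    that preceded the i-th successful test, and trailing is the number of
    failed tests observed after the last success.
    """
    counts: List[int] = []
    failures = 0
    for succeeded in outcomes:
        if succeeded:
            counts.append(failures)
            failures = 0
        else:
            failures += 1
    return counts, failures


def replay_test_outcomes(counts: List[int], trailing: int = 0) -> Iterator[bool]:
    """Replay phase: regenerate the original sequence of test results."""
    for failed_before_success in counts:
        for _ in range(failed_before_success):
            yield False
        yield True
    for _ in range(trailing):
        yield False
```

During replay, each call to the test function would take the next value from replay_test_outcomes instead of querying the communication layer, so the replayed execution sees the same sequence of failed and successful tests as the recorded one.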
Published: 2012-01-26
@article{cai575,
     author = {Jacques Chassin de Kergommeaux and Michiel Ronsse and Koen De Bosschere},
     title = {EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS},
     journal = {Computing and Informatics},
     volume = {28},
     number = {1},
     year = {2012},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai575}
}
Jacques Chassin de Kergommeaux; Michiel Ronsse; Koen De Bosschere. EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS. Computing and Informatics, Volume 28 (2012) no. 1. http://gdmltest.u-ga.fr/item/cai575/