A fundamental issue affecting the performance of a parallel application running on message-passing parallel system is the assignment of tasks to processors in order to achieve the minimum completion time. Tools for static and dynamic task assignment can be considered complementary: static mapping tools compute an initial assignment of tasks on processors while dynamic load balancing tools are used at execution time. In this paper, we present a compilation-time two-stage mapping strategy (denoted as CREMA: Clustering and Reassignment-based Mapping Algorithm) used for efficiently mapping arbitrary programs (modelled as TIGs: Task Interaction Graph) onto message-passing parallel systems with any topology. Moreover, we also present a new fully distributed dynamic load balancing algorithm (denoted as DASUD: Diffusion Algorithm Searching Unbalanced Domains) for load balancing among the processors of an arbitrary interconnected network of processors. We include a description of both strategies and the results obtained in their respective evaluation.