Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000
Fourth European CRAY-SGI MPP Workshop, Garching/Munich, Germany, 1998
Abstract: Hierarchical algorithms form a class of applications widely used in high-performance scientific computing, owing to their ability to solve very large physical problems. They rely on the physical property that the farther apart two points are, the less they influence each other. However, their irregular and dynamic behavior makes them challenging to parallelize efficiently. Indeed, two conflicting objectives must be reconciled: load balancing and data locality.
It has been shown that the message-passing paradigm is not well suited to this kind of application, because of the intensive communication it introduces. Implicit communication through a shared address space appears to be better adapted. In particular, the ccNUMA architecture of the Origin2000 can help achieve the desired data locality through its memory hierarchy.
We have experimented with a parallel implementation of a well-known hierarchical algorithm from computer graphics: wavelet radiosity. This algorithm is a very efficient approach to computing global illumination in diffuse environments, but it remains too costly in time and memory when dealing with extremely complex models.
Our parallel algorithm focuses on optimizing load balancing and relies heavily on the efficiency of the ccNUMA architecture for data locality. Load balancing is handled by a general dynamic tasking mechanism with specific improvements. Minimal effort is devoted to memory management (such as writing thread-safe, non-blocking malloc/free C routines), and the Origin2000 proves capable of efficiently handling the natural data locality of our application.
Our best results yield a speed-up of 24 with 36 processors. Moreover, we were able to compute the illumination of a complex scene (a cloister in Quito, composed of 54789 initial surfaces and leading to 600000 final meshes) in 2 hours 41 minutes with 24 processors. To the knowledge of the authors, this is the most complex real-world scene ever computed.
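The abstract mentions load balancing through a general dynamic tasking mechanism but does not describe it. As a rough illustration only, and not the authors' implementation, the idea of dynamic tasking for an irregular hierarchical computation can be sketched as a shared queue from which idle workers pull tasks, where a task may spawn finer subtasks. The names `run_dynamic_tasks` and `refine`, and the toy interval refinement rule, are assumptions made up for this sketch; Python threads are used for clarity and will not reproduce the speed-ups reported above.

```python
import queue
import threading


def run_dynamic_tasks(initial_tasks, process, num_workers=4):
    """Run tasks from a shared queue (hypothetical helper for illustration).

    `process(task)` must return `(result, children)`: a result (or None)
    and a list of new tasks that are pushed back onto the shared queue.
    """
    tasks = queue.Queue()
    for task in initial_tasks:
        tasks.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = tasks.get()
            if task is None:              # sentinel: no more work
                tasks.task_done()
                return
            result, children = process(task)
            for child in children:        # enqueue children before task_done,
                tasks.put(child)          # so join() cannot return too early
            if result is not None:
                with lock:
                    results.append(result)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    tasks.join()                          # all tasks, including spawned ones
    for _ in threads:
        tasks.put(None)                   # one sentinel per worker
    for th in threads:
        th.join()
    return results


def refine(interval):
    """Toy irregular refinement: subdivide more finely near 0 (assumed rule)."""
    lo, hi = interval
    if hi - lo <= max(lo, 0.125):
        return (lo, hi), []               # fine enough: emit a leaf
    mid = (lo + hi) / 2
    return None, [(lo, mid), (mid, hi)]   # too coarse: spawn two subtasks


if __name__ == "__main__":
    print(sorted(run_dynamic_tasks([(0.0, 1.0)], refine)))
```

Because the amount of refinement is not known in advance, work is not split statically among workers; whichever worker is idle picks up the next pending subtask, which is the basic load-balancing property a dynamic tasking scheme provides.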
AUTHOR = "Cavin, Xavier and Alonso, Laurent and Paul, Jean-Claude",
TITLE = "Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000",
BOOKTITLE = "Fourth European CRAY-SGI MPP Workshop, Garching/Munich, Germany",
YEAR = "1998",
NUMBER = "R/46",
PAGES = "178-187",
MONTH = "sep",
PUBLISHER = "IPP",