
“Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000”
Xavier Cavin, Laurent Alonso and Jean-Claude Paul
Fourth European CRAY-SGI MPP Workshop, Garching/Munich, Germany, 1998

Abstract: Hierarchical algorithms form a class of applications widely used in high-performance scientific computing because they can solve very large physical problems. They rely on the physical property that the farther apart two points are, the less they influence each other. However, their irregular and dynamic characteristics make them challenging to parallelize efficiently. Indeed, two conflicting objectives have to be taken into account: load balancing and data locality. It has been shown that the message-passing paradigm is not well suited to such applications because of the intensive communication they introduce. Implicit communication through a shared address space appears to be better adapted. In particular, the ccNUMA architecture of the Origin2000 can help us obtain the desired data locality through its memory hierarchy. We have experimented with a parallel implementation of a well-known hierarchical algorithm from computer graphics: wavelet radiosity. This algorithm is a very efficient approach to computing global illumination in diffuse environments, but it remains too time- and memory-consuming when dealing with extremely complex models. Our parallel algorithm focuses on load-balancing optimization and relies heavily on the efficiency of the ccNUMA architecture for data locality. Load balancing is handled by a general dynamic tasking mechanism with specific improvements. Minimal effort is devoted to memory management (such as writing thread-safe, non-blocking malloc/free C functions), and the Origin2000 proves fully capable of efficiently handling the natural data locality of our application. Our best results yield a speed-up of 24 with 36 processors. Moreover, we were able to compute the illumination of a complex scene (a cloister in Quito, composed of 54789 initial surfaces and leading to 600000 final meshes) in 2 hours 41 minutes with 24 processors. To the knowledge of the authors, this is the most complex real-world scene ever computed.

BibTeX reference

   AUTHOR     = "Cavin, Xavier and Alonso, Laurent and Paul, Jean-Claude",
   TITLE      = "Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000",
   BOOKTITLE  = "Fourth European CRAY-SGI MPP Workshop, Garching/Munich, Germany",
   YEAR       = "1998",
   NUMBER     = "R/46",
   PAGES      = "178-187",
   MONTH      = "sep",