Design and Evaluation of the Hamal Parallel Computer

Please use this identifier to cite or link to this item: http://dspace.mediu.edu.my:8181/xmlui/handle/1721.1/6828

Full metadata record

DC Field	Value	Language
dc.creator	Grossman, J.P.	-
dc.date	2004-10-20T20:00:24Z	-
dc.date	2002-12-05	-
dc.date.accessioned	2013-10-09T02:47:07Z	-
dc.date.available	2013-10-09T02:47:07Z	-
dc.date.issued	2013-10-09	-
dc.identifier	AITR-2002-011	-
dc.identifier	http://hdl.handle.net/1721.1/6828	-
dc.identifier.uri	http://koha.mediu.edu.my:8181/xmlui/handle/1721	-
dc.description	Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture that addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol that optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism that can be used to implement a software barrier with a latency of only 523 cycles on a 512-node machine.	-
dc.format	186 p.	-
dc.format	14854547 bytes	-
dc.format	6844439 bytes	-
dc.format	application/postscript	-
dc.format	application/pdf	-
dc.language	en_US	-
dc.relation	AITR-2002-011	-
dc.subject	AI	-
dc.subject	parallel	-
dc.subject	network	-
dc.subject	simulation	-
dc.subject	hashing	-
dc.subject	multithreading	-
dc.subject	synchronization	-
dc.title	Design and Evaluation of the Hamal Parallel Computer	-
Appears in Collections:	MIT Items

Files in This Item:

There are no files associated with this item.
