Abstract: We have prototyped and analyzed the design of a novel approach to high-throughput computing, a core element of the emerging HENP computational grid. Independent event processing in HENP is well suited to parallel computing. The prototype facilitates the use of inexpensive mass-market components by providing fault-tolerant resilience (instead of expensive total system reliability) via highly scalable management components. The ability to handle both hardware and software failures on a large dedicated HENP facility limits the need for user intervention. Robust data management is especially important in HENP computing, since large data flows occur before and/or after each processing task. The architecture of our active object coordination schema implements a multi-level hierarchical agent model. It provides fault tolerance by splitting a large overall task into independent atomic processes, performed by lower-level agents that synchronize with each other via a local database. The necessary control functions, performed by higher-level agents, interact with the same database and thus manage distributed data production. The system has been tested in a production environment for simulations in the STAR experiment at RHIC. Our architectural prototype controlled processes on more than a hundred processors at a time and has run for extended periods. Twenty terabytes of simulated data have been produced. The generic nature of our two-level architectural solution for fault tolerance in a distributed environment has been demonstrated by its successful test for the grid file replication services between BNL and LBNL.
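The two-level coordination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout, the status values, the retry limit, and the function names (`make_db`, `worker_step`, `supervisor_step`, `run`) are all assumptions chosen for illustration, and an in-memory SQLite database stands in for the local database. Lower-level agents claim and run independent atomic tasks; a higher-level agent reads the same table and requeues failures.

```python
import sqlite3


def make_db(n_tasks):
    """Create the shared task table (hypothetical schema): one row per atomic task."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, attempts INTEGER NOT NULL)"
    )
    db.executemany(
        "INSERT INTO tasks (id, status, attempts) VALUES (?, 'pending', 0)",
        [(i,) for i in range(n_tasks)],
    )
    db.commit()
    return db


def worker_step(db, process):
    """Lower-level agent: claim one pending task, run it, record the outcome.

    Returns False when no pending task is left.
    """
    row = db.execute(
        "SELECT id FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    task_id = row[0]
    # The status check in WHERE keeps the claim safe if another agent
    # claimed the same task between the SELECT and the UPDATE.
    claimed = db.execute(
        "UPDATE tasks SET status = 'running', attempts = attempts + 1 "
        "WHERE id = ? AND status = 'pending'",
        (task_id,),
    ).rowcount
    db.commit()
    if claimed == 0:
        return True  # lost the race; the caller simply tries again
    try:
        process(task_id)
        outcome = "done"
    except Exception:
        outcome = "failed"  # a failure is recorded, not propagated
    db.execute("UPDATE tasks SET status = ? WHERE id = ?", (outcome, task_id))
    db.commit()
    return True


def supervisor_step(db, max_attempts=3):
    """Higher-level agent: requeue failed tasks that still have attempts left."""
    requeued = db.execute(
        "UPDATE tasks SET status = 'pending' "
        "WHERE status = 'failed' AND attempts < ?",
        (max_attempts,),
    ).rowcount
    db.commit()
    return requeued


def run(db, process, max_attempts=3):
    """Alternate worker and supervisor passes until nothing can be requeued."""
    while True:
        while worker_step(db, process):
            pass
        if supervisor_step(db, max_attempts) == 0:
            break
    rows = db.execute("SELECT status, COUNT(*) FROM tasks GROUP BY status")
    return dict(rows.fetchall())
```

In this sketch a single loop plays both roles in turn; in a deployment like the one described, the agents would presumably be separate processes on different nodes sharing one database. Because each task is independent and its state lives only in the table, a crashed or failing task affects just its own row, which is the property that lets the higher-level agent recover without user intervention.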