Grid Datafarm Architecture for Petascale Data Intensive Computing

http://datafarm.apgrid.org/

Osamu Tatebe^1, Youhei Morita^2, Satoshi Matsuoka^3, Noriyuki Soda^4, Satoshi Sekiguchi^1

1. National Institute of Advanced Industrial Science and Technology (AIST)
2. High Energy Accelerator Research Organization (KEK)
3. Tokyo Institute of Technology
4. Software Research Associates, Inc.

Abstract

The Grid Datafarm (Gfarm) architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. Gfarm parallel I/O APIs and commands provide a single filesystem image and manipulate filesystem metadata consistently. Fault tolerance and load balancing are managed automatically by file duplication or by re-computation using a command history log. Preliminary performance evaluation has shown scalable disk I/O and network bandwidth on 64 nodes of the Presto III Athlon cluster. Gfarm parallel I/O write and read operations have achieved data transfer rates of 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. The Gfarm parallel file copy reached 443 MB/s with 23 parallel streams on Myrinet 2000. The Gfarm architecture is expected to enable petascale data-intensive Grid computing with I/O bandwidth that scales to the TB/s range and with scalable computational power.
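
The aggregate figures quoted in the abstract can be related to its TB/s claim with some back-of-the-envelope arithmetic. The following minimal sketch divides the reported rates by the node and stream counts, then estimates how many nodes would be needed to reach 1 TB/s. It rests on two assumptions that are not stated in the abstract: decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB), and per-node bandwidth that stays constant as nodes are added (perfectly linear scaling). The function and variable names are illustrative only and are not part of any Gfarm API.

```python
import math

# Figures reported in the abstract.
NODES = 64             # cluster nodes used in the I/O measurement
WRITE_GBPS = 1.74      # aggregate parallel write rate, GB/s
READ_GBPS = 1.97       # aggregate parallel read rate, GB/s
COPY_MBPS = 443        # aggregate parallel file copy rate, MB/s
COPY_STREAMS = 23      # parallel streams used for the copy

# Assumption: decimal units, which the abstract does not specify.
MB_PER_GB = 1000
TARGET_MBPS = 1_000_000  # 1 TB/s expressed in MB/s


def per_node_mbps(aggregate_gbps: float, nodes: int) -> float:
    """Average per-node rate in MB/s, given an aggregate rate in GB/s."""
    return aggregate_gbps * MB_PER_GB / nodes


def nodes_for_target(target_mbps: float, node_mbps: float) -> int:
    """Nodes needed to reach target_mbps, assuming perfectly linear scaling."""
    return math.ceil(target_mbps / node_mbps)


write_per_node = per_node_mbps(WRITE_GBPS, NODES)
read_per_node = per_node_mbps(READ_GBPS, NODES)
copy_per_stream = COPY_MBPS / COPY_STREAMS

nodes_write = nodes_for_target(TARGET_MBPS, write_per_node)
nodes_read = nodes_for_target(TARGET_MBPS, read_per_node)

print(f"write: {write_per_node:.1f} MB/s per node")
print(f"read:  {read_per_node:.1f} MB/s per node")
print(f"copy:  {copy_per_stream:.1f} MB/s per stream")
print(f"nodes for 1 TB/s (write rate): {nodes_write}")
print(f"nodes for 1 TB/s (read rate):  {nodes_read}")
```

Under these assumptions, the estimated node counts fall in the tens of thousands, the same scale of grid that the abstract targets. Real deployments would be subject to network and metadata overheads that this estimate ignores.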