Abstract
A parallel finite element groundwater transport code is used to compare three different strategies for performing parallel I/O: (1) have a single processor collect data and perform sequential I/O in large blocks, (2) use variations of vendor-specific I/O extensions, or (3) use the extended distributed object network I/O (EDONIO) library. Each processor performs many writes of 1 to 4 kilobytes to reorganize local data in a global shared file. The findings suggest that having a single processor collect data and perform large block-contiguous operations may be quite efficient and portable for up to 32-processor configurations. This approach does not scale well for a larger number of processors because the single processor becomes a bottleneck for gathering data. The effective application I/O rate observed, which includes times for opening and closing files, is only a fraction of the peak device read/write rates. Some form of data redistribution and buffering in remote memory, as performed in EDONIO, may yield significant improvements for noncontiguous data I/O access patterns and short requests. Implementors of parallel I/O systems may consider some form of buffering, as performed in EDONIO, to speed up such I/O requirements.
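Strategy (1) above, gathering distributed records onto a single processor and issuing one large contiguous write instead of many small strided writes, can be sketched in plain Python. This is an illustrative simulation only, not code from the paper: the processor count, record layout, and round-robin interleaving are assumptions chosen to show how noncontiguous local data is reordered into a global shared file before a single block write.

```python
import io

# Hypothetical layout: P "processors" each own R records of B bytes,
# interleaved round-robin in the global file (global record g belongs
# to processor g % P). Strategy (1): one collector gathers all records
# and performs a single large contiguous write, avoiding P * R small
# strided writes of 1-4 KB each.

P, R, B = 4, 8, 4  # processors, records per processor, bytes per record

# Simulated local data: processor p's records are filled with byte p.
local = {p: [bytes([p]) * B for _ in range(R)] for p in range(P)}

def gather_and_write(local, P, R, B):
    """Collector reorders records into global order, then writes once."""
    buf = bytearray(P * R * B)
    for p, records in local.items():        # stands in for an MPI gather
        for r, rec in enumerate(records):
            g = r * P + p                   # global record index
            buf[g * B:(g + 1) * B] = rec
    f = io.BytesIO()                        # stands in for the shared file
    f.write(buf)                            # one contiguous block write
    return f.getvalue()

data = gather_and_write(local, P, R, B)
# Each global record's first byte identifies its owning processor.
print([data[g * B] for g in range(P)])      # prints [0, 1, 2, 3]
```

The abstract's scaling caveat shows up directly in this sketch: the collector touches every byte of every processor's data, so its memory traffic and gather time grow linearly with the processor count, which is why the single collector becomes a bottleneck beyond roughly 32 processors.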
