Linux Clustering File Systems

Comparing read and write performance

Clustering is a way to create a system in which computers gain access to each other's data and resources. In principle, this adds computing power and redundancy to the system; in practice, implementations often consume more resources because of the overhead of synchronizing facilities across computers.

One type of clustering system is one in which file systems form a cluster to better serve clients of the data stored in the files. For instance, many Internet servers and telecommunications systems can benefit from such a setup, as an error in one computer does not necessarily harm the whole cluster. Instead, the failing computer can simply be removed from the cluster, and the others can continue their normal operations.

A number of open source clustering file systems exist for Linux. In this paper, we compare the read and write performance of the file systems listed in Table 1. The rest of the paper is structured as follows: first, we define the test setup; second, we introduce the tests we executed and provide their results; and finally, we present some concluding points.

Test Setting
General Test Setup and Associated Hardware
The hardware environment used for testing was a two-node cluster in which both nodes were identical HP dc7600SFF P630 machines. Each node had an Intel Pentium 4 630 3.0 GHz processor, 1024 MB of main memory, and 160 GB of disk space split across two physical HP 80 GB SATA 3.0 Gb/s disks. Both nodes ran either Ubuntu Linux 5.10 [11] or Fedora Core 5 [3], depending on the tested file system. To enable clustering, Gigabit Ethernet connected the nodes. The exact tools, auxiliaries, and their version numbers are listed in Table 2.

Two different test setups were constructed because the clustering file systems work in different ways. The file systems were tested for read and write performance, which is easy to measure even for new file systems that are under constant development. No more complicated test cases were run, and this is only one very simple way to compare file systems. The test configurations are shown in Figures 1 and 2.

Test Setup for NFS Derivatives
Figure 1 describes the environment used to test the file systems based on NFS [8]: CacheFS, UnionFS, and plain NFS. The environment consists of two computers. One is a server machine whose exported hard disk contains an Ext3 file system; an NFS server on this machine shares the file system over the network. The other is a client node, which mounts the exported disk. This test setup is smaller than a basic cluster, which normally includes more cluster nodes. The CacheFS [2] tests use the Fedora Core kernel; the UnionFS [12] and NFS tests use Linux kernel 2.6.16.
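As a rough illustration of the server/client arrangement, the following sketch prints the export entry and the mount command such a setup would typically need. The host names, the export directory, and the export options are assumptions made for this example; the article does not list the ones actually used. Only the /cluster mount point comes from the benchmark command shown later.

```shell
#!/bin/sh
# Sketch of the NFS test setup: prints the configuration and commands
# instead of executing them. Host names, the export directory, and the
# export options are hypothetical, not taken from the article.

SERVER=nfs-server        # assumed host name of the server node
EXPORT_DIR=/export       # assumed directory on the server's Ext3 disk
MOUNT_POINT=/cluster     # mount point on the client node

# Line the server would add to /etc/exports to share the directory
# with the client given as the first argument.
exports_line() {
    printf '%s %s(rw,sync)\n' "$EXPORT_DIR" "$1"
}

# Command the client would run to mount the exported directory.
client_mount_cmd() {
    printf 'mount -t nfs %s:%s %s\n' "$SERVER" "$EXPORT_DIR" "$MOUNT_POINT"
}

exports_line client-node
client_mount_cmd
```

CacheFS and UnionFS would be layered on top of a mount like this one; their specific options are not shown here.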

GFS and OCFS2 Test Setup
In contrast to the NFS derivatives, GFS [4] and OCFS2 [9] were tested in a different environment. Figure 2 shows this environment, which consists of a cluster node and a storage system (PC). The storage system exports one disk to the storage area network (SAN) using iSCSI. The iSCSI Enterprise Target daemon [6] on the PC shares the hard drive, and the Open-iSCSI initiator [10] on each node makes the device visible (/dev/sdx). The Logical Volume Manager (LVM) [7] creates one logical volume on /dev/sdx, and this logical volume contains the GFS or OCFS2 file system. The file system is created on one node only; after that, the device must be mounted on every machine in the cluster. A cluster normally includes more nodes, each of which uses the PC's storage, but this test setup is reduced to the bare minimum. The GFS tests use the Ubuntu Breezy kernel, and the OCFS2 tests use Linux kernel 2.6.16.
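The sequence of steps on the cluster nodes can be sketched as a dry run that only prints each command. The portal address, target name, and volume names are hypothetical, and the commands reflect common current usage of Open-iSCSI and LVM2 rather than the exact invocations used in the tests. The sketch covers the OCFS2 case only and leaves out the cluster stack configuration that OCFS2 and GFS both need before mounting.

```shell
#!/bin/sh
# Dry-run sketch of the SAN-based setup: each step is printed, not executed.
# The portal address, target name, and volume names are hypothetical.

PORTAL=192.168.0.10                   # assumed address of the storage PC
TARGET=iqn.2006-01.example:disk1      # assumed iSCSI target name
DEV=/dev/sdx                          # device made visible by the initiator
VG=clustervg                          # assumed volume group name
LV=clusterlv                          # assumed logical volume name

# Print the command; replace the echo with "$@" to execute it instead.
run() { echo "+ $*"; }

# Every node: discover the exported disk and log in to it.
attach_disk() {
    run iscsiadm -m discovery -t sendtargets -p "$PORTAL"
    run iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login
}

# One node only: build the logical volume and create the file system.
create_fs() {
    run pvcreate "$DEV"
    run vgcreate "$VG" "$DEV"
    run lvcreate -l 100%FREE -n "$LV" "$VG"
    run mkfs.ocfs2 "/dev/$VG/$LV"
}

# Every node: mount the shared file system.
mount_fs() {
    run mount -t ocfs2 "/dev/$VG/$LV" /cluster
}

attach_disk
create_fs
mount_fs
```

Keeping the "one node only" steps in their own function mirrors the point made above: the file system is created once, while attaching and mounting are repeated on every node.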

Test Cases
Iozone
Iozone [5] is a file system benchmark tool that tests file I/O performance for various operations. We executed Iozone's write, rewrite, read, and reread tests. The write test measures the performance of writing a new file. This includes "metadata" overhead, which consists of directory information, space allocation, and other data associated with a file that is not part of the data contained in the file. The rewrite test measures the performance of writing a file that already exists, and for that reason, it requires less metadata. The read test measures the performance of reading an existing file. The reread test measures the performance of reading a file that was recently read. The Iozone tests were executed with the following command:

$ iozone -Racb output.xls -g 2G -i 0 -i 1

The command runs the write and read tests in automatic mode with file sizes up to 2 GB and all record sizes, and stores the output in a binary spreadsheet file. The extra "-c" option includes close() in the measurement, which helps to reduce the client-side cache effects of NFS version 3.
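To make the combined "-Racb" flag group easier to read, the sketch below rebuilds the same command one option at a time and prints it. The function name iozone_cmd is made up for this example; the option meanings follow the Iozone documentation.

```shell
#!/bin/sh
# Builds the Iozone command from the article one option at a time and
# prints it, so each flag can be read next to its meaning.
# iozone_cmd is a hypothetical helper name used only for this sketch.

iozone_cmd() {
    out=$1      # spreadsheet file for the results
    max=$2      # largest file size used in automatic mode
    set -- iozone
    set -- "$@" -R              # generate an Excel-style report
    set -- "$@" -a              # automatic mode: iterate over file and record sizes
    set -- "$@" -c              # include close() in the timing
    set -- "$@" -b "$out"       # write the results to a binary spreadsheet file
    set -- "$@" -g "$max"       # upper limit for the file size in automatic mode
    set -- "$@" -i 0            # test 0: write and rewrite
    set -- "$@" -i 1            # test 1: read and reread
    echo "$*"
}

iozone_cmd output.xls 2G
```

The printed command spells out the flags separately but is equivalent to the combined form used above.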

Bonnie++
Bonnie++ [1] is a benchmark suite that performs a number of simple tests to exercise the storage and file system combination. This study uses Bonnie++'s sequential output and input tests. The per-character tests use the putc() or getc() stdio macros, and the loop should be small enough to fit into any reasonable cache. The block tests use the write(2) or read(2) system calls when writing or reading files. The rewrite test reads files with read(2), changes a few bytes, and rewrites them with write(2).

Bonnie++ tests were executed with the following command:

$ bonnie++ -u root -d /cluster -r 1000 > output.txt

The command runs all Bonnie++ tests in the /cluster directory, where the file system under test is mounted, and writes the results to the output.txt file. Bonnie++ uses a test file that is twice the size of main memory, and the size of main memory (in MB) is given with the "-r" option.
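The sizing rule is simple arithmetic, shown here for the value used in the command above. The function name bonnie_file_size_mb is an assumption made for this sketch and is not part of Bonnie++.

```shell
#!/bin/sh
# Computes the test file size implied by a given "-r" value, following
# the rule stated in the text (twice the size of main memory).
# bonnie_file_size_mb is a hypothetical helper, not a Bonnie++ feature.

bonnie_file_size_mb() {
    ram_mb=$1
    echo $(( $ram_mb * 2 ))
}

bonnie_file_size_mb 1000    # prints 2000 (MB) for the "-r 1000" run
```

With "-r 1000" the test file is therefore about 2000 MB, large enough that it cannot be served entirely from the 1024 MB of main memory in each node.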

Executing Tests
The tests were executed a number of times, but the results were almost identical, and only minor deviations were observed. Therefore, each figure is based on a single representative test case, not on an average of all performed tests. The Iozone tests are shown in Figures 3, 4, 5, and 6. The Bonnie++ tests are shown in Figures 7, 8, 9, 10, and 11. Ext3 in each figure is the same test run on a local file system.
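One way to check that repeated runs are "almost identical" is to compute the relative spread of a throughput figure across runs. The sketch below is only an illustration of that idea: the helper name spread_pct is made up, and the sample values are hypothetical numbers, not measurements from this study.

```shell
#!/bin/sh
# Prints the relative spread, (max - min) / mean in percent, of the
# throughput values given as arguments.
# The sample values below are invented for illustration only.

spread_pct() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = $1; max = $1 }
        { sum += $1
          if ($1 < min) min = $1
          if ($1 > max) max = $1 }
        END { printf "%.1f\n", (max - min) / (sum / NR) * 100 }'
}

spread_pct 50200 49800 50000
```

A spread well below the differences between file systems would support reporting a single representative run instead of an average.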

Conclusion
In this study, we have addressed open source clustering file systems, and we close with some final points to consider. First, under certain conditions, some new cluster file systems can be considerably faster than plain NFS (Figures 3 and 4). Second, the tests were performed with two different architectures because the clustering file systems have different goals, and this might have affected the test results. Both OCFS2 and GFS are targeted at clusters in which data is located in a storage area network and the storage device is visible to all cluster nodes over the network. All the cluster nodes then play similar roles, and there is no client/server architecture of the kind common in NFS-related file systems, which is also reflected in our tests. While iSCSI plays a role like that of an NFS server, the resulting system is fundamentally different. Still, NFS is in many ways so mature and optimized that new file systems may find it difficult to challenge (see Figure 10).

One problem we experienced across the test series was the instability of the cluster file systems. NFS and GFS were the only file systems that caused no problems when running the tests. The most problematic file system was CacheFS, which itself builds on NFS: the NFS client that included CacheFS crashed many times. UnionFS and OCFS2 also showed some minor stability problems during the tests. The lack of documentation was another problem when setting up the different environments. However, support via IRC channels and e-mail worked surprisingly well, and in the end someone found a way to solve each problem. An additional observation is that the communities associated with the clustering file systems differ in size, and their development capacity varies a lot: some are supported by companies and others by a few active developers and small communities, which can affect the output of the community.

Finally, the tested clustering file systems appear to be developing rapidly, and the associated communities have been very active recently. The development of CacheFS has been very active since last spring, and its functionality has increased a lot during this period. OCFS2 is now included in the Linux kernel (2.6.16 and later), and this is a big step for its development. UnionFS has also released specific versions for each kernel version.

References
1.  Bonnie++: www.coker.com.au/bonnie++/.

2.  CacheFS: www.redhat.com/archives/linux-cachefs/.

3.  Fedora Core: http://fedora.redhat.com/.

4.  GFS: www.redhat.com/software/rha/gfs/.

5.  Iozone File system Benchmark: www.iozone.org/.

6.  iSCSI Enterprise Target: http://iscsitarget.sourceforge.net/.

7.  Logical Volume Manager: http://sourceware.org/lvm2/.

8.  NFS: http://nfs.sourceforge.net/.

9.  OCFS2: http://oss.oracle.com/projects/ocfs2/.

10.  Open-iSCSI project: www.open-iscsi.org/.

11.  Ubuntu: www.ubuntu.com/.

12.  UnionFS: www.unionfs.org/.

About Matti Kosola

Matti Kosola is 25 years old and about to graduate this fall as a Master of Science from Tampere University of Technology in Finland. He has a major in software engineering and minors in communications engineering and industrial management. He started his career as a research assistant in 2005 at Tampere U of Tech, working on a tool development project for the Nokia 770. Since spring 2006 he has been researching Linux clustering file systems.

About Tommi Mikkonen

Prof. Tommi Mikkonen (MSc 1992, Lic. Tech. 1995, Dr. Tech. 1999, all from Tampere University of Technology, Tampere, Finland) works on software architectures, software engineering, and open source software development at the Institute of Software Systems at Tampere U of Tech. Over the years, he has written a number of research papers and supervised theses and research projects on software engineering. At present, he is also supervising the software engineering track of a multi-disciplinary project on open source software, where the focus is on supporting software architecture and the development process.

About Jyke Jokinen

Jyke Jokinen is a Teaching Research Scientist at Tampere University of Technology. His current research interests include distributed systems, concurrent programming, and programming languages. Jokinen received his MSc in Information Technology (main subject Software Systems, subsidiary Engineering Physics) from Tampere University of Technology.


Most Recent Comments
Kevin Closson 12/05/06 02:24:25 PM EST

This is a good article. I know this is about Open Source "CFS" technology, but readers may also be interested in other testing that has been done on Linux systems with Open Source and closed source commercial products.

There is a lot of such information on my blog: kevinclosson.wordpress.com

sniper 11/06/06 07:02:00 PM EST

Could you elaborate on the individual test results? As in, what was being tested.

Also, what were the parameters for the individual filesystems? Like blocksize, etc.