This post describes how to install and configure the GlusterFS file system in an AWS environment.
1. Introduction:
Whether you are administering a small home network or an enterprise network for a large company, data storage is always a concern, whether that means a lack of disk space or an inefficient backup solution. In both cases GlusterFS can be the right tool, because it lets you scale your resources horizontally as well as vertically.
GlusterFS is a distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!) and handling thousands of clients simultaneously. It lets you build a scalable networked storage using pretty much any common hardware. It is easy to start small and add more storage later.
GlusterFS combines blocks of storage over the network into a single, large storage pool that is served to client appliances. Gluster can deliver exceptional performance for diverse workloads.
GlusterFS is free and open source software that lets people build high-performance, cost-effective data storage for a variety of demanding applications.
In this guide we will discuss two GlusterFS configurations: distributed and replicated (mirrored) data storage. As the names suggest, a distributed volume spreads your files across multiple network nodes, while a replicated volume makes sure that every file is mirrored on all nodes of the replica set.
2. Preliminary Setup:
In this tutorial we use three Linux machines for the GlusterFS installation and configuration. These can be physical or virtual machines; I will be using Amazon Linux instances.
The details of these three instances are as follows:
Server1: GlusterFS storage server, Private IP: 10.0.0.21
Server2: GlusterFS storage server, Private IP: 10.0.0.20
Client1: GlusterFS storage client, Private IP: 10.0.0.110
Note: If you want to refer to the instances by hostname or by another IP address in this tutorial, add them to the /etc/hosts file.
2.1. Configure Firewall –
Gluster makes use of the following ports:
– 24007 TCP for the Gluster Daemon
– 24008 TCP for Infiniband management (optional unless you are using IB)
– One TCP port for each brick in a volume, starting at 49152 from GlusterFS 3.4 & later. So, for example, if you have 4 bricks in a volume, ports 49152 – 49155 are used.
– 38465, 38466 and 38467 TCP for the inline Gluster NFS server.
– Additionally, port 111 TCP and UDP (all versions) for the portmapper and port 2049 TCP only (from GlusterFS 3.4 & later) for NFS should be open.
Note: by default Gluster/NFS does not provide services over UDP; it is TCP only. You would need to enable the nfs.mount-udp option if you want to add UDP support for the MOUNT protocol. That is completely optional and up to your judgement.
I am using the above configuration for the security group of the instances.
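As a rough sketch, the port list above could be turned into security group rules with the AWS CLI. The function below only prints the commands instead of running them. The group ID sg-0123456789abcdef0, the 10.0.0.0/24 source range and the function name are placeholders assumed for illustration, and the brick port range assumes GlusterFS 3.4 or later.

```shell
# Print (do not run) AWS CLI commands that open the Gluster ports listed above.
# Assumptions: the group ID and CIDR are placeholders; brick ports start at 49152.
print_gluster_sg_rules() {
    sg_id=$1     # security group ID (placeholder value in the example)
    cidr=$2      # source address range allowed to connect
    bricks=$3    # number of bricks hosted on each server
    last_brick_port=$((49152 + bricks - 1))
    for spec in "tcp 24007-24008" "tcp 49152-${last_brick_port}" \
                "tcp 38465-38467" "tcp 111" "udp 111" "tcp 2049"; do
        proto=${spec% *}    # text before the space
        port=${spec#* }     # text after the space
        echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol $proto --port $port --cidr $cidr"
    done
}

# Example: two bricks per server, clients and peers in 10.0.0.0/24.
print_gluster_sg_rules sg-0123456789abcdef0 10.0.0.0/24 2
```

Review the printed commands before pasting them into a shell, and drop the NFS-related lines if you only use the native GlusterFS client.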
2.2. Format and mount the bricks –
I have attached an additional 2 GB volume to both Server1 and Server2, which we will use to build the GlusterFS pool. Now run these commands on both Server1 and Server2.
Switch to root user
$ sudo su -
Install administration and debugging tools for the XFS file system
# yum install xfsprogs -y
To view your available disk devices and their mount points
# lsblk
Create xfs filesystems
# mkfs.xfs /dev/xvdf
Create mountpoint
# mkdir -p /exports/data
To mount this volume on every reboot, add this entry to /etc/fstab (using the same device that was formatted above)
# echo "/dev/xvdf /exports/data xfs defaults 0 1" >> /etc/fstab
To mount all file systems in /etc/fstab
# mount -a
To check if mount is working
# mount | grep /exports/data
Create the brick directory. In GlusterFS, a brick is a directory on a storage filesystem that has been assigned to a volume
# mkdir -p /exports/data/brick
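The fstab step above appends a new line every time it is run, so repeating it leaves duplicate entries. A minimal sketch of a guard against that is shown below; the function name add_fstab_entry is hypothetical, and the example writes to a scratch file rather than the real /etc/fstab.

```shell
# Append an fstab entry only if that exact line is not already present.
# The second argument (the file to edit) defaults to /etc/fstab; pointing it
# at a scratch file makes the function safe to try out.
add_fstab_entry() {
    entry=$1
    fstab=${2:-/etc/fstab}
    grep -qxF -- "$entry" "$fstab" 2>/dev/null || printf '%s\n' "$entry" >> "$fstab"
}

# Try it against a temporary file: the second call adds nothing.
scratch=$(mktemp)
add_fstab_entry "/dev/xvdf /exports/data xfs defaults 0 1" "$scratch"
add_fstab_entry "/dev/xvdf /exports/data xfs defaults 0 1" "$scratch"
cat "$scratch"
rm -f "$scratch"
```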
2.3. Software –
Gluster version = glusterfs 3.6.1 built on Nov 7 2014
3. GlusterFS Installation:
The GlusterFS server needs to be installed on all hosts you wish to add to your final storage volume. In our case that is Server1 and Server2.
3.1. On Both Servers –
Update all Packages
# yum update -y
The Amazon Linux AMI does not have the GlusterFS repo enabled, so we need to add it.
# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
Fix the repo file for Amazon Linux by pinning $releasever to 6
# sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo
Use the following command on both servers to install the GlusterFS server
# yum install -y glusterfs-server
Start gluster server
# service glusterd start
Start this service on boot
# chkconfig glusterd on
Confirm that glusterd is running on both servers
# service glusterd status
Load the FUSE kernel module
# modprobe fuse
3.2. Configure the trusted pool –
First, we need to make both GlusterFS servers talk to each other, which means that we are effectively creating a pool of trusted servers. Run the following on Server1:
# gluster peer probe <Server2 IP>
The above command adds Server2 to the trusted server pool. These settings are replicated across all connected servers, so you do not have to run the command on the other server.
Check the status of trusted storage pool on both servers
# gluster peer status
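To check the pool from a script rather than by eye, one option is to count the peers reported as connected. The sketch below reads `gluster peer status` output on standard input; the function name is hypothetical, and the sample text assumes the usual "State: Peer in Cluster (Connected)" line and a made-up UUID.

```shell
# Count peers reported as connected in `gluster peer status` output (stdin).
count_connected_peers() {
    grep -c 'State: Peer in Cluster (Connected)'
}

# Example with sample output; on a server you would instead run:
#   gluster peer status | count_connected_peers
printf '%s\n' \
    'Number of Peers: 1' \
    '' \
    'Hostname: 10.0.0.20' \
    'Uuid: 00000000-0000-0000-0000-000000000000' \
    'State: Peer in Cluster (Connected)' | count_connected_peers
```

With the two servers in this tutorial, each server should report one connected peer.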
4. Storage configuration:
4.1. Distributed storage configuration
4.1.1. Create storage volume –
We can use both servers to define a new storage volume consisting of two bricks, one from each server. This command creates a new volume named dist-vol; replace <folder-name> with the brick directory on each server.
# gluster volume create dist-vol <Server1 IP>:/<folder-name> <Server2 IP>:/<folder-name>
Start storage volume
# gluster volume start dist-vol
Show the details of the created volume.
# gluster volume info dist-vol
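For illustration, the create command can be assembled from a list of servers. The sketch below only prints the command; it assumes every server uses the same brick path and takes /exports/data/brick from section 2.2 as that path, which is an assumption, since the tutorial leaves <folder-name> open. The function name is hypothetical.

```shell
# Print the `gluster volume create` command for one brick path on several servers.
# Usage: volume_create_cmd VOLNAME BRICK_PATH SERVER...
volume_create_cmd() {
    volname=$1
    brick_path=$2
    shift 2
    cmd="gluster volume create $volname"
    for server in "$@"; do
        cmd="$cmd $server:$brick_path"   # one brick per server
    done
    echo "$cmd"
}

# Example with the two storage servers from this tutorial.
volume_create_cmd dist-vol /exports/data/brick 10.0.0.21 10.0.0.20
```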
4.1.2. Setting up Client –
Now that we have created a new GlusterFS volume, we can use the GlusterFS client to mount this volume on any host. Log in to the client host and install the GlusterFS client:
Login as root
$ sudo su -
Update all Packages
# yum update -y
The Amazon Linux AMI does not have the GlusterFS repo enabled, so we need to add it.
# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
Fix the repo file for Amazon Linux by pinning $releasever to 6
# sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo
Install the GlusterFS client
# yum install glusterfs-client -y
Next, create a mount point to which you will mount your new dist-vol GlusterFS volume, for example export-dist
# mkdir /mnt/export-dist
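Now we can mount the dist-vol GlusterFS volume with the mount command. The address of either storage server can be used; the client fetches the volume layout from it and then talks to all bricks directly.
# mount -t glusterfs <Server1 IP>:/dist-vol /mnt/export-dist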
All should be ready. Use the mount command to see whether you have mounted the GlusterFS volume correctly.
# mount | grep glusterfs
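The mount above does not survive a reboot. If you want it to, a commonly used approach is an /etc/fstab entry with the _netdev option so the mount waits for the network. The sketch below only builds such a line; the function name is hypothetical and the options shown are an assumption you may want to adjust.

```shell
# Build an /etc/fstab line for a GlusterFS client mount.
# Usage: gluster_fstab_line SERVER VOLNAME MOUNTPOINT
gluster_fstab_line() {
    server=$1
    volname=$2
    mountpoint=$3
    # _netdev delays the mount until networking is available
    echo "$server:/$volname $mountpoint glusterfs defaults,_netdev 0 0"
}

# Example for the distributed volume, served from Server1.
gluster_fstab_line 10.0.0.21 dist-vol /mnt/export-dist
```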
4.1.3. Testing GlusterFS distributed configuration –
Everything is ready so we can start some tests. On the client side create 4 files in the GlusterFS mounted directory:
Client1# cd /mnt/export-dist
Client1# touch file1 file2 file3 file4
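GlusterFS places each file on exactly one brick, chosen by hashing the file name, so the four files are spread across the bricks of the dist-vol volume. List the brick directory on each server (the directory you passed to gluster volume create) to see which files it received. Server1 will contain some of them:
Server1# ls /exports/dist-data/
and Server2 will contain the rest:
Server2# ls /dist-data
The split is not necessarily even, so your results may differ.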
4.2. Replicated storage configuration
The procedure for creating a replicated GlusterFS volume is similar to that for the distributed volume explained earlier. In fact, the only difference is the way the GlusterFS volume is created. But let’s go through it again from the start:
4.2.1. Create storage volume –
We can use both servers to define a new storage volume consisting of two bricks, one from each server. This command creates a new volume named repl-vol; the replica 2 option tells GlusterFS to keep a copy of every file on both bricks.
# gluster volume create repl-vol replica 2 <Server1 IP>:/<folder-name> <Server2 IP>:/<folder-name>
Start storage volume
# gluster volume start repl-vol
Show the details of the created volume.
# gluster volume info repl-vol
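When a replicated volume is created, the number of bricks has to be a multiple of the replica count. A small sanity check along those lines could look like the sketch below; the function name is hypothetical.

```shell
# Succeed only if the brick count is a positive multiple of the replica count.
# Usage: replica_layout_ok REPLICA_COUNT BRICK_COUNT
replica_layout_ok() {
    replica=$1
    bricks=$2
    [ "$replica" -gt 0 ] && [ "$bricks" -gt 0 ] && [ $((bricks % replica)) -eq 0 ]
}

# Two bricks with replica 2 is the layout used in this tutorial.
if replica_layout_ok 2 2; then echo "2 bricks, replica 2: ok"; fi
if ! replica_layout_ok 2 3; then echo "3 bricks, replica 2: rejected"; fi
```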
4.2.2. Setting up Client –
Now that we have created a new GlusterFS volume, we can use the GlusterFS client to mount this volume on any host. Log in to the client host and install the GlusterFS client.
Login as root
$ sudo su -
Update all Packages
# yum update -y
The Amazon Linux AMI does not have the GlusterFS repo enabled, so we need to add it.
# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
Fix the repo file for Amazon Linux by pinning $releasever to 6
# sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo
Install the GlusterFS client.
# yum install glusterfs-client -y
Next, create a mount point to which you will mount your new repl-vol GlusterFS volume, for example export-repl
# mkdir /mnt/export-repl
Now we can mount the repl-vol GlusterFS volume with the mount command. As before, the address of either storage server can be used.
# mount -t glusterfs <Server1 IP>:/repl-vol /mnt/export-repl
All should be ready. Use the mount command to see whether you have mounted the GlusterFS volume correctly.
# mount | grep export-repl
4.2.3. Testing GlusterFS replicated configuration –
Everything is ready so we can start some tests. On the client side create 4 files in the GlusterFS mounted directory.
Client1# cd /mnt/export-repl
Client1# touch file1 file2 file3 file4
GlusterFS will now write every file to both bricks of the repl-vol volume, so each server holds a full copy. Therefore, Server1 will contain all four files
Server1# ls /exports/data/brick2
and Server2 will contain the same four files
Server2# ls /exports/data/brick2
5. Security Settings:
In addition to the above configuration, you can make a volume more secure by allowing only certain hosts to mount it. For example, if we want only the host 10.0.0.110 to be allowed to access the volume repl-vol, we run the following command on either server.
# gluster volume set repl-vol auth.allow 10.0.0.110
To allow the entire subnet instead, use an asterisk as a wildcard.
# gluster volume set repl-vol auth.allow 10.0.0.*
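To get a feel for which clients a given auth.allow value would admit, the sketch below checks an address against a comma-separated list of patterns. It uses shell glob matching as a stand-in for Gluster's own matching, which is an approximation, and the function name is hypothetical.

```shell
# Succeed if the IP matches any pattern in a comma-separated allow list.
# Runs in a subshell (note the parentheses) so IFS and globbing changes stay local.
ip_allowed() (
    ip=$1
    IFS=,
    set -f    # keep * from being expanded against file names
    for pattern in $2; do
        case $ip in
            $pattern) exit 0 ;;   # unquoted, so * acts as a wildcard
        esac
    done
    exit 1
)

# The client from this tutorial is inside 10.0.0.*, an outside host is not.
ip_allowed 10.0.0.110 '10.0.0.*' && echo "10.0.0.110 allowed"
ip_allowed 10.0.1.5 '10.0.0.*' || echo "10.0.1.5 rejected"
```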
That’s it! GlusterFS is now up and running.
6. Important Links:
GlusterFS: http://www.gluster.org/
GlusterFS 3.2 Documentation: http://www.gluster.org/documentation/
Gluster Community main page: http://www.gluster.org/community/documentation/index.php/Main_Page