Installation and Configuration of GlusterFS filesystem on Amazon Linux EC2 Instances

This post describes the installation and configuration of the GlusterFS file system in an AWS environment.

1. Introduction:

Whether you are administering a small home network or an enterprise network for a large company, data storage is always a concern, whether in terms of lack of disk space or an inefficient backup solution. In both cases GlusterFS can be the right tool to fix your problem, as it allows you to scale your resources horizontally as well as vertically.

GlusterFS is a distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!) and handling thousands of clients simultaneously. It lets you build scalable networked storage using pretty much any common hardware. It is easy to start small and add more storage later.

GlusterFS combines blocks of storage over the network into a single, large storage pool that is served to client appliances. Gluster can deliver exceptional performance for diverse workloads.

GlusterFS is free and open source software that enables people to build high-performance, great-value data storage for a variety of demanding applications.

In this guide we will discuss two GlusterFS configuration methods: distributed and replicated (mirrored) data storage. As the names suggest, GlusterFS's distributed mode spreads your files across multiple network nodes, while replicated mode makes sure that all of your data is mirrored across all network nodes.

2. Preliminary Setup:

In this tutorial we use three Linux machines for the GlusterFS installation and configuration. These can be physical or virtual machines; I will be using Amazon Linux EC2 instances.

The details of these three instances are as follows:

Server1: GlusterFS storage server, Private IP: 10.0.0.21

Server2: GlusterFS storage server, Private IP: 10.0.0.20

Client1: GlusterFS storage client, Private IP: 10.0.0.110

Note: If you want to use other IPs or hostnames for the instances in this tutorial, you have to add them to the /etc/hosts file.
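For example, to refer to the machines by name rather than IP, entries like the following (the hostnames are illustrative) could be appended to /etc/hosts on all three instances:

10.0.0.21   server1
10.0.0.20   server2
10.0.0.110  client1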

2.1. Configure Firewall –

Gluster makes use of the following ports:

– 24007 TCP for the Gluster daemon
– 24008 TCP for InfiniBand management (optional unless you are using IB)
– One TCP port for each brick in a volume; for example, a volume with 4 bricks uses ports 49152 – 49155 (GlusterFS 3.4 and later)
– 38465, 38466 and 38467 TCP for the inline Gluster NFS server
– Port 111 TCP and UDP (all versions) and port 2049 TCP-only (GlusterFS 3.4 and later) for the port mapper

Note: by default Gluster/NFS does not provide services over UDP; it is TCP only. You would need to enable the nfs.mount-udp option if you want to add UDP support for the MOUNT protocol. That is completely optional and up to your judgement.

I am using the above port configuration for the instances' security group.
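If you prefer the AWS CLI over the console, ingress rules along these lines open the core Gluster ports to this tutorial's subnet (a sketch; the security group ID is a placeholder):

# aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 24007-24008 --cidr 10.0.0.0/24
# aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 49152-49155 --cidr 10.0.0.0/24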

2.2. Format and mount the bricks –

I have attached an additional 2 GB EBS volume to each of Server1 and Server2, which we will use to build the GlusterFS pool. Now run these commands on both Server1 and Server2.

Switch to root user

$ sudo su -

Install administration and debugging tools for the XFS file system

# yum install xfsprogs -y

To view your available disk devices and their mount points

# lsblk

Create an XFS file system on the new volume

# mkfs.xfs /dev/xvdf

Create a mount point

# mkdir -p /exports/data

To mount this volume automatically on every reboot, add this entry to /etc/fstab:

# echo "/dev/xvdf /exports/data xfs defaults 0 1" >> /etc/fstab

To mount all file systems in /etc/fstab

# mount -a

To check that the mount is working

# mount | grep /exports/data

Create a directory on this file system to serve as a GlusterFS brick

# mkdir -p /exports/data/brick

2.3. Software –

Gluster version = glusterfs 3.6.1 built on Nov 7 2014

3. GlusterFS Installation:

The GlusterFS server needs to be installed on all hosts you wish to add to your final storage volume; in our case, Server1 and Server2.

3.1 On Both Servers –

Update all Packages

# yum update -y

The Amazon Linux AMI does not ship with a GlusterFS repository, so we need to add one.

# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo

Fix the repo file for Amazon Linux, which reports its own $releasever, by pinning it to the EL6 packages

# sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo

Use the following command on both servers to install the GlusterFS server

# yum install -y glusterfs-server

Start the Gluster daemon

# service glusterd start

Enable the service to start on boot

# chkconfig glusterd on

Confirm that glusterd is running on both servers

# service glusterd status

Load the FUSE module into the kernel

# modprobe fuse
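To verify that the module loaded, check the kernel's module list:

# lsmod | grep fuse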


3.2 Configure the trusted pool –

First, we need to make the two GlusterFS servers talk to each other, effectively creating a pool of trusted servers.

# gluster peer probe <Server2 IP>

The above command, run from Server1, adds Server2 to the trusted server pool. These settings are replicated across all connected servers, so you do not have to run the command on the other server.

Check the status of the trusted storage pool on both servers

# gluster peer status


4. Storage configuration:

4.1. Distributed storage configuration

4.1.1. Create storage volume –

We can use both servers to define a new storage volume consisting of two bricks, one from each server. This command creates a new volume named dist-vol from those two bricks.

# gluster volume create dist-vol <Server1 IP>:/<folder-name>  <Server2 IP>:/<folder-name>
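For example, with the private IPs from this tutorial and the brick directory created in section 2.2 (a sketch; adjust the paths to your own layout):

# gluster volume create dist-vol 10.0.0.21:/exports/data/brick 10.0.0.20:/exports/data/brick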

Start storage volume

# gluster volume start dist-vol

Get the status of the created volume.

# gluster volume info dist-vol

4.1.2. Setting up the Client –

Now that we have created a new GlusterFS volume, we can use the GlusterFS client to mount it on any host. Log in to the client host and install the GlusterFS client:

Log in as root

$ sudo su -

Update all Packages

# yum update -y

The Amazon Linux AMI does not ship with a GlusterFS repository, so we need to add one.

# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo

Fix the repo file for Amazon Linux, as before

# sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo

Install the GlusterFS client

# yum install glusterfs-client -y

Next, create a mount point to which you will mount your new dist-vol GlusterFS volume, for example export-dist

# mkdir /mnt/export-dist

Now we can mount the dist-vol GlusterFS volume with the mount command. You can point the client at either server; it only uses that address to fetch the volume layout and then talks to all bricks directly.

# mount -t glusterfs <Server1 or Server2 IP>:/dist-vol /mnt/export-dist
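If the server named in the mount command is down at mount time, the mount fails, so the mount helper also accepts a fallback server option; a sketch using this tutorial's IPs (check man mount.glusterfs for the exact option name in your release):

# mount -t glusterfs -o backupvolfile-server=10.0.0.20 10.0.0.21:/dist-vol /mnt/export-dist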

All should be ready. Use the mount command to see whether you have mounted the GlusterFS volume correctly.

# mount | grep glusterfs

4.1.3. Testing the GlusterFS distributed configuration –

Everything is ready, so we can start some tests. On the client side, create 4 files in the GlusterFS-mounted directory:

Client1# cd /mnt/export-dist
Client1# touch file1 file2 file3 file4

GlusterFS will now distribute the files among the bricks in the dist-vol volume. Therefore, Server1 will contain some of them:

Server1# ls /exports/data/brick

Screenshot_18and Server2 will contain:

Server2# ls /exports/data/brick

Your results may differ: the distributed translator places each file on a brick based on a hash of its name, so the split will rarely be exactly even.

4.2. Replicated storage configuration

The procedure for creating a replicated GlusterFS volume is similar to that of the distributed volume explained earlier. In fact, the only difference is how the GlusterFS volume is created. But let's go again from the start:

4.2.1. Create storage volume –

We can use both servers to define a new storage volume consisting of two bricks, one from each server. This command creates a new volume named repl-vol with two replicated bricks.

# gluster volume create repl-vol replica 2 <Server1 IP>:/<folder-name>  <Server2 IP>:/<folder-name>
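For example, with this tutorial's IPs (a sketch; a brick directory already used by dist-vol cannot be reused, hence the second directory brick2, which also appears in the test below):

# gluster volume create repl-vol replica 2 10.0.0.21:/exports/data/brick2 10.0.0.20:/exports/data/brick2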

Start storage volume

# gluster volume start repl-vol

Get the status of created volume.

# gluster volume info repl-vol

4.2.2. Setting up the Client –

Now that we have created a new GlusterFS volume, we can use the GlusterFS client to mount it on any host. Log in to the client host and install the GlusterFS client.

Log in as root

$ sudo su -

Update all Packages

# yum update -y

The Amazon Linux AMI does not ship with a GlusterFS repository, so we need to add one.

# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo

Fix the repo file for Amazon Linux, as before

# sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo

Install the GlusterFS client.

# yum install glusterfs-client -y

Next, create a mount point to which you will mount your new repl-vol GlusterFS volume, for example export-repl

# mkdir /mnt/export-repl

Now, we can mount the repl-vol GlusterFS volume with the mount command.

# mount -t glusterfs <Server1 or Server2 IP>:/repl-vol /mnt/export-repl
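To remount the volume automatically at boot, an /etc/fstab entry along these lines can be added on the client (a sketch using Server1's IP; _netdev delays mounting until the network is up):

# echo "10.0.0.21:/repl-vol /mnt/export-repl glusterfs defaults,_netdev 0 0" >> /etc/fstab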

All should be ready. Use the mount command to see whether you have mounted the GlusterFS volume correctly.

# mount | grep export-repl

4.2.3. Testing the GlusterFS replicated configuration –

Everything is ready, so we can start some tests. On the client side, create 4 files in the GlusterFS-mounted directory.

Client1# cd /mnt/export-repl
Client1# touch file1 file2 file3 file4

GlusterFS will now replicate every file to both bricks in the repl-vol volume. Therefore, Server1 will contain all four files:

Server1# ls /exports/data/brick2

and Server2 will contain the same four files:

Server2# ls /exports/data/brick2


5. Security Settings:

In addition to the above configuration, you can make the entire volume more secure by allowing only certain hosts to mount it. For example, if we want only the client with IP 10.0.0.110 to be allowed to access the volume repl-vol, we use the following command on any one server.

# gluster volume set repl-vol auth.allow 10.0.0.110

If you need to allow an entire subnet, simply use an asterisk.

# gluster volume set repl-vol auth.allow 10.0.0.*
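To confirm that the restriction took effect, the option shows up in the volume info output:

# gluster volume info repl-vol | grep auth.allow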


That's it! GlusterFS is up and running now.

6. Important Links:

GlusterFS: http://www.gluster.org/

GlusterFS Documentation: http://www.gluster.org/documentation/

Gluster Community main page: http://www.gluster.org/community/documentation/index.php/Main_Page

How to Install Puppet master and agent on Amazon Linux EC2 Instances

1. What is Puppet ?

Puppet is an open source, Ruby-based configuration management tool that allows you to automate repetitive tasks such as the installation of applications and services, patch management, and deployments. Puppet automates tasks that sysadmins often do manually, freeing up time and mental space so they can work on projects that deliver greater business value.

Use the official Puppet Labs website for a complete description of what Puppet is and how it works.

2. How do I get Puppet ?

On Amazon Linux, the puppet-server and puppet packages are available directly through yum, which is how we will install them in the steps below.

3. Prerequisites :

1. Create Amazon Linux AMI EC2 instances for Puppet master and agents.

2. Configure a security group to ensure that these ports are open on your new puppet master instance.

TCP 8140 - Agents will talk to the master on this port
TCP 22 - To login to the server/instance using SSH

4. Installing Puppet :

Step 1 : Install Puppet on the Puppet Master Server –

On your puppet master node, run the following command:

$ sudo yum install puppet-server

This will install Puppet and an init script (/etc/init.d/puppetmaster) for running a puppet master server.

When the installation is done, set the Puppet server to automatically start on boot and turn it on.

$ sudo chkconfig puppetmaster on
$ sudo service puppetmaster start

Step 2: Install Puppet on Agent Nodes –

On your other nodes, run the following command:

$ sudo yum install puppet

This will install Puppet and an init script (/etc/init.d/puppet) for running the puppet agent daemon.

When the installation finishes, make sure that Puppet will start after boot and turn it on.

$ sudo chkconfig puppet on
$ sudo service puppet start
Note : At this point Puppet is installed and running on both the master and the agent nodes.

5. Configuring Puppet :

Step 1 : Take note of your instance’s FQDN and IP address –

Use this command to find the fully qualified domain name (FQDN) of the instance/server:

$ hostname -f

Use this command to find the IP address of the instance/server:

$ hostname -i

Step 2 : On puppet agent nodes –

By default, all puppet agents try to find a master named puppet. So you must add an entry in DNS or in the /etc/hosts file on each agent, such as:

<IP-address-of-puppet-server>  puppet

so that the name puppet resolves to your puppet master's IP address.
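For example, on an agent (the master's IP here is illustrative):

$ echo "10.0.0.50  puppet" | sudo tee -a /etc/hosts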

Now restart the puppet service on the agents:

$ sudo service puppet restart

Step 3 : Sign the Node’s Certificate –

In an agent/master deployment, a master must approve a certificate request for each agent node before that node can fetch configurations. Agent nodes will request certificates the first time they attempt to run.

We need to tell our Puppet agent to contact the Puppet master. Run this command on the agent node:

$ sudo puppet agent --test

You may see something like the following output. Don't panic; this is because the agent node is not yet verified by the Puppet master server.

Exiting; failed to retrieve certificate and waitforcert is disabled

Now go back to your puppet master server and check the certificate signing requests:

# puppet cert list

You will see a list of all the agent nodes that have requested certificate signing from your puppet master. Find the hostname of your agent node and sign it using the following command.

# puppet cert sign agent-node
Note : 'agent-node' is the FQDN of your agent node.
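If several agents are waiting, all pending requests can be signed at once; sign only hosts you recognize:

# puppet cert sign --all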

Congratulations!!

Now you have a working puppet master and agent. But there are no manifests on the Puppet master yet, so the agent has no configuration to apply to itself. To apply configuration to a Puppet agent we have to create manifests on the puppet master.
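As a quick preview, a minimal sketch of a main manifest on the master could look like this (/etc/puppet/manifests/site.pp is the default main manifest location for this Puppet version; the file resource is purely illustrative):

node default {
  # Applied to every agent that has no more specific node definition
  file { '/tmp/hello.txt':
    ensure  => file,
    content => "Managed by Puppet\n",
  }
}

On its next run (or when you run puppet agent --test), the agent will fetch and apply this catalog.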

In my next post we will learn about how to create some basic manifests and apply them to puppet nodes.