Introduction

I've been working on a lot of HA projects lately. During one of them (a CloudLinux implementation from Parallels) I wrote a document on load balancing, and one of its sub-topics was using a clustered file system to store application and session data for a Magento shopping cart running across a number of servers. So I decided I should build the cluster and do some performance testing on it.

This document covers the build process for a cluster of three CentOS 6.5 (minimal build) servers hosting GlusterFS version 3.5. While it is going to host Magento eCommerce stores in a clustered HA environment, it could be used for anything that needs shared data access, so I first needed to make sure the build process was straightforward and the server farm was going to be reliable. Then I could publish a paper on it.

The Hardware

I grabbed some old IBM 326m servers we had lying around. I could have built this from our pool of HP DL series boxes, or from the collection of Dell 2950 servers we ripped out and replaced with “R” series servers last year; the hardware didn't matter, it just had to fire up and work. The IBMs I grabbed had two disks each (a mix of 73, 146 and 300 GB Ultra 320 disks) configured in a RAID-1 mirror.

Architectural Background

The design goal is to have the GlusterFS volumes available to a virtualized, container-based environment. The containers don’t have access to physical disk, so they need a network file system, and since the application will have multiple front ends receiving traffic, the data has to be clustered. In my first test case, the application is a distributed Magento eCommerce application running on Nginx with PHP-FPM and a MySQL DB. The goal is to have user session data stored in the cluster (not the database). In addition, a copy of the Magento application code as well as product images and related media are also stored there. To significantly improve the performance of the Magento app, every performance tweak possible, including using a RAM disk to hold the files, will be implemented. Later, I will move the database to its own Galera cluster and trial that, then do a paper on “Scaling eCommerce Sites”.

The reason behind all this is a new cloud-based Magento offering at work, so I wanted to build a reference site from scratch using the latest CentOS 6.5 minimal build and yum package installs for everything. The data for the cluster will be stored in LVM logical volumes; this allows me to cut disks as needed and present them as GlusterFS “bricks”, and LVM gives me the ability to easily change disk sizing and allocations on the fly. You could also use this for a POP3/IMAP/SMTP mail system, with the mailboxes resident on the cluster and the SMTP servers load balanced… the list of uses is endless.

To ensure the front-end nodes can operate at top speed with the latest images and code, I was going to use a cron job to check the files for differences (MD5 hashing) and rsync them from the cluster into the RAM disk as changes occurred; all supporting software would live on the cluster and run as needed. The other advantage is the ability to fire up more nodes and add them to the cluster on the fly. The only data needing direct access would be the session data, and placing it in the cluster removes hits on the database. I also have a front-end web site design that needs access to lots of JSON data files created by remote applications in the backend, and this design would let me test the performance of that system too, killing two birds with one stone.
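As a rough sketch of that sync job (the paths, RAM disk location and script name here are hypothetical, not a finished production script), something like this run from cron every minute or two would do it:

#!/bin/bash
# sync-appcode.sh - hypothetical sketch: refresh the RAM disk copy of the
# application code from the Gluster mount whenever the content changes.
SRC=/mnt/glusterfs/magento-app      # Gluster-backed master copy (assumed path)
DST=/dev/shm/magento-app            # RAM disk copy served by Nginx/PHP-FPM (assumed path)
STAMP=/var/run/magento-app.md5

mkdir -p "${DST}"

# Hash the source tree and only rsync when it differs from the last run
NEW_MD5=$(find "${SRC}" -type f -exec md5sum {} + | sort | md5sum | cut -d' ' -f1)
OLD_MD5=$(cat "${STAMP}" 2>/dev/null)

if [ "${NEW_MD5}" != "${OLD_MD5}" ]; then
    rsync -a --delete "${SRC}/" "${DST}/"
    echo "${NEW_MD5}" > "${STAMP}"
fi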

For the most part the servers are nothing special: minimal OS build, dual network interfaces, dual multicore CPUs and direct-attached disk in a RAID configuration. The OS services are toned down to provide only what’s needed, so chkconfig is used to turn on the required services and disable the rest. Apart from the sysadmin logging in to set the volumes up, no other user accounts or special software is needed. Ideally, the network for the cluster should sit in a separate VLAN so the clients can reach it without the traffic impeding the normal flow on the public side of the clients; in this case a 192.168.2.0/24 network is being used.

Setting up the right Repos

First, install the Gluster repo file on each server; it is downloaded from the Gluster website and dropped into the yum repo directory:

#cd /etc/yum.repos.d
#wget http://download.gluster.org/pub/gluster/glusterfs/repos/YUM/glusterfs-3.4/3.4.0/CentOS/glusterfs-epel.repo

To be safe, I removed all traces of any existing Gluster packages with a one-line shell script:

#for file in `yum list all | grep glusterfs | cut -d' ' -f1`; do yum remove ${file} -y; done

Then I installed the server component which would bring all needed dependencies:

#yum install glusterfs-server -y

Install 8 Package(s)

Total download size: 2.3 M
Installed size: 7.6 M
Downloading Packages:
(1/8): glusterfs-3.5.0-2.el6.x86_64.rpm | 1.2 MB 00:06
(2/8): glusterfs-cli-3.5.0-2.el6.x86_64.rpm | 121 kB 00:00
(3/8): glusterfs-fuse-3.5.0-2.el6.x86_64.rpm | 91 kB 00:00
(4/8): glusterfs-libs-3.5.0-2.el6.x86_64.rpm | 247 kB 00:01
(5/8): glusterfs-server-3.5.0-2.el6.x86_64.rpm | 555 kB 00:02
(6/8): libgssglue-0.1-11.el6.x86_64.rpm | 23 kB 00:00
(7/8): libtirpc-0.2.1-6.el6_5.1.x86_64.rpm | 79 kB 00:00
(8/8): rpcbind-0.2.0-11.el6.x86_64.rpm | 51 kB 00:00
—————————————————————————————————————————
Total 111 kB/s | 2.3 MB 00:21
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : glusterfs-libs-3.5.0-2.el6.x86_64 1/8
Installing : glusterfs-3.5.0-2.el6.x86_64 2/8
Installing : libgssglue-0.1-11.el6.x86_64 3/8
Installing : libtirpc-0.2.1-6.el6_5.1.x86_64 4/8
Installing : rpcbind-0.2.0-11.el6.x86_64 5/8
Installing : glusterfs-fuse-3.5.0-2.el6.x86_64 6/8
Installing : glusterfs-cli-3.5.0-2.el6.x86_64 7/8
Installing : glusterfs-server-3.5.0-2.el6.x86_64 8/8
Verifying : rpcbind-0.2.0-11.el6.x86_64 1/8
Verifying : glusterfs-fuse-3.5.0-2.el6.x86_64 2/8
Verifying : glusterfs-3.5.0-2.el6.x86_64 3/8
Verifying : glusterfs-cli-3.5.0-2.el6.x86_64 4/8
Verifying : glusterfs-server-3.5.0-2.el6.x86_64 5/8
Verifying : libtirpc-0.2.1-6.el6_5.1.x86_64 6/8
Verifying : glusterfs-libs-3.5.0-2.el6.x86_64 7/8
Verifying : libgssglue-0.1-11.el6.x86_64 8/8

Installed:
glusterfs-server.x86_64 0:3.5.0-2.el6

Dependency Installed:
glusterfs.x86_64 0:3.5.0-2.el6 glusterfs-cli.x86_64 0:3.5.0-2.el6 glusterfs-fuse.x86_64 0:3.5.0-2.el6 glusterfs-libs.x86_64 0:3.5.0-2.el6 libgssglue.x86_64 0:0.1-11.el6 libtirpc.x86_64 0:0.2.1-6.el6_5.1
rpcbind.x86_64 0:0.2.0-11.el6

Complete!

Creating Space for the “Brick”

Gluster presents one or more volumes to all servers configured in the cluster. Each volume is made up of “bricks”; logically, each server holds a brick for the cluster (and if you present a lot of volumes, you will need to create a lot of bricks). In this build the bricks are just XFS-formatted file systems sitting in LVM logical volumes.

For this exercise I was going to have each server present one brick for a single volume to test the concept. The brick would be a 20 GB XFS-formatted file system (rather than ext4). I was also going to configure three replicas of the brick, which really calls for n+1 servers (so four for safety); instead I will go with three servers and the replica count set to 2 just for testing. Most clustered cloud offerings incorporate a replica facility and three is a common number, so storage failures do not become an issue because the cluster automatically maintains enough copies of the data to satisfy the replica count.
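For reference, a three-way replica with one brick per server would be created with a command along these lines (same IP scheme and brick paths as used later in this article; this is a sketch of the option I decided against, not something I actually ran):

gluster volume create gfsvol1 replica 3 192.168.2.82:/data/gluster/gfs1/gfsvol1 192.168.2.83:/data/gluster/gfs2/gfsvol1 192.168.2.84:/data/gluster/gfs3/gfsvol1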

Of “Bricks” and Filesystems

To host the brick, you need a valid, recognized filesystem mounted, so I created a mount point under /data/gluster. I always use /data for any application or user-created data that sits outside the filesystems making up the OS; this means I can rebuild the OS and replicate the /data directory without any issues. The mount point /data/gluster/gfs1 is where I will mount the logical volume that will hold the GlusterFS brick. If I present more volumes, I would use a naming scheme like lvvol1, lvvol2, etc., and mount them under /data/gluster.

#mkdir -p /data/gluster/gfs1
# vgs

VG       #PV #LV #SN Attr   VSize   VFree
vg_data    1   1   0 wz--n- 134.69g 114.69g
vg_os      1   5   0 wz--n- 138.69g 122.69g

#lvcreate -L +20G -n lv_gfs1 vg_data
#mkfs.xfs -i size=512 /dev/vg_data/lv_gfs1

If XFS is not installed:

#yum install xfsprogs xfsdump -y

After creating the mount point and the logical volume, I added the volume to /etc/fstab so it would be mounted automatically after a reboot.

Edit /etc/fstab and add the logical volume:

/dev/vg_data/lv_gfs1    /data/gluster/gfs1      xfs     noatime,defaults 0 0

Now we mount the logical volume and create a directory in it to hold the volume. Notice I used noatime: there is no need to write access times to the file system, and this should reduce disk I/O slightly. Also, the volume we create in Gluster will be called gfsvol1, so I matched the directory name to it.

#mount -a
#cd /data/gluster/gfs1
#mkdir gfsvol1

Getting set up, ready to create the volume

There is still some housekeeping to do… disable SELinux, temporarily turn off iptables, and configure the required services to start on reboot.

setenforce 0
iptables -F

service iptables stop
service glusterd start
gluster volume info
Should show -> "No Volumes Present"
chkconfig --level 3 glusterd on
chkconfig --level 3 glusterfsd on
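Note that setenforce 0 only lasts until the next reboot. To keep SELinux out of the way permanently (fine for this lab build; review that decision for production), the usual approach is to change the mode in /etc/selinux/config, for example:

sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config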

At this point you need to build all the servers before you can create your volume; the volume create command sets up the replicas as it creates the volume, and if the required number of servers is not present it won't do it.

I repeated the build process for the other two servers, using lv_gfs2 (and mount point gfs2) for the second server and lv_gfs3 (gfs3) for the third.

Creating the Cluster – Peers

From the first server, probe the other servers using:

#gluster peer probe 192.168.2.83
#gluster peer probe 192.168.2.84

If all goes well, each peer is detected and added to the list of trusted peers.
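You can confirm the pool from any node with the peer status command; the exact formatting varies between Gluster releases, but each probed node should be listed with a state of “Peer in Cluster (Connected)”:

gluster peer status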

Creating Volumes

Create the volume directory if you didn't do it earlier. In my case I created “gfsvol1” on each server, which just means creating a directory under the mount point. This directory is the “brick” referred to in the Gluster documentation.

#mkdir /data/gluster/gfs1/gfsvol1

Then, on the first node we created, create the volume gfsvol1 using the following syntax, then start the volume if all is good:

#gluster volume create gfsvol1 replica 2 192.168.2.82:/data/gluster/gfs1/gfsvol1 192.168.2.83:/data/gluster/gfs2/gfsvol1
volume create: gfsvol1: success: please start the volume to access data

The observant will notice that I have only created a volume from two bricks on two servers. I had a failure while building server 3 (that's what happens when you grab a server being used as a door stop!).

Start the volume as requested

Starting the volume is as simple as:

#gluster volume start gfsvol1
volume start: gfsvol1: success

Viewing the cluster status

#gluster volume info all

Volume Name: gfsvol1
Type: Replicate
Volume ID: dee53935-0659-40da-b1c1-2d66d9bd12e3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.2.82:/data/gluster/gfs1/gfsvol1
Brick2: 192.168.2.83:/data/gluster/gfs2/gfsvol1
[root@cnx-gfs-02 ~]#

Testing is easy: mount the volume (from a client or on one of the servers), copy some files into the mount, and they should appear in the brick directories on both servers (e.g. /data/gluster/gfs1/gfsvol1 on the first node). Note that files written straight into a brick directory bypass Gluster and won't be replicated.
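As a quick sanity check (the local mount point here is just my choice for the test), something like this on the first server does the job:

mkdir -p /mnt/gfsvol1-test
mount -t glusterfs 192.168.2.82:/gfsvol1 /mnt/gfsvol1-test
cp /etc/hosts /mnt/gfsvol1-test/

Then on the second server the file should appear inside the brick:

ls -l /data/gluster/gfs2/gfsvol1/hosts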

We can do a status check on the volume as well; here is the output:

gluster volume status
Status of volume: gfsvol1
Gluster process Port Online Pid
——————————————————————————
Brick 192.168.2.82:/data/gluster/gfs1/gfsvol1 49152 Y 1798
Brick 192.168.2.83:/data/gluster/gfs2/gfsvol1 49152 Y 2038
NFS Server on localhost 2049 Y 1807
Self-heal Daemon on localhost N/A Y 1811
NFS Server on 192.168.2.83 2049 Y 2050
Self-heal Daemon on 192.168.2.83 N/A Y 2057

Task Status of Volume gfsvol1
——————————————————————————
There are no active volume tasks

Accessing the Cluster from a Client

Now the big test: access from a client. Using my Ubuntu 14.04 desktop, I first created a mount point at /mnt/glusterfs, then used apt-get to install glusterfs-client, and finally used the mount command to mount the clustered volume on the desktop.

mkdir /mnt/glusterfs
mount -t glusterfs 192.168.2.83:/gfsvol1 /mnt/glusterfs
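If you want the client mount to come back after a reboot, the usual approach is an /etc/fstab entry along these lines (_netdev tells the OS to wait for the network before mounting); I've left it here as a sketch rather than something applied in this build:

192.168.2.83:/gfsvol1   /mnt/glusterfs   glusterfs   defaults,_netdev 0 0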

Next I copied in some files from my software development folder, /opt/code-dev, and checked them on the other cluster node.

ls -la /mnt/glusterfs
total 16
drwxr-xr-x. 5 root root 48 May 21 22:25 .
drwxr-xr-x. 3 root root 20 May 21 21:40 ..
drwxr-x—. 12 root root 4096 May 21 22:25 code-dev
drw——-. 254 root root 8192 May 21 22:25 .glusterfs

This shows the client accessed the cluster. A check on each server confirms the data was replicated to both nodes, so I now have three copies of the data: two in the cluster and one on my local drive… nice 🙂

Security

Gluster can use IP addresses to set access permissions. To test this I used the volume “set” command to lock down access to the volume to an IP other than my workstation and then tried to remount the cluster. As expected the mount failed. Changing it back allowed me to access the cluster again.

gluster volume set gfsvol1 auth.allow 192.168.2.22
volume set: success
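To undo the lockout, set auth.allow back to an address or pattern that covers your client; the option accepts comma-separated values and wildcards, so re-opening the whole test network looks something like this:

gluster volume set gfsvol1 auth.allow 192.168.2.*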

My workstation kept a log in /var/log/glusterfs/mnt-glusterfs.log, which recorded the failed attempt as an AUTH_FAILED event:

[2014-05-21 02:57:25.571418] W [client-handshake.c:1365:client_setvolume_cbk] 0-gfsvol1-client-1: failed to set the volume (Permission denied)
[2014-05-21 02:57:25.571442] W [client-handshake.c:1391:client_setvolume_cbk] 0-gfsvol1-client-1: failed to get 'process-uuid' from reply dict
[2014-05-21 02:57:25.571449] E [client-handshake.c:1397:client_setvolume_cbk] 0-gfsvol1-client-1: SETVOLUME on remote-host failed: Authentication failed
[2014-05-21 02:57:25.571457] I [client-handshake.c:1483:client_setvolume_cbk] 0-gfsvol1-client-1: sending AUTH_FAILED event
[2014-05-21 02:57:25.571468] E [fuse-bridge.c:4834:notify] 0-fuse: Server authenication failed. Shutting down.
[2014-05-21 02:57:25.571626] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f742d0e530d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f742d3b8182] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f742dea1ef5]))) 0-: received signum (15), shutting down

 

Additional References

The Administration Manual has all the commands listed – well worth a read: http://www.gluster.org/wp-content/uploads/2012/05/Gluster_File_System-3.3.0-Administration_Guide-en-US.pdf

-oOo-
