Building Scalable and Highly Available Magento Stores


Introduction

Magento is an eCommerce software application used by a large number of online merchants. By some counts, a large share of eCommerce trade flows through Magento-based installations.

Magento is offered in two variants, a Community Edition (CE) and an Enterprise Edition (EE). The most widely used version is the Community Edition, as it is free to download and provides a fully functional software application that can meet most demands in its basic form. The EE version provides paid support and is designed for greater scalability and a longer support life than the CE version, but it has its roots in CE.

Magento runs best on Linux-based operating systems, in installations commonly known as LAMP (Linux, Apache, MySQL and PHP) or LEMP (Linux, Nginx, pronounced “Engine-X”, MySQL and PHP). Most eCommerce sites are hosted on one of a number of Linux distributions, with CentOS Linux and its derivatives being the most common.

By supporting both the Apache and Nginx web servers, Magento can be used for both small, low-volume sites and large transaction-processing environments. We will investigate both in the course of this article.

Gaining Maximum Performance

The Magento eCommerce software package pushes the limits of open source software. Its extensive use of object-oriented PHP, templates and XML makes for an incredibly configurable environment in which elaborate eCommerce stores can be built.

However, like all good software, these features come at a cost. For Magento, that cost can manifest itself as poor performance if the software is not configured optimally for the environment in which it is hosted.

A great deal can be done to improve the performance of Magento, a brief overview follows.

When installed on Linux, the Magento software bundle typically takes up around 130 MB of disk space. This does not include the database or web server software. Often a hosting provider will roll out each customer as a single virtual machine instance on a single 5 GB to 20 GB disk, and there might be 100 to 200 other virtual instances running on the same physical server.

In most cases, this is adequate for a web site that hosts static pages or is lightly loaded, serving small amounts of dynamic content from a database-driven CMS. But Magento is a very different environment from most PHP open source software. Its extensive use of XML and block-oriented template design, combined with an EAV (Entity-Attribute-Value) database design, makes rendering an eCommerce site a performance nightmare.

In short, Magento requires a lot of CPU to process the layout XML before it can render the page and return it to the user. So the first thought many of you will have is: just add more CPU! On the face of it, that might seem like a logical step, but the XML resides in files, usually on a single file system, so a great deal of disk IO is queued to the OS to retrieve the files to process. In a hosted environment, your disk IO competes with the IO of all the other virtual instances on the same server, and this has a big impact on performance. Disk IO is clearly an issue, and one that is hard to measure from a user’s perspective.

Caching Technology

This is where caching technology comes in: its aim is to return content that is frequently requested and normally static as fast as possible. Caching is pervasive in a server: the disks themselves have a cache, the disk controller has a cache, the virtualization layer implements a cache, the guest OS has a file system cache and the applications implement their own caches. That is a lot of memory in use just to hold data, and software is then required to work out which cached data is stale and which is current. So memory capacity plays a big part in the performance equation of Magento, and that is before we consider all the other applications that reside on the server and run at the same time as the Magento instance.
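To illustrate how a cache decides what to keep when memory runs out, many caches use a least-recently-used (LRU) policy: the entry that has gone longest without being read is evicted first. The sketch below is a minimal, hypothetical LRU cache built on Python’s `collections.OrderedDict`. It shows the general idea only and is not how any particular disk, OS or Magento cache is implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._data:
            return None  # cache miss: caller must fetch from the slower source
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(2)
cache.put("layout.xml", "<layout/>")
cache.put("catalog.xml", "<catalog/>")
cache.get("layout.xml")          # touch: layout.xml is now most recently used
cache.put("page.xml", "<page/>")  # evicts catalog.xml, the least recently used
```

The key names are placeholders; real caches layer expiry, size accounting and invalidation on top of an eviction policy like this one.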

Some performance limits are in fact by design. Browsers limit the number of simultaneous requests they make to a single server, which limits the load on that server. In a content-intensive system such as an eCommerce web site, however, the browser can only fetch a small number of files at a time, which slows down the user’s experience and reflects badly on the web site. If the browser can be made to fetch content from several different domains simultaneously, the loading is spread and more data can be downloaded in parallel. Content Delivery Networks address this: they are designed to spread the IO and hence speed up the rendering of the eCommerce site.
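The effect of spreading content across domains can be estimated with simple arithmetic. The sketch below assumes a per-host limit of 6 parallel connections (an assumption, typical of browsers using HTTP/1.1) and treats every file as taking the same time, so it counts only how many “rounds” of downloads a page needs.

```python
import math


def request_rounds(num_files, hosts, per_host_limit=6):
    """Rough count of sequential download rounds for a page.

    Assumes (hypothetically) that every file takes the same time and that the
    browser opens at most per_host_limit parallel connections to each host.
    """
    parallel = hosts * per_host_limit
    return math.ceil(num_files / parallel)


# A page with 60 images, scripts and stylesheets:
one_host = request_rounds(60, hosts=1)     # all content served from the store's own domain
three_hosts = request_rounds(60, hosts=3)  # content split across three domains
```

With 60 files this gives 10 rounds from one host versus 4 from three hosts. The model ignores file sizes and the extra DNS lookups and connection setups each new domain costs; with HTTP/2, which multiplexes many requests over one connection, splitting across domains brings much less benefit.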

As outlined above, performance is not a single issue but the cumulative effect of the different systems in play. Many of these can be configured optimally, and the sum of those optimisations is where performance improvements are realised.

Increasing Performance

Performance increases can be obtained in a number of ways: using faster CPUs, adding more cache memory, or using faster disks and more of them. Once the hardware has been pushed to its limits, rewriting the software to respond faster is the last step. Each has an impact and may directly or indirectly address the performance issue being observed.

Ideally, a combination of all of the above will yield favourable performance results. In the end, speed is largely a question of money, so financial cost weighs heavily in this pursuit.

Other factors affecting Performance

Networking plays a big part in eCommerce environments. High latency, slow links, congestion, routing issues and your connection to the Internet all play a part in determining how quickly content reaches the user. Lower bandwidth and higher latency both add up to a slower user experience.

Out of the box, Magento is configured with a basic single site. The default configuration has most safe options enabled and performance enhancing options disabled. It aims to be cautious so that the installation runs with no issues.

In most Magento hosting environments, the database server is a standard MySQL server instance running on the local host, and in general the hosting environment is geared to smaller businesses that do not have a high transaction rate.

Making Magento go Faster!

Without delving into each specific change, the list below outlines the changes that can be put into effect before additional servers are required to handle increases in load. Suggested changes are:

  • Merging JavaScript Files / Merging CSS Files.
  • Disabling Unused Modules and Disabling Unused Extensions.
  • Using sub-domains to separate content from application.
  • Implementing a Magento Flat Catalog.
  • Using the Magento Compiler.
  • Enabling and disabling the internal Magento Cache as needed.
  • Using memcached and APC cache.

Additional performance enhancements at the Operating System level are:

  • HTTP Compression.
  • Content Expiry.
  • Apache Tuning.
  • Installing Nginx to replace the Apache HTTP daemon.
  • Database layout.
  • Reducing disk IO.

These changes will not be covered in this article.

Growing past a Single Server

For the majority of businesses, a single hosted instance of Magento in a LAMP environment provides more than sufficient capacity to power their site. However, with only a single instance of your store, any downtime stops your business, so if your online presence must be up close to 100% of the time, a solution offering High Availability (HA) is a serious requirement.
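A quick calculation shows why redundancy matters for HA. The sketch below assumes, hypothetically, that servers fail independently and that the store stays up while at least one server is up. Two servers that are each 99% available then give 1 - 0.01 × 0.01 = 99.99% availability, which is roughly 52 minutes of downtime a year instead of about 3.65 days.

```python
def parallel_availability(availabilities):
    """Availability of a service that stays up while at least one redundant
    server is up. Assumes (hypothetically) independent server failures."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # probability that this server is down too
    return 1.0 - p_all_down


MINUTES_PER_YEAR = 365 * 24 * 60

single = parallel_availability([0.99])
pair = parallel_availability([0.99, 0.99])
downtime_single_min = (1 - single) * MINUTES_PER_YEAR
downtime_pair_min = (1 - pair) * MINUTES_PER_YEAR
```

In practice failures are often correlated (shared power, network or software bugs), and the load balancer in front of the servers must itself be redundant, so real figures will be lower than this idealised estimate.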

In addition to the HA issue mentioned above, once your business grows and traffic increases dramatically, a normal site (even with all performance enhancements) will start to reach some hard limits.

The first hard limit will be the ability of the web server to serve requests. Another limit will be the response time of the database server, and finally the amount of network traffic flowing to the server will become an issue.

Load Balancing

At this point an additional web server (or more) will be required. There are a number of benefits to increasing the number of web servers servicing user requests. Without delving into probability theory, simple queuing theory tells us that if requests arrive faster than the service can process them, the queue of requests will grow. But if we add enough processing capacity to service the queue, the queue will be close to empty most of the time, much like customers standing in line at a bank waiting to be served by a teller.
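The bank-queue intuition can be quantified with the standard M/M/c queueing model and the Erlang C formula. The sketch below assumes Poisson request arrivals and exponentially distributed service times (the “M/M” in M/M/c); the rates used are hypothetical, not measurements of any Magento store.

```python
import math


def erlang_c(servers, arrival_rate, service_rate):
    """Probability that an arriving request must wait (M/M/c queue, Erlang C)."""
    a = arrival_rate / service_rate  # offered load in Erlangs
    if a >= servers:
        raise ValueError("arrival rate must be below total service capacity")
    top = (a ** servers / math.factorial(servers)) * (servers / (servers - a))
    bottom = sum(a ** k / math.factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_wait(servers, arrival_rate, service_rate):
    """Average time (seconds) a request spends queued before service starts."""
    p_wait = erlang_c(servers, arrival_rate, service_rate)
    return p_wait / (servers * service_rate - arrival_rate)


# Hypothetical load: 8 requests/s arriving, each server handles 10 requests/s.
for c in (1, 2, 3):
    print(c, "server(s):", round(mean_wait(c, 8.0, 10.0) * 1000, 1), "ms queued on average")
```

With these assumed rates, one server runs at 80% utilisation and requests wait 0.4 s on average before being served; adding a second server cuts the average wait to about 19 ms, which is the “queue close to empty” effect described above.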

Adding servers reduces the load on each one, so every web request can be serviced in time. If our two servers sit behind a load balancer, traffic can be made to flow to each server alternately to maintain good response times. Additional servers can be added to boost the service rate (in a complex design this can be done via software). A smart load balancing solution, such as the one used at Conetix, can also monitor the load on your store’s web servers and ensure that requests are sent to lightly loaded servers as needed. The load balancer also supports round robin and weighted distribution models.
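The three distribution models mentioned above (round robin, weighted and least-loaded) can be sketched in a few lines. This is a hypothetical illustration, not the Conetix load balancer’s implementation: the backend names are placeholders, and the in-flight request count stands in for whatever load metric a real balancer monitors.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1
    active: int = 0  # in-flight requests: a stand-in for "load" (assumption)


class Balancer:
    """Hypothetical balancer showing three common selection strategies."""

    def __init__(self, backends):
        self.backends = backends
        self._rr = itertools.cycle(backends)
        # Naive weighted round robin: each backend appears `weight` times.
        self._wrr = itertools.cycle([b for b in backends for _ in range(b.weight)])

    def round_robin(self):
        return next(self._rr)

    def weighted(self):
        return next(self._wrr)

    def least_loaded(self):
        return min(self.backends, key=lambda b: b.active)


web1 = Backend("web1", weight=2)
web2 = Backend("web2", weight=1)
lb = Balancer([web1, web2])
web1.active = 5                 # web1 is busy
target = lb.least_loaded()      # so new requests go to web2
```

Round robin suits identical servers, weighting suits servers of different sizes, and least-loaded selection adapts to uneven request costs, such as a slow checkout page tying up one server.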

While adding more servers at first appears to be a simple solution, it introduces some additional technical issues at the Magento software level. The first is the need to replicate the Magento store software and product image data to every server and keep it in sync as products are added or removed (or the software is updated). In Magento, product data is stored in the database, but the images are stored in the file system. One way to solve this is to replicate the image data to each store instance when it is started.
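Tools such as rsync are commonly used for this kind of replication. The Python sketch below only illustrates the underlying idea of a one-way sync: copy any file that is missing on the target or whose size or modification time differs. The directory layout in the usage lines is hypothetical, and the function deliberately never deletes files on the target.

```python
import os
import shutil


def sync_tree(src, dst):
    """One-way sync from src to dst; returns the relative paths copied.

    Copies files that are missing in dst or differ in size/mtime.
    Files that exist only in dst are left untouched (no deletions).
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.join(dst, rel_root)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_root, name)
            st = os.stat(s)
            if os.path.exists(d):
                dt = os.stat(d)
                if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                    continue  # unchanged since the last sync
            shutil.copy2(s, d)  # copy2 preserves mtime so the next run can skip it
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied


# Hypothetical usage on a newly started web server:
# sync_tree("/mnt/admin-master/media", "/var/www/magento/media")
```

Comparing size and modification time keeps repeated runs cheap, because only new or changed product images are transferred.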

An additional issue to be addressed is the persistent session data that represents a user’s session with the store. This data is dynamically generated and maintained as a visitor interacts with the store. A default single-instance install of Magento stores session data in the file system. If the load balancer is spreading requests as intended, the user will most likely connect to a different server after the initial interaction, and their session data will not be present on that server.

There are at least three ways to solve this issue:

  • Sessions can be configured to be stored in the database server.
  • The file system can be a cluster, so all web servers see the same files and hence the same session data.
  • The user is directed to a specific server on their first visit, using a unique URL, and remains on that server for the duration of their visit.

There are bound to be more solutions, but we will limit ourselves to these three and address more technical solutions in future articles.
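The first option, storing sessions in a shared database, can be sketched as follows. Python’s built-in `sqlite3` stands in for the shared MySQL server (an assumption for illustration), and the table layout is hypothetical rather than Magento’s own session schema. Two store objects sharing one connection play the part of two web servers talking to the same database.

```python
import json
import sqlite3
import uuid


class DbSessionStore:
    """Hypothetical session store backed by a shared database table.

    sqlite3 is a stand-in for the shared MySQL server; the schema is
    illustrative, not Magento's.
    """

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)")

    def create(self, data):
        sid = uuid.uuid4().hex  # random session id handed to the browser in a cookie
        self.save(sid, data)
        return sid

    def save(self, sid, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (sid, json.dumps(data)),
        )
        self.conn.commit()

    def load(self, sid):
        row = self.conn.execute("SELECT data FROM sessions WHERE id = ?", (sid,)).fetchone()
        return json.loads(row[0]) if row else None


shared_db = sqlite3.connect(":memory:")
web1 = DbSessionStore(shared_db)        # first web server
web2 = DbSessionStore(shared_db)        # second web server, same database
sid = web1.create({"cart": ["SKU-1"]})  # visitor adds an item via web1
cart_on_web2 = web2.load(sid)           # next request lands on web2: cart is still there
```

Because every web server reads and writes the same table, it no longer matters which server the load balancer picks for each request; the cost is an extra database query per request.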

If more than one front-end web server is used, an “Administration Only” server should be deployed as the master copy of both the software and the product image data, accessible via a direct, unpublished URL.

Persistent Data

Databases

Each solution outlined in the previous section introduces additional technical issues that need to be addressed. First, using the database to store sessions will increase the number of queries on the database server. Tuning the database cache will eliminate many of the resulting performance issues, as the standard MySQL installation is set up and tuned for reliability, not performance.

Introducing a caching technology such as Memcached or Redis will address performance, but it adds another layer to our architecture.
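A common way to put Memcached or Redis in front of a database is the cache-aside pattern: check the cache first, and only on a miss run the database query and store the result with an expiry time (TTL). In the sketch below a plain Python dict stands in for the cache server and `query_fn` is a hypothetical placeholder for a real database query; the injectable clock exists only so that expiry can be tested.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live.

    The dict is a stand-in for Memcached/Redis (assumption); query_fn is a
    hypothetical function that performs the real database query.
    """

    def __init__(self, query_fn, ttl_seconds=60.0, clock=time.monotonic):
        self.query_fn = query_fn
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cache hit: no database query
        value = self.query_fn(key)  # miss or expired: go to the database
        self._store[key] = (now + self.ttl, value)
        return value
```

The TTL bounds how stale cached data can become; shorter TTLs mean fresher data but more database queries.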

For a small store implementation, the database server is usually local. Once we move to a multi-server front end, moving the database onto separate servers takes its disk I/O away from the web servers. If the database I/O is very large, read-only databases can be added. These are slaved off the master and receive updates only from the master. This suits Magento well, since the majority of traffic to a Magento database is read-only.
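With read-only replicas in place, something has to decide which server each query goes to. In practice this is usually handled by the application’s database configuration or a proxy layer; the sketch below only illustrates the routing rule. Host names are hypothetical, and classifying a query by its first keyword is a simplification.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across read-only replicas.

    Host names are hypothetical placeholders. Classifying by the first
    keyword is a simplification (e.g. SELECT ... FOR UPDATE is really a
    write-path query and would need special handling).
    """

    READ_PREFIXES = ("select", "show", "describe", "explain")

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        is_read = sql.lstrip().lower().startswith(self.READ_PREFIXES)
        if is_read and self._replicas is not None:
            return next(self._replicas)  # round robin across replicas
        return self.master  # writes (and reads with no replicas) go to master


router = ReadWriteRouter("db-master", ["db-replica1", "db-replica2"])
```

One caveat: replicas receive updates asynchronously, so a read sent to a replica immediately after a write (for example, just after an order is placed) may briefly see stale data. Such reads are often sent to the master instead.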

Clustered File Systems

Using a clustered file system can easily solve the persistent data issue, as each server will see the same files. The cluster propagates changes between nodes, and most clustered file systems implement a replication policy so that “N” copies of the data exist in the cluster, on the assumption that any node can go offline at any time.

Using a clustered file system introduces delays in the data replication process, as data is usually transported over a network link between nodes. If the nodes are directly connected over 1 Gigabit or even 10 Gigabit links, this will usually not be an issue, but if replication to a remote site is implemented, there will be propagation delay inherent in the transport technology.
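A back-of-the-envelope estimate shows where replication delay comes from. The sketch below adds two components: serialization time (the time to put the bytes on the link) and propagation time, assuming light travels at roughly 200,000 km/s in optical fibre (about two thirds of its speed in a vacuum). It deliberately ignores switching, queuing, protocol overhead and TCP behaviour, so real delays will be higher.

```python
FIBRE_KM_PER_SECOND = 200_000  # approximate speed of light in optical fibre


def one_way_delay_ms(payload_bytes, link_bits_per_second, distance_km):
    """Rough one-way transfer delay in milliseconds.

    serialization: time to clock the payload onto the link
    propagation:   time for the signal to travel the distance
    Ignores switching, queuing and protocol overhead (simplification).
    """
    serialization_s = payload_bytes * 8 / link_bits_per_second
    propagation_s = distance_km / FIBRE_KM_PER_SECOND
    return (serialization_s + propagation_s) * 1000


image = 1_000_000  # a 1 MB product image (hypothetical)
same_rack_1g = one_way_delay_ms(image, 1e9, 0.05)
same_rack_10g = one_way_delay_ms(image, 10e9, 0.05)
remote_site_1g = one_way_delay_ms(image, 1e9, 1000)
```

Within a rack, a 1 MB image takes about 8 ms on a 1 Gigabit link and about 0.8 ms at 10 Gigabit. Replicating to a site 1000 km away adds roughly 5 ms each way on top of that, and no amount of extra bandwidth removes it.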

The GlusterFS clustered file system is often used to provide reliable clustering of file systems between servers. It installs very easily on a large number of Linux distributions and has a simple architectural design that is quite logical and reliable.

Sticky Sessions

The phrase “Sticky Sessions” refers to a client remaining with a specific host for the duration of a session-based transaction. Sticky sessions are often used with HTTPS traffic because the session keys are negotiated between the client and the server during the initial handshake, and all data transferred after that is encrypted. In our example, the load balancer detects the SSL session and maintains a session relationship between the client and the host.
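The session relationship a load balancer maintains can be pictured as an affinity table: the first request for a session key is assigned a backend (round robin here), and every later request with the same key goes to the same backend. The sketch below is hypothetical. The session key could be an SSL session ID, a cookie value or the client IP, and the backend names are placeholders.

```python
import itertools


class StickyBalancer:
    """Hypothetical sticky-session balancer using an affinity table.

    New session keys are assigned round robin; known keys always return
    the backend they were first assigned to.
    """

    def __init__(self, backends):
        self._next = itertools.cycle(backends)
        self._affinity = {}  # session key -> backend

    def pick(self, session_key):
        backend = self._affinity.get(session_key)
        if backend is None:
            backend = next(self._next)
            self._affinity[session_key] = backend
        return backend


lb = StickyBalancer(["web1", "web2"])
first = lb.pick("session-A")   # new visitor assigned to a server
second = lb.pick("session-B")  # next new visitor assigned to the other server
again = lb.pick("session-A")   # returning request sticks to its original server
```

A production balancer would also expire old entries and reassign sessions when a backend fails; in that case the session data is lost unless it is also stored centrally.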

Often the load balancer presents the SSL certificate for the server(s) and terminates the encryption, acting as a “Man in the Middle”: traffic is encrypted between the client and the load balancer but passed as clear text to the servers. This design is generally considered acceptable when the servers and load balancer are directly connected.

Content Delivery Networks

Earlier, we mentioned that the Magento installation contains application code and product images. The product images, as well as CSS and JavaScript content, can be removed from the server and delivered via a Content Delivery Network (CDN) independent of your own physical hosting. The advantage of a CDN is that content can be dispersed across many locations, and if the delivery network can determine the location of the requesting client, the content can be delivered from a server close by.
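The idea of serving content “from a server close by” can be illustrated by picking the edge location with the shortest great-circle distance to the client. Real CDNs typically route clients with DNS-based geolocation or anycast rather than a calculation like this, so treat it purely as an illustration. The edge locations below are hypothetical (rounded city coordinates), and the client location is assumed to be known.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical edge locations (latitude, longitude), rounded city coordinates.
EDGE_SERVERS = {
    "sydney": (-33.87, 151.21),
    "singapore": (1.35, 103.82),
    "london": (51.51, -0.13),
}


def nearest_edge(client_lat, client_lon, edges=EDGE_SERVERS):
    """Name of the edge location closest to the client."""
    return min(edges, key=lambda name: haversine_km(client_lat, client_lon, *edges[name]))


edge_for_brisbane = nearest_edge(-27.47, 153.03)  # a client in Brisbane
```

Shorter distances mean lower propagation delay, which is why a CDN with many locations speeds up image-heavy store pages for customers far from the origin server.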

Organisations such as Cloudflare are capable of achieving this. To implement this functionality, a server can be deployed to hold the content from your Magento instance. A separate, unique URL can be configured for CSS, images and JavaScript, and these URLs configured with the CDN provider. The net effect is that only requests for application functionality are serviced by your hosting plan, reducing both traffic and request load.
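The effect of configuring separate URLs for static content amounts to rewriting asset links so they point at the CDN hostname, while page and checkout URLs stay on the store’s own host. The sketch below shows that rewrite using Python’s `urllib.parse`. The hostnames and the list of “static” file extensions are hypothetical choices, not Magento or Cloudflare settings.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of extensions treated as static, CDN-served content.
STATIC_EXTENSIONS = (".css", ".js", ".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp")


def asset_url(url, cdn_host):
    """Point static asset URLs at the CDN host; leave application URLs alone."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_EXTENSIONS):
        return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))
    return url


image = asset_url("https://shop.example.com/media/catalog/product/a.jpg", "cdn.example.com")
cart = asset_url("https://shop.example.com/checkout/cart/", "cdn.example.com")
```

Keeping the path unchanged means the CDN can fetch any missing file from the origin server using the same path, so only the hostname differs between the two copies.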

The content on the servers can be synchronised from your Administration instance when changes are made.

Summary

Scaling up requires more instances of your web site, additional support servers to load balance, firewall and cache requests at both the web request layer and the database layer, as well as expansion of the database infrastructure.

Implementing and tuning this system is best done by an expert, with a monitoring system in place so that all the components and their services are monitored in real time. The cost of the software is minimal, but hosting costs will grow as the number of physical servers or virtual instances increases.

-oOo-

 
