Saturday, July 19, 2008

Challenges with scaling MySQL on Amazon EC2 and S3

During the past few months I've talked to many Amazon EC2 and S3 users. They all face the same interesting challenge: how do I develop a scalable, transactional and data-intensive application in the EC2-S3 cloud environment?

They are experiencing first-hand the painful gap between scalable hardware (what Amazon provides) and a scalable application. Some of them are already using the GigaSpaces EC2 solution to bridge that gap.

When it comes to persistence, without exception, they are all looking at our partners at MySQL as their solution. Kudos to the MySQL team. You guys really are the scale-out database of choice!

But persistence in a cloud environment, even with a great database like MySQL, poses a challenge. A comment on a recent post on the blog of Jonathan Schwartz (Sun CEO) describes this challenge well:

MySQL is actually quite hard to host on the Amazon cloud right now because you need to somehow schedule synchronization to the Amazon S3 storage system to prevent your elastic compute servers losing all that valuable data.

The issue is creating a fast and reliable solution to persist data from EC2 to the S3 storage system. Due to the problematic connectivity between the two, what is required is an asynchronous solution that provides a "buffer" layer between the in-memory data cloud and the MySQL database.

The solution involves writing to the database in the background (async), while decoupling the runtime transaction from the EC2-S3 connection overhead.
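To make the idea concrete, here is a minimal write-behind sketch in Python. It is only an illustration of the pattern, not how GigaSpaces implements it: the `WriteBehindStore` class and the `kv` table are hypothetical names, and SQLite from the standard library stands in for MySQL so that the example runs without a database server. The application's `put` call updates memory and returns immediately, while a background thread drains a queue into the database.

```python
import queue
import sqlite3
import threading


class WriteBehindStore:
    """Hypothetical in-memory key/value store with asynchronous persistence."""

    def __init__(self, db_path):
        self._data = {}                # in-memory copy that serves all reads
        self._lock = threading.Lock()
        self._pending = queue.Queue()  # the "buffer" between memory and the database
        self._db_path = db_path
        self._worker = threading.Thread(target=self._flush_loop, daemon=True)
        self._worker.start()

    def put(self, key, value):
        # The caller only pays for a dictionary update and an enqueue;
        # the database round trip happens later, on the worker thread.
        with self._lock:
            self._data[key] = value
            self._pending.put((key, value))

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def _flush_loop(self):
        # SQLite is an assumption made for the sake of a runnable example;
        # a real deployment would open a MySQL connection here instead.
        conn = sqlite3.connect(self._db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        conn.commit()
        while True:
            item = self._pending.get()
            if item is None:           # sentinel: stop the worker
                self._pending.task_done()
                break
            key, value = item
            conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )
            conn.commit()
            self._pending.task_done()
        conn.close()

    def flush(self):
        """Block until every buffered write has reached the database."""
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._worker.join()
```

A caller would create `WriteBehindStore("/tmp/kv.db")`, call `put` and `get` as with any map, and call `flush()` only when it needs to be sure the database has caught up. Note the trade-off: writes still sitting in the queue are lost if the process dies, which is why a real data grid also replicates the in-memory copy across nodes.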

We at GigaSpaces believe that our Enterprise Data Grid (EDG), deployed as a front end to MySQL, provides EC2-S3 users with an elegant solution to this problem.

Nati Shalom's recent post, Scaling Out MySQL, provides a detailed description of how to leverage GigaSpaces to scale MySQL.

1 comment:

Unknown said...

Doesn't EC2 provide EBS as a persistent data store? If so, your sync doesn't need to happen so frequently, right?