I’ve been working with a large-ish MySQL table on an EC2 instance. This table has approximately 147MM rows, and I’ve been disappointed with the performance I’ve seen. I took a look at how much performance I could gain by moving to a more powerful EC2 instance, and the answer is “some, but not all upgrades are equal”.
This is the query I’m using as a test case:
Select b.respondent_name, a.tv From (Select respondent_id, Sum(tr_tot_transaction_chrg) As tv From eqr_transaction Where report_yr = 2007 And report_prd = 3 Group By respondent_id) a Join eqr_s0_respondent_id b On b.respondent_id = a.respondent_id Order By a.tv Desc Limit 20
Some relevant details:
- The Where clause uses an index to select ~8.3MM rows from the 147MM in the table
- The Group By clause generates ~3000 result rows
- The database is MySQL 5
- The DB files are stored on an EBS device
- All queries are run from the mysql command line client
I ran the query 3 times on each instance. Here is a summary of the results of those timing experiments.
|VM information||ET results|
|Instance Type||Cost ($/hr)||Run 1||Run 2||Run 3||Average|
The c1.medium Instance Type is a decent upgrade: It offers roughly double the performance for double the price. For this problem, it appears strictly preferable to the m1.large and m1.xlarge Instance Types, and preferable on a “bang-for-the-buck” basis to the c1.xlarge Instance Type.
It is not clear why c1.medium outperformed m1.xlarge, since the latter completely outclasses the former on paper. I suspect either that my testing program was flawed (i.e. EC2 performance is extremely volatile) or that the 64-bit implementation of some piece of software is not all that it might be.