MySQL Cluster 7.2 (DMR2): NoSQL, Key/Value, Memcached
70x Higher Performance, Cross Data Center Scalability and New NoSQL Interface
MySQL Cluster is one of the fastest growing technologies available from MySQL today. To build on this momentum, we are announcing the second Development Milestone Release (DMR) at Oracle OpenWorld 2011. As with all MySQL community releases, the DMR is available under the GPL and can be downloaded from: http://dev.mysql.com/downloads/cluster/ (select the Development Release tab).
MySQL Cluster 7.2.1 builds upon the first DMR (7.2.0) announced in April 2011 with a range of new capabilities designed to enable next generation web services, enhance cross data center scalability and simplify provisioning:
- Enabling next generation web services: 70x higher complex query performance, native memcached API and integration with the latest MySQL 5.5 server
- Enhancing cross data center scalability: new multi-site clustering and enhanced active/active replication
- Simplified provisioning: consolidated user privileges.
Before exploring each of these new capabilities, it's worth setting the context that is driving new feature development in MySQL Cluster.
It should come as no surprise that data volumes are growing at a much faster rate than anyone previously predicted. The McKinsey Global Institute estimates this growth at 40% per annum [1]. Increasing internet penetration rates, social networking, high speed wireless broadband connecting smart mobile devices, and increasing Machine to Machine (M2M) interactions are just some technologies fueling this growth. Global eCommerce revenues are expected to reach $1 trillion by 2014 [2]. By 2015, global IP networks will deliver 7.3 petabytes of data every 5 minutes - that's the gigabyte equivalent of all movies ever made crossing the network in just 5 minutes [3]!
The databases needed to support such a massive growth in data have to meet new challenges, including:
- Scaling write operations, not just reads, across commodity hardware
- Real-time user experience for querying and presenting data
- 24 x 7 availability for continuous service uptime
- Reducing barriers to entry, enabling developers to quickly launch new, innovative services.
Of course you can argue that databases have had these requirements for years - and that is true - but they have never had to address all of these requirements at once, in a single application. For example, some databases have been optimized for low latency and high availability, but then don't have a means to scale write operations and can be very difficult to use. Other databases may be very simple to start with, but lack capabilities that make them suitable for applications with demanding uptime requirements.
It is also important to recognize that not all data is created equal. It is not catastrophic to lose individual elements of log data or status updates - the value of this data is more in its aggregated analysis. But some data is critical, such as ecommerce transactions, financial trades, customer updates, billing operations, order placement and processing, access and authorization, service entitlements, etc.
And yet, services generating and consuming this data still need the data store to meet the challenges discussed above. It is necessary to protect data integrity with ACID compliance, run complex queries against the data while leveraging the proven benefits of industry standards and skillsets to reduce cost and complexity - all while scaling write operations with high availability and real time responsiveness.
It is these types of workloads that MySQL Cluster is designed for. Examples include:
- High volume OLTP
- Real time analytics
- Ecommerce and financial trading with fraud detection
- Mobile and micro-payments
- Session management & caching
- Feed streaming, analysis and recommendations
- Content management and delivery
- Massively Multiplayer Online Games
- Communications and presence services
- Subscriber / user profile management and entitlements.
MySQL Cluster is a write-scalable, real-time, ACID-compliant transactional database, combining 99.999% availability with the low TCO of open source. Designed around a distributed, multi-master architecture with no single point of failure, MySQL Cluster horizontally scales on commodity hardware with auto-sharding to serve read and write intensive workloads, accessed via SQL and NoSQL interfaces.
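To give a feel for what auto-sharding means in practice, here is a minimal sketch that hashes a primary key to pick a partition. The function names, the partition count and the use of SHA-256 are assumptions made for illustration only; this is not MySQL Cluster's actual partitioning code.

```python
import hashlib


def partition_for_key(primary_key: str, num_partitions: int) -> int:
    """Map a primary key to a partition number in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def spread(keys, num_partitions):
    """Count how many keys land in each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[partition_for_key(key, num_partitions)] += 1
    return counts
```

Because the partition is derived from the key alone, the application never has to decide where a row lives, and both reads and writes for different keys spread across the partitions.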
The second Development Milestone Release of MySQL Cluster 7.2 brings the benefits of the technology to a new range of web, enterprise and telecom services by delivering the following enhancements.
70x Higher Performance with Adaptive Query Localization
Previewed as part of the original DMR, Adaptive Query Localization pushes JOIN operations down to the data nodes where the query executes in parallel on local copies of the data. A merged result set is then sent back to the MySQL Server, significantly enhancing performance by reducing network trips.
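The effect of pushing a join down can be pictured with a toy model that simply counts requests between the SQL node and the data nodes. Everything here (the class, the table names, the sample rows) is hypothetical and only mimics the idea; it says nothing about how AQL is implemented internally.

```python
class ToyDataNodes:
    """Stand-in for the data nodes; counts requests received from the SQL node."""

    def __init__(self, tables):
        self.tables = tables
        self.requests = 0

    def scan(self, table):
        self.requests += 1
        return list(self.tables[table])

    def lookup(self, table, column, value):
        self.requests += 1
        return [row for row in self.tables[table] if row[column] == value]

    def pushed_join(self, left, right, column):
        # One request: the join is evaluated next to the data and only
        # the merged result set travels back.
        self.requests += 1
        return [
            (left_row, right_row)
            for left_row in self.tables[left]
            for right_row in self.tables[right]
            if left_row[column] == right_row[column]
        ]


def join_at_sql_node(nodes, left, right, column):
    """Nested-loop join driven from the SQL node: one lookup per outer row."""
    result = []
    for left_row in nodes.scan(left):
        for right_row in nodes.lookup(right, column, left_row[column]):
            result.append((left_row, right_row))
    return result


# Hypothetical sample data: five assets, two known device types.
TABLES = {
    "assets": [{"device": d, "asset": f"asset-{i}"} for i, d in enumerate("abcab")],
    "devices": [{"device": "a", "screen": "small"}, {"device": "b", "screen": "large"}],
}
```

In this model the join driven from the SQL node costs one scan plus one lookup per outer row, while the pushed-down join costs a single request and returns the same rows.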
A new Index Statistics function enables the SQL optimizer to build a better execution plan for each query. In the past, non-optimal query plans required manual enforcement of indexes via USE INDEX or FORCE INDEX to alter the execution plan. To get maximum benefit from AQL, it is strongly recommended to run the ANALYZE TABLE command on each table before it is queried for the first time.
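If you have many tables to prepare, a small helper can build the statement for you. This is only a sketch: the function name and the table names in the usage note are made up, and you would send the resulting string through whichever MySQL client you already use.

```python
def analyze_table_statement(table_names):
    """Build one ANALYZE TABLE statement covering the given tables."""
    if not table_names:
        raise ValueError("at least one table name is required")
    # Quote each identifier with backticks, doubling any embedded backtick.
    quoted = ", ".join("`" + name.replace("`", "``") + "`" for name in table_names)
    return f"ANALYZE TABLE {quoted};"
```

For example, analyze_table_statement(["assets", "devices"]) produces a single statement that covers both tables.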
Testing on a real world query from a MySQL Cluster user in a web-based Content Management System delivered orders of magnitude higher performance using AQL.
- The application filters content assets against device capabilities and user entitlements across 11 tables with a total of 33,500 rows, returning a result set of just over 2,000 rows and 19 columns per row.
- AQL reduced query execution time from a horrible 87.23 seconds to a much happier 1.26 seconds!
You can read more about the testing behind this query here.
AQL reduced query execution time from 87.23 to 1.26 seconds
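For readers who want to check the arithmetic, the speedup implied by these two timings works out as follows. The variable names are ours; the numbers are the ones quoted above.

```python
before_seconds = 87.23  # execution time without AQL, as quoted above
after_seconds = 1.26    # execution time with AQL, as quoted above

speedup = before_seconds / after_seconds
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 69.2x"
```

That is a little over 69x, which is consistent with the headline figure of roughly 70x.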
Adaptive Query Localization enables MySQL Cluster to better serve those use-cases that have the need to run real-time analytics across live data sets, along with high throughput OLTP operations. Examples include recommendations engines and clickstream analysis in web applications, pre-pay billing promotions in mobile telecoms networks or fraud detection in payment systems.
New NoSQL Interface and Schema-less Storage with the memcached API
The memcached interface is now integrated directly into the MySQL Cluster 7.2.1 trunk, enabling simpler evaluation.
Today, many websites use InnoDB for OLTP with memcached as a caching layer to reduce latency and increase performance. But memcached is not ACID.
Now, you can combine the ease of use of Memcached, with the power of MySQL Cluster. Using the standard memcached API, the application sends reads and writes to the memcached process, which in turn invokes the Memcached Driver for NDB (which is part of the same process). This calls the NDB API, providing very quick access to the data held in MySQL Cluster's data nodes, completely bypassing the SQL layer.
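Since the application only speaks the standard memcached protocol, any memcached client should work. As a rough sketch of what travels over the wire, the helpers below encode a set and a get in the memcached text protocol and parse a single-key reply. The function names are ours, and the roundtrip helper is deliberately simplified (one recv call, no retries).

```python
import socket


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes):
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if data == b"END\r\n":
        return None
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected response: {header!r}")
    length = int(parts[3])
    return rest[:length]


def roundtrip(host: str, port: int, request: bytes) -> bytes:
    """Send one request and read one reply (simplified: a single recv call)."""
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(request)
        return conn.recv(65536)
```

Assuming a memcached process is listening on localhost at the default memcached port 11211 (both are assumptions about your setup), roundtrip("localhost", 11211, build_set("user:42", b"Ada")) would be expected to return b"STORED\r\n", and a following get for the same key can be decoded with parse_get_response.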
The solution has been designed to be very flexible, allowing the application architect to find a configuration that best fits their needs. It is possible to co-locate the memcached API in either the data nodes or application nodes, or alternatively within a dedicated memcached layer.
Co-locate the memcached API in the data nodes
- This is the simplest deployment option, and best used for data sets that are frequently updated
- Each of the memcached processes accesses the same data, so if one process fails, MySQL Cluster transparently switches to another node
- As more data nodes are added, memcached scales along with them
Co-locating memcached API in the Data Nodes
Co-locate the memcached API in the application nodes
- This is the best deployment option for data that is rarely updated but frequently read
- Adding more application nodes automatically scales memcached throughput
- Users can scale capacity independently by adding more data nodes
- Simple failure handling - if one application / memcached node fails, then all of the applications just continue accessing their local memcached API
Co-locating memcached API in the Application Nodes
Dedicated memcached layer
- Best option for data that has a short lifetime and wouldn't benefit from being stored in MySQL Cluster
Dedicated memcached Layer
The benefit of this flexible approach to deployment is that users can configure behavior on a per-key-prefix basis (through tables in MySQL Cluster) and the application doesn't have to care - it just uses the memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized.
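Per-key-prefix behavior can be pictured as a longest-prefix lookup. The sketch below is illustrative only: the policy names and the in-memory dictionary are hypothetical stand-ins for the configuration tables held in MySQL Cluster.

```python
# Hypothetical policy table: key prefix -> where values are kept.
PREFIX_POLICIES = {
    "": "cluster",                 # default for keys with no more specific prefix
    "session:": "cache_only",      # short-lived data, kept only in memcached
    "user:": "cache_and_cluster",  # cached locally and stored in the cluster
}


def policy_for_key(key, policies=PREFIX_POLICIES):
    """Return the policy of the longest prefix that matches the key."""
    matches = [prefix for prefix in policies if key.startswith(prefix)]
    if not matches:
        raise KeyError(f"no policy matches key {key!r}")
    return policies[max(matches, key=len)]
```

The application keeps calling get and set with its usual keys; only the policy table decides where each value ends up.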
MySQL Cluster as a Schema-less Key/Value store
The popularity of Key/Value stores has been on the rise. With MySQL Cluster, you have all the benefits of an ACID RDBMS, combined with the performance capabilities of a Key/Value store.
By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row - thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.
Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.
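To make the two storage modes concrete, here is a toy model that uses an in-memory SQLite database purely as a stand-in for MySQL Cluster. The table names, column names and the prefix mapping are all hypothetical. Keys without a mapped prefix land in a generic key/value table, while keys starting with "user:" are written to a column of an existing table, so the same data can be read back with plain SQL.

```python
import sqlite3

# Hypothetical mapping: key prefix -> (table, key column, value column).
PREFIX_TO_COLUMN = {"user:": ("user_profile", "user_id", "name")}
DEFAULT_TARGET = ("demo_kv", "k", "v")  # generic table for unmapped keys


def open_store():
    """Create the stand-in database with one generic and one structured table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE demo_kv (k TEXT PRIMARY KEY, v TEXT)")
    db.execute("CREATE TABLE user_profile (user_id TEXT PRIMARY KEY, name TEXT)")
    return db


def resolve(key):
    """Pick the target table/columns and the row key for a memcached-style key."""
    for prefix, target in PREFIX_TO_COLUMN.items():
        if key.startswith(prefix):
            return target, key[len(prefix):]
    return DEFAULT_TARGET, key


def kv_set(db, key, value):
    (table, key_col, value_col), row_key = resolve(key)
    db.execute(
        f"INSERT OR REPLACE INTO {table} ({key_col}, {value_col}) VALUES (?, ?)",
        (row_key, value),
    )


def kv_get(db, key):
    (table, key_col, value_col), row_key = resolve(key)
    row = db.execute(
        f"SELECT {value_col} FROM {table} WHERE {key_col} = ?", (row_key,)
    ).fetchone()
    return row[0] if row else None
```

After kv_set(db, "user:42", "Ada"), a SQL query against user_profile for user_id '42' sees the same value that kv_get(db, "user:42") returns.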
Integration with MySQL 5.5
MySQL Cluster 7.2.1 is integrated with MySQL Server 5.5, providing binary compatibility to existing MySQL Server deployments. Users can now fully exploit the latest capabilities of both the InnoDB and MySQL Cluster storage engines within a single application.
Users simply install the new MySQL Cluster binary including the MySQL 5.5 release, restart the server and immediately have access to both InnoDB and MySQL Cluster!
Enhancing Cross Data Center Scalability: Simplified Active / Active Replication
MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to reduce the effects of geographic latency by pushing data closer to the user, as well as providing a capability for disaster recovery.
Geographic replication has always been designed around an Active / Active technology, so if applications are attempting to update the same row on different clusters at the same time, the conflict can be detected and resolved. This ensures each site can actively serve read and write requests while maintaining data consistency across the clusters. It also eliminates the overhead of having to provision and run passive hardware at remote sites.
With the release of MySQL Cluster 7.2.1, implementing Active / Active replication has become a whole lot simpler. Developers no longer need to implement and manage timestamp columns within their applications. Also, rollbacks can now be applied to whole transactions rather than just individual operations.
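The difference between rolling back individual operations and rolling back whole transactions can be shown with a toy model. It does not attempt to model how MySQL Cluster detects conflicts; it assumes the set of conflicting rows is already known, and the function and key names are hypothetical.

```python
def rows_to_roll_back(transactions, conflicting_rows, whole_transactions=True):
    """
    Decide which row changes get undone after a conflict.

    transactions: dict of transaction id -> set of row keys it wrote.
    conflicting_rows: set of row keys that were also updated on the other cluster.
    """
    rolled_back = set()
    for rows in transactions.values():
        hit = rows & conflicting_rows
        if not hit:
            continue  # this transaction touched no conflicting row
        # Either undo everything the transaction wrote, or only the rows in conflict.
        rolled_back |= rows if whole_transactions else hit
    return rolled_back
```

With transaction t1 writing an order row and a stock row, and only the stock row in conflict, the per-operation mode undoes just the stock update and leaves a half-applied order behind, while the whole-transaction mode undoes both.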
These enhancements make it much simpler to deploy globally scaled services across data centers.
Enhancing Cross Data Center Scalability: Multi-Site Clustering
MySQL Cluster 7.2.1 DMR provides a new option for cross data center scalability - multi-site clustering. For the first time, splitting data nodes across data centers is a supported deployment option.
Improvements to MySQL Cluster's heartbeating mechanism with a new "ConnectivityCheckPeriod" parameter enable greater resilience to temporary latency spikes on a WAN, thereby maintaining operation of the cluster.
With this deployment model, users can synchronously replicate updates between data centers without needing conflict detection and resolution, and automatically fail over between those sites in the event of a node failure.
Multi-site clustering enables data nodes to be split across data centers
Users need to characterize their network bandwidth and latencies, and observe best practices in configuring both their network environment and Cluster, including:
- Ensuring minimal, stable network latency;
- Provisioning the network with sufficient bandwidth for the expected peak load;
- Configuring the heartbeat period to ensure a safe margin above latency fluctuations;
- Configuring the ConnectivityCheckPeriod to avoid unnecessary node failures;
- Configuring other timeouts accordingly including the GCP timeout, transaction deadlock timeout, and transaction inactivity timeout.
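As a starting point for the heartbeat guidance above, the sketch below turns measured WAN latencies into a candidate heartbeat interval. The safety factor of 3 and the rounding step are arbitrary assumptions for illustration, not official recommendations; characterize your own network before settling on a value.

```python
import math


def suggest_heartbeat_interval_ms(latency_samples_ms, safety_factor=3.0, step_ms=100):
    """Suggest a heartbeat interval comfortably above the worst observed latency."""
    if not latency_samples_ms:
        raise ValueError("need at least one latency sample")
    if safety_factor < 1.0:
        raise ValueError("safety_factor should be at least 1.0")
    worst = max(latency_samples_ms)
    # Round up to a multiple of step_ms so the value is easy to read in a config file.
    return math.ceil(worst * safety_factor / step_ms) * step_ms
```

For example, with round-trip samples of 12, 15, 48 and 20 ms the helper suggests 200 ms, because the worst sample (48 ms) times the safety factor is 144 ms, rounded up to the next 100 ms step.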
More guidance is available here:
User Privilege Consolidation
User privilege tables are now consolidated into the data nodes and centrally accessible by all MySQL servers accessing the cluster.
Previously the privilege tables were local to each MySQL server, meaning users and their associated privileges had to be managed separately on each server. By consolidating privilege data, users need only be defined once and managed centrally, saving Systems Administrators significant effort and reducing cost of operations.
The MySQL Cluster 7.2.1 DMR enables new classes of use-cases to benefit from web-scale performance with carrier-grade availability.
You can download the DMR for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).
Download the Evaluation Guide for best practices in testing and prototyping your application with the MySQL Cluster 7.2 DMR.
Let us know what you think of these enhancements directly in comments for each blog. We look forward to working with the community to perfect these new features.