The announcements at the Oracle Collaborate and O’Reilly MySQL conferences mark an exciting milestone in the development of the MySQL Cluster database.
MySQL Cluster is already proven as a write-scalable, real-time transactional database, combining 99.999% availability with the low TCO of open source. With a distributed, multi-master architecture and no single point of failure, MySQL Cluster scales horizontally on commodity hardware to serve read and write intensive workloads.
With these enhancements announced in the Development Milestone Release, MySQL Cluster can be extended to serve a broader range of workloads.
The MySQL Cluster 7.2 Development Milestone Release and latest labs.mysql.com builds deliver enhancements based on input from the community and customers, including support for the memcached NoSQL API, faster JOIN performance and simplified administration:
In addition to memcached access, the engineering team also previewed JSON as an additional NoSQL interface to MySQL Cluster, allowing applications to directly query and modify the database, and return results directly to a browser, eliminating transformations to SQL. Expect to hear more about this shortly.
In the meantime, what do the enhancements delivered by the Development Milestone Release and labs.mysql.com do, and why should you care?
Hopefully the following will answer those questions, as well as providing you links to the blogs written directly by the product team who will discuss the implementation detail – and wax lyrical about the fruits of their labor.
Like memcached, MySQL Cluster provides a distributed hash table with in-memory performance for caching, accessed via the simple memcached API. MySQL Cluster extends memcached functionality by adding support for write-intensive workloads, a full relational model with ACID compliance (including persistence), rich query support, transparent partitioning for scale-out and 99.999% availability, with extensive management and monitoring.
All writes are committed directly to MySQL Cluster, eliminating cache invalidation and the overhead of data consistency checking to ensure complete synchronization between the database and cache. Duplication of data between the cache and database can be eliminated, enabling simpler re-use of data across multiple applications, and reducing memory footprint.
Using the memcached API, users can simplify their architecture by compressing the caching and database layers into a single data tier, managed by MySQL Cluster, enabling them to:
The Memcached API adds another NoSQL access method to MySQL Cluster, which already includes C++ (NDB API), Java, JPA, LDAP and HTTP/REST APIs, all of which can be used concurrently with SQL to serve a broad range of web, telecoms and embedded use-cases handling the simplest to the most complex queries.
The Memcached API enables web services to directly access the MySQL Cluster database without transformations to SQL, ensuring low latency and high throughput for read/write operations.
Figure 1: Implementation of memcached Access to MySQL Cluster
Implementation is simple - MySQL-Cluster specific code is loaded into the Memcached server as a "memcached driver for NDB" which can access the NDB API directly to read and write from the clustered data nodes.
You can download the code now from http://labs.mysql.com and select the source build: mysql-cluster-7.2-labs-memcached
Delivering over 20x higher performance when executing complex JOIN operations, Adaptive Query Localization expands the range of use-cases that can benefit from MySQL Cluster:
By default, JOIN operations are executed in the MySQL Server, which provides high performance when the data is local. In MySQL Cluster, the data is distributed across multiple, redundant data nodes. As a consequence, the nested-loop-join in the MySQL Server needs to access the data nodes repeatedly at each step. As the depth of the JOIN or the size of the intermediate result sets grow, the number of messages to the data nodes increases quickly, which can dramatically slow down query execution.
The Adaptive Query Localization functionality ships queries from the MySQL Server to the data nodes where the query executes on local copies of the data in parallel, and then returns the merged result set back to the MySQL Server, significantly enhancing performance by reducing network trips.
Adaptive Query Localization enables MySQL Cluster to better serve those use-cases that have the need to run complex queries along with high throughput OLTP operations in one application and database solution or where multiple applications need access to the same data but have different access/query requirements.
Note: Adaptive Query Localization is supported with the following conditions:
The parameterised queries that underpin this feature can also be used directly by applications using the NDB API directly - reducing the number of database accesses they need to make and so improving performance even further.
Radically simplifying the provisioning and administration of MySQL Cluster, user privilege tables are now consolidated into the data nodes and centrally accessible by all MySQL servers accessing the cluster.
Previously the privilege tables were local to each MySQL server, meaning users and their associated privileges had to be managed separately on each server. By consolidating privilege data, users need only be defined once and managed centrally, saving Systems Administrators significant effort and reducing cost of operations.
The MySQL Cluster 7.2 Development Milestone Release and new labs.mysql.com builds enable new classes of use-cases to benefit from web-scale performance with carrier-grade availability.
You can download the MySQL Cluster 7.2 Development Milestone Release for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).
You can download the memcached API for MySQL Cluster now from http://labs.mysql.com and select the source build: mysql-cluster-7.2-labs-memcached
Let us know what you think of these enhancements directly in comments for each blog. We look forward to working with the community to perfect these new features.
1 Casey Brown, DBA and Development Manager: "Here is a sample reporting query. SET ndb_join_pushdown=0; SELECT DISTINCT f.ownerUser FROM invitations i INNER JOIN sharedfiles s ON i.sharedfile = s.id INNER JOIN files f ON s.file = f.id WHERE (i.group = 'ed881bac6397ede33c0a285c9f50bb83') AND f.ownerUser <> '0ac2006cffa2d22d5f34ee20075bfad1' 739 rows in set (2.11 sec) SET ndb_join_pushdown=1; <run query again> 739 rows in set (0.10 sec) That is a massive improvement"
2 Results reported in the Enhancing JOIN Performance Webinar, November 2010: http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html