The INSERT DELAYED component suffers from a few design flaws that could be addressed by copying well-known patterns. There is no reason why an INSERT DELAYED should be slower than a batched INSERT.

Suggestions:

1) Improve the producer-consumer pattern by introducing spin-buffers. This will scale better with many insert threads.
2) Improve handler throughput by writing data with handler::bulk_insert(). This makes the consumer thread work much more efficiently, since it loops over a much smaller instruction set, and it also lets us take advantage of any storage engine (SE) specific optimization.
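A minimal sketch of how the two suggestions could fit together. It assumes "spin-buffers" means a pair of buffers that the insert threads and the consumer exchange, so that producers hold the lock only long enough to append one row and the consumer drains a whole batch without holding it. The names SwapBufferQueue, FakeHandler and run_demo are hypothetical and used only for illustration. FakeHandler::bulk_insert() merely mimics the idea of handing the handler a batch in one call and is not the server's handler interface.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a storage-engine handler (not the MySQL handler API).
struct FakeHandler {
    std::size_t rows_written = 0;
    std::size_t bulk_calls = 0;

    // Takes a whole batch in one call, so per-call overhead is paid once per batch.
    void bulk_insert(const std::vector<std::string>& rows) {
        assert(!rows.empty());
        rows_written += rows.size();
        ++bulk_calls;
    }
};

// Two-buffer queue: producers append to the fill buffer under a short lock;
// the consumer swaps it with its own empty buffer and processes it unlocked.
class SwapBufferQueue {
public:
    void push(std::string row) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            fill_.push_back(std::move(row));
        }
        cv_.notify_one();
    }

    // Signals that no more rows will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Swaps all pending rows into `out`, which must be empty on entry.
    // Returns false once the queue is closed and fully drained.
    bool pop_all(std::vector<std::string>& out) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !fill_.empty() || closed_; });
        if (fill_.empty()) return false;
        out.swap(fill_);  // O(1): only the buffers change hands
        return true;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<std::string> fill_;
    bool closed_ = false;
};

// Runs `producers` insert threads against one consumer thread and
// returns the handler counters once every row has been written.
FakeHandler run_demo(int producers, int rows_per_producer) {
    SwapBufferQueue queue;
    FakeHandler handler;

    std::thread consumer([&queue, &handler] {
        std::vector<std::string> batch;
        while (queue.pop_all(batch)) {
            handler.bulk_insert(batch);  // one call per batch, not per row
            batch.clear();               // keeps capacity for the next swap
        }
    });

    std::vector<std::thread> workers;
    for (int p = 0; p < producers; ++p) {
        workers.emplace_back([&queue, p, rows_per_producer] {
            for (int i = 0; i < rows_per_producer; ++i)
                queue.push("row-" + std::to_string(p) + "-" + std::to_string(i));
        });
    }
    for (auto& w : workers) w.join();
    queue.close();
    consumer.join();
    return handler;
}
```

Calling run_demo(4, 1000) writes all 4000 rows. The number of bulk_insert() calls depends on thread scheduling, but it can never exceed the number of rows, and it tends to fall as more rows accumulate between swaps, which is the effect suggestion 2 is after.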