WL#4258: NdbRecord packed read and long signal transactions
Status: Complete — Priority: Medium
1) Modify NdbRecord primary key read and unique hash index read code in NDBAPI to generate and handle 'packed' signals. Packed signals replace the sequence of AttrInfo headers+data with an attribute presence and null bitmask, followed by the concatenated contents of present attributes. This reduces the messaging required for reads, especially where there are many small columns, multiple bitfields or null columns. The NDBD nodes already support packed reads, so all changes are in NDBAPI for NdbRecord. NdbRecord code is changed to *always* generate packed reads (non-NdbRecord code optionally generated packed or 'normal' reads).
2) Modify NDBAPI originated signal trains to use 'long signals' to communicate with NDBD nodes. Long signals reduce per-signal messaging overheads.
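As a rough illustration of the packed read idea in item 1, the following C++ sketch compares a 'normal' layout (one header word per attribute, followed by its data) with a packed layout (presence and null bitmasks, followed by the concatenated data of the non-null attributes). The header encoding, the use of a separate null mask, and the names Column, encodeNormal and encodePacked are assumptions made for this example; it is not the actual NDB wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one requested column. Values are whole 32-bit words
// to keep the sketch simple; real columns can be narrower than a word.
struct Column {
  bool isNull;
  std::vector<std::uint32_t> words;  // ignored when isNull is true
};

// 'Normal' layout: one header word per attribute, then that attribute's data.
// Assumed header encoding: attribute id in the high 16 bits, data length in
// words in the low 16 bits (a NULL is sent as length 0 in this sketch).
std::vector<std::uint32_t> encodeNormal(const std::vector<Column>& cols) {
  std::vector<std::uint32_t> out;
  for (std::size_t id = 0; id < cols.size(); ++id) {
    const Column& c = cols[id];
    const std::uint32_t len =
        c.isNull ? 0u : static_cast<std::uint32_t>(c.words.size());
    assert(id <= 0xFFFFu && len <= 0xFFFFu);
    out.push_back((static_cast<std::uint32_t>(id) << 16) | len);
    if (!c.isNull) out.insert(out.end(), c.words.begin(), c.words.end());
  }
  return out;
}

// Packed layout: presence bitmask, then null bitmask, then the concatenated
// data of the non-null attributes. No per-attribute header words are sent.
std::vector<std::uint32_t> encodePacked(const std::vector<Column>& cols) {
  const std::size_t maskWords = (cols.size() + 31) / 32;
  std::vector<std::uint32_t> out(2 * maskWords, 0u);
  for (std::size_t id = 0; id < cols.size(); ++id) {
    const std::uint32_t bit = 1u << (id % 32);
    out[id / 32] |= bit;                // every requested column is present
    if (cols[id].isNull) {
      out[maskWords + id / 32] |= bit;  // NULL: flag only, no data words
    } else {
      out.insert(out.end(), cols[id].words.begin(), cols[id].words.end());
    }
  }
  return out;
}
```

For eight one-word columns of which one is NULL, encodeNormal() produces 15 words (8 headers + 7 data words) while encodePacked() produces 9 (2 mask words + 7 data words). The saving grows with the number of small or NULL columns.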
Requirements and goals
----------------------
1) Support use of long signals between API and NDBD nodes for primary key (TCKEYREQ), unique key (TCINDXREQ) and scan (SCANTABREQ) signals
   - Minimises per-signal header overheads, enables reduced copy in the kernel
   - Allows more signal processing to be uninterrupted, with reduced need for state maintenance and data copying in the kernel.
2) Prepare for API zero-copy send optimisation when transporter send buffer reservation is available
   - Reduces NDBAPI overheads
3) Reduce copying within the kernel where possible by
   a) passing segmented section references between colocated blocks where possible
   b) operating on data in place where possible

New signal formats
------------------

Existing formats

The short signal trains are generally of the type
   <Operation signal><Train of KeyInfo><Train of AttrInfo>
where <Operation signal> is one of TCKEYREQ, TCINDXREQ, SCANTABREQ, LQHKEYREQ, SCANFRAGREQ etc.
The operation signals generally have space for a few KeyInfo and AttrInfo words; when there are more words, extra KeyInfo and AttrInfo signals are sent.
Short TCKEYREQ supports 8 words of inline KeyInfo and 5 words of inline AttrInfo.
Short LQHKEYREQ supports 4 words of inline KeyInfo and 5 words of inline AttrInfo.

New formats
-----------
The new signals consist of <Operation signal with long section(s)>. All KeyInfo and AttrInfo are transported as long sections attached to the operation signal. The presence of these long sections is used to indicate that a long signal has been sent. Since the long section mechanism transports the length of the sections, there is no longer any need to use the parts of the operation signal that transported these lengths, and they can be reused in future.

The main advantage of putting all of the KeyInfo or AttrInfo into long sections is that they are self-contained when they are copied into Segmented Sections by the transporter.
They can be forwarded on outgoing signals, stored, etc. without any further copying.

Long TCKEYREQ/TCINDXREQ

Long TCKEYREQ is similar to a 'short' TCKEYREQ, but has 2 attached sections. Section 0 contains KeyInfo and the optional section 1 contains AttrInfo. No KeyInfo or AttrInfo is sent in the TCKEYREQ part of the signal. The keylength, attrlength and attrinkeyreq parts of the TCKEYREQ signal are unused in the long variant.

Long LQHKEYREQ

Long LQHKEYREQ is similar to a 'short' LQHKEYREQ, but has 2 attached sections. Section 0 contains KeyInfo and the optional section 1 contains AttrInfo. No KeyInfo or AttrInfo is sent in the LQHKEYREQ part of the signal. The keylength, attrlength and attrinkeyreq parts of the LQHKEYREQ signal are unused in the long variant.

Long SCANTABREQ

Long SCANTABREQ has 3 attached sections. Section 0 is the list of NDBAPI receiver Ids that is currently sent. Section 1 is the ATTRINFO data, which is always present. Section 2 is KEYINFO data, which is present for bounded index scans. The AttrInfo and KeyInfo length information from the SCANTABREQ header is no longer used.

From mysql-5.1-telco-6.4 v6.4.0, all scans use long SCANTABREQ - only old API nodes (and possibly some internal clients?) use short SCANTABREQ.

ScanFilters can generate large programs which become ATTRINFO attached to the SCANTABREQ signal. This can push the SCANTABREQ signal over the ~16000 bytes per-signal length limit. To deal with this :
1) The maximum message size is increased to 32kB
   Max receive size is increased to 32kB in 6.3.16, max send size is increased to 32kB in 6.4.0. Therefore only clusters at version >= 6.3.16 can safely upgrade to 6.4.0.
2) Fragmented SCANTABREQ is supported. If a single SCANTABREQ is > ~32kB, it will be fragmented and reassembled.

Long SCANFRAGREQ

Similar to long SCANTABREQ.
New methods were added to SimulatedBlock to support sending multiple fragmented long signals from a single set of segmented section buffers without having to copy the buffers unnecessarily.
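To make the per-signal overhead of the short format concrete, the sketch below counts the signals in a short TCKEYREQ train using the inline capacities given under 'Existing formats' (8 KeyInfo words, 5 AttrInfo words). The payload sizes of the trailing KEYINFO and ATTRINFO signals are not stated in this worklog, so they are passed in as parameters; the values 20 and 22 in the usage note are assumptions, as are all type and function names.

```cpp
#include <cassert>
#include <cstddef>

// Words of KeyInfo/AttrInfo that fit inline in the short operation signal.
struct InlineCapacity {
  std::size_t keyWords;
  std::size_t attrWords;
};

// Signals needed to carry 'words' words when each signal holds 'perSignal'.
std::size_t signalsFor(std::size_t words, std::size_t perSignal) {
  assert(perSignal > 0);
  return (words + perSignal - 1) / perSignal;  // rounds up; 0 words -> 0
}

// Short train: the operation signal plus as many trailing KEYINFO and
// ATTRINFO signals as the words that overflow the inline space require.
std::size_t shortTrainSignals(std::size_t keyWords, std::size_t attrWords,
                              InlineCapacity cap,
                              std::size_t keyInfoPayload,    // assumed value
                              std::size_t attrInfoPayload) { // assumed value
  const std::size_t extraKey =
      keyWords > cap.keyWords ? keyWords - cap.keyWords : 0;
  const std::size_t extraAttr =
      attrWords > cap.attrWords ? attrWords - cap.attrWords : 0;
  return 1 + signalsFor(extraKey, keyInfoPayload) +
         signalsFor(extraAttr, attrInfoPayload);
}

// Long variant: always a single operation signal. Section 0 carries KeyInfo
// and the optional section 1 carries AttrInfo.
std::size_t longSectionCount(std::size_t attrWords) {
  return attrWords > 0 ? 2 : 1;
}
```

With the assumed payloads of 20 and 22 words, shortTrainSignals(30, 50, {8, 5}, 20, 22) gives 6 signals, each with its own header, where the long format sends one signal with two sections.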
Long TCKEYREQ + TCINDXREQ
-------------------------
NDBAPI modifications :
- Add support for 'Generic' long section type which uses an iterator to obtain words for the long section
- Implement generic iterator which copies words from a linked list of Signal objects.
- Modify NdbRecord PK and unique key operations to build KeyInfo and AttrInfo for TCKEYREQ signals in linked Signal objects, suitable for iteration via the generic section iterator.
- Add TCKEYREQ signal length pre-calculation code in advance of zero-copy transporter buffer optimisations

Kernel modifications :
TCKEYREQ :
- Modify Dbtc::execTCKEYREQ() to handle both short and long TCKEYREQ signals
- In both cases, KeyInfo and AttrInfo are stored in Segmented Sections obtained from the Transporter
- Modify Dbtc::hash() to avoid KeyInfo copying if possible
TCINDXREQ :
- Modify Dbtc::execTCINDXREQ() to handle both short and long TCINDXREQ signals
- In both cases, KeyInfo and AttrInfo are stored in Segmented Sections obtained from the Transporter
- TCINDXREQ always invokes long TCKEYREQ, passing segmented sections
- Modify TCINDXREQ to receive index TRANSID_AI into segmented sections, in KeyInfo format for base table TCKEYREQ (reduced copy)

Test changes :
Add testLimits() test program to check TCKEYREQ and TCINDXREQ behaviour under segmented section exhaustion.
Add new table types to Hugo framework with maximum length keys, maximum length attr, maximum row size, maximum number of keys, maximum number of columns etc.

Long LQHKEYREQ
--------------
Modifications to TC
- Modify execTCKEYREQ() to optionally send long LQHKEYREQ if target node is of high enough version to support it.

Modifications to LQH
- Modify execLQHKEYREQ() to store received KeyInfo and AttrInfo in Segmented Sections
- Modify interactions with TUP and ACC to work from data stored in Segmented Sections
- Modify redo log writing code to write log words from Segmented Sections.
- Modify execLQHKEYREQ() to handle short or long LQHKEYREQ.
- Modify execLQHKEYREQ() to send short or long LQHKEYREQ to next LQH block in replica set, based on version of target LQH.

Modification to ACC
- Currently :
  - KeyInfo is copied into signal from theData before passing to ACC
  - ACC uses a tmp buffer to xfrm the key if necessary (signal->tmp->signal)
  - ACC passes the signal through various hash index methods
- Could reduce copies by passing long signal ACCKEYREQ to ACC
  - If < 60 words of KeyInfo, could operate directly from KeyInfo, same as DBTC::hash() optimisation
  - Not ideal as signal is currently passed around with data
- Could reduce copy by xfrming direct from segmented section to signal if < 60 words of signal.
- Probably not worth it. Leave for now unless it appears to be a significant cost (doubt it).

Modifications to TUP
- Modify TUP to send ATTRINFO for interpreted updates back to LQH in a Segmented Section.
- Considered modifying TUP to enable it to run directly from the first segment of the ATTRINFO long section rather than copying the data to a linear buffer. Not done. Reconsider when implementing long section storage for Scan 'stored procedures', as the saving may be greater there.

Long SCANTABREQ
---------------
NDBAPI :
- Modify signal building to put AttrInfo and KeyInfo into generic signal objects. Signal sending uses these via a Generic Iterator to send long sections with the ScanTabReq. Simplify code where possible.
- Add new sendFragmentedSignal() method which takes an array of up to 3 GenericSectionPtrs. These are used to generate signal fragments as necessary.

DBTC :
- Add handler for out-of-segmented-sections.
- Modify SCANTABREQ handling to store received KeyInfo and AttrInfo in long sections
- Add handler for long SCANTABREQ.
- Reuse methods created for long TCKEYREQ/TCINDXREQ. Remove old Key and Attr buffer structures and their supporting definitions.
- Add fragmented signal reassembly code to execSCAN_TABREQ()

Kernel general :
- Modify max sent and received message sizes.
  Add upgrade support code.

Long SCANFRAGREQ
----------------
DBTC :
- Add methods to SimulatedBlock to support sending fragmented long signals with no segmented section release. This enables the SCANFRAGREQ 'multicast' to be more efficient.

DBLQH :
- Add handler for out of segmented sections
- Modify SCANFRAGREQ handling to store received KeyInfo and AttrInfo in long sections.
- Add handler for long SCANFRAGREQ
- Modify stored proc code to store scan AttrInfo 'programs' in segmented sections (including default program used for copyFragReq).
- Modify TUPKEYREQ to use segmented section programs.
- Remove no-longer-used structures and types from LQH.
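The fragmentation and reassembly of an oversized long signal, as described for SCANTABREQ, can be sketched as follows. This is a minimal illustration only: Piece, Fragment, splitIntoFragments and reassemble are hypothetical names, the size limit is a plain word count, and the sketch copies section data freely, whereas the SimulatedBlock methods mentioned above are intended to avoid such copies. It does not reproduce the fragment headers or protocol used by NDB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint32_t> Section;

// A contiguous run of words taken from one section.
struct Piece {
  std::size_t section;  // index of the section the words belong to
  Section words;
};

// One fragment: the pieces that travel together in a single signal.
typedef std::vector<Piece> Fragment;

// Walk the sections in order and cut them into fragments that each carry
// at most maxWords words of section data.
std::vector<Fragment> splitIntoFragments(const std::vector<Section>& sections,
                                         std::size_t maxWords) {
  assert(maxWords > 0);
  std::vector<Fragment> frags;
  Fragment cur;
  std::size_t used = 0;
  for (std::size_t s = 0; s < sections.size(); ++s) {
    std::size_t pos = 0;
    while (pos < sections[s].size()) {
      if (used == maxWords) {  // current fragment is full, start a new one
        frags.push_back(cur);
        cur.clear();
        used = 0;
      }
      const std::size_t n =
          std::min(maxWords - used, sections[s].size() - pos);
      Piece p;
      p.section = s;
      p.words.assign(sections[s].begin() + pos, sections[s].begin() + pos + n);
      cur.push_back(p);
      used += n;
      pos += n;
    }
  }
  if (!cur.empty()) frags.push_back(cur);
  return frags;
}

// Receiver side: append each piece to its section, in arrival order.
std::vector<Section> reassemble(const std::vector<Fragment>& frags,
                                std::size_t sectionCount) {
  std::vector<Section> sections(sectionCount);
  for (std::size_t f = 0; f < frags.size(); ++f) {
    for (std::size_t i = 0; i < frags[f].size(); ++i) {
      const Piece& p = frags[f][i];
      assert(p.section < sectionCount);
      sections[p.section].insert(sections[p.section].end(),
                                 p.words.begin(), p.words.end());
    }
  }
  return sections;
}
```

Three sections of 3, 5 and 1 words split with maxWords = 4 yield three fragments, and reassemble() restores the original sections provided the fragments arrive in order.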
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.