MySQL 9.7.0
Source Code Documentation
histogram.h
#ifndef HISTOGRAMS_HISTOGRAM_INCLUDED
#define HISTOGRAMS_HISTOGRAM_INCLUDED

/* Copyright (c) 2016, 2026, Oracle and/or its affiliates.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License, version 2.0,
   as published by the Free Software Foundation.

   This program is designed to work with certain software (including
   but not limited to OpenSSL) that is licensed under separate terms,
   as designated in a particular file or component or in included license
   documentation. The authors of MySQL hereby grant you an additional
   permission to link the program and your derivative works with the
   separately licensed software that they have either included with
   the program or referenced in the documentation.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License, version 2.0, for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */

/**
  @file sql/histograms/histogram.h
  Histogram base class.

  This file defines the base class for all histogram types. We keep the base
  class itself non-templatized in order to more easily send a histogram as an
  argument, collect multiple histograms in a single collection etc.

  A histogram is stored as a JSON object. This makes it possible to store a
  virtually unlimited number of buckets and data values at their full length,
  and to add new histogram types easily in the future. Histograms are stored
  persistently in the system table mysql.column_stats.

  We keep all histogram code in the namespace "histograms" in order to avoid
  name conflicts.
*/

#include <cstddef>  // size_t
#include <functional>
#include <map>  // std::map
#include <memory>
#include <set>      // std::set
#include <string>   // std::string
#include <utility>  // std::pair

#include "lex_string.h"  // LEX_CSTRING
#include "my_base.h"     // ha_rows
#include "sql/field.h"   // Field
// ...
#include "sql/mem_root_allocator.h"   // Mem_root_allocator
#include "sql/stateless_allocator.h"  // Stateless_allocator

class Item;
class Json_dom;
class Json_object;
class THD;
struct TYPELIB;
class Field;

namespace dd {
class Table;
}  // namespace dd
namespace histograms {
struct Histogram_comparator;
template <class T>
class Value_map;
}  // namespace histograms
struct CHARSET_INFO;
struct MEM_ROOT;
class Table_ref;
class Json_dom;

namespace histograms {

/// The default (and invalid) value for "m_null_values_fraction".
static const double INVALID_NULL_VALUES_FRACTION = -1.0;

enum class Message {
  // ...
  VIEW,
  // ...

  // JSON validation errors. See Error_context.
  // ...
};

// ...
  void *operator()(size_t s) const;
};

template <class T>
// ...

template <class T>
// ...

template <typename T>
// ...
    std::map<T, ha_rows, Histogram_comparator, value_map_allocator<T>>;

using columns_set = std::set<std::string, std::less<std::string>,
                             // ...

// Used as an array, so duplicate values are not checked.
// TODO(tlchrist): Convert this std::map to an array.
using results_map =
    std::map<std::string, Message, std::less<std::string>,
             // ...

/**
  The operators for which histogram statistics can provide selectivity
  estimates.
*/
enum class enum_operator {
  EQUALS_TO,
  // ...
  LESS_THAN,
  IS_NULL,
  // ...
  BETWEEN,
  // ...
  IN_LIST,
  // ...
};

/**
  Error context for validating a JSON object that represents a histogram.

  A validation error consists of two pieces of information:

  1) error code - what kind of error it is
  2) JSON path - where the error occurs

  Errors are classified into a few conceptual categories, namely

  1) absence of required attributes
  2) unexpected JSON type of attributes
  3) value encoding corruption
  4) value out of domain
  5) breaking bucket sequence semantics
  6) breaking certain constraints between pieces of information

  @see histograms::Message for the list of JSON validation errors.

  Use of the Error_context class
  ------------------------------

  An Error_context object is passed along with other parameters to the
  json_to_histogram() function that is used to create a histogram object (e.g.
  Equi_height<longlong>) from a JSON string.

  The json_to_histogram() function has two different use cases, with different
  requirements for validation:

  1) Deserializing a histogram that was retrieved from the dictionary. In this
     case the histogram has already been validated, and the user is not
     expecting validation feedback, so we pass along a default-constructed
     "empty shell" Error_context object with no-op operations.

  2) Validating the user-supplied JSON string to the UPDATE HISTOGRAM ...
     USING DATA command. In this case we pass along an active Error_context
     object that uses a Field object to validate bucket values, and stores
     results in a results_map.

  The binary() method is used to distinguish between these two contexts/cases.
*/
class Error_context {
 public:
  /// Default constructor. Used when deserializing binary JSON that has already
  /// been validated, e.g. when retrieving a histogram from the dictionary, and
  /// the Error_context object is not actively used for validation.
  Error_context()
      : m_thd(nullptr), m_field(nullptr), m_results(nullptr), m_binary(true) {}

  /**
    Constructor. Used in the context of deserializing the user-supplied JSON
    string to the UPDATE HISTOGRAM ... USING DATA command.

    @param thd Thread context
    @param field The field for values on which the histogram is built
    @param results Where reported errors are stored
  */
  Error_context(THD *thd, Field *field, results_map *results)
      : m_thd(thd), m_field(field), m_results(results), m_binary(false) {}

  /**
    Report a global error to this context.

    @param err_code The global error code
  */
  void report_global(Message err_code);

  /**
    Report to this context that a required attribute is missing.

    @param name Name of the missing attribute
  */
  void report_missing_attribute(const std::string &name);

  /**
    Report to this context that an error occurred on the given DOM node.

    @param dom The given DOM node
    @param err_code The error code
  */
  void report_node(const Json_dom *dom, Message err_code);

  /**
    Check if the value is in the field definition domain.

    @param v Pointer to the value.

    @return true on error, false otherwise

    @note Uses Field::store() on the field for which the user-defined histogram
    is to be constructed in order to check the validity of the supplied value.
    This has the side effect of writing to the record buffer, so it should only
    be used with an active Error_context (with a non-nullptr field) when we do
    not otherwise expect to use the record buffer. Currently the only use case
    is to validate the JSON input to the command UPDATE HISTOGRAM ... USING DATA,
    where it should be OK to use the field for this purpose.
  */
  template <typename T>
  bool check_value(T *v);

  /**
    Tell whether the input JSON is an internal persisted copy or
    a user-defined input. If the input is an internal copy, there
    should never be type/format errors. If it is a user-defined input,
    errors may occur and should be handled, and some type casting may
    be needed.

    @return true if the input is an internal persisted (binary) copy, false if
    it is user-defined input
  */
  bool binary() const { return m_binary; }

  /**
    Return the field in this context, if present. Used to enforce that the
    histogram data type matches the column data type for user-defined
    histograms.

    @return the field if present, nullptr otherwise
  */
  Field *field() const { return m_field; }

 private:
  /// Thread context for error handlers
  THD *m_thd;
  /// The field for checking endpoint values
  Field *m_field;
  /// Where reported errors are stored
  results_map *m_results;
  /// Whether or not the JSON object to process is in binary format
  bool m_binary;
};

/**
  Histogram base class.

  This is an abstract class containing the interface and shared code for
  concrete histogram subclasses.

  Histogram subclasses (Singleton, Equi_height) are constructed through factory
  methods in order to catch memory allocation errors during construction.

  The histogram subclasses have no public copy or move constructors. In order to
  copy a histogram onto a given MEM_ROOT, use the public clone method. The clone
  method ensures that members of the histogram, such as String-type buckets,
  are also allocated on the given MEM_ROOT. Modifications to these methods must
  take care that histogram buckets are cloned/copied correctly.
*/
class Histogram {
 public:
  /// All supported histogram types in MySQL.
  // ...

  /// String representation of the JSON field "histogram-type".
  static constexpr const char *histogram_type_str() { return "histogram-type"; }

  /// String representation of the JSON field "data-type".
  static constexpr const char *data_type_str() { return "data-type"; }

  /// String representation of the JSON field "collation-id".
  static constexpr const char *collation_id_str() { return "collation-id"; }

  /// String representation of the histogram type SINGLETON.
  static constexpr const char *singleton_str() { return "singleton"; }

  /// String representation of the histogram type EQUI-HEIGHT.
  static constexpr const char *equi_height_str() { return "equi-height"; }

 protected:
  double m_sampling_rate;

  /// The fraction of NULL values in the histogram (between 0.0 and 1.0).
  double m_null_values_fraction;

  /// The character set for the data stored
  // ...

  /// The number of buckets originally specified
  size_t m_num_buckets_specified;

  /// String representation of the JSON field "buckets".
  static constexpr const char *buckets_str() { return "buckets"; }

  /// String representation of the JSON field "last-updated".
  static constexpr const char *last_updated_str() { return "last-updated"; }

  /// String representation of the JSON field "null-values".
  static constexpr const char *null_values_str() { return "null-values"; }

  /// String representation of the JSON field "sampling-rate".
  static constexpr const char *sampling_rate_str() { return "sampling-rate"; }

  /// String representation of the JSON field "number-of-buckets-specified".
  static constexpr const char *numer_of_buckets_specified_str() {
    return "number-of-buckets-specified";
  }

  /// String representation of the JSON field "auto-update".
  static constexpr const char *auto_update_str() { return "auto-update"; }

  /**
    Constructor.

    @param mem_root the mem_root where the histogram contents will be allocated
    @param db_name name of the database this histogram represents
    @param tbl_name name of the table this histogram represents
    @param col_name name of the column this histogram represents
    @param type the histogram type (equi-height, singleton)
    @param data_type the type of data that this histogram contains
    @param[out] error is set to true if an error occurs
  */
  Histogram(MEM_ROOT *mem_root, const std::string &db_name,
            const std::string &tbl_name, const std::string &col_name,
            enum_histogram_type type, Value_map_type data_type, bool *error);

  /**
    Copy constructor.

    This will make a copy of the provided histogram onto the provided MEM_ROOT.

    @param mem_root the mem_root where the histogram contents will be allocated
    @param other the histogram to copy
    @param[out] error is set to true if an error occurs
  */
  Histogram(MEM_ROOT *mem_root, const Histogram &other, bool *error);

  /**
    Write the data type of this histogram into a JSON object.

    @param json_object the JSON object where we will write the histogram
                       data type

    @return true on error, false otherwise
  */
  bool histogram_data_type_to_json(Json_object *json_object) const;

  /**
    Return the value that is contained in the JSON DOM object.

    For most types, this function simply returns the contained value. For String
    values, the value is allocated on this histogram's MEM_ROOT before it is
    returned. This allows the String value to survive the entire lifetime of the
    histogram object.

    @param json_dom the JSON DOM object to extract the value from
    @param[out] out the value extracted from the JSON DOM object
    @param context error context for validation

    @return true on error, false otherwise
  */
  template <class T>
  bool extract_json_dom_value(const Json_dom *json_dom, T *out,
                              Error_context *context);

  /**
    Populate the histogram with data from the provided JSON object. The base
    class also provides an implementation that subclasses must call in order
    to populate fields that are shared among all histogram types (character set,
    null values fraction).

    @param json_object the JSON object to read the histogram data from
    @param context error context for validation

    @return true on error, false otherwise
  */
  virtual bool json_to_histogram(const Json_object &json_object,
                                 Error_context *context) = 0;

 private:
  /// The MEM_ROOT where the histogram contents will be allocated.
  MEM_ROOT *m_mem_root;

  /// The type of this histogram.
  // ...

  /// The type of the data this histogram contains.
  // ...

  /// Name of the database this histogram represents.
  // ...

  /// Name of the table this histogram represents.
  LEX_CSTRING m_table_name;

  /// Name of the column this histogram represents.
  LEX_CSTRING m_column_name;

  /// True if the histogram was created with the AUTO UPDATE option, false if
  /// MANUAL UPDATE.
  bool m_auto_update;

  /**
    An internal function for getting a selectivity estimate prior to adjustment.
    @see get_selectivity() for details.
  */
  bool get_raw_selectivity(Item **items, size_t item_count, enum_operator op,
                           double *selectivity) const;

  /**
    An internal function for getting the selectivity estimation.

    This function will read/evaluate the value from the given Item, and pass
    this value on to the correct selectivity estimation function based on the
    data type of the histogram. For instance, if the data type of the histogram
    is INT, we will call "val_int" on the Item to evaluate the value as an
    integer and pass this value on to the next function.

    @param item The Item to read/evaluate the value from.
    @param op The operator we are estimating the selectivity for.
    @param typelib In the case of ENUM or SET data type, this parameter holds
                   the type information. This is needed in order to map a
                   string representation of an ENUM/SET value into its correct
                   integer representation (ENUM/SET values are stored as
                   integer values in the histogram).
    @param[out] selectivity The estimated selectivity, between 0.0 and 1.0
                            inclusive.

    @return true on error (i.e. the provided item was NULL), false on success.
  */
  bool get_selectivity_dispatcher(Item *item, const enum_operator op,
                                  const TYPELIB *typelib,
                                  double *selectivity) const;

  /**
    An internal function for getting the selectivity estimation.

    This function will cast the histogram to the correct class (using down_cast)
    and pass the given value on to the correct selectivity estimation function
    for that class.

    @param value The value to estimate the selectivity for.

    @return The estimated selectivity, between 0.0 and 1.0 inclusive.
  */
  template <class T>
  double get_less_than_selectivity_dispatcher(const T &value) const;

  /// @see get_less_than_selectivity_dispatcher
  template <class T>
  // ...

  /// @see get_less_than_selectivity_dispatcher
  template <class T>
  double get_equal_to_selectivity_dispatcher(const T &value) const;

  /**
    An internal function for applying the correct function for the given
    operator.

    @param op The operator to apply
    @param value The value to find the selectivity for.

    @return The estimated selectivity, between 0.0 and 1.0 inclusive.
  */
  template <class T>
  double apply_operator(const enum_operator op, const T &value) const;

 public:
  Histogram() = delete;
  Histogram(const Histogram &other) = delete;

  /// Destructor.
  virtual ~Histogram() = default;

  /// @return the MEM_ROOT that this histogram uses for allocations
  MEM_ROOT *get_mem_root() const { return m_mem_root; }

  /**
    @return name of the database this histogram represents
  */
  // ...

  /**
    @return name of the table this histogram represents
  */
  const LEX_CSTRING get_table_name() const { return m_table_name; }

  /**
    @return name of the column this histogram represents
  */
  const LEX_CSTRING get_column_name() const { return m_column_name; }

  /**
    @return type of this histogram
  */
  // ...

  /**
    @return the fraction of NULL values, in the range [0.0, 1.0]
  */
  double get_null_values_fraction() const;

  /// @return the character set for the data this histogram contains
  const CHARSET_INFO *get_character_set() const { return m_charset; }

  /// @return the sampling rate used to generate this histogram
  double get_sampling_rate() const { return m_sampling_rate; }

  /**
    Returns the histogram type as a readable string.

    @return a readable string representation of the histogram type
  */
  virtual std::string histogram_type_to_str() const = 0;

  /**
    @return number of buckets in this histogram
  */
  virtual size_t get_num_buckets() const = 0;

  /**
    Get the estimated number of distinct non-NULL values.
    @return number of distinct non-NULL values
  */
  virtual size_t get_num_distinct_values() const = 0;

  /**
    @return the data type that this histogram contains
  */
  // ...

  /**
    @return number of buckets originally specified by the user. This may be
    higher than the actual number of buckets in the histogram.
  */
  // ...

  /**
    @return True if automatic updates are enabled for the histogram, false
    otherwise.
  */
  bool get_auto_update() const { return m_auto_update; }

  /**
    Sets the auto update property for the histogram.

    @param auto_update true for AUTO UPDATE, false for MANUAL UPDATE
  */
  void set_auto_update(bool auto_update) { m_auto_update = auto_update; }

  /**
    Converts the histogram to a JSON object.

    @param[in,out] json_object output where the histogram is to be stored. The
                   caller is responsible for allocating/deallocating the JSON
                   object

    @return true on error, false otherwise
  */
  virtual bool histogram_to_json(Json_object *json_object) const = 0;

  /**
    Converts a JSON object to a histogram.

    @param mem_root MEM_ROOT where the histogram will be allocated
    @param schema_name the schema name
    @param table_name the table name
    @param column_name the column name
    @param json_object the JSON object to read the histogram from
    @param context error context for validation

    @return nullptr on error. Otherwise a histogram allocated on the provided
            MEM_ROOT.
  */
  static Histogram *json_to_histogram(MEM_ROOT *mem_root,
                                      const std::string &schema_name,
                                      const std::string &table_name,
                                      const std::string &column_name,
                                      const Json_object &json_object,
                                      Error_context *context);

  /**
    Make a clone of the current histogram.

    @param mem_root the MEM_ROOT on which the new histogram will be allocated.

    @return a histogram allocated on the provided MEM_ROOT. Returns nullptr
            on error.
  */
  virtual Histogram *clone(MEM_ROOT *mem_root) const = 0;

  /**
    Store this histogram to persistent storage (data dictionary). The MEM_ROOT
    that the histogram is allocated on is transferred to the dictionary.

    @param thd Thread handler.

    @return false on success, true on error.
  */
  bool store_histogram(THD *thd) const;

  /**
    Get selectivity estimation.

    This function will try to get the selectivity estimation for a predicate
    of the form "COLUMN OPERATOR CONSTANT", for instance "SELECT * FROM t1
    WHERE col1 > 23;".

    This function takes care of several things, for instance checking that the
    value we are estimating the selectivity for is a constant value.

    The order of the Items provided does not matter. For instance, if the
    operator argument given is "EQUALS_TO", it does not matter if the constant
    value is provided as the first or the second argument; this function will
    take care of this.

    @param items an array of items that contains both the field we
                 are estimating the selectivity for, as well as the
                 user-provided constant values.
    @param item_count the number of Items in the Item array.
    @param op the predicate operator
    @param[out] selectivity the calculated selectivity if a usable histogram was
                            found

    @retval true if an error occurred (the Item provided was not a constant
                 value or similar).
    @retval false on success
  */
  bool get_selectivity(Item **items, size_t item_count, enum_operator op,
                       double *selectivity) const;

  /**
    @return the fraction of non-null values in the histogram.
  */
  // ...
    return 1.0 - get_null_values_fraction();
  }
};

/** Return true if 'histogram' was built on an empty table. */
inline bool empty(const Histogram &histogram) {
  return histogram.get_num_distinct_values() == 0 &&
         histogram.get_null_values_fraction() == 0.0;
}

/**
  Create a histogram from a value map.

  This function will build a histogram from a value map. The histogram type
  depends on both the number of distinct values in the input data and the
  number of buckets specified. If the number of distinct values is less than or
  equal to the number of buckets, a Singleton histogram will be created.
  Otherwise, an equi-height histogram will be created.

  The histogram will be allocated on the supplied mem_root, and it is the
  caller's responsibility to properly clean up when the histogram isn't needed
  anymore.

  @param mem_root    the MEM_ROOT where the histogram contents will be
                     allocated
  @param value_map   a value map containing [value, frequency]
  @param num_buckets the maximum number of buckets to create
  @param db_name     name of the database this histogram represents
  @param tbl_name    name of the table this histogram represents
  @param col_name    name of the column this histogram represents

  @return a histogram, using at most "num_buckets" buckets. The histogram
          type depends on the size of the input data, and the number of
          buckets
*/
template <class T>
Histogram *build_histogram(MEM_ROOT *mem_root, const Value_map<T> &value_map,
                           size_t num_buckets, const std::string &db_name,
                           const std::string &tbl_name,
                           const std::string &col_name);

/**
  A simple struct containing the settings for a histogram to be built.
*/
struct HistogramSetting {
  /// A null-terminated C-style string with the name of the column to build the
  /// histogram for.
  const char *column_name;

  /// The target number of buckets for the histogram.
  size_t num_buckets = 100;

  /// Holds the JSON specification of the histogram for the UPDATE HISTOGRAM ...
  /// USING DATA command, otherwise empty.
  LEX_STRING data = {nullptr, 0};

  /// True if AUTO UPDATE, false for MANUAL UPDATE.
  bool auto_update = false;

  /// A pointer to the field, used internally by update_histograms().
  Field *field = nullptr;
};

/**
  Create or update histograms for a set of columns of a given table.

  This function will try to create a histogram for each HistogramSetting object
  passed to it. It operates in two stages:

  In the first stage it will attempt to resolve every HistogramSetting in
  settings, verifying that the specified column exists and supports histograms.
  If a setting cannot be resolved, an error message is generated, but the
  function continues executing. The collection of settings is modified in place
  so that only the resolved settings remain when the function returns.

  In the second stage, after the settings have been resolved, the function
  attempts to build a histogram for each resolved column. If an error is
  encountered during this stage, the function will immediately abort and return
  true. In other words, if the function returns true, it will have made an
  attempt to update the histograms as specified in the output collection of
  settings, but it could have failed halfway.

  If no error occurs during the second stage, the function will return false,
  and the histograms specified in the output collection of settings will have
  been successfully updated.

  @param thd Thread handler.
  @param table The table where we should look for the columns/data.
  @param[in,out] settings The settings for the histograms to be built.
  @param[in,out] results A map where the result of each operation is stored.

  @return False on success, true if an error was encountered.
*/
// ...
                       results_map &results);

/**
  Updates existing histograms on a table that were specified with the AUTO
  UPDATE option. If any histograms were updated, a new snapshot of the current
  collection of histograms for the table is inserted on the TABLE_SHARE.

  @note The caller must manually ensure that the table share is flushed or that
  tables are evicted from the table cache to guarantee that new queries will use
  the updated histograms. This can be done by calling tdc_remove_table() and
  passing the TDC_RT_REMOVE_UNUSED or TDC_RT_MARK_FOR_REOPEN option,
  respectively.

  @param thd Thread handle.
  @param table Table_ref for the table to update histograms on. The table should
  already be opened.

  @return False if all automatically updated histograms on the table
  (potentially none) were updated without encountering an error. True otherwise.
*/
// ...

/**
  Retrieves an updated snapshot of the histograms on a table directly from the
  dictionary (in an inefficient manner, querying all columns) and inserts this
  snapshot in the Table_histograms_collection on the TABLE_SHARE.

  @param thd The current thread.
  @param table The table to retrieve updated histograms for.

  @note This function assumes that the table is opened and generally depends on
  the surrounding context. It also locks/unlocks LOCK_OPEN.

  @return False on success. Returns true if an error occurred, in which case the
  TABLE_SHARE will not have been updated.
*/
// ...

/**
  Updates existing histograms on a table that were specified with the AUTO
  UPDATE option. Updated histograms are made available to the optimizer.

  This function wraps auto_update_table_histograms() in an appropriate
  transaction context for the background thread.

  @note This function temporarily disables the binary log as we are not
  interested in replicating or recovering updates to histograms that take place
  in the background.

  @note This function suppresses some errors in order to avoid spamming the
  error log, but unexpected errors are written to the error log, following the
  same pattern as the event scheduler.

  @param thd Background thread handle.
  @param db_name Name of the database holding the table.
  @param table_name Name of the table to update histograms for.

  @return False on success, true on error.
*/
// ...
    THD *thd, const std::string &db_name, const std::string &table_name);

/**
  Drop histograms for all columns in a given table.

  @param thd Thread handler.
  @param table The table where we should look for the columns.
  @param original_table_def Original table definition.
  @param results A map where the result of each operation is stored.

  @note Assumes that the caller owns an exclusive metadata lock on the table,
  so there is no need to lock individual statistics.

  @return false on success, true on error.
*/
// ...
                         const dd::Table &original_table_def,
                         results_map &results);

/**
  Drop histograms for a set of columns in a given table.

  This function will try to drop the histogram statistics for all specified
  columns. If dropping the statistics for one of the columns fails, it
  continues with the next one.

  @param thd Thread handler.
  @param table The table where we should look for the columns.
  @param columns Columns specified by the user.
  @param results A map where the result of each operation is stored.

  @note Assumes that the caller has the appropriate metadata locks on both the
  table and column statistics. That can either be an exclusive metadata lock on
  the table itself, or a shared metadata lock on the table combined with
  exclusive locks on individual column statistics.

  @return false on success, true on error.
*/
bool drop_histograms(THD *thd, Table_ref &table, const columns_set &columns,
                     results_map &results);

/**
  Rename histograms for all columns in a given table.

  @param thd Thread handler.
  @param old_schema_name The old schema name
  @param old_table_name The old table name
  @param new_schema_name The new schema name
  @param new_table_name The new table name
  @param results A map where the result of each operation is stored.

  @return false on success, true on error.
*/
bool rename_histograms(THD *thd, const char *old_schema_name,
                       const char *old_table_name, const char *new_schema_name,
                       const char *new_table_name, results_map &results);

bool find_histogram(THD *thd, const std::string &schema_name,
                    const std::string &table_name,
                    const std::string &column_name,
                    const Histogram **histogram);
}  // namespace histograms

#endif
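The rule documented for build_histogram() is that a singleton histogram is chosen when the number of distinct values is at most the number of buckets, and an equi-height histogram otherwise. The standalone sketch below illustrates that rule. It is not MySQL code. `std::map<T, std::size_t>` stands in for Value_map<T>, and the names SketchHistogramType, SketchBucket, choose_histogram_type() and build_equi_height_sketch() are hypothetical. The equi-height part uses a simple greedy rule that fills each bucket to roughly total/num_buckets rows. That rule is an assumption for illustration and is not necessarily how the server places bucket boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical stand-in for the two histogram kinds named in histogram.h.
enum class SketchHistogramType { SINGLETON, EQUI_HEIGHT };

// The documented rule: one bucket per distinct value if they all fit,
// otherwise an equi-height histogram. value_map maps value -> frequency.
template <class T>
SketchHistogramType choose_histogram_type(
    const std::map<T, std::size_t> &value_map, std::size_t num_buckets) {
  return value_map.size() <= num_buckets ? SketchHistogramType::SINGLETON
                                         : SketchHistogramType::EQUI_HEIGHT;
}

// Hypothetical equi-height bucket: an inclusive [lower, upper] range, the
// cumulative fraction of rows up to and including `upper`, and the number of
// distinct values inside the bucket.
template <class T>
struct SketchBucket {
  T lower;
  T upper;
  double cumulative_frequency;
  std::size_t num_distinct;
};

// Greedy sketch (an assumption, not the server's algorithm): close a bucket
// once it holds at least total/num_buckets rows. Assumes positive
// frequencies, which keeps the result at no more than num_buckets buckets.
template <class T>
std::vector<SketchBucket<T>> build_equi_height_sketch(
    const std::map<T, std::size_t> &value_map, std::size_t num_buckets) {
  std::vector<SketchBucket<T>> buckets;
  if (value_map.empty() || num_buckets == 0) return buckets;

  std::size_t total = 0;
  for (const auto &entry : value_map) total += entry.second;
  const double target = static_cast<double>(total) / num_buckets;

  std::size_t cumulative = 0;  // rows seen so far, over all buckets
  std::size_t in_bucket = 0;   // rows in the bucket being filled
  std::size_t distinct = 0;    // distinct values in the bucket being filled
  T lower = value_map.begin()->first;
  for (auto it = value_map.begin(); it != value_map.end(); ++it) {
    cumulative += it->second;
    in_bucket += it->second;
    ++distinct;
    const auto next = std::next(it);
    // Close the bucket when it is "tall enough" or when input runs out.
    if (in_bucket >= target || next == value_map.end()) {
      buckets.push_back({lower, it->first,
                         static_cast<double>(cumulative) / total, distinct});
      in_bucket = 0;
      distinct = 0;
      if (next != value_map.end()) lower = next->first;
    }
  }
  return buckets;
}
```

With four equally frequent values {1, 2, 3, 4} and two buckets, the sketch chooses equi-height and produces the buckets [1, 2] and [3, 4] with cumulative frequencies 0.5 and 1.0. With four buckets, it would choose a singleton histogram instead.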
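The sketch below roughly illustrates what get_selectivity(), get_null_values_fraction() and the free function empty() report. It estimates selectivities from a singleton-style histogram that stores, for every distinct value, that value's share of the non-NULL rows. The struct SingletonSketch, the SketchOp enum and the convention of scaling by the non-NULL fraction are all assumptions made for this example. They do not describe the layout of the server's Singleton class.

```cpp
#include <cassert>
#include <cmath>
#include <map>

// Hypothetical subset of the operators in histograms::enum_operator.
enum class SketchOp { EQUALS_TO, LESS_THAN, IS_NULL };

// Hypothetical singleton histogram: one "bucket" per distinct value, holding
// that value's share of the non-NULL rows (the shares sum to 1.0).
template <class T>
struct SingletonSketch {
  std::map<T, double> share;
  double null_values_fraction = 0.0;

  double non_null_values_fraction() const {
    return 1.0 - null_values_fraction;
  }

  // Same test as the free function empty() in histogram.h: no distinct
  // non-NULL values and no NULL values means the table was empty.
  bool empty() const { return share.empty() && null_values_fraction == 0.0; }

  // Fraction of all rows (NULLs included) that satisfy "column op value".
  double selectivity(SketchOp op, const T &value) const {
    switch (op) {
      case SketchOp::IS_NULL:
        return null_values_fraction;
      case SketchOp::EQUALS_TO: {
        const auto it = share.find(value);
        return it == share.end() ? 0.0
                                 : it->second * non_null_values_fraction();
      }
      case SketchOp::LESS_THAN: {
        double sum = 0.0;
        // Add up every stored value strictly below `value`.
        for (auto it = share.begin(); it != share.lower_bound(value); ++it)
          sum += it->second;
        return sum * non_null_values_fraction();
      }
    }
    return 0.0;
  }
};
```

Scaling by the non-NULL fraction reflects the fact that NULL never satisfies `=` or `<`. With shares {1: 0.5, 2: 0.25, 3: 0.25} and 20% NULLs, `col = 2` is estimated at 0.2 and `col < 3` at 0.6.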
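An Error_context can be constructed in two ways: as a default-constructed "empty shell" for already-validated dictionary data, or as an active context that records problems in a results map. The sketch below models both modes in isolation and is hypothetical throughout. SketchErrorContext, SketchMessage and read_required_attribute() are invented for the example, and a `std::map<std::string, std::string>` stands in for a parsed JSON object. Keying reported errors by attribute name is also an assumption, not the server's results_map convention.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical error codes; the real ones live in histograms::Message.
enum class SketchMessage { MISSING_ATTRIBUTE };

using SketchResults = std::map<std::string, SketchMessage>;

class SketchErrorContext {
 public:
  // "Empty shell": the input is an already-validated internal copy, so
  // reports are no-ops and binary() is true.
  SketchErrorContext() : m_results(nullptr), m_binary(true) {}

  // Active context: the input is user-supplied, so reports are recorded.
  explicit SketchErrorContext(SketchResults *results)
      : m_results(results), m_binary(false) {}

  void report_missing_attribute(const std::string &name) {
    if (m_results != nullptr)
      m_results->emplace(name, SketchMessage::MISSING_ATTRIBUTE);
  }

  bool binary() const { return m_binary; }

 private:
  SketchResults *m_results;
  bool m_binary;
};

// Returns true on error, following the header's "true on error" convention.
bool read_required_attribute(const std::map<std::string, std::string> &json,
                             const std::string &name,
                             SketchErrorContext *context, std::string *out) {
  const auto it = json.find(name);
  if (it == json.end()) {
    context->report_missing_attribute(name);
    return true;
  }
  *out = it->second;
  return false;
}
```

The same reading code runs in both modes. Only the context decides whether a failure produces user-visible feedback, which mirrors how json_to_histogram() is described as serving both the dictionary path and UPDATE HISTOGRAM ... USING DATA.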
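get_selectivity() is documented to accept the constant on either side of the column. For EQUALS_TO the two sides are interchangeable, but for an ordering operator the operator itself must be mirrored when the constant comes first: "23 < col1" means "col1 > 23". The sketch below shows one way to perform that normalization. The enum SketchOp, its operator subset and the helper names are hypothetical, and this header does not state whether the server performs exactly this rewrite.

```cpp
#include <cassert>

// Hypothetical comparison operators, modelled on names in enum_operator.
enum class SketchOp {
  EQUALS_TO,
  NOT_EQUALS_TO,
  LESS_THAN,
  LESS_THAN_OR_EQUAL,
  GREATER_THAN,
  GREATER_THAN_OR_EQUAL
};

// Rewrite "CONSTANT op COLUMN" into the equivalent "COLUMN op' CONSTANT".
SketchOp mirror(SketchOp op) {
  switch (op) {
    case SketchOp::LESS_THAN:
      return SketchOp::GREATER_THAN;
    case SketchOp::LESS_THAN_OR_EQUAL:
      return SketchOp::GREATER_THAN_OR_EQUAL;
    case SketchOp::GREATER_THAN:
      return SketchOp::LESS_THAN;
    case SketchOp::GREATER_THAN_OR_EQUAL:
      return SketchOp::LESS_THAN_OR_EQUAL;
    case SketchOp::EQUALS_TO:
    case SketchOp::NOT_EQUALS_TO:
      return op;  // symmetric operators are unchanged
  }
  return op;
}

// Normalize so the column is always the left operand.
// column_first is true for "col op const" and false for "const op col".
SketchOp normalize(SketchOp op, bool column_first) {
  return column_first ? op : mirror(op);
}

// Evaluate "left op right" on integers; used to check the rewrite.
bool holds(SketchOp op, int left, int right) {
  switch (op) {
    case SketchOp::EQUALS_TO:
      return left == right;
    case SketchOp::NOT_EQUALS_TO:
      return left != right;
    case SketchOp::LESS_THAN:
      return left < right;
    case SketchOp::LESS_THAN_OR_EQUAL:
      return left <= right;
    case SketchOp::GREATER_THAN:
      return left > right;
    case SketchOp::GREATER_THAN_OR_EQUAL:
      return left >= right;
  }
  return false;
}
```

Once predicates are normalized, the per-type estimators only need to handle the "COLUMN OPERATOR CONSTANT" shape.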
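The two-stage contract of update_histograms() works as follows. First, unresolvable settings are reported and removed in place while processing continues. Second, the remaining histograms are built, and the function stops at the first failure. The sketch below models that contract with standard containers. SketchSetting, SketchResults, update_histograms_sketch() and the string result messages are hypothetical stand-ins for HistogramSetting, results_map and the real function, and the `build` callback replaces the actual histogram construction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for HistogramSetting and results_map.
struct SketchSetting {
  std::string column_name;
  std::size_t num_buckets = 100;
};
using SketchResults = std::map<std::string, std::string>;

// Returns true on error, false on success (the header's convention).
// histogram_columns: columns that exist and support histograms.
// build: returns true if building a histogram for the setting fails.
bool update_histograms_sketch(
    const std::set<std::string> &histogram_columns,
    std::vector<SketchSetting> *settings, SketchResults *results,
    const std::function<bool(const SketchSetting &)> &build) {
  // Stage 1: drop unresolvable settings in place, record why, keep going.
  settings->erase(
      std::remove_if(settings->begin(), settings->end(),
                     [&](const SketchSetting &s) {
                       if (histogram_columns.count(s.column_name) != 0)
                         return false;
                       (*results)[s.column_name] = "cannot resolve column";
                       return true;
                     }),
      settings->end());

  // Stage 2: build each resolved histogram; abort on the first failure, so
  // some histograms may already have been updated when true is returned.
  for (const SketchSetting &s : *settings) {
    if (build(s)) return true;
    (*results)[s.column_name] = "histogram created";
  }
  return false;
}
```

std::remove_if applies the predicate exactly once per element, so recording the error inside the predicate reports each unresolvable column once.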
Kerberos Client Authentication nullptr
Definition: auth_kerberos_client_plugin.cc:247
Definition: field.h:573
Base class that is used to represent any kind of expression in a relational query.
Definition: item.h:929
JSON DOM abstract base class.
Definition: json_dom.h:179
Represents a JSON container value of type "object" (ECMA), type J_OBJECT here.
Definition: json_dom.h:374
Mem_root_allocator is a C++ STL memory allocator based on MEM_ROOT.
Definition: mem_root_allocator.h:68
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:432
Stateless_allocator is a C++ STL memory allocator skeleton based on Malloc_allocator,...
Definition: stateless_allocator.h:92
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:36
Definition: table.h:2958
Definition: table.h:47
Error context to validate given JSON object which represents a histogram.
Definition: histogram.h:210
Field * m_field
The field for checking endpoint values.
Definition: histogram.h:293
void report_node(const Json_dom *dom, Message err_code)
Report to this context that an error occurs on the given dom node.
Definition: histogram.cc:245
Error_context()
Default constructor.
Definition: histogram.h:215
Error_context(THD *thd, Field *field, results_map *results)
Constructor.
Definition: histogram.h:226
Field * field() const
Return data-type of field in context if present.
Definition: histogram.h:287
results_map * m_results
Where reported errors are stored.
Definition: histogram.h:295
bool m_binary
Whether or not the JSON object to process is in binary format.
Definition: histogram.h:297
THD * m_thd
Thread context for error handlers.
Definition: histogram.h:291
void report_missing_attribute(const std::string &name)
Report to this context that a required attribute is missing.
Definition: histogram.cc:233
bool check_value(T *v)
Check if the value is in the field definition domain.
Definition: histogram.cc:299
void report_global(Message err_code)
Report a global error to this context.
Definition: histogram.cc:221
bool binary() const
Tell whether the input JSON is an internally persisted copy or user-defined input.
Definition: histogram.h:278
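The `Error_context` members above report errors (global, per JSON node, or missing attribute) into a `results_map`, and the default constructor has no results map. The following is a rough sketch of that reporting pattern under stated assumptions. `ToyErrorContext`, `ToyMessage` and the `"<global>"` key are hypothetical. The assumption that a null results pointer silently discards reports is inferred from the default constructor, not confirmed by the source.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical message codes; the real Message enum defines its own values.
enum class ToyMessage { kJsonMissingAttribute, kJsonInvalid };

// Toy error context that records errors into a caller-owned map.
class ToyErrorContext {
 public:
  explicit ToyErrorContext(std::map<std::string, ToyMessage> *results)
      : m_results(results) {}

  // Error not tied to a specific attribute (hypothetical "<global>" key).
  void report_global(ToyMessage code) { record("<global>", code); }

  // A required attribute such as "buckets" was absent.
  void report_missing_attribute(const std::string &name) {
    record(name, ToyMessage::kJsonMissingAttribute);
  }

  bool has_errors() const {
    return m_results != nullptr && !m_results->empty();
  }

 private:
  void record(const std::string &key, ToyMessage code) {
    // Assumption: without a results map, reports are dropped.
    // emplace() keeps the first error reported for a key.
    if (m_results != nullptr) m_results->emplace(key, code);
  }

  std::map<std::string, ToyMessage> *m_results;
};
```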
Histogram base class.
Definition: histogram.h:315
static constexpr const char * auto_update_str()
String representation of the JSON field "auto-update".
Definition: histogram.h:364
bool extract_json_dom_value(const Json_dom *json_dom, T *out, Error_context *context)
Return the value that is contained in the JSON DOM object.
virtual std::string histogram_type_to_str() const =0
Returns the histogram type as a readable string.
size_t m_num_buckets_specified
The number of buckets originally specified.
Definition: histogram.h:345
Value_map_type get_data_type() const
Definition: histogram.h:585
double get_equal_to_selectivity_dispatcher(const T &value) const
Definition: histogram.cc:2277
virtual size_t get_num_buckets() const =0
MEM_ROOT * m_mem_root
The MEM_ROOT where the histogram contents will be allocated.
Definition: histogram.h:436
static constexpr const char * data_type_str()
String representation of the JSON field "data-type".
Definition: histogram.h:324
double m_sampling_rate
Definition: histogram.h:336
static constexpr const char * collation_id_str()
String representation of the JSON field "collation-id".
Definition: histogram.h:327
double get_less_than_selectivity_dispatcher(const T &value) const
An internal function for getting the selectivity estimation.
Definition: histogram.cc:2238
static constexpr const char * buckets_str()
String representation of the JSON field "buckets".
Definition: histogram.h:348
static constexpr const char * numer_of_buckets_specified_str()
String representation of the JSON field "number-of-buckets-specified".
Definition: histogram.h:359
virtual ~Histogram()=default
Destructor.
virtual size_t get_num_distinct_values() const =0
Get the estimated number of distinct non-NULL values.
double get_sampling_rate() const
Definition: histogram.h:562
const enum_histogram_type m_hist_type
The type of this histogram.
Definition: histogram.h:439
virtual bool histogram_to_json(Json_object *json_object) const =0
Converts the histogram to a JSON object.
Definition: histogram.cc:376
const CHARSET_INFO * m_charset
The character set for the data stored.
Definition: histogram.h:342
static constexpr const char * last_updated_str()
String representation of the JSON field "last-updated".
Definition: histogram.h:351
void set_auto_update(bool auto_update)
Sets the auto update property for the histogram.
Definition: histogram.h:602
LEX_CSTRING m_table_name
Name of the table this histogram represents.
Definition: histogram.h:448
Histogram(const Histogram &other)=delete
size_t get_num_buckets_specified() const
Definition: histogram.h:591
bool get_raw_selectivity(Item **items, size_t item_count, enum_operator op, double *selectivity) const
An internal function for getting a selectivity estimate prior to adjustment.
Definition: histogram.cc:2531
static constexpr const char * equi_height_str()
String representation of the histogram type EQUI-HEIGHT.
Definition: histogram.h:333
double get_non_null_values_fraction() const
Definition: histogram.h:688
virtual Histogram * clone(MEM_ROOT *mem_root) const =0
Make a clone of the current histogram.
bool get_selectivity(Item **items, size_t item_count, enum_operator op, double *selectivity) const
Get selectivity estimation.
Definition: histogram.cc:2485
double m_null_values_fraction
The fraction of NULL values in the histogram (between 0.0 and 1.0).
Definition: histogram.h:339
bool m_auto_update
True if the histogram was created with the AUTO UPDATE option, false if MANUAL UPDATE.
Definition: histogram.h:455
const Value_map_type m_data_type
The type of the data this histogram contains.
Definition: histogram.h:442
const LEX_CSTRING get_database_name() const
Definition: histogram.h:536
LEX_CSTRING m_column_name
Name of the column this histogram represents.
Definition: histogram.h:451
bool get_selectivity_dispatcher(Item *item, const enum_operator op, const TYPELIB *typelib, double *selectivity) const
An internal function for getting the selectivity estimation.
Definition: histogram.cc:2347
double apply_operator(const enum_operator op, const T &value) const
An internal function for applying the correct function for the given operator.
Definition: histogram.cc:2331
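`apply_operator()` dispatches to the equal-to, less-than and greater-than estimators listed above. The following is a simplified sketch of that dispatch over a toy singleton histogram. `Op` and `ToySingleton` are hypothetical and cover only three operators. The sketch assumes that each bucket stores a value with its cumulative frequency as a fraction of all rows, so the last cumulative frequency equals the non-NULL fraction. It is not MySQL's estimator.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical subset of operators, loosely mirroring enum_operator.
enum class Op { kEqualsTo, kLessThan, kGreaterThan };

// Toy singleton histogram: (value, cumulative frequency), sorted by value.
// Assumption: the last cumulative frequency equals the non-NULL fraction.
struct ToySingleton {
  std::vector<std::pair<int, double>> buckets;

  // Fraction of rows with a value strictly below v.
  double less_than(int v) const {
    double cf = 0.0;
    for (const auto &[value, cumulative] : buckets) {
      if (value >= v) break;
      cf = cumulative;
    }
    return cf;
  }

  // Fraction of rows equal to v (0 if v has no bucket).
  double equal_to(int v) const {
    double prev = 0.0;
    for (const auto &[value, cumulative] : buckets) {
      if (value == v) return cumulative - prev;
      if (value > v) break;
      prev = cumulative;
    }
    return 0.0;
  }

  double non_null_fraction() const {
    return buckets.empty() ? 0.0 : buckets.back().second;
  }

  // Non-NULL rows that are neither below nor equal to v.
  double greater_than(int v) const {
    return non_null_fraction() - less_than(v) - equal_to(v);
  }

  // Pick the estimator for the operator, in the spirit of apply_operator().
  double apply(Op op, int v) const {
    switch (op) {
      case Op::kEqualsTo: return equal_to(v);
      case Op::kLessThan: return less_than(v);
      case Op::kGreaterThan: return greater_than(v);
    }
    throw std::logic_error("unknown operator");
  }
};
```

Deriving greater-than from the other two keeps the three estimates consistent: they sum to the non-NULL fraction.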
const CHARSET_INFO * get_character_set() const
Definition: histogram.h:559
bool get_auto_update() const
Definition: histogram.h:597
const LEX_CSTRING get_table_name() const
Definition: histogram.h:541
double get_null_values_fraction() const
Definition: histogram.cc:426
MEM_ROOT * get_mem_root() const
Definition: histogram.h:531
enum_histogram_type get_histogram_type() const
Definition: histogram.h:551
virtual bool json_to_histogram(const Json_object &json_object, Error_context *context)=0
Populate the histogram with data from the provided JSON object.
Definition: histogram.cc:638
LEX_CSTRING m_database_name
Name of the database this histogram represents.
Definition: histogram.h:445
bool store_histogram(THD *thd) const
Store this histogram to persistent storage (data dictionary).
Definition: histogram.cc:2054
static constexpr const char * histogram_type_str()
String representation of the JSON field "histogram-type".
Definition: histogram.h:321
bool histogram_data_type_to_json(Json_object *json_object) const
Write the data type of this histogram into a JSON object.
Definition: histogram.cc:749
static constexpr const char * singleton_str()
String representation of the histogram type SINGLETON.
Definition: histogram.h:330
static constexpr const char * sampling_rate_str()
Definition: histogram.h:356
double get_greater_than_selectivity_dispatcher(const T &value) const
Definition: histogram.cc:2257
const LEX_CSTRING get_column_name() const
Definition: histogram.h:546
enum_histogram_type
All supported histogram types in MySQL.
Definition: histogram.h:318
static constexpr const char * null_values_str()
String representation of the JSON field "null-values".
Definition: histogram.h:354
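The `*_str()` accessors above name the top-level JSON fields of a persisted histogram ("histogram-type", "null-values", "number-of-buckets-specified", "auto-update", and so on) and the type names "singleton" and "equi-height". The following is a rough sketch of how those names combine into a JSON header. `toy_keys` and `toy_header_json` are hypothetical. The sketch omits buckets, escaping and every other field. The real `histogram_to_json()` builds a `Json_object` rather than a string.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Field and type names as described for the Histogram *_str() accessors.
namespace toy_keys {
constexpr const char *kHistogramType = "histogram-type";
constexpr const char *kNullValues = "null-values";
constexpr const char *kBucketsSpecified = "number-of-buckets-specified";
constexpr const char *kAutoUpdate = "auto-update";
constexpr const char *kSingleton = "singleton";
constexpr const char *kEquiHeight = "equi-height";
}  // namespace toy_keys

// Hypothetical header writer: a few top-level keys only, no escaping
// (none of these values needs it).
std::string toy_header_json(bool singleton, double null_values,
                            std::size_t buckets_specified, bool auto_update) {
  std::ostringstream out;
  out << "{\"" << toy_keys::kHistogramType << "\": \""
      << (singleton ? toy_keys::kSingleton : toy_keys::kEquiHeight) << "\", \""
      << toy_keys::kNullValues << "\": " << null_values << ", \""
      << toy_keys::kBucketsSpecified << "\": " << buckets_specified << ", \""
      << toy_keys::kAutoUpdate << "\": " << (auto_update ? "true" : "false")
      << "}";
  return out.str();
}
```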
This file includes constants used by all storage engines.
The version of the current data dictionary table definitions.
Definition: dictionary_client.h:44
Definition: column_statistics.h:34
std::set< std::string, std::less< std::string >, Histogram_key_allocator< std::string > > columns_set
Definition: histogram.h:142
bool drop_all_histograms(THD *thd, Table_ref &table, const dd::Table &table_definition, results_map &results)
Drop histograms for all columns in a given table.
Definition: histogram.cc:2008
std::map< std::string, Message, std::less< std::string >, Histogram_key_allocator< std::pair< const std::string, Message > > > results_map
Definition: histogram.h:148
Message
Definition: histogram.h:84
@ JSON_CUMULATIVE_FREQUENCY_NOT_ASCENDING
@ JSON_NUM_BUCKETS_MORE_THAN_SPECIFIED
bool update_share_histograms(THD *thd, Table_ref *table)
Retrieve an updated snapshot of the histograms on a table directly from the dictionary (in an ineffic...
Definition: histogram.cc:1592
bool auto_update_table_histograms_from_background_thread(THD *thd, const std::string &db_name, const std::string &table_name)
Updates existing histograms on a table that were specified with the AUTO UPDATE option.
Definition: histogram.cc:1858
bool drop_histograms(THD *thd, Table_ref &table, const columns_set &columns, results_map &results)
Drop histograms for a set of columns in a given table.
Definition: histogram.cc:2018
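`drop_histograms()` takes a `columns_set` and fills a `results_map` that maps each column name to a `Message`. The following is a minimal sketch of that per-column reporting pattern. `ToyOutcome`, `ToyColumnsSet`, `ToyResultsMap` and `toy_drop_histograms` are hypothetical. The sketch uses default allocators instead of `Histogram_key_allocator` and assumes that `true` means error, as elsewhere in MySQL.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical outcome codes; the real Message enum has its own values.
enum class ToyOutcome { kHistogramDeleted, kNoHistogramFound };

using ToyColumnsSet = std::set<std::string>;
using ToyResultsMap = std::map<std::string, ToyOutcome>;

// Drop stored histograms for the given columns. Every requested column
// gets exactly one entry in results. Assumption: true means error; this
// toy version never fails.
bool toy_drop_histograms(std::map<std::string, int> &stored_histograms,
                         const ToyColumnsSet &columns, ToyResultsMap &results) {
  for (const std::string &column : columns) {
    const bool erased = stored_histograms.erase(column) > 0;
    results[column] = erased ? ToyOutcome::kHistogramDeleted
                             : ToyOutcome::kNoHistogramFound;
  }
  return false;
}
```

Reporting a result for every requested column, not only failures, lets the caller print one status row per column.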
enum_operator
The operators for which histogram statistics can provide selectivity estimates.
Definition: histogram.h:154
bool rename_histograms(THD *thd, const char *old_schema_name, const char *old_table_name, const char *new_schema_name, const char *new_table_name, results_map &results)
Rename histograms for all columns in a given table.
Definition: histogram.cc:2175
bool auto_update_table_histograms(THD *thd, Table_ref *table)
Updates existing histograms on a table that were specified with the AUTO UPDATE option.
Definition: histogram.cc:1735
Histogram * build_histogram(MEM_ROOT *mem_root, const Value_map< T > &value_map, size_t num_buckets, const std::string &db_name, const std::string &tbl_name, const std::string &col_name)
Create a histogram from a value map.
Definition: histogram.cc:436
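`build_histogram()` creates a histogram from a value map, which `value_map_type` defines as an ordered map from each distinct value to its row count. The MySQL Reference Manual documents how the type is chosen: a singleton histogram when the number of distinct values is at most the requested bucket count, otherwise an equi-height histogram. The following is a minimal sketch of that choice plus singleton cumulative frequencies. `ToyValueMap`, `choose_type` and `singleton_cumulative` are hypothetical helpers, and the frequencies here are computed over non-NULL rows only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class ToyHistogramType { kSingleton, kEquiHeight };

// Distinct value -> number of rows; std::map keeps values sorted.
using ToyValueMap = std::map<int, std::uint64_t>;

// Singleton if every distinct value fits in its own bucket.
ToyHistogramType choose_type(const ToyValueMap &values,
                             std::size_t num_buckets) {
  return values.size() <= num_buckets ? ToyHistogramType::kSingleton
                                      : ToyHistogramType::kEquiHeight;
}

// Cumulative frequency per distinct value, over non-NULL rows only.
std::vector<double> singleton_cumulative(const ToyValueMap &values) {
  std::uint64_t total = 0;
  for (const auto &entry : values) total += entry.second;
  std::vector<double> out;
  std::uint64_t running = 0;
  for (const auto &entry : values) {
    running += entry.second;
    out.push_back(total == 0 ? 0.0 : static_cast<double>(running) / total);
  }
  return out;
}
```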
bool find_histogram(THD *thd, const std::string &schema_name, const std::string &table_name, const std::string &column_name, const Histogram **histogram)
Definition: histogram.cc:2214
bool update_histograms(THD *thd, Table_ref *table, Mem_root_array< HistogramSetting > *settings, results_map &results)
Create or update histograms for a set of columns of a given table.
Definition: histogram.cc:1511
static const double INVALID_NULL_VALUES_FRACTION
The default (and invalid) value for "m_null_values_fraction".
Definition: histogram.h:82
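`INVALID_NULL_VALUES_FRACTION` is the default, deliberately invalid value of `m_null_values_fraction`, and `get_non_null_values_fraction()` sits alongside `get_null_values_fraction()`. The following is a small sketch of the sentinel pattern. It assumes a negative sentinel of -1.0 and assumes that the non-NULL fraction is one minus the NULL fraction. `kToyInvalidNullFraction` and `ToyNullStats` are hypothetical, and the real constant's value is not shown in this section.

```cpp
#include <cassert>

// Assumption: a negative sentinel marks "not yet computed"; the value of
// the real INVALID_NULL_VALUES_FRACTION may differ.
constexpr double kToyInvalidNullFraction = -1.0;

struct ToyNullStats {
  double null_values_fraction = kToyInvalidNullFraction;

  // A valid fraction lies in [0.0, 1.0]; the sentinel does not.
  bool is_valid() const {
    return null_values_fraction >= 0.0 && null_values_fraction <= 1.0;
  }

  // Fraction of non-NULL rows; only meaningful once the value is valid.
  double non_null_fraction() const {
    assert(is_valid());
    return 1.0 - null_values_fraction;
  }
};
```

Choosing a sentinel outside [0.0, 1.0] makes "never set" detectable with a range check alone.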
std::map< T, ha_rows, Histogram_comparator, value_map_allocator< T > > value_map_type
Definition: histogram.h:139
Value_map_type
Datatypes that a Value_map and histogram can hold (including the invalid type).
Definition: value_map_type.h:33
bool empty(const Histogram &histogram)
Return true if 'histogram' was built on an empty table.
Definition: histogram.h:694
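`empty()` reports whether a histogram was built on an empty table. Its body is not shown here. One plausible test, assumed rather than taken from the source, is "no buckets and no NULL rows", because a table that holds only NULLs would also produce zero buckets but a non-zero NULL fraction. `ToyHistogramSummary` and `toy_empty` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of the two properties the check needs.
struct ToyHistogramSummary {
  std::size_t num_buckets;
  double null_values_fraction;
};

// Assumption: empty table <=> no buckets and no NULL rows either.
// A column containing only NULLs has zero buckets but is not empty.
bool toy_empty(const ToyHistogramSummary &h) {
  return h.num_buckets == 0 && h.null_values_fraction == 0.0;
}
```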
Definition: m_ctype.h:421
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:83
Definition: mysql_lex_string.h:40
Definition: mysql_lex_string.h:35
Definition: typelib.h:35
A simple struct containing the settings for a histogram to be built.
Definition: histogram.h:733
bool auto_update
True if AUTO UPDATE, false for MANUAL UPDATE.
Definition: histogram.h:746
LEX_STRING data
Holds the JSON specification of the histogram for the UPDATE HISTOGRAM ... USING DATA command,...
Definition: histogram.h:743
Field * field
A pointer to the field, used internally by update_histograms().
Definition: histogram.h:749
size_t num_buckets
The target number of buckets for the histogram.
Definition: histogram.h:739
const char * column_name
A null-terminated C-style string with the name of the column to build the histogram for.
Definition: histogram.h:736
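The `HistogramSetting` fields above (column name, target bucket count, optional JSON data for `UPDATE HISTOGRAM ... USING DATA`, and the AUTO/MANUAL UPDATE flag) describe one histogram to build. The following is a hedged sketch of validating such a setting before building. `ToyHistogramSetting` and `toy_setting_is_valid` are hypothetical. The 1 to 1024 bucket range is the limit documented for `ANALYZE TABLE ... WITH N BUCKETS`. The default of 100 buckets and the rule that JSON data bypasses the bucket check are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mirror of the HistogramSetting fields.
struct ToyHistogramSetting {
  const char *column_name = nullptr;  // null-terminated column name
  std::size_t num_buckets = 100;      // target bucket count (default assumed)
  std::string data;                   // JSON for ... USING DATA, else empty
  bool auto_update = false;           // AUTO UPDATE (true) or MANUAL UPDATE
};

bool toy_setting_is_valid(const ToyHistogramSetting &s) {
  if (s.column_name == nullptr || s.column_name[0] == '\0') return false;
  // Assumption: with user-supplied JSON, buckets come from the data itself.
  if (!s.data.empty()) return true;
  // Documented range for ANALYZE TABLE ... WITH N BUCKETS.
  return s.num_buckets >= 1 && s.num_buckets <= 1024;
}
```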
Definition: histogram.h:127
void * operator()(size_t s) const
Definition: histogram.cc:123