MySQL 8.4.2
Source Code Documentation
histogram.h
1#ifndef HISTOGRAMS_HISTOGRAM_INCLUDED
2#define HISTOGRAMS_HISTOGRAM_INCLUDED
3
4/* Copyright (c) 2016, 2024, Oracle and/or its affiliates.
5
6 This program is free software; you can redistribute it and/or modify
7 it under the terms of the GNU General Public License, version 2.0,
8 as published by the Free Software Foundation.
9
10 This program is designed to work with certain software (including
11 but not limited to OpenSSL) that is licensed under separate terms,
12 as designated in a particular file or component or in included license
13 documentation. The authors of MySQL hereby grant you an additional
14 permission to link the program and your derivative works with the
15 separately licensed software that they have either included with
16 the program or referenced in the documentation.
17
18 This program is distributed in the hope that it will be useful,
19 but WITHOUT ANY WARRANTY; without even the implied warranty of
20 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 GNU General Public License, version 2.0, for more details.
22
23 You should have received a copy of the GNU General Public License
24 along with this program; if not, write to the Free Software
25 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
26
27/**
28 @file sql/histograms/histogram.h
29 Histogram base class.
30
31 This file defines the base class for all histogram types. We keep the base
32 class itself non-templatized in order to more easily send a histogram as an
33 argument, collect multiple histograms in a single collection etc.
34
35 A histogram is stored as a JSON object. This gives the flexibility of storing
36 a virtually unlimited number of buckets, keeping data values in their full
37 length, and easily adding new histogram types in the future. Histograms are
38 stored persistently in the system table mysql.column_stats.
39
40 We keep all histogram code in the namespace "histograms" in order to avoid
41 name conflicts etc.
42*/
43
44#include <cstddef> // size_t
45#include <functional>
46#include <map> // std::map
47#include <memory>
48#include <set> // std::set
49#include <string> // std::string
50#include <utility> // std::pair
51
52#include "lex_string.h" // LEX_CSTRING
53#include "my_base.h" // ha_rows
54#include "sql/field.h" // Field
56#include "sql/mem_root_allocator.h" // Mem_root_allocator
57#include "sql/stateless_allocator.h" // Stateless_allocator
58
59class Item;
60class Json_dom;
61class Json_object;
62class THD;
63struct TYPELIB;
64class Field;
65
66namespace dd {
67class Table;
68} // namespace dd
69namespace histograms {
70struct Histogram_comparator;
71template <class T>
72class Value_map;
73} // namespace histograms
74struct CHARSET_INFO;
75struct MEM_ROOT;
76class Table_ref;
77class Json_dom;
78
79namespace histograms {
80
81/// The default (and invalid) value for "m_null_values_fraction".
82static const double INVALID_NULL_VALUES_FRACTION = -1.0;
83
84enum class Message {
89 VIEW,
98
99 // JSON validation errors. See Error_context.
124};
125
127 void *operator()(size_t s) const;
128};
129
130template <class T>
132
133template <class T>
135
136template <typename T>
138 std::map<T, ha_rows, Histogram_comparator, value_map_allocator<T>>;
139
140using columns_set = std::set<std::string, std::less<std::string>,
142
143// Used as an array, so duplicate values are not checked.
144// TODO(tlchrist): Convert this std::map to an array.
146 std::map<std::string, Message, std::less<std::string>,
148
149/**
150 The different operators we can ask histogram statistics for selectivity
151 estimations.
152*/
153enum class enum_operator {
154 EQUALS_TO,
156 LESS_THAN,
157 IS_NULL,
162 BETWEEN,
164 IN_LIST,
166};
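
The operators above describe the kind of predicate a selectivity estimate is requested for. As a rough illustration only, the sketch below applies three such operators to a map of per-value row fractions. `Op`, `Frequencies` and `estimate_selectivity` are hypothetical names introduced for this example, and the flat map is an assumption rather than the bucket layout MySQL uses.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified stand-in for histograms::enum_operator.
enum class Op { EQUALS_TO, LESS_THAN, GREATER_THAN };

// Assumption for this sketch: each distinct value maps to the fraction of
// rows that hold exactly that value.
using Frequencies = std::map<int, double>;

// Estimate the fraction of rows that satisfy "column <op> value".
double estimate_selectivity(const Frequencies &freq, Op op, int value) {
  double less = 0.0;   // fraction of rows with a smaller value
  double equal = 0.0;  // fraction of rows with exactly this value
  double total = 0.0;  // fraction of rows covered by the map (non-NULL rows)
  for (const auto &entry : freq) {
    total += entry.second;
    if (entry.first < value) {
      less += entry.second;
    } else if (entry.first == value) {
      equal += entry.second;
    }
  }
  switch (op) {
    case Op::EQUALS_TO:
      return equal;
    case Op::LESS_THAN:
      return less;
    case Op::GREATER_THAN:
      return total - less - equal;
  }
  return 0.0;  // not reached for valid operators
}
```

With fractions 0.25, 0.5 and 0.25 for the values 1, 2 and 3, the sketch yields 0.5 for "= 2" and 0.25 for both "< 2" and "> 2".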
167
168/**
169 Error context to validate given JSON object which represents a histogram.
170
171 A validation error consists of two pieces of information:
172
173 1) error code - what kind of error it is
174 2) JSON path - where the error occurs
175
176 Errors are classified into a few conceptual categories, namely
177
178 1) absence of required attributes
179 2) unexpected JSON type of attributes
180 3) value encoding corruption
181 4) value out of domain
182 5) breaking bucket sequence semantics
183 6) breaking certain constraint between pieces of information
184
185 @see histograms::Message for the list of JSON validation errors.
186
187 Use of the Error_context class
188 ------------------------------
189
190 An Error_context object is passed along with other parameters to the
191 json_to_histogram() function that is used to create a histogram object (e.g.
192 Equi_height<longlong>) from a JSON string.
193
194 The json_to_histogram() function has two different use cases, with different
195 requirements for validation:
196
197 1) Deserializing a histogram that was retrieved from the dictionary. In this
198 case the histogram has already been validated, and the user is not
199 expecting validation feedback, so we pass along a default-constructed
200 "empty shell" Error_context object with no-op operations.
201
202 2) When validating the user-supplied JSON string to the UPDATE HISTOGRAM ...
203 USING DATA command. In this case we pass along an active Error_context
204 object that uses a Field object to validate bucket values, and stores
205 results in a results_map.
206
207 The binary() method is used to distinguish between these two contexts/cases.
208*/
209class Error_context {
210 public:
211 /// Default constructor. Used when deserializing binary JSON that has already
212 /// been validated, e.g. when retrieving a histogram from the dictionary, and
213 /// the Error_context object is not actively used for validation.
216
217 /**
218 Constructor. Used in the context of deserializing the user-supplied JSON
219 string to the UPDATE HISTOGRAM ... USING DATA command.
220
221 @param thd Thread context
222 @param field The field for values on which the histogram is built
223 @param results Where reported errors are stored
224 */
225 Error_context(THD *thd, Field *field, results_map *results)
226 : m_thd(thd), m_field(field), m_results(results), m_binary(false) {}
227
228 /**
229 Report a global error to this context.
230
231 @param err_code The global error code
232 */
233 void report_global(Message err_code);
234
235 /**
236 Report to this context that a required attribute is missing.
237
238 @param name Name of the missing attribute
239 */
240 void report_missing_attribute(const std::string &name);
241
242 /**
243 Report to this context that an error occurs on the given dom node.
244
245 @param dom The given dom node
246 @param err_code The error code
247 */
248 void report_node(const Json_dom *dom, Message err_code);
249
250 /**
251 Check if the value is in the field definition domain.
252
253 @param v Pointer to the value.
254
255 @return true on error, false otherwise
256
257 @note Uses Field::store() on the field for which the user-defined histogram
258 is to be constructed in order to check the validity of the supplied value.
259 This will have the side effect of writing to the record buffer so this
260 should only be used with an active Error_context (with a non-nullptr field)
261 when we do not otherwise expect to use the record buffer. Currently the only
262 use case is to validate the JSON input to the command UPDATE HISTOGRAM ...
263 USING DATA where it should be OK to use the field for this purpose.
264 */
265 template <typename T>
266 bool check_value(T *v);
267
268 /**
269 Tell whether the input json is an internal persisted copy or
270 a user-defined input. If the input is an internal copy, there
271 should never be type/format errors. If it is a user-defined input,
272 errors may occur and should be handled, and some type casting may
273 be needed.
274
275 @return true for binary JSON (an internal persisted copy), false otherwise
276 */
277 bool binary() const { return m_binary; }
278
279 /**
280 Return the field in this context, if present. Used to enforce
281 that the histogram data type matches the column data type for user-defined
282 histograms.
283
284 @return pointer to the field if present, nullptr if not
285 */
286 Field *field() const { return m_field; }
287
288 private:
289 /// Thread context for error handlers
290 THD *m_thd;
291 /// The field for checking endpoint values
292 Field *m_field;
293 /// Where reported errors are stored
294 results_map *m_results;
295 /// Whether or not the JSON object to process is in binary format
296 bool m_binary;
297};
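
The two use cases described for Error_context (a passive "empty shell" for already-validated dictionary data, and an active context for user-supplied JSON) can be sketched roughly as follows. `Sketch_error_context` and its string-to-string result map are hypothetical simplifications, and setting `m_binary` to true in the default constructor is an assumption inferred from the comments above, not copied from the MySQL sources.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for histograms::Error_context.
class Sketch_error_context {
 public:
  // Passive mode: the input was validated earlier, so reports are ignored.
  Sketch_error_context() : m_results(nullptr), m_binary(true) {}

  // Active mode: user-supplied input; reports are stored in "results".
  explicit Sketch_error_context(std::map<std::string, std::string> *results)
      : m_results(results), m_binary(false) {}

  // Record a missing attribute, or do nothing in passive mode.
  void report_missing_attribute(const std::string &name) {
    if (m_results == nullptr) return;
    (*m_results)[name] = "missing attribute";
  }

  // True for an internal persisted copy, false for user-defined input.
  bool binary() const { return m_binary; }

 private:
  std::map<std::string, std::string> *m_results;
  bool m_binary;
};
```

Callers can then use the same reporting calls in both situations and only branch on `binary()` where user input needs extra type casting.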
298
299/**
300 Histogram base class.
301
302 This is an abstract class containing the interface and shared code for
303 concrete histogram subclasses.
304
305 Histogram subclasses (Singleton, Equi_height) are constructed through factory
306 methods in order to catch memory allocation errors during construction.
307
308 The histogram subclasses have no public copy or move constructors. In order to
309 copy a histogram onto a given MEM_ROOT, use the public clone method. The clone
310 method ensures that members of the histogram, such as String type buckets,
311 are also allocated on the given MEM_ROOT. Modifications to these methods need
312 to be careful that histogram buckets are cloned/copied correctly.
313*/
314class Histogram {
315 public:
316 /// All supported histogram types in MySQL.
318
319 /// String representation of the JSON field "histogram-type".
320 static constexpr const char *histogram_type_str() { return "histogram-type"; }
321
322 /// String representation of the JSON field "data-type".
323 static constexpr const char *data_type_str() { return "data-type"; }
324
325 /// String representation of the JSON field "collation-id".
326 static constexpr const char *collation_id_str() { return "collation-id"; }
327
328 /// String representation of the histogram type SINGLETON.
329 static constexpr const char *singleton_str() { return "singleton"; }
330
331 /// String representation of the histogram type EQUI-HEIGHT.
332 static constexpr const char *equi_height_str() { return "equi-height"; }
333
334 protected:
335 double m_sampling_rate;
336
337 /// The fraction of NULL values in the histogram (between 0.0 and 1.0).
339
340 /// The character set for the data stored
342
343 /// The number of buckets originally specified
344 size_t m_num_buckets_specified;
345
346 /// String representation of the JSON field "buckets".
347 static constexpr const char *buckets_str() { return "buckets"; }
348
349 /// String representation of the JSON field "last-updated".
350 static constexpr const char *last_updated_str() { return "last-updated"; }
351
352 /// String representation of the JSON field "null-values".
353 static constexpr const char *null_values_str() { return "null-values"; }
354
355 static constexpr const char *sampling_rate_str() { return "sampling-rate"; }
356
357 /// String representation of the JSON field "number-of-buckets-specified".
358 static constexpr const char *numer_of_buckets_specified_str() {
359 return "number-of-buckets-specified";
360 }
361
362 /// String representation of the JSON field "auto-update".
363 static constexpr const char *auto_update_str() { return "auto-update"; }
364
365 /**
366 Constructor.
367
368 @param mem_root the mem_root where the histogram contents will be allocated
369 @param db_name name of the database this histogram represents
370 @param tbl_name name of the table this histogram represents
371 @param col_name name of the column this histogram represents
372 @param type the histogram type (equi-height, singleton)
373 @param data_type the type of data that this histogram contains
374 @param[out] error is set to true if an error occurs
375 */
376 Histogram(MEM_ROOT *mem_root, const std::string &db_name,
377 const std::string &tbl_name, const std::string &col_name,
378 enum_histogram_type type, Value_map_type data_type, bool *error);
379
380 /**
381 Copy constructor
382
383 This will make a copy of the provided histogram onto the provided MEM_ROOT.
384
385 @param mem_root the mem_root where the histogram contents will be allocated
386 @param other the histogram to copy
387 @param[out] error is set to true if an error occurs
388 */
389 Histogram(MEM_ROOT *mem_root, const Histogram &other, bool *error);
390
391 /**
392 Write the data type of this histogram into a JSON object.
393
394 @param json_object the JSON object where we will write the histogram
395 data type
396
397 @return true on error, false otherwise
398 */
399 bool histogram_data_type_to_json(Json_object *json_object) const;
400
401 /**
402 Return the value that is contained in the JSON DOM object.
403
404 For most types, this function simply returns the contained value. For String
405 values, the value is allocated on this histogram's MEM_ROOT before it is
406 returned. This allows the String value to survive the entire lifetime of the
407 histogram object.
408
409 @param json_dom the JSON DOM object to extract the value from
410 @param out the value from the JSON DOM object
411 @param context error context for validation
412
413 @return true on error, false otherwise
414 */
415 template <class T>
416 bool extract_json_dom_value(const Json_dom *json_dom, T *out,
417 Error_context *context);
418
419 /**
420 Populate the histogram with data from the provided JSON object. The base
421 class also provides an implementation that subclasses must call in order
422 to populate fields that are shared among all histogram types (character set,
423 null values fraction).
424
425 @param json_object the JSON object to read the histogram data from
426 @param context error context for validation
427
428 @return true on error, false otherwise
429 */
430 virtual bool json_to_histogram(const Json_object &json_object,
431 Error_context *context) = 0;
432
433 private:
434 /// The MEM_ROOT where the histogram contents will be allocated.
435 MEM_ROOT *m_mem_root;
436
437 /// The type of this histogram.
439
440 /// The type of the data this histogram contains.
442
443 /// Name of the database this histogram represents.
445
446 /// Name of the table this histogram represents.
448
449 /// Name of the column this histogram represents.
451
452 /// True if the histogram was created with the AUTO UPDATE option, false if
453 /// MANUAL UPDATE.
455
456 /**
457 An internal function for getting a selectivity estimate prior to adjustment.
458 @see get_selectivity() for details.
459 */
460 bool get_raw_selectivity(Item **items, size_t item_count, enum_operator op,
461 double *selectivity) const;
462
463 /**
464 An internal function for getting the selectivity estimation.
465
466 This function will read/evaluate the value from the given Item, and pass
467 this value on to the correct selectivity estimation function based on the
468 data type of the histogram. For instance, if the data type of the histogram
469 is INT, we will call "val_int" on the Item to evaluate the value as an
470 integer and pass this value on to the next function.
471
472 @param item The Item to read/evaluate the value from.
473 @param op The operator we are estimating the selectivity for.
474 @param typelib In the case of ENUM or SET data type, this parameter holds
475 the type information. This is needed in order to map a
476 string representation of an ENUM/SET value into its correct
477 integer representation (ENUM/SET values are stored as
478 integer values in the histogram).
479 @param[out] selectivity The estimated selectivity, between 0.0 and 1.0
480 inclusive.
481
482 @return true on error (i.e. the provided item was NULL), false on success.
483 */
484 bool get_selectivity_dispatcher(Item *item, const enum_operator op,
485 const TYPELIB *typelib,
486 double *selectivity) const;
487
488 /**
489 An internal function for getting the selectivity estimation.
490
491 This function will cast the histogram to the correct class (using down_cast)
492 and pass the given value on to the correct selectivity estimation function
493 for that class.
494
495 @param value The value to estimate the selectivity for.
496
497 @return The estimated selectivity, between 0.0 and 1.0 inclusive.
498 */
499 template <class T>
500 double get_less_than_selectivity_dispatcher(const T &value) const;
501
502 /// @see get_less_than_selectivity_dispatcher
503 template <class T>
504 double get_greater_than_selectivity_dispatcher(const T &value) const;
505
506 /// @see get_less_than_selectivity_dispatcher
507 template <class T>
508 double get_equal_to_selectivity_dispatcher(const T &value) const;
509
510 /**
511 An internal function for applying the correct function for the given
512 operator.
513
514 @param op The operator to apply
515 @param value The value to find the selectivity for.
516
517 @return The estimated selectivity, between 0.0 and 1.0 inclusive.
518 */
519 template <class T>
520 double apply_operator(const enum_operator op, const T &value) const;
521
522 public:
523 Histogram() = delete;
524 Histogram(const Histogram &other) = delete;
525
526 /// Destructor.
527 virtual ~Histogram() = default;
528
529 /// @return the MEM_ROOT that this histogram uses for allocations
530 MEM_ROOT *get_mem_root() const { return m_mem_root; }
531
532 /**
533 @return name of the database this histogram represents
534 */
536
537 /**
538 @return name of the table this histogram represents
539 */
540 const LEX_CSTRING get_table_name() const { return m_table_name; }
541
542 /**
543 @return name of the column this histogram represents
544 */
545 const LEX_CSTRING get_column_name() const { return m_column_name; }
546
547 /**
548 @return type of this histogram
549 */
551
552 /**
553 @return the fraction of NULL values, in the range [0.0, 1.0]
554 */
555 double get_null_values_fraction() const;
556
557 /// @return the character set for the data this histogram contains
558 const CHARSET_INFO *get_character_set() const { return m_charset; }
559
560 /// @return the sampling rate used to generate this histogram
561 double get_sampling_rate() const { return m_sampling_rate; }
562
563 /**
564 Returns the histogram type as a readable string.
565
566 @return a readable string representation of the histogram type
567 */
568 virtual std::string histogram_type_to_str() const = 0;
569
570 /**
571 @return number of buckets in this histogram
572 */
573 virtual size_t get_num_buckets() const = 0;
574
575 /**
576 Get the estimated number of distinct non-NULL values.
577 @return number of distinct non-NULL values
578 */
579 virtual size_t get_num_distinct_values() const = 0;
580
581 /**
582 @return the data type that this histogram contains
583 */
585
586 /**
587 @return number of buckets originally specified by the user. This may be
588 higher than the actual number of buckets in the histogram.
589 */
591
592 /**
593 @return True if automatic updates are enabled for the histogram, false
594 otherwise.
595 */
596 bool get_auto_update() const { return m_auto_update; }
597
598 /**
599 Sets the auto update property for the histogram.
600 */
601 void set_auto_update(bool auto_update) { m_auto_update = auto_update; }
602
603 /**
604 Converts the histogram to a JSON object.
605
606 @param[in,out] json_object output where the histogram is to be stored. The
607 caller is responsible for allocating/deallocating the JSON
608 object
609
610 @return true on error, false otherwise
611 */
612 virtual bool histogram_to_json(Json_object *json_object) const = 0;
613
614 /**
615 Converts JSON object to a histogram.
616
617 @param mem_root MEM_ROOT where the histogram will be allocated
618 @param schema_name the schema name
619 @param table_name the table name
620 @param column_name the column name
621 @param json_object the JSON object to read the histogram from
622 @param context error context for validation
623
624 @return nullptr on error. Otherwise a histogram allocated on the provided
625 MEM_ROOT.
626 */
628 const std::string &schema_name,
629 const std::string &table_name,
630 const std::string &column_name,
631 const Json_object &json_object,
632 Error_context *context);
633
634 /**
635 Make a clone of the current histogram
636
637 @param mem_root the MEM_ROOT on which the new histogram will be allocated.
638
639 @return a histogram allocated on the provided MEM_ROOT. Returns nullptr
640 on error.
641 */
642 virtual Histogram *clone(MEM_ROOT *mem_root) const = 0;
643
644 /**
645 Store this histogram to persistent storage (data dictionary). The MEM_ROOT
646 that the histogram is allocated on is transferred to the dictionary.
647
648 @param thd Thread handler.
649
650 @return false on success, true on error.
651 */
652 bool store_histogram(THD *thd) const;
653
654 /**
655 Get selectivity estimation.
656
657 This function will try and get the selectivity estimation for a predicate
658 on the form "COLUMN OPERATOR CONSTANT", for instance "SELECT * FROM t1
659 WHERE col1 > 23;".
660
661 This function will take care of several things, for instance checking
662 that the value we are estimating the selectivity for is a constant value.
663
664 The order of the Items provided does not matter. For instance, if the
665 operator argument given is "EQUALS_TO", it does not matter if the constant
666 value is provided as the first or the second argument; this function will
667 take care of this.
668
669 @param items an array of items that contains both the field we
670 are estimating the selectivity for, as well as the
671 user-provided constant values.
672 @param item_count the number of Items in the Item array.
673 @param op the predicate operator
674 @param[out] selectivity the calculated selectivity if a usable histogram was
675 found
676
677 @retval true if an error occurred (the Item provided was not a constant
678 value or similar).
679 @retval false on success
680 */
681 bool get_selectivity(Item **items, size_t item_count, enum_operator op,
682 double *selectivity) const;
683
684 /**
685 @return the fraction of non-null values in the histogram.
686 */
688 return 1.0 - get_null_values_fraction();
689 }
690};
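
The class comment above notes that histogram subclasses are built through factory methods so that failures during construction can be caught, and the constructors report problems through a `bool *error` out-parameter. The following is a minimal sketch of that pattern under stated assumptions: `Sketch_histogram` is a hypothetical class, it uses `std::unique_ptr` and `new (std::nothrow)` in place of MEM_ROOT allocation, and an empty column name stands in for a real allocation failure.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>

// Hypothetical class; not part of the MySQL sources.
class Sketch_histogram {
 public:
  // Factory method: returns nullptr instead of a half-constructed object.
  static std::unique_ptr<Sketch_histogram> create(
      const std::string &col_name) {
    bool error = false;
    std::unique_ptr<Sketch_histogram> histogram(
        new (std::nothrow) Sketch_histogram(col_name, &error));
    if (histogram == nullptr || error) return nullptr;
    return histogram;
  }

  const std::string &column_name() const { return m_column_name; }

 private:
  // A constructor cannot return a status, so it reports through "error".
  Sketch_histogram(const std::string &col_name, bool *error)
      : m_column_name(col_name) {
    // Stand-in failure condition for this sketch only.
    if (col_name.empty()) *error = true;
  }

  std::string m_column_name;
};
```

Keeping the constructor private forces every caller through `create()`, so the error flag is always checked before the object is used.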
691
692/** Return true if 'histogram' was built on an empty table.*/
693inline bool empty(const Histogram &histogram) {
694 return histogram.get_num_distinct_values() == 0 &&
695 histogram.get_null_values_fraction() == 0.0;
696}
697
698/**
699 Create a histogram from a value map.
700
701 This function will build a histogram from a value map. The histogram type
702 depends on both the size of the input data, as well as the number of buckets
703 specified. If the number of distinct values is less than or equal to the
704 number of buckets, a Singleton histogram will be created. Otherwise, an
705 equi-height histogram will be created.
706
707 The histogram will be allocated on the supplied mem_root, and it is the
708 caller's responsibility to properly clean up when the histogram isn't needed
709 anymore.
710
711 @param mem_root the MEM_ROOT where the histogram contents will be
712 allocated
713 @param value_map a value map containing [value, frequency]
714 @param num_buckets the maximum number of buckets to create
715 @param db_name name of the database this histogram represents
716 @param tbl_name name of the table this histogram represents
717 @param col_name name of the column this histogram represents
718
719 @return a histogram, using at most "num_buckets" buckets. The histogram
720 type depends on the size of the input data, and the number of
721 buckets
722*/
723template <class T>
724Histogram *build_histogram(MEM_ROOT *mem_root, const Value_map<T> &value_map,
725 size_t num_buckets, const std::string &db_name,
726 const std::string &tbl_name,
727 const std::string &col_name);
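
The rule stated in the comment above (a Singleton histogram when the number of distinct values does not exceed the number of buckets, an equi-height histogram otherwise) can be written down as a small sketch. `Sketch_type` and `choose_histogram_type` are hypothetical names used only for illustration; the actual decision is made inside build_histogram().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the histogram type enumeration.
enum class Sketch_type { SINGLETON, EQUI_HEIGHT };

// Pick the histogram type from the number of distinct values in the value
// map and the maximum number of buckets requested.
Sketch_type choose_histogram_type(std::size_t num_distinct_values,
                                  std::size_t num_buckets) {
  return num_distinct_values <= num_buckets ? Sketch_type::SINGLETON
                                            : Sketch_type::EQUI_HEIGHT;
}
```

For example, 100 distinct values with 100 buckets still gives a singleton histogram, while 101 distinct values gives an equi-height histogram.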
728
729/**
730 A simple struct containing the settings for a histogram to be built.
731*/
733 /// A null-terminated C-style string with the name of the column to build the
734 /// histogram for.
735 const char *column_name;
736
737 /// The target number of buckets for the histogram.
738 size_t num_buckets = 100;
739
740 /// Holds the JSON specification of the histogram for the UPDATE HISTOGRAM ...
741 /// USING DATA command, otherwise empty.
742 LEX_STRING data = {nullptr, 0};
743
744 /// True if AUTO UPDATE, false for MANUAL UPDATE.
745 bool auto_update = false;
746
747 /// A pointer to the field, used internally by update_histograms().
748 Field *field = nullptr;
749};
750
751/**
752 Create or update histograms for a set of columns of a given table.
753
754 This function will try to create a histogram for each HistogramSetting object
755 passed to it. It operates in two stages:
756
757 In the first stage it will attempt to resolve every HistogramSetting in
758 settings, verifying that the specified column exists and supports histograms.
759 If a setting cannot be resolved an error message will be generated (see note
760 below for details on error reporting), but the function will continue
761 executing. The collection of settings is modified in-place so that only the
762 resolved settings remain when the function returns.
763
764 In the second stage, after the settings have been resolved, the function
765 attempts to build a histogram for each resolved column. If an error is
766 encountered during this stage, the function will immediately abort and return
767 true. In other words, if the function returns true, it will have made an
768 attempt to update the histograms as specified in the output collection of
769 settings, but it could have failed halfway.
770
771 If no error occurs during the second stage the function will return false, and
772 the histograms specified in the output collection of settings will successfully
773 have been updated.
774
775 @param thd Thread handler.
776 @param table The table where we should look for the columns/data.
777 @param[in,out] settings The settings for the histograms to be built.
778 @param[in,out] results A map where the result of each operation is stored.
779
780 @return False on success, true if an error was encountered.
781*/
784 results_map &results);
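
The two-stage behaviour described above can be sketched as follows: unresolvable settings are filtered out in place and processing continues, then building aborts on the first error. All names here (`Sketch_setting`, `sketch_update`, the `known_columns` and `failing_columns` sets) are hypothetical and only model the control flow, not the real resolution or build logic.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, reduced version of a histogram setting.
struct Sketch_setting {
  std::string column_name;
};

// Returns false on success and true on error, following the convention
// used by the functions in this header.
bool sketch_update(const std::set<std::string> &known_columns,
                   const std::set<std::string> &failing_columns,
                   std::vector<Sketch_setting> &settings,
                   std::vector<std::string> &built) {
  // Stage 1: drop settings that cannot be resolved, but keep going.
  settings.erase(
      std::remove_if(settings.begin(), settings.end(),
                     [&](const Sketch_setting &s) {
                       return known_columns.count(s.column_name) == 0;
                     }),
      settings.end());

  // Stage 2: build each resolved histogram and abort on the first error.
  for (const Sketch_setting &s : settings) {
    if (failing_columns.count(s.column_name) != 0) return true;
    built.push_back(s.column_name);
  }
  return false;
}
```

After a `true` return, `settings` still lists everything that was attempted, while `built` shows how far the second stage got.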
785
786/**
787 Updates existing histograms on a table that were specified with the AUTO
788 UPDATE option. If any histograms were updated a new snapshot of the current
789 collection of histograms for the table is inserted on the TABLE_SHARE.
790
791 @note The caller must manually ensure that the table share is flushed or that
792 tables are evicted from the table cache to guarantee that new queries will use
793 the updated histograms. This can be done by calling tdc_remove_table() and
794 passing the TDC_RT_REMOVE_UNUSED or TDC_RT_MARK_FOR_REOPEN option,
795 respectively.
796
797 @param thd Thread handle.
798 @param table Table_ref for the table to update histograms on. The table should
799 already be opened.
800
801 @return False if all automatically updated histograms on the table
802 (potentially none) were updated without encountering an error. True otherwise.
803*/
805
806/**
807 Retrieve an updated snapshot of the histograms on a table directly from the
808 dictionary (in an inefficient manner, querying all columns) and inserts this
809 snapshot in the Table_histograms_collection on the TABLE_SHARE.
810
811 @param thd The current thread.
812 @param table The table to retrieve updated histograms for.
813
814 @note This function assumes that the table is opened and generally depends on
815 the surrounding context. It also locks/unlocks LOCK_OPEN.
816
817 @return False on success. Returns true if an error occurred in which case the
818 TABLE_SHARE will not have been updated.
819*/
821
822/**
823 Updates existing histograms on a table that were specified with the AUTO
824 UPDATE option. Updated histograms are made available to the optimizer.
825
826 This function wraps auto_update_table_histograms() in an appropriate
827 transaction-context for the background thread.
828
829 @note This function temporarily disables the binary log as we are not
830 interested in replicating or recovering updates to histograms that take place
831 in the background.
832
833 @note This function suppresses some errors in order to avoid spamming the error
834 log, but unexpected errors are written to the error log, following the same
835 pattern as the event scheduler.
836
837 @param thd Background thread handle.
838 @param db_name Name of the database holding the table.
839 @param table_name Name of the table to update histograms for.
840
841 @return False on success, true on error.
842*/
844 THD *thd, const std::string &db_name, const std::string &table_name);
845
846/**
847 Drop histograms for all columns in a given table.
848
849 @param thd Thread handler.
850 @param table The table where we should look for the columns.
851 @param original_table_def Original table definition.
852 @param results A map where the result of each operation is stored.
853
854 @note Assumes that caller owns exclusive metadata lock on the table,
855 so there is no need to lock individual statistics.
856
857 @return false on success, true on error.
858*/
860 const dd::Table &original_table_def,
861 results_map &results);
862
863/**
864 Drop histograms for a set of columns in a given table.
865
866 This function will try to drop the histogram statistics for all specified
867 columns. If dropping fails for one column, it continues with the next one.
868
869 @param thd Thread handler.
870 @param table The table where we should look for the columns.
871 @param columns Columns specified by the user.
872 @param results A map where the result of each operation is stored.
873
874 @note Assumes that the caller has the appropriate metadata locks on both the
875 table and column statistics. That can either be an exclusive metadata lock on
876 the table itself, or a shared metadata lock on the table combined with
877 exclusive locks on individual column statistics.
878
879 @return false on success, true on error.
880*/
881bool drop_histograms(THD *thd, Table_ref &table, const columns_set &columns,
882 results_map &results);
883
884/**
885 Rename histograms for all columns in a given table.
886
887 @param thd Thread handler.
888 @param old_schema_name The old schema name
889 @param old_table_name The old table name
890 @param new_schema_name The new schema name
891 @param new_table_name The new table name
892 @param results A map where the result of each operation is stored.
893
894 @return false on success, true on error.
895*/
896bool rename_histograms(THD *thd, const char *old_schema_name,
897 const char *old_table_name, const char *new_schema_name,
898 const char *new_table_name, results_map &results);
899
900bool find_histogram(THD *thd, const std::string &schema_name,
901 const std::string &table_name,
902 const std::string &column_name,
903 const Histogram **histogram);
904} // namespace histograms
905
906#endif
Definition: field.h:575
Base class that is used to represent any kind of expression in a relational query.
Definition: item.h:936
JSON DOM abstract base class.
Definition: json_dom.h:173
Represents a JSON container value of type "object" (ECMA), type J_OBJECT here.
Definition: json_dom.h:369
Mem_root_allocator is a C++ STL memory allocator based on MEM_ROOT.
Definition: mem_root_allocator.h:68
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:426
Stateless_allocator is a C++ STL memory allocator skeleton based on Malloc_allocator,...
Definition: stateless_allocator.h:92
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:36
Definition: table.h:2864
Definition: table.h:47