WL#16895: Refactor DATE handling in server

Affects: Server-9.x   —   Status: Complete

See WL#16669 for problems and proposed solution for temporal types.

DATE handling in server code is based on the MYSQL_TIME object.

This will be replaced by a new Date_val class that only takes 4 byte
of storage and bit-codes the various components of a time value for
simplicity and efficiency.

Types TIME, DATETIME and TIMESTAMP are not affected by this work.

Interfaces to Field and Item classes are also simplified.

Approved by SDMT 2025-06-25.
This is mostly refactoring. Apart from some increased consistency,
there are no functional requirements in this worklog.

NFR-1: Date handling should be at least as efficient as previous handling.
       Focus is on efficiency for most common operations: storage, retrieval,
       movement and comparison, and less on arithmetic operations.
       Performance will to some extent be validated in unit tests.
See Low Level Design for actual changes.

The following bugs may be fixed fully
or partially by this worklog:

Bug#38177766: ADDDATE with date in year zero
yields incorrect result
Bug#38179658: TIMEDIFF is not consistent with
documentation
Bug#38177821: Function DAYNAME used in arithmetic expression
behaves like WEEKDAY
Bug#38177844: TO_DAYS() result is inconsistent

Bug#38177993: FROM_DAYS() result with argument less than 366 is inconsistent

Bug#38181443: TIMEDIFF returns wrong result when first argument is DATETIME and
second argument is DATE
Bug#29616536: Cast of YEAR column returns wrong result

Bug#38179402: Arithmetic operations involving JSON and DEFAULT seems incorrect

Bug#38590185: Result mismatch with DATE and INT comparison

Microbenchmark
results, times per iteration (updated at time of push)

BM_val_date: Reduced
from 5 ns to 4 ns.

BM_store_date: Reduced from 46 ns to 20 ns.

BM_add_date_interval: Reduced from 86 ns to 56 ns.

BM_day_number: Increased
from 9 ns to 11 ns.
Class Date_val:
==============

The range of DATE values is from 0000-01-01 to 9999-12-31.
In addition, certain modes make it possible to store date values with
day 0 or month 0, and the value 0000-00-00 has special semantics.

One date value is stored as an unsigned 4 byte value (1 byte is unused).
The operational format is the same as the format supported by the storage
engine interface, thus mimimizing any conversion operations.

The date components are bit-coded as follows within 23 bits:
- year         14 bits ( 9-22)
- month         4 bits ( 5- 8)
- day           5 bits ( 0- 4)

This format is easy to convert to and from the storage engine interface.
It is also directly comparable as an integer value.
Finally, it is easy to extract date components directly from the format.

The DATE implementation follows a proleptic Gregorian calendar with year 0
as the lowest year and year 9999 as the highest year. This is in accordance
with the SQL standard, except the SQL calendar starts from year 1.
Note also that year 0 is not considered to be a leap year, even though
standard calculation would make it so. This has implications for e.g. the
TO_DAYS() and FROM_DAYS() function.

A date value will always have year values less than or equal to 9999, month
values less than or equal to 12 and day values less than or equal to 31.
The constructor will assert that these values are valid.
In addition, when using the make_date function, a date value can be
constructed with
- Zero date allowed
- Zero month or zero day allowed.
- Invalid dates allowed (ie. date values not following the calendar).

The interfaces to class Date_val are as follows:
- Constructor from component values (year, month, day).
- Constructor from MYSQL_TIME struct
- Constructor from day number
- uint32_t year()
- uint32_t month()
- uint32_t day()
- int compare(const Date_val arg)
  Compares two date values and returns -1, 0 or 1 if current value is
  less than, equal to or greater than the other.

- longlong for_comparison()
  Returns an integer value that can be used for efficient (integer) comparison.

- void set_zero()
  Set date as the zero date (0000-00-00).

- void is_zero_date()
  Returns true value is the zero date (0000-00-00)..

- bool operator==(const Date_val rhs)
  Operator for use in std:: functions.

- int check_date(my_time_flags_t flags)
  Check date based on flags for zero date, zero date component, invalid date

- uint32_t day_number()
  Return day number, where 0000-01-01 is day number 1.
  zero date, date with zero components and invalid dates returns ???

- static uint32_t last_day_number()
  Returns the day number of the last date in the supported range (9999-12-31).

- bool add(Interval &iv, bool subtract)
  Add or subtract an interval to/from the current date value.

- void set_last_day_of_month()
  Adjust date so that it represents the last day of the mont.
  Invalid if date has month zero.

- Date_val strip_time()
  Contruct a date value from MYSQL_TIME, and strip off time component.

- int32_t to_int()
  Convert date value to YYYYMMDD format and return it as an integer.

- double to_double()
  Convert date value to YYYYMMDD format and return it as float value.

- size_t to_string()
  Format date value as a string.

- load_date()
  Load date value from storage engine format (binary format).

- store_date()
  Store date value into storage engine format (binary format).

- operator MYSQL_TIME()
  Convert date value to MYSQL_TIME format.

- int make_date()
  Construct a date value, return non-zero if values are out of range,
  according to the supplied flags.

Changes to class Field
----------------------

- Function store_date() is added and takes a Date_val argument. 

- Function store_time() with MYSQL_TIME argument is now only used
  with datetime values.

Changes to class Item
---------------------

- Function int_sort_key() now calls the for_comparison() function for
  DATE values.

Changes to class Item_date_literal
----------------------------------

- This class now stores the const value in a Date_val object instead of
  MYSQL_TIME.

- The class is used instead of Item_datetime_with_ref when it is required
  to compare DATE values.

Changes to class Item_cache_date
--------------------------------

This is a new class that replaces class Item_cache_datetime for DATE values.

- It implements all the usual Item_cache interface functions.

- It stores a value in a private Date_val member.

- For simplicity, it does not implement a cached string value.
  It is also rarely needed, as DATE values are most commonly used with other
  DATE values.

- A function store_value() is added for storing a date value directly into
  the cache object, instead of evaluating a value from another Item.

Changes to class Item_param (dynamic parameter handling)
--------------------------------------------------------

Instead of storing a MYSQL_TIME value if a parameter is identified as temporal,
we now store either a date, a time or a datetime value. Two new members are added
to Item_param: m_date and m_time, and value.time is renamed as m_datetime.

Two new functions for processing of temporal parameters are added:
Item_param::set_time() and Item_param::set_date().

Two new value "state" enums are also added: DATETIME_VALUE and DATE_VALUE.

Changes to class Cached_item_date
---------------------------------

This is a new class that replaces class Cached_item_temporal for DATE values.

- It implements all the usual Cached_item interface functions.

- It stores a value in a private Date_val member.

Changes to class Arg_comparator
-------------------------------

- Function compare_date() has been implemented for comparison of DATE values.
  It replaces the use of compare_datetime() for such values.

Changes to histogram handling
-----------------------------

- DATE values are now stored using the Date_val struct.
  Templatized classes have been added as needed.

Changes to class Json_dom
-------------------------

- A new class Json_date for handling of DATE values has been added.
  Json_date stores its value in a Date_val object.
  It has Json_temporal as superclass.

Changes to JSON date handling
-----------------------------

- Json_date::from_packed(), Json_date::to_packed(),
  Json_date::from_packed_to_key(), Json_wrapper::get_date() and
  Json_wrapper::get_date_packed() are new functions used to manipulate
  DATE values in JSON functions.

- date_to_json_storage() has been renamed from TIME_to_longlong_date_packed()
  and moved from my_time.cc into json_dom.cc, since it is now only used
  to generate stored JSON date values.

- date_from_json_storage() has been renamed from
  TIME_from_longlong_date_packed() and moved from my_time.cc into json_dom.cc,
  similar to the above.

Changes to date handling flags
------------------------------

Date values are allowed according to my_time_flags_t passed to e.g.
the val_date() functions. Note that all these rules apply to dates as
well as data types with date components, ie. datetimes and timestamps.

There are three types of date values that may or may not be allowed:

1. Zero dates
2. Date with zero components (month and/or day)
3. Invalid dates, such as 2020-02-31.

The functions that process dates are divided into three categories:

1. Functions that do not restrict date values.
   These are e.g retrieval of dates from storage, and passing of date
   values to the client or between modules.

2. Functions that adhere to the current SQL mode.
   These are all storage functions, and functions that convert string
   and numeric data into dates. These functions also adhere to
   data type properties, e.g, the TIMESTAMP type does not allow dates
   with zero components or invalid dates at all.

3. Functions that process dates based on individual rules.
   These are e.g the TO_DAYS() function, because only valid dates can
   be uniquely mapped to a consecutive day number.

The flags that constitute my_time_flags_t are refactored into three
flags that indicate what values are invalid. If none of these flags are
set, all possible date values are allowed. The flags may be ORed
together, and they are:

  TIME_NO_ZERO_IN_DATE
  TIME_NO_ZERO_DATE
  TIME_NO_INVALID_DATES

Other flags of my_time_flags_t are not affected by this work.

Added new utility functions:

Field_temporal::temporal_flags()
 - function that converts SQL mode settings for temporal data types into
   TIME flags. Used in Field_temporal::date_flags().

Field_temporal::date_flags()
- used to validate temporal values before calling storage functions, and
  in internal conversion functions to temporal types.

Item_temporal_hybrid_func::eval_flags()
- class-specific evaluation flag for the hybrid temporal functions.
  Each specific class needs its own set of flags when interpreting
  temporal values from numbers and strings.

Changes to handling of IN predicates with temporal values.
----------------------------------------------------------

This was necessary because we no longer want to perform comparison of
temporal values after converting them to integer values, but rather
want to compare them as native data types, since this is more efficient
using the new compact formats.

Since the logic of Item_func_in::resolve_type() was quite complex
with regard to optimization when temporal values were in involved,
some significant refactoring was deemed necessary.

We now determine the temporal type for the operation by looking at all
arguments to IN: If the determined type is STRING and at least one
argument is temporal, the operation is carried out using one of the
temporal formats: DATE, TIME or DATETIME. DATE is used if all temporal
arguments are of type DATE. TIME is used if all temporal arguments are
of type TIME. Otherwise, the operation is carried out using DATETIME.

A loop that previously handled both ROW and STRING types was deemed
too complex and has been split in two. The latter of these determines
the temporal type to use for comparisonb, if applicable.

We also had to create new code to handle the case where the comparison
type is REAL_RESULT. If left-hand expression is temporal, carry out
the operation using this item's data type. If left-hand expression is
of type integer, and all const values can be converted to integer
without data loss, carry out the operation as an integer operation.

A new subclass of In_vector called In_vector_date has been added for
efficient comparison of IN operations on DATE values using binary search.

Existing class in_datetime has been renamed to In_vector_datetime.
It will now only handle expressions of type DATETIME and TIMESTAMP.

Existing class in_datetime_as_longlong has been made redundant and could
thus be removed.

Changes to general date handling functions
------------------------------------------

- All functions that return components of a DATE, such as YEAR(), MONTH()
  and DAY(), are first evaluated as dates, before the component is extracted.

- Class Item_func_month has changed base class from Item_func to Item_int_func,
  as suggested in code. This simplified the code.

- Class Item_func_weekday has changed base class from Item_func to
  Item_int_func, as suggested in code. This simplified the code.

- Class Item_func_dayname has changed base class from Item_func_weekday to
  Item_str_func, as suggested in code. This simplified the code and fixed a
  semantic bug when an object of this class was used in an arithmetic
  operation.

- Class Item_date_add_interval was renamed to Item_add_interval. The old name
  was misleading since the class could accomodate all temporal types.

- Class Item_func_str_to_date was renamed to Item_func_str_to_date. The old name
  was misleading since the class could accomodate all temporal types.

Changes to date handling in RAPID
---------------------------------

Handling in RAPID has been changed as little as possible.
Some functions that have been obsoleted in the server have been moved
into RAPID (rpd_mytime.cc).

- Date_val::load_date() is used to load a date value from storage.

- Literal DATE values in Item_temporal have been replaced with
  Item_date_literal.

UNIT testing of DATE functions
------------------------------

- Unit testing for class Field, class Item and histograms have been changed
  to use the new Date_val class.

- Microbenchmarks to measure efficiency of some important DATE functions
  will be added:
    BM_val_date          - evaluate a DATE value from a field.
    BM_add_date_interval - add intervals to a DATE value.

- New unit tests for class Date_val have been added to file temporal-t.cc.