Floating-point numbers sometimes cause confusion because they
are approximate. That is, they are not stored as exact values
inside computer architecture. What you can see on the screen
usually is not the exact value of the number. The
FLOAT and
DOUBLE data types are such, and
DECIMAL operations before MySQL
5.0.3 are approximate as well.
Prior to MySQL 5.0.3, DECIMAL
columns store values with exact precision because they are
represented as strings, but calculations on
DECIMAL values are done using
floating-point operations. As of 5.0.3, MySQL performs
DECIMAL operations with a
precision of 64 decimal digits, which should solve most common
inaccuracy problems when it comes to
DECIMAL columns. (If your
server is from MySQL 5.0.3 or higher, but you have
DECIMAL columns in tables that
were created before 5.0.3, the old behavior still applies to
those columns. To convert the tables to the newer
DECIMAL format, dump them with
mysqldump and reload them.)
The following example (for versions of MySQL older than 5.0.3)
demonstrates the problem. It shows that even for older
DECIMAL columns, calculations
that are done using floating-point operations are subject to
floating-point error. (Were you to replace the
DECIMAL columns with
FLOAT, similar problems would
occur for all versions of MySQL.)
mysql>CREATE TABLE t1 (i INT, d1 DECIMAL(9,2), d2 DECIMAL(9,2));mysql>INSERT INTO t1 VALUES (1, 101.40, 21.40), (1, -80.00, 0.00),->(2, 0.00, 0.00), (2, -13.20, 0.00), (2, 59.60, 46.40),->(2, 30.40, 30.40), (3, 37.00, 7.40), (3, -29.60, 0.00),->(4, 60.00, 15.40), (4, -10.60, 0.00), (4, -34.00, 0.00),->(5, 33.00, 0.00), (5, -25.80, 0.00), (5, 0.00, 7.20),->(6, 0.00, 0.00), (6, -51.40, 0.00);mysql>SELECT i, SUM(d1) AS a, SUM(d2) AS b->FROM t1 GROUP BY i HAVING a <> b;+------+--------+-------+ | i | a | b | +------+--------+-------+ | 1 | 21.40 | 21.40 | | 2 | 76.80 | 76.80 | | 3 | 7.40 | 7.40 | | 4 | 15.40 | 15.40 | | 5 | 7.20 | 7.20 | | 6 | -51.40 | 0.00 | +------+--------+-------+
The result is correct. Although the first five records look
like they should not satisfy the comparison (the values of
a and b do not appear to
be different), they may do so because the difference between
the numbers shows up around the tenth decimal or so, depending
on factors such as computer architecture or the compiler
version or optimization level. For example, different CPUs may
evaluate floating-point numbers differently.
As of MySQL 5.0.3, you will get only the last row in the above result.
The problem cannot be solved by using
ROUND() or similar functions,
because the result is still a floating-point number:
mysql>SELECT i, ROUND(SUM(d1), 2) AS a, ROUND(SUM(d2), 2) AS b->FROM t1 GROUP BY i HAVING a <> b;+------+--------+-------+ | i | a | b | +------+--------+-------+ | 1 | 21.40 | 21.40 | | 2 | 76.80 | 76.80 | | 3 | 7.40 | 7.40 | | 4 | 15.40 | 15.40 | | 5 | 7.20 | 7.20 | | 6 | -51.40 | 0.00 | +------+--------+-------+
This is what the numbers in column a look
like when displayed with more decimal places:
mysql>SELECT i, ROUND(SUM(d1), 2)*1.0000000000000000 AS a,->ROUND(SUM(d2), 2) AS b FROM t1 GROUP BY i HAVING a <> b;+------+----------------------+-------+ | i | a | b | +------+----------------------+-------+ | 1 | 21.3999999999999986 | 21.40 | | 2 | 76.7999999999999972 | 76.80 | | 3 | 7.4000000000000004 | 7.40 | | 4 | 15.4000000000000004 | 15.40 | | 5 | 7.2000000000000002 | 7.20 | | 6 | -51.3999999999999986 | 0.00 | +------+----------------------+-------+
Depending on your computer architecture, you may or may not see similar results. For example, on some machines you may get the “correct” results by multiplying both arguments by 1, as the following example shows.
Never use this method in your applications. It is not an example of a trustworthy method!
mysql>SELECT i, ROUND(SUM(d1), 2)*1 AS a, ROUND(SUM(d2), 2)*1 AS b->FROM t1 GROUP BY i HAVING a <> b;+------+--------+------+ | i | a | b | +------+--------+------+ | 6 | -51.40 | 0.00 | +------+--------+------+
The reason that the preceding example seems to work is that on the particular machine where the test was done, CPU floating-point arithmetic happens to round the numbers to the same value. However, there is no rule that any CPU should do so, so this method cannot be trusted.
The correct way to do floating-point number comparison is to first decide on an acceptable tolerance for differences between the numbers and then do the comparison against the tolerance value. For example, if we agree that floating-point numbers should be regarded the same if they are same within a precision of one in ten thousand (0.0001), the comparison should be written to find differences larger than the tolerance value:
mysql>SELECT i, SUM(d1) AS a, SUM(d2) AS b FROM t1->GROUP BY i HAVING ABS(a - b) > 0.0001;+------+--------+------+ | i | a | b | +------+--------+------+ | 6 | -51.40 | 0.00 | +------+--------+------+ 1 row in set (0.00 sec)
Conversely, to get rows where the numbers are the same, the test should find differences within the tolerance value:
mysql>SELECT i, SUM(d1) AS a, SUM(d2) AS b FROM t1->GROUP BY i HAVING ABS(a - b) <= 0.0001;+------+-------+-------+ | i | a | b | +------+-------+-------+ | 1 | 21.40 | 21.40 | | 2 | 76.80 | 76.80 | | 3 | 7.40 | 7.40 | | 4 | 15.40 | 15.40 | | 5 | 7.20 | 7.20 | +------+-------+-------+


User Comments
I have been working on a placename -> latitude longitude database.
I wished to remove duplicates where the same place name was found roughly nearby geographically.
I found that queries which compared float coordinates like this (where lat and lon are both indexed):
SELECT place FROM places WHERE lat < 49.3 AND lat > 49.1 AND lon < -2 AND lon > -2.2 AND country = 'JE'
were about 1 order of magnitude faster than this:
SELECT place FROM places WHERE ABS(lat - 49.2) > 0.1 AND ABS(-2.1 - lon) > 0.1 AND country = 'JE'
I presume this is because mysql was able to use the indices to speed up the search in the first case but not in the second, since the ABS() got in the way.
Add your own comment.