**Table 13.25 Aggregate ( GROUP BY)
Functions**

Name | Description |
---|---|

`AVG()` | Return the average value of the argument |

`BIT_AND()` | Return bitwise AND |

`BIT_OR()` | Return bitwise OR |

`BIT_XOR()` | Return bitwise XOR |

`COUNT()` | Return a count of the number of rows returned |

`COUNT(DISTINCT)` | Return the count of a number of different values |

`GROUP_CONCAT()` | Return a concatenated string |

`MAX()` | Return the maximum value |

`MIN()` | Return the minimum value |

`STD()` | Return the population standard deviation |

`STDDEV()` | Return the population standard deviation |

`STDDEV_POP()` | Return the population standard deviation |

`STDDEV_SAMP()` | Return the sample standard deviation |

`SUM()` | Return the sum |

`VAR_POP()` | Return the population standard variance |

`VAR_SAMP()` | Return the sample variance |

`VARIANCE()` | Return the population standard variance |

This section describes group (aggregate) functions that operate
on sets of values. Unless otherwise stated, group functions
ignore `NULL`

values.

If you use a group function in a statement containing no
`GROUP BY`

clause, it is equivalent to grouping
on all rows. For more information, see
Section 13.19.3, “MySQL Handling of GROUP BY”.

For numeric arguments, the variance and standard deviation
functions return a `DOUBLE`

value.
The `SUM()`

and
`AVG()`

functions return a
`DECIMAL`

value for exact-value
arguments (integer or `DECIMAL`

),
and a `DOUBLE`

value for
approximate-value arguments
(`FLOAT`

or
`DOUBLE`

).

The `SUM()`

and
`AVG()`

aggregate functions do not
work with temporal values. (They convert the values to numbers,
losing everything after the first nonnumeric character.) To work
around this problem, convert to numeric units, perform the
aggregate operation, and convert back to a temporal value.
Examples:

SELECT SEC_TO_TIME(SUM(TIME_TO_SEC())) FROM`time_col`

; SELECT FROM_DAYS(SUM(TO_DAYS(`tbl_name`

))) FROM`date_col`

;`tbl_name`

Functions such as `SUM()`

or
`AVG()`

that expect a numeric
argument cast the argument to a number if necessary. For
`SET`

or
`ENUM`

values, the cast operation
causes the underlying numeric value to be used.

The `BIT_AND()`

,
`BIT_OR()`

, and
`BIT_XOR()`

aggregate functions
perform bit operations. They require
`BIGINT`

(64-bit integer) arguments
and return `BIGINT`

values.
Arguments of other types are converted to
`BIGINT`

and truncation might
occur. For information about a change in MySQL 8.0 that permits
bit operations to take binary string type arguments
(`BINARY`

,
`VARBINARY`

, and the
`BLOB`

types), see
Section 13.12, “Bit Functions and Operators”.

Returns the average value of

. The`expr`

`DISTINCT`

option can be used to return the average of the distinct values of.`expr`

If there are no matching rows,

`AVG()`

returns`NULL`

.mysql>

->`SELECT student_name, AVG(test_score)`

->`FROM student`

`GROUP BY student_name;`

Returns the bitwise

`AND`

of all bits in. The calculation is performed with 64-bit (`expr`

`BIGINT`

) precision.If there are no matching rows,

`BIT_AND()`

returns a neutral value (all bits set to 1).Returns the bitwise

`OR`

of all bits in. The calculation is performed with 64-bit (`expr`

`BIGINT`

) precision.If there are no matching rows,

`BIT_OR()`

returns a neutral value (all bits set to 0).Returns the bitwise

`XOR`

of all bits in. The calculation is performed with 64-bit (`expr`

`BIGINT`

) precision.If there are no matching rows,

`BIT_XOR()`

returns a neutral value (all bits set to 0).Returns a count of the number of non-

`NULL`

values ofin the rows retrieved by a`expr`

`SELECT`

statement. The result is a`BIGINT`

value.If there are no matching rows,

`COUNT()`

returns`0`

.mysql>

->`SELECT student.student_name,COUNT(*)`

->`FROM student,course`

->`WHERE student.student_id=course.student_id`

`GROUP BY student_name;`

`COUNT(*)`

is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain`NULL`

values.For

`MyISAM`

tables,`COUNT(*)`

is optimized to return very quickly if the`SELECT`

retrieves from one table, no other columns are retrieved, and there is no`WHERE`

clause. For example:mysql>

`SELECT COUNT(*) FROM student;`

This optimization only applies to

`MyISAM`

tables, because an exact row count is stored for this storage engine and can be accessed very quickly.`COUNT(1)`

is only subject to the same optimization if the first column is defined as`NOT NULL`

.For transactional storage engines such as

`InnoDB`

, storing an exact row count is problematic because multiple transactions may be occurring, each of which may affect the count. For more information about how`InnoDB`

handles`COUNT(*)`

operations, see Section 15.8.8, “Limits on InnoDB Tables”.`COUNT(DISTINCT`

,[`expr`

...])`expr`

Returns a count of the number of rows with different non-

`NULL`

values.`expr`

If there are no matching rows,

`COUNT(DISTINCT)`

returns`0`

.mysql>

`SELECT COUNT(DISTINCT results) FROM student;`

In MySQL, you can obtain the number of distinct expression combinations that do not contain

`NULL`

by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions inside`COUNT(DISTINCT ...)`

.This function returns a string result with the concatenated non-

`NULL`

values from a group. It returns`NULL`

if there are no non-`NULL`

values. The full syntax is as follows:GROUP_CONCAT([DISTINCT]

[,`expr`

...] [ORDER BY {`expr`

|`unsigned_integer`

|`col_name`

} [ASC | DESC] [,`expr`

...]] [SEPARATOR`col_name`

])`str_val`

mysql>

->`SELECT student_name,`

->`GROUP_CONCAT(test_score)`

->`FROM student`

`GROUP BY student_name;`

Or:

mysql>

->`SELECT student_name,`

->`GROUP_CONCAT(DISTINCT test_score`

->`ORDER BY test_score DESC SEPARATOR ' ')`

->`FROM student`

`GROUP BY student_name;`

In MySQL, you can get the concatenated values of expression combinations. To eliminate duplicate values, use the

`DISTINCT`

clause. To sort values in the result, use the`ORDER BY`

clause. To sort in reverse order, add the`DESC`

(descending) keyword to the name of the column you are sorting by in the`ORDER BY`

clause. The default is ascending order; this may be specified explicitly using the`ASC`

keyword. The default separator between values in a group is comma (`,`

). To specify a separator explicitly, use`SEPARATOR`

followed by the string literal value that should be inserted between group values. To eliminate the separator altogether, specify`SEPARATOR ''`

.The result is truncated to the maximum length that is given by the

`group_concat_max_len`

system variable, which has a default value of 1024. The value can be set higher, although the effective maximum length of the return value is constrained by the value of`max_allowed_packet`

. The syntax to change the value of`group_concat_max_len`

at runtime is as follows, whereis an unsigned integer:`val`

SET [GLOBAL | SESSION] group_concat_max_len =

;`val`

The return value is a nonbinary or binary string, depending on whether the arguments are nonbinary or binary strings. The result type is

`TEXT`

or`BLOB`

unless`group_concat_max_len`

is less than or equal to 512, in which case the result type is`VARCHAR`

or`VARBINARY`

.See also

`CONCAT()`

and`CONCAT_WS()`

: Section 13.5, “String Functions”.Returns the maximum value of

.`expr`

`MAX()`

may take a string argument; in such cases, it returns the maximum string value. See Section 9.3.1, “How MySQL Uses Indexes”. The`DISTINCT`

keyword can be used to find the maximum of the distinct values of, however, this produces the same result as omitting`expr`

`DISTINCT`

.If there are no matching rows,

`MAX()`

returns`NULL`

.mysql>

->`SELECT student_name, MIN(test_score), MAX(test_score)`

->`FROM student`

`GROUP BY student_name;`

For

`MAX()`

, MySQL currently compares`ENUM`

and`SET`

columns by their string value rather than by the string's relative position in the set. This differs from how`ORDER BY`

compares them. This is expected to be rectified in a future MySQL release.Returns the minimum value of

.`expr`

`MIN()`

may take a string argument; in such cases, it returns the minimum string value. See Section 9.3.1, “How MySQL Uses Indexes”. The`DISTINCT`

keyword can be used to find the minimum of the distinct values of, however, this produces the same result as omitting`expr`

`DISTINCT`

.If there are no matching rows,

`MIN()`

returns`NULL`

.mysql>

->`SELECT student_name, MIN(test_score), MAX(test_score)`

->`FROM student`

`GROUP BY student_name;`

For

`MIN()`

, MySQL currently compares`ENUM`

and`SET`

columns by their string value rather than by the string's relative position in the set. This differs from how`ORDER BY`

compares them. This is expected to be rectified in a future MySQL release.Returns the population standard deviation of

. This is an extension to standard SQL. The standard SQL function`expr`

`STDDEV_POP()`

can be used instead.If there are no matching rows,

`STD()`

returns`NULL`

.Returns the population standard deviation of

. This function is provided for compatibility with Oracle. The standard SQL function`expr`

`STDDEV_POP()`

can be used instead.If there are no matching rows,

`STDDEV()`

returns`NULL`

.Returns the population standard deviation of

(the square root of`expr`

`VAR_POP()`

). You can also use`STD()`

or`STDDEV()`

, which are equivalent but not standard SQL.If there are no matching rows,

`STDDEV_POP()`

returns`NULL`

.Returns the sample standard deviation of

(the square root of`expr`

`VAR_SAMP()`

.If there are no matching rows,

`STDDEV_SAMP()`

returns`NULL`

.Returns the sum of

. If the return set has no rows,`expr`

`SUM()`

returns`NULL`

. The`DISTINCT`

keyword can be used to sum only the distinct values of.`expr`

If there are no matching rows,

`SUM()`

returns`NULL`

.Returns the population standard variance of

. It considers rows as the whole population, not as a sample, so it has the number of rows as the denominator. You can also use`expr`

`VARIANCE()`

, which is equivalent but is not standard SQL.If there are no matching rows,

`VAR_POP()`

returns`NULL`

.Returns the sample variance of

. That is, the denominator is the number of rows minus one.`expr`

If there are no matching rows,

`VAR_SAMP()`

returns`NULL`

.Returns the population standard variance of

. This is an extension to standard SQL. The standard SQL function`expr`

`VAR_POP()`

can be used instead.If there are no matching rows,

`VARIANCE()`

returns`NULL`

.

SELECT x.val from data x, data y

GROUP BY x.val

HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)

assumes (COUNT(*)+1)/2 returns an int, if not

then use INT((COUNT(*)+1)/2)

There are variations for the varied defs of

median too.

See Rozenshtein's book on Optimizing T-SQL

Advanced Programming Techniques. Search Amazon.

Learn how to use encoded characteristic functions

(delta[]), then you can do all sorts of fun stuff

quickly - for instance you could pivot a table

into summary columns of totals where each column

counted different things in one table pass.

I have utilized this technique with great success

on large tables: a multi million row table using

the existing summary code took minutes while this

SQL trimmed that down to less than 3 seconds!

Once you get used to Delta functions, the code is

very readable too!

------------------------------------------------

Well, I don't know toomuch about this but it works

for me, and fast enough.

SELECT columnname, COUNT(columnname)

FROM

tablename GROUP BY columnname HAVING COUNT

(columnname)>1

Bye community!

field called time, use something like: "select topic,

max(time) as m from nuke_stories group by topic

order by m desc limit 5;"

listed in a comment above does not work for all

lists of values (especially when the median value

is not unique in the list, or when the median

should be the average of the two middle-most

values in an even-sized list). Here is a

three-step process that has worked for me:

CREATE TEMPORARY TABLE medians SELECT x.val FROM

data x, data y GROUP BY x.val HAVING

((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val))))) <=

floor((COUNT(*) +1)/2)) and

((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val))))) <=

floor((COUNT(*) +1)/2));

The resulting medians table will either have 1 or

2 rows. If the size of your original data is an

odd number or the two middle-most data values are

the same there will

only be 1 row. If the size of

your original data is an even number and the

middle data values are not equal, there will be 2

rows. Take the average to get the true median:

SELECT AVG(val) AS median FROM medians;

The last step is to drop your temporary table

medians:

DROP TABLE medians;

I couldn't think of another way to compute the

MEDIAN without using a temporary table. If you

know how to optimize my technique, I would love to

hear from you.

HAVING COUNT(DISTINCT col1)>=5 correctly because it returns 'Unknown column col1 ...', however

HAVING COUNT(col1)>=5 is ok.

Fortunately one can get the intended behaviour with

SELECT ... , COUNT(DISTINCT col1) AS cnt

...

GROUP BY ...

HAVING cnt>=5

SELECT COUNT(*) as total FROM table WHERE poll=1231 and question=2

SELECT COUNT(*) as result FROM table WHERE poll=1231 and question=2 and answer = 1

and then you get the percentage getting (result/total)*100

I looked over and over all the MySql documentation, and finally I tried this and works (thanks to Tim Ferrer):

SELECT (COUNT(DISTINCT T2.id)/COUNT(DISTINCT T1.id)*100) as percentage FROM table T1, table T2 WHERE T1.poll=1231 and T1.question=2 and T2.poll=1231 and T2.question=2 and T2.answer=1;

That works for me.

For example, in a table that has the fields "tile_id", "name", "orientation" and "imagename":

1) I want the fields "tile_id", "name" and "imagename" to be returned.

2) I want to group by the "name".

3) I want the rows with an orientation of "up" to show up.

4) I only want rows in which there is more than one "orientation" for a given "name".

You can do this with the following query:

SELECT t1.tile_id, t1.name, t1.imagename, count(*) as thecount

FROM `tiles` AS t1, tiles AS t2

WHERE t1.orientation != t2.orientation AND t1.orientation='up' AND t1.name = t2.name

GROUP BY t1.name

Because it's joining where the two tables' orientations are not equal AND where the names ARE equal, that automatically finds results in which there is more than one.

The SAME QUERY could be written as follows:

SELECT t1.tile_id, t1.name, t1.imagename, count(*) as thecount FROM `tiles` AS t1, tiles AS t2 WHERE t1.orientation='up' AND t1.name = t2.name GROUP BY t1.name HAVING thecount > 1

If you aren't picky and don't mind resuts with only one orientation, of course, you can simply remove the HAVING clause from the example above.

http://www.kartmann.org/freeware/MyGroupConcat/Readme.htm

It only works for Windows though. :(

Here is a far more efficient method of finding medians (uses temporary tables, but some things you can't do in one sql statement). For my particular application, I needed to find medians for each group of rows. Here is a simple example (uses a temp table for the original data, so you can copy-paste it into a mysql term without needing to cleanup anything):

====================================

# setup example data

CREATE TEMPORARY TABLE data (group_id INT, value FLOAT);

INSERT INTO data VALUES (1,1),(1,5),(1,6),(2,1),(2,3),(2,4),(2,20);

# get all data we need to search

CREATE TEMPORARY TABLE temp_median (id INT AUTO_INCREMENT PRIMARY KEY) SELECT group_id, value FROM data ORDER BY group_id, value;

# find the row-id of all median values

CREATE TEMPORARY TABLE temp_median_ids SELECT ROUND(AVG(id)) AS id FROM temp_median GROUP BY group_id;

# select all median values

SELECT group_id, value FROM temp_median_ids LEFT JOIN temp_median USING (id);

===================

If you want to see both median values:

# setup example data

CREATE TEMPORARY TABLE data (group_id INT, value FLOAT);

INSERT INTO data VALUES (1,1),(1,5),(1,6),(2,1),(2,3),(2,4),(2,20);

# get all data we need to search

CREATE TEMPORARY TABLE temp_median (id INT AUTO_INCREMENT PRIMARY KEY) SELECT group_id, value FROM data ORDER BY group_id, value;

# find the row-id of all median values

CREATE TEMPORARY TABLE temp_median_ids SELECT FLOOR(AVG(id)) AS id_1, CEILING(AVG(id)) AS id_2 FROM temp_median GROUP BY group_id;

# select all median values

SELECT group_id, value FROM temp_median_ids LEFT JOIN temp_median ON (id=id_1 OR id=id_2);

instead of:

CREATE FUNCTION group_concat RETURNS STRING SONAME "MyGroupConcat.dll";

do:

CREATE AGGREGATE FUNCTION group_concat RETURNS STRING SONAME "MyGroupConcat.dll";

If you simply take the MySQL_UDF.cpp, comment out:

#include "stdafx.h"

And then remove the '__declspec(dllexport)' from the function prototypes, and also remove the 'extern "C" __declspec(dllexport)' from all the function declarations, well, you've got a compilable module for your mysql server for unix :-)

All the rest of that crap in the zip file musta been added by MSVC++ or whatever.

Once you've edited the file, just pop it into the sql/ folder in your source distribution, rename it to MyGroupConcat.cc, and then edit the make file and copy any section having to do with udf_example.cc and adjust it to MyGroupConcat.cc.

Then, just type "make MyGroupConcat.so" and follow the instructions to install it into the server :-)

http://mysql-udf.sourceforge.net/

SELECT COUNT(*) FROM mytable WHERE something="this";

SELECT COUNT(*) FROM mytable;

If you use an IF() function to evaluate the condition, and return NULL if it's false, you can collect both totals in a single query:

SELECT COUNT(IF(something="this",1, NULL)) AS somethingcount, COUNT(*) AS total FROM mytable;

Note that this isn't going to be optimised. If the column you're evaluating is indexed, it would be faster to use two queries. But if it's not indexed, SQL would have to evaluate each row individually anyway.

You can take the idea a stage further by using a WHERE clause on the query too. For instance, a discussion forum may want to count the number of items for a given topic ID, and the number of those items that have your userid against them:

SELECT COUNT(IF(userid="steve",1,NULL)) AS mypostcount, COUNT(*) AS total FROM forumitem WHERE topicid=123;

If this were done as two queries, SQL would have to evaluate topicid=123 twice. But in the query shown above, it only needs to evaluate it once.

Another way to count rows that match a criteria is to use SUM() instead of COUNT(). If you do this, you need to change the IF() function to return 0 instead of NULL on false:

SELECT SUM(IF(userid="steve",1,0)) AS mypostcount, COUNT(*) AS total FROM forumitem WHERE topicid=123;

If the column is already indexed do we need to add order by to either the main sql statement or the group_concat function?

What is the overhead?

Example:

Table `bookstores_sales` with fields `id`, `store_id`, `book_id`

So, each sold book is stored in the table (as book_id) along with the store id (store_id).

Now, we want to know how many sales each store generated for each book ordered by the biggest sales first!

SELECT store_id, book_id, COUNT(id) AS sales

FROM bookstores_sales

GROUP BY store_id, book_id

ORDER BY sales DESC

Which will show something like:

store_id | book_id | sales

1 | 2 | 210

3 | 5 | 190

1 | 5 | 185

DELETE FROM some_table WHERE primaryKey NOT IN

(SELECT MIN(primaryKey) FROM some_table GROUP BY some_column)

Of course, you can use MAX(primaryKey) as well if you want to keep the newest record with the duplicate value instead of the oldest record with the duplicate value.

To understand how this works, look at the output of this query:

SELECT some_column, MIN(primaryKey) FROM some_table GROUP BY some_column

As you can see, this query returns the primary key for the first record containing each value of some_column. Logically, then, any key value NOT found in this result set must be a duplicate, and therefore it should be deleted.

USE this :

SELECT SUM(prod.id)/COUNT(app.customid) AS sum_prod

FROM appointments app, products prod

WHERE prod.customid = app.customid

GROUP BY app.customid

HAVING COUNT(app.customid) > 0

AND SUM(prod.id) > 0

SUM(distinct expr) should sum unique values of a group.

Since version 4 do not allow it, what you can do is divide a regular SUM(expr) by all count(distinct expr2) of other tables in the join which initially casued a cartezian result set.

for example, the following query

select a.a, sum(distinct b.b), sum(distinct c.c)

from a join b on a.id = b.aid

join c on a.id = c.id

where a.id = 1

can be implemented as :

select a.a, sum(b.b)/count(distinct c.id), sum(c.c)/count(distinct b.id)

from a join b on a.id = b.aid

join c on a.id = c.id

where a.id = 1

It does not do a costly self-join.

Adjust the type of 'value' as needed.

Note that it finds both medians if there are an even number of items.

CREATE TEMPORARY TABLE tmp (

n INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,

value VARCHAR(99) NOT NULL );

INSERT INTO tmp (value)

SELECT value FROM tbl ORDER BY 1;

SELECT @count := COUNT(*) FROM tmp;

SELECT DISTINCT value FROM tmp

WHERE n IN (FLOOR((@count+1)/2), CEIL((@count+1)/2));

DROP TEMPORARY TABLE tmp;

For example, given the table "stocks":

5 rows in set (0.00 sec)

You will get the following:

mysql> select count(*) from stocks;

1 row in set (0.00 sec)

However, if you include all of the column names explicitly in count() you will get the following:

mysql> select count(symbol or price or quantity) from stocks;

1 row in set (0.00 sec)

Consequently, I think it may be valuable to avoid thinking of the wildcard * as meaning "look for matches in all columns" and rather look at it as having a special meaning, i.e., "return the total number of rows regardless of whether the content is NULL or not," which of course is what the manual says in less words (and less explicitly) above. This might be confusing for some users since * in a SELECT statement does actually mean "return all columns." In general, the wildcard * seems to have different meanings in different contexts.

Also, I have tried this with both MyISAM and InnoDB tables, so I know that this is not an artifact of the MyISAM practice of storing the row count as an explicit value in the table.

CREATE TABLE 'orders' (

OrderID int,

CustomerID int,

OrderType CHAR(1),

PRIMARY KEY OrderID

)

You want to find, for each customer, the type of the first order (the order with the minimal ID).

What we would like to do is:

SELECT CustomerID, FIRST(OrderType ORDER BY OrderID) GROUP BY CustomerID

However, as far as I know (correct me if I'm wrong), there is no "FIRST" aggregate function in MySql.

So we can do:

SELECT CustomerID, SUBSTR(GROUP_CONCAT(OrderType ORDER BY OrderID) FROM 1 FOR 1) GROUP BY CustomerID

I will be happy to know if there is a better way to do it...

I want to join two tables, but sum only over one of the tables, but I can't find a way to do it.

A simple example (using MySQL 5.0.22):

create temporary table t1 (ID int, value int);

insert into t1 values (1,3),(2,3);

create temporary table t2 (ID int, value int);

insert into t2 values (1,4),(2,5);

select sum(value) from t1;

-> 6

select sum(distinct concat(t1.value,',',t1.ID)) from t1 JOIN t2;

-> 3

How can I get the answer 6 after joining the two tables?

I cannot do the sum in a subquery before joining because I want to use a more complicated query with GROUP BY WITH ROLLUP.

SELECT

group_concat(distinct `object_plan`.`object_plan_id` separator ',') AS `object_plan_id`,

group_concat(distinct `object_plan`.`name` separator ',') AS `object_plan_name`

FROM object_plan;

returns object_plan_id as BLOB and object_plan_name as VARCHAR ...

So be carefull when using GROUP_CONCAT, especially with ColdFusion and its list processing functions ... It can lead to "ByteArray Object cannot be converted to String" errors since BLOBS are hadled by CF as binary objects ...

TABLE posts

post_id

head_id

post_text

author_name

whenAdded

try this:

select count(head_id), max(concat(whenAdded,","author_name)) as lastAuthor, min(concat(whenAdded,",",author_name) as firstAuthor from posts group by post_id

Splitting the date from the author_name is relatively trivial

Saul Dobney

SELECT AVG(val) FROM (

SELECT x.val FROM

data x, data y GROUP BY x.val HAVING

((SUM(SIGN(1-SIGN(y.val-x.val))))>=floor((COUNT(*)+1)/2)) and

((SUM(SIGN(1+SIGN(y.val-x.val))))>=floor((COUNT(*)+1)/2)));

bid INT :autoinc PK record ID

current_fat INT

points INT : if user lowers fat awarded a point

entry_date DATE

uid INT : User ID

For a company wide report we needed the starting body fat, the current body fat and the sum of the points for the month group by user.

SELECT `uid`, bid,sum(miles) as miles, `current_fat`, mid(max(concat(bid,":",current_fat)),instr(max(concat(bid,":",current_fat)),":")+1) as max_uid, mid(min(concat(bid,":",current_fat)),instr(min(concat(bid,":",current_fat)),":")+1) as min_uid

FROM `bodyfat`

WHERE `uid` IN (2,47,48,52,54,57,58,59,60,65,66,67,68,69,70,71,72,73,88,90,96,108,117,123,164,177)

AND date_format(`entry_date`,'%m/%Y') = '05/2007'

GROUP BY uid;

The key was finding the post showing concat. Thanks

SET @R:=0;select val from (select val,(@R:=@R+1) r from table order by val asc) t where r=((select count(*)+1 from table) DIV 2);

SELECT GROUP_CONCAT(CAST(myInt as CHAR)) myInts FROM aTable;

select month,

if(job="1" or job="3" ,"Sum of job 1 and 3",

if(job="2" or job="6" ,"Sum of job 2 and 6",

if(job="4" or job="7" ,"Sum of job 4 and 7",

if(job="5" or job="9" ,"Sum of job 5 and 9",

if(job="8" ,"Sum of job 8"," "))))) as name_of_job,

sum(if((pl_porezi.vrstaporeza="Porez") ,saldo ,0)) as Porez ,

"15%" as Percent,

sum(if((pl_porezi.vrstaporeza="Prirez") ,saldo ,0)) as Prirez

from pl_porezi

left outer join kadarp on (pl_porezi.sifra=kadarp.sifra)

where pl_porezi.month="11"

group by month,

(job="1" or job="3"),

(job="2" or job="6"),

(job="4" or job="7"),

(job="5" or job="9"),

(job="8" )

SELECT AVG(fld) FROM (

SELECT x.fld FROM

tbl AS x, tbl AS y GROUP BY x.fld HAVING

((SUM(SIGN(1-SIGN(y.fld-x.fld))))>=floor((COUNT(*)+1)/2)) and

((SUM(SIGN(1+SIGN(y.fld-x.fld))))>=floor((COUNT(*)+1)/2))

But there is another way that is easy to understand and implement:

1) Create table ordered by the grouping variable and the frequency.

2) Add counter to it. That is enough information to allow the median per group to be calculated.

3) We filter to retain the middle record(s)

4) Then get the AVG of that per group (if one record, no harm done but returns correct result if a draw).

e.g. with City as the grouping variable and value as the variable we want the median of

Counter City Value

11 New York 12

12 New York 26

13 New York 111

14 New York 123

The middle record(s) for the New York group are 11 + FLOOR((max-min)/2) AND 11 + CEIL((max-min)/2)

i.e. 11 + FLOOR((14-11)/2) AND 11 + CEIL((14-11)/2)

i.e. 11 + FLOOR((1.5) AND 11 + CEIL(1.5)

i.e. 11 + 1 AND 11 + 2

i.e. 12 and 13, which is correct

And it works if there is an odd number of records too:

e.g. if there was a fifth record

15 New York 176

11 + FLOOR((15-11)/2) AND 11 + CEIL((15-11)/2)

11 + 2, 11 + 2 which is record 13, which is correct

Implementation:

1) Make table but NB to sort by grouping variable, and by value we want median of. For this example the id we have added is intID.

2) Add counter:

Run "SET @cnt := 0;"

Run "UPDATE your_table_name_here

SET intID = @cnt := @cnt + 1"

3) Run a nested query

Text of Inner Query:

"SELECT grouping_var_name_here,

value_we_want_median_of,

MIN(intID) + FLOOR((MAX(intID) - MIN(intID))/2) As tag1,

MIN(intID) + CEIL((MAX(intID) - MIN(intID))/2) As tag2

FROM your_table_name_here

GROUP BY grouping_var_name_here"

Text of Outer Query:

"SELECT your_table_name_here.grouping_var_name_here,

FORMAT(AVG(value_we_want_median_of),1) AS median_of_values

FROM (text_of_inner_query_goes_here) AS qryid_tags INNER JOIN

your_table_name_here

USING(grouping_var_name_here)

WHERE intID IN(tag1, tag2)

GROUP BY grouping_var_name_here

ORDER BY grouping_var_name_here"

The best thing about this method is you can check it step by step and see why it should work.

-- Median defined in http://en.wikipedia.org/wiki/Median

--

-- by Peter Hlavac

-- 06.11.2008

--

-- Example Table:

DROP table if exists table_median;

CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));

COMMIT;

INSERT INTO table_median (id, val) VALUES

(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),

(2, 4),

(3, 5), (3, 2),

(4, 5), (4, 12), (4, 1), (4, 7);

-- Calculating the MEDIAN

SELECT @a := 0;

SELECT

id,

AVG(val) AS MEDIAN

FROM (

SELECT

id,

val

FROM (

SELECT

-- Create an index n for every id

@a := (@a + 1) mod o.c AS shifted_n,

IF(@a mod o.c=0, o.c, @a) AS n,

o.id,

o.val,

-- the number of elements for every id

o.c

FROM (

SELECT

t_o.id,

val,

c

FROM

table_median t_o INNER JOIN

(SELECT

id,

COUNT(1) AS c

FROM

table_median

GROUP BY

id

) t2

ON (t2.id = t_o.id)

ORDER BY

t_o.id,val

) o

) a

WHERE

IF(

-- if there is an even number of elements

-- take the lower and the upper median

-- and use AVG(lower,upper)

c MOD 2 = 0,

n = c DIV 2 OR n = (c DIV 2)+1,

-- if its an odd number of elements

-- take the first if its only one element

-- or take the one in the middle

IF(

c = 1,

n = 1,

n = c DIV 2 + 1

)

)

) a

GROUP BY

id;

-- Explanation:

-- The Statement creates a helper table like

--

-- n id val count

-- ----------------

-- 1, 1, 1, 7

-- 2, 1, 3, 7

-- 3, 1, 4, 7

-- 4, 1, 5, 7

-- 5, 1, 6, 7

-- 6, 1, 7, 7

-- 7, 1, 8, 7

--

-- 1, 2, 4, 1

-- 1, 3, 2, 2

-- 2, 3, 5, 2

--

-- 1, 4, 1, 4

-- 2, 4, 5, 4

-- 3, 4, 7, 4

-- 4, 4, 12, 4

-- from there we can select the n-th element on the position: count div 2 + 1

Exemplary usage when we have three items per row would be:

SELECT *,

(COALESCE(V.rank_0, 0)

+ COALESCE(V.rank_1, 0)

+ COALESCE(V.rank_2, 0))

/

(3 -

(COALESCE(V.rank_0 - V.rank_0, 1)

+ COALESCE(V.rank_1 - V.rank_1, 1)

+ COALESCE(V.rank_2 - V.rank_2, 1))

) AS row_avg FROM voting V

I talk about it in more detail on my blog post at:

http://tech-blog.borychowski.com/index.php/2009/02/mysql/average-value-in-a-row/

On our Linux system, we were able to change the setting easily by altering the file located at:

/etc/my.cnf

And adding this line under the [mysqld] group heading:

group_concat_max_len=99999

"99999" is the new number of characters that will be returned.

Also, a restart of mysql is required.

select a.rownum, a.id, a.value

FROM

(SELECT @rownum:=@rownum+1 AS rownum,id, value

FROM (SELECT @rownum:=0) r, DATA

order by value) a,

(select round(count(*)/2) medio FROM DATA) b

WHERE a.rownum= b.medio;

If you need find the median by some agrupation:

select b.group_id,a.rownum, a.id, a.value

FROM

(SELECT if(@group=group_id,@rownum:=@rownum+1,@rownum:=0) AS rownum,

@group:=group_id as group,id, value

FROM DATA

order by group_id, valor ) a,

(select group_id,round(count(*)/2) medio

FROM DATA

group by group_id) b

WHERE a.rownum= b.medio and a.group=b.group_id;

If you want more information about this in spanish: http://senderodigital.wordpress.com

select avg(a.rownum), avg(a.value)

FROM (SELECT @rownum:=@rownum+1 AS rownum,value

FROM (SELECT @rownum:=0) r, DATA

order by value

) a,

(select count(*)/2 median FROM DATA) b

WHERE a.rownum between (b.median - 0.5) and (b.median +0.5)

select count(*)/2 median

to

select 0.5+count(*)/2 median

so that an odd number of rows yields an integer rownum.

// get number of rows that fits our filtering need (bigger then a certain date for example)

select count(*) as counter

from table

where date_field > 1/1/2000

// now save the row count in count variable lets say $rows_count in php

// now get the median row (the one the is in the middle of the list, filtered and ordered by our desired field

select date_field as median_date

from table

where date_field > 1/1/2000

order by date_field

limit '.$rows_count/2.',1

$row['median_date'] will contain the median date :)

enjoy.

@myvar:=0;

@rownum:=0;

SELECT result.grouping_field_name, avg(median_for_field) as median

FROM (SELECT middle_rows.grouping_field_name, numerated_rows.rownum, numerated_rows.median_for_field

FROM (SELECT if(@myvar = grouping_field_name, @rownum := @rownum + 1, @rownum := 0) AS rownum,

@myvar := grouping_field_name AS grouping_field_name_alias, median_for_field

FROM data ORDER BY grouping_field_name, median_for_field

) numerated_rows,

(SELECT grouping_field_name, count(*)/2 median

FROM data GROUP BY grouping_field_name

) middle_rows

WHERE numerated_rows.rownum BETWEEN (middle_rows.median - if( median = round(median) , 1, 0 ) - 0.5) AND (middle_rows.median - if( median = round(median) , 0, 0.5 )) AND

numerated_rows.grouping_field_name_alias = middle_rows.grouping_field_name

) result GROUP BY grouping_field_name

delimiter //

CREATE PROCEDURE calcMedian(IN tbl CHAR(64), IN col CHAR(64))

BEGIN

DECLARE counter INT;

SET @counter:= 0;

SET @s = CONCAT('Select avg(x) as median from (Select x from (Select (Select count(',col,') from ',tbl,') as al,@counter:=(@counter+1) as c,',col,' x from ',tbl,' order by x) t1 where t1.c between al/2 and al/2+1) t2');

PREPARE stmt FROM @s;

EXECUTE stmt;

END

//

delimiter ;

Here the test:

CREATE TABLE test (x int(11));

insert into test values (1),(100),(101),(105),(107),(100000);

call calcMedian('test','x');

insert into test values (900000);

call calcMedian('test','x');

SELECT COUNT( IF(x=1, 1, NULL) ) AS c;

SELECT SUM( IF(x=1, 1, 0) ) AS c;

SELECT COUNT(x=1 OR NULL) AS c;

The above 3 statements are equivalent. Though the last is not immediately obvious how its working. The condition x=1 is evaluated as TRUE or FALSE. If it is TRUE, then the lazy evaluation of the OR means that it exits and runs COUNT(TRUE). When x=1 is FALSE, you get FALSE OR NULL, which evaluates to NULL, so you get COUNT(NULL). This means you only COUNT the lines where x=1.

It is also worth noting that in all cases above you can have any condition e.g. x>1, x IN (1,2,3), (x=1 AND y=2)

i.e. COUNT( (x=1 AND y=2) OR NULL ) as c

Variance() and Var_pop() return the Variance. The variance of a random variable or distribution is the mean of the squared deviation of that variable from its mean.

Coefficient of Variation must be calcuated by STDDEV_POP(x)/Avg(x), as far as I know there is not a built in function.

http://en.wikipedia.org/wiki/Coefficient_of_variation

http://en.wikipedia.org/wiki/Variance

FIRST Aggregate Function MySQL equivalent is MIN

MIN returns the lowest number for numeric fields

MIN returns the first line in the GROUP BY Clause base on Sort Order for non numeric fields

http://www.websitefactors.co.uk/mysql/2011/07/mysql-limit-the-values-in-the-group_concat-function/

--------------------------------------------

SELECT

B.Year_Enter,

B.Month_Enter,

B.MonthName_Enter,

@IT_Open := @IT_Open + B.IT AS IT_Open,

@CS_Open := @CS_Open + B.CS AS CS_Open

FROM

(SELECT

A.Year_Enter,

A.Month_Enter,

A.MonthName_Enter,

SUM(IF(A.Month_Enter <> A.Month_Close, A.IT, 0)) AS IT,

SUM(IF(A.Month_Enter <> A.Month_Close, A.CS, 0)) AS CS

FROM

(SELECT

YEAR(I.Date_Entered) AS Year_Enter,

MONTH(I.Date_Entered) AS Month_Enter,

MONTHNAME(I.Date_Entered) AS MonthName_Enter,

IFNULL(MONTH(I.Closed_Date),0) AS Month_Close,

IFNULL(MONTHNAME(I.Closed_Date),'') AS MonthName_Close,

SUM(IF(U.Department = 'IT',1,0)) AS IT,

SUM(IF(U.Department = 'CS',1,0)) AS CS

FROM

Incident I,

User U

WHERE

I.Initiated_By = U.userLogin

GROUP BY

YEAR(I.Date_Entered),

MONTH(I.Date_Entered),

MONTHNAME(I.Date_Entered),

MONTH(I.Closed_Date),

MONTHNAME(I.Closed_Date)

ORDER BY

YEAR(I.Date_Entered),

MONTH(I.Date_Entered),

MONTH(I.Closed_Date)) A

GROUP BY

A.Year_Enter,

A.Month_Enter) B,

(SELECT

@IT_Open := 0,

@CS_Open := 0

FROM DUAL) C

First it runs a query to count the (non-null) values and saves the count in variable @N_VALS.

Then it runs a query containing a subquery:

- Inner query (named SQ1): from the values, selects the median value (if the number of values is odd) or the two middle values. During the selection, stores the value(s) selected in variables @MID1 and @MID2, leaving @MID1 null or @MID1<=@MID2. To accomplish this the values are ordered using an index or sort. The full scan required by some approaches, after the ordering, to assign a sequence number to every row is avoided. By using the LIMIT keyword, we can proceed directly to (at worst) a scan over half the values (also required where sequence numbers are assigned) to fetch the median value or the two middle values. We use LIMIT with OFFSET to identify the one or two values we need by position in the ordered list of values.

- Outer query: Uses @N_VALS, @MID1 and @MID2 to calculate median.

SET @TABLE =

/* Your table (or schema.table) > */ 'the_table_specification';

SET @COLUMN =

/* Your column > */ 'the_column_name';

#

SET @MEDIAN_QUERY1 = CONCAT('SELECT COUNT(', @COLUMN,') FROM ', @TABLE,' INTO @N_VALS');

PREPARE MEDIAN_PREPARED1 FROM @MEDIAN_QUERY1;

EXECUTE MEDIAN_PREPARED1;

DEALLOCATE PREPARE MEDIAN_PREPARED1;

SET @ODD = @N_VALS%2;

SET @MEDIAN_QUERY2 = CONCAT('SELECT CAST(IF(@ODD, @MID2, (@MID1+@MID2)/2) AS CHAR) AS `MEDIAN(', @COLUMN, ')` FROM (SELECT @MID1:=@MID2, @MID2:=', @COLUMN, ' FROM ',

@TABLE, ' WHERE ', @COLUMN, ' IS NOT NULL ORDER BY ', @COLUMN, ' LIMIT ', 2-@ODD, ' OFFSET ', @N_VALS DIV 2 - 1*(@N_VALS>'0') + @ODD,') SQ1 LIMIT 1');

PREPARE MEDIAN_PREPARED2 FROM @MEDIAN_QUERY2;

EXECUTE MEDIAN_PREPARED2;

DEALLOCATE PREPARE MEDIAN_PREPARED2;

http://mahmudahsan.wordpress.com/2008/08/27/mysql-the-group_concat-function/

http://www.qubed.ro

SELECT GROUP_CONCAT(`t`.`id`) FROM

(SELECT `id` FROM `table` LIMIT 5) `t`

Great function but 1) Cannot run in stored function so must be procedure. 2) You forgot to ad " INTO @SomeOutputVariable" in the final line I believe...

Here is a corrected version...

-- ------------------------------------------------------------------------------------

-- Dynamic SQL not permitted in stored functions, thus using a stored procedure instead.

-- Median procedure => MEDIAN(Using Column,From Table,InOut Result)

--

-- Use by...

-- CALL MEDIAN('YourColumn','YourTable',@res);

-- SELECT @res AS MEDIAN;

-- ------------------------------------------------------------------------------------

DELIMITER $$

DROP PROCEDURE IF EXISTS `MEDIAN`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `MEDIAN`(UsingCol varchar(64),FromTbl varchar(64),INOUT res decimal(50,5))

BEGIN

SET @ires = 0;

SET @TABLE = FromTbl;

/* Your table (or schema.table) > 'the_table_specification';*/

SET @COLUMN = UsingCol;

/* Your column > 'the_column_name'; */

#

SET @MEDIAN_QUERY1 = CONCAT('SELECT COUNT(', @COLUMN,') FROM ', @TABLE,' INTO @N_VALS');

PREPARE MEDIAN_PREPARED1 FROM @MEDIAN_QUERY1;

EXECUTE MEDIAN_PREPARED1;

DEALLOCATE PREPARE MEDIAN_PREPARED1;

SET @ODD = @N_VALS%2;

SET @MEDIAN_QUERY2 = CONCAT('SELECT CAST(IF(@ODD, @MID2, (@MID1+@MID2)/2) AS CHAR) AS `MEDIAN(', @COLUMN, ')` FROM (SELECT @MID1:=@MID2, @MID2:=', @COLUMN, ' FROM ',

@TABLE, ' WHERE ', @COLUMN, ' IS NOT NULL ORDER BY ', @COLUMN, ' LIMIT ', 2-@ODD, ' OFFSET ', @N_VALS DIV 2 - 1*(@N_VALS>'0') + @ODD,') SQ1 LIMIT 1 INTO @ires');

PREPARE MEDIAN_PREPARED2 FROM @MEDIAN_QUERY2;

EXECUTE MEDIAN_PREPARED2;

DEALLOCATE PREPARE MEDIAN_PREPARED2;

SET res = @ires;

END;

$$

DELIMITER ;

--- Now for the rest...

Just a few samples without needing to learn C / C++ or get a separate compiler outside MySQL

-- ------------------------------------------

-- Faculty function FACULTY(x) => x!

-- input Max value = 65, min value = 0

-- ------------------------------------------

DELIMITER $$

DROP FUNCTION IF EXISTS `FACULTY`$$

CREATE DEFINER=`root`@`localhost` FUNCTION `FACULTY`(x tinyint(3) UNSIGNED) RETURNS bigint(20) UNSIGNED

BEGIN

IF x = 1 OR x = 0 THEN

SET @res = 1;

ELSE

SET @res = 1;

SET @i = x;

WHILE @i >= 1 DO

SET @res = (@res * @i);

SET @i = @i - 1;

END WHILE;

END IF;

RETURN @res;

END;

$$

DELIMITER ;

-- ------------------------------------------

-- Combination function COMBINATIONS(x,y) => x!/y!/(x-y)!

-- input Max value = 65, min value = 0

-- ------------------------------------------

DELIMITER $$

DROP FUNCTION IF EXISTS `COMBINATIONS`$$

CREATE DEFINER=`root`@`localhost` FUNCTION `COMBINATIONS`(x tinyint(3) UNSIGNED,y tinyint(3) UNSIGNED) RETURNS bigint(20) UNSIGNED

BEGIN

IF (x < y) THEN

SET @res = 0;

ELSEIF (x = y) OR (x = 0) OR (y = 0) THEN

SET @res = 1;

ELSE

SET @varX = FACULTY(x);

SET @varY = FACULTY(y);

SET @varZ = FACULTY(x-y);

SET @res = (@varX / @varY / @varZ);

END IF;

RETURN @res;

END;

$$

DELIMITER ;