SQL tutorial

SQL Group By

Learn more about SQL, a standard language for interacting with databases and storing, manipulating, and retrieving data from databases.

Go hands-on with SQL in our free interactive SQL tutorial.

While WHERE is one of the most useful and frequently used clauses in SQL as it can be applied to filter data, the SQL GROUP BY clause is the basis for performing aggregation, unlocking the power of data analytics in SQL.

The SQL GROUP BY clause modifies the behavior of a SELECT statement and aggregates data over the column or columns specified. This results in the data being returned in a summarized form, and having equal to or less than the number of rows from the source.

In a simple query, the GROUP BY clause follows the table name, with the column values to aggregate over afterward:

Code Example
SELECT <expression> FROM <table> GROUP BY <columns>;

In a more complex query with other clauses, GROUP BY follows the table name and WHERE, but precedes ORDER BY and LIMIT:

Code Example
SELECT <expression> FROM <table> WHERE <predicate> 
GROUP BY <columns>
ORDER BY <column> LIMIT <row count>;

The GROUP BY clause is almost always used with aggregate functions. Since the data are being returned on a different row-level than that of the source, these functions are needed to perform summary calculations on another column or columns that do not form the basis for aggregation.

One of the most commonly executed queries by an analyst is the “count star” query, which uses the COUNT function to answer the question – how many rows are in this table?

Let’s take a look at this type of query for the employees table:

Code Example
SELECT COUNT(*) FROM employees;

This returns the result which is the number of rows – 300,024. However, we still have not used the GROUP BY clause in the above. 

Let’s say instead, we wanted to answer a specific business question about the ~300K employees: for example, what is the number of employees by gender?

To answer this question and count the number of rows appearing for each value occurring in the gender column, we add the GROUP BY clause and the column name, and SQL will return the aggregated result:

Code Example
SELECT gender, COUNT(*) FROM employees GROUP BY gender;

We can see that there are less females in the company’s employ with ~120K vs. the nearly ~180K male employees. Though only two rows were returned here, we also see that the sum of the two counts adds up to the total number of rows in the table, as 179,973 + 120,051 = 300,024.

GROUP BY can also be used to perform aggregations over multiple columns, and calculations for all combinations of the distinct values appearing in each. In this case the column names following the GROUP BY clause are separated by commas. For example, we could also calculate the number of employees by gender and by last name:

Code Example
SELECT last_name, gender, COUNT(*) FROM employees GROUP BY last_name, gender;

Note that for queries using aggregation, column names that appear in the SELECT statement must also appear following the GROUP BY clause. Though the converse is not true, omitting the columns which are aggregated over from being displayed can make it difficult to ascertain which rows of the result correspond to which values of the aggregated columns.

As a general best practice for interpretability, when writing queries using the GROUP BY clause, all columns which appear after it should also after SELECT and vice-versa.

Learn SQL Today

Get hands-on experience writing code with interactive tutorials in our free online learning platform.

  • Free and fun
  • Designed for beginners
  • No downloads or setup required