In Lesson 8, we learned how to use aggregate functions to summarize entire datasets. But what if you want to break down your data into categories and get totals for each group? What if you want to find only the categories that meet certain conditions?
This is where GROUP BY
and HAVING
become essential tools for advanced data analysis.
GROUP BY
takes your data and organizes it into groups based on one or more columns that have the same values. Then you can apply aggregate functions to each group separately.
Think of it like sorting your data into separate piles, then calculating totals for each pile.
Without GROUP BY:
With GROUP BY:
Sample Sales Data:
order_id | customer_id | product_category | order_amount | order_date
---------|-------------|------------------|--------------|------------
1 | 101 | Electronics | 1200.00 | 2024-01-15
2 | 102 | Sports | 89.99 | 2024-01-16
3 | 103 | Electronics | 2500.00 | 2024-01-17
4 | 101 | Books | 45.00 | 2024-01-18
5 | 104 | Sports | 150.00 | 2024-01-19
SELECT column_name, AGG_FUNC(column_name)
FROM table_name
GROUP BY column_name;
Important rule: The column in SELECT
must either be:
Example:
-- Revenue by product category
SELECT product_category, SUM(order_amount) AS total_revenue
FROM sales
GROUP BY product_category;
Result:
product_category | total_revenue
-----------------|---------------
Electronics | 3700.00
Sports | 239.99
Books | 45.00
You can combine GROUP BY
with JOINs to group data from related tables:
SELECT c.category_name, SUM(s.order_amount) AS total_sales
FROM categories c
JOIN sales s ON c.category_id = s.category_id
GROUP BY c.category_name;
This allows you to group by descriptive names from one table while aggregating values from another.
Sometimes you want to filter your groups based on the calculated aggregate values. This is where HAVING
comes in.
WHERE vs HAVING:
HAVING Syntax:
SELECT column_name, AGG_FUNC(column_name)
FROM table_name
WHERE condition
GROUP BY column_name
HAVING aggregate_condition;
Example:
-- Categories with total sales over $1,000
SELECT product_category, SUM(order_amount) AS total_revenue
FROM sales
GROUP BY product_category
HAVING SUM(order_amount) > 1000;
Result:
product_category | total_revenue
-----------------|---------------
Electronics | 3700.00
💡 Key Point: HAVING
filters the grouped results, while WHERE
filters individual rows before grouping occurs.
Understanding the order helps you know when to use WHERE
vs HAVING
:
-- How many orders per category?
SELECT product_category, COUNT(*) AS order_count
FROM sales
GROUP BY product_category;
Result:
product_category | order_count
-----------------|------------
Electronics | 2
Sports | 2
Books | 1
-- Total revenue by category
SELECT product_category, SUM(order_amount) AS total_revenue
FROM sales
GROUP BY product_category;
-- Average order value by category
SELECT product_category, AVG(order_amount) AS avg_order_value
FROM sales
GROUP BY product_category;
-- For recent orders, show average amount by category,
-- but only for categories with 2+ orders
SELECT product_category, AVG(order_amount) AS avg_amount
FROM sales
WHERE order_date > '2024-01-01'
GROUP BY product_category
HAVING COUNT(*) >= 2;
Result:
product_category | avg_amount
-----------------|------------
Electronics | 1850.00
Sports | 119.995
Sales Analysis:
-- Monthly sales summary
SELECT MONTH(order_date) AS month,
COUNT(*) AS total_orders,
SUM(order_amount) AS monthly_revenue
FROM sales
GROUP BY MONTH(order_date)
ORDER BY month;
Customer Analysis:
-- Customer spending patterns
SELECT customer_id,
COUNT(*) AS total_orders,
SUM(order_amount) AS total_spent,
AVG(order_amount) AS avg_order_value
FROM sales
GROUP BY customer_id
HAVING COUNT(*) > 1;
Product Performance:
-- Top performing categories
SELECT product_category,
SUM(order_amount) AS total_revenue,
COUNT(*) AS order_count
FROM sales
GROUP BY product_category
ORDER BY total_revenue DESC;
Use GROUP BY when you need:
Use HAVING when you need:
Good example:
SELECT product_category,
COUNT(*) AS total_orders,
SUM(order_amount) AS total_revenue,
AVG(order_amount) AS avg_order_value
FROM sales
WHERE order_date >= '2024-01-01'
GROUP BY product_category
HAVING COUNT(*) >= 5
ORDER BY total_revenue DESC;
💡 Did You Know? The GROUP BY clause is one of the most powerful features in SQL! It transforms SQL from a simple data retrieval tool into a sophisticated analysis platform. Many business intelligence and reporting tools rely heavily on GROUP BY queries to generate insights. When you use GROUP BY, SQL essentially creates temporary "sub-tables" for each group and applies your aggregate functions to each sub-table separately.
In this lesson's exercises, you'll work with both Sales and Products tables to practice grouping data and filtering groups. You'll learn to answer complex business questions like "Which product categories generate the most revenue?" and "Which suppliers have the highest average sales volume?"
These scenarios will show you how GROUP BY and HAVING transform simple aggregate functions into powerful analytical tools. You'll practice:
Take your time to understand how grouping works - this is where SQL becomes a true analytical powerhouse for business intelligence!
Ready to practice SQL? Try our coding challenges →