Lesson 9: GROUP BY and HAVING - Advanced Aggregation

In Lesson 8, we learned how to use aggregate functions to summarize entire datasets. But what if you want to break down your data into categories and get totals for each group? What if you want to find only the categories that meet certain conditions?

This is where GROUP BY and HAVING become essential tools for advanced data analysis.

🗂️ What is GROUP BY?

GROUP BY takes your data and organizes it into groups based on one or more columns that have the same values. Then you can apply aggregate functions to each group separately.

Think of it like sorting your data into separate piles, then calculating totals for each pile.

Without GROUP BY:

  • One total for entire dataset
  • "What's our total revenue?" → $50,000

With GROUP BY:

  • Separate totals for each category
  • "What's our total revenue by product category?" → Electronics: $30,000, Sports: $15,000, Books: $5,000

Sample Sales Data:

order_id | customer_id | product_category | order_amount | order_date
---------|-------------|------------------|--------------|------------
1        | 101         | Electronics      | 1200.00      | 2024-01-15
2        | 102         | Sports           | 89.99        | 2024-01-16
3        | 103         | Electronics      | 2500.00      | 2024-01-17
4        | 101         | Books            | 45.00        | 2024-01-18
5        | 104         | Sports           | 150.00       | 2024-01-19

🔧 Basic GROUP BY Syntax

SELECT column_name, AGG_FUNC(column_name)
FROM table_name
GROUP BY column_name;

Important rule: The column in SELECT must either be:

  • Listed in GROUP BY - The grouping column
  • An aggregate function - COUNT, SUM, AVG, etc.

Example:

-- Revenue by product category
SELECT product_category, SUM(order_amount) AS total_revenue
FROM sales
GROUP BY product_category;

Result:

product_category | total_revenue
-----------------|---------------
Electronics      | 3700.00
Sports           | 239.99
Books            | 45.00

🔗 GROUP BY with Multiple Tables

You can combine GROUP BY with JOINs to group data from related tables:

SELECT c.category_name, SUM(s.order_amount) AS total_sales
FROM categories c
JOIN sales s ON c.category_id = s.category_id
GROUP BY c.category_name;

This allows you to group by descriptive names from one table while aggregating values from another.


🎯 The HAVING Clause

Sometimes you want to filter your groups based on the calculated aggregate values. This is where HAVING comes in.

WHERE vs HAVING:

  • WHERE filters individual rows before grouping
  • HAVING filters groups after grouping and aggregation

HAVING Syntax:

SELECT column_name, AGG_FUNC(column_name)
FROM table_name
WHERE condition
GROUP BY column_name
HAVING aggregate_condition;

Example:

-- Categories with total sales over $1,000
SELECT product_category, SUM(order_amount) AS total_revenue
FROM sales
GROUP BY product_category
HAVING SUM(order_amount) > 1000;

Result:

product_category | total_revenue
-----------------|---------------
Electronics      | 3700.00

💡 Key Point: HAVING filters the grouped results, while WHERE filters individual rows before grouping occurs.

🔄 Query Execution Order

Understanding the order helps you know when to use WHERE vs HAVING:

  1. FROM - Get data from tables
  2. JOIN - Combine tables if needed
  3. WHERE - Filter individual rows
  4. GROUP BY - Group filtered rows
  5. HAVING - Filter groups
  6. SELECT - Choose columns to display
  7. ORDER BY - Sort final results

📊 Practical Examples

🔢 Counting by Category

-- How many orders per category?
SELECT product_category, COUNT(*) AS order_count
FROM sales
GROUP BY product_category;

Result:

product_category | order_count
-----------------|------------
Electronics      | 2
Sports           | 2
Books            | 1

💰 Totals by Group

-- Total revenue by category
SELECT product_category, SUM(order_amount) AS total_revenue
FROM sales
GROUP BY product_category;

📈 Averages by Group

-- Average order value by category
SELECT product_category, AVG(order_amount) AS avg_order_value
FROM sales
GROUP BY product_category;

🔍 Complex Example: WHERE + GROUP BY + HAVING

-- For recent orders, show average amount by category, 
-- but only for categories with 2+ orders
SELECT product_category, AVG(order_amount) AS avg_amount
FROM sales
WHERE order_date > '2024-01-01'
GROUP BY product_category
HAVING COUNT(*) >= 2;

Result:

product_category | avg_amount
-----------------|------------
Electronics      | 1850.00
Sports           | 119.995

🛠️ Common GROUP BY Patterns

Sales Analysis:

-- Monthly sales summary
SELECT MONTH(order_date) AS month, 
       COUNT(*) AS total_orders,
       SUM(order_amount) AS monthly_revenue
FROM sales
GROUP BY MONTH(order_date)
ORDER BY month;

Customer Analysis:

-- Customer spending patterns
SELECT customer_id, 
       COUNT(*) AS total_orders,
       SUM(order_amount) AS total_spent,
       AVG(order_amount) AS avg_order_value
FROM sales
GROUP BY customer_id
HAVING COUNT(*) > 1;

Product Performance:

-- Top performing categories
SELECT product_category, 
       SUM(order_amount) AS total_revenue,
       COUNT(*) AS order_count
FROM sales
GROUP BY product_category
ORDER BY total_revenue DESC;

🎯 When to Use GROUP BY and HAVING

Use GROUP BY when you need:

  • Category breakdowns - Sales by region, products by supplier
  • Comparative analysis - Which groups perform best?
  • Detailed insights - Beyond simple totals
  • Time-based analysis - Monthly, quarterly, yearly summaries

Use HAVING when you need:

  • Filter aggregated results - Only show high-performing groups
  • Set minimum thresholds - Categories with significant volume
  • Conditional summaries - Groups meeting specific criteria
  • Quality control - Find groups with unusual patterns

📋 Best Practices

  • Include grouping columns in SELECT - Show what you're grouping by
  • Use meaningful aliases - Make results clear and professional
  • Think about the business question - What insights do you need?
  • Test with small datasets first - Verify your logic before running on large data
  • Use ORDER BY - Sort your grouped results for better readability
  • Consider performance - GROUP BY can be expensive on large datasets

Good example:

SELECT product_category,
       COUNT(*) AS total_orders,
       SUM(order_amount) AS total_revenue,
       AVG(order_amount) AS avg_order_value
FROM sales
WHERE order_date >= '2024-01-01'
GROUP BY product_category
HAVING COUNT(*) >= 5
ORDER BY total_revenue DESC;

💡 Did You Know? The GROUP BY clause is one of the most powerful features in SQL! It transforms SQL from a simple data retrieval tool into a sophisticated analysis platform. Many business intelligence and reporting tools rely heavily on GROUP BY queries to generate insights. When you use GROUP BY, SQL essentially creates temporary "sub-tables" for each group and applies your aggregate functions to each sub-table separately.


✅ Key Takeaways

  • GROUP BY organizes data into groups based on common values in specified columns
  • Aggregate functions work on each group separately, not the entire dataset
  • WHERE filters rows before grouping, HAVING filters groups after aggregation
  • You can combine GROUP BY with JOINs for multi-table analysis
  • Use GROUP BY for category breakdowns and comparative analysis
  • Use HAVING to filter groups based on aggregate conditions
  • Query execution order matters - understand when WHERE vs HAVING applies
  • Include grouping columns in SELECT to show what you're grouping by
  • Always use meaningful aliases for professional, readable results
  • GROUP BY is essential for business intelligence and reporting
  • Test your logic with small datasets before running complex queries

📝 Exercise Overview

In this lesson's exercises, you'll work with both Sales and Products tables to practice grouping data and filtering groups. You'll learn to answer complex business questions like "Which product categories generate the most revenue?" and "Which suppliers have the highest average sales volume?"

These scenarios will show you how GROUP BY and HAVING transform simple aggregate functions into powerful analytical tools. You'll practice:

  • Grouping data by different categories
  • Using aggregate functions with groups
  • Filtering groups with HAVING clauses
  • Combining WHERE and HAVING for complex analysis
  • Creating meaningful business reports

Take your time to understand how grouping works - this is where SQL becomes a true analytical powerhouse for business intelligence!


Ready to practice SQL? Try our coding challenges →