Appearance
Advanced Queries
This chapter covers advanced SELECT query techniques including aggregations, grouping, subqueries, and common table expressions (CTEs). Master these to unlock PostgreSQL's full potential.
Sample Data Setup
Let's create sample tables for practicing:
sql
-- Create tables
CREATE TABLE departments (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
budget DECIMAL(15, 2)
);
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
department_id INTEGER REFERENCES departments(id),
salary DECIMAL(10, 2),
hire_date DATE,
manager_id INTEGER REFERENCES employees(id)
);
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
employee_id INTEGER REFERENCES employees(id),
amount DECIMAL(10, 2),
sale_date DATE,
product VARCHAR(100)
);
-- Insert sample data
INSERT INTO departments (name, budget) VALUES
('Engineering', 500000),
('Marketing', 200000),
('Sales', 300000),
('HR', 150000);
INSERT INTO employees (name, email, department_id, salary, hire_date, manager_id) VALUES
('Alice Chen', 'alice@company.com', 1, 120000, '2020-01-15', NULL),
('Bob Smith', 'bob@company.com', 1, 95000, '2021-03-20', 1),
('Carol Davis', 'carol@company.com', 1, 85000, '2022-06-10', 1),
('David Lee', 'david@company.com', 2, 75000, '2021-08-01', NULL),
('Eva Martinez', 'eva@company.com', 2, 65000, '2023-01-10', 4),
('Frank Wilson', 'frank@company.com', 3, 80000, '2020-05-20', NULL),
('Grace Kim', 'grace@company.com', 3, 70000, '2022-09-15', 6),
('Henry Brown', 'henry@company.com', 4, 60000, '2021-11-01', NULL);
INSERT INTO sales (employee_id, amount, sale_date, product) VALUES
(6, 15000, '2024-01-05', 'Enterprise Plan'),
(7, 8000, '2024-01-10', 'Pro Plan'),
(6, 12000, '2024-01-15', 'Enterprise Plan'),
(7, 5000, '2024-01-20', 'Basic Plan'),
(6, 20000, '2024-02-01', 'Enterprise Plan'),
(7, 10000, '2024-02-10', 'Pro Plan'),
(6, 18000, '2024-02-15', 'Enterprise Plan'),
(7, 7000, '2024-02-20', 'Pro Plan');Aggregate Functions
Aggregate functions compute a single result from multiple rows.
┌─────────────────────────────────────────────────────────────────┐
│ Common Aggregate Functions │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Function Description Example Result │
│ ──────── ─────────── ────────────── │
│ │
│ COUNT() Number of rows COUNT(*) → 8 │
│ SUM() Total of values SUM(salary) → 650000 │
│ AVG() Average value AVG(salary) → 81250 │
│ MIN() Minimum value MIN(salary) → 60000 │
│ MAX() Maximum value MAX(salary) → 120000 │
│ STRING_AGG Concatenate strings STRING_AGG(name,', ') │
│ ARRAY_AGG Collect into array ARRAY_AGG(name) │
│ │
└─────────────────────────────────────────────────────────────────┘Basic Aggregation
sql
-- Count all employees
SELECT COUNT(*) AS total_employees FROM employees;
-- Count non-null values
SELECT COUNT(manager_id) AS employees_with_managers FROM employees;
-- Sum, Average, Min, Max
SELECT
SUM(salary) AS total_payroll,
AVG(salary) AS average_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees;
-- Combine into a single row
SELECT
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary,
SUM(salary) AS total_cost
FROM employees
WHERE department_id = 1;String and Array Aggregation
sql
-- Concatenate names into a string
SELECT STRING_AGG(name, ', ' ORDER BY name) AS all_employees
FROM employees;
-- Result: "Alice Chen, Bob Smith, Carol Davis, ..."
-- Collect into an array
SELECT ARRAY_AGG(name ORDER BY hire_date) AS employees_by_tenure
FROM employees;
-- Result: {"Alice Chen", "Frank Wilson", "Bob Smith", ...}
-- Aggregate per department
SELECT
d.name AS department,
STRING_AGG(e.name, ', ') AS team_members
FROM departments d
JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name;GROUP BY
GROUP BY divides rows into groups and applies aggregate functions to each group.
┌─────────────────────────────────────────────────────────────────┐
│ How GROUP BY Works │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Original Data: After GROUP BY department: │
│ ───────────── ───────────────────────── │
│ │
│ Name Dept Salary Dept Count Sum │
│ Alice Eng 120000 ───► Engineering 3 300000 │
│ Bob Eng 95000 Marketing 2 140000 │
│ Carol Eng 85000 Sales 2 150000 │
│ David Mkt 75000 HR 1 60000 │
│ Eva Mkt 65000 │
│ Frank Sales 80000 │
│ Grace Sales 70000 │
│ Henry HR 60000 │
│ │
└─────────────────────────────────────────────────────────────────┘Basic GROUP BY
sql
-- Count employees per department
SELECT
department_id,
COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
-- With department names (using JOIN)
SELECT
d.name AS department,
COUNT(e.id) AS employee_count,
SUM(e.salary) AS total_salary,
ROUND(AVG(e.salary), 2) AS avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
ORDER BY total_salary DESC;Multiple Grouping Columns
sql
-- Group by year and month
SELECT
EXTRACT(YEAR FROM sale_date) AS year,
EXTRACT(MONTH FROM sale_date) AS month,
COUNT(*) AS sale_count,
SUM(amount) AS total_sales
FROM sales
GROUP BY
EXTRACT(YEAR FROM sale_date),
EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;
-- Group by product and employee
SELECT
product,
e.name AS salesperson,
COUNT(*) AS times_sold,
SUM(s.amount) AS total_revenue
FROM sales s
JOIN employees e ON s.employee_id = e.id
GROUP BY product, e.name
ORDER BY product, total_revenue DESC;HAVING
HAVING filters groups after aggregation (WHERE filters rows before aggregation).
sql
-- Departments with more than 1 employee
SELECT
d.name AS department,
COUNT(*) AS employee_count
FROM departments d
JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
HAVING COUNT(*) > 1;
-- Products with total sales over $20,000
SELECT
product,
SUM(amount) AS total_sales
FROM sales
GROUP BY product
HAVING SUM(amount) > 20000;
-- Employees with average sale over $10,000
SELECT
e.name,
ROUND(AVG(s.amount), 2) AS avg_sale,
COUNT(*) AS sale_count
FROM employees e
JOIN sales s ON e.id = s.employee_id
GROUP BY e.id, e.name
HAVING AVG(s.amount) > 10000;WHERE vs HAVING
sql
-- WHERE filters rows BEFORE grouping
SELECT department_id, AVG(salary)
FROM employees
WHERE salary > 70000 -- Filter individual rows first
GROUP BY department_id;
-- HAVING filters groups AFTER aggregation
SELECT department_id, AVG(salary)
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 80000; -- Filter aggregated results
-- Combined: Both WHERE and HAVING
SELECT
d.name,
COUNT(*) AS senior_count,
AVG(e.salary) AS avg_salary
FROM departments d
JOIN employees e ON d.id = e.department_id
WHERE e.hire_date < '2022-01-01' -- Only employees hired before 2022
GROUP BY d.id, d.name
HAVING COUNT(*) >= 2; -- Departments with 2+ such employeesSubqueries
A subquery is a query nested inside another query.
Scalar Subqueries (Return Single Value)
sql
-- Employees earning above average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Employee count compared to company average
SELECT
d.name,
COUNT(*) AS dept_count,
(SELECT COUNT(*) FROM employees) AS total_count,
ROUND(
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM employees),
1
) AS percentage
FROM departments d
JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name;Column Subqueries (Return List of Values)
sql
-- Employees in departments with budget over 200k
SELECT name, department_id
FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE budget > 200000
);
-- Employees who have made sales
SELECT name
FROM employees
WHERE id IN (SELECT DISTINCT employee_id FROM sales);
-- Employees who haven't made sales
SELECT name
FROM employees
WHERE id NOT IN (
SELECT DISTINCT employee_id FROM sales WHERE employee_id IS NOT NULL
);Table Subqueries (Return Multiple Rows and Columns)
sql
-- Top earner per department
SELECT name, salary, department_id
FROM employees e
WHERE salary = (
SELECT MAX(salary)
FROM employees
WHERE department_id = e.department_id
);
-- Using subquery as a table
SELECT
dept_stats.department,
dept_stats.avg_salary,
e.name AS highest_earner
FROM (
SELECT
department_id,
d.name AS department,
AVG(e.salary) AS avg_salary,
MAX(e.salary) AS max_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY department_id, d.name
) AS dept_stats
JOIN employees e ON e.department_id = dept_stats.department_id
AND e.salary = dept_stats.max_salary;EXISTS Subqueries
sql
-- Departments that have at least one employee
SELECT name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees WHERE department_id = d.id
);
-- Departments with no employees
SELECT name
FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees WHERE department_id = d.id
);
-- Employees who are managers
SELECT name
FROM employees e
WHERE EXISTS (
SELECT 1 FROM employees WHERE manager_id = e.id
);Common Table Expressions (CTEs)
CTEs make complex queries more readable by breaking them into named steps.
┌─────────────────────────────────────────────────────────────────┐
│ CTE Structure │
├─────────────────────────────────────────────────────────────────┤
│ │
│ WITH cte_name AS ( │
│ -- Define the CTE query here │
│ SELECT ... │
│ ), │
│ another_cte AS ( │
│ -- Can reference previous CTEs │
│ SELECT ... FROM cte_name ... │
│ ) │
│ -- Main query uses the CTEs │
│ SELECT * FROM cte_name │
│ JOIN another_cte ON ... │
│ │
└─────────────────────────────────────────────────────────────────┘Basic CTE
sql
-- Calculate department statistics
WITH dept_stats AS (
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT
d.name,
ds.employee_count,
ROUND(ds.avg_salary, 2) AS avg_salary
FROM departments d
JOIN dept_stats ds ON d.id = ds.department_id
ORDER BY ds.avg_salary DESC;Multiple CTEs
sql
-- Complex report using multiple CTEs
WITH
-- CTE 1: Department summary
dept_summary AS (
SELECT
d.id,
d.name,
d.budget,
COUNT(e.id) AS headcount,
COALESCE(SUM(e.salary), 0) AS total_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name, d.budget
),
-- CTE 2: Sales summary per employee
sales_summary AS (
SELECT
employee_id,
COUNT(*) AS sale_count,
SUM(amount) AS total_sales
FROM sales
GROUP BY employee_id
),
-- CTE 3: Top performers
top_performers AS (
SELECT
e.id,
e.name,
e.department_id,
COALESCE(ss.total_sales, 0) AS total_sales
FROM employees e
LEFT JOIN sales_summary ss ON e.id = ss.employee_id
WHERE ss.total_sales IS NOT NULL
)
-- Main query: Department report with top performer
SELECT
ds.name AS department,
ds.headcount,
ds.total_salary,
ds.budget - ds.total_salary AS remaining_budget,
tp.name AS top_performer,
tp.total_sales
FROM dept_summary ds
LEFT JOIN top_performers tp ON ds.id = tp.department_id
ORDER BY ds.name;Recursive CTEs
Recursive CTEs are perfect for hierarchical data like org charts or categories.
sql
-- Build organization hierarchy
WITH RECURSIVE org_chart AS (
-- Base case: Top-level employees (no manager)
SELECT
id,
name,
manager_id,
1 AS level,
name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Employees with managers
SELECT
e.id,
e.name,
e.manager_id,
oc.level + 1,
oc.path || ' → ' || e.name
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
REPEAT(' ', level - 1) || name AS employee,
level,
path
FROM org_chart
ORDER BY path;Output:
employee | level | path
-------------------------+-------+--------------------------------
Alice Chen | 1 | Alice Chen
Bob Smith | 2 | Alice Chen → Bob Smith
Carol Davis | 2 | Alice Chen → Carol Davis
David Lee | 1 | David Lee
Eva Martinez | 2 | David Lee → Eva Martinez
Frank Wilson | 1 | Frank Wilson
Grace Kim | 2 | Frank Wilson → Grace Kim
Henry Brown | 1 | Henry BrownWindow Functions
Window functions perform calculations across related rows without collapsing them.
┌─────────────────────────────────────────────────────────────────┐
│ Window Functions │
├─────────────────────────────────────────────────────────────────┤
│ │
│ GROUP BY Window Functions │
│ ──────── ──────────────── │
│ │
│ Collapses rows ───► Keeps all rows │
│ One result per group Adds computed column │
│ │
│ id salary dept id salary dept rank │
│ ────────────────── ────────────────────────── │
│ Eng: AVG = 100000 1 120000 Eng 1 │
│ Mkt: AVG = 70000 2 95000 Eng 2 │
│ 3 85000 Eng 3 │
│ 4 75000 Mkt 1 │
│ 5 65000 Mkt 2 │
│ │
└─────────────────────────────────────────────────────────────────┘ROW_NUMBER, RANK, DENSE_RANK
sql
-- Rank employees by salary within department
SELECT
name,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM employees;
-- Get top 2 earners per department
SELECT * FROM (
SELECT
name,
department_id,
salary,
ROW_NUMBER() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS rank
FROM employees
) ranked
WHERE rank <= 2;Running Totals and Moving Averages
sql
-- Running total of sales
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;
-- Sales with cumulative and moving average
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS cumulative_total,
ROUND(
AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2
) AS moving_avg_3
FROM sales;LAG and LEAD
sql
-- Compare each sale to previous sale
SELECT
sale_date,
amount,
LAG(amount) OVER (ORDER BY sale_date) AS previous_sale,
amount - LAG(amount) OVER (ORDER BY sale_date) AS difference
FROM sales;
-- Compare to next sale
SELECT
sale_date,
amount,
LEAD(amount) OVER (ORDER BY sale_date) AS next_sale
FROM sales;FIRST_VALUE, LAST_VALUE, NTH_VALUE
sql
-- Compare each employee to department's highest earner
SELECT
name,
salary,
FIRST_VALUE(name) OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS top_earner,
FIRST_VALUE(salary) OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS top_salary,
salary - FIRST_VALUE(salary) OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS diff_from_top
FROM employees;Real-World Query Patterns
Pagination with Total Count
sql
-- Efficient pagination with total count in one query
WITH filtered AS (
SELECT *
FROM products
WHERE category = 'Electronics'
),
counted AS (
SELECT COUNT(*) AS total FROM filtered
)
SELECT
f.*,
c.total
FROM filtered f, counted c
ORDER BY f.created_at DESC
LIMIT 20 OFFSET 0;Running Totals for Financial Reports
sql
-- Daily revenue with running total
SELECT
sale_date,
SUM(amount) AS daily_total,
SUM(SUM(amount)) OVER (
ORDER BY sale_date
ROWS UNBOUNDED PRECEDING
) AS running_total
FROM sales
GROUP BY sale_date
ORDER BY sale_date;Finding Gaps in Sequences
sql
-- Find missing order IDs
WITH all_ids AS (
SELECT generate_series(
(SELECT MIN(id) FROM orders),
(SELECT MAX(id) FROM orders)
) AS id
)
SELECT a.id AS missing_id
FROM all_ids a
LEFT JOIN orders o ON a.id = o.id
WHERE o.id IS NULL;Top N per Group
sql
-- Top 3 products per category by sales
WITH ranked AS (
SELECT
p.name,
p.category,
SUM(oi.quantity) AS total_sold,
ROW_NUMBER() OVER (
PARTITION BY p.category
ORDER BY SUM(oi.quantity) DESC
) AS rank
FROM products p
JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.id, p.name, p.category
)
SELECT name, category, total_sold
FROM ranked
WHERE rank <= 3;Year-over-Year Comparison
sql
-- Compare monthly sales to same month last year
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', sale_date) AS month,
SUM(amount) AS total
FROM sales
GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
month,
total AS current_total,
LAG(total, 12) OVER (ORDER BY month) AS last_year_total,
ROUND(
(total - LAG(total, 12) OVER (ORDER BY month)) /
NULLIF(LAG(total, 12) OVER (ORDER BY month), 0) * 100,
2
) AS yoy_growth_percent
FROM monthly_sales
ORDER BY month DESC;Cohort Analysis
sql
-- User retention by signup month
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', created_at) AS cohort_month
FROM users
),
user_activities AS (
SELECT
user_id,
DATE_TRUNC('month', activity_date) AS activity_month
FROM user_activities
)
SELECT
uc.cohort_month,
ua.activity_month,
COUNT(DISTINCT uc.user_id) AS active_users,
EXTRACT(MONTH FROM AGE(ua.activity_month, uc.cohort_month)) AS months_since_signup
FROM user_cohorts uc
JOIN user_activities ua ON uc.user_id = ua.user_id
WHERE ua.activity_month >= uc.cohort_month
GROUP BY uc.cohort_month, ua.activity_month
ORDER BY uc.cohort_month, ua.activity_month;Query Performance Tips
Optimization Guidelines
- Use EXPLAIN ANALYZE to understand query plans
- Add indexes on columns used in WHERE, JOIN, ORDER BY
- Limit result sets - use LIMIT during development
- **Avoid SELECT *** - fetch only needed columns
- Use CTEs wisely - they can prevent optimization in some cases
- Consider materialized views for complex aggregations
sql
-- Check query performance
EXPLAIN ANALYZE
SELECT e.name, d.name AS department
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 50000;
-- Look for:
-- - Seq Scan (might need index)
-- - High cost numbers
-- - Large row estimatesSummary
In this chapter, you learned:
- Aggregate Functions: COUNT, SUM, AVG, MIN, MAX, STRING_AGG
- GROUP BY: Grouping rows and applying aggregates
- HAVING: Filtering groups after aggregation
- Subqueries: Nested queries for complex logic
- CTEs: Named subqueries for readable code
- Recursive CTEs: Handling hierarchical data
- Window Functions: Calculations across related rows
- Real-world patterns: Pagination, running totals, YoY comparison
Quick Reference
sql
-- Aggregation
SELECT COUNT(*), SUM(col), AVG(col) FROM table GROUP BY group_col;
-- Filtering groups
SELECT col, COUNT(*) FROM table GROUP BY col HAVING COUNT(*) > 5;
-- Subquery
SELECT * FROM table WHERE col > (SELECT AVG(col) FROM table);
-- CTE
WITH cte AS (SELECT ...) SELECT * FROM cte;
-- Window function
SELECT col, SUM(col) OVER (PARTITION BY group ORDER BY sort) FROM table;What's Next?
Now let's learn how to combine data from multiple tables with JOINs!
👉 Continue to Chapter 6: JOINs