SQL for BI Analysts (2025): Queries, Window

Introduction

SQL for BI analysts in 2025 means more than SELECT and GROUP BY. You’re expected to write fast, reliable queries, master window functions for running totals and cohort metrics, and apply performance tuning so dashboards don’t crawl. This SEO-optimized playbook gives you the patterns that matter across PostgreSQL, Snowflake, and BigQuery—including ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD, rolling averages, partitions, indexes, and EXPLAIN.

Use the snippets below to speed up ad-hoc analysis, stabilize production reports, and build a reusable toolkit for business intelligence and analytics engineering.

1) Query Foundations BI Analysts Use Daily

Clean, predictable queries beat clever one-liners. Structure queries with CTEs (common table expressions) and explicit columns:

WITH orders_clean AS (
  SELECT
    o.order_id,
    o.customer_id,
    o.order_date::date AS order_date,
    o.total_amount::numeric AS total_amount,
    COALESCE(o.channel, 'unknown') AS channel
  FROM raw_orders o
  WHERE o.order_status = 'completed'
)
SELECT
  c.customer_id,
  COUNT(*) AS orders,
  SUM(total_amount) AS revenue
FROM orders_clean c
GROUP BY 1
ORDER BY revenue DESC;

Best practices: cast early, COALESCE nulls, avoid SELECT *, label metrics clearly (revenue_30d, orders_7d).

2) Joins & Aggregations: Patterns That Avoid Pitfalls

Most BI bugs come from join cardinality. Use distinct keys and pre-aggregate before joining to facts.

-- Pre-aggregate sessions before joining to orders (avoids row blow-ups)
WITH session_stats AS (
  SELECT user_id, DATE(session_ts) AS d, COUNT(*) AS sessions
  FROM web_sessions
  GROUP BY 1,2
)
SELECT
  o.customer_id,
  DATE(o.order_date) AS d,
  SUM(o.total_amount) AS revenue,
  COALESCE(s.sessions, 0) AS sessions
FROM orders o
LEFT JOIN session_stats s
  ON s.user_id = o.customer_id AND s.d = DATE(o.order_date)
GROUP BY 1,2,4;

Tip: When using LEFT JOIN, keep the aggregated side on the right and ensure join keys are indexed/partitioned.

3) Window Functions: The BI Superpower

Window functions calculate metrics across partitions (e.g., per customer, per product) without collapsing rows. Core syntax:

function() OVER (
  PARTITION BY key_columns
  ORDER BY sort_columns
  ROWS BETWEEN frame_start AND frame_end
)

Common analytics patterns:

-- Running total by customer (cumulative revenue)
SELECT
  customer_id,
  order_date,
  total_amount,
  SUM(total_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cum_revenue
FROM orders;

-- Rolling 7-day average (needs day-level grain)
SELECT
  d,
  AVG(revenue) OVER (
    ORDER BY d
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rev_avg_7d
FROM daily_revenue;

-- Ranking products by category & month
SELECT
  category,
  month,
  product_id,
  revenue,
  RANK() OVER (
    PARTITION BY category, month
    ORDER BY revenue DESC
  ) AS rank_in_category
FROM monthly_product_revenue;

-- Cohort: time to second order with LAG
SELECT
  customer_id,
  order_date,
  LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order,
  EXTRACT(DAY FROM order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)) AS days_between
FROM orders;

Guideline: choose ROWS frames for fixed windows (7 days); use RANGE carefully with numeric/date gaps. Always define ORDER BY for deterministic results.

4) Percentiles, Retention & Cohorts (Advanced Windows)

BI analyses often need percentiles and retention curves.

-- Median and P90 order value by month (Postgres/Snowflake)
SELECT
  DATE_TRUNC('month', order_date) AS m,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) AS p50,
  PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY total_amount) AS p90
FROM orders
GROUP BY 1;

-- BigQuery percentile (approx)
SELECT
  m,
  APPROX_QUANTILES(total_amount, 100)[OFFSET(50)] AS p50,
  APPROX_QUANTILES(total_amount, 100)[OFFSET(90)] AS p90
FROM (
  SELECT DATE_TRUNC(order_date, MONTH) AS m, total_amount FROM orders
)
GROUP BY 1;

-- Simple retention calc: did the user return next month?
WITH firsts AS (
  SELECT customer_id, DATE_TRUNC('month', MIN(order_date)) AS first_month
  FROM orders GROUP BY 1
),
activity AS (
  SELECT customer_id, DATE_TRUNC('month', order_date) AS m
  FROM orders
  GROUP BY 1,2
)
SELECT
  f.first_month,
  a.m AS activity_month,
  COUNT(DISTINCT a.customer_id) AS active_users
FROM firsts f
JOIN activity a ON a.customer_id = f.customer_id
WHERE a.m BETWEEN f.first_month AND f.first_month + INTERVAL '6 months'
GROUP BY 1,2
ORDER BY 1,2;

5) Reusable CTE Patterns (Metrics, Dedup, SCD-ish)

CTEs make complex logic readable and testable.

-- Deduplicate by latest timestamp
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
  FROM user_profiles
)
SELECT * FROM ranked WHERE rn = 1;

-- Metrics layer: build once, reuse everywhere
WITH base_orders AS (
  SELECT order_id, customer_id, order_date::date AS d, total_amount FROM orders WHERE order_status='completed'
),
daily AS (
  SELECT d, COUNT(*) AS orders, SUM(total_amount) AS revenue
  FROM base_orders GROUP BY 1
)
SELECT * FROM daily;

Tip: Materialize heavy CTEs as materialized views (Postgres) or persistent derived tables (Snowflake) to speed dashboards.

6) Performance Tuning: Make Dashboards Feel Instant

Fast SQL = happy stakeholders. Focus on scans, filters, joins, and sorts.

PostgreSQL: create B-tree indexes on join keys and high-selectivity filters; analyze with EXPLAIN (ANALYZE, BUFFERS).
Snowflake: clustering keys on large tables improve pruning; use TASKS to precompute aggregates.
BigQuery: partition by date, cluster by common filter/join columns; watch scanned bytes, use APPROX functions.

-- PostgreSQL index example
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- BigQuery: partition + cluster
CREATE TABLE analytics.daily_revenue
PARTITION BY d
CLUSTER BY customer_id AS
SELECT DATE(order_date) AS d, customer_id, SUM(total_amount) AS revenue
FROM `proj.dataset.orders`
GROUP BY 1,2;

General wins: pre-aggregate to daily grain, avoid functions on indexed columns in WHERE, filter early in CTEs, and limit columns to reduce I/O.

7) Explain Plans & Query Smells

Learn to read EXPLAIN like a profiler. Look for full scans, nested loop explosions, and sorts on huge sets.

Full table scan where you expect index/partition pruning → add predicate on partition key or index join column.
High rows removed by filter → push filters earlier; pre-aggregate.
Repeated subqueries → CTE/materialize once.

8) Cheatsheet: Window Functions & When to Use Them

Use Case	Window Function	Example	Notes
Running total	SUM() OVER	SUM(revenue) OVER (PARTITION BY customer ORDER BY d)	Define frame to avoid future-looking sums
Rolling average	AVG() OVER	ROWS BETWEEN 6 PRECEDING AND CURRENT ROW	Needs contiguous dates at correct grain
Rank top-N	RANK / DENSE_RANK	RANK() OVER (PARTITION BY category ORDER BY revenue DESC)	DENSE_RANK has no gaps on ties
Prev/next value	LAG / LEAD	LAG(order_date) OVER (PARTITION BY user ORDER BY order_date)	Perfect for churn/retention gaps
Percentiles	PERCENTILE_CONT / APPROX_QUANTILES	Median, P90 by month	Use approximate on big data engines

9) Common Anti-Patterns (and the Fix)

SELECT * in dashboards → select named columns; reduce I/O and breakage.
Functions on WHERE columns (e.g., DATE(created_at)) → compute once in CTE or store derived column to enable pruning.
Joining fact → fact at raw grain → aggregate each fact to the join grain first.
ORDER BY without index/partition alignment → pre-sort or cache heavy sorts in a materialized table.

10) Productionizing BI SQL: Versioning, Tests, and Docs

Great analysts ship code like engineers.

Version control: keep SQL in Git; use branches + PR review.
Testing: assert row counts, null ratios, and primary/foreign key integrity (dbt tests or custom queries).
Docs: describe metrics, grains, and caveats in a shared catalog; add owners and SLA for refresh cadence.

Conclusion

Becoming a top-tier BI analyst in 2025 means mastering window functions, writing readable CTEs, and applying performance tuning so insights arrive instantly. Use the cheatsheet and snippets here to build faster dashboards, accurate cohorts, and scalable revenue reporting—no matter if you’re on PostgreSQL, Snowflake, or BigQuery.

Next steps: refactor one slow dashboard query using partitions and pre-aggregations, add rankings/percentiles with windows, and document your metrics. Your stakeholders—and your future self—will thank you.

Introduction

Use the snippets below to speed up ad-hoc analysis, stabilize production reports, and build a reusable toolkit for business intelligence and analytics engineering.

1) Query Foundations BI Analysts Use Daily

Clean, predictable queries beat clever one-liners. Structure queries with CTEs (common table expressions) and explicit columns:

WITH orders_clean AS (
  SELECT
    o.order_id,
    o.customer_id,
    o.order_date::date AS order_date,
    o.total_amount::numeric AS total_amount,
    COALESCE(o.channel, 'unknown') AS channel
  FROM raw_orders o
  WHERE o.order_status = 'completed'
)
SELECT
  c.customer_id,
  COUNT(*) AS orders,
  SUM(total_amount) AS revenue
FROM orders_clean c
GROUP BY 1
ORDER BY revenue DESC;

Best practices: cast early, COALESCE nulls, avoid SELECT *, label metrics clearly (revenue_30d, orders_7d).

2) Joins & Aggregations: Patterns That Avoid Pitfalls

Most BI bugs come from join cardinality. Use distinct keys and pre-aggregate before joining to facts.

-- Pre-aggregate sessions before joining to orders (avoids row blow-ups)
WITH session_stats AS (
  SELECT user_id, DATE(session_ts) AS d, COUNT(*) AS sessions
  FROM web_sessions
  GROUP BY 1,2
)
SELECT
  o.customer_id,
  DATE(o.order_date) AS d,
  SUM(o.total_amount) AS revenue,
  COALESCE(s.sessions, 0) AS sessions
FROM orders o
LEFT JOIN session_stats s
  ON s.user_id = o.customer_id AND s.d = DATE(o.order_date)
GROUP BY 1,2,4;

Tip: When using LEFT JOIN, keep the aggregated side on the right and ensure join keys are indexed/partitioned.

3) Window Functions: The BI Superpower

Window functions calculate metrics across partitions (e.g., per customer, per product) without collapsing rows. Core syntax:

function() OVER (
  PARTITION BY key_columns
  ORDER BY sort_columns
  ROWS BETWEEN frame_start AND frame_end
)

Common analytics patterns:

-- Running total by customer (cumulative revenue)
SELECT
  customer_id,
  order_date,
  total_amount,
  SUM(total_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cum_revenue
FROM orders;

-- Rolling 7-day average (needs day-level grain)
SELECT
  d,
  AVG(revenue) OVER (
    ORDER BY d
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rev_avg_7d
FROM daily_revenue;

-- Ranking products by category & month
SELECT
  category,
  month,
  product_id,
  revenue,
  RANK() OVER (
    PARTITION BY category, month
    ORDER BY revenue DESC
  ) AS rank_in_category
FROM monthly_product_revenue;

-- Cohort: time to second order with LAG
SELECT
  customer_id,
  order_date,
  LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order,
  EXTRACT(DAY FROM order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)) AS days_between
FROM orders;

Guideline: choose ROWS frames for fixed windows (7 days); use RANGE carefully with numeric/date gaps. Always define ORDER BY for deterministic results.

4) Percentiles, Retention & Cohorts (Advanced Windows)

BI analyses often need percentiles and retention curves.

-- Median and P90 order value by month (Postgres/Snowflake)
SELECT
  DATE_TRUNC('month', order_date) AS m,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) AS p50,
  PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY total_amount) AS p90
FROM orders
GROUP BY 1;

-- BigQuery percentile (approx)
SELECT
  m,
  APPROX_QUANTILES(total_amount, 100)[OFFSET(50)] AS p50,
  APPROX_QUANTILES(total_amount, 100)[OFFSET(90)] AS p90
FROM (
  SELECT DATE_TRUNC(order_date, MONTH) AS m, total_amount FROM orders
)
GROUP BY 1;

-- Simple retention calc: did the user return next month?
WITH firsts AS (
  SELECT customer_id, DATE_TRUNC('month', MIN(order_date)) AS first_month
  FROM orders GROUP BY 1
),
activity AS (
  SELECT customer_id, DATE_TRUNC('month', order_date) AS m
  FROM orders
  GROUP BY 1,2
)
SELECT
  f.first_month,
  a.m AS activity_month,
  COUNT(DISTINCT a.customer_id) AS active_users
FROM firsts f
JOIN activity a ON a.customer_id = f.customer_id
WHERE a.m BETWEEN f.first_month AND f.first_month + INTERVAL '6 months'
GROUP BY 1,2
ORDER BY 1,2;

5) Reusable CTE Patterns (Metrics, Dedup, SCD-ish)

CTEs make complex logic readable and testable.

-- Deduplicate by latest timestamp
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
  FROM user_profiles
)
SELECT * FROM ranked WHERE rn = 1;

-- Metrics layer: build once, reuse everywhere
WITH base_orders AS (
  SELECT order_id, customer_id, order_date::date AS d, total_amount FROM orders WHERE order_status='completed'
),
daily AS (
  SELECT d, COUNT(*) AS orders, SUM(total_amount) AS revenue
  FROM base_orders GROUP BY 1
)
SELECT * FROM daily;

Tip: Materialize heavy CTEs as materialized views (Postgres) or persistent derived tables (Snowflake) to speed dashboards.

6) Performance Tuning: Make Dashboards Feel Instant

Fast SQL = happy stakeholders. Focus on scans, filters, joins, and sorts.

PostgreSQL: create B-tree indexes on join keys and high-selectivity filters; analyze with EXPLAIN (ANALYZE, BUFFERS).
Snowflake: clustering keys on large tables improve pruning; use TASKS to precompute aggregates.
BigQuery: partition by date, cluster by common filter/join columns; watch scanned bytes, use APPROX functions.

-- PostgreSQL index example
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- BigQuery: partition + cluster
CREATE TABLE analytics.daily_revenue
PARTITION BY d
CLUSTER BY customer_id AS
SELECT DATE(order_date) AS d, customer_id, SUM(total_amount) AS revenue
FROM `proj.dataset.orders`
GROUP BY 1,2;

General wins: pre-aggregate to daily grain, avoid functions on indexed columns in WHERE, filter early in CTEs, and limit columns to reduce I/O.

7) Explain Plans & Query Smells

Learn to read EXPLAIN like a profiler. Look for full scans, nested loop explosions, and sorts on huge sets.

Full table scan where you expect index/partition pruning → add predicate on partition key or index join column.
High rows removed by filter → push filters earlier; pre-aggregate.
Repeated subqueries → CTE/materialize once.

8) Cheatsheet: Window Functions & When to Use Them

Use Case	Window Function	Example	Notes
Running total	SUM() OVER	SUM(revenue) OVER (PARTITION BY customer ORDER BY d)	Define frame to avoid future-looking sums
Rolling average	AVG() OVER	ROWS BETWEEN 6 PRECEDING AND CURRENT ROW	Needs contiguous dates at correct grain
Rank top-N	RANK / DENSE_RANK	RANK() OVER (PARTITION BY category ORDER BY revenue DESC)	DENSE_RANK has no gaps on ties
Prev/next value	LAG / LEAD	LAG(order_date) OVER (PARTITION BY user ORDER BY order_date)	Perfect for churn/retention gaps
Percentiles	PERCENTILE_CONT / APPROX_QUANTILES	Median, P90 by month	Use approximate on big data engines

9) Common Anti-Patterns (and the Fix)

SELECT * in dashboards → select named columns; reduce I/O and breakage.
Functions on WHERE columns (e.g., DATE(created_at)) → compute once in CTE or store derived column to enable pruning.
Joining fact → fact at raw grain → aggregate each fact to the join grain first.
ORDER BY without index/partition alignment → pre-sort or cache heavy sorts in a materialized table.

10) Productionizing BI SQL: Versioning, Tests, and Docs

Great analysts ship code like engineers.

Version control: keep SQL in Git; use branches + PR review.
Testing: assert row counts, null ratios, and primary/foreign key integrity (dbt tests or custom queries).
Docs: describe metrics, grains, and caveats in a shared catalog; add owners and SLA for refresh cadence.

SQL for BI Analysts: Queries, Window Functions, and Performance Tuning

Introduction

1) Query Foundations BI Analysts Use Daily

2) Joins & Aggregations: Patterns That Avoid Pitfalls

3) Window Functions: The BI Superpower

4) Percentiles, Retention & Cohorts (Advanced Windows)

5) Reusable CTE Patterns (Metrics, Dedup, SCD-ish)

6) Performance Tuning: Make Dashboards Feel Instant

7) Explain Plans & Query Smells

8) Cheatsheet: Window Functions & When to Use Them

9) Common Anti-Patterns (and the Fix)

10) Productionizing BI SQL: Versioning, Tests, and Docs

Conclusion

Tags

Share this article

SQL for BI Analysts: Queries, Window Functions, and Performance Tuning

Introduction

1) Query Foundations BI Analysts Use Daily

2) Joins & Aggregations: Patterns That Avoid Pitfalls

3) Window Functions: The BI Superpower

4) Percentiles, Retention & Cohorts (Advanced Windows)

5) Reusable CTE Patterns (Metrics, Dedup, SCD-ish)

6) Performance Tuning: Make Dashboards Feel Instant

7) Explain Plans & Query Smells

8) Cheatsheet: Window Functions & When to Use Them

9) Common Anti-Patterns (and the Fix)

10) Productionizing BI SQL: Versioning, Tests, and Docs

Conclusion

Tags

Share this article

Ready for Career Guidance?