How To Force a Complete Dataset in SQL (Show All Months)
I was running a query that could potentially result in having “missing” months. For example, the output could be January, February, March, May, June: where April was missing.
The reason was obvious: I had no data for that month. But I needed to show all consecutive months, even if there was no data. The SQL below shows how to fill in the missing months.
A coworker referred to this as having “sparse data”. I’ll include that phrase here for SEO purposes. 😀
Solution: The Concept
- I created a second dataset that had every possible month
- I joined this extra dataset (serving as a placeholder) via UNION ALL, guaranteeing that every month will have a value- even if 0
- I queried the combined dataset (original + placeholder)
Preparation: Create dataset with all possible months
I modeled this after a previous article I wrote for writing a loop in SQL Server.
-- Define start/end dates for the query
DECLARE @startDate AS DATE = '1/1/2023';
DECLARE @endDate AS DATE = '12/31/2023';
-- We'll use this to iterate in the loop
-- initialize it to the start date
DECLARE @firstOfMonth AS DATE;
SET @firstOfMonth = @startDate
-- Create a new table that will hold all possible values (months)
DROP TABLE IF EXISTS #temp
CREATE TABLE #temp (
id INT IDENTITY (1,1)
, the_month DATE
)
-- Populate the table with all possible values
WHILE @firstOfMonth < @endDate
BEGIN
-- Force month to be first of the month
SET @firstOfMonth = ( SELECT DATEADD(month, DATEDIFF(month, 0, @firstOfMonth), 0) )
-- Insert into the temporary table
INSERT INTO #temp (the_month) VALUES (@firstOfMonth)
-- Go to the next month
-- loop will continue until it reaches the end date
SET @firstOfMonth = (SELECT DATEADD(month, 1, @firstOfMonth))
END
-- View all possible months
SELECT * FROM #temp
Explanation:
- Declare start/end dates that I know the final output should have
- Create a temporary table
- Use a WHILE loop to add in all possible dates
- Dates are converted to the first of the month for consistency
Sample Data
For demonstration purposes, this query will force April data to be missing:
SELECT the_month, SUM(total_ftes) AS ftes
FROM myTable
WHERE
the_month BETWEEN '1/1/2023' AND '12/31/2023'
AND the_month <> '4/1/2023'
GROUP BY the_month
ORDER BY the_month
Combine the “Real” data with the placeholder data
Using UNION ALL we can combine the two datasets:
SELECT the_month, SUM(total_ftes) AS ftes
FROM myTable
WHERE
the_month BETWEEN '1/1/2023' AND '12/31/2023'
AND the_month <> '4/1/2023'
GROUP BY the_month
UNION ALL
SELECT the_month, 0 AS ftes
FROM #temp
It’ll look like this:
Wrap everything in a single SQL statement
And SUM the values:
SELECT
the_month, SUM(ftes) AS ftes
FROM (
SELECT the_month, SUM(total_ftes) AS ftes
FROM myTable
WHERE
the_month BETWEEN '1/1/2023' AND '12/31/2023'
AND the_month <> '4/1/2023'
GROUP BY the_month
UNION ALL
SELECT the_month, 0 AS ftes
FROM #temp
) AS qryMain
GROUP BY
the_month
ORDER BY
the_month
Solution: Entire SQL
Here’s everything all in one spot: