Using Postgres & Rails to create monthly breakdowns.
The problem
I was faced with implementing a dashboard on a recent project. The client asked to see their data sliced up by category along with running monthly and yearly breakdowns.
The data to an approximation looked like this: There are Posts and Rejection. A Rejection references a Post and gives a reason for rejection. Something like:
posts
id
body
rejections
id
reason
post_id
Posts are written and moderators can reject them, giving a reason (“spam”, “boring”, “mansplaining”). We want to see how many posts are rejected monthly and yearly in the current year first overall and then grouped by reason.
How are we going to do this? We could make a big table joining a series of subqueries, and then have Rails read the table columns first, pivoting the output for us. That give me the no-feeling pretty quick as I hope it does for you also.
Reaching out
I sat on this idea for a while, and then I did what I always do when I have a Postgres question: ask Shey. Shey had a one word answer for me: crosstabs.
Making it happen
We’ll start with the overall rejections for this year, broken down by month.
Enabling tablefunc
First we need to enable the postgresql extension for crosstabs: tablefunc
# enable tablefunc extension for crosstab support
$ create extension if not exists tablefunc;
Here is my initial table:
create view overall as
select count(rejections.id),
date_part('month', rejections.created_at) as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date)
group by date_part('month', rejections.created_at);
create view report_overall as
select * from crosstab(
'select ''overall'', month, count from overall',
'select m from generate_series(1,12) m'
) as (
"overall" text,
"jan" int,
"feb" int,
"mar" int,
"apr" int,
"may" int,
"jun" int,
"jul" int,
"aug" int,
"sep" int,
"oct" int,
"nov" int,
"dec" int
);
select * from report_overall;
overall | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
overall | 2 | 1 | 2 | | 1 | 2 | 1 | 4 | | 4 | 2 | 1
That’s pretty close to what I need, thank you Shey.
Need more
That’s great, but we need a way to add the YTD info. Since it will be the same columns, we can combine it with the monthly data using a union. We just need to add a column at the end of our crosstab, and add a hacked-on 13th month.
create view overall as
select count(rejections.id),
date_part('month', rejections.created_at) as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date)
group by date_part('month', rejections.created_at)
union
select count(rejections.id),
13 as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date);
create view report_overall as
select * from crosstab(
'select ''overall'', month, count from overall',
'select m from generate_series(1,13) m'
) as (
"overall" text,
"jan" int,
"feb" int,
"mar" int,
"apr" int,
"may" int,
"jun" int,
"jul" int,
"aug" int,
"sep" int,
"oct" int,
"nov" int,
"dec" int,
"ytd" int
);
select * from report_overall;
overall | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec | ytd
---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
overall | 2 | 1 | 2 | | 1 | 2 | 1 | 4 | | 4 | 2 | 1 | 20
Thoroughly hacked.
Keep it coming
That’ll do for the overall data, but we’d like to see the same info broken out by rejection reason.
create view rejection_reasons as
select count(rejections.id),
reason_id,
date_part('month', rejections.created_at) as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date)
group by reason_id,
date_part('month', rejections.created_at)
union
select count(rejections.id),
reason_id,
13 as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date)
group by reason_id;
create view report_rejection_reasons as
select * from crosstab(
'select reason_id, month, count from rejection_reasons order by reason_id',
'select m from generate_series(1,13) m'
) as (
"reason" text,
"jan" int,
"feb" int,
"mar" int,
"apr" int,
"may" int,
"jun" int,
"jul" int,
"aug" int,
"sep" int,
"oct" int,
"nov" int,
"dec" int,
"ytd" int
);
select * from report_rejection_reasons;
reason | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec | ytd
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
0 | 1 | | 2 | | | | | 1 | | 2 | 2 | | 8
1 | 1 | | | | 1 | 1 | | 1 | | 1 | | | 5
2 | | 1 | | | | 1 | 1 | 2 | | 1 | | 1 | 7
It works… but it’s ugly
Okay that’s pretty good but that is some ugly SQL. Lets clean up with some Common Table Expressions:
create view rejection_reasons as
with
monthly as (
select count(rejections.id),
reason,
date_part('month', rejections.created_at) as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date)
group by reason,
date_part('month', rejections.created_at)
),
yearly as (
select count(rejections.id),
reason,
13 as month
from rejections
where date_part('year', rejections.created_at) = date_part('year', current_date)
group by reason;
)
select * from monthly
union
select * from yearly;
It’s not a big change, but it helps to give meaning and structure to the statement.
Lets use it.
Now that we’ve got our report, lets see how we can use it nicely in rails.
Create a Report model that is backed by a view:
class RejectionReasonReport < ActiveRecord::Base
self.table_name = 'report_rejection_reasons'
def each_column
%i(jan feb mar apr may jun jul aug sep oct nov dec ytd).map do |col|
yield public_send(col)
end
end
def title
%w(Spam Boring Mansplaining)[reason_id.to_i]
end
end
Create a controller that loads the data
class ReportsController < ApplicationController
def rejection_reasons
@report = RejectionReasonReport.all
end
end
Create a view that displays it for us:
<table>
<thead>
<th></th>
<th>Jan</th>
<th>Feb</th>
<th>Mar</th>
<th>Apr</th>
<th>May</th>
<th>Jun</th>
<th>Jul</th>
<th>Aug</th>
<th>Sep</th>
<th>Oct</th>
<th>Nov</th>
<th>Dec</th>
<th>YTD</th>
</thead>
<tbody>
<% @report.each do |row| %>
<tr>
<td><%= row.title %></td>
<% row.each_column do |count| %>
<td><%= count %></td>
<% end %>
</tr>
<% end %>
</tbody>
</table>