Postgres-fu: Electric Boogaloo

Posted on

Using Postgres & Rails to create monthly breakdowns.

The problem

I was faced with implementing a dashboard on a recent project. The client asked to see their data sliced up by category along with running monthly and yearly breakdowns.

The data to an approximation looked like this: There are Posts and Rejection. A Rejection references a Post and gives a reason for rejection. Something like:

posts
  id
  body

rejections
  id
  reason
  post_id

Posts are written and moderators can reject them, giving a reason (“spam”, “boring”, “mansplaining”). We want to see how many posts are rejected monthly and yearly in the current year first overall and then grouped by reason.

How are we going to do this? We could make a big table joining a series of subqueries, and then have Rails read the table columns first, pivoting the output for us. That give me the no-feeling pretty quick as I hope it does for you also.

Reaching out

I sat on this idea for a while, and then I did what I always do when I have a Postgres question: ask Shey. Shey had a one word answer for me: crosstabs.

Making it happen

We’ll start with the overall rejections for this year, broken down by month.

Enabling tablefunc

First we need to enable the postgresql extension for crosstabs: tablefunc

# enable tablefunc extension for crosstab support
$ create extension if not exists tablefunc;

Here is my initial table:

create view overall as
  select      count(rejections.id),
              date_part('month', rejections.created_at) as month
  from        rejections
  where       date_part('year', rejections.created_at) = date_part('year', current_date)
  group by    date_part('month', rejections.created_at);

create view report_overall as
  select * from crosstab(
    'select ''overall'', month, count from overall',
    'select m from generate_series(1,12) m'
  ) as (
    "overall" text,
    "jan" int,
    "feb" int,
    "mar" int,
    "apr" int,
    "may" int,
    "jun" int,
    "jul" int,
    "aug" int,
    "sep" int,
    "oct" int,
    "nov" int,
    "dec" int
  );

select * from report_overall;
 overall | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
 overall |   2 |   1 |   2 |     |   1 |   2 |   1 |   4 |     |   4 |   2 |   1

That’s pretty close to what I need, thank you Shey.

Need more

That’s great, but we need a way to add the YTD info. Since it will be the same columns, we can combine it with the monthly data using a union. We just need to add a column at the end of our crosstab, and add a hacked-on 13th month.

create view overall as
  select      count(rejections.id),
              date_part('month', rejections.created_at) as month
  from        rejections
  where       date_part('year', rejections.created_at) = date_part('year', current_date)
  group by    date_part('month', rejections.created_at)

  union

  select      count(rejections.id),
              13 as month
  from        rejections
  where       date_part('year', rejections.created_at) = date_part('year', current_date);

create view report_overall as
  select * from crosstab(
    'select ''overall'', month, count from overall',
    'select m from generate_series(1,13) m'
  ) as (
    "overall" text,
    "jan" int,
    "feb" int,
    "mar" int,
    "apr" int,
    "may" int,
    "jun" int,
    "jul" int,
    "aug" int,
    "sep" int,
    "oct" int,
    "nov" int,
    "dec" int,
    "ytd" int
  );
select * from report_overall;
 overall | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec | ytd
---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
 overall |   2 |   1 |   2 |     |   1 |   2 |   1 |   4 |     |   4 |   2 |   1 |  20

Thoroughly hacked.

Keep it coming

That’ll do for the overall data, but we’d like to see the same info broken out by rejection reason.

create view rejection_reasons as
  select      count(rejections.id),
              reason_id,
              date_part('month', rejections.created_at) as month
  from        rejections
  where       date_part('year', rejections.created_at) = date_part('year', current_date)
  group by    reason_id,
              date_part('month', rejections.created_at)

  union

  select      count(rejections.id),
              reason_id,
              13 as month
  from        rejections
  where       date_part('year', rejections.created_at) = date_part('year', current_date)
  group by    reason_id;

create view report_rejection_reasons as
  select * from crosstab(
    'select reason_id, month, count from rejection_reasons order by reason_id',
    'select m from generate_series(1,13) m'
  ) as (
    "reason" text,
    "jan" int,
    "feb" int,
    "mar" int,
    "apr" int,
    "may" int,
    "jun" int,
    "jul" int,
    "aug" int,
    "sep" int,
    "oct" int,
    "nov" int,
    "dec" int,
    "ytd" int
  );
select * from report_rejection_reasons;
 reason | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec | ytd
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
 0      |   1 |     |   2 |     |     |     |     |   1 |     |   2 |   2 |     |   8
 1      |   1 |     |     |     |   1 |   1 |     |   1 |     |   1 |     |     |   5
 2      |     |   1 |     |     |     |   1 |   1 |   2 |     |   1 |     |   1 |   7

It works… but it’s ugly

Okay that’s pretty good but that is some ugly SQL. Lets clean up with some Common Table Expressions:

create view rejection_reasons as
  with

  monthly as (
    select      count(rejections.id),
                reason,
                date_part('month', rejections.created_at) as month
    from        rejections
    where       date_part('year', rejections.created_at) = date_part('year', current_date)
    group by    reason,
                date_part('month', rejections.created_at)
  ),

  yearly as (
    select      count(rejections.id),
                reason,
                13 as month
    from        rejections
    where       date_part('year', rejections.created_at) = date_part('year', current_date)
    group by    reason;
  )

  select * from monthly
  union
  select * from yearly;

It’s not a big change, but it helps to give meaning and structure to the statement.

Lets use it.

Now that we’ve got our report, lets see how we can use it nicely in rails.

Create a Report model that is backed by a view:

class RejectionReasonReport < ActiveRecord::Base
  self.table_name = 'report_rejection_reasons'

  def each_column
    %i(jan feb mar apr may jun jul aug sep oct nov dec ytd).map do |col|
      yield public_send(col)
    end
  end

  def title
    %w(Spam Boring Mansplaining)[reason_id.to_i]
  end
end

Create a controller that loads the data

class ReportsController < ApplicationController

  def rejection_reasons
    @report = RejectionReasonReport.all
  end

end

Create a view that displays it for us:

<table>
  <thead>
    <th></th>
    <th>Jan</th>
    <th>Feb</th>
    <th>Mar</th>
    <th>Apr</th>
    <th>May</th>
    <th>Jun</th>
    <th>Jul</th>
    <th>Aug</th>
    <th>Sep</th>
    <th>Oct</th>
    <th>Nov</th>
    <th>Dec</th>
    <th>YTD</th>
  </thead>
  <tbody>
    <% @report.each do |row| %>
      <tr>
        <td><%= row.title %></td>
        <% row.each_column do |count| %>
          <td><%= count %></td>
        <% end %>
      </tr>
    <% end %>
  </tbody>
</table>

rejections by reason