App Query: when SQL is all you need

I recently found myself working on a fairly extensive data export for a Rails application. My first stab was to take a builder-template, a collection of records and a sprinkle of helper-functions to generate the required XML.

This approach quickly hit its limits: it was hard to get rid of N+1 queries, the performance was abysmal due to both inefficient sql and the sheer volume of AR-instances that were instantiated. And Arel, while a great abstraction otherwise, made it hard to identify the bottlenecks. Meanwhile I had a nagging feeling that "this shouldn't be too hard in raw SQL"...

In the end, I chose to disregard the sunk cost and refactored to raw SQL. The result was a blazingly fast export, eliminating the intermittent timeouts that some customers experienced. Additionally, the process gave me a deeper understanding of the data model and a new appreciation for PostgreSQL functions — many of which made existing Ruby helper methods unnecessary.

In the process I created App Query, a gem designed to make integrating raw SQL with Active Record simpler and more enjoyable.

Let's take a closer look.

Tinkering

Let's take a query that returns accounts for an organization.
When tinkering on the console, you'd instantiate it like so:

q = AppQuery(<<~SQL)
  SELECT *
  FROM accounts
  WHERE
    accounts.organization_id = :org
SQL

Then to retrieve results (i.e., a collection of maps):

q.select_all(binds: {org: 2}).entries
#=> [{"id" => 1234,
#     "organization_id" => 2,
#     ...},
#     ...]

Just like when using ActiveRecord::Base.connection, there's also select_one and select_value. Though unlike ActiveRecord, all these methods will cast the results by default, e.g. this works:

AppQuery(%{select array[1,2]}).select_value
#=> [1,2]

While AR forces you to go the select_all-route:

ActiveRecord::Base.connection.select_all("select array[1,2]").cast_values.first

With AppQuery, additional castings can be provided as well:

cast = {"date" => ActiveRecord::Type::Date.new}
AppQuery(%{select '2024-1-1' as date}).select_value(cast:)
=> Mon, 01 Jan 2024

Especially ideal for SQLite that doesn't have a notion of e.g. dates and timestamps.

After the exploratory phase, this query can live in a separate file, e.g. app/queries/find_accounts.sql and then be instantiated like so: AppQuery[:find_accounts].

Templating

When it comes to parameterization, binds will only get you sofar. ERB templating gives a lot more power.

Say, for example, we want the above query to be suitable for admins: they can select accounts across organizations. A naief approach using binds would be to interpret {org: nil} as "accounts from any organization":

SELECT *
FROM accounts
WHERE
  :org IS NULL
  OR accounts.organization_id = :org

But treating "I forgot to pass a value" the same as "grant me ultimate power" is... let's say, optimistic security design. Templating makes it secure and explicit:

SELECT *
FROM accounts
<% if @admin -%>
WHERE :org IS NULL
  OR accounts.organization_id IN (:org)
<% else -%>
WHERE :org IS NOT NULL
  AND accounts.organization_id = :org
<% end -%>

Query instances are immutable so 'altering' a query, gets you a fresh copy. This way we can safely branch off an admin variant from the same base:

q = AppQuery[:find_accounts]
admin_q = q.render(admin: true)

# admins: accounts for all orgs
admin_q.select_all(binds: {org: nil})
# ...or multiple specific orgs
admin_q.select_all(binds: {org: [1,2]})

# non-admins: one specific org
q.select_all(binds: {org: 2})
# anything else won't work:
q.select_all # => []
q.select_all(binds: {org: [1,2]}) # => error

App Query ❤️ CTEs

While writing queries in pure SQL offers a lot of power, the result can easily become dense and thus difficult to understand and maintain. Splitting a query up into CTEs (aka WITH-clauses) tremendously increases the maintainability of a query.

Say for example we are only interested in accounts from organizations of a specific size and accounts that were recently active:

WITH big_orgs AS (
  -- some logic determining big organizations
  ...
),
active_accounts AS (
  -- some logic defining activeness
)

SELECT *
FROM active_accounts accounts
WHERE
  accounts.organization_id = ANY(select id from big_orgs)

Instead of having the concepts of 'active account' and 'big org' buried in the query, these are now explicit, separate building blocks.

It's not necessarily as performant as one big query, but it's more readable.

But verifying the result of the query is still an end-to-end test: is an account not included in the end result because it's not from one of the big orgs or is it filtered out based on the fact it wasn't used recently? Not too difficult if you have only two of these concepts, but chain a couple more and debugging can become seriously time consuming.

Hence, it made me wonder: would it be possible to query a CTE in isolation? — allowing the result to be inspected from the console and have CTE-specific tests?

Turns out that parsing WITH clauses isn't all that hard, and so App Query allows you to do exactly that:

q = AppQuery[:find_accounts]

# query an existing CTE
q.select_all("select id,name from big_orgs")

# Query the end-result by referring to :_
big_spenders = q.select_all("select * from :_ where big_spender")

# this can be repeated further down the chain:
big_spenders.select_value("select count(*) from :_")

The latter is so common that it's available as #count. Similarly, #first, #column, #any? and #none? use SQL-rewriting to fetch only what's needed — unlike their ActiveRecord::Result equivalents which fetch all rows first.

You can rewrite the query in other ways, e.g. by redefining or adding CTEs.

For example, for testing purposes, we can mock some dummy data for accounts by adding it as a CTE:

q2 = q.prepend_cte(<<~ACCOUNTS)
  accounts(id, organization_id, last_sign_in_at) AS(
    VALUES(1, 1, now() - '2 days'::interval),
          (2, 1, now() - '15 days'::interval),
          (3, 2, now() - '5 days'::interval)
  )
ACCOUNTS

Given the mocked accounts we can now explore what the logic for the active_accounts CTE could look like:

q2.select_all(<<~SQL, binds: {sign_in_since: 10.days.ago}).entries
  select *
  from accounts
  where last_sign_in_at >= :sign_in_since
SQL

When going past the exploratory phase on the console, you would be using real tables and properly generated data. But the above code could easily be re-used in the tests.

Conclusion

App Query gives you the full expressiveness of SQL in your Rails application and might be the right fit when you find yourself fighting SQL abstractions.
By enabling CTE inspection and ERB templating, it transforms what could be unwieldy SQL into a maintainable, testable query.

App Query is available on GitHub and RubyGems. Give it a try on your data-heavy Rails application!

Don't be eval

App Query: when SQL is all you need

Tinkering

Templating

App Query ❤️ CTEs

Conclusion