Try Now
Get hands on with Cobalt's PtaaS Platform

ActiveRecord’s Ghost Queries — a sneak attack on your DB performance

ORMs are awesomely dangerous to your DB health. One that has caught my attention lately is Ruby on Rail's ActiveRecord. Let's explore how…

ORMs are awesomely dangerous to your DB health. One that has caught my attention lately is Ruby on Rail’s ActiveRecord. Let’s explore how ActiveRecord represents models and communicates with databases through a simple example. Here is a table representing people.

Person
-------------
person_id (primary_key)
family_id (foreign_key references family.family_id)
school_id (foreign_key references school.school_id)
name
age

Representing this table as a ruby object with ActiveRecord is as simple as:

class Person < ActiveRecord::Base  
  belongs_to :family
  belongs_to :school
end

Notice the belongs_to attribute. That is ActiveRecord magic for describing the foreign key relationship of family to person.

Now if a developer wants to gather information about people; commonly, this would happen:

people = Person.all
for person in people do
  family_name = person.family.name
  school_state = person.school.state
end

This is where ActiveRecord works its Ghost Queries. These are Ghost Queries because the ActiveRecord paradigm guides developers into calling these helper methods without the caution of performance. When the developer uses person.family they are actually calling a method that issues a SQL select statement instead of just pulling an attribute from memory of the prefetched record. Unlike other languages, when calling a method in ruby, developers are encouraged to leave off the parens. Transforming what would be person.family() in many languages to person.family. This distinction is important because without the parens a developer may believe that family is an attribute on the person and not a foreign relationship that requires a lookup call.

Who Gives?

So… why even bother to write about this? In most every Rails app, code patterns like the above can be found; however, many instances go unnoticed because of the small amount of data flowing through these code paths.

But imagine the performance costs at scale:

If the line from above, Person.all gathered 1000 records then the code would make 2000 queries (1000 from person.family.name and 1000 from person.school.state). Referencing the example from above:

people = Person.all # gets 1000 records
for person in people do
  family_name = person.family.name # will run 1000 times
  school_state = person.school.state # will run 1000 times
end

person.family.name runs this select:

select * from family where family_id = <person.family_id>;

Granted some of the queries would be cached (if two people came from the same family or school). This is still 2001 queries, even if they aren’t all unique. This is where your efficiency radar should be deafening.

Think if we used the ActiveRecord paradigm a bit smarter. We could use the belongs_to relationship where it makes sense — like when looking up one or two records. Say we have a thousand records though, let’s not exercise our database for no reason. Let’s be lazy and make the fewest amount of queries:

people = Person.all

# Gather all family ids
family_ids = people.map { |person| person.family_id }

# Make one query for all the families
Family.find(family_ids)

To put a few numbers into the mix on why this matters at scale here is some Postgres analysis.

SELECT family.* FROM family WHERE family.family_id = 149;

Planning time: 0.112 ms
Execution time: 0.088 ms

Running that query with 1000 different family_id‘s we would have a run time of 1000 * .120ms = 120ms

Even though in queries aren’t the fastest this is still 30 times faster:

SELECT  family.* FROM family WHERE family_id in (1, 2, 3, ...# all the way to 1000);

Planning time: 2.661 ms
Execution time: 1.464 ms

Total run time of this is about 4.1ms versus the 120ms alternative. Multiply the scale and the performance difference yielded by the two approaches is glaring.

Don’t be a victim, use your tools properly

ActiveRecord is a powerful tool and as such is packed with ways to shoot yourself in the foot. Make sure you know how ActiveRecord is querying your database. Chances are if you have a Rails app with some data you are probably a victim of Ghost Queries.

Back to Blog
About Cameron Clifford
Cameron Clifford works as a senior engineering manager for Cresta where he helps the team deploy secure code. With over a decade's experience in the cybersecurity sector, Cameron offers a plethora of expertise for cybersecurity best practices. More By Cameron Clifford
Cobalt’s Code-Assisted Pentests
Learn more about Cobalt's code-assisted pentests.
Blog
Apr 1, 2022