ORMs are awesomely dangerous to your DB health. One that has caught my attention lately is Ruby on Rail’s ActiveRecord. Let’s explore how ActiveRecord represents models and communicates with databases through a simple example. Here is a table representing people.
Person
-------------
person_id (primary_key)
family_id (foreign_key references family.family_id)
school_id (foreign_key references school.school_id)
name
age
Representing this table as a ruby object with ActiveRecord is as simple as:
class Person < ActiveRecord::Base
belongs_to :family
belongs_to :school
end
Notice the belongs_to
attribute. That is ActiveRecord magic for describing the foreign key relationship of family
to person
.
Now if a developer wants to gather information about people; commonly, this would happen:
people = Person.all
for person in people do
family_name = person.family.name
school_state = person.school.state
end
This is where ActiveRecord works its Ghost Queries. These are Ghost Queries because the ActiveRecord paradigm guides developers into calling these helper methods without the caution of performance. When the developer uses person.family
they are actually calling a method that issues a SQL select statement instead of just pulling an attribute from memory of the prefetched record. Unlike other languages, when calling a method in ruby, developers are encouraged to leave off the parens. Transforming what would be person.family()
in many languages to person.family
. This distinction is important because without the parens a developer may believe that family
is an attribute on the person
and not a foreign relationship that requires a lookup call.
Who Gives?
So… why even bother to write about this? In most every Rails app, code patterns like the above can be found; however, many instances go unnoticed because of the small amount of data flowing through these code paths.
But imagine the performance costs at scale:
If the line from above, Person.all
gathered 1000 records then the code would make 2000 queries (1000 from person.family.name
and 1000 from person.school.state
). Referencing the example from above:
people = Person.all # gets 1000 records
for person in people do
family_name = person.family.name # will run 1000 times
school_state = person.school.state # will run 1000 times
end
person.family.name
runs this select:
select * from family where family_id = <person.family_id>;
Granted some of the queries would be cached (if two people came from the same family or school). This is still 2001 queries, even if they aren’t all unique. This is where your efficiency radar should be deafening.
Think if we used the ActiveRecord paradigm a bit smarter. We could use the belongs_to
relationship where it makes sense — like when looking up one or two records. Say we have a thousand records though, let’s not exercise our database for no reason. Let’s be lazy and make the fewest amount of queries:
people = Person.all
# Gather all family ids
family_ids = people.map { |person| person.family_id }
# Make one query for all the families
Family.find(family_ids)
To put a few numbers into the mix on why this matters at scale here is some Postgres analysis.
SELECT family.* FROM family WHERE family.family_id = 149;
Planning time: 0.112 ms
Execution time: 0.088 ms
Running that query with 1000 different family_id
‘s we would have a run time of 1000 * .120ms = 120ms
Even though in
queries aren’t the fastest this is still 30 times faster:
SELECT family.* FROM family WHERE family_id in (1, 2, 3, ...# all the way to 1000);
Planning time: 2.661 ms
Execution time: 1.464 ms
Total run time of this is about 4.1ms
versus the 120ms
alternative. Multiply the scale and the performance difference yielded by the two approaches is glaring.
Don’t be a victim, use your tools properly
ActiveRecord is a powerful tool and as such is packed with ways to shoot yourself in the foot. Make sure you know how ActiveRecord is querying your database. Chances are if you have a Rails app with some data you are probably a victim of Ghost Queries.