test_data
does what it says on the tin: it provides a fast & reliable system
for managing your Rails application's test data.
The gem serves as both an alternative to fixtures & factory_bot, as well a broader workflow for building test suites that will scale gracefully as your application grows in size and complexity.
What it does:
-
Establishes a fourth Rails environment (you can define custom Rails environments!) named
test_data
, which you'll use to create a universe of data for your tests by simply running and using your application. No Ruby DSL, no YAML files, no precarious approximations of realism: real data created by your app -
Exposes a simple API for ensuring that your data will be pristine for each of your tests, whether the test depends on test_data, factories, fixtures, or an empty database
-
Safeguards your tests from flaky failures and supercharges your build by providing a sophisticated transaction manager that isolates each test while ensuring your data is only loaded once
If you've despaired over the seeming inevitability that all Rails test suites will eventually grow to become slow, flaky, and incomprehensible, then this gem is for you! And even if you're a factory_bot fan, we hope you'll be open to the idea that there might be a better way.
[Full disclosure: because the gem is still brand new, it makes a number of assumptions—chief among them being that Postgres & Rails 6+ are required—so it may not work for every project just yet.]
This gem requires a lot of documentation—not because test_data
does a lot of
things, but because managing one's test data is an inherently complex task. If
there's one reason Rails apps often suffer from slow tests, it's that the most
popular approaches to test data management oversimplify the problem—they might
save time up front, but tend to cost you later. The test_data
gem, meanwhile,
is designed to tackle the problem head on: it takes longer to set up, but it'll
scale along with your application for years to come.
- Getting Started Guide
- Factory & Fixture Interoperability Guide
- Rake Task Reference
- API Reference
- Assumptions
- Fears, Uncertainties, and Doubts (Q & A)
- But we're already happy with factory_bot!
- How will we handle merge conflicts in the schema dumps?
- Why can't I manage different SQL dumps for different scenarios?
- These SQL dumps are way too large to commit to git!
- Tests shouldn't rely on shared test data if they don't need to
- My tests aren't as fast as they should be
- Code of Conduct
- Changelog
- MIT License
This guide will walk you through setting up test_data
in your application. You
might notice that it's more complicated than installing a gem and declaring some
default Widget
attributes! The hard truth is that designing robust and
reliable test data is an inherently complex problem and takes some thoughtful
planning. There are plenty of shortcuts available, but experience has shown they
tend to collapse under their own weight as your app scales and your team
grows—exactly when having a suite of fast & reliable tests is most valuable.
And if you get stuck or need help as you're getting started, please feel free to ask us for help!
Before even installing anything, it's important to understand that because
test_data
defines a new Rails environment and because Rails expects
a gem group (like :development
, :test
, and :production
) for each
environment, any gems we want to be available to the test_data
gem need to be
installed with a :test_data
group.
Since the test_data
environment is designed to be used similarly to
development
(i.e. with a running server and interacting via a browser), any
gems in your :development
gem group should likely be included in a
:test_data
gem group as well.
For example, this:
group :development, :test do
gem "standard"
gem "cypress-rails"
gem "test_data"
end
Should have its first line changed to:
group :development, :test, :test_data do
Finally, add test_data
to your Gemfile and bundle install
it:
gem "test_data"
The gem ships with a number of Rake tasks, including
test_data:install, which will generate the necessary
configuration and initialize a test_data
database:
$ bin/rake test_data:install
This should output something like:
create config/environments/test_data.rb
create config/initializers/test_data.rb
insert config/database.yml
insert config/webpacker.yml
insert config/webpacker.yml
Created database 'yourappname_test_data'
set_config
------------
(1 row)
Your test_data environment and database are ready for use! You can now run
your server (or any command) to create some test data like so:
$ RAILS_ENV=test_data bin/rails server
The purpose of the test_data
database is to provide a sandbox in which you
will manually generate test data by playing around with your app. Rather than
try to imitate realistic data using factories and fixtures (a task which only
grows more difficult as your models and their associations increase in
complexity), your test data will always be realistic because your real
application will have created it!
Now comes the fun part! It's time to start up your server in the new environment and create some records by interacting with your system.
To run your server against the new test_data
database, set the RAILS_ENV
environment variable:
$ RAILS_ENV=test_data bin/rails server
[If you're using webpacker, you may also
need to start its development server as well with RAILS_ENV=test_data bin/webpack-dev-server
]
Because test_data
creates a full-fledged Rails environment, you can run any
number of Rails commands or Rake tasks against its database by setting
RAILS_ENV=test_data
, either in your shell environment or with each command
(e.g. RAILS_ENV=test_data bin/rake db:migrate
)
[Aside: If you experience any hiccups in getting your server to work, please
open an issue and let us
know—it may present an opportunity for us to improve the test_data:configure
task!]
Once the app is running, it's time to generate some test data. You'll know how to accomplish this step better than anyone—it's your app, after all!
A few bits of advice click & type some test data into existence:
- Spend a little time thoughtfully navigating each feature of your app in order
to generate enough data to be representative of what would be needed to test
them (e.g. one
User
per role, one of each kind ofOrder
, etc.) - Less is more: the less test data you create, the more meaningful & memorable
it will be to yourself and your teammates when writing tests. Don't keep
adding test data unless it will allow you to exercise additional application
code (e.g. enough
Project
models to require pagination, but not hundreds of them for the sake of looking "production-like") - Memorable names can become memes for the team to quickly recall and reference later (if the admin user is named "Angela" and the manager is "Maria", that'll probably serve you better than generic names like "TestUser #1")
If you make a mistake when creating your initial set of test data, it's perfectly okay to reset the database and start over! Your future tests will be coupled to this data as your application grows and evolves, so it's worth taking the time to ensure the foundation is solid. (But that's not to say everything needs to be perfect; you can always change things or add more data later—you'll just have to update your tests accordingly.)
Once you've created a good sampling of test data by interacting with your app,
the next step is to flush it from the test_data
database to SQL files. These
database dumps are meant to be committed to source control and versioned
alongside your tests over the life of the application. Additionally, they are
designed to be incrementally
migrated over time, just like you
migrate production database with every release.
Once you have your test data how you want it, dump the schema and data to SQL
files with the test_data:dump
Rake task:
$ bin/rake test_data:dump
This will dump three files into test/support/test_data
:
-
schema.sql
- Schema DDL used to (re-)initialize thetest_data
environment database for anyone looking to update your test data -
data.sql
- The test data itself, exported as a bunch of SQLINSERT
statements, which will be executed by your tests to load your test data -
non_test_data.sql
- Data needed to run thetest_data
environment, but which shouldn't be inserted by your tests (thear_internal_metadata
andschema_migrations
tables, by default; seeconfig.non_test_data_tables
)
You probably won't need to, but these paths can be overridden with TestData.config method. Additional details can also be found in the test_data:dump Rake task reference.
Once you've made your initial set of dumps, briefly inspect them and—if everything looks good—commit them. (And if the files are gigantic or full of noise, you might find these ideas helpful).
Does it feel weird to dump and commit SQL files? That's okay! It's healthy to
be skeptical
whenever you're asked to commit a generated file! Remember that the test_data
environment exists only for creating your test data. Your tests will, in turn,
load the SQL dump of your data into the test
database, and things will proceed
just as if you'd been loading Rails' built-in
fixtures
from a set of YAML files.
Now that you've dumped the contents of your test_data
database, you can start
writing tests that rely on this test data.
To accomplish this, you'll likely want to add hooks to run before each test to put the database into whatever state the test needs.
For the simplest case—ensuring your test data is loaded into the test
database
and available to your test, you'll want to call the
TestData.uses_test_data method at the beginning of
the test. The first time uses_test_data
is called, test_data
will start a
transaction and insert your test data. On subsequent calls to uses_test_data
by later tests, the transaction will be rolled back to a save point taken just
after the data was initially loaded, so that each test gets a clean starting
point without repeatedly executing the expensive SQL operation.
If, for the sake of consistency & simplicity you want every single Rails-aware test to have access to your test data, you can accomplish this with a single global before-each hook.
If you're using Rails' default
Minitest, you can load it in a setup
hook in ActiveSupport::TestCase
:
class ActiveSupport::TestCase
setup do
TestData.uses_test_data
end
end
Likewise, if you use RSpec, you can accomplish the same
thing with global before(:each)
hook in your rails_helper.rb
file:
RSpec.configure do |config|
config.before(:each) do
TestData.uses_test_data
end
end
Of course, for simple units of code, it may be more prudent to manually create
the test data they need inline as opposed to relying on a shared source of test
data. For these tests, you can call
TestData.uses_clean_slate in a setup
hook.
For the best performance, you might consider a mode-switching method that's invoked at the top of each test listing like this:
class ActiveSupport::TestCase
def self.uses(mode)
case mode
when :clean_slate
setup { TestData.uses_clean_slate }
when :test_data
setup { TestData.uses_test_data }
else
raise "Invalid test data mode: #{mode}"
end
end
end
# A simple model that will `create` its own data
class WidgetTest < ActiveSupport::TestCase
uses :clean_slate
# …
end
# An integrated test that depends on a lot of data
class KitchenSinkTest < ActionDispatch::IntegrationTest
uses :test_data
# …
end
Or, with RSpec:
module TestDataModes
def uses(mode)
case mode
when :clean_slate
before(:each) { TestData.uses_clean_slate }
when :test_data
before(:each) { TestData.uses_test_data }
else
raise "Invalid test data mode: #{mode}"
end
end
end
RSpec.configure do |config|
config.extend(TestDataModes)
end
RSpec.describe Widget, type: :model do
uses :clean_slate
# …
end
RSpec.describe "Kitchen sink", type: :request do
uses :test_data
# …
end
But wait, there's more! If your test suite switches between multiple modes from
test-to-test, it's important to be aware of the marginal cost between each of
those tests. For example, two tests in a row that call TestData.uses_test_data
only need a simple rollback as test setup, but a TestData.uses_test_data
followed by a TestData.uses_clean_slate
requires a rollback, a truncation, and
another savepoint. These small costs add up, so consider speeding up your
build by grouping your
tests into sub-suites based on their source of test data.
If you're adding test_data
to an existing application, it's likely that you
won't be able to easily adopt a one-size-fits-all approach to test setup across
your entire suite. Some points of reference, if that's the situation you're in:
- If your test suite is already using fixtures or factories and the above hooks just broke everything, check out our interoperability guide for help.
- If you need to make any changes to the data after it's loaded, truncated, or after Rails fixtures are loaded, you can configure lifecycle hooks that will help you achieve a very fast test suite by including those changes inside the transaction savepoints
- If you don't want
test_data
managing transactions and cleanup for you and just want to load the SQL dump, you can call TestData.insert_test_data_dump - For more information on how all this works, see the API reference.
Your app relies on its tests and your tests rely on their test data. This
creates a bit of a paradox: creating & maintaining test data is literally a
tertiary concern but simultaneously an inescapable responsibility that will live
with you for the life of your application. That's true whether you use this gem,
factory_bot
, Rails fixtures, or something else as a source of shared test
data.
Fortunately, we already have a fantastic tool available for keeping our
test_data
database up-to-date over the life of our application: Rails
migrations. If
your migrations are resilient enough for your production database, they should
also be able to keep your test_data
database up-to-date. (As a happy side
effect of running your migrations against your test data, this means your
test_data
database may help you identify hard-to-catch migration bugs early,
before being deployed to production!)
Whenever you create a new migration or add a major feature, you'll probably need to update your test data. Here's how to do it:
-
If the current SQL dumps in
test/support/test_data
are newer than your localtest_data
database:-
Be sure there's nothing in your local
test_data
database that you added intentionally and forgot to dump, because it's about to be erased -
Run
rake test_data:reinitialize
drop and recreate thetest_data
database and load the latest SQL dumps into it -
Run any pending migrations with
RAILS_ENV=test_data bin/rake db:migrate
-
If you need to create any additional data, start up the server (
RAILS_ENV=test_data bin/rails s
), just like in Step 2 -
Export your newly-updated
test_data
database withrake test_data:dump
-
Ensure your tests are passing and then commit the resulting SQL files
-
-
If the local
test_data
database is already up-to-date with the current SQL dumps, follow steps 3 through 6 above
It's important to keep in mind that your test data SQL dumps are a shared,
generated resource among your team (just like a structure.sql
or schema.rb
file). As a result, if your team doesn't integrate code frequently or if the
test data changes frequently, you'd be right to be concerned that the resulting
merge conflicts could become
significant,
so sweeping changes should be made deliberately and in collaboration with other
contributors.
[Aside: some Rails teams are averse to using migrations to migrate data as well
as schemas, instead preferring one-off scripts and tasks. You'll have an easier
time of things if you use migrations for both schema and data changes. Here are
some notes on how to write data migrations
safely.
Otherwise, you'll need to remember to run any ad hoc deployment scripts against
your test_data
Rails environment along with each of your other deployed
environments.]
Let's be real, most Rails apps already have some tests, and most of those test
suites will already be relying on
factory_bot or Rails' built-in
test
fixtures.
While test_data
is designed to be an alternative to both of these approaches
to managing your test data, it wouldn't be practical to ask a team to rewrite
all their existing tests in order to migrate to a different tool. That's why the
test_data
gem goes to great lengths to play nicely with your existing tests,
while ensuring each test is wrapped in an isolated and fast always-rolled-back
transaction—regardless if the test depends on test_data
, factories, fixtures,
all three, or none-of-the-above.
This section will hopefully make it a little easier to incorporate new
test_data
tests into a codebase that's already using factory_bot
and/or
Rails fixtures, whether you choose to incrementally migrate to using test_data
over time.
This section will document some thoughts and strategies for introducing
test_data
to a test suite that's already using factory_bot
.
Depending on the assumptions your tests make about the state of the database before you've loaded any factories, it's possible that everything will "just work" after adding TestData.uses_test_data in a before-each hook (as shown in the setup guide). So by all means, try running your suite after following the initial setup guide and see if the suite just passes.
If you find that your test suite is failing after adding
TestData.uses_test_data
to your setup, don't panic! Test failures are most
likely caused by the combination of your test_data
SQL dump with the records
inserted by your factories.
One approach would be to attempt to resolve each such failure one-by-one—usually
by updating the offending factories or editing your test_data
database to
ensure they steer clear of one another. Care should be taken to preserve the
conceptual encapsulation of each test, however, as naively squashing errors
risks introducing inadvertent coupling between your factories and your
test_data
data such that neither can be used independently of the other.
Another approach that the test_data
gem provides is an additional mode with
TestData.uses_clean_slate
, which—when called at the top of a factory-dependent
test—will ensure that the tables that test_data
had written to will be
truncated, allowing the test to create whatever factories it needs without fear
of conflicts.
class AnExistingFactoryUsingTest < ActiveSupport::Testcase
setup do
TestData.uses_clean_slate
# pre-existing setup
end
# …
end
If you have a lot of tests, you can find a more sophisticated approaches for logically switching between types of test data declaratively above in the getting started section
While Rails
fixtures
are similar to factories, the fact that they're run globally by Rails and
permanently committed to the test database actually makes them a little trickier
to work with. This section will cover a couple approaches for integrating
test_data
into suites that use fixtures.
It's more likely than not that all your tests will explode in dramatic fashion
as soon as you add TestData.uses_test_data
to a setup
or before(:each)
hook. Typically, your fixtures will be loaded and committed immediately with
your test_data
dump inserted afterward, which makes it exceedingly likely that
your tests will fail with primary key and unique constraint conflicts. If that's
the case you find yourself in, test_data
provides an API that overrides
Rails' built-in fixtures behavior with a monkey patch.
And if that bold text wasn't enough to scare you off, here's how to do it:
-
Before your tests have loaded (e.g. near the top of your test helper), call: TestData.prevent_rails_fixtures_from_loading_automatically! This will patch Rails' setup_fixtures and effectively render it into a no-op, which means that your test fixtures will not be automatically loaded into your test database
-
In tests that rely on your
test_data
dump, call TestData.uses_test_data as you normally would. Because your fixtures won't be loaded automatically, they won't be available to these tests -
In tests that need fixtures, call TestData.uses_rails_fixtures(self) in a before-each hook. This will first ensure that any tables written to by
test_data
are truncated (as withTestData.uses_clean_slate
) before loading your Rails fixtures
For example, you might add the following to an existing fixtures-dependent test to get it passing:
class AnExistingFixtureUsingTest < ActiveSupport::Testcase
setup do
TestData.uses_rails_fixtures(self)
# pre-existing setup
end
# …
end
If you've adopted a mode-switching helper method like the one described above, you could of course add a third mode to cover any tests that depend on Rails fixtures.
A meta-task that runs test_data:configure and test_data:initialize.
This task runs several generators:
-
config/environments/test_data.rb
- As you may know, Rails ships withdevelopment
,test
, andproduction
environments defined by default. But you can actually define custom environments, too! This gem adds a newtest_data
environment and database that's intended to be used to create and dump your test data. This new environment file loads yourdevelopment
environment's configuration and disables migration schema dumps so that you can run migrations against yourtest_data
database without affecting your app'sschema.rb
orstructure.sql
. -
config/initializers/test_data.rb
- Creates an initializer for the gem that calls TestData.config with an empty block and comments documenting the currently-available options and their default values -
config/database.yml
- This generator adds a newtest_data
section to your database configuration, named with the same scheme as your other databases (e.g.your_app_test_data
). If your configuration resembles Rails' generateddatabase.yml
and has a working&default
alias, then this should "just work" -
config/webpacker.yml
- The gem has nothing to do with web assets, but webpacker will display some prominent warnings or errors if it is loaded without a configuration entry for the currently-running environment, so this generator defines an alias based on yourdevelopment
config and then definestest_data
as extending it -
config/secrets.yml
- If your app still uses (the now-deprecated) secrets.yml file introduced in Rails 4.1, this generator will ensure that thetest_data
environment is accounted for with a generatedsecret_key_base
value. If you have numerous secrets in this file'sdevelopment:
stanza, you may want to alias and inherit it intotest_data:
like thewebpacker.yml
generator does -
config/cable.yml
- Simply defines atest_data:
entry that tells ActionCable to use theasync
adapter, since that's also the default fordevelopment
This task will verify that your configuration appears to be valid by checking with each of the gem's generators to inspect your configuration files, and will error whenever a configuration problem is detected.
This task gets your local test_data
database up-and-running, either from a set
of dump files (if they already exist), or by loading your schema and running
your seed file. Specifically:
-
Creates the
test_data
environment's database, if it doesn't already exist -
Ensures the database is non-empty to preserve data integrity (run test_data:drop_database first if you intend to reinitialize it)
-
Checks to see if a dump of the database already exists (by default, stored in
test/support/test_data/
)-
If dumps do exist, it invokes test_data:load to load them into the database
-
Otherwise, it invokes the task
db:schema:load
anddb:seed
(similar to Rails' built-indb:setup
task)
-
This task is designed for the situation where you may already have a test_data
database created and simply want to drop it and replace it with whatever dumps
are in the test/support/test_data
directory.
Dropping the database requires confirmation, either interactively or by setting
the environment variable TEST_DATA_CONFIRM
. It will additionally warn you in
the event that the local database appears to be newer than the dumps on disk
that would replace it. From there, this task behaves the same way as rake test_data:initialize
.
This task is designed to be run after you've created or updated your test data
in the test_data
database and you're ready to run your tests against it. The
task creates several plain SQL dumps from your test_data
environment's
database:
-
A schema-only dump, by default in
test/support/test_data/schema.sql
-
A data-only dump of records you want to be loaded in your tests, by default in
test/support/test_data/data.sql
-
A data-only dump of records that you don't want loaded in your tests in
test/support/test_data/non_test_data.sql
. By default, this includes Rails' internal tables:ar_internal_metadata
andschema_migrations
, configurable with TestData.config'snon_test_data_tables
Each of these files are designed to be committed and versioned with the rest of your application. TestData.config includes several options to control this task.
This task will load your SQL dumps into your test_data
database by:
-
Verifying the
test_data
environment's database is empty (creating it if it doesn't exist and failing if it's not empty) -
Verifying that your schema, test data, and non-test data SQL dumps can be found at the configured paths
-
Loading the dumps into the
test_data
database -
Warning if there are pending migrations that haven't been run yet
If there are pending migrations, you'll probably want to run them and then dump & commit your test data so that they're up-to-date:
$ RAILS_ENV=test_data bin/rake db:migrate
$ bin/rake test_data:dump
This task will create the test_data
environment's database if it does not
already exist. It also
enhances Rails'
db:create
task so that test_data
is created along with development
and
test
whenever rake db:create
is run.
This task will drop the test_data
environment's database if it exists. It also
enhances Rails' db:drop
task so that test_data
is dropped along with
development
and test
whenever rake db:drop
is run.
This is the method designed to be used by your tests to load your test data
into your test
database so that your tests can rely on it. Typically, you'll
want to call it at the beginning of each test that relies on the test data
managed by this gem—most often, in a before-each hook.
For the sake of speed and integrity, TestData.uses_test_data
is designed to
take advantage of nested transactions (Postgres
savepoints). By
default, data is loaded in a transaction and intended to be rolled back to the
point immediately after the data was imported between tests. This way, your
test suite only pays the cost of importing the SQL file once, but each of your
tests can enjoy a clean slate that's free of data pollution from other tests.
(This is similar to, but separate from, Rails fixtures'
use_transactional_tests
option.)
See configuration option: config.after_test_data_load
If a test does not rely on your test_data
data, you can instead ensure that it
runs against empty tables by calling TestData.uses_clean_slate
. Like
TestData.uses_test_data
, this would normally be called at the beginning of
each such test in a before-each hook.
This method works by first ensuring that your test data is loaded (and the correspondent savepoint created), then will truncate all affected tables and create another savepoint. It's a little counter-intuitive that you'd first litter your database with data only to wipe it clean again, but it's much faster to repeatedly truncate tables than to repeatedly import large SQL files.
See configuration options: config.after_test_data_truncate, config.truncate_these_test_data_tables
As described in this README's fixture interop
guide, TestData.uses_rails_fixtures
will load your app's Rails
fixtures
by intercepting Rails' built-in fixture-loading code. As with the other "uses"
methods, you'll likely want to call it in a before-each hook before any test
that needs access to your Rails fixtures.
There are two additional things to keep in mind if using this method:
-
Using this feature requires that you've first invoked TestData.prevent_rails_fixtures_from_loading_automatically! before your tests have started running to override Rails' default behavior before any of your tests have loaded or started running
-
Because the method depends on Rails' fixture caching mechanism, it must be passed an instance of the running test class (e.g.
TestData.uses_rails_fixtures(self)
)
Under the hood, this method effectively ensures a clean slate the same way
TestData.uses_clean_slate
does, except that after creating the truncation
savepoint, it will then load your fixtures and finally create—wait for it—yet
another savepoint that subsequent calls to uses_rails_fixtures
can rollback
to.
See configuration option: config.after_rails_fixture_load
Call this method before any tests have been loaded or executed by your test runner if you're planning to use TestData.uses_rails_fixtures to load Rails fixtures into any of your tests. This method will disable the default behavior of loading your Rails fixtures into the test database as soon as the first test case with fixtures enabled is executed. (Inspect the source for the patch to make sure you're comfortable with what it's doing.)
The generated config/initializers/test_data.rb
initializer will include a call
to TestData.config
, which takes a block that yields a mutable configuration
object (similar to Rails.application.config
). If anything is unclear after
reading the documentation, feel free to review the
initializer and the Config
class themselves.
Want to shift forward several timestamp fields after your test_data
SQL dumps
are loaded into your test database? Need to refresh a materialized view after
your Rails fixtures are loaded? You could do these things after calling
TestData.uses_test_data
and TestData.uses_rails_fixtures
, respectively, but
you'd take the corresponding performance hit in each and every test.
Instead, you can pass a callable or a block and test_data
will execute it just
after performing the associated data operation but just before creating a
transaction savepoint. That way, whenever the gem rolls back between tests, your
hook won't need to be run again.
This is hook is run immediately after TestData.uses_test_data
has loaded your
SQL dumps into the test
database, but before creating a savepoint. Takes a
block or anything that responds to call
.
TestData.config do |config|
# Example: roll time forward
config.after_test_data_load do
Boop.connection.exec_update(<<~SQL, nil, [[nil, Time.zone.now - System.epoch]])
update boops set booped_at = booped_at + $1
SQL
end
end
This is hook is run immediately after TestData.uses_clean_slate
has truncated
your test data, but before creating a savepoint. Takes a block or anything that
responds to call
.
TestData.config do |config|
# Example: pass a callable instead of a block
config.after_test_data_truncate(SomethingThatRespondsToCall.new)
end
This is hook is run immediately after TestData.uses_rails_fixtures
has loaded
your Rails fixtures into the test
database, but before creating a savepoint.
Takes a block or anything that responds to call
.
TestData.config do |config|
# Example: refresh Postgres assets like materialized views
config.after_rails_fixture_load do
RefreshesMaterializedViews.new.call
end
end
The gem provides several options governing the behavior of the test_data:dump Rake task. You probably won't need to set these unless you run into a problem with the defaults.
Your application may have some tables that are necessary for the operation of
the application, but irrelevant or incompatible with you your tests. This data
is still dumped for the sake of being able to restore the database with rake
test_data:load, but will not be loaded when your tests are
running. Defaults to []
, (but will always include ar_internal_metadata
and
schema_migrations
).
TestData.config do |config|
config.non_test_data_tables = []
end
Some tables populated by your application may not be necessary to either its
proper functioning or useful to your tests (e.g. audit logs), so you can save
time and storage by preventing those tables from being dumped entirely. Defaults
to []
.
TestData.config do |config|
config.dont_dump_these_tables = []
end
The path to which the schema DDL of your test_data
database will be written.
This is only used by rake test_data:load when initializing the
test_data
database. Defaults to "test/support/test_data/schema.sql"
.
TestData.config do |config|
config.schema_dump_path = "test/support/test_data/schema.sql"
end
The path that the SQL dump of your test data will be written. This is the dump
that will be executed by TestData.uses_test_data
in your tests. Defaults to
"test/support/test_data/data.sql"
.
TestData.config do |config|
config.data_dump_path = "test/support/test_data/data.sql"
end
The path to which the non_test_data_tables in
your test_data
database will be written. This is only used by rake
test_data:load when initializing the test_data
database.
Defaults to "test/support/test_data/non_test_data.sql"
.
TestData.config do |config|
config.non_test_data_dump_path = "test/support/test_data/non_test_data.sql"
end
By default, when TestData.uses_clean_slate is
called, it will truncate any tables for which an INSERT
operation was
detected in your test data SQL dump. This may not be suitable for every case,
however, so this option allows you to specify which tables are truncated.
Defaults to nil
.
TestData.config do |config|
config.truncate_these_test_data_tables = []
end
The gem outputs its messages to standard output and error by assigning a log
level to each message. Valid values are :debug
, :info
, :warn
, :error
,
:quiet
. Defaults to :info
.
TestData.config do |config|
config.log_level = :info
end
If you just want to insert the test data in your application's SQL dumps without
any of the transaction management or test runner assumptions inherent in
TestData.uses_test_data, then you can call
TestData.insert_test_data_dump
to load and execute the dump.
This might be necessary in a few different situations:
- Running tests in environments that can't be isolated to a single database transaction (e.g. orchestrating tests across multiple databases, processes, etc.)
- You might ant to use your test data to seed pre-production environments with
enough data to exploratory test (as you might do in a
postdeploy
script with your Heroku Review Apps) - Your tests require complex heterogeneous sources of data that aren't a good fit for the assumptions and constraints of this library's default methods for preparing test data
In any case, since TestData.insert_test_data_dump
is not wrapped in a
transaction, when used for automated tests, data cleanup becomes your
responsibility.
The test_data
gem is still brand new and doesn't cover every use case just
yet. Here are some existing assumptions and limitations:
-
You're using Postgres
-
You're using Rails 6 or higher
-
Your app does not require Rails' multi-database support in order to be tested
-
Your app has the binstubs
bin/rake
andbin/rails
that Rails generates and they work (protip: you can regenerate them withrails app:update:bin
) -
Your
database.yml
defines a&default
alias from which to extend thetest_data
database configuration (if your YAML file lacks one, you can always specify thetest_data
database configuration manually)
If you use factory_bot
and all of these are true:
-
Your integration tests are super fast and are not getting significantly slower over time
-
Minor changes to existing factories rarely result in test failures that require unrelated tests to be read & updated to get them passing again
-
The number of associated records generated between your most-used factories are representative of production data, as opposed to generating a sprawling hierarchy of models, as if your test just ordered "one of everything" off the menu
-
Your default factories generate models that resemble real records created by your production application, as opposed to representing the sum-of-all-edge-cases with every boolean flag enabled and optional attribute set
-
You've avoided mitigating the above problems with confusingly-named and confidence-eroding nested factories with names like
:user
,:basic_user
,:lite_user
, and:plain_user_no_associations_allowed
If none of these things are true, then congratulations! You are probably using
factory_bot
to great effect! Unfortunately, in our experience, this outcome
is exceedingly rare, especially for large and long-lived applications.
However, if you'd answer "no" to any of the above questions, just know that
these are the sorts of failure modes the test_data
gem was designed to
avoid—and we hope you'll consider trying it with an open mind. At the same time,
we acknowledge that large test suites can't be rewritten and migrated to a
different source of test data overnight—nor should they be! See our notes on
migrating to test_data
incrementally
How will I handle merge conflicts in these SQL files if I have lots of people working on lots of feature branches all adding to the test_data
database dumps?
In a word: carefully!
First, in terms of expectations-setting, you should expect your test data SQL dumps to churn at roughly the same rate as your schema: lots of changes up front, but tapering off as the application stabilizes.
If your schema isn't changing frequently and you're not running data migrations against production very often, it might make the most sense to let this concern present itself as a real problem before attempting to solve it, as you're likely to find that other best-practices around collaboration and deployment (frequent merges, continuous integration, coordinating breaking changes) will also manage this risk. The reason that the dumps are stored as plain SQL (aside from the fact that git's text compression is very good) is to make merge conflicts with other branches feasible, if not entirely painless.
However, if your app is in the very initial stages of development or you're otherwise making breaking changes to your schema and data very frequently, our best advice is to hold off a bit on writing any integration tests that depend on shared sources of test data (regardless of tool), as they'll be more likely to frustrate your ability to rapidly iterate than detect bugs. Once you you have a reasonably stable feature working end-to-end, that's a good moment to start adding integration tests—and perhaps pulling in a gem like this one to help you.
For the same reason you (probably) don't have multiple production databases: the fact that Rails apps are monolithic and consolidated is a big reason why they're so productive and comprehensible. This gem is not VCR for databases. If you were to design separate test data dumps for each feature, stakeholder, or concern, you'd also have more moving parts to maintain, more complexity to communicate, and more pieces that could someday fall into disrepair.
By having a single test_data
database that grows up with your application just
like production
does—with both having their schemas and data migrated
incrementally over time—your integration tests that depend on test_data
will
have an early opportunity to catch bugs that otherwise wouldn't be found until
they were deployed into a long-lived staging or (gasp!) production environment.
If the dump files generated by test_data:dump
seem massive, consider the
cause:
-
If you inadvertently created more data than necessary, you might consider resetting (or rolling back) your changes and making another attempt at generating a more minimal set of test data
-
If some records persisted by your application aren't very relevant to your tests, you might consider either of these options:
-
If certain tables are necessary for running the app but aren't needed by your tests, you can add them to the
config.non_test_data_tables
configuration array. They'll still be committed to git, but won't loaded by your tests -
If the certain tables are not needed by your application or by your tests (e.g. audit logs), add them to the
config.dont_dump_these_tables
array, and they won't be persisted byrake test_data:dump
-
-
If the dumps are necessarily really big (some apps are complex!), consider looking into git-lfs for tracking them without impacting the size and performance of the git slug. (See GitHub's documentation on what their service supports)
[Beyond these options, we'd also be interested in a solution that filtered data in a more granular way than ignoring entire tables. If you have a proposal you'd be interested in implementing, suggest it in an issue!]
Agreed! Nothing is simpler than calling new
to create an object.
If it's possible to write a test that looks like this, do it. Don't use shared test data loaded from this gem or any other:
def test_exclude_cancelled_orders
good_order = Order.new
bad_order = Order.new(cancelled: true)
user = User.create!(orders: [good_order, bad_order])
result = user.active_orders
assert_includes good_order
refute_includes bad_order
end
This test is simple, self-contained, clearly demarcates the arrange-act-assert phases, and (most importantly) will only fail if the functionality stops working. Maximizing the number of tests that can be written expressively and succinctly without the aid of shared test data is a laudable goal that more teams should embrace.
However, what if the code you're writing doesn't need 3 records in the database, but 30? Writing that much test setup would be painstaking, despite being fully-encapsulated. Long test setup is harder for others to read and understand. And because that setup depends on more of your system's code, it will have more reasons to break as your codebase changes. At that point, you have two options:
-
Critically validate your design: why is it so hard to set up? Does it really require so much persisted data to exercise this behavior? Would a plain old Ruby object that defined a pure function have been feasible? Could a model instance or even a
Struct
be passed to the subject instead of loading everything from the database? When automated testing is saved for the very end of a feature's development, it can feel too costly to reexamine design decisions like this, but it can be valuable to consider all the same. Easy to test code is easy to use code -
If the complex setup is a necessary reality of the situation that your app needs to handle (and it often will be!), then having some kind of shared source of test data to use as a starting point can be hugely beneficial. That's why
factory_bot
is so popular, why this gem exists, etc.
As a result, there is no one-size-fits-all approach. Straightforward behavior that can be invoked with a clear, concise test has no reason to be coupled to a shared source of test data. Meanwhile, tests of more complex behaviors that require lots of carefully-arranged data might be unmaintainable without a shared source of test data to lean on. So both kinds of test clearly have their place.
But this is a pretty nuanced discussion that can be hard to keep in mind when under deadline pressure or on a large team where building consensus around norms is challenging. As a result, leaving the decision of which type of test to write to spur-of-the-moment judgment is likely to result in inconsistent test design. Instead, you might consider separating these two categories into separate test types or suites, with simple heuristics to determine which types of code demand which type of test.
For example, it would be completely reasonable to load this gem's test data for integration tests, but not for basic tests of models, like so:
class ActionDispatch::IntegrationTest
setup do
TestData.uses_test_data
end
end
class ActiveSupport::TestCase
setup do
TestData.uses_clean_slate
end
end
In short, this skepticism is generally healthy, and encapsulated tests that
forego reliance on shared sources of test data should be maximized. For
everything else, there's test_data
.
The test_data
gem was written to enable tests that are not only more
comprehensible and maintainable over the long-term, but also much faster to
run. That said—and especially if you're adding test_data
to an existing test
suite—care should be taken to audit everything the suite does between tests in
order to optimize its overall runtime.
Generally speaking, randomizing the order in which tests run is an unmitigated
win: randomizing helps you catch any unintended dependency between two tests
early, when it's still cheap & easy to fix. However, if your tests use different
sources of test data (e.g. some call TestData.uses_test_data
and some call
TestData.uses_clean_slate
), it's very likely that randomizing your tests will
result in a significantly slower overall test suite. Instead, if you group tests
that use the same type of test data together (e.g. by separating them into
separate suites), you might find profound speed gains.
To illustrate this, suppose you had 5 tests that relied on your test_data
data
and 5 that relied on Rails fixtures. If all of these tests ran in random order
(the default), you might see the following behavior at run-time:
$ bin/rails test test/example_test.rb
Run options: --seed 63999
# Running:
test_data -- loading test_data SQL dump
. fixtures -- truncating tables, loading Rails fixtures
. fixtures -- rolling back to Rails fixtures
. test_data -- rolling back to clean test_data
. fixtures -- truncating tables, loading Rails fixtures
. test_data -- rolling back to clean test_data
. fixtures -- truncating tables, loading Rails fixtures
. test_data -- rolling back to clean test_data
. fixtures -- truncating tables, loading Rails fixtures
. test_data -- rolling back to clean test_data
.
Finished in 2.449957s, 4.0817 runs/s, 4.0817 assertions/s.
10 runs, 10 assertions, 0 failures, 0 errors, 0 skips
So, what can you do to speed this up? The most effective strategy to avoiding this churn is to group the execution of each tests that use each source of test data into sub-suites that are run serially, on e after the other.
- If you're using Rails' defualt Minitest, we wrote a gem called
minitest-suite to accomplish
exactly this. Just declare something like
suite :test_data
orsuite :fixtures
at the top of each test class - If you're using RSpec, the suite option combined with a custom ordering can accomplish this. You might also consider using tags to organize your tests by type, but you'll likely have to run a separate CLI invocation for each to avoid the tests from being interleaved
Here's what the same example would do at run-time after adding minitest-suite:
$ bin/rails test test/example_test.rb
Run options: --seed 50105
# Running:
test_data -- loading test_data SQL dump
. test_data -- rolling back to clean test_data
. test_data -- rolling back to clean test_data
. test_data -- rolling back to clean test_data
. test_data -- rolling back to clean test_data
. fixtures -- truncating tables, loading Rails fixtures
. fixtures -- rolling back to clean fixtures
. fixtures -- rolling back to clean fixtures
. fixtures -- rolling back to clean fixtures
. fixtures -- rolling back to clean fixtures
.
Finished in 2.377050s, 4.2069 runs/s, 4.2069 assertions/s.
10 runs, 10 assertions, 0 failures, 0 errors, 0 skips
By grouping the execution in this way, the most expensive operations will usually only be run once: at the beginning of the first test in each suite.
If you're doing anything repeatedly that's data-intensive in your test setup
after calling one of the TestData.uses_*
methods, that operation is being
repeated once per test, which could be very slow. Instead, you might consider
moving that behavior into a lifecycle hook.
Any code passed to a lifecycle hook will only be executed when data is
actually loaded or truncated and its effect will be included in the
transaction savepoint that the test_data
gem rolls back between tests.
Seriously, appropriately moving data adjustments into these hooks can cut your
test suite's runtime by an order of magnitude.
One of the most likely sources of unnecessary slowness is redundant test cleanup. The speed gained from sandwiching every expensive operation between transaction savepoints can be profound… but can also easily be erased by a single before-each hook calling database_cleaner to commit a truncation of the database. As a result, it's worth taking a little time to take stock of everything that's called between tests during setup & teardown to ensure multiple tools aren't attempting to clean up the state of the database and potentially interfering with one another.
This project follows Test Double's code of conduct for all community interactions, including (but not limited to) one-on-one communications, public posts/comments, code reviews, pull requests, and GitHub issues. If violations occur, Test Double will take any action they deem appropriate for the infraction, up to and including blocking a user from the organization's repositories.