Runner on Docker


@dmathieu originally suggested using Docker to take care of Runner. I was a bit reluctant initially, but the more I think about it, the better I feel it fits.

Here is a rough idea for a V1 of runner:

  • Runner is controlled by a single bash file with no dependencies except Docker
  • On execution, runner either builds an image or downloads a cached one
  • Take advantage of AUFS for image building (keeping diffs small)
  • The bash file controls which results runner generates; it can optionally submit them using curl
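The optional submit step in the last bullet could look roughly like the sketch below. `submit_results` and the `RESULTS_URL` environment variable are hypothetical names made up for illustration; no such endpoint is defined anywhere in this thread.

```shell
# Hypothetical sketch: optionally submit a results file with curl.
# RESULTS_URL is an assumed environment variable, not an existing endpoint.
submit_results() {
  local file="$1"
  if [ -z "${RESULTS_URL:-}" ]; then
    # Submission is optional: without a target URL the file just stays local.
    echo "RESULTS_URL not set, keeping $file local"
    return 0
  fi
  # --data-binary sends the file as-is in a POST body; --fail turns HTTP
  # errors into a non-zero exit status so the caller can react.
  curl --fail --silent --show-error \
       --header 'Content-Type: text/csv' \
       --data-binary "@$file" \
       "$RESULTS_URL"
}
```

Calling `submit_results results.csv` with `RESULTS_URL` unset is a no-op, which keeps curl an optional extra rather than a hard dependency of a run.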


# ./runner

Usage example: ./runner --start 45340 --finish 45357 --step 10 full_suite

(runs full_suite for every 10th build, printing results to the screen and to a file called results.csv)
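A minimal sketch of how the script might parse those flags and decide which revisions to run. The function names `revisions` and `plan_run` are illustrative assumptions, and the sketch only prints what it would do instead of running anything.

```shell
# Hypothetical sketch of ./runner argument handling; names are illustrative.

# Print every revision from $1 to $2 (inclusive), $3 apart.
revisions() {
  local rev="$1" finish="$2" step="$3"
  while [ "$rev" -le "$finish" ]; do
    echo "$rev"
    rev=$((rev + step))
  done
}

# Parse --start/--finish/--step plus a trailing suite name, then list the
# revisions that would be benchmarked.
plan_run() {
  local start="" finish="" step=1 suite="" rev
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --start|--finish|--step)
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --start)  start="$2" ;;
          --finish) finish="$2" ;;
          --step)   step="$2" ;;
        esac
        shift 2
        ;;
      *)
        suite="$1"
        shift
        ;;
    esac
  done
  if [ -z "$start" ] || [ -z "$finish" ] || [ -z "$suite" ]; then
    echo "usage: runner --start REV --finish REV [--step N] SUITE" >&2
    return 1
  fi
  # A step below 1 would never advance, so reject it up front.
  if ! [ "$step" -ge 1 ] 2>/dev/null; then
    echo "--step must be a positive integer" >&2
    return 1
  fi
  for rev in $(revisions "$start" "$finish" "$step"); do
    echo "would run $suite on r$rev"
  done
}
```

With the flags from the usage example, `plan_run --start 45340 --finish 45357 --step 10 full_suite` lists r45340 and r45350, since the next step (r45360) is past the finish revision.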

Runner attempts to use a pre-built image first; if it is missing, it creates one.
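That fallback could be sketched as follows. The image naming scheme (`rubybench/ruby:rREV`), the function names, and the `RUBY_REV` build argument are assumptions for illustration; the Dockerfile would have to declare a matching `ARG` for the build step to use it.

```shell
# Hypothetical sketch: reuse a cached image for a revision, else build one.
# The rubybench/ruby:rREV naming scheme is an assumption, not an existing repo.
image_name() {
  echo "rubybench/ruby:r$1"
}

# Make sure an image for revision $1 is available locally: check the local
# cache, then try a registry pull, and only build from the Dockerfile in
# directory $2 as a last resort.
ensure_image() {
  local image
  image=$(image_name "$1")
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "cached $image"
  elif docker pull "$image" >/dev/null 2>&1; then
    echo "pulled $image"
  else
    # RUBY_REV is an assumed build argument the Dockerfile would consume.
    docker build --build-arg "RUBY_REV=$1" -t "$image" "$2" >/dev/null || return 1
    echo "built $image"
  fi
}
```

Trying the pull before the build is what would make a shared registry pay off: a clean box only builds images nobody has published yet.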

Keeping diffs small

Runner can create a clean image every N builds and use diffs for the rest.

So, for example, it can create a base image with 45300 and then use that as a starting point for every subsequent build (nuke the old install, do a new install).

If all filesystem changes are kept in one step, AUFS should be smart enough to store only the diff between builds, leading to a very small storage requirement and much faster running of specs.
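One way to decide which clean image a given build should layer on is plain integer arithmetic. This is a sketch under the assumption that "every N builds" is counted in revision numbers from the first benchmarked revision; `base_revision` and `is_base_revision` are hypothetical helper names.

```shell
# Revision of the clean base image that revision $1 should layer on, given
# the first benchmarked revision $2 and a clean rebuild every $3 revisions.
base_revision() {
  local rev="$1" first="$2" interval="$3"
  echo $((rev - (rev - first) % interval))
}

# True when revision $1 should itself be built as a clean base image.
is_base_revision() {
  [ "$(base_revision "$1" "$2" "$3")" -eq "$1" ]
}
```

With a first revision of 45300 and an interval of 100, `base_revision 45340 45300 100` gives 45300 and `base_revision 45457 45300 100` gives 45400, so each diff image stays within 100 revisions of its base.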

Consider running a custom registry

We can share out the building of all the images across the team and publish them to a custom registry.

This makes running tests on a clean box very simple.

Version 1

For version 1 I would suggest we stay away from complex setups like, say, Discourse and focus on simple .rb benches; later on we can integrate the Discourse bench. It gets trickier because you need to bundle install stuff and bring up a pile of environment.

Thoughts on the basics here?


Hi @sam,

I have some questions here:

  1. I’m not sure what “builds” means here. Does it mean commits, so that we run the benchmark whenever there is a new commit on ruby/master?
  2. Everything is controlled by a bash script, right? It will use a pre-built image or create one, and then run the benchmark inside that Docker image?

I agree that we should start from simple .rb benchmarks. We may eventually build something like rails-dev-box for Discourse’s benchmarking.
I suppose we will work on .rb benches first, and then move toward Rails or Discourse, right?

By the way, what we are doing is running benchmarks on some Ruby projects with different versions of Ruby, so we can easily see whether there is an improvement between two Ruby versions, right?


Yes, by builds I mean commits on Ruby. Each Ruby commit has a corresponding svn commit, see for example. You can fetch a single rev from the svn server.


I would like to first focus on simple benches, we can get to this later.


If we’re using docker for that, we don’t need to provision any virtual machine except the one running docker, that’s the whole point.

Apart from that, @sam’s idea seems very good to me. I might add other ideas later.


What would you suggest for the schedule of this task? I have a rough task list in mind, ordered by priority:

  • Set up and tune Docker, maybe a couple of days?
  • Choose profiling tools. I have no idea what the current tool is. I think tmm1/rbtrace and tmm1/perftools.rb would suit this case? (I don’t know why Discourse doesn’t allow me to post GitHub links.)
  • Try to run benchmarks on simple .rb files. Connect runner to the ruby-bench interface. Make it run automatically. This may take about one or two weeks.
  • Extend to run benchmarks on Discourse. Setting up and tuning the environment will be the main task here. I expect this will take about three weeks, or more.
  • Extend to run benchmarks on Rails itself. I have no idea how this should be done yet.

Are these tasks too few or too many for a summer project?


I would not worry at all about tuning. I have a pretty rock solid bench box at home, and I can run any long-term bench you need or give you access. Just worry about the script for now.

You can skip this, not the role of the benchmarking project at all.

Would also skip this for now; the only actual metric you could get from Rails proper is “duration it takes to run test suite”.

Instead once Discourse bench works, we could leave Discourse and Ruby at a fixed version and keep running the bench against newer and newer versions of Rails. Discourse is compatible with Rails master.


Since the Discourse and Ruby versions are fixed, we can then tell whether a newer version of Rails is better than the previous one, right?

I don’t really have a clear picture of this project yet. Will the main task be the runner script? What will it do exactly?


Yes, exactly.

At the highest level:

  • Graph MRI performance over time
  • Graph Rails performance over time
  • Detect which commit impacted Rails or MRI performance

Runner script is how we can get there.

By making it trivial and reproducible to bring up a benching environment, run benches and post to a central repo, we solve the problem of having the data.


I’m trying to build a Docker image for the runner. What should the image look like?
I think there should be a Ruby layer and a runner layer so that we can replace the Ruby layer easily.

Is anything else needed?

Another question: how should the “benchmark” be implemented? Do we just record the running time?


The output of time is a good start, or you could use the benchmark gem.
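As a rough sketch of recording a running time from the script, the helper below takes wall-clock timestamps around a command instead of parsing the output of time, simply because the number is easier to capture that way. It assumes GNU date (the `%N` nanosecond format), and the function names and CSV column order are made up for illustration.

```shell
# Run a command and print its wall-clock duration in milliseconds.
# Assumes GNU date, whose %N format yields nanoseconds.
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Append one "revision,benchmark,milliseconds" row to the CSV file $1.
# $2 is the revision, $3 a benchmark name, the rest is the command to time.
record_result() {
  local csv="$1" rev="$2" name="$3"
  shift 3
  echo "$rev,$name,$(elapsed_ms "$@")" >> "$csv"
}
```

For example, `record_result results.csv 45340 so_ackermann ruby so_ackermann.rb` would append one row for that run, where the script name is only a placeholder. A single wall-clock sample is noisy, so repeated runs would probably be needed before comparing builds.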

Regarding layers, it does not matter that much, because to keep diffs small you are going to have to start from a full image and layer on top of that. Otherwise a layer will include every file distributed with Ruby.

Be sure not to install rdoc or ri when you set up the image, to cut on space.
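A sketch of what that could mean in the image setup, assuming Ruby is built from source inside the image. `--disable-install-doc` is Ruby's configure switch for skipping rdoc and ri data, and `--no-document` is the RubyGems equivalent (older RubyGems used `--no-ri --no-rdoc`); the function names and paths are hypothetical.

```shell
# Sketch of doc-free install steps, meant to run during the image build.
# Function names and paths are assumptions for illustration.

# Tell RubyGems to skip rdoc/ri for every later `gem install`.
# $1 is the gemrc file to write, for example /etc/gemrc.
write_gemrc() {
  echo "gem: --no-document" > "$1"
}

# Build Ruby from the source tree in $1 into prefix $2 without rdoc/ri data.
build_ruby_without_docs() {
  (
    cd "$1" &&
    # A fresh svn checkout has no ./configure yet; autoconf generates it.
    { [ -x ./configure ] || autoconf; } &&
    ./configure --prefix="$2" --disable-install-doc &&
    make &&
    make install
  )
}
```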


Is running benchmarks on some simple but long-running Ruby scripts a good start?
Do we then compare the running time across Ruby builds?

Should we use a Dockerfile to control the Docker setup, or should we just use a shell script?


I think you are going to need a shell script for the setup, especially because you need to handle params.

I would start with running a few of the scripts in the ruby benchmark directory. (see ruby source)
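A first pass over that directory could be as simple as the loop below, which runs the first few .rb scripts with a given interpreter and reports whether each one exited cleanly. `run_benchmarks` is a hypothetical helper, the directory layout is taken as a parameter rather than assumed, and timing is left out to keep the sketch short.

```shell
# Run up to $3 Ruby scripts from directory $2 with the interpreter $1 and
# print one "script,status" line per script, where status is ok or failed.
run_benchmarks() {
  local ruby="$1" dir="$2" limit="$3" count=0 script status
  for script in "$dir"/*.rb; do
    [ -e "$script" ] || continue        # the glob matched nothing
    [ "$count" -lt "$limit" ] || break  # only "a few" scripts per run
    if "$ruby" "$script" >/dev/null 2>&1; then
      status=ok
    else
      status=failed
    fi
    echo "$(basename "$script" .rb),$status"
    count=$((count + 1))
  done
}
```

Passing the interpreter as an argument means the same loop can be pointed at whichever Ruby build is installed in the image under test.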



all rubybench discussion is at