@dmathieu originally suggested using Docker to take care of the runner. I was a bit reluctant initially, but the more I think about it, the better I feel it fits.
Here is a rough idea for a V1 of the runner:
Runner is controlled by a simple bash file with no dependencies except for Docker.
On execution, the runner either builds an image or downloads a cached one.
Take advantage of AUFS for image building (keeping diffs small).
The bash file controls what results the runner generates; it can optionally submit them using curl.
Example:
# ./runner
Usage example: ./runner --start 45340 --finish 45357 --step 10 full_suite
(runs full_suite for every 10th build in that range, printing results on screen and appending them to a file called results.csv)
The runner attempts to use a pre-built image first; if one is missing, it creates it.
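A minimal sketch of what that bash entry point could look like (flag names taken from the example above; the image tag, the build_image helper, and the submission URL are placeholders, not decided pieces):

    #!/usr/bin/env bash
    # Hypothetical runner skeleton: loop over svn revisions, reuse or build
    # a Docker image per revision, run the suite and record results.
    set -euo pipefail

    START=0 FINISH=0 STEP=1 SUITE=""
    while [[ $# -gt 0 ]]; do
      case "$1" in
        --start)  START="$2"; shift 2 ;;
        --finish) FINISH="$2"; shift 2 ;;
        --step)   STEP="$2"; shift 2 ;;
        *)        SUITE="$1"; shift ;;
      esac
    done

    for (( rev = START; rev <= FINISH; rev += STEP )); do
      image="ruby-bench:$rev"
      # Prefer a pre-built image; build one only if it is missing.
      if ! docker inspect "$image" > /dev/null 2>&1; then
        ./build_image "$rev"   # placeholder for the image-building step
      fi
      result=$(docker run --rm "$image" "$SUITE")
      echo "$rev,$result" | tee -a results.csv
    done

    # Optionally submit the CSV somewhere central (URL is a placeholder):
    # curl --data-binary @results.csv https://rubybench.example.org/results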
Keeping diffs small
The runner can create a clean image every N builds and use diffs for the rest.
So, for example, it can create a base image with 45300 and then, for every subsequent build, use that as a starting point (nuke the old install, do a new install).
If all filesystem changes are kept in one step, AUFS should be smart enough to only store the diff between builds, leading to very minimal storage requirements and much faster running of specs.
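A sketch of how a per-revision image could be built on top of a fixed base, assuming the base image (ruby-bench-base:45300 here, a made-up tag) already carries subversion, autoconf, a compiler toolchain and a baseruby; keeping the whole export/build/install in a single RUN step means the layer only stores the diff between Ruby installs (paths like /opt/ruby are placeholders):

    # Hypothetical build_image helper: one RUN step per revision so AUFS
    # only stores the difference against the base layer.
    rev="$1"
    cat > Dockerfile <<EOF
    FROM ruby-bench-base:45300
    RUN rm -rf /opt/ruby \\
     && svn export -q -r $rev https://svn.ruby-lang.org/repos/ruby/trunk /tmp/ruby \\
     && cd /tmp/ruby \\
     && autoconf && ./configure --prefix=/opt/ruby --disable-install-doc \\
     && make -j4 && make install \\
     && rm -rf /tmp/ruby
    EOF
    docker build -t "ruby-bench:$rev" .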
Consider running a custom registry
We can spread the building of all the images across the team and publish them to a custom registry.
This makes running tests on a clean box very simple.
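For example, sharing a pre-built image would just be a tag, push and pull (registry.example.org is a placeholder host):

    # Publish a pre-built image to the team registry...
    docker tag ruby-bench:45340 registry.example.org/ruby-bench:45340
    docker push registry.example.org/ruby-bench:45340

    # ...and on a clean box the runner only has to pull it.
    docker pull registry.example.org/ruby-bench:45340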
Version 1
For version 1 I would suggest we stay away from complex setups like, say, Discourse and focus on simple .rb benches; later on we can integrate the Discourse bench. It gets trickier because you need to bundle install stuff and bring up a pile of environment.
I'm not sure what "builds" means here? A commit on https://github.com/ruby/ruby? Do we run the benchmark whenever there is a new commit on ruby/master?
Everything is controlled by a bash script, right? It will use a pre-built image or create one, and then run the benchmark inside that Docker image?
I agree we should start from simple .rb benchmarks. We may eventually build something like rails-dev-box (https://github.com/rails/rails-dev-box) for Discourse's benchmarking.
I suppose we will work on .rb benches first, and then move toward Rails or Discourse, right?
By the way, what we are doing is running benchmarks on some Ruby projects with different Ruby versions, so we can easily tell if there is an improvement between two Ruby versions, right?
Yes, by builds I mean a commit on Ruby; each commit of Ruby has a corresponding svn commit, see process.c: constify · ruby/ruby@d64ba37 · GitHub for example. You can fetch a single rev from the svn server.
Yes
I would like to first focus on simple benches; we can get to this later.
What would you suggest for the schedule of this task? I have a rough task list in my mind, ordered by priority:
Set up and tune Docker, maybe a couple of days?
Choose profiling tools. I have no idea what the current tool is. I think tmm1/rbtrace and tmm1/perftools.rb would serve a suitable role in this case? (I don't know why Discourse doesn't allow me to post GitHub links.)
Try to run benchmarks on simple .rb files. Connect the runner and the ruby-bench interface. Make it run automatically. This may take about one or two weeks.
Extend it to run benchmarks on Discourse. Setting up and tuning the environment will be the main task here. I expect this will take about (or more than) three weeks.
Extend it to run benchmarks on Rails itself. I have no idea how this should be done yet.
Are these tasks too few or too many for a summer project?
I would not worry at all about tuning; I have a pretty rock-solid bench box at home. I can run any long-term bench you need or give you access. Just worry about the script for now.
You can skip this; it's not the role of the benchmarking project at all.
I would also skip this for now; the only actual metric you could get from Rails proper is "duration it takes to run the test suite".
Instead, once the Discourse bench works, we could leave Discourse and Ruby at a fixed version and keep running the bench against newer and newer versions of Rails. Discourse is compatible with Rails master.
Detect which commit impacted Rails or MRI performance
The runner script is how we can get there.
By making it trivial and reproducible to bring up a benching environment, run benches, and post to a central repo, we solve the problem of having the data.
I'm trying to build a Docker image for the runner. What should the image look like?
I think there should be a Ruby layer and a runner layer so that we can replace the Ruby layer easily.
Is anything else needed?
Another question is, how should the "benchmark" be implemented? Do we just record the running time?
Output from time is a good start, or you could use the benchmark gem.
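For example (the bench path and image tag are placeholders), either of these would give the runner a number per run; the second uses Ruby's built-in Benchmark module:

    # Wall-clock seconds from GNU time (note: it prints to stderr):
    docker run --rm ruby-bench:45340 /usr/bin/time -f "%e" ruby /benches/some_bench.rb

    # Or measure inside Ruby with the standard Benchmark module:
    docker run --rm ruby-bench:45340 \
      ruby -rbenchmark -e 'puts Benchmark.realtime { load "/benches/some_bench.rb" }'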
Regarding layers, it does not matter that much, because to keep diffs small you are going to have to start from a full image and layer on top of that. Otherwise a layer will include every file distributed with Ruby.
Be sure not to install rdoc or ri when you set up the image, to cut down on space.
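A rough sketch of what that full base image could look like, under the assumption that it carries the build toolchain, a baseruby for building trunk, and the bench scripts, with every per-revision Ruby install layered on top (the package list and paths are guesses, not a tested recipe):

    # Hypothetical base image: everything except the Ruby build itself.
    cat > Dockerfile.base <<'EOF'
    FROM ubuntu:14.04
    RUN apt-get update \
     && apt-get install -y --no-install-recommends \
          build-essential autoconf bison subversion ruby \
          libssl-dev libyaml-dev libreadline-dev zlib1g-dev libffi-dev \
     && rm -rf /var/lib/apt/lists/*
    COPY benches/ /benches/
    EOF
    docker build -f Dockerfile.base -t ruby-bench-base:45300 .

Passing --disable-install-doc in the per-revision build (as in the earlier sketch) is one way to follow the rdoc/ri advice above.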