Ruby Bench Intros

Hi All,
I’m Dan McClain, a partner at DockYard, a Rails and Ember consultancy. I started working with Ruby about 3-4 years ago, transitioning to it full time after leaving .NET 2 years ago. I maintain a couple of ActiveRecord-related gems, and have started really thinking about performance. I don’t have much profiling experience, but definitely want to change that. I’d be interested in working on both the front end and the data gathering. I’d be able to help most on the front end (API and, as @rwjblue suggested, possibly an Ember UI) until I get up to speed on benchmarking practices. Most weeks I’d be able to contribute 5 hours, possibly more.

I won’t be able to make the call at 5PM today

Perhaps we should schedule a call at a better time, or simply start planning without a call? Herding cats is complicated.

Indeed it is complicated. What we need is a quick breakdown of the steps, and an overall game plan. Doesn’t need to be super detailed yet, though.

I am still game for a call to lay out the basic ideas and decide a general path forward…

Basic minutes of call:

@rwjblue @andypike @mariovisic and @sam attended.

  • We want to build this on Ruby all the way through, so will not reuse existing PyPy or Go front ends

  • The project will be split into 3 parts: UI, API and Runner. The Runner will post JSON results to the API, and the UI will talk to the API

  • Some suggested an Ember front end for the UI. Personally, I am happy with whatever the team working on the API decides on, be it Ember or traditional. I do want to ensure it is very easy to contribute to the front end.

  • I created a home for the project on GitHub and will add people to the organisation as needed: https://github.com/organizations/ruby-bench

  • @andypike and @rwjblue are interested in the UI / API. @mariovisic and I are interested in the Runner

  • Our first priority is to define a “language / format” for results posted to the API, since everything flows from that. @rwjblue to create a topic here to discuss a proposed format

  • Timelines are complicated to set at the moment

  • The long-term goal is to isolate performance gains and regressions for a particular bench suite.

  • The engine should be capable of running a suite comparing MRI performance over time, Rails performance over time and so on. When comparing performance over time, the environment will be fixed except for the component being tested. So, for example, if testing Rails perf over time one would only change the Rails git repo and keep gems and Ruby versions fixed.

  • We need to think of an efficient way to bootstrap environments. I suggested storing historic builds of every version of Ruby to speed up running any historic benches we need.
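The "fix everything except the component under test" rule from the minutes could be sketched roughly as below. This is only an illustration: the hash keys, the version strings, and the revision names are placeholders, not an agreed format.

```ruby
# Build one environment description per revision. Only the named component
# changes; every other key is copied unchanged from the baseline.
def environments_over_time(baseline, component, revisions)
  revisions.map { |revision| baseline.merge(component => revision) }
end

# Hypothetical baseline: the keys and values are placeholders.
baseline = {
  ruby_version:   "2.0.0-p353",
  gems:           { "rack" => "1.5.2" },
  rails_revision: nil
}

# Testing Rails over time: vary the Rails revision, keep Ruby and gems fixed.
runs = environments_over_time(baseline, :rails_revision, %w[abc123 def456])
runs.each { |env| puts env.inspect }
```

Swapping `:rails_revision` for `:ruby_version` would give the "MRI over time" case with the same helper.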

:slight_smile:

Howdy all! Looks like I’m a bit late, but I’m Ben Weintraub, relatively new to Ruby (~2.5 years) by way of a mishmash of C, Objective-C, and lots of other stuff. I work at New Relic on the newrelic_rpm gem, and have set up a system kind of like this internally (though much more limited in scope) for our internal benchmarks. I’m interested in helping to build something better and more general.

I’m specifically interested in helping out with the runner or the API (or both).

I’ve got probably 2-5 hours / week that I can dedicate, though possibly more for a spell early next year.

That’s an awesome start. Sorry for not being able to make it.
About bootstrapping the environments, Docker has already been mentioned, and I think it’s definitely the way to go to easily get stored and quickly bootable builds.

I could definitely work on setting up this kind of environment.

I’ll be immediately able to contribute to the front end/API if you are looking to allocate resources. I would be interested in the runner, but I’d need to get up to speed before being able to make super useful contributions.

My concern with Docker is the sheer number of images we would need for all the Ruby versions.

I would like to be able to keep a ruby compile for every commit on head around so we can quickly zoom a bench through every version of Ruby in the last year.

This is fairly simple to do with rbenv as all the persistent data lives in a single dir.

I completely love Docker, we now use it for deployment, but I am concerned it may be a bit overkill here since rbenv can do the job anyway.

API/UI is fine. Can you coordinate with @andypike? He wants to get started.

I like the idea of using ruby-build + rbenv to build each Ruby version and be able to keep them all cached to quickly run benchmarks through all of them.

I’m currently working on a script to automate the builds for this, and did a few back-of-the-envelope calculations on the storage requirements for keeping this many builds around:

In 2013, there were around 5000 commits to Ruby trunk. Each compiled build of Ruby in that series is ~23 MB (that is, if you build with the --disable-install-doc configure option, which cuts down the size by almost an order of magnitude). So 5000 commits * 23 MB per build = 115,000 MB, or ~112 GB.

The builds compress pretty well with tar + gzip, and not all need to be expanded on disk at once. Tarballs for each build would be ~6.5 MB, for a total of ~31 GB.
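For anyone who wants to redo the back-of-the-envelope numbers with different inputs, here is a minimal sketch of the arithmetic. It uses only the figures quoted above and assumes 1 GB = 1024 MB, which is what makes the totals come out at roughly 112 GB and 31 GB.

```ruby
# Total storage in GB for `count` builds of `size_mb` megabytes each,
# assuming 1 GB = 1024 MB.
def total_gb(count, size_mb)
  count * size_mb / 1024.0
end

commits    = 5000  # approximate commits to Ruby trunk in 2013
build_mb   = 23.0  # one build with --disable-install-doc
tarball_mb = 6.5   # the same build after tar + gzip

puts format("uncompressed: %.1f GB", total_gb(commits, build_mb))
puts format("tarballs:     %.1f GB", total_gb(commits, tarball_mb))
```

This prints 112.3 GB for the expanded builds and 31.7 GB for the tarballs.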

I bet that there’s a ton of duplication between builds, so an even better (slightly crazy?) approach might be to store the compiled builds in a git repo - one build per commit, layered on top of each other in chronological order. I believe git would take care of compression and de-duping for us, and I bet it would be able to significantly bring down the storage cost to the range of 5-10 GB.

Once I’ve got a goodly number of builds locally (say around 100) I’m going to try the git experiment and see how it goes.

Let us know how you go, really want to get some code into the runner area.

@andypike I made you admin so you are not blocked. Thanks heaps for all your work here, I hope to free up some time to help out more soon.

Hi all, I’m Luca Mearelli, web developer / freelancer, and user of Ruby on Rails since the ancient 0.13 (was 2005/6 right?). I’ve been doing performance optimizations regularly for various clients and I’d like to lend a hand to this project. I’ll probably have 2-4 hours per week (I’ll most probably be able to do a few sprints here and there) and I’d be interested in the API / runner part or helping with some visualizations on the front end (knowing enough d3js :wink: )

where do I start?

@benweint one other option might be using tarsnap to store the builds as it behaves a bit like a remote tar+gz (it’s backed by S3) but has impressive deduplication capabilities.

It might be too slow though to store all the builds there … it depends on how often we need to access them but it can make sense as a second level of long term storage for the built environments.

If I were to experiment with it to see its storage requirements and speed, where should I start? Would you mind sharing the build script you are working on?

@luca here’s what I’ve got so far: https://gist.github.com/benweint/8101511

Based on a sample of 100 builds, this seems to be working really well - it looks like the total size of storing all of the builds from the past year in git will only be ~2 GB, even less than I thought.

I’m running a longer test now with all ~5000 builds from the past year.

tarsnap looks cool, but my biggest concern there would be cost. I also found bup - backup software that’s based on the git packfile format, so it should get the same de-duping and compression capabilities (honestly, some of git’s other features seem like they might be useful for us).

Other things to think about:

  • How do we annotate builds to keep track of what platforms they were built on, and using what options?
  • How do we deal with builds that fail?

I think we should add an endpoint on the API to flag this, then runners can query for “bad” builds and skip them.
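On the runner side, skipping flagged builds might look something like the sketch below. The endpoint does not exist yet, so the list of bad builds is hard-coded here as a stand-in for whatever the API would return, and the revision names are invented.

```ruby
require "set"

# Return the candidate revisions that have not been flagged as bad builds,
# preserving their original order.
def runnable_revisions(candidates, bad_builds)
  candidates.reject { |revision| bad_builds.include?(revision) }
end

# Stand-in for the response of a hypothetical "bad builds" API query.
bad_builds = Set.new(%w[r44301 r44305])

# Invented revision names the runner would otherwise benchmark.
candidates = %w[r44300 r44301 r44302 r44305 r44306]

puts runnable_revisions(candidates, bad_builds).inspect
```

Using a `Set` keeps the lookup cheap even if the list of failed builds grows to thousands of entries.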

This should be included in the result file.


I think a great next step would be to run a single bench (pick 1 or 2 from the benchmark dir in Ruby) and output the results in an API-friendly fashion. You can upload / link the results here and @andypike and team can graph it using the front end.
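As a starting point for that step, here is a rough sketch that times a block with the standard `Benchmark` module and prints the result as JSON. The field names are placeholders because the result format has not been agreed yet, and the workload is a trivial stand-in rather than one of the scripts from Ruby's benchmark dir.

```ruby
require "benchmark"
require "json"

# Run the given block `iterations` times and return a hash that can be
# serialised to JSON. The keys are hypothetical, not an agreed API format.
def run_benchmark(name, iterations: 5)
  timings = Array.new(iterations) { Benchmark.realtime { yield } }
  {
    "benchmark"    => name,
    "ruby_version" => RUBY_VERSION,
    "platform"     => RUBY_PLATFORM,
    "iterations"   => iterations,
    "timings_sec"  => timings,
    "best_sec"     => timings.min
  }
end

# Stand-in workload; a real run would load a script from Ruby's benchmark dir.
result = run_benchmark("string_concat") do
  s = String.new
  10_000.times { s << "x" }
end

puts JSON.pretty_generate(result)
```

Keeping the raw timings alongside the best time leaves the choice of aggregate (min, mean, median) to the API or the front end.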

Hi everyone,

Just a heads up really. We have a basic Rails app that has an endpoint you can use to send results to (https://github.com/ruby-bench/ruby-bench). The results are stored and there is a basic UI where users can see a graph of a selected benchmark across Ruby versions to see how scores have changed over time. If there are multiple results for the same benchmark and Ruby version, we currently average the scores.
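To make the averaging rule concrete, here is a plain-Ruby sketch of the idea. It is not the code from the Rails app (which may well do this in the database); the record shape and the scores are made up for illustration.

```ruby
# Average the scores of all results that share a benchmark name and
# Ruby version. Returns a hash keyed by [benchmark, ruby_version].
def average_scores(results)
  results
    .group_by { |r| [r[:benchmark], r[:ruby_version]] }
    .transform_values { |group| group.sum { |r| r[:score] } / group.size.to_f }
end

# Made-up results: two runs on one Ruby version, one run on another.
results = [
  { benchmark: "so_nbody", ruby_version: "2.0.0", score: 4.0 },
  { benchmark: "so_nbody", ruby_version: "2.0.0", score: 5.0 },
  { benchmark: "so_nbody", ruby_version: "2.1.0", score: 3.0 }
]

average_scores(results).each do |(benchmark, version), score|
  puts "#{benchmark} on #{version}: #{score}"
end
```

With the sample data above, the two 2.0.0 runs collapse into a single point at 4.5 and the 2.1.0 run stays at 3.0.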

In terms of moving forward, can I suggest that if the runner team would like changes/additions to the API, you create an issue in the GitHub project? We can work on those and then close them when done. Feel free to propose whatever (including API formats) and we’ll tweak as required to get you something that works. Also, in terms of the front end, please again create an issue describing what reports etc. you would like to see and we’ll make that happen too :smile:. Of course you can also send a PR if you fancy :wink:.

Once we have a server (or servers) for the API/UI we’ll set up deployments so you have live endpoints to call against. I imagine we’ll have staging and production environments so you can test out stuff before going to production. We can discuss that in more detail later.

Any questions, please let me know.

Cheers

Andy

This, or maybe if we go the git route, just annotate the commit message for the build with some machine-readable text (e.g. a JSON object with the data describing the build: platform, revision, options, …)

otherwise we could store the history of the builds on the API service, after a commit the builder script would send a message to the API with the revision + other descriptive data (this solution could open ways to integrate the build information in the frontend)

what info do we need to keep for each build?
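For the commit-message idea, a minimal sketch of writing and reading the annotation could look like this. The message layout (a subject line, a blank line, then one JSON document) and the metadata keys are assumptions made for the example, not a settled convention.

```ruby
require "json"

# Compose a commit message: a human-readable subject line, a blank line,
# then the build metadata as a single JSON document.
def build_commit_message(revision, metadata)
  "Build of #{revision}\n\n#{JSON.generate(metadata)}\n"
end

# Recover the metadata hash from a message written by build_commit_message.
def parse_build_metadata(message)
  _subject, body = message.split("\n\n", 2)
  JSON.parse(body)
end

# Hypothetical metadata for one build; the keys and values are placeholders.
metadata = {
  "revision" => "r44300",
  "platform" => RUBY_PLATFORM,
  "options"  => ["--disable-install-doc"]
}

message = build_commit_message("r44300", metadata)
puts message
puts parse_build_metadata(message).inspect
```

The same hash could just as easily be posted to the API instead, so the two options in this post are not mutually exclusive.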

Hi all,

I’m Shihang and I am a newcomer to the Rails community. I am not sure how I can help this project, but I think I can probably run some benchmark tests on EC2 because I have a few credits from my class project :smile:

This topic is now closed. New replies are no longer allowed.

all rubybench discussion is at http://community.rubybench.org/

cc @shihangw @benweint @rwjblue