My concern with Docker is the sheer number of images we would need for all the Ruby versions.
I would like to be able to keep a ruby compile for every commit on head around so we can quickly zoom a bench through every version of Ruby in the last year.
This is fairly simple to do with rbenv as all the persistent data lives in a single dir.
I completely love Docker, we now use it for deployment, but I am concerned it may be a bit overkill here since rbenv can do the job anyway.
API/UI is fine. Can you coordinate with @andypike? He wants to get started.
I like the idea of using ruby-build + rbenv to build each Ruby version and be able to keep them all cached to quickly run benchmarks through all of them.
I’m currently working on a script to automate the builds for this, and did a few back-of-the-envelope calculations on the storage requirements for keeping this many builds around:
In 2013, there were around 5000 commits to Ruby trunk. Each compiled build of Ruby in that series is ~23 MB (that is, if you build with the --disable-install-doc configure option, which cuts down the size by almost an order of magnitude). So 5000 commits * 23 MB per build = ~112 GB.
The builds compress pretty well with tar + gzip, and not all need to be expanded on disk at once. Tarballs for each build would be ~6.5 MB, for a total of ~31 GB.
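For anyone who wants to tweak the inputs, the estimates above can be reproduced with a few lines of Ruby. This is only a sketch: it assumes 1 GB = 1024 MB (which is what makes 5000 builds at 23 MB come out to ~112 GB), and the `total_gb` helper is just a name made up for illustration.

```ruby
# Rough storage estimate for keeping one compiled build per commit.
COMMITS_PER_YEAR = 5000  # approximate 2013 commit count quoted above
MB_PER_GB = 1024.0       # assumption: binary gigabytes

# Hypothetical helper: total size in GB for a number of builds of a given size.
def total_gb(commits, mb_per_build)
  commits * mb_per_build / MB_PER_GB
end

expanded = total_gb(COMMITS_PER_YEAR, 23)   # builds expanded on disk
tarballs = total_gb(COMMITS_PER_YEAR, 6.5)  # tar + gzip per build

puts format("expanded: ~%.0f GB", expanded)
puts format("tarballs: ~%.1f GB", tarballs)
```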
I bet that there’s a ton of duplication between builds, so an even better (slightly crazy?) approach might be to store the compiled builds in a git repo - one build per commit, layered on top of each other in chronological order. I believe git would take care of compression and de-duping for us, and I bet it would be able to significantly bring down the store cost to the range of 5-10 GB.
Once I’ve got a goodly number of builds locally (say around 100) I’m going to try the git experiment and see how it goes.
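A cheap way to sanity-check the duplication hunch before the git experiment would be to hash every file across a handful of build directories and compare total bytes against unique bytes. A rough sketch follows; `dedup_stats` and the `builds/*` layout are made up for illustration. It only catches whole-file duplicates, while git packfiles also delta-compress similar files, so this would likely understate the savings.

```ruby
require "digest"
require "find"

# Returns [total_bytes, unique_bytes] across all regular files under the
# given build directories. Symlinks are skipped so targets are not counted twice.
def dedup_stats(build_dirs)
  total = 0
  unique = {} # content digest => file size
  build_dirs.each do |dir|
    Find.find(dir) do |path|
      next if File.symlink?(path) || !File.file?(path)
      size = File.size(path)
      total += size
      unique[Digest::SHA256.file(path).hexdigest] ||= size
    end
  end
  [total, unique.values.sum]
end
```

Usage would be something like `total, unique = dedup_stats(Dir.glob("builds/*"))`, where `builds/` is wherever the compiled versions happen to live.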
Let us know how you go, really want to get some code into the runner area.
@andypike I made you admin so you are not blocked, thanks heaps for all your work here, hope to clear up to help out more soon.
Hi all, I’m Luca Mearelli, web developer / freelancer, and user of Ruby on Rails since the ancient 0.13 (was 2005/6 right?). I’ve been doing performance optimizations regularly for various clients and I’d like to lend a hand to this project. I’ll probably have 2-4 hours per week (I’ll most probably be able to do a few sprints here and there) and I’d be interested in the API / runner part or helping with some visualizations on the front end (knowing enough d3js).
where do I start?
@benweint one other option might be using tarsnap to store the builds as it behaves a bit like a remote tar+gz (it’s backed by S3) but has impressive deduplication capabilities.
It might be too slow though to store all the builds there … it depends on how often we need to access them but it can make sense as a second level of long term storage for the built environments.
If I were to experiment with it to see its storage requirements and speed, where should I start? Would you mind sharing the build script you are working on?
@luca here’s what I’ve got so far: https://gist.github.com/benweint/8101511
Based on a sample of 100 builds, this seems to be working really well - it looks like the total size of storing all of the builds from the past year in git will only be ~2 GB, even less than I thought.
I’m running a longer test now with all ~5000 builds from the past year.
tarsnap looks cool, but my biggest concern there would be cost. I also found bup - backup software that’s based on the git packfile format, so should get the same de-duping and compression capabilities (honestly, some of git’s other features seem like they might be useful for us).
Other things to think about:
- How do we annotate builds to keep track of what platforms they were built on, and using what options?
- How do we deal with builds that fail?
I think we should add an endpoint on the API to flag this, then runners can query for “bad” builds and skip them.
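To make that concrete, the runner side could look roughly like this. It is a sketch only: the endpoint does not exist yet, so the flagged revisions are passed in as a plain array rather than fetched, and `runnable_revisions` is a made-up helper name.

```ruby
require "set"

# Given every candidate revision and the revisions flagged as failed builds,
# return the ones a runner should still benchmark, preserving their order.
def runnable_revisions(all_revisions, bad_revisions)
  bad = bad_revisions.to_set
  all_revisions.reject { |rev| bad.include?(rev) }
end
```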
This should be included in the result file.
I think a great next step would be to run a single bench (pick 1 or 2 from the benchmark dir in ruby) and output the results in an API-friendly fashion. You can upload / link the results here and @andypike and team can graph it using the front end.
Just a heads up really. We have a basic rails app that has an end-point you can use to send results to (https://github.com/ruby-bench/ruby-bench). The results are stored and there is a basic UI where users can see a graph of a selected benchmark across ruby versions to see how scores have changed over time. If there are multiple results for the same benchmark and ruby version we average the scores currently.
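For the runner side, the averaging described above amounts to something like the following. This is an illustration only, not the app's actual code; the hash keys (`:benchmark`, `:ruby_version`, `:score`) and the `average_scores` name are assumptions.

```ruby
# Average the scores of results that share a benchmark and Ruby version.
# Returns a hash keyed by [benchmark, ruby_version].
def average_scores(results)
  results
    .group_by { |r| [r[:benchmark], r[:ruby_version]] }
    .transform_values { |rs| rs.sum { |r| r[:score] }.to_f / rs.size }
end
```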
In terms of moving forward, can I suggest that if the runner team would like changes/additions to the API that you create an issue in the github project? We can work on those and then close when done. Feel free to propose whatever (including API formats) and we’ll tweak as required to get you something that works. Also, in terms of the front-end, please again create an issue describing what reports etc you would like to see and we’ll make that happen too. Of course you can also send a PR if you fancy.
Once we have a server (or servers) for the API/UI we’ll set up deployments so you have live end-points to call against. I imagine we’ll have staging and production environments so you can test out stuff before going to production. We can discuss that in more detail later.
Any questions, please let me know.
This, or maybe if we go the git route, just annotate the commit message for the build with some machine-readable text (e.g. JSON with the data describing the build: platform, revision, options, …)
otherwise we could store the history of the builds on the API service, after a commit the builder script would send a message to the API with the revision + other descriptive data (this solution could open ways to integrate the build information in the frontend)
what info do we need to keep for each build?
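For the commit-message option, a minimal sketch of what writing and reading the annotation could look like is below. The field names (`:revision`, `:platform`, `:configure_options`), the subject-line format, and both helper names are assumptions for illustration, not an agreed format.

```ruby
require "json"

# Build a commit message: a human-readable subject line, a blank line,
# then a JSON body describing the build.
def build_commit_message(metadata)
  subject = "Build of Ruby r#{metadata.fetch(:revision)}"
  "#{subject}\n\n#{JSON.pretty_generate(metadata)}\n"
end

# Recover the metadata hash from a message produced by build_commit_message.
def parse_commit_message(message)
  _subject, body = message.split("\n\n", 2)
  JSON.parse(body, symbolize_names: true)
end
```

Keeping the JSON below a normal subject line means `git log --oneline` stays readable while scripts can still parse the body.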
I’m Shihang and I am a newcomer to the Rails community. I am not sure how I can help this project, but I think I can probably run some benchmark tests on EC2 because I have a few credits from my class project.
all rubybench discussion is at http://community.rubybench.org/
cc @shihangw @benweint @rwjblue