About the staging app: which build / repo does it use? The app I pulled from the GitHub repo behaves differently, as was noted in the API/UI staging app on Heroku thread. The staging one uses JavaScript for displaying graphs. So the question is: what are the differences between the code on staging and the code in the official repo?
I saw lazywei started a repo to benchmark Ruby using Docker. Sorry, I have not been able to review or test the code yet, so I just want to ask whether you know what state it is in now.
Also, what are your plans for the future, and what is your opinion on my idea of also testing different implementations of Ruby, as I wrote in the issue on GitHub?
Since I am totally open-minded at this point, you may tweak my direction so I can better fit your needs for this project. I still need to read more about benchmarking Discourse locally so I know more about it. But as I read on your page Call for Long Time Ruby Benchmark, there are plenty of tests to use.
PS: Sorry for the non-clickable links, I was only allowed to post 2 of them.
I am really not sure; perhaps @andypike can help with that.
Again, not sure about this; I think @lazywei will need to chime in.
I think we need something super basic and public, visible to the world. A lot of what caused this to go dormant was over-ambition; something needs to be out there. Sure, it would be nice to test different implementations and such, but the first step is having real data out there.
@sam - Well, my bachelor thesis is on Ruby benchmarking, so by now I think I can build you some background test suite. My focus was on different Ruby implementations, so Rubinius and JRuby for sure too. I am going to build a Docker-based test suite, and I thought I could push the stats to your UI. I am also sure I can make some tweaks to the UI and graph display.
@andypike - Will you be able to share with me the contents of the Heroku staging app? It's not the same as on GitHub.
@lazywei - Do you have any plans for your Docker testing? I am going to build something similar, but with more versions and different tests.
Hmm… I think we can work on "running more specific tests". Currently, ruby-bench-docker can only run the simple "built-in" test in Discourse and the built-in benchmarks in Ruby. We may work on 1. tests covering more aspects 2. more detailed reports.
@lazywei - Okay, I will take a look at the status of your repo. I am also planning to add support for more Rubies; as I mentioned, at least JRuby and Rubinius, but I think it would also be interesting to look at something like TinyRB. I am planning to add the ruby-benchmark-game to the test sets too, and I also have some tests for parallelism.
I will fork your repo and play a little bit with it.
We can figure out some common metrics for the benchmarks and some UI tweaks so we can compare stats and produce some results.
I saw you were measuring time outside the container. I would wrap the benchmarks in runners that use the Ruby Benchmark library, for example, and then run the tests measuring only the time inside the container (we don't know for sure, but running a container may slow something down); see the sketch below.
Of course that won't be possible for startup tests. What do you think, @lazywei?
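A minimal sketch of what I have in mind, assuming the benchmark scripts live in a `benchmarks/` directory inside the image and the results file name is just a placeholder:

```ruby
# runner.rb -- executed *inside* the Docker container.
# Measures only the benchmark body, not container start-up or teardown.
require 'benchmark'
require 'json'

# Hypothetical location of benchmark scripts shipped with the image.
BENCHMARKS = Dir.glob('benchmarks/*.rb')

results = BENCHMARKS.map do |script|
  # Benchmark.realtime returns the wall-clock seconds the block took.
  seconds = Benchmark.realtime { load script }
  { name: File.basename(script, '.rb'), seconds: seconds }
end

# Write results to a file the host can collect after the container exits.
File.write('results.json', JSON.pretty_generate(results))
```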
I've actually started from scratch and have come up with an MVP over at https://railsbench.herokuapp.com/. Currently, I'm running the benchmarks on a bare metal server from SoftLayer (sponsors welcome) and would love to have more collaborators on it. I really want this to take off, and I have plenty of time (I'm a student, so time is on my side) to work on the project. Below is a brief overview of what I've done so far.
Overview
The following is a rough overview of how I implemented the application:
The Web UI has a GitHub event handler which receives the commits pushed to the repository through a webhook. Once the webhook has been received, the application runs jobs against that particular commit, which execute the benchmarks on a bare metal server through SSH.
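Roughly, the idea looks something like this (an illustrative sketch only, not the actual code; names like `BENCH_HOST`, `run_benchmarks.sh`, and `run_benchmarks_for` are placeholders):

```ruby
# webhook.rb -- minimal sketch of a GitHub push-event handler.
require 'sinatra'
require 'json'
require 'open3'

BENCH_HOST = 'bench@bare-metal.example.com' # placeholder SSH target

post '/github_event_handler' do
  payload = JSON.parse(request.body.read)
  sha = payload['head_commit'] && payload['head_commit']['id']
  halt 400, 'no commit in payload' unless sha

  # Kick off the benchmark job for this commit over SSH.
  # In the real app this would be enqueued as a background job.
  Thread.new { run_benchmarks_for(sha) }
  status 202
end

def run_benchmarks_for(sha)
  stdout, stderr, status = Open3.capture3('ssh', BENCH_HOST, "run_benchmarks.sh #{sha}")
  warn stderr unless status.success?
  stdout
end
```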
In order to have consistent benchmarks, I'm renting the cheapest bare metal server from SoftLayer. I'm not sure if we really need a bare metal server, but Sam Saffron mentioned something about it in his blog post. The benchmarks are executed in Docker containers, which is similar to what Bert did for GSoC and in line with the discussion here: Runner on Docker.
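For illustration only, the per-commit container invocation on the bare metal box could look something like this (the image name, mount paths, and entry script are assumptions, not the project's actual layout):

```ruby
# Hypothetical wrapper the SSH'd job could run on the bare metal server.
def run_in_container(commit_sha)
  system(
    'docker', 'run', '--rm',
    '-e', "COMMIT=#{commit_sha}",        # commit to benchmark
    '-v', "#{Dir.pwd}/results:/results", # collect results back on the host
    'rubybench/runner',                  # placeholder image name
    '/benchmarks/run_all.sh'             # placeholder entry script
  )
end
```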
For the Rails benchmark, I forked ko1-test-app by Aaron and modified it to post the results of object allocations for an index action which fetches and renders 100 users and their attributes. I read the previous thread and felt that creating our own app is the cheapest way to get things going. Discourse is currently running on Rails master, but it will be tough if we're benchmarking "two moving targets" (~ Matthew). Perhaps we might be able to use a fixed codebase of Discourse and run the benchmarks against it.
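As a rough sketch of how an allocation count for a single request can be captured (this is not the actual ko1-test-app instrumentation; the route is a placeholder and it assumes the script is run from the Rails app root):

```ruby
# allocation_probe.rb -- illustrative only.
# Counts objects allocated while serving one index request.
require_relative 'config/environment' # load the Rails app
require 'rack/test'
include Rack::Test::Methods

def app
  Rails.application
end

get '/users' # warm-up request so boot and caching costs are excluded

before = GC.stat(:total_allocated_objects)
get '/users' # the measured request: fetches and renders 100 users
after  = GC.stat(:total_allocated_objects)

puts "allocations: #{after - before}"
```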
For the Ruby benchmark, I used the existing Ruby benchmarks. I'm not sure if this is relevant to the Rails community, but I thought it would be good to implement it anyway. Currently I'm running the benchmarks once, as the whole suite takes close to 6 minutes on my machine.
Next Steps
What other metrics do we need to track? (ko1-test-app has scripts for GC, number of requests/s…)
Set up a webhook on the Rails repo so that I don't have to track the changes from a fork.
Thoughts and feedback in general?
More eyes to look at my code
Some sort of variation tracking system to notify contributors of anomalies
Per-release benchmarks of major apps?
Improve design
Just one thing is unclear to me: on which repo are the webhooks that trigger the benchmarks, and why?
The GitHub webhooks are being triggered by me manually each day from a fork of the repo. This will be the case until I'm able to get the Rails core and Ruby core teams to add the webhooks to the official repos.
And the Web UI is kind of misleading to me; I don't know what you are trying to say with the results and what the numbers mean.
Ah yes. This part will definitely need more work to make it clear. For now I just wanted a quick proof of concept before more work is done. Anyway, for Rails, the numbers represent object allocations for an index action which fetches 100 users from the DB and renders them in a list. For Ruby, the numbers represent the time in seconds for each benchmark to run.
Ah, OK. I just had a brief look at your repository. I think the main difference from what I'm working on is that I'm only interested in performance tracking of Ruby trunk and Rails master. In summary, I'm running the benchmarks on a per-commit basis.
@tgxworld - Yeah, but I think our ideas can merge somewhere in between. My plan is to run benchmarks like the "ruby benchmark game", the official Ruby benchmarks, or the cursera benchmarks on different versions of Ruby (I am managing the Dockerfiles with the Ruby versions manually for now - new versions don't come out that often).
My idea was to present the same benchmark on different versions and learn how and why the speed of Ruby code changes (yeah, sometimes ruby-1.9.2 was much faster in my tests than ruby-2.1.2, mainly because of security issues).
Also, I saw you trigger the tests on each commit, which can be unnecessary sometimes. I saw a commit where just a small typo in a comment was fixed, and some benchmark went from 3.00 to 2.90, which I think should not happen.
I think we can work something out together, even as two separate platforms (I will just use your UI but present other results).
My idea was to present the same benchmark on different versions and learn how and why the speed of Ruby code changes (yeah, sometimes ruby-1.9.2 was much faster in my tests than ruby-2.1.2, mainly because of security issues).
Yup! I think this is one of the objectives we want to achieve: upgrade to Ruby 2.X to get XX% of performance improvement.
Also, I saw you trigger the tests on each commit, which can be unnecessary sometimes. I saw a commit where just a small typo in a comment was fixed, and some benchmark went from 3.00 to 2.90, which I think should not happen.
Yeah, I'm skipping the benchmarks when [CI SKIP] is present, though.
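For illustration, the filter can be as simple as checking the commit message in the webhook payload before enqueueing a job — a hedged sketch, not the project's actual code:

```ruby
# Hypothetical guard used before enqueueing a benchmark job.
# Checks the commit message from the GitHub push payload for a skip marker.
def skip_benchmarks?(payload)
  message = (payload['head_commit'] && payload['head_commit']['message']).to_s
  message.downcase.include?('[ci skip]')
end
```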
As I just checked, I only have 13 tests in the suite right now, and for all the versions I have (12 currently - 10 MRI plus JRuby and Rubinius) it takes about 1.5-2 days, I think, to finish. I am running each test 10 times.
I will talk to my bachelor thesis advisor (the thesis and advisor are from Red Hat), so maybe I can get some machines on OpenShift where I can run my tests without disruptions.
I will post them here then. I also have to get other tests running, but I am not sure about that, as Sam (I think) said that there are still troubles running the cursera benchmarks on Rubinius, for example.
I am just waiting for some metal to run the tests on, as it takes a few days to finish. Also, I want to ask if you have any suggestions on tests to run?
@tgxworld - I think we can split the benchmarking job, so you can focus mainly on Rails benchmarking and I will take care of Ruby benchmarking. Also, I think it would be good to test some other implementations too, some without a GIL, as they can support true concurrency. Maybe it will end up giving some useful results.
I would love to have a single, useful site where people can go for both high-level and detailed Ruby and Rails performance information. If we could have that, then I'd gladly surrender the IRFY domain. We don't need a plethora of semi-similar projects.
I'm willing to help out a bit. What are your next steps?
@richard_ludvigh Why does your suite take so long? If your suite takes multiple days to run, it can make iterating very hard. I had a goal with IRFY that it had to finish overnight. (Granted, IRFY is tightly coupled because it's a mountain of spaghetti Ruby and Bash.)
Yup, lots more work to do on the UI. Thanks for the examples!
Seems like the main concern now is the presentation of the data, which I'll work on this week. Other than that, I'm currently checking with the Rails core team to find out what they need, or rather what needs to be tracked. Hmm, I'm open to options.
@brianhempel - It needs a couple of days to run each benchmark 10 times on 12 different versions. After we have these results, we will only need 1/12 of that time to test one new version after a release (some benchmarks take around 3 minutes; now do that 10 times for 12 versions and you've got 6 hours already).
Also, the more test runs, the more accurate the results.
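Just to spell out the arithmetic behind that estimate (the 3-minute figure is the example from above, not a measured average):

```ruby
# Back-of-the-envelope estimate of the suite duration for one slow benchmark.
minutes_per_run  = 3   # example benchmark duration from above
runs_per_version = 10  # each benchmark is repeated 10 times
versions         = 12  # 10 MRI versions + JRuby + Rubinius

total_minutes = minutes_per_run * runs_per_version * versions
puts "#{total_minutes} minutes = #{total_minutes / 60.0} hours" # => 360 minutes = 6.0 hours
```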