Status, progress, cooperation

@tgxworld - I will put together a sample of how I imagine my test results presentation webpage during the holidays. I think it might be a good subsection of your page (or a separate domain following the same naming principle as railsbench). It will be simple and self-explanatory, I hope.

Hopefully you will like it, guys.

@tgxworld - here is a small example of what I imagine for the Ruby bench presentation: http://rubybench-rycco.rhcloud.com/benchmarks

@richard_ludvigh Looks good to me. What are your objectives for your thesis, though? From what I can see, it's moving towards what @brianhempel has already done.

@tgxworld - my objective is to benchmark different implementations outside of Rails as well (for now I am not doing any Rails benchmarks, since I know you are doing them :slight_smile: )

Part of the thesis is also a benchmark suite which you can run on your local computer. The results of the suite running on my server will be published on the site I am currently making.

I was thinking that you could focus on Rails benchmarking while I focus on Ruby benchmarking, and use my site as part of yours. Or a subdomain.

Hi @sam, I was away on vacation for the past week, so there wasn’t much of an update.

I could only go as far back as Mar 2 2014 before running into compilation issues. It’s expected to take about 30 hours from now, as there are over 300 commits. (3,000 commits, but I sliced them into groups of 10.)
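For illustration only, here is a minimal sketch of what slicing 3,000 commits into groups of 10 and benchmarking one representative per group might look like. The commit list and helper name are made up for the example; only the `each_slice` technique is the point.

```ruby
# Sample every 10th commit from a chronological list, so 3,000 commits
# become ~300 benchmark runs. Illustrative only; not RailsBench's code.
def sample_commits(commits, step: 10)
  commits.each_slice(step).map(&:first)
end

all_commits = (1..3000).map { |n| format("commit-%04d", n) } # stand-in SHAs
to_benchmark = sample_commits(all_commits)
puts to_benchmark.size # 300 groups, one representative each
```

Taking the first commit of each slice keeps the samples evenly spaced, so a regression can later be bisected within a single group of 10.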

Since you’ll have a better vision of what is needed, do you have an idea of what else needs to be done? :smiley:
Feel free to add tasks to this board and I’ll get them done. Trello

Also, do we still have the bare metal server provisioned by Ninefold? I’m currently using a $500-off-first-purchase offer from SoftLayer for a bare metal machine. Once it runs out, we’re going to have to switch, and that means re-running most of the benchmarks.

@tgxworld - as I see, you made it look like my concept :smiley: Are you also focusing on bare Ruby benchmarks?

@richard_ludvigh Yup :smile: I’m not really focusing on it per se; I just do what I can. My worry here is that because you’re working on the project as part of your thesis, your code probably can’t be opened up for collaboration. Currently I’m just cleaning up what I’ve done for Ruby benchmarks and am looking for my next direction.

@tgxworld - Well, it’s open source, so there is no problem about my code not being open. The thing is that I just have an image of how it should look in the end, and I also want to be able to download and test all versions locally with my suite.

I think the best thing for us is to split the work: I am focusing on Ruby, and you made a site called railsbench :smiley:, so you should probably focus on benchmarking Rails. :slight_smile: I am also trying to make my suite open to all benchmarks, so when you paste some code into the benchmark folder it should test it :slight_smile: - that’s not working now, it’s a feature for the future. I have also built an API so everyone can download the results and compute their own statistics (each benchmark is run 20-30 times). I want to try memory benchmarks too :slight_smile:
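Since each benchmark is run 20-30 times, an API consumer would typically want to collapse those raw timings into summary statistics. A minimal sketch, assuming a hypothetical JSON payload shape (the field names here are illustrative, not the actual API format):

```ruby
# Summarize the raw timings the results API might return for one
# benchmark/version pair. The JSON shape below is an assumption.
require "json"

def summarize(times)
  mean = times.sum / times.size.to_f
  variance = times.sum { |t| (t - mean)**2 } / times.size
  { mean: mean, stddev: Math.sqrt(variance), min: times.min, max: times.max }
end

raw = JSON.parse('{"benchmark":"bm_app_fib","ruby":"2.2.0","times":[1.02,0.98,1.01,1.00,0.99]}')
stats = summarize(raw["times"])
puts format("mean=%.3fs stddev=%.3fs min=%.3fs", stats[:mean], stats[:stddev], stats[:min])
```

Publishing the raw runs rather than a single number is what makes this kind of client-side analysis possible in the first place.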

Also, as I see it, the only thing separating us is that your approach is commit-based benchmarking while mine is version-based benchmarking (as it’s not practical to download the benchmark suite and test all commits locally) :smile:

So my suggestion is that we split the work and make sister sites (railsbench, rubybench), each deeply focused on its own benchmarks.

Just to throw another immediate problem on the table:

https://p8952.info/ruby/2014/12/12/benchmarking-ruby-with-gcc-and-clang.html

I don’t trust these numbers at all; they were run on a virtual host. That said, it’s conceivable there is a difference between jemalloc on/off, O2 vs O3, clang vs gcc, and gcc 4.8 vs 4.9. It would be great to get some honest, replicable results, ideally running the Discourse bench and/or the full bench suite.

@sam - yeah, all my tests run inside a container with Ubuntu 14.04, which means GCC 4.9 with O2. According to his post this configuration has the best results, and I think it’s not a bad configuration to benchmark on, as many people deploying Ruby (or Rails) applications use Ubuntu servers (speaking from my own experience at work).

But I will definitely mention that on the benchmark site. Thank you, Sam!

Most people will be on 4.8 with O3 (the latest Ubuntu ships GCC 4.8, and O3 is the Makefile default).

So having a meaningful comparison here would be very handy.
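Before attributing a benchmark delta to GCC 4.8 vs 4.9 or O2 vs O3, it is worth confirming what a given Ruby binary was actually compiled with. Ruby records this in `RbConfig`, so a quick check might look like:

```ruby
# Print the compiler and optimization flags this Ruby was built with.
# A sanity check before comparing builds across GCC versions or -O levels.
require "rbconfig"

puts "compiler: #{RbConfig::CONFIG['CC']}"
puts "optflags: #{RbConfig::CONFIG['optflags']}"
puts "cflags:   #{RbConfig::CONFIG['CFLAGS']}"
```

Running this inside each Docker image would confirm that the GCC 4.9/O2 and GCC 4.8/O3 builds really differ only in the intended variables.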

@sam, yeah, thanks for the notice. I will build Docker images for all the Rubies I have using GCC 4.9 with O2 and GCC 4.8 with O3. Maybe I will get some useful results :smile:

Seems like I will pay for some machine to test it for now, but my thesis advisor sent me a link to this site ( https://crissic.net/open-source_free-hosting ). Maybe I can try talking to them about some machines.

Hi @sam, more updates!

So as per your discussion with @brianhempel here, I’ve implemented the benchmarks for Discourse against Ruby Trunk and Discourse against Rails Head.

Discourse against Ruby Trunk

Since we’re only interested in the performance of Ruby in this case, what I’ve done is to run the Discourse benchmarks off a forked Discourse repo’s stable branch. That means the version of Discourse we use is fixed, with Ruby as the only changing variable. Postgres and Redis are fixed at 9.3.5 and 2.8.19 respectively. I’ve already started backfilling data back to Mar 2014. You can have a look at the benchmarks here: https://railsbench.herokuapp.com/tgxworld/ruby?utf8=✓&result_types[]=discourse_ruby_trunk_categories&commit=Submit
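The setup described above can be sketched as the list of shell steps a benchmark job might run for one Ruby commit. Everything here is hypothetical: the paths, prefixes, and branch name are illustrative, not RailsBench’s actual layout; only the idea of pinning Discourse while varying the Ruby build is from the post.

```ruby
# Hypothetical sketch: given a Ruby commit SHA, produce the shell commands a
# job could run. Discourse stays pinned to a stable branch so Ruby is the
# only changing variable. Paths and flags are made up for illustration.
def job_commands(ruby_sha, discourse_ref: "stable")
  [
    "git -C ruby checkout #{ruby_sha}",
    "cd ruby && autoconf && ./configure --prefix=/opt/rubies/#{ruby_sha} && make -j4 install",
    "git -C discourse checkout #{discourse_ref}",
    "PATH=/opt/rubies/#{ruby_sha}/bin:$PATH ruby discourse/script/bench.rb"
  ]
end

job_commands("abc1234").each { |cmd| puts cmd }
```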

Discourse against Rails Head

For Discourse against Rails head, I decided to keep Discourse constant by resetting to a particular commit for every run. Similarly, Postgres and Redis are fixed at 9.3.5 and 2.8.19 respectively. I’m still waiting for the Ruby jobs to finish before adding more data for Rails. Rails is a little trickier for me due to dependency issues and new/old commits breaking things with the Discourse bench.

As you may have noticed, I’ve touched up the UI quite a bit. I’ve also started reporting performance issues I’ve noticed back to Ruby core, so that is going well too.

One immediate problem I have right now is the cost of a bare metal server, so I’m hoping we still have that server Ninefold provisioned. (If we change the bare metal server, we have to rerun all benchmarks.)

That’s all from me. Happy Holidays!

@tgxworld - seems nice to me :smile:

One question: how much do you pay for that bare metal? It seems pretty powerful to me :smile:

I am currently running some tests across MRI versions (GCC 4.8 and GCC 4.9). It’s on a hosted VPS, but they are pretty stable. Hopefully I will get some bare metal soon too.

I’m renting a server from SoftLayer. They have been offering $500 off the first month for servers in certain regions, and I’m using those offers right now.

Ah, okay. As I said, I am currently running benchmarks in a virtual machine (at least they are stable) and I will see what results I get.

Also, I was thinking about measuring memory in the Ruby tests; how about memory tests on Rails, @tgxworld?

@tgxworld Looks like some great progress. I like where this is going.

The commit-to-commit variability seems high in many cases. Getting the variability down could be annoying, but it would help to diagnose smaller regressions. For IRFY I ran each test 4 or more times before generating a datapoint, but IRFY has fewer benchmarks so I could get away with that.
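The multiple-runs-per-datapoint idea can be sketched in a few lines. This is a generic illustration, not IRFY’s or RailsBench’s actual code; taking the minimum (rather than the mean) is one common choice, on the theory that the fastest run carries the least scheduler and GC noise.

```ruby
# Time a block N times and keep the fastest run as the datapoint,
# to reduce run-to-run variance. Illustrative only.
require "benchmark"

def best_of(runs = 4)
  runs.times.map { Benchmark.realtime { yield } }.min
end

datapoint = best_of(4) { 100_000.times { |i| i * i } }
puts format("best of 4 runs: %.4fs", datapoint)
```

Whether min, median, or mean is the right summary is itself a judgment call; min is robust to outlier slow runs but hides genuine variance.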

@tgxworld Oh, I see: the Y axis on the graph doesn’t start at 0, which makes the commit-to-commit variance look larger than it is. The graphs certainly look more interesting when the Y axis is set to fit, and you can see small changes better. However, there would be a better sense of proportion if they started at 0. Something to think about. I don’t know if there is a “right” answer.

http://rubybench-rycco.rhcloud.com/benchmarks/

I ran some tests. Each benchmark/ruby_version/gcc_version combination was run 10 times. Unfortunately it was run on a virtual machine for now (just to test the suite).

But we can see something interesting here :smile:

Btw - the Y axis is logarithmic, so you can see small differences.

Yup. I’m getting very high variance on some results. Perhaps the lowest-hanging fruit would be to run the benchmarks more times. @sam, I’m seeing very high variance on the Discourse benchmarks; do you normally have to run them multiple times?

I’m going to move away from C3js and just use D3 eventually :stuck_out_tongue: