Ruby

Benchmarking return data types


In a previous article, I discussed several approaches to handling returned data in API responses. These ranged from using plain Ruby hashes to more type-strict solutions like dry-struct.

I also wanted to benchmark all of them to compare their performance. In any real-world application, the raw performance of each approach is largely irrelevant, because the overhead of the web request and the response time of the API itself are likely larger by several orders of magnitude. However, running some benchmarks can still be a fun exercise.

First, I want to write down what I expect to see. This is a valuable exercise for getting better at ballpark estimates and for training your intuition when it comes to performance.

So, here are my predictions:

  • Nothing beats the performance of Hash, simply because all other structures need a hash as input, so by definition that’s the absolute lower bound.
  • Hashes with symbolized keys or indifferent access (both string and symbol) are a bit slower, but not twice as slow. So let’s split the difference and guess 1.5 times slower.
  • All the Ruby class-like structures have roughly the same performance and are a couple of times slower than Hash.
  • OpenStruct performs very badly. I already knew this before I started on this article, and the docs are explicit about it. I expect it to perform the worst of all structures, about 10 times as slow.
  • The structures that have some form of type checking are also slower because there’s a lot more going on: they need to check and cast types, for example. I expect those to be twice as slow as the regular class-like structures.

I’ve decided to split the tests into two parts: one where I test parsing JSON into hashes (with string keys, symbol keys and indifferent access), and one for converting those hashes into more complex structures. The reason is that all the other methods need a Hash as input anyway, so comparing them directly against a plain Hash makes no sense, and I feel there is a distinct difference between a hash structure and a more refined object-like structure.
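To make the first part concrete, here is a minimal sketch of the three hash variants. The JSON string is a made-up, shortened stand-in for the real response, and I'm assuming that "indifferent access" means ActiveSupport's `with_indifferent_access`; that step is skipped when ActiveSupport is not installed.

```ruby
require "json"

# Hypothetical, shortened payload; the real response has about 22 fields.
json = '{"login":"octocat","id":1,"public_repos":8}'

# 1. Default behaviour: keys are strings.
string_keys = JSON.parse(json)

# 2. Keys are converted to symbols by the parser itself.
symbol_keys = JSON.parse(json, symbolize_names: true)

# 3. Indifferent access (assumption: via ActiveSupport, if available).
indifferent =
  begin
    require "active_support"
    require "active_support/core_ext/hash/indifferent_access"
    JSON.parse(json).with_indifferent_access
  rescue LoadError
    nil # ActiveSupport is not installed; skip this variant.
  end

puts string_keys["login"]
puts symbol_keys[:login]
puts indifferent[:login] if indifferent
```

The second and third variants do extra work on top of the plain parse, which is why the plain string-keyed Hash serves as the baseline.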

I also want to have a look at memory usage, just to see if anything interesting shows up there. Memory usage might be a better reason than raw performance to pick one structure over another in a real-life application, as API responses can be quite large.
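For a rough look at memory without any extra gems, counting allocated objects around a block is one option. This is only a sketch under my own assumptions: `allocated_objects` is a hypothetical helper, it counts objects rather than bytes, and it is not how the figures in this article were produced.

```ruby
require "json"

# Hypothetical helper: returns how many objects were allocated while the
# block ran, based on the cumulative counter exposed by GC.stat.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical, shortened payload.
json = '{"login":"octocat","id":1,"public_repos":8}'

string_count = allocated_objects { JSON.parse(json) }
symbol_count = allocated_objects { JSON.parse(json, symbolize_names: true) }

puts "string keys: #{string_count} objects"
puts "symbol keys: #{symbol_count} objects"
```

The exact counts depend on the Ruby and JSON versions, so treat them as relative indicators only.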

💎 I ran these tests on a 2021 MacBook Pro equipped with an M1 Max chip and 32 GB of RAM. I used Ruby 3.3.0 with YJIT enabled. (ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23])
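Since YJIT can change the numbers noticeably, it is worth confirming that it is actually active before running anything. A small check, assuming a CRuby build (on other implementations it simply reports false):

```ruby
# RubyVM::YJIT only exists on CRuby builds compiled with YJIT support.
yjit_enabled = defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false

puts RUBY_DESCRIPTION
puts "YJIT enabled: #{yjit_enabled}"
```

On a supported build, YJIT can be switched on with the `--yjit` command-line flag or the `RUBY_YJIT_ENABLE=1` environment variable.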

For these tests I convert the full output of the GitHub User API for my personal account, which has about 22 simple string/integer fields in it. I wrote (with some help from GitHub Copilot) simple wrapper classes that take all fields as attributes. For the structures that support types, I set up the correct types.
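To give an idea of what such wrappers look like, here is a sketch of the three simplest ones. The class names and the three fields are hypothetical; the real wrappers cover all fields of the response. `Data` requires Ruby 3.2 or newer, so that part is guarded.

```ruby
# Hypothetical subset of the response, already parsed with symbol keys.
attrs = { login: "octocat", id: 1, public_repos: 8 }

# Plain old Ruby object (PORO) with keyword arguments.
class UserPoro
  attr_reader :login, :id, :public_repos

  def initialize(login:, id:, public_repos:)
    @login = login
    @id = id
    @public_repos = public_repos
  end
end

# Struct with keyword initialization.
UserStruct = Struct.new(:login, :id, :public_repos, keyword_init: true)

# Data (immutable value object), only available on Ruby 3.2+.
if defined?(Data) && Data.respond_to?(:define)
  UserData = Data.define(:login, :id, :public_repos)
end

poro   = UserPoro.new(**attrs)
struct = UserStruct.new(**attrs)
data   = UserData.new(**attrs) if defined?(UserData)

puts poro.login
puts struct.id
puts data.public_repos if data
```

All three are built from the same hash, which is what makes the conversion step comparable across structures.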

First the results for hash conversion:

Hash               164.407k (± 1.4%) i/s -    837.318k in   5.093985s
Hash w/ symbols    147.386k (± 1.4%) i/s -    738.750k in   5.013394s
Hash w/ ind. xs    115.527k (± 0.6%) i/s -    586.143k in   5.073812s

Comparison:
Hash:              164407.2 i/s
Hash w/ symbols:   147386.3 i/s - 1.12x  slower
Hash w/ ind. xs:   115527.4 i/s - 1.42x  slower

Hash:                  4328 allocated
Hash w/ symbols:       4488 allocated - 1.04x more
Hash w/ ind. xs:       5360 allocated - 1.24x more

I expected the symbolized keys and indifferent access hashes to be about 1.5 times slower, so at 1.12 and 1.42 times slower these results are about what I expected. Memory-wise they allocate a little more, but nothing serious.

Now onto the more interesting data structures.

                PORO      1.615M (± 0.7%) i/s -      8.209M in   5.083925s
              Struct    979.275k (± 2.1%) i/s -      4.991M in   5.099120s
          OpenStruct    216.668  (±18.5%) i/s -      1.072k in   5.107021s
                Data    996.112k (± 3.0%) i/s -      4.989M in   5.013517s
         Dry::Struct    179.454k (± 2.5%) i/s -    914.175k in   5.097592s
         ActiveModel    125.635k (± 0.8%) i/s -    638.612k in   5.083395s
        Hashie::Mash     61.640k (± 1.2%) i/s -    314.132k in   5.097024s
        Hashie::Dash    132.436k (± 2.0%) i/s -    663.850k in   5.014691s

Comparison:
                PORO:  1614715.8 i/s
                Data:   996112.0 i/s - 1.62x  slower
              Struct:   979275.1 i/s - 1.65x  slower
         Dry::Struct:   179454.0 i/s - 9.00x  slower
        Hashie::Dash:   132436.3 i/s - 12.19x  slower
         ActiveModel:   125634.8 i/s - 12.85x  slower
        Hashie::Mash:    61639.7 i/s - 26.20x  slower
          OpenStruct:      216.7 i/s - 7452.49x  slower

                PORO:        320 allocated
              Struct:       1232 allocated - 3.85x more
                Data:       1232 allocated - 3.85x more
         ActiveModel:       1856 allocated - 5.80x more
        Hashie::Dash:       3808 allocated - 11.90x more
          OpenStruct:      18256 allocated - 57.05x more
         Dry::Struct:      24170 allocated - 75.53x more
        Hashie::Mash:     183756 allocated - 574.24x more

The plain old Ruby object (PORO) is the clear winner here. In general you could say that Data and Struct perform the same: a bit slower than a simple Ruby object (about 1.6 times), but nowhere near as slow as the more complex structures.

We can also group together the more complex structures: Dry::Struct, Hashie::Dash and ActiveModel. They are roughly five to eight times slower than Struct and Data, but still pretty fast in absolute terms. Dry::Struct does use a lot more memory, though. I assume this is because of the type checking and type casting it needs to do.

The only real outlier is OpenStruct, which is a whopping 7,450 times slower (!) than a simple Ruby object and uses 57 times more memory. Next to that, Hashie::Mash is also really memory hungry, allocating almost 575 times as much as a simple Ruby object.

I think the main conclusions to draw from these benchmarks are:

  • There is no reason to ever use OpenStruct.
  • Any other structure is probably fine.
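If the appeal of OpenStruct is building an object straight from a hash, a Struct generated from the hash keys gets close. The sketch below is only an illustration: `UserRecord` and the attributes are hypothetical, it uses the standard library's `Benchmark.realtime` for a quick timing rather than the setup behind the numbers above, and the timings it prints will vary per machine.

```ruby
require "ostruct"
require "benchmark"

# Hypothetical subset of the response.
attrs = { login: "octocat", id: 1, public_repos: 8 }

# Define the Struct once, from the hash keys, and reuse it for every response.
UserRecord = Struct.new(*attrs.keys, keyword_init: true)

iterations = 10_000
open_struct_time = Benchmark.realtime { iterations.times { OpenStruct.new(attrs) } }
struct_time      = Benchmark.realtime { iterations.times { UserRecord.new(**attrs) } }

puts format("OpenStruct: %.4fs, Struct: %.4fs", open_struct_time, struct_time)
```

The trade-off is that the Struct has a fixed set of attributes, so unknown keys raise an error instead of silently becoming new methods.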

To circle back to my predictions I made at the start of the article:

  • ✅ Hash is the fastest (that was easy 😅).
  • ✅ Symbolized keys and indifferent access are a bit slower, but not 2 times.
  • ✅ Regular Ruby objects perform about the same.
  • ❌ OpenStruct is not 10 times slower, but about 7,450 times.
  • ❌ Typed structures are not 2 times slower, but more like between 5 and 10 times.