Ruby Garbage Collection Deep Dive: GC::INTERNAL_CONSTANTS (jemma.dev)
62 points by asicsp on March 26, 2021 | 27 comments


This might be only slightly related to this post, but I started wondering: how often do you have to tweak the GC in your job? Do you ask, or are you asked, questions about GC in interviews?

I've just realized that I was asked about this stuff in almost every Java interview I had. Sometimes the questions were very detailed (and I was nowhere near HFT or any other real-time system, so GC pauses were a minor concern), but in interviews for jobs focused on other languages the topic is almost completely skipped.


Maybe only a little related, but in the five-ish years of Ruby development I've done, I can remember only one time I interacted with the GC directly in production code.

It was in a Sidekiq job that imported customer data from a CSV file. We would read in the CSV, and for each row a lot of complicated logic translated the data from the customer's format into ours and decided how to update the various database tables. These files were sometimes 10k lines or longer (all handled by a single Sidekiq job), and memory would balloon so much that Sidekiq would crash and keep retrying the job. For each row we were instantiating an ActiveModel object with a lot of attributes and methods. I think the right solution would probably have been a (fairly heavy) refactor of that area of the code, spinning up a separate job for each row, but we found that calling `GC.start` every few rows cleaned up some of the old ActiveModel objects and kept memory usage low for the time being...
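A minimal sketch of the "`GC.start` every few rows" workaround described above, assuming rows arrive as any enumerable (for example from `CSV.foreach`, which streams a file row by row instead of loading it all at once). `import_rows` and `gc_every` are hypothetical names, not part of Sidekiq or ActiveModel.

```ruby
# Hypothetical helper: hand each row to the caller's block and force a
# full GC every `gc_every` rows. Returns how many times GC.start was called.
def import_rows(rows, gc_every: 500)
  raise ArgumentError, "gc_every must be positive" unless gc_every.positive?

  gc_runs = 0
  rows.each_with_index do |row, index|
    yield row # per-row translation/update logic lives in the caller's block
    next unless ((index + 1) % gc_every).zero?

    GC.start # full mark and sweep; frees objects nothing references anymore
    gc_runs += 1
  end
  gc_runs
end

# Usage: 1,200 fake rows, collecting every 500 rows, so 2 forced collections.
processed = []
runs = import_rows((1..1_200).map { |i| { id: i } }, gc_every: 500) do |row|
  processed << row[:id]
end
puts "processed #{processed.size} rows, forced #{runs} GCs"
```

Forcing a collection only reclaims objects that are already unreachable; it trades extra CPU time for a lower memory peak, which is why it reads as a stopgap next to the per-row-job refactor.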


Mirrors my experience as well. Nearly a decade of working on large Ruby/Rails apps, some with very complex reporting/data processing flows (we're talking billions of DB rows processed in streaming queries, media encoding, etc.), and a CSV processing situation like yours was the only time I needed to manually trigger GC... and even then I was just triggering it, not tweaking it.

The defaults seem very good, even at scale.


Part of that is because Ruby has a very predictable, but slower, GC. Java, on the other hand, has multiple memory managers, some optimized for high throughput or for handling spikes, but those are much harder to predict.


That's interesting!

I've been doing Ruby since 2014. Mostly Rails, but also a bunch of data processing.

I have run into memory issues at times, when shuffling large amounts of data around. But manually running GC was never the answer in my cases.

In every case, the memory issues happened because I'd created a bunch of heavy objects that were still in scope and therefore weren't eligible for garbage collection anyway.
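To illustrate the point about in-scope objects, here is a small sketch with a made-up `HeavyRecord` class: `GC.start` cannot free records that are still referenced from a live array, while the streaming version lets each record become garbage once its block returns. MRI scans the machine stack conservatively, so exactly when streamed records get freed is not guaranteed; only the retained case is deterministic.

```ruby
# Hypothetical stand-in for a "heavy" model object.
class HeavyRecord
  attr_reader :payload

  def initialize(id)
    @payload = "row #{id} " * 100 # a few hundred bytes of string data
  end
end

# Anti-pattern: every record stays reachable through the returned array,
# so no amount of GC.start can reclaim them while the array is in scope.
def build_all(count)
  Array.new(count) { |i| HeavyRecord.new(i) }
end

# Streaming alternative: each record is handed to the block and then
# dropped, so it becomes eligible for collection after the block returns.
def each_record(count)
  count.times { |i| yield HeavyRecord.new(i) }
end

retained = build_all(1_000)
GC.start
puts ObjectSpace.each_object(HeavyRecord).count # at least 1000 while `retained` is live

total_bytes = 0
each_record(1_000) { |record| total_bytes += record.payload.bytesize }
puts total_bytes
```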

This was all Ruby 2.0+ and most of the heavy data processing stuff was 2.3+. So I wasn't doing any of it back in the days of really ancient Ruby GC.


I've done a lot of similar work and learned a lot of similar lessons. They were interesting and fun challenges but I've since moved on from Ruby in my professional life.

I'll say this much: when I was working on these applications, one of the minor wins I had was switching them over to the jemalloc memory allocator. It has introspection/instrumentation tooling that is really useful in these situations. You can use `MALLOC_CONF` [0] to enable some built-in profiling. For instance, `export MALLOC_CONF='prof_leak:true,lg_prof_sample:0,prof_final:true'` makes jemalloc report leaks and dump a final heap profile at exit, which is very useful for tracking down leaks (this requires a jemalloc built with profiling enabled, i.e. `--enable-prof`).
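A rough way to check, from Ruby itself, whether the interpreter is using jemalloc and whether `MALLOC_CONF` is set. Looking for `jemalloc` in `RbConfig::CONFIG["MAINLIBS"]` (which lists it when Ruby was built with `--with-jemalloc`) and in `LD_PRELOAD` are heuristics, not an official API for this; the helper names are hypothetical.

```ruby
require "rbconfig"

# Heuristic: a Ruby built with --with-jemalloc lists it among its link libraries.
def jemalloc_linked?
  RbConfig::CONFIG.fetch("MAINLIBS", "").to_s.include?("jemalloc")
end

# Heuristic: jemalloc injected at process start through LD_PRELOAD (Linux).
def jemalloc_preloaded?
  ENV.fetch("LD_PRELOAD", "").include?("jemalloc")
end

# Summarize what this process can tell about its allocator setup.
def allocator_report
  {
    linked: jemalloc_linked?,
    preloaded: jemalloc_preloaded?,
    malloc_conf: ENV["MALLOC_CONF"] # e.g. "prof_leak:true,lg_prof_sample:0,prof_final:true"
  }
end

p allocator_report
```

Note that `MALLOC_CONF` is read by jemalloc when the process starts, so it has to be exported before launching Ruby; setting it from inside a running process has no effect on the allocator.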

[0]: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Leak-C...


I work mostly on internal Rails apps, so the need for fine-tuning GC is basically non-existent.

I think the app that needed to be the most performant ended up using JRuby :P

I can't wait for TruffleRuby to be a thing.


It's anecdata, but JVM tuning comes up far more often in Java-related conversations I've had than the equivalent does in other languages/environments. I'm not enough of a Java expert to fully appreciate why.


Well, the JVM has several GCs, which offer a lot more potential for tuning, and there are many tools for gathering data on what the GC is doing and for analysing heap dumps to find the cause of problems. In a language like Ruby, which normally runs without a moving GC, there isn't huge scope for tuning, but with a moving GC there is a lot you can tinker with, such as region sizes.


It seems that Ruby's GC is conservative (first Google hit)? That pretty much means giving up on GC performance optimization.


Part of that is because there are so many options, and since Java is used a lot in software where you care about performance/throughput, you hear about it, just like you hear about all the different kinds of memory allocators you can write in C.


I ask basic questions in my interviews to see whether a person is even aware of GC and the potential for object leaks in dynamic languages. We have a pretty standard webapp with a handful of backend services, not HFT, but we did have clueless interns write leaky code that wasted memory and crashed, and I don't want that to happen again. So yes, when a webapp programmer can't even recall the term "garbage collection", that's not a great sign for me.
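One common shape of the object leak mentioned above, as a hedged Ruby sketch: a cache held in a constant (or any long-lived global) that is never pruned grows for the life of the process, because everything it references stays reachable and the GC cannot touch it. `LeakyCache` and `BoundedCache` are illustrative names; the bounded version is a simple FIFO that evicts the oldest inserted entry, relying on Ruby hashes preserving insertion order.

```ruby
# Leaky: a process-wide cache in a constant that only ever grows.
module LeakyCache
  STORE = {}

  def self.fetch(key)
    STORE[key] ||= yield # every distinct key stays reachable forever
  end
end

# Bounded: evicts the oldest inserted entry once `limit` is reached.
# Ruby hashes keep insertion order, so `first` is the oldest key.
class BoundedCache
  def initialize(limit)
    raise ArgumentError, "limit must be positive" unless limit.positive?

    @limit = limit
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)

    @store.delete(@store.first[0]) if @store.size >= @limit
    @store[key] = yield
  end

  def size
    @store.size
  end
end

bounded = BoundedCache.new(100)
10_000.times do |i|
  LeakyCache.fetch("user-#{i}") { "profile #{i}" }
  bounded.fetch("user-#{i}") { "profile #{i}" }
end
puts LeakyCache::STORE.size # grows with every distinct key
puts bounded.size           # capped at the limit
```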

I'm also amazed by people who show up to interviews claiming C and/or C++ experience (not at all required for the role, but hey, they claim it) and then seem completely unaware of the basics of memory management.


Ruby 1.8 and 1.9 benefited significantly from GC tuning because the simple GC they used back then was tuned for quick command-line startup. More recent versions have a much improved GC (generational since 2.1, incremental since 2.2) that doesn’t need tuning in most cases.
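Since the linked article is about `GC::INTERNAL_CONSTANTS`, here is a small sketch for looking at the modern GC from inside a process: `GC.stat` for runtime counters (including separate minor and major collection counts from the generational GC) and `GC::INTERNAL_CONSTANTS` for build-time values. The exact keys differ between Ruby versions, so the code reads them defensively; `gc_snapshot` is a hypothetical helper name.

```ruby
# Collect a few GC counters plus the names of the internal constants this Ruby exposes.
def gc_snapshot
  stat = GC.stat
  {
    count: stat[:count],                   # total GC runs so far
    minor_gc_count: stat[:minor_gc_count], # young-generation collections
    major_gc_count: stat[:major_gc_count], # full collections
    heap_live_slots: stat[:heap_live_slots],
    internal: defined?(GC::INTERNAL_CONSTANTS) ? GC::INTERNAL_CONSTANTS.keys.sort : []
  }
end

before = gc_snapshot
GC.start # forces a full (major) collection by default
after = gc_snapshot

puts "GC runs: #{before[:count]} -> #{after[:count]}"
puts "major GCs: #{before[:major_gc_count]} -> #{after[:major_gc_count]}"
puts "internal constants: #{after[:internal].join(', ')}"
```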


In modern Java you really don't need to tweak anything; the G1 defaults are typically more than enough. If you're latency-sensitive, you might switch to ZGC (`-XX:+UseZGC`) or set a maximum pause-time goal (`-XX:MaxGCPauseMillis`).


A Java dev will tweak `-Xmx`/`-Xms` at least once at some point in their career.


I don't know of other languages/VMs that require setting an equivalent of the JVM's Xmx/Xms parameters. Why does only the JVM require it? Why not just make it unlimited by default?


Generally, you want to leave some memory available to other things, like the OS, buffers in the network stack, etc.

Having a limit for the VM is helpful. Also, by default the maximum heap is set automatically to a fraction of physical memory (a quarter, in recent JVMs).

Personally, I've had to tune this kind of stuff with every VM I worked with (JVM, Node, PHP). :)


I'm not a Java dev at all, and I've tweaked more than just those parameters. Simply running Java programs is enough to get you introduced to the JVM's GC.


As someone learning Ruby and Rails at the moment, and who likes to learn things from the fundamentals, is there a good book or blog series that takes you through it?

Because I’m helping on a Rails app I’m learning a lot of magic, which is awesome for productivity, but I feel like I’m missing some important concepts that tie things together.


https://github.com/The-Complete-Guide-to-Rails-Performance

I learned quite a bit about Ruby memory management from the above. The text is free and the video tutorials are paid, IIRC. The course was authored by Nate Berkopec.

https://www.youtube.com/user/nateberkopec

I also found "The Well-Grounded Rubyist" very helpful.

https://www.manning.com/books/the-well-grounded-rubyist-thir...


The Well-Grounded Rubyist is my favorite Ruby book because it really drives home just how internally consistent the Ruby object model is. It gave me a deep appreciation for the aesthetics of Ruby.


This. Ruby’s syntax sugar makes it easy for newcomers to miss the elegance of what’s actually happening with everything being an object and every expression reducing to passing messages between objects. When you dig in and really understand what’s going on, the lightbulb moment is very neat.
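A few one-liners that make the "everything is an object, every expression is a message send" point concrete; they use only core Ruby (`send`, `respond_to?`, `class`), and the variable names are just for illustration.

```ruby
# Operators are ordinary method calls: these three expressions are equivalent.
sum_operator = 1 + 2
sum_method = 1.+(2)
sum_send = 1.send(:+, 2)

# Even literals and nil are objects with classes and methods.
classes = [42.class, "text".class, nil.class, :sym.class]

# Objects decide at runtime whether they understand a message.
string_upcase = "text".respond_to?(:upcase)
integer_upcase = 42.respond_to?(:upcase)

puts [sum_operator, sum_method, sum_send].inspect
puts classes.inspect
puts [string_upcase, integer_upcase].inspect
```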



If you want to get into Ruby's internals, I recommend Ruby Under a Microscope.


I enjoyed 'Eloquent Ruby' by Russ Olsen. It feels nice and friendly, and it gives you a solid understanding of Ruby, the object model, how inheritance works, how and why to use metaprogramming and more.



Recent and related:

Ruby Garbage Collection Deep Dive: Tri-Color Mark and Sweep - https://news.ycombinator.com/item?id=26182796 - Feb 2021 (15 comments)



