Summarized using AI

Democratizing the Fight Against Ruby Memory Bloat

Hongli Lai • June 09, 2021 • online • Talk

In the talk titled "Democratizing the Fight Against Ruby Memory Bloat," Hongli Lai outlines the challenges faced by Ruby applications regarding excessive memory usage, attributing the issue primarily not to Ruby itself but to the system's memory allocator, particularly on Linux. Lai reports that as much as 70% of memory usage can be traced back to the allocator, especially when multi-threading is involved. The talk explains both the causes of Ruby memory bloat and the solutions that exist but are not widely adopted because they are cumbersome to deploy.

Key points discussed include:

  • Definition of Memory Bloat: A situation in which even simple Ruby applications consume excessive memory, which raises doubts about how efficient Ruby is for certain applications.
  • Misconceptions: Lai addresses common myths surrounding memory bloat, clarifying that memory fragmentation is not the biggest problem and that the multi-threading explanation is only partly true. The primary cause is the system's memory allocator, which rarely releases freed memory back to the kernel.
  • Statistical Insights: Through synthetic tests, it's revealed that up to 90% of memory usage can be unrelated to Ruby.
  • Solutions: Lai presents three solutions to combat memory bloat: setting an environment variable, having the garbage collector call the malloc_trim API, or swapping the memory allocator for a more efficient one such as jemalloc. He points out that neither malloc_trim nor jemalloc is currently integrated into upstream Ruby, which complicates their adoption.
  • Project Fullstaq Ruby: Introduced as a distribution that includes these solutions pre-compiled, focusing on making it easier for developers to alleviate Ruby memory issues without diving into complex configurations.
  • Sustainability and Community: Lai emphasizes the importance of community involvement and sustainable practices in open source projects, suggesting the use of automation, active recruitment, and comprehensive documentation to foster a healthy community around Fullstaq Ruby.

In conclusion, Lai reiterates that the successful fight against Ruby memory bloat requires broader accessibility to solutions. He encourages developers to think about the democratization of their software solutions and how to maintain healthy, sustainable open source projects that benefit the wider community.

Ruby apps can use a lot of memory, but not for the reasons you think. I've discovered that as much as 70% of the memory usage is not caused by Ruby, but by the system's memory allocator! The good news is that there are technically simple solutions. So why isn't everybody using them? That's the bad news: they're cumbersome to deploy. We can say that the solutions are not "democratized".

I'm on a mission to allow everyone to easily get rid of Ruby memory bloat. In this talk I'll explain where Ruby memory bloat comes from, what the solutions are, and how I'm working on democratizing the solutions in the form of Fullstaq Ruby. I will also talk about my efforts to make Fullstaq Ruby a sustainable open source project.

This talk was delivered at EMEA on Rails, a virtual mega-meetup which took place on June 9, 2021.

EMEA on Rails 2021

00:00:18.480 so uh
00:00:19.920 i i have a philosophy
00:00:22.160 i have a dream
00:00:23.840 that software should be a joy to use and
00:00:26.880 that software development and work
00:00:29.199 should be fun
00:00:31.840 so one day i opened a monitoring
00:00:33.920 dashboard and i saw this
00:00:36.399 this is a simple proxy server written in
00:00:38.800 ruby and it consumes 1.3 gigabytes of
00:00:42.239 memory and i don't know this cannot be
00:00:44.879 right
00:00:45.760 this app is so simple there could not be
00:00:48.640 a memory leak
00:00:50.480 many people suffer from similar problems
00:00:53.920 why could even the simplest of ruby apps
00:00:56.480 suffer from memory bloat this is a
00:00:58.559 really puzzling question that has
00:01:01.039 haunted people for a long time
00:01:03.760 now
00:01:04.640 because of the research i've done lately
00:01:08.240 i know where memory bloat comes from
00:01:11.200 and i know that there are solutions
00:01:14.240 but these solutions are cumbersome to
00:01:16.880 deploy
00:01:18.000 which is why
00:01:19.360 relatively few people use them
00:01:22.799 i'm a big proponent of democratization
00:01:26.640 this term means to make something
00:01:29.040 accessible to everyone
00:01:31.439 when there is a benefit it brings me joy
00:01:33.920 if i can make that benefit available to
00:01:36.960 everyone
00:01:38.560 so
00:01:39.759 i am on a mission
00:01:41.280 to allow everyone to easily get rid of
00:01:44.240 ruby memory bloat
00:01:46.399 welcome to this talk democratizing the
00:01:48.960 fight against ruby memory bloat i am
00:01:52.240 hongli lai i founded phusion and i am
00:01:55.119 the author of passenger
00:01:59.439 there were all sorts of rumors about uh
00:02:02.320 why memory bloat happens
00:02:04.719 one rumor is that it is caused by memory
00:02:07.520 fragmentation
00:02:08.959 this means that garbage collector
00:02:11.039 compaction will help
00:02:14.000 now i have actually found that
00:02:15.520 fragmentation is not the biggest problem
00:02:21.040 sorry about that
00:02:24.879 um other rumors say that it is related
00:02:27.920 to multi-threading
00:02:29.920 and that uh tuning the system allocator
00:02:32.720 will help well this has some
00:02:34.959 truth in it
00:02:38.000 a while ago i researched the actual
00:02:41.040 cause of ruby memory bloat and i found
00:02:43.680 that it is mostly due to the system's
00:02:47.200 memory allocator and not due to ruby
00:02:51.280 it is a problem that mostly occurs on
00:02:53.280 linux not on other platforms
00:02:56.319 and also it is a problem that mostly
00:02:58.720 occurs in combination with
00:03:00.560 multi-threading
00:03:02.080 so for example puma is much more
00:03:04.720 affected than unicorn
00:03:08.640 i saw in synthetic tests that up to 90 percent
00:03:13.360 of memory usage was not attributed to
00:03:16.000 ruby but to something else
00:03:19.360 90 percent that's a really staggering number
00:03:24.319 how's this possible
00:03:25.920 it is because the operating system's
00:03:28.080 memory allocator rarely releases memory
00:03:31.280 back to the kernel
00:03:33.200 it likes to keep
00:03:34.799 freed memory around in order to make
00:03:37.519 future memory allocations faster
00:03:40.480 this strategy is optimized for big
00:03:43.760 hardware enterprise workloads but it is
00:03:46.640 not suitable for most ruby apps
00:03:50.959 the problem is magnified in
00:03:53.200 multi-threaded
00:03:54.720 in multi-threaded apps
00:03:56.480 the more threads and virtual cpus that
00:03:59.120 you have the bigger the potential for
00:04:01.200 bloat
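
One way to observe the gap described above is to compare the resident set size that Linux reports for a process with the memory Ruby itself accounts for. The following is a minimal sketch, not the measurement method used in the talk. It assumes Linux's /proc/self/status format, uses ObjectSpace.memsize_of_all as a rough approximation of Ruby-managed memory, and the helper names parse_rss_kb and memory_report are hypothetical.

```ruby
require "objspace"

# Extract the VmRSS value (resident set size in kB) from the text of
# /proc/<pid>/status. Returns nil when no VmRSS line is present.
def parse_rss_kb(status_text)
  match = status_text.match(/^VmRSS:\s+(\d+)\s+kB/)
  match ? match[1].to_i : nil
end

# Compare the resident set size with the number of bytes Ruby accounts for.
# ruby_bytes is only an approximation of Ruby-managed memory (assumption).
def memory_report(status_text, ruby_bytes)
  rss_kb = parse_rss_kb(status_text)
  return nil if rss_kb.nil? || rss_kb.zero?

  ruby_kb = ruby_bytes / 1024
  other_kb = [rss_kb - ruby_kb, 0].max
  {
    rss_kb: rss_kb,
    ruby_kb: ruby_kb,
    other_pct: (100.0 * other_kb / rss_kb).round(1)
  }
end

# Only Linux exposes /proc/self/status, so the live measurement is guarded.
if File.readable?("/proc/self/status")
  report = memory_report(File.read("/proc/self/status"), ObjectSpace.memsize_of_all)
  puts report.inspect
end
```

The other_pct figure also includes the interpreter binary, loaded libraries and native extensions, so it overstates what the allocator alone retains. It is useful mainly for watching how the number moves over time in a long-running process.
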
00:04:05.760 so to learn more about exactly why
00:04:09.120 bloating occurs
00:04:10.720 please go to my website joyful
00:04:13.120 bikeshedding or search for my article what
00:04:16.320 causes ruby memory bloat
00:04:19.199 that article
00:04:20.959 has an accompanying video that
00:04:23.600 describes in detail exactly what is
00:04:25.680 going on and at the end of this talk you
00:04:28.160 will find a link to my website
00:04:32.880 this talk right now is
00:04:35.520 not about the cause of memory bloat but
00:04:38.240 about the solutions
00:04:40.720 there are roughly three ways to solve
00:04:43.440 memory bloat
00:04:44.800 one you could set a magic environment
00:04:48.000 variable
00:04:49.440 or uh number two you could have the ruby
00:04:52.560 garbage collector call an api called
00:04:55.120 malloc_trim
00:04:56.880 or three you could swap out the memory
00:04:59.199 allocator used by ruby altogether and
00:05:02.000 replace
00:05:03.280 whatever memory allocator it uses by
00:05:05.440 default on the operating system with a
00:05:07.520 custom one called jemalloc which is a
00:05:10.160 memory allocator that is
00:05:12.479 very efficient and does not
00:05:14.960 exhibit all these memory bloating issues
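
The three options above can be inspected from inside a running Ruby process. The following is a rough sketch under stated assumptions, not the speaker's tooling. The talk does not name the environment variable; glibc's MALLOC_ARENA_MAX is assumed here because it is the glibc setting that limits the number of malloc arenas. The helper names arena_max, jemalloc_mapped? and try_malloc_trim are hypothetical, and calling malloc_trim through Fiddle is only a manual stand-in for having the garbage collector call it.

```ruby
# Option 1 (assumption: the variable is glibc's MALLOC_ARENA_MAX).
# It has to be set before the process starts, for example
#   MALLOC_ARENA_MAX=2 ruby app.rb
# so from inside Ruby we can only report the value that was given.
def arena_max(env)
  value = env["MALLOC_ARENA_MAX"]
  value && Integer(value, exception: false)
end

# Option 3: when jemalloc is preloaded or linked in, its shared library
# appears in the process's memory map (/proc/<pid>/maps on Linux).
def jemalloc_mapped?(maps_text)
  maps_text.each_line.any? { |line| line.include?("jemalloc") }
end

# Option 2, approximated by hand: call glibc's malloc_trim(0) via Fiddle.
# Returns 1 if memory was released, 0 if not, or nil when the call is
# unavailable (no Fiddle, or a libc without malloc_trim).
def try_malloc_trim
  require "fiddle"
  symbol = Fiddle.dlopen(nil)["malloc_trim"]
  trim = Fiddle::Function.new(symbol, [Fiddle::TYPE_SIZE_T], Fiddle::TYPE_INT)
  trim.call(0)
rescue LoadError, StandardError
  nil
end

maps = File.readable?("/proc/self/maps") ? File.read("/proc/self/maps") : ""
puts "MALLOC_ARENA_MAX: #{arena_max(ENV).inspect}"
puts "jemalloc mapped:  #{jemalloc_mapped?(maps)}"
puts "malloc_trim(0):   #{try_malloc_trim.inspect}"
```

A report like this only shows which mitigation is in effect. Enabling the first and third options still happens outside Ruby, at process start.
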
00:05:19.840 however
00:05:21.280 neither malloc_trim nor jemalloc are
00:05:24.960 currently
00:05:26.320 integrated in upstream ruby
00:05:29.120 and the rationale by upstream is that
00:05:31.840 this issue is fairly operating system
00:05:34.639 specific and also use case specific
00:05:38.479 so a question arises should a solution
00:05:41.680 even be included in upstream
00:05:44.479 now for many users this question may be
00:05:47.759 a bit puzzling because so many of us
00:05:50.240 care about this problem
00:05:52.720 but to the upstream developers it makes
00:05:55.759 some sense because they have to care
00:05:58.160 about
00:05:59.120 more than just server use cases or
00:06:01.440 linux
00:06:03.680 anyway the reality right now is if you
00:06:06.720 want to get rid of ruby memory bloat
00:06:09.120 then you need to patch and recompile
00:06:11.919 ruby yourself
00:06:14.720 so this brings us to the route to
00:06:17.600 democratization
00:06:21.120 we have solutions for the memory bloat
00:06:24.319 problem
00:06:25.520 but they are cumbersome to deploy
00:06:28.960 they are not available to everyone
00:06:31.520 this situation is not joyful
00:06:35.280 and that made me wonder how do we
00:06:37.600 democratize this so that everyone can
00:06:40.240 enjoy the benefits
00:06:43.199 maybe we could learn from past lessons
00:06:47.600 i began using ruby in 2007
00:06:52.560 back then i um i also did not like the
00:06:55.440 fact that my ruby app servers used so
00:06:58.639 much memory
00:07:00.319 i wanted to apply a trick called
00:07:02.400 preforking to save memory
00:07:05.120 preforking is nowadays mature technology
00:07:08.960 supported by all major app servers such
00:07:11.919 as passenger unicorn and puma it could
00:07:15.360 save about 33 percent memory
00:07:19.199 back then it did not work because the
00:07:21.759 ruby garbage collector was not copy-on-write
00:07:24.319 friendly
00:07:28.960 not being copy-on-write friendly meant that
00:07:31.199 every time
00:07:32.560 a garbage collection
00:07:34.240 happens
00:07:35.440 um the way the garbage collector worked
00:07:38.080 basically undid all the memory
00:07:40.160 optimizations that were made possible by
00:07:42.840 preforking so in order to fix that
00:07:46.400 i dived into the ruby source code and
00:07:48.879 made it copy-on-write friendly
00:07:50.960 i then published my patches and wrote
00:07:54.639 a number of blog posts
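
The preforking trick mentioned above can be sketched in a few lines. This is a simplified illustration, not the code of Passenger, Unicorn or Puma. The parent builds its data once and then forks workers, so the operating system can share those memory pages between processes until one of them writes to a page. The name run_preforked_workers and the table contents are hypothetical.

```ruby
# Fork `count` workers that only read data built by the parent.
# Each worker checks one entry and reports success through its exit status.
def run_preforked_workers(table, count)
  pids = Array.new(count) do |index|
    Process.fork do
      # The child sees the parent's memory without copying it up front;
      # pages are duplicated only when one side writes to them.
      ok = table[index] == "entry-#{index}"
      exit!(ok ? 0 : 1)
    end
  end

  # Collect the exit status of every worker, in fork order.
  pids.map { |pid| Process.wait2(pid).last.exitstatus }
end

# fork is not available on every platform, so the demo is guarded.
if Process.respond_to?(:fork)
  table = Array.new(10_000) { |i| "entry-#{i}" }.freeze
  puts run_preforked_workers(table, 3).inspect
end
```

The saving depends on the shared pages staying untouched. That is why a garbage collector that writes to objects while marking them undoes the benefit, which is the problem the patches described above addressed.
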
00:08:00.240 so um
00:08:01.840 even back then i also wanted to
00:08:04.319 democratize this benefit because just
00:08:06.560 having a bunch of patches lying around
00:08:09.039 on a blog is not enough to really
00:08:11.520 get people to enjoy um
00:08:14.960 enjoy this benefit
00:08:16.879 ruby did not merge this feature for
00:08:18.879 years and years
00:08:20.319 and i wanted people to benefit much
00:08:22.080 sooner
00:08:24.400 so i launched a ruby fork called ruby
00:08:27.599 enterprise edition now look at this
00:08:30.319 retro design it brings
00:08:32.640 back so many memories
00:08:35.360 the name of this product was
00:08:37.279 tongue-in-cheek
00:08:39.519 because it made fun of the meme back
00:08:41.919 then that ruby and rails were not ready
00:08:45.279 for the enterprise
00:08:48.000 but ruby enterprise edition was actually
00:08:49.920 an open source product
00:08:52.480 so ruby enterprise edition was very
00:08:55.360 successful
00:08:56.640 and was used by many people for years
00:09:00.160 until finally ruby merged in the feature
00:09:03.200 themselves
00:09:05.680 so time travel back to 2010
00:09:09.120 maybe we can pull off the same strategy
00:09:11.120 again
00:09:13.680 this time we could supply pre-compiled
00:09:17.440 easy to install ruby binaries that have
00:09:20.399 memory allocation patches included
00:09:25.040 enter fullstaq ruby
00:09:27.600 i launched this project
00:09:29.920 shortly after publishing my research on
00:09:32.959 where ruby memory bloat comes from fullstaq
00:09:36.320 ruby is a server-oriented ruby
00:09:39.440 distribution
00:09:40.800 it includes malloc_trim and jemalloc you
00:09:44.320 can choose
00:09:46.320 it focuses on
00:09:48.120 x86-64 linux on the server
00:09:51.279 it provides pre-built debian and rpm
00:09:54.720 packages for multiple ruby versions and
00:09:57.440 multiple distributions so that you can
00:10:00.000 install it easily on servers and in
00:10:02.880 containers it also integrates with rbenv
00:10:07.040 and as a bonus it also allows updating
00:10:10.000 tiny ruby versions this allows you to
00:10:12.720 easily keep up with ruby security
00:10:14.959 patches so this is unlike when for
00:10:17.839 example you use rbenv on the server where
00:10:20.720 every time a
00:10:22.560 tiny ruby version is released you'll
00:10:24.720 need to install that tiny ruby version
00:10:26.880 then migrate all of your gems what we do
00:10:29.600 is we
00:10:30.560 allow you to install for example ruby 2.7
00:10:33.200 or ruby 3.0 and we take care of upgrading
00:10:35.839 that to the latest tiny version for you
00:10:39.040 then you don't have to worry about
00:10:40.880 changing paths or reinstalling all your
00:10:43.360 gems
00:10:45.519 uh so this this is not a fork
00:10:48.800 it is a
00:10:50.000 distribution
00:10:51.600 uh with some minor with some
00:10:54.399 relatively minor patches applied and it
00:10:56.720 is really focused on a server use case
00:11:00.720 so because the the upstream developers
00:11:03.440 they are hesitant about
00:11:06.320 optimizing too much for specific use
00:11:08.880 cases i thought well we could do it for
00:11:11.440 them we can make a choice we can choose
00:11:14.320 to optimize for one use case a very
00:11:17.920 important use case
00:11:19.600 that is
00:11:22.880 but um
00:11:24.320 but just providing a distribution like
00:11:26.160 this only solves one half of the problem
00:11:30.640 the other half is a temporal problem
00:11:34.399 my long history with open source has
00:11:37.040 taught me something
00:12:06.560 a truly
00:12:08.120 democratized project must be both
00:12:11.279 healthy and sustainable
00:12:16.160 if i look at my past open source
00:12:18.639 projects then
00:12:20.320 one of the challenges they face
00:12:22.560 is that they rely too much on myself
00:12:26.320 not only in terms of time but also in
00:12:28.800 terms of expertise
00:12:31.760 for an open source project to be healthy
00:12:34.800 and sustainable it must not rely on a
00:12:37.360 single person
00:12:54.720 i think that the answer lies in the
00:12:57.120 usage of four pillars
00:13:00.160 one is automation
00:13:02.320 two is community three is knowledge
00:13:05.040 sharing and four is active recruitment
00:13:09.120 i will explain what these pillars mean
00:13:11.760 because i
00:13:13.279 i also
00:13:14.800 try to apply them to fullstaq ruby
00:13:19.279 automation
00:13:21.279 um
00:13:22.639 is necessary because maintaining open
00:13:25.279 source projects is already hard and time
00:13:27.760 consuming enough
00:13:30.079 you guys might have read a recent blog
00:13:33.040 post by uh solnic i think
00:13:35.839 that was his username the author of
00:13:37.839 virtus and he reflected back on 10
00:13:40.639 years of open source and kind of
00:13:42.880 concluded that it is a lot of work
00:13:45.360 especially the more open source projects
00:13:47.680 that you join even if it is just
00:13:50.079 maintenance work fixing bugs responding
00:13:52.320 to issues it is still a lot of work
00:13:54.720 especially when you get
00:13:56.480 when you get older and you have more
00:13:58.560 responsibilities in life maybe kids
00:14:01.760 it just gets harder and harder
00:14:04.079 and so
00:14:05.519 we should invest in automation as much
00:14:08.320 as possible for example investing in a
00:14:11.360 good ci/cd pipeline
00:14:14.639 all project processes should be codified
00:14:18.079 and
00:14:19.040 the code should be the canonical source
00:14:21.360 of truth about the processes
00:14:23.760 there should be as few manual processes
00:14:26.480 as possible
00:14:28.399 if you do all of this then
00:14:30.720 this makes maintenance much more
00:14:32.639 efficient and scalable as well as less
00:14:35.519 error prone
00:14:38.639 community
00:14:40.320 means you should make yourself redundant
00:14:43.120 as soon as possible
00:14:45.040 you should hand over as much power
00:14:47.120 as possible to people in the community
00:14:51.199 and be inclusive so that as many people
00:14:53.839 can participate as possible
00:14:56.880 when you can go on an extended holiday
00:14:59.600 while the project keeps running then you
00:15:01.760 have achieved your goal
00:15:04.000 the project should not need you
00:15:08.959 knowledge sharing
00:15:10.639 this pillar means that
00:15:13.519 this pillar is necessary because
00:15:15.760 new contributors may not have sufficient
00:15:18.160 expertise or sufficient knowledge of the
00:15:20.800 project's design to be able to
00:15:23.199 contribute
00:15:24.839 efficiently so invest in knowledge
00:15:27.279 sharing
00:15:28.320 teach contributors the skills they need
00:15:31.920 document as much as you can so that
00:15:34.560 anyone can learn what they need to
00:15:36.800 contribute
00:15:38.880 document design concepts
00:15:41.360 processes
00:15:42.720 caveats
00:15:44.480 in order to reduce contribution friction
00:15:48.480 and also active recruitment
00:15:51.360 contributors and maintainers come and go
00:15:54.480 including yourself
00:15:58.560 this is a fact of life
00:16:01.040 rather than fight it you should embrace
00:16:03.600 it
00:16:05.120 your projects should actively recruit at
00:16:07.680 all times
00:16:09.040 make sure to keep a healthy maintainer
00:16:11.199 and contributor pool make sure that
00:16:13.519 everything
00:16:14.720 every role does can be handled or handed
00:16:17.920 over to someone else with minimal
00:16:19.600 friction
00:16:22.720 now take fullstaq ruby as an example
00:16:25.519 in which i apply these four pillars
00:16:29.120 the release process of fullstaq ruby is
00:16:31.759 very complicated it builds about 112
00:16:36.720 different packages across six supported
00:16:39.920 linux distributions four ruby versions
00:16:43.120 and three memory allocator variants
00:16:46.160 the ci/cd pipeline consists of more than
00:16:48.959 300 jobs
00:16:51.360 if you look at github actions then the
00:16:53.759 pipeline graph does not even fit in the
00:16:56.399 visualization and it takes about an hour
00:16:59.120 to run
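
To get a feel for why such a release process cannot be managed by hand, the build dimensions mentioned above can be expanded into a matrix. This is a minimal sketch with placeholder names, not the project's actual build configuration. The distribution, version and variant labels are hypothetical, and build_matrix is an illustrative helper.

```ruby
# Expand every combination of distribution, Ruby version and allocator
# variant into one build description.
def build_matrix(distros, ruby_versions, variants)
  distros.product(ruby_versions, variants).map do |distro, ruby, variant|
    { distro: distro, ruby: ruby, variant: variant }
  end
end

# Placeholder labels only: six distributions, four Ruby versions and
# three allocator variants, matching the counts given in the talk.
distros = (1..6).map { |n| "distro-#{n}" }
ruby_versions = (1..4).map { |n| "ruby-#{n}" }
variants = (1..3).map { |n| "variant-#{n}" }

matrix = build_matrix(distros, ruby_versions, variants)
puts "#{matrix.size} combinations"
puts matrix.first.inspect
```

The plain cross-product of these three dimensions is smaller than the package count quoted in the talk, so the real pipeline evidently builds more than this sketch models. The sketch does not try to account for that difference.
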
00:17:00.800 this sort of release process is totally
00:17:02.959 impossible to manage manually so um
00:17:06.559 pretty early on we
00:17:09.039 have fully automated our release process
00:17:12.720 which is not easy because our our
00:17:14.959 pipeline is also very complicated we had
00:17:17.839 to work around all sorts of github
00:17:20.160 actions limitations
00:17:22.480 which makes the pipeline even more
00:17:24.240 complicated and
00:17:27.039 even more in need of being documented
00:17:32.720 there are well-established processes for
00:17:35.440 common maintenance tasks such as adding
00:17:38.480 a new ruby version
00:17:40.559 here is an example of someone who
00:17:42.720 contributed to
00:17:44.240 who contributed ruby 3.0.1 support
00:17:47.679 we have a document called how to add a
00:17:50.320 new ruby version
00:17:52.000 the document says basically edit this
00:17:54.320 configuration file and change those
00:17:56.559 numbers
00:17:57.840 so the contributor did and then he sent
00:18:00.240 a pull request
00:18:01.760 the ci then took care of the entire
00:18:03.840 release process it took very little work
00:18:06.720 from both me and him
00:18:10.880 we have a comprehensive development
00:18:14.000 handbook which documents all the
00:18:16.080 important design and architecture
00:18:18.640 aspects as well as all processes and
00:18:21.760 responsibilities
00:18:25.919 everything from how the build pipeline
00:18:28.640 works to how
00:18:30.960 to important caveats in the build
00:18:33.120 processes to how to add support for
00:18:37.120 a new distribution and how to test
00:18:39.120 things locally is
00:18:40.840 documented um
00:18:42.960 we document a clear way of working for
00:18:45.679 team members and we define what all the
00:18:48.200 responsibilities are
00:18:50.080 so with this sort of documentation
00:18:52.799 we make it not only
00:18:54.799 easy and clear for
00:18:56.400 core team members to join but also for
00:18:58.960 more casual contributors to easily
00:19:02.320 add value to the project to easily
00:19:08.320 fix any maintenance issues without too
00:19:10.880 much input from uh the current core
00:19:13.600 maintainers
00:19:17.039 i have
00:19:18.400 uh once blogged about my long term
00:19:21.440 vision for fullstaq ruby my vision is
00:19:24.720 for it to become an official part of the
00:19:27.120 ruby project now that would be the
00:19:29.679 ultimate democratization because
00:19:32.240 really everybody can benefit
00:19:36.559 so in conclusion
00:19:39.200 to fight ruby memory bloat it is not
00:19:42.799 enough to address the problem directly
00:19:46.400 every problem lives inside a much bigger
00:19:48.960 domain that of availability
00:19:52.559 i believe that software should be a joy
00:19:55.120 to use and that software development and
00:19:57.600 work should be fun
00:20:00.000 it is not joyful when only a small
00:20:02.240 number of people can enjoy a benefit
00:20:05.760 until a solution is democratized it is
00:20:09.120 not concrete for most people it is no
00:20:12.480 better than an abstract idea
00:20:17.760 if you also publish software solutions
00:20:20.799 you may want to take a moment and wonder
00:20:23.919 how do you
00:20:25.200 make
00:20:26.480 your solutions available to everyone
00:20:29.360 how do you democratize the solutions
00:20:32.320 how how do you keep your projects
00:20:34.480 healthy and sustainable
00:20:39.200 i am hongli lai
00:20:41.120 keep up the good fight
00:20:42.960 thank you
00:20:57.440 thank you thank you
00:21:00.240 do we have any questions we do have some
00:21:01.919 time for questions um
00:21:04.000 we have about 10 minutes until we start
00:21:06.080 the break um and then at 7:05
00:21:09.120 uh we're going to do the breakout
00:21:11.120 sessions and then at 7:35 we have the
00:21:13.440 you know anyone who wants to ask
00:21:14.960 questions we have about 10 minutes
00:21:20.000 uh i have a question
00:21:22.799 what's the stance of
00:21:24.640 ruby core maintainers on
00:21:27.120 having some of these features behind
00:21:29.360 some
00:21:30.480 runtime flags
00:21:33.679 um they do they do not
00:21:36.400 uh
00:21:37.679 i i do not sense a lot of real interest
00:21:41.520 so it it's kind of like they are um
00:21:46.640 my interpretation is that is that they
00:21:49.039 they would rather not have feature flags
00:21:51.200 like that
00:21:52.159 it's either completely in ruby as an
00:21:54.880 officially supported thing um
00:22:00.640 or not at all
00:22:02.880 so it's not entirely clear what it is
00:22:05.280 it's
00:22:06.159 it seems that they are um
00:22:09.679 they are hesitant about making the
00:22:11.120 decision on this
00:22:14.799 right
00:22:15.679 i remember reading the conversation
00:22:17.360 around jemalloc
00:22:19.760 and i think one of the biggest concerns
00:22:22.240 was
00:22:23.440 was the project going to vendor
00:22:25.200 jemalloc or if it was going to create an
00:22:28.080 external dependency how they were going
00:22:30.400 to manage changes to that project
00:22:33.280 et cetera right that was yeah one of the
00:22:35.760 concerns but for trim it might be
00:22:37.760 simpler right yeah for trim it it might
00:22:40.720 be simpler for jemalloc the situation
00:22:43.440 is actually even more complicated than
00:22:45.520 that because the
00:22:47.200 uh only jemalloc version 3
00:22:51.440 is good at solving the memory bloat
00:22:53.840 problem jemalloc version 5 performs
00:22:57.760 way worse
00:22:59.360 and it is not entirely clear why it is
00:23:02.480 on my
00:23:04.880 on my to-do list to one day investigate
00:23:08.320 the differences between three and five
00:23:10.720 to see where the issues come from maybe
00:23:12.799 we can fix that
00:23:14.240 uh but uh
00:23:16.240 but until then saying that yeah we
00:23:18.480 should be able to use jemalloc but only
00:23:20.480 version three it
00:23:23.600 that just doesn't sound good to anybody
00:23:25.760 and especially uh linux distributions
00:23:28.880 will hate you when you uh vendor jemalloc
00:23:32.640 3 when they have version 5 installed
00:23:35.440 so that just makes the problem much more
00:23:37.520 complicated
00:23:40.000 yes thank you
00:23:43.840 yeah but for fullstaq ruby we don't
00:23:45.360 have a problem and we just made a
00:23:47.360 decision
00:23:49.360 are we gonna include j
00:23:51.360 does it need jemalloc 3 in order to
00:23:53.440 perform optimally okay let's do that uh
00:23:55.520 it doesn't matter what uh
00:23:57.360 what the linux distributions think
00:24:00.640 because it's because we are a
00:24:02.159 third-party package provider
00:24:06.000 makes sense