Concurrency in Ruby: Threads, Fibers, and Ractors Demystified


Summarized using AI


Magesh S • September 12, 2024 • Sarajevo, Bosnia and Herzegovina • Talk


Magesh S presents an informative talk on concurrency in Ruby, exploring its evolution and the newly introduced concurrency features like threads, fibers, and Ractors. The discussion emphasizes the importance of utilizing CPU resources efficiently and boosting Ruby application performance. Here are the key points covered in the talk:

  • Understanding Concurrency vs. Parallelism

    • Concurrency is about managing multiple tasks at once without executing them simultaneously.
    • Parallelism executes multiple tasks simultaneously.
  • Historical Context

    • Ruby traditionally relied on threads for concurrency, which many developers found challenging.
    • The release of Ruby 3.0 introduced fibers and Ractors as enhancements to address concurrency in a more user-friendly and efficient manner.
  • Threads

    • Allow concurrent execution in Ruby but not true parallelism due to the Global Interpreter Lock (GIL), which limits execution to one thread at a time in MRI Ruby.
    • Example: Magesh demonstrates how threads can be used to speed up I/O-bound operations like API calls, reducing execution time significantly.
    • Challenges with threads include race conditions and deadlocks, requiring careful synchronization through locks to avoid inconsistencies.
  • Fibers

    • Light-weight, user-managed alternatives to threads that allow for more efficient concurrency without the overhead of OS-level context switching.
    • Magesh shows how fibers can be utilized for I/O-bound tasks while providing a structured way to yield control and resume execution, enhancing performance.
    • Cautions regarding fibers include the need for proper management of control yielding to avoid starvation between fibers.
  • Ractors

    • Introduced for true parallel processing in Ruby, allowing multiple light-weight processes to operate simultaneously without shared memory, thus avoiding race conditions altogether.
    • Example use case: Magesh illustrates processing large CSV files across multiple Ractors for parallelized performance improvements.
    • Limitations include immutability requirements for shared data and the current experimental nature of Ractors with some library compatibility issues.
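The chunked-CSV pattern from that last bullet can be sketched roughly as follows. This is a minimal sketch, not the speaker's actual code: dummy in-memory rows stand in for a parsed CSV file, and Ractors will print an "experimental" warning on current Rubies.

```ruby
# Dummy rows standing in for parsed CSV data (hypothetical schema).
rows = (1..1_000).map { |i| { id: i, price: i * 2 } }

# One Ractor per chunk; arguments are deep-copied into each Ractor,
# so there is no shared mutable state and no race conditions.
ractors = rows.each_slice(250).map do |chunk|
  Ractor.new(chunk) do |c|
    c.sum { |row| row[:price] }   # CPU-bound work runs in parallel
  end
end

total = ractors.sum(&:take)       # collect each Ractor's result
puts total
```

With four chunks on a four-core machine, the per-chunk work can run truly in parallel, which is exactly what the GVL prevents for plain threads.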

In conclusion, Magesh emphasizes that:

  • Threads and fibers are best for I/O-bound tasks, while Ractors are suited for CPU-intensive operations.
  • Understanding the strengths and trade-offs of threads, fibers, and Ractors helps developers choose the right approach for their Ruby applications.

The talk encourages developers to experiment with these concurrency features to enhance application efficiency.

Concurrency in Ruby: Threads, Fibers, and Ractors Demystified
Magesh S • Sarajevo, Bosnia and Herzegovina • Talk

Date: September 12, 2024
Published: January 13, 2025
Announced: unknown

Speed up your Ruby applications with the power of concurrency! Join us as we demystify threads, fibers, and Ractors, understanding their unique strengths, use cases, and impact on Ruby performance. Learn to tackle I/O bottlenecks and enable parallel execution, boosting your code’s speed.

For a very long time, Ruby had limited options for concurrency, mainly relying on threads, which developers often dreaded. However, with the exciting updates introduced in Ruby 3.0, the fiber scheduler and Ractors provided a remarkable 3x performance boost. Despite these advancements, few have used these features in production. But now, we can confidently say that they are ready for use.

EuRuKo 2024

00:00:10.240 so hey folks
00:00:14.400 nice all right so let's get started so
00:00:19.119 my talk today is going to be about
00:00:22.320 concurrency right so before I start just
00:00:25.599 a quick show off hands how many of you
00:00:27.439 have written concurrent code using
00:00:29.439 threads or fibers? wow that's a lot I think my
00:00:33.160 work is done thank you so
00:00:36.480 much no I think there are others who you
00:00:39.760 know still need to know
00:00:41.879 awesome so before we start we need to
00:00:44.879 know like why we are talking about
00:00:47.680 concurrency
00:00:50.840 right so if you look at the chart at the
00:00:55.039 left you know over the last 10 or 20
00:00:58.039 years you see the CPU you know improving
00:01:01.000 a lot uh right so we went from 1 GHz to
00:01:04.680 uh four and five now you know we're
00:01:06.560 exceeding it um the CPU power has been
00:01:10.000 like you know improving the hardware
00:01:11.360 that you use is really uh awesome these
00:01:14.119 days um but you know uh most of us are
00:01:17.000 not really writing a lot of uh uh you
00:01:19.960 know concurrent code all right so uh
00:01:22.759 here's a rough chart on the right side
00:01:25.079 it is not very accurate um but you know
00:01:27.880 when you're running code that is not
00:01:30.759 uh concurrent you're probably just
00:01:32.720 utilizing 25% or 30% of your uh computer
00:01:36.720 right depending on what type of CPU you
00:01:39.399 have right because if you have a quad
00:01:42.240 core you're only running One Core and
00:01:44.159 the others are still uh you know not
00:01:46.079 used efficiently right uh so if you want
00:01:49.360 to use your resources efficiently and
00:01:52.360 you know get up to speed um you know
00:01:54.600 it's good to explore
00:01:57.560 concurrency right um so today I'm going
00:02:00.200 to be talking about all these um
00:02:02.439 features that Ruby offers um for
00:02:05.119 concurrency and parallelism like you
00:02:07.520 know threads fibers and Ractors I'm
00:02:09.759 just it's just a getting started kind of
00:02:11.800 a thing to demystify for
00:02:14.879 others right a little about me um so I
00:02:18.040 work for this company called rails
00:02:20.040 Factory a consultant there and I've been
00:02:22.680 uh working on in this industry for 13
00:02:24.680 plus years um I am mostly a Ruby/React uh
00:02:29.120 coder right and uh before this I was
00:02:31.879 running my own startup a very small
00:02:33.879 consulting company for almost uh 6 years
00:02:37.200 then I quit to pursue my passion for
00:02:39.519 teaching and helping people right and
00:02:43.080 thank you and during my free time I also
00:02:46.360 love attending conferences meetups and
00:02:48.560 you know I organize uh meetups for this
00:02:51.000 local community in Chennai and uh you
00:02:53.680 can find me as I mesh on social media
00:02:56.519 Twitter Mastodon and many places um
00:03:00.040 LinkedIn so this is uh me my wife and
00:03:04.120 one-year-old
00:03:05.480 daughter this is my uh work team uh at
00:03:09.440 rails Factory this is just the people
00:03:11.720 that we have in Chennai city alone not
00:03:13.879 including people from other cities or
00:03:15.560 other
00:03:17.200 countries all right
00:03:19.400 so um do we know what is concurrency and
00:03:24.239 parallelism I have some pictures here so
00:03:27.680 this should explain uh concurrency for
00:03:30.599 you right you know although it looks
00:03:33.200 like they're all doing this at the same
00:03:35.560 time they're not
00:03:38.680 right so another example is this so this
00:03:41.480 is how we define it you know concurrency
00:03:43.519 is about dealing with a lot of things
00:03:46.599 not doing a lot of
00:03:49.400 things when parallelism is about doing a
00:03:53.040 lot of things at the same time
00:03:57.560 right uh so another uh example for
00:04:02.480 parallelism so now does Ruby support
00:04:05.439 both concurrency and
00:04:08.519 parallelism yeah we'll find out right so
00:04:10.879 we have thread fiber and Ractor we need
00:04:13.200 to see which one is concurrent which one
00:04:14.920 is parallel
00:04:17.000 these um a little bit of uh Basics let's
00:04:20.040 say um so if I want to achieve
00:04:22.560 parallelism right or concurrency I can
00:04:25.440 create multiple processes right um so I
00:04:28.560 can just go on and create multiple
00:04:30.199 process it will be concurrent and
00:04:32.240 parallel but the thing is uh processes
00:04:34.840 are very expensive right it uses a lot
00:04:37.080 of memory and it it doesn't share memory
00:04:40.280 between the processes right so there are
00:04:42.560 some caveats what if I want to share
00:04:44.759 memory or send messages right so that's
00:04:47.720 one thing processes are expensive so
00:04:50.880 then what is the next option like can I
00:04:53.080 try creating threads from my process
00:04:56.320 right so that might be uh less expensive
00:04:59.280 which makes it a good option and thread
00:05:01.919 also shares memory right so it could be
00:05:04.759 a good thing for
00:05:07.360 me uh so using threads and Ruby can we
00:05:10.520 achieve concurrency or
00:05:13.560 parallelism um concurrency yes
00:05:16.680 parallelism not so much so here I'm
00:05:19.039 talking about MRI Ruby not
00:05:21.000 JRuby because in JRuby it might still be
00:05:23.199 uh possible right so in MRI Ruby we
00:05:26.039 have the global interpreter lock uh that
00:05:28.479 doesn't allow
00:05:30.759 parallelism which means if you have multiple
00:05:33.560 threads the GVL only allows one thread
00:05:36.639 at a time to execute right so it will
00:05:38.680 not allow multiple threads to work at
00:05:41.000 the same time so let's look at a simple
00:05:43.400 uh thread
00:05:47.400 example
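As a minimal sketch of the one-thread-per-I/O-call pattern about to be demonstrated (not the speaker's actual scraping code), here `sleep` stands in for the network wait:

```ruby
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# One thread per "API call"; the GVL is released while a thread is
# blocked on I/O (here simulated with sleep), so the waits overlap.
threads = 5.times.map do |i|
  Thread.new(i) do |n|
    sleep 0.2                      # stands in for Net::HTTP.get or similar
    "response-#{n}"
  end
end

results = threads.map(&:value)     # join each thread and collect its result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts results.length                # five results, gathered in roughly 0.2s
```

A sequential loop would take roughly 1.0s (five waits of 0.2s back to back); the threaded version overlaps the waits.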
00:05:51.919 yeah
00:05:55.199 okay um so let's say I have a
00:05:58.280 simple code uh to showcase this so
00:06:01.639 let's say I want to make some few API
00:06:03.800 calls to maybe process some data from
00:06:06.240 that or it could also be like you know U
00:06:08.919 scraping from the web um in this example
00:06:11.759 I just want to uh scrape some basic
00:06:13.919 information like title from them okay um
00:06:17.639 so one way is um like you know using
00:06:19.639 threads I could actually do things
00:06:21.759 concurrently so that it's little faster
00:06:24.759 okay just a simple example where we can
00:06:26.599 use Nokogiri to scrape the uh title and
00:06:30.160 this is what I would normally do right
00:06:32.039 uh so Thread.new is the syntax that you
00:06:34.160 would use and I can create one thread
00:06:36.520 for each uh API I can uh run it
00:06:39.039 simultaneously right I mean concurrently
00:06:41.919 at the end you use Thread#join to
00:06:45.039 actually collect all the results and
00:06:47.880 get the values right so it works for uh
00:06:50.919 simple cases when you're doing this
00:06:53.039 right uh but what if I have something
00:06:55.680 more complicated right so let's say I'm
00:06:58.560 using something uh for transactions and
00:07:01.400 so on at that time you know you might uh
00:07:04.599 have some issues like for
00:07:07.440 example um let's say I have two bank
00:07:09.879 accounts A and B um and I deposit some
00:07:12.960 money like 100,000
00:07:15.039 maybe uh and then I want to simulate
00:07:17.879 this multiple transaction happening at
00:07:20.039 the same time using multiple threads
00:07:22.319 right so I just run a loop created
00:07:24.720 multiple threads and I'm trying to
00:07:26.400 transfer from account A to B say $50
00:07:30.240 right um so here you can imagine the
00:07:33.479 transfer method could be doing some API
00:07:36.400 calls or database calls to you know
00:07:39.680 credit or debit uh from these two
00:07:42.440 accounts uh so again I'm going to
00:07:44.280 simulate the same thing uh but this time
00:07:46.319 transferring from B to a right so
00:07:49.199 finally just collect all the results and
00:07:51.199 see uh what it prints right um so so
00:07:55.280 this is what I got you know every time I
00:07:57.120 run this um if you see I'm not getting
00:08:00.199 the expected result here you know 780
00:08:03.199 and 1,150 is not right so I try to
00:08:05.680 run it again and uh um you know I did
00:08:09.039 multiple times to see how often does it
00:08:11.440 get it right you know so second time
00:08:13.960 again same thing the actual result
00:08:15.960 should have been 900 and 1,100 but I got
00:08:18.479 950 and 1,220 right so what is the
00:08:22.919 problem with my code right one was the
00:08:26.080 uh the race condition right so using
00:08:27.919 threads most of the time we might run
00:08:29.879 into uh race condition let's say I want
00:08:32.320 to have multiple threads process some
00:08:34.640 data and I want to write it back to a
00:08:36.440 file or a database you know both of them
00:08:39.120 can actually raise uh when they're
00:08:40.760 sharing the state or a memory right so
00:08:44.279 that's one uh problem that you might run
00:08:46.720 into and another chance is the deadlock
00:08:50.320 situation right we'll explain that a
00:08:52.240 little bit um so how do we solve the
00:08:55.080 current problem that we saw racing issue
00:08:58.320 um I could probably you know
00:08:59.800 synchronize I can lock account A and B
00:09:02.519 and then transfer so that the when
00:09:04.720 thread one is running thread two doesn't
00:09:07.480 actually um you know do the same thing
00:09:10.720 right uh so one thread can lock and the
00:09:12.839 other can wait right and to avoid
00:09:16.519 deadlocking I can actually um sort the
00:09:19.120 order in which I lock you know um so
00:09:22.160 simple things uh that I can
00:09:25.000 do right so
00:09:27.839 um sorting the order of the accounts um
00:09:32.760 right uh and then making sure that I
00:09:35.440 also do things atomically meaning when
00:09:37.480 I do the withdrawal and deposit it has
00:09:39.600 to happen atomically otherwise you
00:09:41.519 know that can also cause uh trouble and
00:09:44.519 then finally unlocking
00:09:46.200 it okay so it gets a little complicated
00:09:49.800 uh when we do this with threads unless
00:09:51.640 we are very good at you know uh doing
00:09:53.959 these things ourselves when you write
00:09:55.680 complex uh code with
00:09:58.160 threads and and there are other
00:10:00.600 libraries that can help you you know
00:10:01.920 like concurrent-ruby uh it offers
00:10:04.079 a lot of good data structures and
00:10:07.120 methods for you to efficiently do these
00:10:09.399 things you know atomically executing
00:10:11.600 stuff yeah I think let's go
00:10:19.680 back my
00:10:23.399 presentation
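The fix just described — lock both accounts in a fixed order, do the withdrawal and deposit atomically inside the critical section, then release the locks — can be sketched like this (hypothetical account objects, not the slide code):

```ruby
Account = Struct.new(:id, :balance, :lock)

a = Account.new(1, 1_000, Mutex.new)
b = Account.new(2, 1_000, Mutex.new)

# Always acquire locks in id order so two opposite transfers
# can never deadlock waiting on each other.
def transfer(from, to, amount)
  first, second = [from, to].sort_by(&:id)
  first.lock.synchronize do
    second.lock.synchronize do
      from.balance -= amount     # withdrawal and deposit happen
      to.balance   += amount     # atomically under both locks
    end
  end
end

threads = 50.times.map { Thread.new { transfer(a, b, 10) } } +
          50.times.map { Thread.new { transfer(b, a, 10) } }
threads.each(&:join)

puts a.balance, b.balance        # both 1000: no lost updates, no deadlock
```

Without the mutexes, interleaved read-modify-write updates can lose money exactly as shown in the demo; without the id ordering, two opposite transfers can each grab one lock and deadlock.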
00:10:25.839 yes so yeah
00:10:30.120 so so I tried some basic API calls um
00:10:33.440 you know just to show how how it is done
00:10:36.200 uh right so when I'm running it
00:10:37.560 sequentially uh those API calls were
00:10:39.760 able to return in like 2.5 seconds but
00:10:42.600 when I'm doing it with thread I was able
00:10:44.120 to do the same thing in
00:10:45.519 0.14 um you know seconds so threads uh
00:10:48.399 can definitely save time when I'm
00:10:49.880 running a script that is going to do
00:10:52.040 some IO related uh you know
00:10:56.120 operations and what are the problems
00:10:58.399 with threads threads is good it can
00:11:00.200 definitely help you do things
00:11:01.480 concurrently uh for I/O-bound activities
00:11:05.560 right but the problem is you need to
00:11:06.880 learn to um synchronize or use the
00:11:09.880 mutex lock and etc. and you
00:11:13.200 have to also be wary of the race
00:11:16.040 condition and Deadlock right and
00:11:18.600 especially when you're writing to a file
00:11:20.240 or you know um when managing multiple
00:11:22.839 threads there is this problem of data
00:11:25.160 inconsistency that can come up we need
00:11:27.320 to manage that as well and then not to
00:11:30.320 mention the context switching which makes
00:11:32.880 threads um a little expensive because uh
00:11:36.399 here the operating system is involved to
00:11:38.600 make the context switch from thread
00:11:40.000 one to thread two right so which is still
00:11:44.079 good so now the next option um in this
00:11:47.800 is fibers right so let's look at what
00:11:50.240 fiber can
00:11:51.800 do okay so this is a basic syntax uh
00:11:55.079 that you might use for fibers right uh
00:11:58.000 fibers are you know you can think of
00:11:59.639 them like coroutines um you know lightweight
00:12:03.399 uh threads sort of uh so you can create
00:12:06.079 as many fibers you want by using the
00:12:08.320 syntax Fiber.new within a do block you can
00:12:10.720 put your actual code whatever you want
00:12:12.639 to do and then here the context switching
00:12:16.120 happens at the developer side you know
00:12:18.399 me as a developer have to make sure that
00:12:20.839 I pause a particular fiber and then do
00:12:23.560 the context switching right so when I
00:12:25.199 make one API call I could actually pause
00:12:28.399 while the system is waiting for
00:12:30.160 the results to come and I can move to
00:12:31.880 the second fiber right so using
00:12:34.440 fiber.resume I can again go back to the old
00:12:36.720 fiber to complete the task right um so
00:12:40.120 are fibers better than
00:12:42.560 threads let's see yes so creating 100
00:12:45.880 fibers a lot faster than creating 100
00:12:48.199 threads uh right and the context
00:12:50.279 switching is not happening at the OS
00:12:52.720 level so it makes it less expensive when
00:12:55.320 compared to uh threads right so it's
00:12:58.199 okay if you're handling multiple a
00:13:00.040 couple of threads uh that might be good
00:13:03.000 to go but then when you're doing dealing
00:13:04.720 with hundreds of them or maybe more
00:13:06.880 fibers could actually be a very good use
00:13:09.079 case um let's quickly look at a fiber
00:13:16.600 code
00:13:20.800 sorry
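The yield/resume handoff being described works roughly like this minimal, self-contained sketch (the messages are made up; the slide code walked through below is the speaker's own):

```ruby
fiber = Fiber.new do
  puts "Fiber started"
  reply = Fiber.yield("paused")        # pause here, hand a value to the caller
  puts "Fiber resumed with #{reply}"   # execution continues from the yield
  "done"
end

puts "Starting the fiber"
value = fiber.resume                   # runs the fiber until its first yield
puts "Main got #{value}"
final = fiber.resume("hello")          # "hello" becomes the yield's return value
puts "Fiber returned #{final}"
```

Each `resume` picks up exactly where the fiber last paused, which is the context-remembering behavior described in the walkthrough.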
00:13:23.680 yeah right so the same simple example
00:13:26.639 that I have um this is a full code
00:13:30.440 now this is the block where we have the
00:13:32.839 fiber syntax right uh so I can do
00:13:36.560 Fiber.new and then put some sample code inside
00:13:38.519 just to see how it works right um so
00:13:43.160 here if you see in inside the fiber I'm
00:13:45.560 using Fiber.yield so this will pause
00:13:48.199 the particular fiber and then it can
00:13:50.320 move on to the next
00:13:52.519 one right uh so that's it and then this
00:13:55.920 is the main program where everything
00:13:57.320 starts uh right so it starts from this
00:14:00.880 line you know first it puts the starting
00:14:03.240 the fiber message and then next is you
00:14:06.000 know it calls fiber.resume which means
00:14:09.279 this is when the fibers are actually
00:14:10.720 getting executed right so Fiber.new and
00:14:13.800 then it starts with line number three
00:14:15.720 prints fiber started and then when it
00:14:18.600 comes to line number four it yields
00:14:21.000 which mean it pauses here and then it
00:14:22.839 gives back control to the main program
00:14:25.240 right and if you see when it gives back
00:14:27.519 the control to the main program
00:14:29.600 uh it actually starts from line number
00:14:31.639 11 not from nine again uh because it
00:14:34.399 remembers the context right it remembers
00:14:36.959 from where it paused earlier and it can
00:14:39.639 go back to it right so it prints this
00:14:41.959 now to yeah result uh result is whatever
00:14:45.680 that we sent out from the fiber you know
00:14:47.839 along with the yield
00:14:49.560 method and now you see um when I use
00:14:52.240 fiber.resume we are sending it back to
00:14:54.600 the fiber again now again it remembers
00:14:58.480 where the execution was paused earlier
00:15:01.120 so now it can start from here right
00:15:04.000 fiber.resume and it finally returns this
00:15:05.839 particular uh
00:15:07.240 message okay so when you see this print
00:15:09.759 message you'll also get the return that
00:15:13.040 was sent from the
00:15:14.839 fiber okay so basically this would be
00:15:18.000 the
00:15:20.720 output um so I can do the same thing uh
00:15:23.680 with the previous example that we saw
00:15:25.600 that is like web scraping or if you want
00:15:27.519 to do some API calls writing a script to
00:15:29.639 do some uh
00:15:31.920 scraping okay bunch of URLs that I need
00:15:34.839 to scrape from uh the same similar code
00:15:37.920 using Nokogiri uh but here I don't have to
00:15:41.600 actually use Fiber myself to create
00:15:44.199 those syntax right um it can be slightly
00:15:47.519 uh complicated if I do it myself as a
00:15:49.319 beginner so I can use gems like this you
00:15:51.639 know async gem is wonderful actually um
00:15:54.399 so I'm using it here to do the same
00:15:56.399 thing and I'm also using an async HTTP
00:15:59.600 library which is non-blocking you know
00:16:02.000 which is async in nature because while
00:16:03.920 using this uh you might be using an
00:16:06.279 older um gem which might not be
00:16:09.519 compatible with fiber right because it
00:16:11.519 could still be blocking you might think
00:16:13.639 that I'm using a async thing but it's
00:16:16.279 not completely async uh right it's not
00:16:18.680 truly async so we need to use uh
00:16:21.680 libraries that are compatible with these
00:16:23.480 uh fibers so here I chose async along
00:16:25.959 with async
00:16:27.000 HTTP uh right so we we start with this
00:16:30.000 block it starts basically a event Loop
00:16:33.600 and then
00:16:35.600 um you know I have task. async and
00:16:38.720 inside that I can have uh my little code
00:16:41.319 so this is like uh creating multiple
00:16:44.360 subtask within the main task right so
00:16:46.560 these are all fibers basically so you
00:16:48.279 can create as many fibers as you want
00:16:50.600 like this and this async gem handles the
00:16:53.360 scheduling that is passing the fiber and
00:16:55.759 then you know sending it off
00:16:57.240 interestingly Ruby 3.0 I think uh
00:17:00.199 actually gave us this um way to
00:17:03.319 create a scheduler which gives you some
00:17:05.319 hooks uh through which we can find out
00:17:07.160 when there is an IO weight happening at
00:17:08.799 the kernel and I can do some thing like
00:17:11.880 switching from one fiber to the other
00:17:14.480 right so yeah so this
00:17:18.480 works now let's look at the um other
00:17:26.120 option yeah okay uh while running the
00:17:29.320 same thing here so I ran some 60
00:17:31.120 different apis um and here also we could
00:17:33.640 see a very good uh you know Improvement
00:17:35.960 when compared to sequentially running it
00:17:37.919 versus using fiber for concurrently uh
00:17:40.240 doing things all right using the async
00:17:42.440 gem and so on um right so it came down
00:17:45.240 from 6 seconds to uh 3.8 seconds which
00:17:48.360 is good uh this is the gem that I was
00:17:51.480 talking about async I think there's
00:17:53.360 another talk uh today by Bruno I think
00:17:55.799 about this so if you're interested you
00:17:57.159 should check this out uh async so this
00:18:00.559 gem also gives you uh these other tools
00:18:02.840 that you can work with you know async
00:18:04.360 HTTP I've used is a non-blocking uh you
00:18:07.600 know calls HTTP calls there is also
00:18:09.840 async rspec if you want to do uh you
00:18:12.039 know concurrently run your specs to make
00:18:14.480 it faster you could use this and there
00:18:16.520 are bunch of other tools that you can
00:18:18.039 also explore for sockets and all now
00:18:21.480 what is the catch with
00:18:23.280 fiber so here the context switching is
00:18:25.760 not happening at the OS level which
00:18:28.280 means the OS is not going to
00:18:30.280 preemptively you know uh change the
00:18:32.720 context but you as a developer have to
00:18:35.120 do it yourself right when I have 10
00:18:37.720 different fibers running I need to know
00:18:40.000 when to pause and when to resume right
00:18:42.320 to make sure that all the fibers get
00:18:43.880 their turn right uh so fibers they're
00:18:48.559 not really uh parallel right so they
00:18:51.799 can achieve concurrency but not
00:18:54.360 parallelism and one more problem that
00:18:56.480 can happen is basically starvation that
00:18:59.039 is when I have two fibers one of them is
00:19:02.440 uh running the other one is just waiting
00:19:04.320 for it to yield control right so unless
00:19:06.360 the fiber themselves they yield control
00:19:09.159 other fibers will still be waiting
00:19:11.080 basically starving them right uh so we
00:19:13.720 need to write code that will do this
00:19:16.320 efficiently if you're using
00:19:19.039 fiber
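Starvation, and the cooperative yielding that prevents it, can be shown with a tiny round-robin scheduler; this is a minimal sketch, not production code:

```ruby
log = []

# Each fiber does one unit of work, then yields so the other gets a turn.
# If either fiber never called Fiber.yield, the other would starve.
f1 = Fiber.new { 3.times { |i| log << "f1:#{i}"; Fiber.yield } }
f2 = Fiber.new { 3.times { |i| log << "f2:#{i}"; Fiber.yield } }

# Round-robin: resume each still-alive fiber until both have finished.
while [f1, f2].any?(&:alive?)
  [f1, f2].each { |f| f.resume if f.alive? }
end

puts log.inspect   # steps interleave: f1:0, f2:0, f1:1, f2:1, f1:2, f2:2
```

Delete a `Fiber.yield` and the interleaving disappears: the non-yielding fiber runs all its steps before the other gets a single turn.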
00:19:20.799 um okay so the next thing is Ractor so
00:19:24.039 so far we saw threads and fibers uh both
00:19:27.760 can achieve concurrency but they're
00:19:29.280 not truly parallel right so Ractor would
00:19:33.360 be your option through which we could
00:19:35.559 achieve
00:19:37.000 parallelism uh so this is a basic syntax
00:19:39.960 that Ractor uses right so same like
00:19:42.200 fiber you can do Ractor.new do and put
00:19:45.000 your code inside um but here you could
00:19:48.360 think of this as like um like a
00:19:51.120 lightweight processes of sort you know
00:19:53.159 similar to processes this might uh not
00:19:56.120 actually share memory if you have
00:19:57.799 multiple Ractors running it will not
00:19:59.640 share memory because it's isolated
00:20:03.080 okay but we have a way to pass messages
00:20:06.919 between the Ractors uh so one Ractor
00:20:09.360 can talk to the other by using send
00:20:11.640 receive or yield and take right so this
00:20:14.159 way you can do it and if you want to
00:20:15.679 send some data within the Ractor you
00:20:18.360 have to make sure that they are
00:20:19.640 immutable right so if you're passing
00:20:21.480 something like an array or object it has
00:20:23.919 to be deeply frozen uh to make sure this
00:20:26.679 works and right now it's an experimental
00:20:29.840 uh thing I guess so many of the libraries
00:20:32.320 still don't support it so if you try to
00:20:34.200 access some library within the Ractor
00:20:36.400 you might still face uh issues so you
00:20:38.039 have to manage that your
00:20:41.280 own okay so these are some of the uh key
00:20:43.880 features of that isolated memory which
00:20:46.400 means you don't have to run into race
00:20:48.080 condition right um so that's a good
00:20:51.559 thing here and communication like you
00:20:53.360 know you can pass data from one to other
00:20:55.320 and collect all the data at the end so
00:20:57.400 that's good and the last one is actually
00:20:59.520 um good since Ractors do not share
00:21:02.600 memory you know you don't need to worry
00:21:04.240 about the uh GVL which means I can
00:21:07.400 have multiple Ractors running
00:21:09.159 simultaneously at the same time unlike
00:21:11.039 threads right uh but although Ractor
00:21:14.279 has a Ractor-level uh lock which means
00:21:17.840 within a Ractor I can still only
00:21:20.520 run one thread can't run
00:21:22.600 multiple threads within a single Ractor
00:21:24.600 but through various Ractors I can have
00:21:27.320 a number of uh you know threads running
00:21:29.720 right so that gives you true parallelism
00:21:32.480 running things simultaneously so if you
00:21:34.320 have four CPU processor you could have
00:21:36.320 four things doing at the same time using
00:21:38.640 the uh CPU more
00:21:40.679 efficiently right so let look at some
00:21:42.840 sample stuff
00:21:48.080 here
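The send/receive and yield/take primitives just mentioned can be sketched with a single worker Ractor (a minimal sketch; Ractors are experimental and will print a warning):

```ruby
# A worker Ractor: receive a number, yield its square back.
worker = Ractor.new do
  loop do
    n = Ractor.receive        # blocks until the main Ractor sends a value
    Ractor.yield(n * n)       # hands a result to whoever calls take
  end
end

results = [1, 2, 3].map do |n|
  worker.send(n)              # values are copied into the worker (no sharing)
  worker.take
end

puts results.inspect          # [1, 4, 9]
```

Because values are copied (or must be deeply frozen to be shared), the two Ractors never touch the same mutable object, which is how race conditions are avoided by construction.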
00:21:49.720 yeah um so let's say for the case of
00:21:52.600 this example um I have to process a CSV
00:21:56.600 file which is large okay I have a lot of
00:21:59.120 data in it um so here in this example
00:22:01.400 I'm just generating some uh dummy data
00:22:04.080 for us to just simulate this okay but in
00:22:07.200 in an actual scenario you could have a
00:22:09.720 CSV uh that you need to process which
00:22:12.200 has some data maybe you have to
00:22:13.720 calculate some price or maybe do
00:22:15.159 something more CPU intensive
00:22:18.240 right uh so here that's basic example
00:22:21.400 I'm just trying to uh do some price
00:22:23.440 calculation and
00:22:26.000 Etc okay so this is what uh I might do
00:22:28.840 you know I can just put a Ractor.new
00:22:31.320 uh block within which I can uh send the
00:22:33.640 data uh so if I have millions of rows in
00:22:35.880 my CSV I can just divide them into
00:22:38.120 chunks of data and then each chunks can
00:22:40.679 be sent to uh one Ractor right so if
00:22:43.760 there are multiple Ractors multiple uh
00:22:46.120 chunks can be sent to them and processed
00:22:48.440 simultaneously uh allowing you to
00:22:50.360 actually do things uh faster all right
00:22:53.600 so this is that um uh so here you know
00:22:57.840 you see
00:22:58.960 um receiving a chunk the Ractor receives
00:23:01.320 the uh chunk of products here okay and
00:23:04.720 then after finishing the execution it
00:23:06.559 will actually yield sending you the data
00:23:08.919 the processed data at the end which can
00:23:11.480 be collected um towards the end to print
00:23:14.000 the uh result here
00:23:17.520 right
00:23:19.440 yeah so here uh here we actually concat
00:23:22.360 and you know we get all the uh end
00:23:25.480 result Okay so
00:23:29.600 that is
00:23:30.919 that going back to my
00:23:34.840 code uh so so this is what I did I
00:23:38.520 actually tried processing a
00:23:40.679 CSV um but you know uh Ractor might be
00:23:44.559 good for CPU-intensive tasks but it's not
00:23:46.720 actually good for the I/O-intensive so
00:23:48.760 which means here when I'm reading the
00:23:50.200 CSV it's not actually uh helping there
00:23:53.360 but it helps me in processing the CSV uh
00:23:56.000 data right so I created uh a CSV with
00:23:58.960 millions of rows and did some
00:24:01.080 calculation on that here we could see uh
00:24:03.640 the execution difference right at the
00:24:05.520 bottom you see sequential uh execution was 4.7
00:24:09.640 seconds and Ractor execution was 1.6
00:24:11.799 seconds right so this these are just
00:24:13.600 basic simple examples that I'm showing
00:24:16.000 you but imagine if you could do this uh
00:24:18.440 for your actual task a complex task that
00:24:20.320 you're running let's say you want to uh
00:24:22.520 migrate a database or you want to do
00:24:24.960 something um you know with data you
00:24:26.840 could use fibers or Ractors efficiently to
00:24:29.760 do things much faster rather than
00:24:31.320 spending hours into uh doing such task
00:24:35.919 right yeah so uh so this could give you
00:24:39.760 like 2x 3x performance which is really
00:24:42.880 good and problems with Ractor uh will
00:24:46.679 come because of the isolated memory
00:24:49.640 right so sharing data is difficult
00:24:51.520 because you have to make sure that they
00:24:52.799 are truly immutable uh right so Ractor
00:24:56.080 does give you uh options to make
00:24:59.120 um you know copy data or maybe make it
00:25:01.520 immutable uh so but you still have to um
00:25:04.279 you know work with that otherwise you
00:25:05.960 might see errors like you know uh you
00:25:07.760 cannot uh share data something like that
00:25:10.039 gets thrown back at you right so limited
00:25:13.440 objects that you can share uh many of
00:25:15.440 the gems are not compatible but I'm
00:25:17.399 pretty sure that you know things will
00:25:19.000 change and we will see a lot of uh gems
00:25:21.559 doing this um right yeah so that's about
00:25:25.440 the whole concurrency thing so with
00:25:27.799 threads fibers and Ractors what do we
00:25:30.440 use you know um so threads and fibers
00:25:33.360 can give you concurrency um you you can
00:25:35.520 choose so if it's just like a handful of
00:25:37.960 uh threads that you want to create and
00:25:39.640 do something concurrently you're good
00:25:42.120 with that but if you want to do more
00:25:43.600 than that you you know the operating
00:25:45.640 system uh doesn't allow you to create a
00:25:48.200 lot of uh threads like that right and
00:25:49.919 it's difficult to manage them also the
00:25:51.919 context switching becomes very expensive
00:25:54.159 memory is a problem there uh right so
00:25:56.600 threads will get difficult at at a point
00:25:59.240 when fiber can actually help you you
00:26:01.000 know you can create just hundreds and
00:26:02.240 thousands of them without a problem um
00:26:05.159 Ractors uh can be used for CPU
00:26:08.760 intensive task right when threads and
00:26:10.720 fibers are used for I/O-intensive like
00:26:12.760 reading from a file or uh sending DB
00:26:15.320 calls or network calls uh your system is
00:26:17.799 basically waiting right if you're
00:26:19.000 writing sequential code the system is
00:26:21.600 just waiting for the API to return
00:26:24.000 something um so here you could just use
00:26:27.360 the time to concurrently process
00:26:30.360 more
00:26:32.240 yeah that's about it thank you so much
00:26:35.279 folks you can connect with me at imish