Summarized using AI

Mechanical sympathy, or: writing fast ruby programs

Tim Kächele • September 11, 2024 • Sarajevo, Bosnia and Herzegovina • Talk

In the talk titled "Mechanical Sympathy, or: Writing Fast Ruby Programs," Tim Kächele explores how to optimize Ruby programs by understanding the underlying hardware. The notion of "mechanical sympathy," originally drawn from race car driving, means operating a system effectively by knowing how it works. Kächele illustrates how race car drivers who knew the intricacies of their vehicles, particularly Jim Clark, could achieve superior performance; similarly, developers who understand the hardware write better code.

Kächele transitions into discussing performance issues in modern computing, addressing the common excuses such as hardware limitations and advancing software demands. He highlights how CPU development continues to progress in terms of transistor count even as clock speeds stabilize. He elaborates on caching mechanisms, cache misses, and delays in memory access that impact the performance of applications.

Key Points Discussed:
- Understanding Hardware: The importance of knowing how CPUs work and the implications for programming languages like Ruby.
- Race Car Analogy: Jim Clark's ability to extract maximum performance from his cars serves as a metaphor for effective software development.
- Performance Issues in Software: Modern software often seems slower due to inefficiencies despite advanced hardware capabilities.
- Cache Performance: The talk stresses the significance of caching and how different levels of CPU cache can affect application speed.
- Object-Oriented Code Challenges: How typical object-oriented approaches can lead to performance penalties, particularly with data locality.
- Example Task in Logistics: A step-by-step illustration of optimizing a logistics application in Ruby, measuring performance and reorganizing data structures for efficiency.
- Performance Measurement: Highlighting the necessity of benchmarks and the establishment of baselines before optimizing code.
- Refactoring for Efficiency: Encouraging the audience to focus on data organization and reduce code complexity while maintaining readability.

Conclusions:
The presentation culminates by emphasizing that performant Ruby code doesn't necessitate sacrificing clarity and elegance. Developers can achieve better performance without compromising the quality of their code. Kächele encourages programmers to be mindful of their approach to data, advocating for more effective programming strategies, including establishing clear baselines for performance.

The final takeaway is that a thorough understanding of how machines operate allows developers to harness Ruby's capabilities, thereby crafting efficient and elegant code that leverages hardware effectively.

Mechanical sympathy, or: writing fast ruby programs
Tim Kächele • Sarajevo, Bosnia and Herzegovina • Talk

Date: September 11, 2024
Published: January 13, 2025
Announced: unknown

"Premature optimization is the root of all evil": that's what you hear whenever someone wants to optimize something. Let's break with convention. Let's learn about the limits of modern computers, how they apply to Ruby, and how we can use our knowledge to write faster Ruby programs.

Ruby is a magical language and it's easy to forget that at the end of the day an actual CPU is running your program, but knowing a bit about CPUs and how they work can help you speed up your programs tremendously.

In this talk we are going to look at the physical limits of modern computing and how we can apply this knowledge to write fast Ruby programs.

EuRuKo 2024

00:00:10.639 okay hello everybody thank you for
00:00:12.480 coming around I didn't expect that
00:00:14.559 turnout I mean analytics analytical
00:00:17.400 databases are also quite interesting so
00:00:20.920 we are going to talk today about
00:00:22.720 mechanical
00:00:23.720 sympathy and you're going to ask
00:00:26.480 sympathy towards a machine I don't think
00:00:29.240 so
00:00:31.000 but the real title probably should be
00:00:33.879 that is not as catchy as mechanical
00:00:35.440 sympathy how we can write fast Ruby
00:00:38.399 programs and how we can get the most out
00:00:40.920 of our
00:00:41.960 Hardware
00:00:43.920 so before we really start we have to do
00:00:46.559 the obligatory introduction round my
00:00:48.800 name is Tim we're going to skip the last
00:00:51.160 name for the Germans they can read it
00:00:53.039 for the rest don't even try it's going
00:00:55.640 to be mispronounced and butchered leave
00:00:58.000 it up to the Germans to be confused
00:00:59.559 about the umlauts
00:01:03.640 so today no sorry I work for a stock
00:01:08.600 exchange a European Stock Exchange
00:01:10.439 called Börse Stuttgart yes this is
00:01:14.000 finance but don't be fooled this is
00:01:17.280 still a Casio it's not a Rolex those
00:01:20.360 days in finance are apparently over and
00:01:22.759 I must have missed it I'm
00:01:26.200 sorry so let's start and talk about
00:01:29.320 mechanical
00:01:30.479 sympathy what is it about well first
00:01:33.399 and foremost it's a term from race car
00:01:35.240 driving and okay but what do we really
00:01:40.079 mean by that and it means that we
00:01:42.640 operate a system with an understanding
00:01:44.880 of how that system works best so you
00:01:48.119 should know a bit about your car before
00:01:50.200 you before you take a step into it right
00:01:53.520 and a great example of somebody that
00:01:56.280 showed a great deal of mechanical
00:01:58.079 sympathy towards his car was this person
00:02:01.360 and unless you're really into race car
00:02:03.960 driving which I am not either you
00:02:06.880 probably don't know this person his name
00:02:09.280 is Jim Clark he lived in the 1960s and
00:02:12.239 he died in the 1960s in a race car crash
00:02:15.519 and he was a race car driver for team
00:02:18.160 Lotus and he was one of the fastest he
00:02:21.640 put multiple records on the books
00:02:23.440 actually with his team and he did so
00:02:27.599 despite whatever car you gave him really
00:02:29.720 like like you gave him this car he was
00:02:31.920 the fastest in it you gave him another
00:02:33.560 car he was the fastest in it that was
00:02:35.239 just his shtick and he did so because
00:02:39.760 he was very gentle with the car he he
00:02:42.120 showed a great deal of mechanical
00:02:43.920 sympathy towards this car he knew how to
00:02:46.640 operate a car he knew how the
00:02:48.720 specifications worked and how engines
00:02:51.200 work so a great example of that the
00:02:54.920 others were fast as well don't get me
00:02:56.400 wrong but he was always the fastest no
00:02:58.159 matter the car and he did so and
00:03:02.840 doing basically everything else the
00:03:05.280 others did as well and really amazing
00:03:09.040 fun facts or details his tires in the
00:03:12.519 cars lasted four times longer than the
00:03:14.360 rest of everybody else's tires think
00:03:16.400 about that four times more out of that
00:03:18.200 tire than the usual very efficient and
00:03:23.360 another great anecdote is that mechanics
00:03:26.280 could do a lineup of the cars used in a
00:03:28.319 in a race and they could always figure
00:03:30.680 out which one was uh Jim Clark's because
00:03:33.760 if they looked at the gearbox it
00:03:35.439 basically looked like brand new
00:03:37.640 compared with the others that look like
00:03:39.959 God knows
00:03:41.480 what and if there's one learning with
00:03:44.280 all of this then it it means you don't
00:03:47.120 have to be a mechanic to be a good race
00:03:49.159 car driver but you should know about the
00:03:51.120 details of your car you should know the
00:03:53.599 edge cases you should know how it
00:03:55.400 performs best maybe your car has the
00:03:58.000 most torque when it's running at 3,000
00:04:01.120 RPM could be or maybe you know you
00:04:03.840 should know that the brakes are coming
00:04:05.439 in early so you brake
00:04:09.280 accordingly and I see the confused faces
00:04:12.000 around here why are we talking about
00:04:14.519 this is this the wrong
00:04:16.560 room it was a ruby conference right it
00:04:19.160 was not supposed to be about race car
00:04:21.239 driving which I don't know much about
00:04:25.199 actually well I think we have to talk
00:04:27.520 about this because we live in the
00:04:30.039 age of the M3 and nevertheless Teams
00:04:32.680 takes 20 seconds to load what is up with
00:04:35.560 that not only is Teams always a dreadful
00:04:37.680 experience now you have to wait for
00:04:41.520 that Dreadful experience to unfold in
00:04:43.520 front of you what's going on that can't
00:04:46.240 be it we have amazing
00:04:49.680 Hardware but I hear you say what gives
00:04:53.720 Microsoft was always bad at
00:04:56.759 software maybe the apple is redder on the
00:04:59.960 other side so we look at Apple this is
00:05:02.960 the weather app a week ago it was a bit
00:05:07.000 warmer and look at
00:05:10.560 this my oh my this thing is dropping
00:05:13.440 frames like it's some kind of Olympic
00:05:15.440 discipline isn't it on a decent
00:05:18.600 phone the weather app I remind you the
00:05:22.000 thing that shows weather it doesn't do
00:05:24.039 real-time physics or anything it drops
00:05:26.520 frames rendering
00:05:28.319 text this can't be it if that is what we
00:05:32.479 what we output as an industry we should
00:05:34.639 be ashamed of
00:05:37.479 ourselves and when performance takes a
00:05:39.919 dive really creativity thrives at this
00:05:42.319 point in time we live in the Golden Age
00:05:44.880 of excuse making why things are slow we
00:05:47.960 have the excuse Event Horizon well
00:05:50.400 passed at this point you hear all kinds
00:05:53.880 of excuses oh it is slow because of
00:05:58.039 accessibility yeah that must be the
00:06:00.039 thing right
00:06:02.319 accessibility Microsoft calls it a PhD
00:06:05.319 level thesis to render text in a
00:06:08.160 terminal a terminal I remind you that
00:06:10.919 outputs text we did that in the 60s I
00:06:14.520 wasn't even alive back
00:06:16.360 then and what I recently read in the
00:06:19.240 Reddit uh the Teams
00:06:22.360 subreddit was amazing somebody said of
00:06:25.319 course teams is slow you're running it
00:06:27.560 on a machine with only 16 GB of ram no
00:06:31.280 wonder it doesn't work like
00:06:34.000 that
00:06:35.919 crazy okay but the most famous excuse
00:06:39.319 probably of all of them is well Moore's law
00:06:42.479 is running out of steam these days we
00:06:44.599 don't get good hardware anymore it's
00:06:46.800 just sad no
00:06:49.919 Hardware well let's debunk that let's
00:06:52.960 look at the graph look at the transistor
00:06:55.440 count that thing is still going strong
00:06:57.599 my friend the only thing not going so
00:07:00.639 strong is frequency and that is only
00:07:02.759 because we realized at some point that
00:07:04.400 running a CPU at 10 GHz is a surefire
00:07:07.840 recipe to melt a motherboard with
00:07:10.319 the CPU core making its way down to the
00:07:12.720 Earth's core you can't cool this thing
00:07:14.840 anymore when it runs at 10 GHz but
00:07:18.400 nevertheless single-thread performance
00:07:21.160 it's not as fast as the transistor count
00:07:23.160 but it's still goes up and we still get
00:07:26.199 a little bit more performance each and
00:07:27.879 every year so
00:07:31.080 what's going on why why is everything
00:07:33.400 slow around us and I think we should
00:07:36.080 look at Hardware a bit more
00:07:39.400 intricately okay this is the Sunny Cove
00:07:42.000 micro architecture it's a bit of an
00:07:44.440 older architecture at this point but the
00:07:47.159 systems stay usually the same and Intel
00:07:49.360 isn't very creative uh when it comes to
00:07:52.479 applying recipes so what do you do if
00:07:55.319 you can't increase the clock speed anymore
00:07:58.120 because the frequency is kind of fixed
00:08:00.520 we we can't we can't do anything about
00:08:02.840 that well simple you try to do more
00:08:06.520 things in a clock cycle so what are you
00:08:09.080 going to do you make the
00:08:10.759 instructions wider in the sense that you
00:08:12.879 add
00:08:13.639 more arithmetic logic units to the thing
00:08:16.800 so you can add multiple integers in one
00:08:19.039 clock cycle suddenly and you do all
00:08:22.120 other all the other kinds of shenanigans
00:08:24.639 to get more done per clock cycle so but
00:08:30.639 you made it wider now you have to make
00:08:32.399 it deeper because you need more data at
00:08:34.680 the CPU so you introduce caches
00:08:37.959 and that way you get more
00:08:40.880 data towards the CPU at the end of the
00:08:42.959 day but the problem with that is if you
00:08:45.680 miss the cache oh you pay a penalty you
00:08:49.160 don't get enough data into that CPU
00:08:50.880 anymore so what do you do well you try
00:08:52.959 to make the things smarter you you
00:08:55.000 introduce something like Branch
00:08:56.160 prediction you introduce clever
00:08:57.600 algorithms into your CPU so that
00:09:00.279 you can actually predict what the next
00:09:02.560 memory access is going to be and you can
00:09:04.959 load that data already and if you
00:09:08.000 can take advantage of that you have a
00:09:10.320 damn fast CPU
00:09:12.120 suddenly okay but if you miss that
00:09:15.399 penalty is
00:09:17.760 steep let's look at the numbers well an
00:09:20.920 L1 cache miss that's the closest cache
00:09:23.959 close to a CPU that is not a register
00:09:26.880 0.5 ns still pretty fast but actually L2
00:09:30.880 cache misses are already 7
00:09:33.200 ns that costs you and the thing main
00:09:36.480 memory you always considered fast not so
00:09:39.680 fast actually 100 ns that's multiple
00:09:42.920 clock Cycles actually you can run
00:09:45.040 probably a square root operation and
00:09:46.760 multiple of those until you find you get
00:09:49.680 that main memory reference
00:09:52.480 back and not to forget the network
00:09:55.920 round trip in your data center so if
00:09:57.959 anyone says that microservice is going
00:10:00.120 to make it faster and reduce latency
00:10:02.279 they are lying to you the monolith is
00:10:05.120 the way to
00:10:07.120 go
00:10:08.880 Ruby so we have to visualize this a bit
00:10:11.839 because those numbers you can't imagine
00:10:13.760 them so L1 L2 and RAM let's look at it
00:10:18.320 L1 already there L2 making its way
00:10:22.279 RAM I think this presentation is going
00:10:25.000 to be over before RAM arrives and until
00:10:28.040 then RAM is not going to make any more
00:10:30.360 strides but why would you care
00:10:33.920 right this is hardware we're writing
00:10:36.480 Ruby we don't care about this let's take
00:10:39.320 an example
00:10:41.399 okay we are going to calculate the sum
00:10:44.320 of an
00:10:45.440 array so far that shouldn't be
00:10:48.200 challenging for any of
00:10:49.880 us and we're going to do so in two ways
00:10:54.320 one we're going to just go through the
00:10:57.320 array one by one adding the numbers up
00:10:59.800 pretty simple stuff and the other one we
00:11:02.440 go berserk
00:11:03.839 mode we just take a random index in that
00:11:06.920 array and we're going to take that
00:11:08.680 number and add it up obviously you
00:11:11.680 pay a price for generating random
00:11:13.120 numbers so we're going to offload that
00:11:14.920 and not measure that part we're just
00:11:16.560 going to measure the adding
00:11:19.480 part okay let's see what we have here
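The two variants might look roughly like this in Ruby (a minimal sketch, not the talk's exact benchmark; the array size and the Benchmark timing harness are assumptions):

    require "benchmark"

    numbers            = Array.new(10_000_000) { rand(100) }
    sequential_indices = (0...numbers.size).to_a
    random_indices     = sequential_indices.shuffle   # random order generated up front, outside the timing

    sequential = Benchmark.realtime do
      sum = 0
      sequential_indices.each { |i| sum += numbers[i] }   # walk the array front to back
    end

    random = Benchmark.realtime do
      sum = 0
      random_indices.each { |i| sum += numbers[i] }       # jump around the array at random
    end

    puts format("sequential: %.2fs, random: %.2fs", sequential, random)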
00:11:23.320 for the sequential indexes we have 3%
00:11:25.720 cache misses 17 million cache misses out
00:11:29.560 of 562 million references and it
00:11:34.320 roughly took 3.2 seconds okay that's
00:11:37.399 this was a lot of numbers that's
00:11:39.600 okay let's look at
00:11:42.279 random 28 million cache misses certainly
00:11:45.560 and it takes 4.1 seconds 4.2 actually
00:11:49.120 the thing is 30% slower suddenly and
00:11:52.120 don't blame it on Ruby don't don't walk
00:11:53.920 to the next room and call up Matz and
00:11:55.880 like what did you do that it's that slow
00:11:58.079 you can measure this in any any language
00:12:00.079 take Java take C take Zig take rust it
00:12:03.040 doesn't matter it's always slower
00:12:04.760 because if you run around in an array
00:12:08.600 randomly this is not how you should
00:12:11.600 behave but then I hear you say I don't
00:12:13.920 write random I write object oriented
00:12:16.560 code I write the good code well think
00:12:19.120 again this is probably your object graph
00:12:21.760 so every time you call an object it's
00:12:24.959 probably a cache miss because it's a new
00:12:27.000 object it's somewhere else on the Heap
00:12:29.199 there's no data locality in
00:12:30.480 object-oriented programming usually
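As a rough illustration (a hypothetical Item class, not code from the talk): summing a field across a million separate heap objects means chasing a pointer per element, while summing a flat array of the extracted values keeps the hot loop on much more compact data.

    Item  = Struct.new(:price)
    items = Array.new(1_000_000) { Item.new(rand * 10) }

    total = items.sum(&:price)        # one heap object per element: lots of pointer chasing

    prices = items.map(&:price)       # pull the field you need into one flat array...
    total  = prices.sum               # ...so the summing loop touches far fewer scattered objects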
00:12:32.199 and don't give me that ah but we
00:12:36.480 got a just-in-time compiler recently
00:12:38.560 that's going to make it faster a just-in-time
00:12:41.240 compiler is a tool it's not a magic
00:12:43.959 wand just so you know it won't fix your
00:12:47.120 your slow code necessarily it can give
00:12:49.480 you some benefits and optimize some of
00:12:51.720 it but it will not solve all your
00:12:54.320 problems remember
00:12:56.519 that okay now we we talked about Theory
00:13:00.040 we talked about latency numbers Hardware
00:13:02.880 All Foreign topics usually because Ruby
00:13:05.560 is a really nice language that abstracts
00:13:07.360 a lot of
00:13:09.959 it and it has all been very theoretical
00:13:12.880 up to now let's talk about an example
00:13:15.360 task and our example task is going to be
00:13:18.560 in logistics this thing but you're not
00:13:22.160 going to work work in a warehouse you
00:13:24.519 never worked with your hands you have to
00:13:26.360 write the programs that manage the
00:13:28.959 people people in the in the
00:13:31.920 warehouse and you don't even have to do
00:13:35.120 that it's more of a marketing job at
00:13:37.600 this point you get approached by the
00:13:39.880 marketing team and it comes with you
00:13:41.880 comes to you with a list and says hey we
00:13:43.639 want to give our customers some cool
00:13:46.240 discounts and they give you a list of
00:13:48.240 the shipments with the sizes and the
00:13:50.040 providers and everything and obviously
00:13:53.360 they also give you the price list they
00:13:55.000 had in mind or of the provider so for
00:13:57.199 example LP charges for S packages 1.50
00:14:00.720 I think those are
00:14:02.320 euros keep in mind those are
00:14:04.560 pre-inflation prices so look at them
00:14:07.600 remember the good old
00:14:09.320 times and
00:14:11.320 weep and your job is it to produce this
00:14:15.600 another list but this time you add the
00:14:18.560 price and whatever the discount is that
00:14:21.440 you wanted to apply but this is the
00:14:24.240 marketing department we were speaking of
00:14:26.639 they're going to have rules obviously
00:14:30.600 so first rule is simple S shipments
00:14:33.440 match the lowest price for S sizes of
00:14:35.480 any provider so you have FedEx and DHL
00:14:37.759 DHL is a bit more expensive you apply a
00:14:39.959 little discount to DHL to match the
00:14:42.639 FedEx price okay if it's an S
00:14:45.600 package next rule I think we can also
00:14:48.639 manage the third L shipment via a
00:14:50.839 specific provider is free very good
00:14:53.279 people we want you to ship
00:14:55.720 things so we're going to make them free but
00:14:58.880 this is the marketing department and
00:15:00.240 they have budgets and they want to have
00:15:02.560 predictability so discounts should not
00:15:05.560 exceed €10 per month folks we don't
00:15:07.880 want to bankrupt this company with
00:15:10.079 discounts so we're going to limit them
00:15:13.079 and as you have guessed probably by now
00:15:14.880 this is a very well speced out example
00:15:17.160 task so this was given to me as a
00:15:20.000 homework task and as a homework task you
00:15:22.680 want to shine really you want to apply
00:15:24.759 all these good software patterns you
00:15:27.440 want to have this invigorating feeling
00:15:29.240 of using the cleanest
00:15:31.399 of I'm sorry wrong slide you want to use
00:15:35.199 clean code for everything
00:15:38.880 right amazing and you end up with this a
00:15:42.720 call graph from hell and all the good names
00:15:47.399 the L shipment count for provider and
00:15:49.079 month the shipment repository and who
00:15:51.399 couldn't forget our good friend this
00:15:53.759 discount budget for a month repository
00:15:56.639 that must be a mouthful
00:15:59.399 and we're going to take performance
00:16:00.560 measurements of
00:16:01.839 it wow finally something that matches
00:16:06.040 that Moore's law curve in our own code
00:16:09.319 that hockey stick is going strong our
00:16:12.680 middle name must be Buzz Lightyear because
00:16:15.279 this one is definitely going to infinity
00:16:17.319 and
00:16:18.440 beyond what are we going to do about
00:16:20.680 this burn it all down with flames we
00:16:24.079 can't do that we already built it so
00:16:26.639 we're going to harness the power of the
00:16:28.120 flames with a
00:16:30.959 flamegraph and we're going to look at it and
00:16:33.680 we see there's one method that takes a
00:16:36.240 considerable amount of time and that is
00:16:38.600 here the L shipment count for provider
00:16:40.920 and month let's look at it it's pretty
00:16:44.440 simple it takes the valid
00:16:46.560 shipments does some select and counts
00:16:50.079 them pretty simple but there's something
00:16:52.880 hidden in there something audacious
00:16:55.279 valid shipments grows with every
00:16:57.360 iteration you have so for the first
00:16:59.480 iteration you have zero items in that
00:17:01.360 valid shipments array but for the 50,000
00:17:04.199 you have to search
00:17:06.679 49,999 items to get the count oo that's
00:17:10.839 a lot that's not nice
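Schematically, the hot method has roughly this shape (a sketch with assumed names, not the original code):

    # called once per processed shipment, but it re-scans everything seen so far
    def l_shipment_count_for(valid_shipments, provider, month)
      valid_shipments
        .select { |s| s[:size] == "L" && s[:provider] == provider && s[:month] == month }
        .count
    end
    # The 1st call scans 0 records, the 50,000th call scans 49,999: quadratic total work.
    # The fix described next is to organize/cache the shipments per month so each call
    # only looks at a single month's records instead of everything.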
00:17:14.120 but there is something to our
00:17:16.720 rescue the cache so we solve this we
00:17:21.120 can just add a little bit of caching so
00:17:23.720 that we only have to search in a single
00:17:25.400 month instead of
00:17:27.640 everything and we adapt the code we
00:17:30.919 add something to our add method to
00:17:34.400 organize the data appropriately and we
00:17:37.720 go through our flamegraph look through
00:17:39.280 everything and rinse and repeat and at
00:17:41.880 the end of the day we get this oo no
00:17:44.520 hockey curve
00:17:46.400 anymore looks like linear growth I can tell
00:17:48.799 you from experience it's not linear
00:17:50.400 growth it's just
00:17:52.080 a slightly less steep hockey curve
00:17:54.600 actually but anyway we did something
00:17:57.120 and it got better and we achieved
00:17:59.880 more performance that's nice that's why
00:18:02.840 we are here but at what cost this is the
00:18:06.559 code that comes out of this does
00:18:09.919 this spark joy folks I doubt it this
00:18:13.640 does not spark joy and who doesn't
00:18:15.440 remember when you had to talk with your
00:18:17.600 domain driven design experts about the
00:18:20.200 calculate lowest price lookup table
00:18:22.720 that's a hard to explain concept at this
00:18:24.960 point when you are doing a domain driven
00:18:27.080 design mm we are not going to do this we
00:18:30.720 have to do better what are we going to
00:18:33.320 do obviously rewrite everything in Rust
00:18:35.880 it's a cool new thing to
00:18:37.600 do no we can't do this this is a ruby
00:18:41.440 conference and I'm probably going to be
00:18:43.760 asked to leave the stage if I really
00:18:45.480 recommend this so we stick with
00:18:49.440 Ruby because it's a nice language but
00:18:52.760 we're going to change our methodology
00:18:55.200 and before we do any performance
00:18:57.720 optimizations we're going to establish a
00:19:00.000 baseline because you don't know what
00:19:02.240 fast is on your machine unless you
00:19:04.760 measure what's
00:19:06.520 fast and looking at our task we
00:19:10.679 probably know that the slowest thing is
00:19:12.799 probably going to be the following we have
00:19:15.640 to read from a file so we're going
00:19:17.919 to measure this with a very
00:19:19.840 sophisticated method we're going to
00:19:21.280 write ourselves a method that does this
00:19:24.120 copy lines from one file to another file
00:19:26.480 and measure that with the time command
00:19:28.919 in our command line
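A minimal version of that baseline could look like this (the file names are placeholders):

    # baseline.rb: copy every line from one file to another and nothing else
    File.open("output.txt", "w") do |out|
      File.foreach("input.txt") { |line| out.write(line) }
    end

Run it a few times with time ruby baseline.rb and divide the line count by the elapsed seconds to get a rough lines-per-second ceiling for pure I/O on the machine.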
00:19:32.200 very sophisticated stuff let it run a
00:19:34.360 couple of times so that you know that
00:19:36.240 teams didn't hog all the memory in the
00:19:38.440 background and at the end you end up
00:19:41.440 with something like 700,000 lines
00:19:43.720 per second that's what my
00:19:45.480 machine is capable of and we know
00:19:47.640 roughly the ballpark number we are
00:19:49.280 playing
00:19:50.200 in and then we're going to write it
00:19:52.960 differently we're going to follow a
00:19:55.360 hardware friendly
00:19:57.480 approach right it's quite cryptic thank
00:20:00.200 you very
00:20:01.559 much and we are going to do so by
00:20:04.880 putting the data front
00:20:07.880 and
00:20:12.559 center and one of the philosophies of
00:20:15.520 following the data in your program is
00:20:18.000 where there is one there's going to be
00:20:20.000 many of them you're not going to process
00:20:21.919 one shipment otherwise you could hire an
00:20:23.919 intern doing the work for you no we
00:20:26.559 invented computers to process many items
00:20:29.679 so why are all our APIs always
00:20:32.640 focused on the single thing when we
00:20:34.559 could also focus on the collection of
00:20:37.520 things so we are going to write
00:20:39.360 functions that look like this apply the
00:20:41.640 rule but not to a single shipment but to
00:20:45.120 shipments because suddenly all the
00:20:47.320 static calculations you had to do up
00:20:49.120 front you can reuse them for for the
00:20:52.080 whole collection makes it a bit more efficient
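A sketch of that collection-shaped API (the method name, the hash-based shipment representation, and the price list layout are assumptions, not the talk's exact code):

    # Rule 1: S shipments match the lowest S price of any provider.
    # The static part, the cheapest S price, is computed once for the whole batch.
    def apply_lowest_s_price!(shipments, price_list)
      lowest_s = price_list.values.map { |prices| prices["S"] }.min
      shipments.each do |shipment|
        next unless shipment[:size] == "S"
        shipment[:discount] = shipment[:price] - lowest_s
        shipment[:price]    = lowest_s
      end
    end

    price_list = { "LP" => { "S" => 1.5 }, "MR" => { "S" => 2.0 } }   # illustrative prices
    shipments  = [{ size: "S", provider: "MR", price: 2.0, discount: 0.0 }]
    apply_lowest_s_price!(shipments, price_list)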
00:20:55.320 and then something also very
00:20:57.919 important we are going to organize the
00:21:00.360 data the way we are going to use it
00:21:03.640 later no more any of this oh we have
00:21:07.120 this beautiful data structure and this
00:21:09.080 object graph no we're going to line up
00:21:11.600 the data the way we are going to later
00:21:13.520 process it and how are we going to
00:21:16.039 process it well this is our processing
00:21:18.679 pipeline basically we take a parse step we
00:21:21.200 parse the
00:21:22.320 line and then we're going to take the
00:21:25.240 shipments and we're going to sort them
00:21:26.600 into buckets we have an S shipment
00:21:28.679 bucket we have an L shipment bucket
00:21:30.400 because those are the buckets where you
00:21:31.559 need to apply rules okay pretty simple
00:21:34.559 and at the end you're going to join them
00:21:36.559 and limit the discounts because remember
00:21:39.120 that stingy marketing department
00:21:42.080 that didn't want to Shell out too much
00:21:43.480 money yeah we have to follow those rules
00:21:45.919 as well okay and remember that we had to
00:21:50.080 work in the context of a month so maybe
00:21:52.799 we want to reflect that in our data
00:21:54.840 structures as
00:21:56.840 well so maybe we do this we have a
00:22:00.240 shipments hash we have the
00:22:03.640 buckets that's S M L and we're going to
00:22:06.559 have a nested array I know I know
00:22:10.039 complicated data structures a 2d
00:22:12.720 array very complicated if you want to
00:22:16.159 you can also replace that with a struct
00:22:18.039 that that is called month that holds the
00:22:19.880 array that's up to you but in the end
00:22:23.279 it's going to be a nested array with
00:22:25.400 with the inner array representing a
00:22:26.919 month and the shipments in that month
00:22:29.559 okay it's already pretty
00:22:32.039 simple nothing audacious yet
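Concretely, the shape might be something like this (the keys are assumptions):

    shipments = {
      s: [],   # s[i] will be the array of month i's S shipments
      m: [],
      l: []    # same idea for M and L: each bucket is a nested, per-month array
    }
    # e.g. shipments[:l][2] would be all L shipments of the third month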
00:22:35.039 and then we have to read the
00:22:37.559 shipments okay we initialize our data
00:22:41.000 structures we have
00:22:42.799 S M L sizes and to make our life a little
00:22:46.000 bit easier later and not having to join
00:22:48.039 all these buckets again we're going to
00:22:50.320 use this all bucket where we put them
00:22:52.840 all that's foreshadowing for you and
00:22:56.440 we're going to take each line parse it
00:22:59.520 with our um this was CSV kind of
00:23:02.360 formatting so we can do something with
00:23:04.240 CSV and in case the month and the year
00:23:07.240 changes you just move on to the next
00:23:09.320 month that means pushing an empty array
00:23:12.080 to
00:23:14.240 all of the buckets okay and
00:23:16.760 then we push the new shipment onto
00:23:20.840 our data collection
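A condensed sketch of that reading step (the field order, the separator, and the bucket keys are assumptions):

    def read_shipments(io)
      buckets = { s: [], m: [], l: [], all: [] }
      current_month = nil

      io.each_line do |line|
        date, size, provider = line.strip.split
        month = date[0, 7]                      # e.g. "2015-02"

        if month != current_month               # month (or year) changed:
          current_month = month
          buckets.each_value { |b| b << [] }    # push a fresh empty array onto every bucket
        end

        shipment = { date: date, size: size, provider: provider }
        buckets.fetch(size.downcase.to_sym).last << shipment   # unknown sizes would raise here
        buckets[:all].last << shipment
      end

      buckets
    end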
00:23:25.200 okay even though it looks like a lot of
00:23:27.480 code and it definitely doesn't follow
00:23:29.120 the one line per method
00:23:31.360 methodology I think every one of us can
00:23:33.880 read this sooner or later and understand
00:23:37.760 it and then we have to implement our
00:23:41.400 rules well the rules are pretty simple
00:23:43.520 actually like limiting a price is not
00:23:45.480 that hard so what are we going to do
00:23:48.240 well we take the shipment price we subtract
00:23:50.840 the lowest price from the shipment price
00:23:52.559 and the
00:23:53.720 difference is our
00:23:56.320 discount did I lose anyone here hands
00:23:59.960 up
00:24:01.600 good that's the discount
00:24:04.600 calculation the second rule
00:24:07.320 where we have to count packages I think
00:24:09.000 we can all count to three so we're going
00:24:11.159 to skip that it's not more complicated
00:24:13.440 than this
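Since he skips it, here is one way that counting rule could look (a sketch; the provider, the per-month scope, and the field names are assumptions based on the rule as stated earlier):

    # Rule 2: the third L shipment via a specific provider is free.
    def apply_third_l_free!(month_l_shipments, provider = "LP")
      count = 0
      month_l_shipments.each do |shipment|
        next unless shipment[:provider] == provider
        count += 1
        if count == 3
          shipment[:discount] = shipment[:price]  # the whole price becomes the discount
          shipment[:price]    = 0.0
        end
      end
    end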
00:24:14.520 and limit discounts same thing
00:24:18.640 you're going to take the limit per month
00:24:21.799 and for each shipment you're going to
00:24:23.799 subtract it from the available budget
00:24:25.799 and you take whatever is smaller because
00:24:28.399 you don't want to overrun the budget
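That cap could be sketched like this (the €10 comes from the marketing rule above; the method and field names are assumptions):

    def limit_discounts!(month_shipments, budget = 10.0)
      available = budget
      month_shipments.each do |shipment|
        granted = [available, shipment[:discount]].min      # never grant more than is left
        shipment[:price]   += shipment[:discount] - granted # claw back the part we can't afford
        shipment[:discount] = granted
        available          -= granted
      end
    end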
00:24:31.200 okay by the way I heard that
00:24:34.000 [available_budget, shipment.discount].min
00:24:35.919 wrapping it in an array is actually
00:24:39.159 faster
00:24:40.600 it's not as costly as you would
00:24:43.320 think it is because there's actually
00:24:45.919 a special instruction in the Ruby
00:24:47.799 VM that optimizes for this
00:24:50.039 pattern of doing min now you know but if you
00:24:52.919 want to really get all of the
00:24:54.600 performance out of there you have to
00:24:55.919 write your own Min method I tested it
00:24:58.080 it's
00:24:59.000 not that slow as you would
00:25:01.080 think but you can get more out of it
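As he says, CRuby does have a specialized VM instruction for the [a, b].min / [a, b].max pattern that avoids actually allocating the two-element array; the hand-written alternative he alludes to might simply be (a trivial sketch):

    def min2(a, b)
      a < b ? a : b   # no array literal at all, just a comparison
    end

    # granted = min2(available, shipment[:discount])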
00:25:04.799 so we did all that the shipment
00:25:07.480 formatting is so boring that I'm going
00:25:09.320 to skip it because it's really just
00:25:10.960 string concatenation at this point let's
00:25:14.120 measure
00:25:15.760 it oo the blue line is our originally
00:25:20.960 optimized solution the one that
00:25:22.880 really didn't spark joy and the Orange
00:25:26.080 Line Rock Solid
00:25:29.760 is the one we just
00:25:31.919 wrote okay that's it's not only not a
00:25:35.000 hockey stick anymore it's also
00:25:39.640 faster but let's look at the performance
00:25:41.760 numbers because wall clocks are cool but
00:25:44.480 they are not a precise
00:25:46.039 measurement so for the original solution
00:25:49.600 we had 1.4 billion cache references and
00:25:54.159 86 million of those were misses okay
00:25:59.279 let's look at our new
00:26:00.880 solution we have only 600 million cache
00:26:03.960 references anymore we halved the cache
00:26:06.080 references
00:26:08.159 completely and not only that we are 2.5
00:26:13.279 times
00:26:14.240 faster
00:26:16.640 H let's look at some other stats we have
00:26:20.600 collected number one throughput per
00:26:23.440 lines the original solution roughly made
00:26:26.480 140k nowadays we make
00:26:30.000 311k which is not bad compared to the
00:26:33.240 baseline benchmark which was just
00:26:37.039 700,000 okay and not only that it's not
00:26:41.559 only faster you also already saved
00:26:45.039 some lines of code just just counting
00:26:47.320 the lines of code you got less of them
00:26:51.240 that's kind of nice because nobody dares
00:26:53.760 to delete 422 lines of important code
00:26:57.360 with fancy patterns but everybody is
00:26:59.880 willing to rewrite 150 lines of code you
00:27:03.399 would you would dare to do that I trust
00:27:06.399 you so let's wrap it
00:27:09.720 up to write performant code first
00:27:13.039 establish a
00:27:15.080 baseline think of the data and
00:27:18.240 its mutations first before doing any
00:27:20.960 cool UML diagrams and cool patterns
00:27:23.840 think of the data and how it flows
00:27:25.559 through your process and application
00:27:30.600 also it is always valuable to be lazy do
00:27:34.159 the least amount of work to your data as
00:27:36.440 possible lay out the data already when
00:27:38.440 you can so that you can easily process
00:27:40.440 it
00:27:41.159 later
00:27:42.840 and if there's anything I hope you take
00:27:45.440 away from this it's that performant Ruby
00:27:47.840 code does not mean that it must be ugly
00:27:50.480 Ruby code just so you know you can write
00:27:53.559 nice Ruby code and it doesn't need to
00:27:56.200 look horrible okay
00:27:58.679 so that's it if you want if you're
00:28:00.919 interested in the code here's the QR
00:28:03.960 code the original solution is called
00:28:06.519 Planet Express the better version is
00:28:09.159 called Planet Express Express so read
00:28:13.679 through it um feedback is welcome and
00:28:17.240 that's it from the clock I see we have
00:28:20.080 two and a half more
00:28:21.840 minutes so if you have
00:28:24.880 questions you can ask them now but but I
00:28:28.679 remind you a question is something that
00:28:30.440 ends with a question mark statements are
00:28:32.240 not
00:28:35.519 allowed I see probably a lot of
00:28:38.159 statements but no questions there's a
00:28:39.840 state there's a
00:28:46.640 question I see the Vinted
00:28:50.240 sign maybe I couldn't possibly comment
00:29:02.240 is it the code review task
00:29:05.880 still ah okay
00:29:09.760 cool any other questions not related to
00:29:12.279 Vinted and and homework
00:29:16.679 tasks okay then that's it thanks
00:29:19.679 for your attention and thank you