Hi, thank you very much. I'm Karen Jex, a Senior Solutions Architect at Crunchy Data, and I'm really happy to be here at my first ever EuRuKo and my first time in Sarajevo, so: fantastic. I'm going to talk to you about how to optimize your database for analytics.

This is even better than a TL;DR: it's a link to the slides from a previous version of this talk. It's not identical, but almost the same, so if you don't even want to listen to me talk, you can just go to that link.
First of all, whenever I'm talking at a developer conference, I have to get the confession out of the way: I'm not a developer, I'm really sorry. But as you can see from this diagram of my career so far, I do know about databases. I was a database administrator for about 20 years before I was a database consultant and now a Senior Solutions Architect, but I still work exclusively with databases. I've worked with a lot of developers along the way, I've learned lots from the developers that I've worked with, and I really like to be able to give back some of what I know about databases.

So what am I actually going to talk about today, and why?
To answer that question, I'm going to do a quick recap of some of the different types of database activity. You've probably got an application, or many applications, with a database backend, and that database backend is probably constantly being queried, updated and inserted into by your application connections. It could be a traditional payroll app, a banking system, an online commerce system, or one of all sorts of different applications.

This type of activity is usually called OLTP, or online transaction processing, or sometimes just your operational database activity. It's characterized by multiple concurrent users, usually performing lots of short queries, and they'll often be doing multiple SELECTs, INSERTs and UPDATEs within a single transaction.

It's at this point, actually, that I need to apologize for my view of the world: I see the database, and then everything else that goes on outside the database. So you'll see the users connected to the database, conveniently ignoring all of that application layer; just imagine that it's there in between.
Most of these applications will constantly be generating data in your database, and that's going to gradually build up. At some point, the business wants to get value from that data; it wants to glean insights from it, and for that you're going to need to run analytics queries.
This analytics activity can take various different forms, and it's got lots of different names. You might hear it called OLAP (online analytical processing), reporting, decision support (DS) or business intelligence (BI), but it's basically any queries, any activities, that serve to answer real-world business questions, such as "How successful was this marketing campaign?" or "What would be the impact on our carbon emissions of performing this action over this one?"

This type of activity is characterized by complex queries, often performing joins on multiple tables and working on large data sets, so typically it's going to use a lot of resource: potentially a lot of CPU, memory and I/O.

Once upon a time, we had the OLTP databases and we had the analytics databases, and the two were very much separate. Data would somehow be extracted from your OLTP database and loaded into the OLAP system, sometimes known as your data warehouse or your data lake, and often that was done by a nightly batch or even a weekly batch.

That does sometimes still happen, and there are very good reasons for it still happening that way sometimes, but often businesses want to get insights from the data instantly. They want it in real time, directly from the OLTP system.
And where you've got the OLTP and the OLAP activity cohabiting, you'll often hear that described as a mixed or hybrid workload. It's not a new idea: Gartner came up with the term HTAP, hybrid transaction/analytical processing, back in 2010, and it's been described in other places as translytical, or hybrid operational and analytical processing.
So what's the problem with this? It's well established in academic circles that this type of workload presents unique performance challenges. There are papers from back in 2010 and 2011 where the researchers identified and discussed the fact that running mixed workloads causes performance issues, and there are ongoing discussions about this and about things like what kind of benchmarks can be used to monitor this type of activity. But you'll probably be pleased to hear that I'm not planning an academic discussion; I'd rather focus on what you can do if you're in this situation: how can you make sure that you run performant analytics with minimal impact on your operational database activity?

So, to recap: your database is probably optimized for your day-to-day OLTP application activity. Adding analytics activity into the mix, that is, complex, long-running queries that consume lots of resource, means you can end up with the worst of both worlds: analytics queries that are inefficient, but that also slow down your existing application.
Now that we've identified the problem that we're trying to solve, I set up an environment where I can test things out as we start to look at ways to fix those issues. I'm using Postgres for this because it's the RDBMS that I know and love and the one that I work with on a day-to-day basis, and also because it's quick and easy to set up and work with. (It was very heartening to hear the "Postgres for everything" message from Ulick just now.) In particular, I'm using pgbench, which is a simple benchmarking tool that's available by default with Postgres.
First of all, we create and populate the pgbench tables by running pgbench with the -i (initialize) flag. In this case I'm doing it in a database that I created called euruko, and I've asked for a scale factor (-s) of 100, just so that I've got some kind of size to my tables. I've also asked it to create foreign keys between the tables so that it somewhat simulates an OLTP-type database. You can see that it shows us the progress towards generating 10 million rows of test data, and it takes about 25 seconds.
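For reference, the initialization command looks something like this (a sketch; euruko is just the name of the demo database):

    createdb euruko
    pgbench -i -s 100 --foreign-keys euruko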
This is what it creates for us: four tables called pgbench_branches, pgbench_accounts, pgbench_tellers and pgbench_history. The number of rows listed here is what you'd get if you left the scale factor at the default of 1, so because I set it to 100, I ended up with 10 million rows in the pgbench_accounts table. The pgbench_history table will start off with zero rows at the beginning of each pgbench run, and rows will be inserted as you go along. I've included a link to the documentation there, just in case anybody wants to look up how that works.
By default, pgbench simulates OLTP activity by repeatedly running this transaction, which is loosely based on the TPC-B benchmark. Each transaction is made up of several SQL statements: you can see there are three UPDATEs, a SELECT and an INSERT, and each one takes random values for aid, tid, bid and delta. You can pass in any script you want if you want it to simulate other kinds of activity.
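For reference, the built-in TPC-B-like script, as listed in the pgbench documentation, is:

    BEGIN;
    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
    END;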
Then, when you want to actually simulate some activity on the database, you run pgbench without the -i flag. You tell it how many concurrent clients you want (in this case I've used -c 10 for 10 concurrent clients) and how many seconds you want it to run for with -T; here I ran it for 1,200 seconds, or 20 minutes. The output will show you things like how many transactions were processed, what the connection time and latency were, and what TPS throughput you got; in this case I got about 5,000 TPS. This is just running on my MacBook, so the goal isn't to try and get the best possible throughput; it's just to give you an idea of what you might see.
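The run command is something like this (again a sketch, against the euruko demo database):

    pgbench -c 10 -T 1200 euruko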
Okay, so we've got a demo environment with some tables and some data in it. Then I mocked up an "annual sales report" type analytics query against that pgbench_history table. It manipulates the data in the mtime column to get something that looks like year and month, it groups and orders by year and month, and it calculates some totals. You'll probably notice that, as analytics queries go, this one is extremely simple, but it's good enough for demo purposes. And this is what the output looks like against those 10 million rows that were inserted into my pgbench_history table.
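The query is along these lines (a sketch: the year and month expressions, 2000 plus the hour and the minute modulo 12, are just a way of faking calendar data out of mtime, and the exact expressions on the slide may differ):

    SELECT 2000 + extract(hour FROM mtime)            AS year,
           1 + (extract(minute FROM mtime)::int % 12) AS month,
           sum(delta)                                 AS total_sales
    FROM   pgbench_history
    GROUP  BY 1, 2
    ORDER  BY 1, 2;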
Okay, so if we want to have a look at the execution plan for that query, so we can see what it's doing behind the scenes, we can use Postgres's EXPLAIN ANALYZE command. I connect to my database via psql and type EXPLAIN ANALYZE and then my query. The ANALYZE keyword tells it to actually run the query, so you get real statistics rather than an estimate of what it's going to do.
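In psql, that's just the query from above with a prefix:

    EXPLAIN ANALYZE
    SELECT ... ;  -- the analytics query sketched earlier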
We wait for that to complete, and in this case we can see that it took just under six seconds to run our analytics query. It's doing a sequential scan on the pgbench_history table, so it's reading the entire table, and we can see we've got "Sort Method: external merge Disk", so it's actually writing some temporary files to disk and doing the sorts there. Just for comparison, the SELECT statement in the pgbench transaction that we looked at earlier takes about a millisecond.
Okay, so we've got the environment in place. Let's have a look at some of the techniques that you can use to optimize analytics activity without slowing down your application. It's not going to be an exhaustive list; the idea really is to give you enough to get started. And again, I'm using Postgres for the details, but the ideas are applicable to other RDBMSs.
Okay: configuration parameters. As I said, your database configuration parameters are probably, or at least hopefully, optimized for your day-to-day operational database activity. If not, I highly recommend going to have a look at them, and that gives me the perfect excuse for a shameless plug, because this is a link to a talk that I gave on exactly that subject at DjangoCon Europe last year.
Some configuration parameters need to be set across the entire Postgres instance; for those, you're going to have to find some kind of compromise, the values that work best for your hybrid workload. But other parameters can be set for specific users, so you can set them to different values for your application users and for your analytics users, and some can even be set for individual sessions or queries. I'm just going to walk through a few of the different parameters, and I'm going to encourage you to test things out, to read the documentation, and also to take note of the comments in the Postgres default configuration file, because they're actually quite useful.
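To illustrate those three scopes (a sketch; analytics_user is a hypothetical role name):

    -- Whole instance: edit postgresql.conf, or use ALTER SYSTEM
    -- (max_connections needs a server restart to take effect):
    ALTER SYSTEM SET max_connections = 200;

    -- Per-user:
    ALTER ROLE analytics_user SET work_mem = '256MB';

    -- Per-session (or per-transaction, with SET LOCAL):
    SET work_mem = '256MB';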
First of all, database connections, and this is something that Ulick referred to in his talk as well. The chances are you won't have too many concurrent analytics connections, but those connections are likely to be using a lot of resource, so you probably want to make sure there aren't too many of them.
The max_connections parameter is set for the entire database instance; it's the maximum number of concurrent connections that are allowed to the entire Postgres instance. The default is 100 on most systems, and you probably don't want it much higher than that; definitely no more than a few hundred if you can help it. And you probably want to limit the number of analytics connections to much less than that. To do that, you might want to look at connection pooling, for example PgBouncer, so that you can create a separate, small pool for those analytics connections.
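A minimal pgbouncer.ini sketch of that idea (the pool names, sizes and connection strings are all illustrative):

    [databases]
    app       = host=127.0.0.1 dbname=euruko
    analytics = host=127.0.0.1 dbname=euruko pool_size=5

    [pgbouncer]
    pool_mode = transaction
    default_pool_size = 50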
work_mem is the maximum amount of memory that can be used by a query operation before it spills to disk and creates a temporary file; we saw those temporary files being created in the execution plan for our query. By default it's 4 megabytes, but if you've got queries that are performing complex sorts or hash operations, they might need a bigger value, and sometimes a much bigger value. You can check whether work_mem potentially needs to be increased by setting log_temp_files: if you set that to zero, every time one of these temporary files is created it will be logged in your Postgres logs, so you can see that you're spilling to disk and potentially need more work_mem.
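For example (log_temp_files takes a size threshold in kilobytes, and 0 means log every temporary file):

    ALTER SYSTEM SET log_temp_files = 0;
    SELECT pg_reload_conf();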
Just watch out, because if you've got multiple concurrent users and lots of these complicated queries, each of those queries could actually be using many times work_mem, so you can very, very quickly use up a lot of memory. To avoid using too much, you're probably better off setting it to a high value just for the certain queries or sessions that really need it, rather than across the whole instance.
I tested out the analytics query that I created with different values of work_mem. We already saw that with the default value of 4MB it was taking around six seconds, doing that sequential scan and those disk sorts, creating the files on disk. Here I changed it to 100 megabytes to see what that did, and regenerated the execution plan. It takes a minute to do that, so I might not wait for it, but it still took nearly six seconds, and it still did an external disk sort. So setting it to a large value doesn't always help all that much, but it's very much use-case dependent.
In the interests of time I'll just move on from there, but I even tried increasing it right up to 1 gigabyte to see what happened. That got rid of the disk sort, so we can now see a quite large in-memory sort, and it took the time down to about four and a half seconds. To me, that doesn't seem like a fantastic trade-off for the sake of using a gigabyte of memory just for that one small analytics query. Obviously, again, it will vary from case to case, so try out different values and see what impact they have on your queries.
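When you're experimenting like this, you can scope the change to a single session (a sketch):

    SET work_mem = '1GB';
    EXPLAIN ANALYZE SELECT ... ;  -- the analytics query
    RESET work_mem;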
If you're concerned that certain queries might run for too long, you can set statement_timeout; that will abort any statement that runs for longer than the specified amount of time. By default it's set to zero, which means it's disabled, but you can set it either for the whole instance or just for specific sessions. It probably makes sense to set it to an appropriate value for your analytics sessions: for example, it might be OK for some of your analytics queries to run for many minutes or even longer, but you might want your application statements to never run for longer than a certain number of milliseconds.
So, as I said, statement_timeout is set to zero (disabled) by default. I set it here to 2,000 milliseconds, or two seconds, and then I tried again to run my analytics query, just to see what effect that has and to check that it's working as expected. Now I can see that it doesn't get to finish: I get the error "canceling statement due to statement timeout" after just over two seconds.
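In a session, that looks something like this:

    SET statement_timeout = '2s';  -- or 2000, in milliseconds
    -- running the analytics query now fails with:
    -- ERROR:  canceling statement due to statement timeout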
If you want to make sure that any statement that gets timed out in this way is logged, so you know that it's happening, make sure that log_min_error_statement is set to error or lower. You can see all the different valid values here, from the least severe, debug5, up to the most severe, which is panic. If you set it to error, then the statements behind error, log, fatal and panic messages will all be written to your Postgres logs.
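For example (error is, in fact, the default value):

    ALTER SYSTEM SET log_min_error_statement = 'error';
    SELECT pg_reload_conf();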
Indexing strategy. You'll probably need to create specific indexes for your analytics queries, but obviously bear in mind that indexes have to be maintained, so they will have some impact on your operational activity. If you create too many indexes on your application tables, that can start to slow down your inserts and updates, so again you need to find some kind of balance there.
As well as creating indexes on one or more table columns, you can also create indexes on functions or scalar expressions computed from your table columns. Our analytics query, for example, contains expressions to calculate something that looks like year and month from the mtime column. If we created an index on the mtime column itself, the query planner wouldn't be able to use it for that particular query. So what we can do instead is create an index on the expressions that are used in the query, in this case 2000 plus the hour, and the minute modulo 12.
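Something like this (a sketch; the indexed expressions have to match the query's expressions for the planner to consider the index):

    CREATE INDEX pgbench_history_year_month_idx
        ON pgbench_history ((2000 + extract(hour FROM mtime)),
                            (1 + extract(minute FROM mtime)::int % 12));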
That takes longer to create than it would if I'd just created the index on the column, because it's got to do those calculations as it goes along. If we then analyze our query again, we should see that it's actually using the index that we created. Okay, so we can see that it's doing an index scan using the index that we just created; that takes the query time down to just over four seconds in this case, and we don't have any of those disk sorts. Obviously, in this case we're selecting all of the data, so potentially an index doesn't help as much as it could; it's a case-by-case basis. And just be aware that it will have an impact on your operational activity, so you need to benchmark that to see if it's acceptable.
Most analytics queries aggregate, sort and calculate large amounts of data: they do totals, averages, comparisons, groupings, etc., and these are obviously resource-intensive operations. So it can be useful to pre-calculate some of that data (pre-calculate, pre-sort, pre-aggregate) to make those queries less expensive at runtime.
If we create columns in a table with the GENERATED ALWAYS AS keywords, the column value will be automatically calculated based on the expression; in the case on the slide, the sum of columns a and b. The STORED keyword says that it should actually be physically stored in the table, rather than just calculated at query time, so in that SELECT you can see that c has been calculated automatically. Again, it will have some impact on inserts into that table, so you need to benchmark it and see whether that's an acceptable trade-off.
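The slide's example is roughly this:

    CREATE TABLE t (
        a integer,
        b integer,
        c integer GENERATED ALWAYS AS (a + b) STORED
    );

    INSERT INTO t (a, b) VALUES (1, 2);
    SELECT * FROM t;  -- c has been calculated automatically: 1, 2, 3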
So, in our case, we can add year and month columns to our pgbench_history table, telling Postgres to generate the values automatically using the definitions that we've got in our reporting query. In this case it's going to take a while, because I've already got those 10 million rows in the table, so it's got to calculate the values and populate those columns. Once that's done, we can use the calculated columns in our query instead of actually writing out the expressions, so I can just select year and month. That takes the query time down to just over three seconds. (Sorry, I should have left that to wait for longer.)
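A sketch, assuming the same year and month expressions as before:

    ALTER TABLE pgbench_history
        ADD COLUMN year  int GENERATED ALWAYS AS ((2000 + extract(hour FROM mtime))::int) STORED,
        ADD COLUMN month int GENERATED ALWAYS AS (1 + extract(minute FROM mtime)::int % 12) STORED;

    -- The reporting query can now just use the stored columns:
    SELECT year, month, sum(delta) AS total_sales
    FROM   pgbench_history
    GROUP  BY year, month
    ORDER  BY year, month;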
A materialized view is a physical copy of the results of a query that can be queried just like a normal table. Materialized views are useful for pre-aggregating data, and they're particularly useful if you don't need completely up-to-date data: if, for example, data from ten minutes ago, an hour ago, or even a day ago is acceptable. You choose how often you want to refresh the materialized view to keep the data fresh enough for your reporting purposes, and you can also create indexes on materialized views to make them even more performant.
To create a materialized view that stores the results of our analytics query, we can use CREATE MATERIALIZED VIEW ... AS and then our SELECT statement. Then we can just select year, month and total sales from the materialized view instead of running the analytics query. That takes it down to a couple of milliseconds, because it's only having to select from that 14-row materialized view instead of from the entire table.
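A sketch (sales_summary is just an illustrative name, and the SELECT uses the generated columns we added earlier):

    CREATE MATERIALIZED VIEW sales_summary AS
    SELECT year, month, sum(delta) AS total_sales
    FROM   pgbench_history
    GROUP  BY year, month;

    SELECT year, month, total_sales FROM sales_summary;

    -- Re-run this on whatever schedule keeps the data fresh enough:
    REFRESH MATERIALIZED VIEW sales_summary;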
In the real world, our analytics query is probably just one small building block that's potentially used in many different reporting queries, so you could rewrite all of those reporting queries to select from the materialized view instead of from that base table.
And of course, the best way to improve the performance of something is usually by not doing it. So, if you can: don't run analytics queries on your main database at all. Even if you don't have a completely separate analytics environment, there are probably some things that you can do.
Most database environments already have a high-availability architecture in place, so that means you've probably got one or more standby databases replicating from your primary. A standby is kept in sync with your primary and is available for read-only transactions, so you could send your analytics workload to one of those to take the load off your primary database.
Just be aware that physical replication gives you an exact copy of your primary database: you can't create separate users, you can't create indexes, you can't change the layout of the schema or anything like that. Any changes that you need for your analytics workload would need to be made on your primary database and replicated across.
So another option would be to use logical replication. You can use that to replicate just certain objects, so just certain schemas or tables, from a publisher database to a subscriber database. And since Postgres 16, which came out in 2023, you can do logical replication from a standby, which takes additional pressure off your primary database.
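The basic setup looks something like this (a sketch; the publication, subscription and connection details are all illustrative):

    -- On the publisher (primary) database:
    CREATE PUBLICATION analytics_pub FOR TABLE pgbench_history;

    -- On the subscriber (analytics) database:
    CREATE SUBSCRIPTION analytics_sub
        CONNECTION 'host=primary.example.com dbname=euruko user=repuser'
        PUBLICATION analytics_pub;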
Logical replication is much more complicated to set up and maintain than physical replication, but it does give you a lot of flexibility: you can replicate just a selection of objects, and you can replicate to and from multiple targets and sources. Your subscriber database, i.e. your replicated database, is available for read-write, so you can create indexes, users, materialized views or anything else that you need just on that subscriber database.
So, in summary, we've looked at various different techniques that you can use to make sure your database is optimized for analytics without slowing down your application, or at least only slowing it down by an acceptable amount. If you can, use physical or logical replication to create a separate database for your analytics workload. Tune your configuration parameters so that they're a good compromise for your hybrid workload. Consider your indexing strategy. Create generated columns or materialized views so that you pre-aggregate and pre-calculate data. And don't forget that you can combine all of those techniques: you can create indexes on your materialized views, you can change your configuration parameters, and do all of these other things.
Thank you so much. This is a link, again, to that previous version of the slides; I'm in the process of creating an annotated version, which will be available on my blog. And I need to do a quick congratulations to Erat and Jurri, who have won a signed copy of Andrew Atkinson's book, so come and see me afterwards and I will hand that over to you. Thank you very much!