00:00:10.160
So, last minute I decided to change the name of the talk slightly, so this is going to be Intro to AI Agents.
00:00:19.279
My name is Andre bere. I live in South Florida. Thank you for coming to
00:00:25.960
this talk, and thank you to the organizers; this conference is
00:00:32.239
great. In my day-to-day, I run a software development firm. We build a
00:00:38.800
lot of Rails applications for VC-backed startups and enterprises, and over the last 18
00:00:47.320
months I've been doing a lot of AI research, trying to figure out the best way to implement AI into our
00:00:55.640
applications. So let's talk about generative AI. I think this summarizes the
00:01:03.320
current wave of progress in AI pretty well: what used to take six to
00:01:11.520
seven months, a lot of these capabilities, can now be done in a couple of
00:01:16.680
days or a couple of weeks. You no longer have to hire a dedicated data
00:01:25.280
science team and have them spin their wheels developing and training their own
00:01:32.119
models because, let's face it, unless you were working at a FAANG company, it
00:01:38.759
was probably a wasted effort. A lot of these really
00:01:45.320
common machine learning tasks, like classification, named entity recognition,
00:01:50.880
summarization, and translation, are just at the tips of
00:01:56.039
our fingers now. Classification is taking, let's say, a news
00:02:03.719
article and classifying it by topic: technology, sports, business. Named entity
00:02:09.399
recognition is extracting proper names, organization names,
00:02:15.879
locations, or company names out of text. Then there's summarizing text and translating between
00:02:21.519
different languages. These capabilities are just an API call
00:02:27.120
away. Previously they were reserved for
00:02:32.400
companies that were willing to invest a lot of money; now there's a long tail of smaller companies that are evaluating
00:02:39.640
how to implement some of these capabilities into their products and
00:02:46.400
businesses. Of course, whenever we talk about AI, every time there's a
00:02:53.480
wave, people immediately talk about AI agents, and
00:02:58.879
this is not a new concept. It has been around for several decades. In fact, it was first
00:03:07.280
mentioned by Alan Turing in 1950, in the same paper where he introduced
00:03:13.200
the Turing test. In the 1970s and 80s we explored expert
00:03:19.760
systems; the 1990s and 2000s were dedicated to software
00:03:24.920
agents; in the 2010s came the original chatbots with Siri and Alexa; and now we're
00:03:32.519
evaluating whether LLMs can power AI
00:03:38.480
agents. Every major tech company has a vision around this that they're
00:03:45.480
putting out there. Google, for example, has custom AI agents, ironically
00:03:52.280
called Gems, Anthropic has a tool of their own, and so do the rest of the massive tech
00:04:01.400
companies. So what is an AI agent? The textbook definition is that it is an
00:04:08.239
autonomous system capable of perceiving its environment, making decisions, and
00:04:14.480
taking actions to achieve specific goals. So: environment awareness, decision-
00:04:21.520
making, and action taking. The two terms, assistant and agent, are
00:04:29.720
used pretty much interchangeably in the field right now, and I draw a small distinction:
00:04:36.000
an assistant is more of a conversational interaction pattern that
00:04:43.880
is constantly taking directions from a human, whereas an agent maybe takes
00:04:50.840
the original task from a human, but then it's off to work on the task on its
00:04:56.240
own, kind of like a background job.
00:05:01.280
So if we were to put some interfaces behind this concept, the
00:05:07.320
conversational assistant is a free-for-all text input, so there's an
00:05:14.360
infinite number of possible prompt permutations and a kind of infinite
00:05:22.639
attack vector, if we're talking about prompt hacking. The autonomous agent kind
00:05:30.319
of powers hardcoded functionality, so you would have buttons that
00:05:37.919
have hardcoded prompts behind the scenes that are executing
00:05:43.880
tasks. So it's guided input from a user. So what are the use cases for AI
00:05:52.440
agents? Of course, the one that's always talked
00:05:59.400
about is just broadly automating business processes, so automating mundane low
00:06:08.000
IQ tasks, and then personal assistants and
00:06:14.199
co-pilots. For example, in my consulting business, we could be
00:06:20.639
creating invoices from timesheets, categorizing business expenses, writing proposals that remix our service
00:06:28.720
offerings and client meeting notes, writing job descriptions, and writing
00:06:36.479
tickets, Jira tickets, from those meeting notes. When we're talking about
00:06:43.639
building an AI agent, there are several different components to
00:06:55.160
it. The first thing is planning and reasoning. This is kind of how
00:07:01.080
humans work: when they have a goal in mind, they think about what
00:07:07.400
steps are necessary to achieve that goal. How do you plan for that,
00:07:12.599
how do you formulate the plan, and then how do you kind of
00:07:17.840
recursively reflect on that plan to make sure it's a good
00:07:23.720
plan, a relevant plan?
00:07:30.080
Reasoning is the cornerstone of problem solving, decision-making, and critical
00:07:36.360
analysis. Chain of thought is a technique
00:07:47.440
to force the AI to explain its
00:07:53.479
reasoning. It's a prompting technique. It's similar to how humans think
00:08:00.520
through different problems: if you're thinking it through step by step, out loud or
00:08:06.960
in your head, you're much more likely to get better answers, kind of using
00:08:14.000
System 2 rather than System 1 thinking. So let's look at this example
00:08:22.240
without chain of thought. In this example, I'm asking how many full soccer
00:08:28.960
fields would be needed to cover the distance between New York City and Washington, DC in a straight line. I'm
00:08:36.800
instructing it to just provide the answer, and it's much more likely to
00:08:41.959
hallucinate and not give you the correct answer, versus when
00:08:49.040
we ask the AI to explain its reasoning step by step. As it's generating its
00:08:56.959
answer, as it's generating tokens, it's much more likely to get to a
00:09:02.720
better answer. So this is what chain of thought
00:09:08.440
is. Role play is forcing the AI to adopt a certain personality, character, or
00:09:15.800
behavior via prompt engineering. You could have it act as a
00:09:22.560
strict manager, as a relaxed manager, or as a Dungeon Master in D&D. In
00:09:31.040
fact, I think Obi has a pretty popular article on that. You can ask it to be a
00:09:39.279
helpful AI assistant. So let's talk about environment perception. Environment
00:09:45.920
perception is really just the context within which the AI agent is operating,
00:09:53.959
and it could be as simple as "today is September 13th, 2024", because
00:10:01.079
it doesn't have access to real-time data. It's trained on a
00:10:08.480
snapshot in time, and it's stateless, so
00:10:13.920
we would provide data like this if it helps it accomplish
00:10:21.760
tasks. Tool calling, or function calling: those two are equivalent,
00:10:28.079
those two are synonymous, and you would typically use them either for structured outputs, so having the
00:10:35.800
response adhere to a predefined schema, typically
00:10:41.639
JSON, or whenever you want the AI agent to use external tools. In a
00:10:49.600
way this is also intent detection, because the AI chooses to use certain
00:10:57.880
tools. You would use tool calling in some of the following
00:11:05.720
instances: for example, getting data from an external API or a proprietary
00:11:12.040
API, or getting real-time data. Like I mentioned, it's stateless; it's trained on
00:11:20.320
datasets; it's unaware of your personal data or proprietary data. You
00:11:27.079
would also use tool calling to have the AI agent take actions or execute
00:11:33.959
deterministic tasks. LLMs
00:11:39.800
are probabilistic systems, so it doesn't really make sense to have them try to
00:11:47.200
answer, for example, arithmetic questions. Should I go back to slide
00:11:56.839
one? So, executing deterministic tasks:
00:12:01.920
for example, without tools, if you were to try to add up two
00:12:07.000
numbers, specifically really large numbers, because the chances that it
00:12:14.399
was trained on larger numbers
00:12:19.440
are lower, so if you try to get it to add up these two
00:12:24.760
numbers, it looks like the right answer, but it's not: it's off by a couple of digits, so it's kind of
00:12:31.120
guessing. The solution is obviously to use the code
00:12:38.600
interpreter or a calculator. I mean, this problem has been solved; it's a deterministic task, so it doesn't make
00:12:46.760
sense to use a probabilistic system for that. In this case, where we
00:12:55.680
don't restrict it from tool usage, it just generates code and passes it off to
00:13:01.480
a container that executes it in a Python environment and, voila, we get the
00:13:09.959
answer. So tool calling really looks like this: it's a JSON Schema format
00:13:20.120
declaring your tool signature. In this case we're telling it about a
00:13:26.360
find product function inside of an inventory management service and
00:13:33.240
telling it its signature, so you can pass it a SKU. On the right-hand
00:13:40.880
side, if, let's say, a user is asking how many of these SKUs we have left,
00:13:47.199
then the assistant might choose to call this
00:13:53.240
function. Lastly, when it comes to AI agents, we also need to use memory,
00:14:01.920
in the sense of remembering. Recall that the
00:14:08.480
definition of an AI agent is an autonomous system that's able to
00:14:14.040
perceive its environment, make decisions, and take actions. So as it's progressing through different
00:14:19.880
tasks, we also need to be able to save the environment, save the progress,
00:14:25.720
and save the actions, or tool
00:14:31.240
calls. Whenever we talk about memory, we talk about RAG. I'm kind of
00:14:39.279
tired of talking about RAG. I feel like RAG was very much a
00:14:45.399
2023 phenomenon. There's so much content on the internet about RAG,
00:14:52.440
but really all it is is just taking data in plain text from a data source
00:15:00.720
and inserting it into the prompt with the goal of the LLM using that
00:15:08.079
information. That's it; that's all it is. Typically when people talk about
00:15:15.079
RAG, they talk about vector search, databases, semantic search. Yes, you can
00:15:20.320
use that, but you don't necessarily have to. If you can run a better query and just
00:15:27.240
pull data directly from a CSV file or a SQL database, you
00:15:34.199
can do that. So, some of the common problems
00:15:41.440
with AI agents that I think are
00:15:47.120
still preventing a lot of companies and product teams from aggressively going into production with
00:15:53.560
these systems are that they still tend to hallucinate.
00:15:59.959
People go back and forth on whether these LLMs can truly reason or not; I
00:16:06.079
feel like it's just split down the middle. And sometimes you get
00:16:11.639
unreliable tool calling. You also want to be able to
00:16:18.800
evaluate your AI agent. There are public benchmarks, like on Hugging
00:16:25.160
Face, on a variety of different tasks. It's basically a
00:16:31.880
catalog of task to outcome,
00:16:38.319
and then the reasoning steps, kind of like showing my work, how I got to the answer. You can compare
00:16:46.040
how your AI agent performs. You can also use the LLM as a
00:16:54.160
judge approach, where you have the LLM recursively reflect on its own
00:17:01.120
progress. You can ask it yourself: this was the goal,
00:17:07.480
these were your actions, this was your answer; how do you think you did according to these
00:17:16.079
metrics? And this was just an example of one of the benchmark data
00:17:22.839
sets from Hugging Face. I just want to show you that
00:17:28.480
they're widely available; there are so many of them.
00:17:33.720
This one specifically is, I believe, a basic arithmetic
00:17:41.160
dataset, but you can see the question, and then in the answer
00:17:46.240
column it's basically reasoning steps and then the answer. So if
00:17:52.200
this is the ideal state, then you can compare the execution of your agent against that.
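Editor's sketch: the comparison described here can be approximated in a few lines of plain Ruby. Everything below is an assumption made for illustration rather than something shown in the talk: the benchmark rows are made up, `final_answer` assumes the result is the last number in the text, and `stub_agent` stands in for a real agent.

```ruby
# Hypothetical benchmark rows: a question plus a reference answer made of
# reasoning steps followed by the final result (made-up data).
BENCHMARK = [
  { question: "What is 12 + 30?", answer: "12 plus 30 is 42. The answer is 42" },
  { question: "What is 7 * 6?",   answer: "7 times 6 is 42. The answer is 42" },
  { question: "What is 100 - 1?", answer: "100 minus 1 is 99. The answer is 99" },
  { question: "What is 9 + 9?",   answer: "9 plus 9 is 18. The answer is 18" }
].freeze

# Assumption: the final result is the last number that appears in the text.
def final_answer(text)
  text.to_s.scan(/-?\d+(?:\.\d+)?/).last
end

# Fraction of rows where the agent's final answer matches the reference.
# `agent` is anything that responds to #call(question) and returns a String.
def accuracy(rows, agent)
  correct = rows.count do |row|
    final_answer(agent.call(row[:question])) == final_answer(row[:answer])
  end
  correct.to_f / rows.size
end

# Stand-in for a real agent: canned replies, one of them deliberately wrong.
CANNED_REPLIES = {
  "What is 12 + 30?" => "12 + 30 = 42",
  "What is 7 * 6?"   => "7 * 6 = 42",
  "What is 100 - 1?" => "100 - 1 = 98",
  "What is 9 + 9?"   => "9 + 9 = 18"
}.freeze

stub_agent = ->(question) { CANNED_REPLIES.fetch(question) }

puts accuracy(BENCHMARK, stub_agent) # prints 0.75
```

Exact matching on the final number is the simplest possible scoring rule; it ignores the reasoning steps entirely, which is where an LLM-as-a-judge pass could be layered on.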
00:18:02.400
So this was actually my slide before this OpenAI model came out last
00:18:08.240
night. I thought that the next frontier... I thought the
00:18:13.840
weakest link was actually reasoning, and what I'd gathered was
00:18:22.280
that companies were going to train specific models to do reasoning, and they were going to train these models
00:18:28.360
on reasoning data. I was under the impression there
00:18:34.799
was no good data, and apparently OpenAI announced a new model that I don't
00:18:41.679
have access to yet, so I haven't touched it. But all of this to say that I
00:18:48.840
think we're going to go towards an architecture where you're going to have different models
00:18:54.120
doing different tasks. There's going to be this reasoning model that's putting together top-level, high-level
00:19:00.960
plans and makes decisions, and then you're going to use a different model to
00:19:06.280
do tool calling, classification, etc. So I want to just quickly
00:19:15.760
introduce the open source library that we've been working on, which is Langchain.rb. It's a Ruby framework for
00:19:22.840
building LLM-powered applications. One of the cool things you can do is instantiate any kind of
00:19:32.440
LLM provider, and it's a unified interface, so you can swap them out
00:19:39.120
and test them out really quickly. The
00:19:44.840
demo. Okay, so we've been talking about AI
00:19:50.480
agents for a while; let's see some code, something tangible. So
00:19:57.559
to set the stage, imagine we have this fictional e-commerce store called Nerds
00:20:03.600
and Threads. It sells comfortable, nerdy t-shirts for
00:20:09.559
software engineers that work from home. And like any other e-commerce
00:20:17.760
store, it integrates with a variety of different services. It integrates with a
00:20:23.799
customer management service (an external system), an email service, a payment gateway,
00:20:29.559
order management, inventory management, and a shipping service. They're self-explanatory; you
00:20:35.200
can kind of figure out what those services would do and the kind of logic you would put in those
00:20:43.080
services. If we were to visualize this architecture, what we're
00:20:48.880
going to try to do is have an AI assistant kind of
00:20:54.320
orchestrate the business logic and string those six services
00:21:01.960
together. And on the left-hand side... let me see if I can zoom in. Oh, I
00:21:09.919
can't. So on the left-hand side we have system instructions
00:21:15.480
for the AI assistant that we're going to be using. We're giving it some
00:21:20.760
context: today is September 13, 2024. We give it a role to play:
00:21:29.159
that it is an AI that runs an e-commerce store called Nerds and Threads that sells comfy, nerdy t-shirts for software
00:21:36.360
engineers that work from home. We give it additional context that it has access to all these
00:21:42.919
different services. And when we talk about
00:21:50.440
automating business processes: lots of businesses, a good
00:21:56.480
business, has standard operating procedures for doing things, right? The
00:22:03.080
way you scale a business is you create processes, and those processes
00:22:08.360
scale. So we have this process for
00:22:14.840
processing new orders, which consists of creating a customer account if it
00:22:21.960
doesn't exist (so there's some decision-making opportunity), checking
00:22:28.039
inventory for items, calculating the total amount, charging the customer, creating
00:22:34.200
the order record, creating a shipping label (if the address is in Europe, use DHL;
00:22:40.159
if the address is in the US, use FedEx), and
00:22:45.600
then sending an email notification to the customer. And then on the right-hand side, we're using
00:22:53.919
the open source library I mentioned to instantiate this assistant and pass a list of tools, which are basically
00:23:03.960
classes, extended classes.
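Editor's sketch: to make "tools are basically classes" concrete, here is a minimal plain-Ruby illustration of a tool class, a JSON-Schema-style signature for one of its methods, and a dispatcher for the tool call a model might return. This is not the library's actual API. The names `InventoryManagement`, `FIND_PRODUCT_SCHEMA`, `HANDLERS` and `dispatch`, the `inventory_management__find_product` naming convention, and the product data are all hypothetical.

```ruby
require "json"

# Hypothetical tool: a plain Ruby class whose methods the assistant may call.
class InventoryManagement
  # Made-up product data, for illustration only.
  PRODUCTS = {
    "TSHIRT-001" => { sku: "TSHIRT-001", price: 24.99, quantity: 100 }
  }.freeze

  # Look up a product by SKU; returns nil when the SKU is unknown.
  def find_product(sku:)
    PRODUCTS[sku]
  end
end

# JSON-Schema-style signature that would be sent to the LLM so it knows
# the function exists and which arguments it takes.
FIND_PRODUCT_SCHEMA = {
  name: "inventory_management__find_product",
  description: "Find a product by its SKU",
  parameters: {
    type: "object",
    properties: {
      sku: { type: "string", description: "Product SKU" }
    },
    required: ["sku"]
  }
}.freeze

# Explicit allow-list: only functions registered here can ever be invoked,
# no matter what name the model returns.
HANDLERS = {
  "inventory_management__find_product" => InventoryManagement.new.method(:find_product)
}.freeze

# Execute a tool call of the shape an LLM might return, for example:
# {"name": "inventory_management__find_product", "arguments": {"sku": "TSHIRT-001"}}
def dispatch(tool_call_json)
  call = JSON.parse(tool_call_json)
  handler = HANDLERS.fetch(call.fetch("name")) # raises KeyError for unknown tools
  handler.call(**call.fetch("arguments").transform_keys(&:to_sym))
end

tool_call = '{"name":"inventory_management__find_product","arguments":{"sku":"TSHIRT-001"}}'
p dispatch(tool_call)
```

The allow-list is a deliberate choice in this sketch: the function name comes from a probabilistic system, so looking it up in a fixed table is safer than calling whatever method name arrives.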
00:23:10.400
Let's go to this real quick. I hope this
00:23:15.840
works. Okay, I'm going to paste some
00:23:21.200
instructions. These are the same instructions you saw a slide
00:23:26.600
earlier. And presume that we have a point-of-sale front-end system that,
00:23:35.880
whenever a new order comes in, sends this payload with a header, "New Order", and
00:23:45.000
the customer email, the quantity and the SKU of the item, and the
00:23:51.360
address. And then when I run it, like I mentioned, I'm expecting it to
00:23:59.000
string those messages together and facilitate that business logic,
00:24:05.400
that standard operating procedure: follow the steps for processing a new
00:24:16.159
order. So the gray messages are tool calling, and the red... sorry, the
00:24:24.600
red messages are messages that come back from the LLM, and the gray
00:24:29.880
messages are output from calling the tools. So
00:24:35.880
if we walk through this execution again, we see that the first thing that it
00:24:43.440
decides to do is to find the customer, so it calls this find customer
00:24:49.640
function, and the tool says: yep, here's a customer record,
00:24:56.399
ID 1. This actually interfaces with a real SQLite database I have
00:25:04.000
running. It didn't end up creating a customer because the customer
00:25:09.880
record exists. It then decided to find the product. It gets the product,
00:25:17.240
and the payload... you find the product by SKU, and it returns the SKU, the quantity, and the
00:25:23.840
price. It then charges the customer by
00:25:29.440
taking that price times five; that's the
00:25:34.760
amount. And we generate this transaction
00:25:40.000
confirmation. We create the order record: success, with an order
00:25:45.159
ID. It creates the shipping label, and it uses FedEx because the address is
00:25:50.960
in the US, and not DHL, since it's not a European address. Then it sends the
00:25:57.960
email, and we have an email provider that generates this
00:26:03.080
mail. And at the end, the AI just
00:26:10.840
kind of recaps this. So I mentioned this was an
00:26:16.480
assistant, more of a conversational interface, so we can also ask it, you know, after
00:26:25.520
this transaction,
00:26:34.399
how many of these SKUs do we have left? And it finds the product, and that
00:26:43.200
returns the latest quantity. We can ask it how many users
00:26:52.279
are in our DB in total.
00:27:03.760
Okay. Live demo,
00:27:14.919
right? So why would you use this?
00:27:20.440
Well, I'm not saying you should be going to production with this today,
00:27:28.679
but it's definitely
00:27:33.840
aspirational, and I think if all goes well we're going to trend in that direction. For
00:27:42.600
example, you could be changing requirements on the fly, right? The CEO comes to you and says, "You know, I
00:27:48.799
forgot today's the 20th anniversary of our store, and we'd like to offer
00:27:55.600
discounts for loyal customers," but the product team is like, "Well, it's not in the current sprint, it
00:28:03.919
wasn't planned, and we're not going to be able to do it for another two months." So maybe you could just write it up
00:28:12.039
immediately. The use case that I was just trying to show you was a
00:28:18.200
text-to-SQL use case, where we have a tool, and we can dynamically modify the
00:28:24.799
list of tools. One of the built-in tools we have is a database adapter, so you can
00:28:31.480
connect to any kind of SQL database, and it's able to reflect on the
00:28:37.480
database schema, because it has no clue what your database schema looks like; it's proprietary to you, right?
00:28:44.720
And then, based on that, it will create a SQL query that makes sense.
00:28:51.480
Obviously, create a dedicated role to make sure that
00:28:58.760
it doesn't accidentally drop the whole database. So create a dedicated SQL role for
00:29:06.760
that. So we did a brief overview of the
00:29:12.720
current generative AI capabilities. We talked about AI agents, the
00:29:21.440
different components they consist of, and the different things to take into
00:29:26.519
consideration when you're building one: planning and reasoning, memory, tool calling,
00:29:32.159
etc. I did a quick demo illustrating how an AI assistant can basically
00:29:39.880
automate our business logic, our standard operating procedure. Again, a lot of
00:29:46.919
businesses have these massive written documents of their internal processes;
00:29:53.559
I think there's a lot of opportunity for automating them.