Summarized using AI

Intro to AI Agents

Andrei Bondarev • September 13, 2024 • Sarajevo, Bosnia and Herzegovina • Talk

Summary of 'Intro to AI Agents'

In "Intro to AI Agents", presented at EuRuKo 2024, Andrei Bondarev discussed building AI agents within the Ruby ecosystem, framed by generative AI and business process automation. The talk emphasized how quickly AI capabilities have advanced and how developers can use them to build applications today.

Key Points Discussed:

  • Generative AI Overview: Bondarev highlighted how recent progress in AI has drastically reduced the time required to implement complex capabilities like classification, summarization, and translation.
  • AI Agent Definition: He defined AI agents as autonomous systems that perceive their environment, make decisions, and execute actions to achieve specific goals. This contrasts with conversational assistants, which operate under constant human guidance.
  • Components of AI Agents: The presentation delved into several critical components of AI agents, including:
    • Planning and Reasoning: Emphasizing the importance of formulating a plan and reflecting on it to ensure its relevance.
    • Tool Calling: Using external tools and APIs to perform structured tasks effectively, such as obtaining real-time data.
    • Memory: Storing progress and actions taken by the agent to create a more intelligent and context-aware system.
  • Challenges Encountered: Bondarev acknowledged ongoing issues such as hallucinations in AI output and the reliability of tool calling, which can hinder the deployment of AI agents in production environments.
  • Demonstration: A live demo built an AI agent for a fictional e-commerce store that automates order processing by orchestrating calls to several services (customer management, inventory, payments, orders, shipping, and email).

Conclusion and Takeaways:

  • The integration of AI agents represents a significant opportunity for developers, especially within the Ruby ecosystem, to enhance existing applications.
  • There is an urgent need for the Ruby ecosystem to adapt and embrace AI technologies to remain relevant.
  • Organizations should explore the potential for automation in their business processes, leveraging AI agents to optimize operations, enhance efficiency, and meet evolving market demands.

By focusing on the foundations laid by generative AI and AI agents, Bondarev concluded that developers could unlock tremendous possibilities for innovation.


Date: September 13, 2024
Published: January 13, 2025

The author of Langchain.rb will walk you through current capabilities of LLMs and what can be built today. We will build a business process automation AI agent in Ruby and discuss the common pitfalls and misconceptions. We'll discuss what might be emerging as a new LLM-powered software stack.

Generative AI has been taking the world by storm. The Coatue AI report (Nov 2023) puts AI models at the center of all modern tech stacks going forward, as the layer that application developers will build on top of. It would not be controversial to say that the Ruby ecosystem lacks in its support and understanding of the AI, ML and DS landscape. If we'd like to stay relevant in the future, we need to start building the foundations now. We'll look at what Generative AI is, what kind of applications developers in other communities are building and how Ruby can be used to build similar applications today. We'll cover Retrieval Augmented Generation (RAG), vector embeddings and semantic search, prompt engineering, and what the state of the art (SOTA) in evaluating LLM output looks like today. We will also cover AI Agents, semi-autonomous general purpose LLM-backed applications, and what they're capable of today. We'll make a case for why Ruby is a great language to build these applications because of its strengths and its incredible ecosystem. After the slides, I'll walk the attendees through building an AI Agent in 15 min with Langchain.rb.

EuRuKo 2024

00:00:10.160 so last minute I decided to uh change the name of the talk slightly so this is going to be uh Intro to AI Agents um
00:00:19.279 my name is Andrei Bondarev um I live in South Florida um thank you for coming uh to
00:00:25.960 this talk uh thank you to the organizers this uh conference is
00:00:32.239 great so uh my day-to-day I run a software development firm um we build a
00:00:38.800 lot of Rails applications for um VC-backed startups and enterprises um and over the last uh 18
00:00:47.320 months I've been doing a lot of um AI research trying to figure out what's the best way to implement AI into our
00:00:55.640 applications so let's talk about generative AI um I think this summarizes the
00:01:03.320 current um wave of um progress in AI pretty well what used to take uh six to
00:01:11.520 seven months um a lot of these capabilities can be done in a couple
00:01:16.680 days or a couple weeks um so you no longer have to hire uh a dedicated data
00:01:25.280 science team um and have them spin their wheels uh developing and training their own
00:01:32.119 models because let's face it um unless you were working at a FAANG company um uh it
00:01:38.759 was probably a wasted effort so um a lot of these really
00:01:45.320 common machine learning tasks like classification named entity recognition
00:01:50.880 summarization and translation um are just at the tip of
00:01:56.039 our fingers now um so classification would be taking let's say a um a news
00:02:03.719 article and classifying it by by topic technology Sports business named entity
00:02:09.399 recognition extracting um proper names or uh organization names
00:02:15.879 locations company names um out of text um summarizing text translating between
00:02:21.519 different languages so these capabilities are um just an API call
00:02:27.120 away um so previously reserved to um
00:02:32.400 companies that are willing to invest a lot of money uh there's a long tail of smaller companies that are evaluating
00:02:39.640 how to implement um some of these capabilities into their products and
00:02:46.400 businesses so of course whenever um we talk about um AI uh every time there's a
00:02:53.480 there's there's a wave people immediately talk about AI agents and
00:02:58.879 this is not a new concept this concept has been around for several decades um in fact um it was first um
00:03:07.280 mentioned by Alan Turing in 1950 in his paper same paper where uh he introduced
00:03:13.200 the Turing test um in the 1970s and 80s we had explored expert
00:03:19.760 systems the 1990s and 2000s were dedicated to software
00:03:24.920 agents um in the 2010s the original chatbots with Siri and Alexa um and now we're
00:03:32.519 evaluating if um LLMs can power AI
00:03:38.480 agents and every major tech company has um a vision around this that they're uh
00:03:45.480 putting out there so Google for example has um uh custom AI agents ironically
00:03:52.280 called Gems um Anthropic um has a tool of their own um and all the rest of the massive tech
00:04:01.400 companies so what is an AI agent well the the textbook definition is it is an
00:04:08.239 autonomous system capable of perceiving its environment making decisions and
00:04:14.480 taking actions to achieve specific goals so environment awareness decision-
00:04:21.520 making and action taking um and these two terms [assistant and agent] are used
00:04:29.720 pretty much interchangeably in the field right now and I kind of draw a small distinction
00:04:36.000 that assistant is more of a in a conversational uh interaction pattern um
00:04:43.880 is constantly taking uh directions from uh from Human whereas agent maybe takes
00:04:50.840 the original tasks from a human but it's it's off to uh work on the task on its
00:04:56.240 own kind of like a background job
00:05:01.280 so if we were to um kind of put some interfaces behind behind this concept so the
00:05:07.320 conversational assistant is a free-for-all um text input um so it's an
00:05:14.360 infinite number of uh possible prompt permutations um infinite uh kind of
00:05:22.639 attack Vector um if we're talking about prompt hacking and the autonomous agent um kind
00:05:30.319 of powers like hardcoded functionality so um you would have uh buttons that
00:05:37.919 have hardcoded prompts um behind the scenes that are um executing
00:05:43.880 tasks so it's guided input from from a user so what are the use cases uh for AI
00:05:52.440 agents well of of course the um kind of one one that's always talked
00:05:59.400 about is is just broadly automating um business processes um so automating mundane low
00:06:08.000 IQ tasks um um and personal assistance
00:06:14.199 co-pilots um so for example in in in my Consulting business um uh we could be
00:06:20.639 creating invoices from time sheets categorizing business expenses writing proposals that remix our service
00:06:28.720 offerings and uh client meeting notes um writing job descriptions writing
00:06:36.479 tickets Jira tickets from those meeting notes um so when we're talking about
00:06:43.639 building an AI agent there's um there's several different components to
00:06:55.160 it so first thing is is planning and reasoning right this is kind of how
00:07:01.080 humans um when they have a goal in in mind they they think about what are the
00:07:07.400 steps necessary to achieve that goal um how do you plan how do you plan for that
00:07:12.599 how do you formulate the plan um and then kind of
00:07:17.840 recursively um reflect on that plan to to make sure it's the um it's a good
00:07:23.720 plan it's uh it's a relevant plan um
00:07:30.080 reasoning is the kind of Cornerstone of problem solving decision- making critical
00:07:36.360 analysis um so uh chain of thought is uh a technique
00:07:47.440 uh to uh force the AI to explain its
00:07:53.479 reasoning um it's a prompting technique so similar to how humans when they think
00:08:00.520 through different problems if you're um thinking out loud or kind of thinking through it
00:08:06.960 in your head um you're much more likely to to get better answers kind of using
00:08:14.000 um System 2 versus System 1 thinking I think um so if we look at this example
00:08:22.240 um without chain of thought so in this example um I'm asking how many full soccer
00:08:28.960 fields would be needed to uh cover the distance between New York City and Washington DC in a straight line um I'm
00:08:36.800 instructing it to just provide the answer and it's much more likely to
00:08:41.959 hallucinate and not give you the correct answer versus when
00:08:49.040 we ask the AI to explain its reasoning step by step as it's generating uh its
00:08:56.959 answers as it's generating tokens um it's much more likely to uh get to a
00:09:02.720 better answer so this is what uh Chain of Thought
00:09:08.440 is um role play is forcing the AI to adopt a certain personality character or
00:09:15.800 behavior via prompt engineering so you could have it um act as a as a
00:09:22.560 strict manager as a relaxed manager um as a Dungeon uh Master in D&D um in
00:09:31.040 fact I think Obi has a pretty popular article um on that um um you can ask to be a
00:09:39.279 helpful AI assistant so uh let's talk about environment perception and environment
00:09:45.920 perception is really just just context within which the AI agent is operating
00:09:53.959 um and it could be as simple as today September 13th 2024 because
00:10:01.079 it doesn't have access to uh real time data uh it's trained um on a in a in a
00:10:08.480 snapshot in time um it's stateless um so
00:10:13.920 we would provide today's date like this um if it helps it accomplish
00:10:21.760 tasks um tool uh calling or function calling uh those two are equivalent
00:10:28.079 those two are synonymous and you would uh typically use them for either structured outputs so having the
00:10:35.800 response adhere to a predefined JSON schema uh typically
00:10:41.639 JSON um or whenever you want the AI agent to use external tools um and in a
00:10:49.600 way this is also intent detection because the AI chooses to um use certain
00:10:57.880 tools so you would use tool calling uh in uh some of these following uh
00:11:05.720 instances so for example getting data from an external API proprietary
00:11:12.040 API getting real-time data uh like I mentioned it's stateless um it's trained on
00:11:20.320 data sets um it's unaware of your personal data or proprietary data um you
00:11:27.079 would use tool calling to have the AI agent take actions um or execute uh
00:11:33.959 deterministic tasks um so the the llms
00:11:39.800 are probabilistic systems so it doesn't really make sense to have it um try and
00:11:47.200 answer uh for example uh arithmetic questions should I go back to slide
00:11:56.839 one um so executing deterministic tasks
00:12:01.920 so for example without tools if you were to try to add up two
00:12:07.000 numbers um specifically uh really large numbers because um the chances that it
00:12:14.399 was trained uh on larger numbers is is
00:12:19.440 um are less um so if you uh try to get it to add up these two
00:12:24.760 numbers it looks like the right answer but it's not it's off by a couple of digits so it's kind of
00:12:31.120 guessing um whereas the solution is obviously to to use the uh the code
00:12:38.600 interpreter or the calculator I mean this this problem has been solved it's a deterministic task so it doesn't uh make
00:12:46.760 sense to um use a probabilistic system for that so in this case um uh where we
00:12:55.680 don't restrict it from Tool usage it just generates code passes it off to uh
00:13:01.480 a container that executes it uh in a python environment and voila get the
00:13:09.959 answer so uh tool calling really looks like this um it's a uh JSON schema format uh
00:13:20.120 declaring uh your kind of tool signature so in this case we're telling it about a
00:13:26.360 find_product uh function inside of the inventory management service and
00:13:33.240 telling it uh its signature so you can pass it a SKU so on the on the right hand
00:13:40.880 side um if let's say a user is asking how many of these SKUs do we have left
00:13:47.199 then the assistant might choose to call this
00:13:53.240 function um so lastly when it comes to AI agents uh we also need to use memory
00:14:01.920 um in in a sense of like remembering um so recall that the
00:14:08.480 definition of an AI agent is uh an autonomous system that's able uh to
00:14:14.040 perceive its environment make decisions and take actions um so as it's progressing through through different
00:14:19.880 tasks um um we need to also be able to save the environment save the progress
00:14:25.720 um and save the the actions uh or uh tool
00:14:31.240 calling and whenever we talk about memory um uh we talk about RAG um I'm kind of
00:14:39.279 tired of talking about RAG um I feel like RAG was very much a uh
00:14:45.399 2023 uh phenomenon um there's so much content on the internet about RAG
00:14:52.440 but really all it is is just taking data in plain text from a uh data source
00:15:00.720 and inserting it into the prompt um with the goal of the LLM using that
00:15:08.079 information that's it that's all it is um typically uh when people talk about
00:15:15.079 RAG they they talk about vector search uh databases semantic search yes you can
00:15:20.320 use that but not not necessarily if you can run a better query um um and just
00:15:27.240 pull data directly from uh a a CSV file or uh a SQL database you
00:15:34.199 can you can do that so some of the common problems
00:15:41.440 still with AI agents um that I think are
00:15:47.120 preventing uh still preventing a lot of uh companies and product teams to aggressively go into production with
00:15:53.560 these systems are are that these systems still tend to hallucinate um
00:15:59.959 people go back and forth whether uh these llms can truly reason or not um I
00:16:06.079 feel like it's it's just split down the middle um and sometimes you get
00:16:11.639 unreliable tool calling um and uh you want to be able to
00:16:18.800 evaluate your AI agent um so there's public benchmarks like on on Hugging
00:16:25.160 Face on a variety of different tasks um and you can it's it's basically a
00:16:31.880 catalog of of like task to um uh outcome
00:16:38.319 um and then the reasoning steps kind of like showing showing my work um how I got to the answer um You can compare
00:16:46.040 your uh AI agent uh how it performs um you can also uh use the um the LLM as a
00:16:54.160 judge approach where you have the LLM um recursively kind of reflect on its own
00:17:01.120 progress um and you can ask it yourself like this was this was the goal um uh
00:17:07.480 these were your actions this was your answer how do you think you you did according to these
00:17:16.079 metrics yeah and this was um just an example of one of the benchmark uh data
00:17:22.839 sets from uh Hugging Face um yeah I just want I just want to show you that
00:17:28.480 they're like widely available there's so many of them and um basically you can
00:17:33.720 this specifically um is um I believe like a basic arithmetic
00:17:41.160 um uh data set but you could see the question and then in in the answer
00:17:46.240 column um it's basically like reasoning steps and then the answer so uh if if
00:17:52.200 this is the uh ideal State then you could kind of compare the execution of your agent um against that
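Editor's sketch: the comparison described here, running an agent over benchmark questions and checking its final answer against the reference answer, could look roughly like the following in plain Ruby. The three benchmark rows, the `toy_agent` lambda, and the "Final answer:" convention are all hypothetical stand-ins for illustration; a real run would load a published dataset and call an actual agent.

```ruby
# Hypothetical benchmark rows: a question plus a reference answer.
BENCHMARK = [
  { question: "What is 12 + 30?", answer: "42" },
  { question: "What is 7 * 6?",   answer: "42" },
  { question: "What is 100 - 1?", answer: "99" }
].freeze

# Stand-in for a real agent: returns reasoning text ending in a final answer.
# The last branch is deliberately wrong so the score is below 1.0.
toy_agent = lambda do |question|
  case question
  when /12 \+ 30/ then "12 plus 30 is 42. Final answer: 42"
  when /7 \* 6/   then "7 times 6 is 42. Final answer: 42"
  else "I think it is 98. Final answer: 98"
  end
end

# Pulls the token after "Final answer:" out of the agent's free-form reply.
def final_answer(text)
  text[/Final answer:\s*(\S+)/, 1]
end

# Fraction of rows where the agent's final answer matches the reference.
def accuracy(agent, rows)
  correct = rows.count { |row| final_answer(agent.call(row[:question])) == row[:answer] }
  correct.fdiv(rows.size)
end

score = accuracy(toy_agent, BENCHMARK)
puts format("accuracy: %.2f", score)
```

Exact-match scoring like this only checks the final answer; comparing the intermediate reasoning steps would need a looser measure, such as the LLM-as-a-judge approach mentioned above.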
00:18:02.400 so um this was actually my slide before uh this OpenAI model came out last
00:18:08.240 night um but I I thought I thought that the the next Frontier was I thought the
00:18:13.840 weakest link was actually reasoning um and from the data I've gathered uh was
00:18:22.280 that companies were going to train specific models uh to do reasoning and they were going to train uh these models
00:18:28.360 on reasoning data um and I thought there was I I was under the impression there
00:18:34.799 was no good data and apparently OpenAI announced a new model that um I don't
00:18:41.679 have access to yet so I haven't touched it but all of this to say that um I I
00:18:48.840 think we're going to go towards an architecture where you're going to have different models at different um uh
00:18:54.120 doing different tasks so there's going to be this reasoning model that's putting uh together top level high level
00:19:00.960 uh plans um and makes decisions and then you're going to use a different model to
00:19:06.280 uh do tool calling um classification Etc um so I want to um just quickly
00:19:15.760 introduce the uh the open source library that we've been working on which is Langchain.rb so it's a Ruby framework for
00:19:22.840 building LLM uh powered applications um kind of one of the cool things you could do is uh you can instantiate any kind of uh um
00:19:32.440 LLM provider um and it's a unified interface um and you can swap them out
00:19:39.120 and um test it out really quick the
00:19:44.840 demo okay so um so we've been talking about AI
00:19:50.480 agents for a while let's see some let's see some code something tangible um so
00:19:57.559 to set the stage uh imagine we have this fictional e-commerce store called Nerds
00:20:03.600 and Threads um it sells um comfortable nerdy t-shirts for
00:20:09.559 software Engineers that work from home um and like any other e-commerce
00:20:17.760 store it integrates with a variety of different Services um it integrates with a
00:20:23.799 customer management service an external system an email service payment gateway
00:20:29.559 order management Inventory management shipping service um self-explanatory you
00:20:35.200 can kind of figure out what those Services would do um and the kind of logic you would put in those
00:20:43.080 services so if we were to visualize this architecture what we're
00:20:48.880 going to try to do is we're going to try to have an AI assistant um kind of
00:20:54.320 orchestrate the business logic um and string those six Services
00:21:01.960 together um and on the left hand side uh let me see if I can zoom in oh I
00:21:09.919 can't um so on the left hand side we have a uh we have system instructions
00:21:15.480 for the um AI assistant that we're going to be using so we're giving it some
00:21:20.760 context today is September 13 2024 um we give it a role to play
00:21:29.159 that it is an AI that runs an e-commerce store called Nerds and Threads that sells comfy nerdy t-shirts for software
00:21:36.360 engineers that work from home um we give it additional context that it has access to all these
00:21:42.919 different Services um and um uh when we talk about
00:21:50.440 automating business processes so lots of businesses have um a good
00:21:56.480 business um has standard operating procedures for doing things right the
00:22:03.080 way you scale a businesses um create processes and and those processes
00:22:08.360 scale um so so we have this process for
00:22:14.840 processing new orders um which consists of creating a customer account if it
00:22:21.960 doesn't exist um so there's a some decision-making opportunity uh checking
00:22:28.039 inventory for items uh calculating total amount uh charging the customer creating
00:22:34.200 the order record creating shipping label if the address is in Europe use DHL um
00:22:40.159 if address is in the US use FedEx um and then sending an email notification to
00:22:45.600 the customer and then on the right hand side um is the is um we're using the uh
00:22:53.919 the open source library I mentioned uh to instantiate this assistant and pass a list of uh tools which are basically
00:23:03.960 classes extended classes so
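Editor's sketch: to make the "tools are basically classes" idea concrete, here is a minimal plain-Ruby illustration of a tool that declares a JSON-schema-style signature and of a dispatcher that executes a tool call returned by an LLM. It deliberately does not use Langchain.rb's actual API. The class, the `function_schema` method, the `dispatch` helper, the double-underscore naming convention, and the product data are all assumptions made up for this example.

```ruby
require "json"

# Hypothetical in-memory stand-in for the inventory management service.
class InventoryManagement
  PRODUCTS = {
    "A3045809" => { sku: "A3045809", price: 20.99, quantity: 100 }
  }.freeze

  # JSON-schema-style signature that would be sent to the LLM so it knows
  # the function exists and which arguments it accepts.
  def self.function_schema
    {
      name: "inventory_management__find_product",
      description: "Find a product by its SKU",
      parameters: {
        type: "object",
        properties: { sku: { type: "string", description: "Product SKU" } },
        required: ["sku"]
      }
    }
  end

  def find_product(sku:)
    PRODUCTS.fetch(sku) { { error: "Product not found" } }
  end
end

# Executes a tool call that the LLM returned as a function name plus
# JSON-encoded arguments. Only methods defined on the tool class itself
# may be invoked.
def dispatch(tool_call, tools)
  tool_name, method_name = tool_call.fetch("name").split("__", 2)
  tool = tools.fetch(tool_name)
  unless tool.class.public_instance_methods(false).include?(method_name.to_sym)
    raise ArgumentError, "Unknown function: #{method_name}"
  end
  args = JSON.parse(tool_call.fetch("arguments"), symbolize_names: true)
  tool.public_send(method_name, **args)
end

tools = { "inventory_management" => InventoryManagement.new }
call = {
  "name" => "inventory_management__find_product",
  "arguments" => '{"sku":"A3045809"}'
}
result = dispatch(call, tools)
puts result.to_json
```

The allow-list check matters because the function name comes from model output; without it, an unexpected name could reach any public method on the object.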
00:23:10.400 let's go to this real quick hope this
00:23:15.840 works okay I'm going to paste some
00:23:21.200 instructions um these are the same instructions you saw um a slide
00:23:26.600 earlier and presume that we have a point of sale um front end system that sends
00:23:35.880 whenever a new order comes in and kind of sends this this payload um with a header New Order and
00:23:45.000 the customer email and the quantity and the SKU of the item and and the
00:23:51.360 address um and then um when I run it um like I mentioned I'm expecting it to
00:23:59.000 um string those messages together and and facilitate that um um business logic
00:24:05.400 that standard operating procedure follow the steps for uh processing a new
00:24:16.159 order so the gray messages are uh tool calling and the red message uh sorry the
00:24:24.600 the red messages are messages that come back from uh the llm and and the gray
00:24:29.880 messages are um output from calling the tools so
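Editor's sketch: the alternation seen in the demo, where the LLM replies with a tool call, the tool output is appended to the conversation, and the LLM is called again until it answers in plain text, can be illustrated with the loop below. `ScriptedLLM` is a hypothetical stand-in that replays canned replies instead of calling a model, and the message hash layout and `run_assistant` helper are assumptions for this example rather than Langchain.rb's API.

```ruby
require "json"

# Scripted stand-in for an LLM: returns one canned reply per call.
class ScriptedLLM
  def initialize(replies)
    @replies = replies.dup
  end

  def chat(_messages)
    @replies.shift || { role: "assistant", content: "Done." }
  end
end

# Hypothetical customer service tool that always finds the customer.
class CustomerManagement
  def find_customer(email:)
    { id: 1, email: email }
  end
end

# Calls the LLM, executes any tool calls it requests, appends the tool
# output to the conversation, and repeats until the LLM replies with text.
def run_assistant(llm, tools, messages, max_turns: 10)
  max_turns.times do
    reply = llm.chat(messages)
    messages << reply
    calls = reply[:tool_calls] || []
    return messages if calls.empty? # plain-text answer: the run is finished

    calls.each do |call|
      tool = tools.fetch(call[:tool])
      output = tool.public_send(call[:method], **call[:arguments])
      messages << { role: "tool", content: output.to_json }
    end
  end
  messages
end

llm = ScriptedLLM.new([
  { role: "assistant",
    tool_calls: [{ tool: "customer_management", method: :find_customer,
                   arguments: { email: "jane@example.com" } }] },
  { role: "assistant", content: "Customer found; continuing with the order." }
])
tools = { "customer_management" => CustomerManagement.new }
history = run_assistant(llm, tools, [{ role: "user", content: "New Order" }])
history.each { |m| puts "#{m[:role]}: #{m[:content] || m[:tool_calls].inspect}" }
```

The `max_turns` cap is a design choice worth keeping in any real loop: it bounds cost and prevents a model that keeps requesting tools from running indefinitely.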
00:24:35.880 if we walk through this execution again um we see that the first thing that it
00:24:43.440 decides uh to do is to find the customer so it calls this find_customer
00:24:49.640 function um and the tool says um yep here's a customer record
00:24:56.399 ID 1 um and this actually interfaces with uh a real SQLite database um I have
00:25:04.000 running um so it didn't end up creating a customer because the customer
00:25:09.880 record exists um it then decided to find the product um it gets the product uh
00:25:17.240 and the payload returns you find the product by SKU it returns the SKU quantity um and the
00:25:23.840 price it then charges the customer um by
00:25:29.440 um taking that price times five um that's the
00:25:34.760 amount um and we generate this uh transaction
00:25:40.000 confirmation uh we create the order record success order
00:25:45.159 ID uh create the shipping label uh it uses FedEx uh because the address is uh
00:25:50.960 in the US um and not DHL um not a European address and then it sends the
00:25:57.960 email um and we have an email provider that generates this
00:26:03.080 um um mail and um at the end uh the AI just
00:26:10.840 kind of Recaps this system um so I mentioned this was an
00:26:16.480 assistant and more of a conversational interface so I can also ask it um you know after
00:26:25.520 this transaction um
00:26:34.399 how many of these SKUs do we have left um and it finds the product um that
00:26:43.200 returns the latest quantity um we can ask it how many users
00:26:52.279 are in our DB total
00:27:03.760 okay live demo
00:27:14.919 right so why would you use this
00:27:20.440 um well I I'm not saying I'm not saying like you should be going to production with this um today um
00:27:28.679 but um it's it's it's definitely it's definitely
00:27:33.840 aspirational um and I think if if if all goes well uh we're kind of going to Trend in that direction um so for
00:27:42.600 example you could be changing requirements on the fly right the the CEO comes to you and says um you know I
00:27:48.799 forgot today's the 20th anniversary of our store and we'd like to uh offer um uh
00:27:55.600 discounts for loyal customers and but the product team is like well it's not it's not in the current Sprint and uh it
00:28:03.919 wasn't planned and we're not going to be able to do it for another two months um so maybe you could just write it up
00:28:12.039 immediately um um the use case that I was just trying to show you was um a
00:28:18.200 text to SQL um use case where um we have a tool and we can dynamically modify a
00:28:24.799 list of tools and uh one of the built-in tools we have is a um is a database adapter but you could
00:28:31.480 um connect to any kind of SQL database um and it's able to reflect on the
00:28:37.480 database schema because it has no clue about what your database schema looks like uh it's proprietary to you right um
00:28:44.720 and then based on that um it will create uh a SQL query that makes sense um
00:28:51.480 obviously create a dedicated role to uh to make sure that uh
00:28:58.760 it doesn't accidentally drop the whole database so create a dedicated SQL role for
00:29:06.760 that so we did a brief overview of uh the
00:29:12.720 current gen capabilities um we talked about uh the AI agents um and the
00:29:21.440 different components they consist of and different things to take into
00:29:26.519 consideration when you're building one uh planning and reasoning memory tool calling
00:29:32.159 Etc um I did a quick demo illustrating how an AI assistant can basically
00:29:39.880 automate our business logic our standard operating procedure um again a lot of
00:29:46.919 businesses have these massive written documents of uh their internal processes
00:29:53.559 I think there's a lot of opportunity for automating them
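Editor's sketch: the system instructions shown in the demo combined three things, today's date as context, a role, and the written new-order procedure. Assembling such instructions from a list of steps might look like the following. The `build_instructions` helper and the exact wording of the prompt are assumptions for illustration; only the date, store name, and procedure steps come from the talk.

```ruby
require "date"

# The new-order procedure as described in the talk, one step per entry.
NEW_ORDER_STEPS = [
  "Create a customer account if it does not exist",
  "Check inventory for the items",
  "Calculate the total amount",
  "Charge the customer",
  "Create the order record",
  "Create a shipping label: use DHL if the address is in Europe, FedEx if it is in the US",
  "Send an email notification to the customer"
].freeze

# Builds system instructions from date context, a role, and a numbered procedure.
def build_instructions(today:, store:, steps:)
  numbered = steps.each_with_index.map { |step, i| "#{i + 1}. #{step}" }
  <<~PROMPT
    Today is #{today.strftime("%B %-d, %Y")}.
    You are an AI that runs an e-commerce store called "#{store}".
    Follow this procedure when processing a new order:
    #{numbered.join("\n")}
  PROMPT
end

instructions = build_instructions(
  today: Date.new(2024, 9, 13),
  store: "Nerds and Threads",
  steps: NEW_ORDER_STEPS
)
puts instructions
```

Keeping the procedure as data rather than a hand-written string makes it easier to change a step, such as a new shipping rule, without rewriting the whole prompt.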