The rise of autonomous agents: AutoGPT, AgentGPT and BabyAGI

June 6, 2024 Roberto Magalhães

First there were LLMs, and now we have autonomous agents: programs capable of performing complex tasks without human intervention. What do we know about them? And are they ready to change the way we do business?

In 2022, AMC launched one of the best science fiction series of this century, a short-lived animated show called Pantheon, based on short stories by author Ken Liu (The Hidden Girl and Other Stories). The story follows Maddie Kim, an introverted 14-year-old who one day discovers that her late father has been turned into a UI (uploaded intelligence). This pulls Maddie into a web of deception and conspiracy as companies and governments around the world compete to create the first fully functional autonomous agent. The end result? A singularity and the collapse of the world as we know it.

Apart from some of its more dramatic aspects, Pantheon is almost prophetic. It is 2023, and it seems this year will go down in history as the year of artificial intelligence. One day we had no idea how much AI was part of our daily lives, and the next we were seeing hundreds of articles and social media threads about large language models and what they can accomplish. The tech giants are dropping whatever projects they were working on and putting AI at the forefront.

OpenAI may have led the way, but everyone wants a piece of the pie, even small startups. A few months ago, it seemed impossible to run an LLM on anything other than a server farm, but here we are with hundreds of competitors running LLaMA-derived models with a fraction of the resources of the tech giants.

And that's without taking into account the army of small companies that use APIs to work with large models. I don't know how much data OpenAI processes daily, but with the amount of “too many requests” errors our team has faced when working with GPT-3.5, it's safe to say they are almost at current capacity.

What is the next step? Who will win this arms race? According to a Google employee, the odds favor small developers and the open source community. It would be a big mistake to focus on the big fish and lose sight of some of the most interesting and powerful implementations of AI coming from small communities, such as autonomous agents.

Autonomous Agents

If you have tried any of the modern large language models, you already know the gist of it. It's a chat-like environment where you write some text and the model returns some text. For example, if I wrote “Please write an article about AutoGPT”, the model would do its best to talk about it. In this specific case, if we used ChatGPT, it would either respond that it doesn't know what AutoGPT is or hallucinate some very creative but invented response. Why? Because ChatGPT's training cutoff is 2021, meaning it hasn't been trained on anything published since.

Now, of course, there are ways around this. For example, I could write a Python program that does a web search, gathers the top 10 results, passes them to ChatGPT as a quick briefing, and then prints the model's answer. It's by no means a perfect solution, but it's good enough as a quick and dirty way to escape the OpenAI sandbox.
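To make the idea concrete, here is a minimal sketch of that search-then-ask script. The `search` and `ask_model` callables are placeholders I made up so the example runs offline; in a real script you would swap them for an actual search API and an LLM client.

```python
from typing import Callable, List


def build_prompt(question: str, snippets: List[str]) -> str:
    """Combine the question and the search snippets into a single prompt."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Use only the sources below to answer.\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


def answer_with_search(
    question: str,
    search: Callable[[str, int], List[str]],
    ask_model: Callable[[str], str],
    top_k: int = 10,
) -> str:
    """Search, keep the top results, and let the model answer from them."""
    snippets = search(question, top_k)[:top_k]
    return ask_model(build_prompt(question, snippets))


# Hypothetical stand-ins: replace with a real search API and a real LLM call.
def fake_search(query: str, limit: int) -> List[str]:
    return [f"Result {n} about {query}" for n in range(1, limit + 1)]


def fake_model(prompt: str) -> str:
    # Counts the "[n]" source markers that build_prompt added.
    return f"(model reply based on {prompt.count('[')} sources)"


if __name__ == "__main__":
    print(answer_with_search("What is AutoGPT?", fake_search, fake_model))
```

Passing the search and model functions in as arguments keeps the glue code testable without network access or an API key.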

With this, we have an AI that is “connected to the internet” (ok, not really, but it's good enough for this example). Now imagine that I extend my Python script so that it takes the ChatGPT output, checks that it is Python code, and runs it. Now we have an AI that is connected to the internet and is capable of running code (for enthusiasts: if you want to try something like this, use a virtual machine).
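A rough sketch of that "check it is Python, then run it" step could look like the following. The function names are my own, and the approach (parse with `ast`, run in a child interpreter with a timeout) is just one reasonable choice. A subprocess is not a sandbox, so the virtual machine advice still applies.

```python
import ast
import subprocess
import sys


def is_python(source: str) -> bool:
    """Return True if the text parses as Python source code."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def run_generated_code(source: str, timeout: float = 5.0) -> str:
    """Run model-written code in a separate interpreter and return its output.

    WARNING: this only isolates the process, not the machine. The code can
    still read files, use the network, and so on.
    """
    if not is_python(source):
        raise ValueError("model output is not valid Python")
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Hand back stderr on failure so the caller can feed the error to the model.
    return result.stdout if result.returncode == 0 else result.stderr


if __name__ == "__main__":
    print(run_generated_code("print(2 + 2)"))
```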

At this point, we have a rudimentary agent.

A computer agent is a software program that can perform tasks on behalf of a user or another computer program. Agents are typically designed to be autonomous and proactive, meaning they can make decisions and act without the need for human intervention.

While not fully autonomous, our baby agent has enough independence to do some really quirky things. This is why you should run it in a virtual machine: we really don't know what kind of code it will end up running. We can continue to develop this program. For example, we could have our language model first create a series of steps to achieve our goal. Then we could pass each step to the language model, test the result, and either try again, move on to the next task, or create subtasks, depending on the outcome.
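The plan, execute, and retry cycle described above can be sketched in a few lines. Everything here is illustrative: `plan` and `execute` stand in for calls to a language model, and `FlakyExecutor` is a made-up helper that fails once per step so you can watch the retry logic work.

```python
from typing import Callable, List, Tuple


def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], Tuple[bool, str]],
    max_retries: int = 2,
) -> List[Tuple[str, str]]:
    """Plan steps for a goal, run each one, and retry failures a few times."""
    log: List[Tuple[str, str]] = []
    for step in plan(goal):
        for _ in range(max_retries + 1):
            ok, output = execute(step)
            if ok:
                log.append((step, output))
                break
        else:
            # Every attempt failed: record it and move on to the next step.
            log.append((step, "gave up"))
    return log


# Hypothetical stand-ins for the language model calls.
def fake_plan(goal: str) -> List[str]:
    return [f"research {goal}", f"write about {goal}"]


class FlakyExecutor:
    """Fails the first time it sees a step, then succeeds."""

    def __init__(self) -> None:
        self.seen = set()

    def __call__(self, step: str) -> Tuple[bool, str]:
        if step not in self.seen:
            self.seen.add(step)
            return False, "error"
        return True, f"done: {step}"


if __name__ == "__main__":
    for step, outcome in run_agent("AutoGPT", fake_plan, FlakyExecutor()):
        print(step, "->", outcome)
```

A fuller agent would also let a failed step spawn subtasks instead of simply giving up, but the control flow stays the same.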

And little by little, layer by layer, we add features to our agent. Notice how after our first instruction (the one that gets the ball rolling), our agent will start using internal dialogue to continue working on each task. For example, if the code returns an error, the agent will say to itself: “Oops, something went wrong; let’s debug this and try again”. There is no need for a bored human to supervise its work. If this is sending a shiver down your spine, that's good. It means you're already starting to see the implications.

Computer agents are like little helpers that make our digital lives easier. They can do all sorts of tasks for us without us even realizing it, much like a personal assistant who takes care of things behind the scenes so the boss doesn't have to stress about everything.

There are three main types of computational agents: reactive agents, deliberative agents and hybrid agents.

Reactive agents

These guys are like pure instinct. They react to specific stimuli in their environment without any awareness or analysis of the context beyond that for which they have been explicitly programmed. It's like when you install antivirus software on your laptop – it springs into action immediately when a suspicious file is detected on your system.

Deliberative Agents

On the other hand, we have deliberative agents. These guys think before they act (exactly as we should!). They reason through problems using past experiences and knowledge stored in their databases to make informed decisions based on current circumstances. Think of Siri or Alexa answering a question: they process multiple data sources before providing an answer.

Hybrid Agents

The third type is where things get wild: hybrid combinations! These bad boys combine characteristics of reactive and deliberative agents, which lets them cope with dynamic environments where conditions change constantly while still planning their way through goal-oriented problems more efficiently than either type alone.

Our example would fall somewhere between hybrid and deliberative. But with enough effort and dedication, we could turn it into a complete hybrid agent like AutoGPT, BabyAGI or AgentGPT.
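If the three categories feel abstract, this toy sketch may help. The class names, rules, and the "escalate after three repeats" policy are all invented for illustration; real agents are far richer, but the division of labor is the same.

```python
from typing import Dict, List, Optional


class ReactiveAgent:
    """Maps a stimulus straight to an action using fixed rules."""

    def __init__(self, rules: Dict[str, str]) -> None:
        self.rules = rules

    def act(self, percept: str) -> Optional[str]:
        return self.rules.get(percept)


class DeliberativeAgent:
    """Keeps a memory of past percepts and decides based on that history."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def act(self, percept: str) -> str:
        self.memory.append(percept)
        repeats = self.memory.count(percept)
        return "escalate" if repeats >= 3 else "log"


class HybridAgent:
    """Tries the fast reactive rules first, then falls back to deliberation."""

    def __init__(self, reactive: ReactiveAgent, deliberative: DeliberativeAgent) -> None:
        self.reactive = reactive
        self.deliberative = deliberative

    def act(self, percept: str) -> str:
        action = self.reactive.act(percept)
        # The deliberative layer only sees what the rules could not handle.
        return action if action is not None else self.deliberative.act(percept)


if __name__ == "__main__":
    agent = HybridAgent(ReactiveAgent({"virus": "quarantine"}), DeliberativeAgent())
    for event in ["virus", "slow disk", "slow disk", "slow disk"]:
        print(event, "->", agent.act(event))
```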

A new challenger arrives: AutoGPT vs. ChatGPT

AutoGPT is an experimental open-source application that uses OpenAI's GPT-4 language model to pursue goals autonomously. It was created by game developer Toran Bruce Richards and released in March 2023.

Much like our example, AutoGPT works by breaking a user-defined goal into a series of subtasks. It then uses GPT-4 to generate text and code that can be used to complete these subtasks. AutoGPT can be used to perform a variety of tasks, including:

Writing code
Generating text
Translating languages
Answering questions
Solving problems

AutoGPT is still in development. In fact, if you visit the project's GitHub page, it has more warnings than a bottle of medicine. It's unstable, unreliable, and can totally destroy your wallet with OpenAI API queries. But it also has the potential to be a powerful tool for automating tasks and improving efficiency. It's also a valuable resource for developers who want to learn more about GPT-4 and how it can be used to create autonomous applications.

Here are some of the benefits of using AutoGPT:

Can automate tasks: AutoGPT can be used to automate a variety of tasks such as writing code, generating text, translating languages, answering questions, and solving problems. This can save you time and effort and also help you be more productive.
It's easy to use: AutoGPT is very easy to use. You just need to set a goal and AutoGPT will do the rest. No need to write any code or learn complex commands.
It's powerful: AutoGPT is powered by GPT-4, which is one of the most powerful language models in the world. This means that AutoGPT can be used to perform a wide variety of tasks and with a high degree of accuracy.

I cannot emphasize this enough: AutoGPT is the first of its kind and is absolutely unreliable. It would be crazy to try to deploy it in a production environment. But on the other hand, if you are thinking about building autonomous agents, it is essential to check out this project's GitHub repository. There are so many good ideas in this project that can be used, redefined and adapted to other environments.

The simple solution: AgentGPT

AgentGPT is like a Swiss army knife for any CTO who wants to increase their team's productivity. Imagine a super-efficient assistant that can help you with tasks ranging from developing a marketing strategy to building a website with very little human intervention. That's AgentGPT for you.

See, AgentGPT is a platform that creates AI agents to meet your goals, just like AutoGPT. It is an open source project that leverages OpenAI's GPT-3.5 and GPT-4 models. Think of it as an evolved cousin of ChatGPT that can not only chat, but also autonomously create its own tasks, browse the web, and even send new agents into the digital battlefield to accomplish its assigned mission.

The best part? It's like a friendly neighborhood superhero. You don't need to be a coding wizard or have any special technical knowledge to use AgentGPT. Don't want to deal with Docker, setting up environments and other tech stuff? Want to experience what autonomous agents have to offer right now? Then AgentGPT is the simplest solution.

Accessing AgentGPT is as simple as ordering a pizza. All you have to do is visit the AgentGPT website, or if you're more of a DIY person, you can grab the code from the official GitHub repository and install it on your local system.

Once you join, you will have three levels of access. You can play as a guest with limited tokens and no ability to save agents. Level up by creating an account and you will be able to manage accounts and save deployed agents. The top level requires an OpenAI API key and unlocks advanced features such as setting the agent's focus level and maximum number of loops.

Getting started with AgentGPT requires almost no work. You need to create and configure an agent, assign it a goal, and deploy it. It's literally just giving it a name and a goal. It's like naming your new pet and teaching it tricks. When I created my first agent, I called it “Deal Finder”. You can choose any name as long as it is related to the agent's role or objective.

Now comes the interesting part: configuring your agent. This is where you adjust your agent's behavior. It's like choosing the ingredients for a complex recipe. You have the option to select the GPT model, execution mode, focus level, tokens and maximum loops. It's crucial to find the right balance. Set things too high or too low, and you could end up with a burnt dish or undercooked pasta.

In this case, aim too high and you'll have an erratic, unfocused AI. Aim too low and your AI will be a fairly tame and predictable agent that does the absolute minimum. After setting up and configuring your agent, it's time to let it loose in the digital jungle. Deploy your agent and then monitor its journey in the site's main console.
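To show how those knobs interact, here is a hypothetical settings object of my own. The field names, defaults, and the 0-to-1 scale for the focus level are assumptions for illustration, not AgentGPT's actual configuration format. The loop budget is the part worth noticing: it is what stops a confused agent from burning through your API credits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentSettings:
    """Illustrative settings only; not AgentGPT's real schema."""

    name: str
    goal: str
    model: str = "gpt-3.5-turbo"
    focus_level: float = 0.5  # assumed scale: 0.0 to 1.0
    max_tokens: int = 400
    max_loops: int = 5

    def __post_init__(self) -> None:
        if not 0.0 <= self.focus_level <= 1.0:
            raise ValueError("focus_level must be between 0.0 and 1.0")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        if self.max_loops < 1:
            raise ValueError("max_loops must be at least 1")


def run_until_done(settings: AgentSettings, step: Callable[[int], bool]) -> int:
    """Call step() until it reports completion or the loop budget runs out.

    Returns the number of loops actually used.
    """
    for loops_used in range(1, settings.max_loops + 1):
        if step(loops_used):
            return loops_used
    return settings.max_loops


if __name__ == "__main__":
    settings = AgentSettings(name="Deal Finder", goal="find laptop deals", max_loops=3)
    print(run_until_done(settings, lambda loop: loop == 2))
```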

Sounds fantastic, right? Well, just like its tech-savvy cousin AutoGPT, it is still unreliable and depends on OpenAI's models. Still, in my personal experience, it's a fun little experiment that could really turn into an easy-to-use, industry-leading tool.

Best for Last: BabyAGI

One of the main problems with large language models is that they are amnesiac. Close the window or delete the chat and your faithful AI companion will disappear forever. But what if we could take inspiration from humanity and give it a long-term memory? Enter BabyAGI by Yohei Nakajima, based on his paper “Task-driven Autonomous Agent Utilizing GPT-4, Pinecone, and LangChain for Diverse Applications”. As the name implies, it is a technology stack with three main components: GPT-4, Pinecone, and LangChain.

Pinecone

Pinecone is a vector database service designed to provide efficient and scalable vector search capabilities. It was launched with the aim of enabling companies to create applications that leverage machine learning more easily and effectively. The service is cloud-based and fully managed, meaning users don't need to worry about managing infrastructure, scaling or upgrading systems – Pinecone takes care of it all.

Here's a closer look at how Pinecone works:

Embedding and indexing:

The workflow starts by embedding data into a vector space using a machine learning model. This embedding process transforms text, images, or other data into a numeric vector that captures its essential features. Pinecone then indexes the embedded vectors for efficient searching.

Vector search:

You submit a query vector, and Pinecone searches the index for similar vectors. It uses an approximate nearest neighbor (ANN) search algorithm to search large databases in an efficient and scalable way.

Updating the Index:

The index can incorporate new data without being rebuilt, which makes Pinecone well suited to applications whose data changes frequently.
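The embed, index, search, and update cycle is easier to picture with a toy version. The class below is not Pinecone's client API; it is a made-up in-memory index that compares every stored vector with exact cosine similarity, where a real service would use an ANN algorithm to avoid scanning everything. The two-dimensional vectors are hand-written stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Toy in-memory index: exact brute-force search, not ANN."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Vector] = {}

    def upsert(self, item_id: str, vector: Vector) -> None:
        """Insert a new vector or overwrite an existing one; no rebuild needed."""
        self._vectors[item_id] = vector

    def query(self, vector: Vector, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k most similar items as (id, score) pairs."""
        scored = [
            (item_id, cosine_similarity(vector, stored))
            for item_id, stored in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = TinyVectorIndex()
    index.upsert("cat", [1.0, 0.0])
    index.upsert("dog", [0.9, 0.1])
    index.upsert("car", [0.0, 1.0])
    print([item_id for item_id, _ in index.query([1.0, 0.0], top_k=2)])
```

For an agent, the stored vectors would be embeddings of past task results, and `query` is how it "remembers" the ones relevant to the task at hand.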

Scaling and management:

Pinecone was developed for large applications. As your database grows, it manages the infrastructure, scales, and optimizes search operations. Because the service handles scaling and management, developers can focus on building their application instead of worrying about infrastructure.

LangChain

The launch of the extraordinary open source project LangChain by Harrison Chase in October 2022 caused a huge stir in the IT industry. Thanks to its rapidly expanding community on GitHub, Twitter, Discord, and other platforms, it has gained a lot of attention and investment, including a $20 million funding round from Sequoia Capital.

It's an innovative architecture that works with a wide range of systems and services, from cloud storage providers like Amazon and Google to language model providers like OpenAI, Anthropic, and Hugging Face. It serves as a unified and extensible platform for a wide variety of applications.

The range of possible applications is enormous. You can use API wrappers for news, movie listings, and weather. It is capable of running shell programs, crawling the web, and even generating few-shot learning prompts. From PDF manipulation to SQL, this tool has you covered.
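The core idea behind all of this is chaining: small steps whose output feeds the next one. The sketch below shows that idea in plain Python. It deliberately does not use LangChain's own classes or function names, and the `fake_llm` step is a stand-in for a real model call.

```python
from typing import Callable

Step = Callable[[str], str]


def make_chain(*steps: Step) -> Step:
    """Compose steps so each one receives the previous step's output."""

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


def prompt_template(topic: str) -> str:
    """Turn a bare topic into a full prompt."""
    return f"Write one sentence about {topic}."


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: echoes the prompt in upper case."""
    return f"Answer: {prompt.upper()}"


def keep_answer_only(raw: str) -> str:
    """Strip the 'Answer: ' label from the model's reply."""
    return raw.split(": ", 1)[-1]


if __name__ == "__main__":
    chain = make_chain(prompt_template, fake_llm, keep_answer_only)
    print(chain("BabyAGI"))
```

Because every step has the same shape (text in, text out), swapping the fake model for a real one, or adding a web search step in the middle, does not change the rest of the pipeline.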

It supports a wide variety of document types and data sources, including relational and non-relational (NoSQL) databases. In addition to its data management capabilities, LangChain can also generate, analyze, and debug scripts written in Python and Java. When all these elements are combined, we get one of the most sophisticated autonomous agents possible.

Again, it's not perfect, but it uses some cutting-edge machine learning techniques to build a capable AI companion with room for growth. BabyAGI also has the added advantage of being able to run on GPT-4 or LLaMA-based models, so the open source community will likely invest more in BabyAGI.

What's next?

It may be too early to put these tools into production for any meaningful task, but I would bet my life that autonomous agents have the potential to steal the spotlight from large language models. I can imagine complicated multimodal bots in the future, producing not just text but also visual and audio content. Even if computers don't have consciousness, I have no doubt that they have already passed the Turing test.