In this article I want to show you how to build simple LLM-based agents at three levels of complexity: from 'scratch' using only the OpenAI API, with OpenAI function calling, and with LangChain.

What is an Agent?

My rendition of what an agent is: an agent is nothing more than some entity that can think and act. That's right, in a way you're an agent (lol).

You can think and act on those thoughts, like in the chain of events that brought you to read this article:

- Thought: I want to learn about agents

- Action: Go to the internet and research interesting resources

- Thought: Medium has some neat content on agents

- Action: Look up medium articles

- Thought: Here is an interesting article by Lucas Soares

- Action: Read article

The ReAct Framework and LLM Integration

In a way this is a silly interpretation of what may have brought you here; obviously things didn't happen in this exact order, nor with these exact thought and action pairs.

This particular way of structuring thoughts and actions is well represented in the ReAct paper (Yao et al., 2022).

With regards to LLMs, how can we bring this idea to fruition, treating the LLM as the reasoning engine?
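
To make this concrete before we start building: a ReAct-style prompt asks the model to interleave Thought, Action, and Observation steps until it can answer. Here is a minimal sketch of such a prompt (the wording below is my own illustration, not taken verbatim from the paper):

# A minimal, illustrative ReAct-style prompt skeleton. The Thought/Action/
# Observation loop follows the paper; the wording is my own.
react_prompt = """Answer the following question by interleaving Thought, Action and Observation steps.

Thought: reason about what to do next
Action: the tool to call, e.g. search[LLM agents]
Observation: the result returned by the tool
... (repeat Thought/Action/Observation as needed)
Thought: I now know the answer
Final Answer: the answer to the original question

Question: {question}"""

print(react_prompt.format(question="What are LLM agents?"))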

Implementation from 'Scratch'

Here is a strategy:

  1. Let's think about some simple tasks an agent could perform
  2. Let's write regular Python functions that can execute these tasks
  3. Let's insert these functions inside a prompt to ChatGPT
  4. Let's prompt the ChatGPT model to make use of these functions to solve the tasks given

Let's start by setting up the OpenAI API and use it to create some simple task options:

from openai import OpenAI
from IPython.display import Markdown

import os
from dotenv import load_dotenv

# Load OPENAI_API_KEY from a .env file; the OpenAI client reads it
# from the environment automatically.
load_dotenv()

client = OpenAI()

def get_response(prompt_question):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=[{"role": "system", "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": prompt_question}]
    )
    
    return response.choices[0].message.content

output = get_response("Create a simple task list of 3 desktop things I can do on the terminal.")
Markdown(output)

# Output
# Create a new directory: Use the mkdir command followed by the desired directory name to create a new directory on your desktop. For example: mkdir new_directory

# List files and directories: Use the ls command to display all the files and directories on your desktop. Simply type ls and press Enter.

# Remove a file: If you want to delete a file from your desktop, use the rm command followed by the file name. For example: rm file_name.txt. Be careful when using this command, as it permanently deletes the file and it cannot be recovered.

OK, cool, so here we have three ideas of actions to perform:

  • Creating directories
  • Listing files
  • Removing files

Let's transform them into functions that we could call just like in any Python-based application (note that I'm swapping the risky file removal for a harmless create_file):

import subprocess

def create_directory(directory_name="test"):
    subprocess.run(["mkdir", directory_name])

def create_file(file_name="test.txt"):
    subprocess.run(["touch", file_name])

def list_files():
    subprocess.run(["ls"])

Now, let's imagine we wanted an agent to perform these actions for us based on some input we give it. How can we connect models we know and can use today, like ChatGPT, with tools that do things in the real world?

Executing Tasks with LLM Agents

To answer this question, how about we give the model a task, ask it to list the steps needed to complete that task, and then, for each step, have it decide whether or not a function should be called to execute it?

This is what the now-famous Toolformer paper demonstrated!

They showed that today's advanced LLMs, like the GPT series, can teach themselves how to properly call and use external tools (Schick et al., 2023).

Isn't that awesome???

So, let's see if we can hack our way into connecting the LLM's response with the functions we want that LLM to use.

task_description = "Create a folder called 'lucas-the-agent-master'. Inside that folder, create a file called 'the-10-master-rules.md"
output = get_response(f"""Given this task: {task_description}, \n
                            Consider you have access to the following functions:
                            
    def create_directory(directory_name):
        '''Function that creates a directory given a directory name.'''
        subprocess.run(["mkdir", directory_name])
    
    def create_file(file_name):
        '''Function that creates a file given a file name.'''
        subprocess.run(["touch", file_name])
    
    def list_files():
        '''Function that lists all files in the current directory.'''
        subprocess.run(["ls"])
    
    Your output should be the first function to be executed to complete the task containing the necessary arguments.
    The OUTPUT SHOULD ONLY BE THE PYTHON FUNCTION CALL and NOTHING ELSE.
    """)

Markdown(output)

# Output
# create_directory('lucas-the-agent-master')

Hey, look at that: the output is exactly the function call we need! Now all we need is to direct this output to be executed. Let's use Python's built-in exec:

exec(output)
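
exec will run whatever string the model returns, which is risky. As a slightly safer sketch (my own addition, not part of the original flow), we can parse the call with the ast module and dispatch it only through a whitelist of known functions:

import ast

# Whitelist of functions the model is allowed to trigger.
ALLOWED_FUNCTIONS = {
    "create_directory": create_directory,
    "create_file": create_file,
    "list_files": list_files,
}

def run_call_safely(call_string):
    """Parse a single call like "create_directory('x')" and run it only if
    the function name is whitelisted. A sketch, not production-hardened."""
    node = ast.parse(call_string.strip(), mode="eval").body
    if not (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in ALLOWED_FUNCTIONS):
        raise ValueError(f"Refusing to run: {call_string}")
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return ALLOWED_FUNCTIONS[node.func.id](*args, **kwargs)

run_call_safely(output)  # e.g. create_directory('lucas-the-agent-master')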

Cool! Now, how can we actually put it all together so that, given a task, a model can:

  • Plan the task
  • Execute actions to complete the task
  • Know when to call a function

This is actually an interesting problem; let's now try to give the model the ability to execute multiple functions:


task_description = "Create a folder called 'lucas-the-agent-master'. Inside that folder create a file called 'the-10-master-rules.md'."
output = get_response(f"""Given a task that will be fed as input, and consider you have access to the following functions:
                            
    def create_directory(directory_name):
        '''Function that creates a directory given a directory name.'''
        subprocess.run(["mkdir", directory_name])
    
    def create_file(file_name):
        '''Function that creates a file given a file name.'''
        subprocess.run(["touch", file_name])
    
    def list_files():
        '''Function that lists all files in the current directory.'''
        subprocess.run(["ls"])
    
    Your output should be a list of function calls to be executed to complete the task containing the necessary arguments.
    For example:
    
    task: 'create a folder named test-dir'
    output_list: [create_directory('test-dir')]
    
    task: 'create a file named file.txt'
    output_list: [create_file('file.txt')]
    
    task: 'Create a folder named lucas-dir and inside that folder create a file named lucas-file.txt'
    output_list: [create_directory('lucas-dir'), create_file('lucas-dir/lucas-file.txt')]
    
    The OUTPUT SHOULD ONLY BE A PYTHON LIST WITH THE FUNCTION CALLS INSIDE and NOTHING ELSE.
    task: {task_description}
    output_list:\n
    """)

Markdown(output)

# Output
# [create_directory('lucas-the-agent-master'), create_file('lucas-the-agent-master/the-10-master-rules.md')]

We can execute these calls with exec again:

exec(output)

# Checking files exist
# !ls lucas-the-agent-master/
# the-10-master-rules.md
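
The same whitelisting idea extends to a list of calls; here is a hedged sketch reusing the run_call_safely dispatcher from before (requires Python 3.9+ for ast.unparse):

import ast

def run_call_list_safely(list_string):
    """Parse a string like "[f('a'), g('b')]" and run each call through the
    whitelist dispatcher defined earlier. Illustrative sketch only."""
    node = ast.parse(list_string.strip(), mode="eval").body
    if not isinstance(node, ast.List):
        raise ValueError(f"Expected a list of calls, got: {list_string}")
    return [run_call_safely(ast.unparse(call)) for call in node.elts]

run_call_list_safely(output)  # runs the plan through the safe dispatcher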

Limitations of the Naive Approach

At this point, despite our early success, we can start identifying a lot of issues with this approach:

  • The uncertainty of the model's outputs can affect our ability to reliably call the functions
  • We need more structured ways to prepare the inputs of the function calls
  • We need better ways to put everything together (pasting entire function definitions into the prompt is clunky and doesn't scale to more complex cases)

There are many more issues, but starting with these we can now look at frameworks, see how they attempt to fix them, and with that in mind understand what is behind their implementations!

I personally think this is a much better way to understand what is going on behind agents in practice than just using the higher-level frameworks right off the bat!

OpenAI Functions

OK, let's first understand how OpenAI, the company behind ChatGPT, supports these function call implementations in its API.

OpenAI implemented a function calling API, which is a standard way to connect their models to outside tools, like in the very simple example we did above.

According to their official documentation the sequence of steps for function calling is as follows:

  1. Call the model with the user query and a set of functions defined in the functions parameter.
  2. The model can choose to call one or more functions; if so, the content will be a stringified JSON object adhering to your custom schema (note: the model may hallucinate parameters).
  3. Parse the string into JSON in your code, and call your function with the provided arguments if they exist.
  4. Call the model again by appending the function response as a new message, and let the model summarize the results back to the user.

Below is an example taken from their official documentation:

from openai import OpenAI
import json

client = OpenAI()

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": unit})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response
output = run_conversation()
output.choices[0].message.content

# Output
# "The current weather in San Francisco is 72°C, in Tokyo it's 10°C, and in Paris it's 22°C."

OK, neat! What's happening here is that they use a JSON schema to make the calls more reliable by structuring the input/output flow, and they trained the model on this schema so that it is better at calling functions.

Let's look at how one of our three simple functions from before, create_directory, would be implemented using OpenAI's function calling approach:

import json
import subprocess

def create_directory(directory_name):
    """Function that creates a directory given a directory name."""
    subprocess.run(["mkdir", directory_name])
    return json.dumps({"directory_name": directory_name})


tool_create_directory = {
    "type": "function",
    "function": {
        "name": "create_directory",
        "description": "Create a directory given a directory name.",
        "parameters": {
            "type": "object",
            "properties": {
                "directory_name": {
                    "type": "string",
                    "description": "The name of the directory to create.",
                }
            },
            "required": ["directory_name"],
        },
    },
}

tools = [tool_create_directory]


def run_terminal_task():
    messages = [{"role": "user", "content": "Create a folder called 'lucas-the-agent-master'."}]
    tools = [tool_create_directory]  
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "create_directory": create_directory,
        }
        messages.append(response_message)
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                directory_name=function_args.get("directory_name"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-16k",
            messages=messages,
        )
        return second_response

output = run_terminal_task()
output.choices[0].message.content

# Output
# "The folder 'lucas-the-agent-master' has been created successfully."

Great! We implemented OpenAI function calling for creating directories! We could keep evolving this approach (a sketch of adding a second tool follows below), but let's stop here and move on to the last level of these simple agent implementations.
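
As a hint of what that evolution would look like, here is a sketch of how a create_file tool could slot in next to create_directory (the schema below is my own, written by analogy with tool_create_directory):

def create_file(file_name):
    """Function that creates a file given a file name."""
    subprocess.run(["touch", file_name])
    return json.dumps({"file_name": file_name})

# Schema written by analogy with tool_create_directory above.
tool_create_file = {
    "type": "function",
    "function": {
        "name": "create_file",
        "description": "Create a file given a file name.",
        "parameters": {
            "type": "object",
            "properties": {
                "file_name": {
                    "type": "string",
                    "description": "The name of the file to create.",
                }
            },
            "required": ["file_name"],
        },
    },
}

# Both tools would then go to the model and into the dispatch table:
# tools = [tool_create_directory, tool_create_file]
# available_functions = {"create_directory": create_directory,
#                        "create_file": create_file}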

See more info on these examples in OpenAI's official cookbook.

Implementing Agents with LangChain

OK, this is awesome: we can now use OpenAI functions connected to the ChatGPT API endpoint to let the model perform actions in the real world, and calling those functions is made more reliable by the JSON schema that structures how the model calls them.

This is a great setup, but for more complex tasks we want more control over the process these agents go through, in order to make them as reliable as possible. We want to control things like:

  • What goes in and out of prompts
  • What goes in and out of each thought and action pair stage the agent goes through when doing something in the real-world
  • A convenient interface to compose agents with useful building blocks (also leverage open source LLMs if we want to)

For scenarios like these, where just connecting a model to some tools won't cut it, a really great framework that has had a lot of success recently is LangChain.

LangChain is a framework for building LLM-powered applications that gives developers the ability to compose building blocks like prompts, models, and tools into complex and interesting applications.

To get an overview of the capabilities of LangChain, let's take a look below at a LangChain implementation of our simple agent that can create directories:

import subprocess
import json

from langchain.tools import tool
from langchain.chat_models import ChatOpenAI
from langchain.agents import AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

@tool
def create_directory(directory_name):
    """Function that creates a directory given a directory name."""
    subprocess.run(["mkdir", directory_name])
    return json.dumps({"directory_name": directory_name})


tools = [create_directory]

llm_chat = ChatOpenAI(temperature=0)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a very powerful assistant that helps users perform tasks in the terminal."),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

llm_with_tools = llm_chat.bind(functions=[format_tool_to_openai_function(t) for t in tools])

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

action_input = "Create a folder called 'lucas-the-agent-master'."

agent_executor.invoke({"input": action_input})

# Output

# > Entering new AgentExecutor chain...

# Invoking: `create_directory` with `{'directory_name': 'lucas-the-agent-master'}`


# {"directory_name": "lucas-the-agent-master"}Folder 'lucas-the-agent-master' has been created successfully.

# > Finished chain.

# {'input': "Create a folder called 'lucas-the-agent-master'.",
#  'output': "Folder 'lucas-the-agent-master' has been created successfully."}

Let's break down what is happening here:

  • Define the custom tool function:
@tool
def create_directory(directory_name):
    """Function that creates a directory given a directory name."""
    subprocess.run(["mkdir", directory_name])
    return json.dumps({"directory_name": directory_name})
  • Create a list of tools:
tools = [create_directory]
  • Initialize the chat model:
llm_chat = ChatOpenAI(temperature=0)
  • Create a chat prompt template:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are very powerful assistant that helps users perform tasks in the terminal."),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
  • Bind the chat model with the tools:
llm_with_tools = llm_chat.bind(functions=[format_tool_to_openai_function(t) for t in tools])
  • Define the agent:
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(x["intermediate_steps"]),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)
  • Create an instance of AgentExecutor:
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
  • Invoke the agent with an input:
action_input = "Create a folder called 'lucas-the-agent-master'."
agent_executor.invoke({"input": action_input})

In this setup, intermediate_steps and agent_scratchpad are used to represent the history of the agent's thought process and the intermediate results generated during that process.

By including the intermediate_steps and agent_scratchpad in the prompt template and passing them to the agent, the setup allows the agent to have access to its previous actions and tool outputs, enabling it to make informed decisions and generate more contextually relevant responses.
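
To make that a bit more tangible, here is a made-up illustration of what intermediate_steps can hold after one tool call (the values are invented, and this assumes the AgentAction type from langchain.schema):

from langchain.schema import AgentAction

# Purely illustrative: one (AgentAction, observation) tuple per tool call.
example_intermediate_steps = [
    (
        AgentAction(
            tool="create_directory",
            tool_input={"directory_name": "lucas-the-agent-master"},
            log="Invoking create_directory with 'lucas-the-agent-master'",
        ),
        '{"directory_name": "lucas-the-agent-master"}',  # the tool's output
    )
]

# On each iteration, the agent pipeline maps these tuples through
# format_to_openai_function_messages to fill the agent_scratchpad placeholder.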

Don't worry about understanding all the syntax; at a higher level, here is a simple breakdown of the components at play:

  • @tool: a decorator that transforms a regular function into a usable tool for a LangChain-based agent.
  • ChatOpenAI: LangChain's implementation of the OpenAI API for chat models (like gpt-3.5-turbo-1106).
  • ChatPromptTemplate: allows you to programmatically abstract over the chat prompt for the chat model.
  • format_tool_to_openai_function: LangChain's method to convert a tool into one formatted according to OpenAI's function schema.
  • format_to_openai_function_messages: LangChain's method to format intermediate steps into messages in the OpenAI function calling format.
  • OpenAIFunctionsAgentOutputParser: LangChain's parser for the outputs of an OpenAI-functions-style model.
  • agent = … | … | …: LangChain's Expression Language (LCEL) interface, which lets you build complex agents through a simple interface that uses the Unix pipe symbol | to compose building blocks like models and prompts.
  • AgentExecutor: the LangChain runtime for an agent, responsible for executing the agent's decision-making process, invoking tools, and generating responses.

When we use an agent we want that agent to have access to its previous outputs and decision-making process in order for it to make more informed decisions. Therefore, LangChain allows you to set all that up yourself so you have ultimate control over what is going on.

When to Use What Framework?

We are at a very early stage of these technologies, so it is hard to know when to use OpenAI functions, LangChain, or any other framework from this ever-growing LLM tooling space.

A simple and effective approach is to go beneath the frameworks and code a silly toy version yourself first, to see exactly what challenge you are trying to solve; that should give you a clear indication of which framework best solves it.

Source code here (it is slightly different because I made a few improvements for the article).

If you prefer a video format, here is my YouTube video on this topic:

If you liked this post, subscribe to my YouTube channel and my newsletter. Thanks and see you next time! Cheers! :)

References

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.

Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761.