AI Agents with Gemini and Google Colab

Creating a simple AI agent using Gemini and Google Colab.

So far, I have managed to keep myself away from the whole AI hype that is currently going on, because I thought most of it was really smoke and mirrors.

However, I was recently involved in a conversation with some colleagues and to my surprise I was utterly lost!

I am not a star engineer by any means, but I do like to keep up with the latest trends in technology, so naturally I tend to have a basic understanding of what is happening.

But that was not the case this time!

They were talking about M&Ms, llamas, alpacas, secret agents, hugging faces, rug pulls... I’d swear I even heard references to Jean Claude Van Damme for a moment.

It became clear very quickly that I was left behind. Hype or no hype, I knew I had to keep up with at least the basics.

So, long story short, I decided to start learning about AI agents, since that was the main focus in our chat.

Large Language Models (LLMs)

Before we start with the agents, let’s explain what an LLM is.

Given that I only just started with AI, I will avoid giving technical definitions out of fear they might be entirely wrong.

That being said, an LLM to me is basically an artificial brain. It’s a pre-trained piece of software that accepts text as input and attempts to autocomplete it, based on the large amounts of data that it encountered during its training phase.

This gives the impression that it thinks and then replies with an output. Don’t quote me on that though.

AI Agents

An AI agent is a program that has the ability to interact with the API of an LLM and essentially feed it questions and parse its answers.

The interesting thing though is that it is a wrapper around the LLM, which allows it to inject additional functionality that the LLM itself doesn’t provide.

For example, given that an LLM is an isolated, pre-trained program, there is no way for it to know today’s weather, unless someone else provides it.

And this is precisely what the agent is doing. Whenever a question about the weather arises, the agent can reach out to the outside world, get the information and inject it back into the LLM’s context.

Combining agents with LLMs makes the overall AI solution appear as if it can:

  • Listen
  • Think
  • Use tools
  • Answer
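To make the listen, think, use tools, answer cycle a bit more concrete, here is a toy sketch in plain Python. Nothing in it talks to a real LLM: fake_llm, get_weather and run_agent are hypothetical names, the city is hardcoded, and the "model" is just a couple of if statements standing in for the real thing.

```python
from typing import Callable, Dict, Optional


def get_weather(city: str) -> str:
    """Hypothetical tool: a real agent would call a weather API here."""
    return f"Sunny in {city}"


# Tools the agent is allowed to use, looked up by name.
TOOLS: Dict[str, Callable[[str], str]] = {"get_weather": get_weather}


def fake_llm(question: str, tool_result: Optional[str] = None) -> dict:
    """Stand-in for an LLM: it either asks for a tool or gives an answer."""
    if tool_result is not None:
        # The tool output is now part of the context, so answer with it.
        return {"answer": f"Here is what I found: {tool_result}"}
    if "weather" in question.lower():
        # The "model" cannot know the weather, so it requests a tool call.
        # The city is hardcoded; a real model would extract it from the text.
        return {"tool": "get_weather", "argument": "Athens"}
    return {"answer": "I can answer that without any tools."}


def run_agent(question: str) -> str:
    """Listen, think, use a tool if requested, then answer."""
    reply = fake_llm(question)  # think
    if "tool" in reply:
        result = TOOLS[reply["tool"]](reply["argument"])  # use tools
        reply = fake_llm(question, tool_result=result)  # think again
    return reply["answer"]  # answer


print(run_agent("What is the weather like today?"))
```

The important bit is the shape of the loop: the agent, not the model, is the one that actually runs the tool and puts its result back into the conversation.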

Our first agent

In this example, we will create an AI agent that enhances an LLM with the ability to query live information about cryptocurrencies, such as prices and market caps.

We will use the gemini-2.0-flash LLM, but I believe you can choose any Gemini model that supports function calling.

The best thing is that we can run our code in Google Colab as an interactive Python notebook!

So go ahead and create a new file in your Google Colab account.

Install required libraries

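The first library we need to install is the Python SDK for the Gemini API, contained in the google-genai package. This is the package that provides the google.genai module imported later on; the older google-generativeai package exposes a different module and will not work with the snippets in this post.

Additionally, we will need to install the requests library in order to be able to make HTTP requests to the CoinMarketCap API.

!pip install -q -U google-genai requests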

If you are not using Google Colab, you will probably want to create a virtual environment to install these packages.

Load the secret API keys

Now, we need to load the API keys for the Gemini and the CoinMarketCap APIs.

You can generate your own keys for Gemini in Google AI Studio and for CoinMarketCap in its developer portal.

Once you have your keys, add them in your secrets, on the left side panel of your Google Colab project. Make sure the names match the snippet below.

from google.colab import userdata


GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")
COINMARKETCAP_API_KEY = userdata.get("COINMARKETCAP_API_KEY")
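If you are running outside Google Colab, the google.colab module will not be available. A minimal sketch of a fallback, assuming you exported the keys as environment variables with the same names, could look like this. The load_key helper is a hypothetical name of mine and not part of any SDK.

```python
import os


def load_key(name: str) -> str:
    """Return a secret from Colab if available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Assumption: outside Colab the key was exported as an environment variable.
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

With this in place, the two assignments become GEMINI_API_KEY = load_key("GEMINI_API_KEY") and COINMARKETCAP_API_KEY = load_key("COINMARKETCAP_API_KEY").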

If you don’t want to register an account with CoinMarketCap just for testing it, you can simply mock the request later on.
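One way to do that is a stand-in tool that returns canned data instead of calling CoinMarketCap. The sketch below is an assumption on my side: get_currencies_mock and SAMPLE_CURRENCIES are hypothetical names, the figures are copied from the sample output further down in this post (so they are not live data), and it returns plain dictionaries rather than the objects used by the real tool in the next section.

```python
from typing import Dict, List

# Figures copied from the sample output later in this post; not live data.
SAMPLE_CURRENCIES: Dict[str, dict] = {
    "bitcoin": {
        "name": "Bitcoin",
        "price": 90877.53,
        "total_supply": 19832921.0,
        "market_cap": 1802366934215.69,
    },
    "ethereum": {
        "name": "Ethereum",
        "price": 2283.39,
        "total_supply": 120594948.93,
        "market_cap": 275365843884.19,
    },
}


def get_currencies_mock(currency_names: List[str]) -> List[dict]:
    """Given a list of crypto currency names, it returns canned details
    (price, total supply, market cap) without calling any external API.

    Args:
        currency_names: A list of crypto currency names to retrieve.

    Returns:
        A list of dictionaries, one per known currency.
    """
    # Unknown names are skipped instead of raising an error.
    return [
        SAMPLE_CURRENCIES[name.lower()]
        for name in currency_names
        if name.lower() in SAMPLE_CURRENCIES
    ]
```

I kept the docstring and type annotations in the same spirit as the real tool, since they matter for the agent. I have not verified that the SDK accepts this exact shape as a tool, so treat it as a starting point.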

Create a tool function

Tools are essentially functions that can be used by the agent in order to enhance the LLM’s capabilities.

In this case, we can create a function that, given a list of cryptocurrency names, makes a request to the CoinMarketCap API and returns a list of currency objects.

from typing import List

from pydantic import BaseModel
import requests


COINMARKETCAP_API_URL = "https://pro-api.coinmarketcap.com/v2/cryptocurrency/quotes/latest"


class CryptoCurrency(BaseModel):
    name: str
    price: float
    total_supply: float
    market_cap: float


def get_currencies(currency_names: List[str]) -> List[CryptoCurrency]:
    """Given a list of crypto currency names, it makes a request to the
    CoinMarketCap API and returns a list of currency objects with their
    latest details, such as prices.

    Args:
        currency_names: A list of crypto currency names to retrieve.

    Returns:
        A list of CryptoCurrency objects with their latest details.
    """
    # The "slug" parameter takes a comma-separated list, e.g. "bitcoin,ethereum".
    params = {"slug": ",".join(currency_names)}
    headers = {
        "Accept": "application/json",
        "X-CMC_PRO_API_KEY": COINMARKETCAP_API_KEY,
    }

    crypto_currencies = []
    try:
        response = requests.get(
            COINMARKETCAP_API_URL, params=params, headers=headers, timeout=10
        )
        # Fail early on HTTP errors such as an invalid API key.
        response.raise_for_status()
        data = response.json()

        # The payload is keyed by currency id; only the values are needed.
        for currency_data in data["data"].values():
            usd_quote = currency_data["quote"]["USD"]
            crypto_currencies.append(
                CryptoCurrency(
                    name=currency_data["name"],
                    price=usd_quote["price"],
                    total_supply=currency_data["total_supply"],
                    market_cap=usd_quote["market_cap"],
                )
            )

    except Exception as exc:
        # Best effort: report the problem and return whatever was collected.
        print(exc)

    return crypto_currencies

You can read more about function calling in the Gemini documentation, but note that the docstrings and the type annotations play a very significant role in the behavior of the agent.
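To get a feeling for why they matter, here is a rough sketch of how a program can read a function's name, docstring and type annotations at runtime using only the standard library. This is an illustration of the general idea and not the SDK's actual implementation; describe_function and get_prices are hypothetical names.

```python
import inspect
from typing import List, get_type_hints


def describe_function(func) -> dict:
    """Collect the name, docstring and parameter types of a function."""
    hints = get_type_hints(func)
    parameters = {
        name: str(hints.get(name, "unknown"))
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": parameters,
    }


def get_prices(currency_names: List[str]) -> List[float]:
    """Return the latest price for each of the given currency names."""
    return [0.0 for _ in currency_names]  # placeholder body, no real lookup


print(describe_function(get_prices))
```

A description like this is all the LLM gets to see about a tool, which is why a vague docstring or a missing annotation can easily confuse it.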

Instantiate the agent

Now, we can instantiate the agent by choosing the LLM and providing the add-on tool that we created.

from google import genai
from google.genai.types import GenerateContentConfig


client = genai.Client(api_key=GEMINI_API_KEY)
chat = client.chats.create(
    model="gemini-2.0-flash",
    config=GenerateContentConfig(
        tools=[
            get_currencies,
        ]
    )
)

Interact with the agent

Finally, we can interact with the agent by sending messages.

Let’s ask it to provide today’s summary for Ethereum and Bitcoin.

text = "Give me a summary of the latest stats for ethereum and bitcoin today"
response = chat.send_message(text)
print(response.text)

Here's a summary of the latest stats for Bitcoin and Ethereum:

Bitcoin:

  Price: 90877.53
  MarketCap: 1,802,366,934,215.69
  Total Supply: 19,832,921

Ethereum:

  Price: 2283.39
  MarketCap: 275,365,843,884.19
  Total Supply: 120,594,948.93

Now let’s ask it whether it can convert these prices to USD.

text = "Can you convert the prices in USD?"
response = chat.send_message(text)
print(response.text)

The prices I provided for Bitcoin and Ethereum are already in United States Dollars (USD). There is no conversion needed.

Clever enough, right?

Where is the magic?

Overall, there is nothing fancy here. The code is rather boring and straightforward.

However, one might wonder, how was the program able to understand that it should:

  • extract the bitcoin and ethereum strings from the text
  • feed these to the get_currencies function
  • use the HTTP response to generate a reply to the initial question

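Well, the magic appears to happen in the docstrings and type annotations of the function.

As far as I can tell, when we pass get_currencies as a tool, the SDK turns its name, docstring and type annotations into a function declaration that is sent to the LLM along with our message. This way the model knows what the tool does and which arguments it expects.

As a result, when a question about cryptocurrencies arises, the LLM replies with a request to call the get_currencies function, using arguments it extracted from the text. The SDK then runs the function and feeds the result back, and the LLM uses it to write the final reply.

And that is true magic there, as it showcases the remarkable ability of LLMs to understand the relationships between words and phrases and create content.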

Conclusion

I am pretty sure I haven’t even touched the surface yet, but at least I now have a slightly better understanding of LLMs and AI agents in general.

As I mentioned earlier, the easiest way to experiment is to run these snippets in Google Colab.

Also, don’t forget to check out Google’s Gemini documentation as it has a lot of interesting treats in there!

This post is licensed under CC BY 4.0 by the author.