Travel w/ SwAG - Hackathon

Travel w/ SwAG - Berkeley LLM Agents MOOC Hackathon

Recently, three friends and I participated in the LLM Agents Hackathon hosted by Berkeley RDI. The goal of the hackathon was to create innovative work using LLM agents that fit into one of the following tracks: applications, benchmarks, fundamentals, safety and decentralized. After some deliberation, we chose the ‘Applications Track’. Its judging criteria were: “Strong submissions will demonstrate novel use cases addressing real-world problems, with seamless integration of the LLM agent into the target domain and intuitive UI/UX. Projects should display strong potential for impact and widespread adoption.”

With this in mind, each team member took a week to research their preferred domain (health, lifestyle or tourism) before regrouping and discussing ideas. Ultimately, we decided to focus on tourism, with the goal of developing a tool that assists users both before and during their travels. We felt there was a gap in the market for genuinely useful AI-based travel tools. In my experience, most of the “top AI travel tools” on Google are primarily used for recommending holiday activities and landmarks, and in doing so become vessels for adverts, which makes them impractical and hard to customize. Furthermore, these tools struggle to adjust a plan mid-way through a trip, forcing the user to create a new itinerary for even minor changes. We therefore wanted to build a tool that is useful at the planning stage but also during the trip, with the ability to adjust plans, act as a tour guide and be bespoke to the user. We wanted to build something that people would be excited to interact with on the go.

It was therefore important that the model could interact with the world and its surroundings in a similar manner to humans, which multimodal models enable by accepting both text and images as input. With that in mind, we built a tool called ‘Travel w/ SwAG’ (Software Agents doing Great things), a mobile-first website with two features: the Everywhere TourGuide and the SwAG Assistant. The Everywhere TourGuide is vision-based while the SwAG Assistant is text-based, and each has access to a different set of tools for various tasks.

This YouTube video is a ~20-minute overview of the tool, including demos of each feature.

SwAG Assistant

The SwAG Assistant is a text-based assistant with access to many tools, including but not limited to: route optimization, geolocation and web search. If you say “I want to plan a trip to Marseille, Mont Blanc & Barcelona, what’s the total road trip distance?”, the model will use the available tools to plan the optimal route between these places, as well as suggest an itinerary. You can, of course, ask follow-up queries. For example, you can ask for the best Tajine restaurants in Marseille, or the most scenic route between Mont Blanc and Barcelona, and the model will adjust the itinerary accordingly.

To implement this, we created an Assistant class that makes repeated calls to the Anthropic API, with each request conforming to their “tool use” schema. The Anthropic documentation on tool use is very detailed and is 100% worth the read. One point that helped us was the information on how to format the tool response, specifically the is_error parameter, which helped the model recover when a tool could not be executed or the input parameters were incorrect.
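
To make that concrete, here is a minimal sketch of such a loop (not our exact Assistant class): Claude is called with the tool definitions, any tool_use blocks in its reply are executed, and the results (or failures, flagged with is_error) are sent back as tool_result blocks. The execute_tool dispatcher is a hypothetical stand-in for the project's tool classes.

import anthropic

client = anthropic.Anthropic()

def run_assistant(messages: list[dict], tools: list[dict]) -> str:
    """Call Claude in a loop, executing tools until it produces a final answer."""
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # No more tool calls - return the model's text answer.
            return "".join(b.text for b in response.content if b.type == "text")

        # Echo the assistant turn, then answer each tool_use block with a tool_result.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                output = execute_tool(block.name, block.input)  # hypothetical dispatcher
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": output}
                )
            except Exception as exc:
                # is_error lets the model see the failure and retry with better input.
                results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(exc),
                        "is_error": True,
                    }
                )
        messages.append({"role": "user", "content": results})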

As always, well-defined tools (and well-defined parameters to those tools) with detailed descriptions help the model make decisions accurately. I’ll go into more detail on tools and how to implement them later in this blog post. The set of tools available to this assistant is:

  • SearchInternet
  • ReadWebsite
  • Geocode
  • GetDistanceMatrix
  • OptimizeRoute

Other than OptimizeRoute, these are self-explanatory. OptimizeRoute takes the distance matrix produced by GetDistanceMatrix and builds the route greedily: starting from the current location, it repeatedly adds the closest remaining point until all points are included.
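
A minimal sketch of that nearest-neighbour heuristic (assuming a plain list-of-lists distance matrix; the real tool wraps this behind a Pydantic class) looks something like this:

def optimize_route(distances: list[list[float]], start: int = 0) -> list[int]:
    """Greedy nearest-neighbour ordering over a distance matrix."""
    n = len(distances)
    route = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = route[-1]
        # Append whichever unvisited point is closest to the current one.
        nearest = min(unvisited, key=lambda j: distances[current][j])
        route.append(nearest)
        unvisited.remove(nearest)
    return route

# e.g. optimize_route(matrix) might return [0, 2, 1, 3] for a four-stop trip.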

In my experience, older people (and at times even younger people) tend to be fairly tech-illiterate, so navigating Google Maps for complicated journeys is a fairly daunting task. The SwAG Assistant is beneficial because requests can be expressed in natural language, and the LLM determines how to call the available tools to obtain the required information.

Check out the demo of the SwAG assistant here!

Everywhere TourGuide

The Everywhere TourGuide is a multimodal assistant that can interpret the user’s surroundings, research relevant information and provide insight about it.

We think that it has potential to assist in these use-cases:

  • (1) Everyday Planning
    • When does this restaurant open?
    • What are the reviews like?
  • (2) Information on Artistic Objects
    • What is this statue?
    • Tell me more about this building?
    • What’s the significance of the horse in this painting?

Being able to ask these questions without having to type them into a search engine is great, and having an infinitely patient tour guide with the internet as its knowledge base is even better.

A unique aspect of this assistant was the integration of a segmentation model. Specifically, we used SAMv2 to let users click on an object within the image; the model then essentially plays ‘spot-the-difference’ between the base and segmented images to figure out which part of the original image the user is asking about.
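
For reference, point-prompted segmentation with SAM 2 looks roughly like this (a sketch based on the sam2 package’s image predictor; the checkpoint name, file name and click coordinates are placeholders):

import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load the image and the predictor (checkpoint name is illustrative).
image = np.array(Image.open("painting.jpg").convert("RGB"))
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(image)

# A single foreground click (label 1) at the pixel the user tapped.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[420, 310]]),
    point_labels=np.array([1]),
)
best_mask = masks[np.argmax(scores)]  # boolean mask of the selected object

The resulting mask is what gets overlaid on the original image as the translucent blue shape described below.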

Uploading two images to the model is not optimal; however, we found that passing only the segmented version (i.e. with a translucent blue shape on top of the selected object) often confused the model. To illustrate this, we can use the Salvador Dalí painting “Soft Monster in Angelic Landscape” (Fig. 1). Here, the user has clicked on the Monster itself, but when this image alone was uploaded to the model (Claude 3.5 Sonnet), it was consistently thrown off, even when specifically told that the blue colouring wasn’t part of the original image. It would make internet queries like “salvador dalí painting blue blob vatican city”. Therefore, we decided to send a reference base image along with the segmented image, which greatly improved performance.
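
Concretely, the request ends up as a single user message containing both images plus the question, roughly as sketched below (encode_image is a hypothetical helper that base64-encodes a file, and the file names are placeholders):

import base64

def encode_image(path: str) -> str:
    """Hypothetical helper: read an image file and base64-encode it."""
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "The second image highlights the object the user clicked on in blue."},
        {
            "type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": encode_image("base.jpg")},
        },
        {
            "type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": encode_image("segmented.jpg")},
        },
        {"type": "text", "text": "What is the highlighted object, and what is its significance?"},
    ],
}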

Soft Monster in Angelic Landscape

Figure 1 - A click was made at the green star (on the 'Monster').

You can see the working version here in the demo.

Overall, I found this feature really fun to build, but also useful - I even used it when I was on holiday in Singapore over the Christmas period to find out about the context of certain artworks in the National Gallery Singapore.

Tool Use

For tool use, we used Pydantic. I have been inspired by Jason Liu’s talk on Pydantic (found here) since 2023. In it, he suggests that Pydantic is ‘all you need’ for building LLM applications, and I have found that to be true. I built gpt-cotts - a wrapper for multiple LLMs with notes access - using a backbone of Pydantic.

Each tool is defined with a Pydantic class, allowing it to be easily converted into a Claude tool definition. For example, our SearchForNearbyPlacesOfType tool looks like this:

from typing import Self

from pydantic import BaseModel, Field, model_validator

import config as cfg  # assumed import: project config exposing POSSIBLE_PLACE_TYPES


class SearchForNearbyPlacesOfType(BaseModel):
    """Search for information on nearby places of a certain type. The range is very small such that the places are guaranteed to be close to the user's location. The response from this tool is a list of JSON objects containing the id, name, rating of the place and a list of photos (if requested)."""

    types: list[str] = Field(
        description="The types of place to search for.", max_length=100
    )
    include_photos: bool = Field(
        description="Whether to include photos in the response.", default=False
    )
    lat: float = Field(description="The latitude of the user's location.")
    lon: float = Field(description="The longitude of the user's location.")

    @model_validator(mode="after")
    def validate_types(self) -> Self:
        # Reject unknown place types with a message Claude can act on.
        for t in self.types:
            if t not in cfg.POSSIBLE_PLACE_TYPES:
                error = f"Invalid place type: {t}. Please choose from the following: {cfg.POSSIBLE_PLACE_TYPES}"
                raise ValueError(error)

        return self

The Fields give the LLM a description of how to work with the tool, and the model_validator ensures that the tool is not even instantiated if the input to the tool is incorrect. Ensuring that the descriptions of each of the parameters and the error messages are detailed means that Claude knows how to act and what tool to use. I think this would assist in tool retrieval later in the project, when the number of available tools increases.
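
For completeness, here is a rough sketch of how a Pydantic class like this can be turned into a Claude tool definition; the helper name is ours for illustration, and a production version might want to strip Pydantic-specific keys from the generated schema:

from pydantic import BaseModel

def to_claude_tool(model_cls: type[BaseModel]) -> dict:
    """Convert a Pydantic tool class into the dict shape Claude's tools parameter expects."""
    return {
        "name": model_cls.__name__,
        "description": (model_cls.__doc__ or "").strip(),
        "input_schema": model_cls.model_json_schema(),
    }

tools = [to_claude_tool(SearchForNearbyPlacesOfType)]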

Prompting

Again, inspired by Jason Liu, all of our prompts are stored as Pydantic models. I find this style of prompting much easier to maintain and expand with new prompts over time. We did not use any specific prompting techniques here, other than a highly structured prompt using XML tags (as per the Claude documentation).
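
As a rough illustration of the pattern (our actual prompt models differ), a prompt can be a Pydantic model whose fields hold the variable parts and whose render method emits the XML-structured string:

from pydantic import BaseModel


class TourGuideSystemPrompt(BaseModel):
    """System prompt for the Everywhere TourGuide, rendered as XML-tagged sections."""

    city: str
    language: str = "English"

    def render(self) -> str:
        return (
            "<role>You are a knowledgeable, patient tour guide.</role>\n"
            f"<location>The user is currently in {self.city}.</location>\n"
            f"<language>Reply in {self.language}.</language>"
        )


system_prompt = TourGuideSystemPrompt(city="Singapore").render()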

Given the recent success of DeepSeek R1, a ‘thinking’ block would have been cool to introduce, but honestly, Claude handles a lot of the ‘reasoning’ and tool selection well on its own.

Future Work

Unfortunately, since all of us were in full-time work, we didn’t manage to put as many hours into this as we wanted to; for me, it was ~40 hours over the two months (including research, “marketing” and team meetings). However, I believe this project has a lot to give, especially considering that Google released their own version, Project Astra, about two weeks before the submission. Although Project Astra is more of a universal AI assistant, its first demo was almost identical to the goal of Travel w/ SwAG. It was cool to see other people implementing this idea, and it validated that it was, in fact, a good one!

We will, of course, continue to develop Travel w/ SwAG, turning it into a mobile app and eventually utilising on-device models to assist users travelling to remote places. Keep an eye on our GitHub for future updates! Our first order of business is to host it properly, rather than via an ngrok tunnel to my laptop.

Thanks to my team members Hamza, Diwakaran & Sid for helping out!

If you liked the blog, the linked video has a demo ‘on the streets’ for your enjoyment.

Also, thanks to Luiza Foster for proof-reading and editing this blog!