100 Days of Gen AI: Day 21

I started work on a slackbot for Benchling where a user can tag @help-terraform-cloud and start a conversation with a slackbot trained on all our Benchling-specific guidance for how to use Terraform Cloud.

The ideal approach seems like a Retrieval-Augemented Generation (RAG) system. I don’t know much about this type of system so I started my learning with this AWS post: What is RAG (Retrieval-Augmented Generation)?

The essential insight of RAG is that when working with a large corpus of information, rather than stuff it all in the prompt with every request (which likely exceeds the LLM context window, not to mention would be a horribly inefficient use of tokens), to instead split it into two steps:

  1. Search for relevant content
  2. Generate a response using returned content

The Search step allows you to fetch the relevant text, then insert this into the prompt to generate a response based on only this subset of information returned by the search.

In order to build a searchable database of text, the strategy is to use an “embedding language model”, which converts your text into numerical representations which are then stored in a vector database. From here they can be retrieved via a search.

I decided to use Amazon Bedrock for this project, per guidance from our Security team. I’ve only used OpenAI before, so this is a new learning opportunity for me.

Amazon Bedrock has native support for what they call a “knowledge base” and is designed to implement a RAG system, handling much of this complexity for you behind the scenes. Following their guided setup is quite good.

I ran into some issues creating the data source because our AWS account didn’t have any embedding models enabled, which is necessary to create the data source and seed the vector database. Once I got approval to enable a text embedding model, I was all set.

I ended up using the Titan Text Embeddings v2 model from Amazon for the embeddings, and Claude 3.5 Sonnet for the response generation.

Testing out the first prototype from the AWS sandbox worked perfectly.

Next steps: