As generative AI continues to evolve, one of its key limitations remains hallucination: producing inaccurate or nonsensical outputs, especially when the model lacks relevant context. This is where Retrieval-Augmented Generation (RAG) comes in. RAG combines a large language model (LLM) with a knowledge retriever, providing context for queries and significantly improving accuracy. In this blog, we’ll explore how to build a RAG system for image descriptions using Supabase Vector and Azure OpenAI.
What is RAG?
When we run a query against an LLM, it processes the query and responds based on the data it was trained on. But what if the query asks about something the model has never seen? That’s where RAG comes in. Retrieval-Augmented Generation (RAG) lets us supply the missing context so the LLM can respond more accurately: the query is first sent to a knowledge retriever that fetches the relevant information, that context is added to the query, and the augmented query is then sent to the LLM.
So how exactly do we build the knowledge retriever? By storing our data as vector embeddings. Vector embeddings are numerical representations of data, such as text or images, in a multi-dimensional space. Similar items end up close together, which is what makes semantic search possible. For example, images with similar descriptions have embeddings that sit near one another, so a text query can find them by meaning rather than by exact keywords.
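To make the idea of “closeness” concrete, here is a minimal sketch (not part of the project code) that compares toy embedding vectors with cosine similarity; the vectors are made up purely for illustration, and real embeddings have hundreds or thousands of dimensions.
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point in the same direction; near 0 means unrelated
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog_photo   = [0.9, 0.1, 0.0]
puppy_photo = [0.8, 0.2, 0.1]
car_photo   = [0.0, 0.1, 0.9]

print(cosine_similarity(dog_photo, puppy_photo))  # high: semantically similar
print(cosine_similarity(dog_photo, car_photo))    # low: unrelated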
Let’s Build an Image-Based RAG
1. Uploading and Storing the Image
We begin by uploading an image and storing it in Supabase Storage. Using the following code, we handle the image upload and return a publicly accessible URL.
from io import BytesIO

def upload_image(image, name, img_type):
    # Serialize the PIL image to bytes and upload it to the "images" bucket
    temp = BytesIO()
    image.save(temp, format='PNG')
    res = supabase.storage.from_("images").upload(file=temp.getvalue(), path=f"{name}", file_options={"content-type": img_type})
    # Return a public URL so the model (and users) can access the image
    url = supabase.storage.from_('images').get_public_url(name)
    return url
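The snippets in this post also assume some one-time setup that isn’t shown: a Supabase client for Storage, a vecs client (vx) for Supabase Vector, and the Azure OpenAI endpoint and key. Here is a minimal sketch of what that setup might look like; the environment variable names are my own assumptions, so match them to your project, and note that in Azure OpenAI the chat and embeddings deployments have different URLs even though the snippets reuse the name ENDPOINT.
import os
import requests  # used by the Azure OpenAI calls below
import vecs
from supabase import create_client

# Supabase client for Storage (env var names assumed)
supabase = create_client(os.getenv("SUPABASE_URL"), os.getenv("SUPABASE_KEY"))

# vecs client for Supabase Vector, which connects directly to the Postgres database
vx = vecs.create_client(os.getenv("SUPABASE_DB_CONNECTION"))

# Azure OpenAI deployment URL used by the requests.post calls
ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")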
2. Generating an Image Description
Next, we use Azure OpenAI’s multimodal LLM to generate a description for the image. I’m using Azure OpenAI here, but you can use any multimodal LLM.
Here’s the code for generating that description:
def generate_image_description(url):
    headers = {"Content-Type": "application/json", "api-key": os.getenv("AZURE_OPENAI_API_KEY")}
    payload = {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [{"type": "text", "text": "Generate a description for this image:"},
                                     {"type": "image_url", "image_url": {"url": url}}]},
    ]}
    response = requests.post(ENDPOINT, headers=headers, json=payload)
    return response.json()["choices"][0]["message"]["content"]
This provides a meaningful textual description of the image, which we use in the next step.
3. Creating Embeddings
With the description ready, we convert it into vector embeddings using Azure OpenAI. These embeddings are crucial for enabling search functionality:
def generate_embeddings(description):
    headers = {"Content-Type": "application/json", "api-key": os.getenv("AZURE_OPENAI_API_KEY")}
    payload = {"input": description, "model": "text-embedding-3-small"}
    response = requests.post(ENDPOINT, headers=headers, json=payload)
    return response.json()["data"][0]["embedding"]
This code transforms the image description into a vector embedding, making it searchable.
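As a quick sanity check (assuming the setup above), you can inspect the embedding that comes back; text-embedding-3-small returns 1536-dimensional vectors by default, which is why the collection in the next step is created with dimension=1536. The query text here is just an example.
emb = generate_embeddings("a black cat sleeping on a red sofa")
print(len(emb))  # expected: 1536 for text-embedding-3-small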
4. Saving the Embedding
We store these embeddings in Supabase Vector, allowing us to efficiently search through them later.
To use vectors, you first need to enable the extension in Supabase. Go to your Supabase dashboard, select your database, click Extensions, then search for and enable the vector extension (you can also enable it by running "create extension if not exists vector;" in the SQL editor).
def save_embedding(name, embedding, url):
    embeddings = vx.get_or_create_collection(name="docs", dimension=1536)
    vectors = [(name, embedding, {"url": url})]
    embeddings.upsert(vectors)
Putting it all together, the upload workflow is: upload the image to Storage, generate a description with the multimodal model, turn the description into an embedding, and save the embedding (with the image URL as metadata) to Supabase Vector.
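A minimal end-to-end sketch of that workflow, reusing the functions above (the file name and content type are just example values):
from PIL import Image

img = Image.open("cat.png")  # example image path
url = upload_image(img, "cat.png", "image/png")
description = generate_image_description(url)
embedding = generate_embeddings(description)
save_embedding("cat.png", embedding, url)
print("Stored:", url)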
5. Querying the Images
When users query an image, we generate a vector embedding for their query and use it to search through the stored embeddings to find the most relevant match:
def query_images(query):
    embeddings = generate_embeddings(query)
    docs = vx.get_collection(name="docs")
    results = docs.query(embeddings, limit=1, include_metadata=True)
    # With include_metadata=True, each result is an (id, metadata) tuple
    return results[0][1]["url"]
The vector search identifies the most similar image based on its embeddings, and the corresponding URL is returned.
And the query workflow mirrors it: embed the user’s query, search the vector collection for the closest match, and return that image’s URL.
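For example (the query text is just an illustration):
matched_url = query_images("a cat sleeping on a sofa")
print(matched_url)  # public URL of the most similar stored image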
Adding a Simple UI with Streamlit
Now that we’ve built the backend logic for our RAG system using Supabase and Azure OpenAI, it’s time to add a user-friendly interface. I used Streamlit since it lets me quickly build a clean UI with very little Python code, and it has a lot of prebuilt components that we can use directly.
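The full UI lives in the repository linked below; here is only a minimal sketch of what a Streamlit front end for these functions could look like (the widget labels and layout are my own assumptions, not the repo’s exact code).
import streamlit as st
from PIL import Image

# Assumes upload_image, generate_image_description, generate_embeddings,
# save_embedding, and query_images from the sections above are in scope.

st.title("Image RAG with Supabase Vector + Azure OpenAI")

# Upload section: store the image, describe it, embed it, save it
uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if uploaded is not None and st.button("Index image"):
    img = Image.open(uploaded)
    url = upload_image(img, uploaded.name, uploaded.type)
    description = generate_image_description(url)
    save_embedding(uploaded.name, generate_embeddings(description), url)
    st.success("Image indexed!")
    st.image(url, caption=description)

# Query section: embed the text and show the closest stored image
query = st.text_input("Describe the image you're looking for")
if query:
    st.image(query_images(query), caption="Closest match")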
You can find the whole code on my GitHub repository: https://github.com/notnotrachit/Sample-Rag
Conclusion
By using Supabase Vector and Azure OpenAI, we can build a RAG system that efficiently retrieves relevant data (images in our case) and augments the language model’s responses. This approach not only improves the quality of the generated content but also addresses one of the major limitations of LLMs—hallucination.
Git Repo: https://github.com/notnotrachit/Sample-Rag
I also gave a talk on this topic recently at the Supabase LW12 Noida Meetup, so you can also check out the slides for your reference: Slides Link