RAG using Audio as a Context¶

This notebook shows how to use audio files as context for a RAG pipeline. It uses two Indexify extractors:

tensorlake/whisper-asr: This extractor will convert the audio file into text.
tensorlake/minilm-l6: This extractor will convert the text into embeddings.

Setup¶

In [ ]:

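# indexify-extractor-sdk provides the indexify-extractor CLI; the LangChain packages are needed by the retriever cells later in this notebook.
%pip install accelerate ffmpeg indexify indexify-extractor-sdk indexify-langchain langchain-core langchain-openai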

# Download Indexify Server
!curl https://getindexify.ai | sh

# Download Extractors
!indexify-extractor download tensorlake/whisper-asr
!indexify-extractor download tensorlake/minilm-l6

After installing the libraries and downloading the server and the extractors, restart the runtime. Then start the Indexify server and have the extractors join it.

Open 2 terminals and run the following commands:

# Terminal 1
./indexify server -d

# Terminal 2
indexify-extractor join-server
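
Before moving on, it can help to confirm that the server is actually accepting connections. The sketch below is a plain TCP check using only the standard library. It assumes the server listens on `localhost:8900`, which is not stated in this notebook, so adjust the host and port to match your setup. `is_port_open` is a hypothetical helper, not part of Indexify.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: the server listens on localhost:8900.
    print("Indexify server reachable:", is_port_open("localhost", 8900))
```

If this prints `False`, check the output of Terminal 1 before running the remaining cells.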

Create Extraction Graph¶

To design an extraction graph, start from the input data type and the output you want to produce. Here the input is audio files, and the goal is to retrieve relevant passages of the transcribed text through their embeddings.

For that, we are going to create 2 extraction policies:

Audio to Text
Text to Embeddings
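
The two policies form a chain: the second one consumes what the first one produces. The sketch below models that chain with plain dictionaries and checks that every `content_source` names a policy defined earlier. It assumes that `content_source` refers to an upstream policy by name and that a policy without it reads the uploaded content directly. `validate_policies` is a hypothetical helper for illustration, not an Indexify function.

```python
from typing import Dict, List, Optional


def validate_policies(policies: List[Dict[str, str]]) -> List[str]:
    """Return the policy names in order, checking that each content_source is defined earlier."""
    seen: List[str] = []
    for policy in policies:
        source: Optional[str] = policy.get("content_source")
        if source is not None and source not in seen:
            raise ValueError(
                f"policy {policy['name']!r} reads from unknown source {source!r}"
            )
        seen.append(policy["name"])
    return seen


# Audio to Text, then Text to Embeddings.
policies = [
    {"extractor": "tensorlake/whisper-asr", "name": "transcription"},
    {
        "extractor": "tensorlake/minilm-l6",
        "name": "transcription-embedding",
        "content_source": "transcription",  # assumed to name the upstream policy
    },
]

order = validate_policies(policies)
print(order)
```

A check like this catches a misspelled source name before the graph is sent to the server.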

In [2]:

from indexify import IndexifyClient, ExtractionGraph
client = IndexifyClient()

In [ ]:

extraction_graph_spec = """
name: "audio-knowledgebase"
extraction_policies:
   - extractor: "tensorlake/whisper-asr"
     name: "transcription"

   - extractor: "tensorlake/minilm-l6"
     name: "transcription-embedding"
     content_source: "transcription"
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)

In [8]:

# Change the path to the audio file you want to upload.
PATH = ""
content_id = client.upload_file("audio-knowledgebase", path=PATH)
client.wait_for_extraction(content_id)
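
If you have a folder of recordings rather than a single file, a small helper can collect the paths to upload. This is a minimal sketch using only the standard library. `find_audio_files` and the suffix list are hypothetical and illustrative, and the formats the extractor actually accepts may differ.

```python
from pathlib import Path
from typing import List

# Illustrative set of audio suffixes; not taken from the extractor's documentation.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(directory: str) -> List[Path]:
    """Return the audio files directly inside `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(directory)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


# Usage with the calls from the cell above (requires a running server):
# for audio_path in find_audio_files("recordings"):
#     content_id = client.upload_file("audio-knowledgebase", path=str(audio_path))
#     client.wait_for_extraction(content_id)
```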

Indexify Retriever for RAG¶

Once extraction completes, we can use IndexifyRetriever to retrieve the most relevant documents for a given query from the index created by the MiniLM extractor. The index name combines the graph name, the policy name, and `embedding`.

In [22]:

from indexify_langchain import IndexifyRetriever
params = {"name": "audio-knowledgebase.transcription-embedding.embedding", "top_k": 50}
retriever = IndexifyRetriever(client=client, params=params)
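
To make `top_k` concrete, here is a minimal sketch of embedding-based retrieval: score every stored vector against the query vector and keep the `k` best. The vectors are three-dimensional toys rather than MiniLM embeddings, and cosine similarity is an assumption, since the notebook does not say which similarity measure the index uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs with the highest similarity to the query."""
    scored = [(text, cosine(query, vector)) for text, vector in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index: chunk text mapped to a made-up embedding.
toy_index = {
    "chunk about funding": [1.0, 0.0, 0.0],
    "chunk about hardware": [0.0, 1.0, 0.0],
    "chunk about customers": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A large `top_k` such as 50 returns more context to the model at the cost of a longer prompt.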

In [23]:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

In [24]:

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(openai_api_key="xxx")
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
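
To see roughly what the prompt step of the chain produces, the sketch below fills the same template with plain strings. It is a simplification: the real chain passes the retriever's documents through LangChain's own formatting, whereas `build_prompt` is a hypothetical helper that joins already-extracted text chunks.

```python
from typing import List

TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def build_prompt(chunks: List[str], question: str) -> str:
    """Join the retrieved chunks and substitute them, with the question, into the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for retrieved transcript text.
prompt_text = build_prompt(
    ["First transcript chunk.", "Second transcript chunk."],
    "Tell me about Grok",
)
print(prompt_text)
```

Printing the filled prompt like this is a quick way to check how much retrieved context the model will receive.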

Ask Questions¶

In [25]:

chain.invoke("Tell me about Grok")

Out[25]:

'Grok is a company that has had a significant viral moment in its history recently. It was founded in 2016 and has been a long road for the company. The company has seen a surge in customers and interest, with 3,000 unique customers trying to consume their resources in a short period, ranging from Fortune 500 companies to developers. The company has been fortunate to experience this growth and potential disruption in the market. Time will tell how big the company can get, but there is a lot of market cap for Grok to gain by producing things at scale. The company has been described as a meager unicorn, with a last valuation of around a billion dollars. The potential for Grok to be disruptive in the market is significant, and it has had a very exciting and important moment in its history recently.'