Cheap In-Memory RAG-Pipeline using LlamaIndex and Gemini 2.0 Flash¶
Let's see what the cool kid can do for free.
Before running this notebook, create a .env file in the same folder as this notebook and store your Gemini API key in it under the name GOOGLE_API_KEY.
from dotenv import load_dotenv
assert load_dotenv()
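Note that `assert load_dotenv()` only confirms that a .env file was found and loaded, not that the key is actually in it. A minimal sketch of an extra check follows; `require_env` is a hypothetical helper name, not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("GOOGLE_API_KEY")` right after `load_dotenv()` would then fail early instead of at the first API call.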
You will need the following packages:
pip install -U pymupdf python-dotenv google-genai llama-index llama-index-llms-gemini llama-index-embeddings-huggingface
from google import genai
from google.genai import types
import httpx
import os
import pymupdf
from time import sleep
from IPython.display import Markdown
client = genai.Client()
prompt = "Convert this document to EXACT markdown representation with special care on TABLES so that they are represented in a way suitable for RAG-Agent. Make sure to describe image so the context of images is converted CLEARLY into text."
File Setup¶
The file is expected in the same folder as this notebook unless you specify a path.
filename = "bal_en_01-00.pdf"
fname = os.path.basename(filename).replace(".pdf", "")
limit_rate = 20971520 # 20 MB in GeminiAPI
Pre-Chunking¶
Not to be confused with chunking in RAG. This step only splits the PDF into fragments small enough to stay under the Gemini file size limit. If you think you can write better logic, go for it.
fractions = (os.stat(filename).st_size // limit_rate) + 10 # poor man's cheap logic to split PDFs, you can do more using io.BytesIO and logic from [6]
fractions
14
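To make the arithmetic explicit: the result 14 means the file size divides by the 20 MB limit four whole times (14 - 10 = 4), so the PDF is somewhere between 80 and 100 MiB. The sketch below mirrors the heuristic and compares it with a theoretical lower bound. The function names and the 90 MiB example size are assumptions for illustration only.

```python
import math

LIMIT_BYTES = 20 * 1024 * 1024  # 20971520, the same value as limit_rate


def estimate_fragments(file_size: int, limit: int = LIMIT_BYTES, headroom: int = 10) -> int:
    """Mirror the notebook's heuristic: whole multiples of the limit plus fixed headroom."""
    return file_size // limit + headroom


def minimum_fragments(file_size: int, limit: int = LIMIT_BYTES) -> int:
    """Lower bound, reachable only if bytes were spread evenly across fragments."""
    return max(1, math.ceil(file_size / limit))


size = 90 * 1024 * 1024  # hypothetical 90 MiB file
print(estimate_fragments(size), minimum_fragments(size))  # prints: 14 5
```

The headroom matters because pages are not equally heavy: a fragment full of scanned images can exceed the limit even when the average fragment does not.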
Pre-Chunk¶
Split the PDF into smaller fragments in memory. Remember, the finer the chunks, the more information will be retrieved from the pages.
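doc = pymupdf.open(filename)
page_count = len(doc)
page_fractions = max(1, page_count // (fractions - 1))  # pages per fragment, at least 1
doc_in_bytes = []
for i in range(0, page_count, page_fractions):
    split_doc = pymupdf.open()  # new, empty in-memory PDF
    print("Creating fractional PDF from page", i , "to", (i + page_fractions))
    # to_page is inclusive, so stop one page before the next fragment starts
    last_page = min(i + page_fractions, page_count) - 1
    split_doc.insert_pdf(doc, from_page=i, to_page=last_page)
    doc_in_bytes.append(split_doc.write())
    split_doc.close()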
Creating fractional PDF from page 0 to 80
Creating fractional PDF from page 80 to 160
Creating fractional PDF from page 160 to 240
Creating fractional PDF from page 240 to 320
Creating fractional PDF from page 320 to 400
Creating fractional PDF from page 400 to 480
Creating fractional PDF from page 480 to 560
Creating fractional PDF from page 560 to 640
Creating fractional PDF from page 640 to 720
Creating fractional PDF from page 720 to 800
Creating fractional PDF from page 800 to 880
Creating fractional PDF from page 880 to 960
Creating fractional PDF from page 960 to 1040
Creating fractional PDF from page 1040 to 1120
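PyMuPDF's `insert_pdf` treats `to_page` as inclusive, so consecutive fragments should not share a boundary page, otherwise that page is sent to Gemini twice. A minimal sketch of computing non-overlapping ranges is shown here; `page_ranges` is a hypothetical helper, and the page count of 1100 is an assumed example rather than the real size of the manual.

```python
def page_ranges(page_count: int, pages_per_fragment: int) -> list[tuple[int, int]]:
    """Return inclusive, zero-based (first, last) page ranges covering every page once."""
    if page_count <= 0 or pages_per_fragment <= 0:
        raise ValueError("page_count and pages_per_fragment must be positive")
    ranges = []
    for start in range(0, page_count, pages_per_fragment):
        # Clamp to the last page and subtract 1 because the end is inclusive.
        last = min(start + pages_per_fragment, page_count) - 1
        ranges.append((start, last))
    return ranges


ranges = page_ranges(1100, 80)  # assumed page count, for illustration
print(len(ranges), ranges[0], ranges[-1])  # prints: 14 (0, 79) (1040, 1099)
```

Each tuple can be passed directly as `from_page` and `to_page`.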
Conversion¶
Convert each fragment into Markdown and concatenate the results back into one string.
full_markdown = ""
for idx, chunk in enumerate(doc_in_bytes):
    print("Processing chunk of", idx)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[types.Part.from_bytes(data=chunk, mime_type="application/pdf"), prompt],
    )
    resp_txt = response.text
    # Strip the code-fence markers the model may wrap around its Markdown output
    resp_txt = "\n".join([line.replace("```markdown", "").replace("```", "") for line in resp_txt.splitlines()])
    full_markdown += resp_txt + "\n"  # newline keeps neighbouring fragments from merging
Processing chunk of 0
Processing chunk of 1
Processing chunk of 2
Processing chunk of 3
Processing chunk of 4
Processing chunk of 5
Processing chunk of 6
Processing chunk of 7
Processing chunk of 8
Processing chunk of 9
Processing chunk of 10
Processing chunk of 11
Processing chunk of 12
Processing chunk of 13
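The loop removes every fence marker in the response, which also strips fences around code blocks that belong to the converted document itself. A gentler variant, sketched here under the assumption that the model wraps the whole answer in a single outer fence, removes only that outer pair. `strip_outer_fence` is a hypothetical helper name.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def strip_outer_fence(text: str) -> str:
    """Remove one wrapping fence pair (opening and closing line) if present."""
    lines = text.strip().splitlines()
    openers = (FENCE + "markdown", FENCE + "md", FENCE)
    if lines and lines[0].strip().lower() in openers:
        lines = lines[1:]
        # Only drop a trailing fence when an opening fence was found.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
    return "\n".join(lines)
```

Inner fenced blocks survive this, so code listings in the source PDF would stay recognisable in the Markdown.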
Save Markdown¶
Real Professionals always back up costly processes. (Not me but this time I will pretend.)
with open("bal_markdown.md", "w", encoding="utf-8") as f:
    f.write(full_markdown)
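The backup only pays off if later runs read it instead of repeating the conversion. A minimal sketch of that pattern follows; `load_or_build` is a hypothetical helper, and `build` stands for whatever function runs the costly conversion.

```python
from pathlib import Path
from typing import Callable


def load_or_build(path: str, build: Callable[[], str]) -> str:
    """Return the cached text at `path`, or call `build()` once and save its result."""
    cache = Path(path)
    if cache.exists():
        return cache.read_text(encoding="utf-8")
    text = build()
    cache.write_text(text, encoding="utf-8")
    return text
```

With the conversion loop wrapped in a function, `full_markdown = load_or_build("bal_markdown.md", convert_pdf)` would skip the API calls on every run after the first (`convert_pdf` being an assumed name for that wrapper).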
Retrieval using LlamaIndex¶
from llama_index.core import Document
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.gemini import Gemini
Setting up Nodes¶
Here I use intfloat/multilingual-e5-small, but feel free to use any other model. This one is small and efficient in a CPU-only environment.
embedder = HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-small")
Settings.llm = Gemini(model_name="models/gemini-2.0-flash") # API key is read from the environment
md_doc = Document(text=full_markdown)
index = VectorStoreIndex.from_documents([md_doc], embed_model=embedder, show_progress=True)
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 3.71it/s]
Generating embeddings: 100%|██████████| 94/94 [00:04<00:00, 20.41it/s]
retrieval = index.as_query_engine(similarity_top_k=5) # change top k if you want to spend all your money.
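`similarity_top_k` sets how many of the most similar nodes are retrieved and passed to the LLM as context, so a larger value means more prompt tokens per question. The idea can be sketched in plain Python, assuming cosine similarity over embedding vectors. This is an illustration of the concept with toy 2-D vectors, not LlamaIndex's actual implementation, and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], node_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k node vectors most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), idx) for idx, vec in enumerate(node_vecs)),
        reverse=True,
    )
    return [idx for _, idx in scored[:k]]


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # toy embeddings
print(top_k([1.0, 0.1], nodes, k=2))  # prints: [0, 2]
```

With 94 embedded nodes in this index, `similarity_top_k=5` sends roughly the five best-matching chunks to Gemini for each query.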
Retrieval in Natural Language¶
I am choosing to write a wrapper around IPython.display.Markdown so responses look better in notebooks.
def query_from_vstore(question: str) -> Markdown:
    """Query the index and render the answer as Markdown in the notebook."""
    return Markdown(retrieval.query(question).response)
query_from_vstore("Issuance Date of this manual")
2022-11-28
query_from_vstore("Counter-Ballast weights for Jib of 83.0 meters")
For a jib of 83.0 m, the counter-ballast consists of 8x A blocks and 2x B blocks, resulting in a total weight of 20.90 t. The arrangement of the counter-ballast blocks is AAAAAAAABB.
query_from_vstore("How much torque can i use for M16 with strength rating of 10.9")
For an M16 bolt with a strength rating of 10.9, the tightening torque depends on the bolt connection type and coating.
- Type 1: If the bolt has a zinc flake coating (FLZN), use 225 Nm. If it has no coating, galvanised coating, or zinc-nickel plated coating, use 290 Nm.
- Type 2: If the bolt connection complies with Liebherr standards, use 210 Nm.
- Type 3: If it is a high-tensile bolt connection from structural steelwork, use 215 Nm.
query_from_vstore("Erläutere alle erlaubten Spielräume in Bohrlöchern bei Pin-Verbindungen")
Here's a summary of the permissible play in boreholes for pin connections:
- For a 35 mm pin diameter, the permissible size of bore is less than 35.5 mm, the restricted permissible size of bore is 35.5 mm to 35.9 mm, and the non-permissible size of bore is greater than 35.9 mm.
- For a 45 mm pin diameter, the permissible size of bore is less than 45.6 mm, the restricted permissible size of bore is 45.6 mm to 46.1 mm, and the non-permissible size of bore is greater than 46.1 mm.
- For a 50 mm pin diameter, the permissible size of bore is less than 50.6 mm, the restricted permissible size of bore is 50.6 mm to 51.2 mm, and the non-permissible size of bore is greater than 51.2 mm.
- For a 60 mm pin diameter, the permissible size of bore is less than 60.7 mm, the restricted permissible size of bore is 60.7 mm to 61.4 mm, and the non-permissible size of bore is greater than 61.4 mm.
- For a 65 mm pin diameter, the permissible size of bore is less than 65.8 mm, the restricted permissible size of bore is 65.8 mm to 66.6 mm, and the non-permissible size of bore is greater than 66.6 mm.
- For a 70 mm pin diameter, the permissible size of bore is less than 70.9 mm, the restricted permissible size of bore is 70.9 mm to 71.7 mm, and the non-permissible size of bore is greater than 71.7 mm.
- For an 80 mm pin diameter, the permissible size of bore is less than 81.0 mm, the restricted permissible size of bore is 81.0 mm to 82.0 mm, and the non-permissible size of bore is greater than 82.0 mm.
- For a 100 mm pin diameter, the permissible size of bore is less than 101.2 mm, the restricted permissible size of bore is 101.2 mm to 102.4 mm, and the non-permissible size of bore is greater than 102.4 mm.
Crane operation is not permitted if the non-permissible size is reached.