Article

Obsidian Source: Notes / Solvendo - LLM

Summary

Pending synthesis from local Obsidian source.

Original source title: Solvendo Llm

Extracted Preview

Stack : Python (FastAPI, AWS, LangChain, unstructured)

File : `app.py`

Workflow :

The demo app is a RAG-based system over the data we have in S3.
The tasks are: get the documents from S3, start the ingestion pipeline, and make calls to the language model (based on a prompt template and an LLM already chosen through LangChain). All of this is wrapped in a router class, so it works as a self-contained module.
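The flow above (fetch, ingest, retrieve, prompt, call the model) can be sketched without any external dependencies. This is a minimal illustration, not the code in `app.py`: `fetch_documents` and `call_llm` are hypothetical stand-ins for the S3 download and the LangChain model call, the prompt template is invented for the example, and the word-overlap ranking is a placeholder for whatever vector search the real pipeline uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical prompt template; the real one is defined elsewhere in the app.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


@dataclass
class Chunk:
    source: str  # key of the document the chunk came from
    text: str


def ingest(documents: Dict[str, str], chunk_size: int = 200) -> List[Chunk]:
    """Split each document into fixed-size character chunks."""
    chunks = []
    for key, text in documents.items():
        for start in range(0, len(text), chunk_size):
            chunks.append(Chunk(source=key, text=text[start:start + chunk_size]))
    return chunks


def retrieve(chunks: List[Chunk], question: str, k: int = 2) -> List[Chunk]:
    """Rank chunks by word overlap with the question (placeholder for vector search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer(
    question: str,
    fetch_documents: Callable[[], Dict[str, str]],
    call_llm: Callable[[str], str],
) -> str:
    """Run the whole flow: fetch, ingest, retrieve, build the prompt, call the model."""
    chunks = ingest(fetch_documents())
    top = retrieve(chunks, question)
    prompt = PROMPT_TEMPLATE.format(
        context="\n---\n".join(c.text for c in top),
        question=question,
    )
    return call_llm(prompt)
```

Because the S3 access and the model call are passed in as callables, the same function can be exercised with in-memory documents and a fake model, and then mounted behind the router with the real implementations.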

Can be improved :

Credentials should be kept private, for example in a secure config file.
Custom classes would make it much easier to plug and play models, prompt templates, and other data.
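One possible shape for both improvements is sketched below. It reads credentials from environment variables instead of hard-coding them, and keeps models and prompt templates in small registries so they can be swapped by name. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS variable names; everything else (`APP_S3_BUCKET`, `APP_MODEL`, `APP_PROMPT`, `Settings`, `register_model`, the `echo` model) is a hypothetical name chosen for this illustration.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class Settings:
    aws_access_key_id: str
    aws_secret_access_key: str = field(repr=False)  # keep the secret out of logs
    bucket: str
    model_name: str = "echo"
    prompt_name: str = "qa"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing early if any are missing."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "APP_S3_BUCKET")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        bucket=env["APP_S3_BUCKET"],
        model_name=env.get("APP_MODEL", "echo"),
        prompt_name=env.get("APP_PROMPT", "qa"),
    )


# Registries: adding a model or a prompt template is one entry, not a code change
# in the router.
MODELS: Dict[str, Callable[[str], str]] = {}
PROMPTS: Dict[str, str] = {
    "qa": "Context:\n{context}\n\nQuestion: {question}",
}


def register_model(name: str):
    """Decorator that registers a prompt -> completion callable under a name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        MODELS[name] = fn
        return fn
    return decorator


@register_model("echo")
def echo_model(prompt: str) -> str:
    # Placeholder model for local testing; a real entry would wrap an LLM client.
    return "[echo] " + prompt


def run(settings: Settings, context: str, question: str) -> str:
    """Look up the configured prompt template and model, then call the model."""
    prompt = PROMPTS[settings.prompt_name].format(context=context, question=question)
    return MODELS[settings.model_name](prompt)
```

With this layout, switching the model or the prompt for a demo is a matter of setting `APP_MODEL` or `APP_PROMPT`, and the secret never appears in the `repr` of the settings object.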

File : `pdf-process.py`

Workflow :

Run unstructured on the data (from S3), then start pdf_process (defined below).
Extract the text, images, and data from the PDFs. unstructured works well for this case.
The image part is a bit tricky. We use LLaVA for understanding the data in the images, and the tables, and the text can be used as it is. Pydantic is used for validation.
Various metadata are returned, and everything is wrapped in a FastAPI wrapper.
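The routing and validation step described above can be sketched as follows. To stay dependency-free, this uses a standard-library dataclass with `__post_init__` checks as a stand-in for the Pydantic models the notes mention. The dictionary shape of the raw elements, the `describe_image` callable (standing in for the LLaVA call), and the `vision_kinds` parameter are all assumptions made for this example, not the structure unstructured actually returns. Which element kinds go through the vision model is left configurable, defaulting to images only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

ALLOWED_KINDS = {"text", "table", "image"}


@dataclass
class ExtractedElement:
    kind: str
    content: str
    page: int
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validation that a Pydantic model would express as field constraints.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown element kind: {self.kind!r}")
        if self.page < 1:
            raise ValueError("page numbers start at 1")
        if not self.content.strip():
            raise ValueError("element content is empty")


def process_elements(
    raw_elements: List[dict],
    describe_image: Callable[[bytes], str],
    vision_kinds: FrozenSet[str] = frozenset({"image"}),
) -> List[ExtractedElement]:
    """Send vision kinds through the image model; pass other kinds through as text."""
    results = []
    for raw in raw_elements:
        if raw["kind"] in vision_kinds:
            content = describe_image(raw["data"])  # raw bytes of the image
        else:
            content = raw["text"]  # used as is
        results.append(
            ExtractedElement(
                kind=raw["kind"],
                content=content,
                page=raw["page"],
                metadata={"source": raw.get("source", "")},
            )
        )
    return results
```

Every element ends up in the same validated shape regardless of how its content was produced, so the endpoint can return the list, with its metadata, without special cases per element type.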

Integration Notes

Source folder: /home/yashs/Documents/Docs/Obsidian/Research-Notes
Local source: /home/yashs/Documents/Docs/Obsidian/Research-Notes/Notes/Solvendo - LLM.md
Raw copy: raw/obsidian/research-notes/Notes/Solvendo - LLM.md

Links Created Or Updated

Open Questions