Obsidian Source: Notes / Solvendo - LLM
Summary
Pending synthesis from local Obsidian source.
Original source title: Solvendo Llm
Extracted Preview
Stack : Python (FastAPI, AWS, LangChain, unstructured)
File : app.py
Workflow :
- For the demo app, we are building a RAG-based system over the data we have in S3.
- The tasks are to fetch the documents from S3, start the ingestion pipeline, and make calls to the language model (using a prompt template and LLM already selected through LangChain). All of this is wrapped in a router class and works as a single module.
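The three steps above (fetch, ingest, query) can be sketched in plain Python. This is a minimal, offline illustration and not the code in app.py: `RagRouter`, `fetch_documents`, `chunk`, `echo_llm`, and `PROMPT_TEMPLATE` are hypothetical names, a dict stands in for the S3 bucket, and keyword overlap stands in for the embedding search that a LangChain retriever would normally do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical prompt template; the real one is selected through LangChain.
PROMPT_TEMPLATE = "Answer using only this context:\n{context}\n\nQuestion: {question}"


def fetch_documents(bucket: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for the S3 download step: returns {object key: text}."""
    return dict(bucket)


def chunk(text: str, size: int = 200) -> List[str]:
    """Split text into fixed-size character chunks (a deliberately naive splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


@dataclass
class RagRouter:
    """Bundles ingestion and querying behind one object, like the router class in the notes."""
    llm: Callable[[str], str]
    chunks: List[str] = field(default_factory=list)

    def ingest(self, documents: Dict[str, str]) -> int:
        """Chunk every document and return the total number of stored chunks."""
        for text in documents.values():
            self.chunks.extend(chunk(text))
        return len(self.chunks)

    def retrieve(self, question: str, k: int = 2) -> List[str]:
        """Return the k chunks sharing the most words with the question."""
        words = set(question.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda c: len(words & set(c.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def ask(self, question: str) -> str:
        """Fill the prompt template with retrieved context and call the model."""
        context = "\n".join(self.retrieve(question))
        return self.llm(PROMPT_TEMPLATE.format(context=context, question=question))


def echo_llm(prompt: str) -> str:
    """Fake model that returns its prompt, so the example runs without network access."""
    return prompt


router = RagRouter(llm=echo_llm)
router.ingest(fetch_documents({
    "a.txt": "Invoices are stored in the finance bucket.",
    "b.txt": "Holiday policy allows twenty days per year.",
}))
answer = router.ask("Where are invoices stored?")
```

Because the model is passed in as a plain callable, swapping `echo_llm` for a real LLM client only changes the constructor argument.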
Can be improved :
- Credentials should be kept private, perhaps in a secure config file.
- We should write custom classes that make it much easier to plug and play models, prompt templates, and other data.
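One possible shape for both improvements, as a sketch under assumptions: credentials are read from environment variables instead of a config file, and models and prompt templates are looked up by name in small registries. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS variable names; `RAG_BUCKET`, `AwsSettings`, `register_model`, `MODELS`, `PROMPTS`, and `run` are hypothetical names invented for this example.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class AwsSettings:
    """Credentials pulled from the environment instead of being hard-coded."""
    access_key_id: str
    secret_access_key: str = field(repr=False)  # keep the secret out of logs and reprs
    bucket: str = field(default="")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AwsSettings":
        # RAG_BUCKET is a made-up variable name for this sketch.
        required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "RAG_BUCKET")
        missing = [name for name in required if name not in env]
        if missing:
            raise RuntimeError(f"missing settings: {', '.join(missing)}")
        return cls(env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"], env["RAG_BUCKET"])


# Registries that map short names to interchangeable components.
MODELS: Dict[str, Callable[[str], str]] = {}
PROMPTS: Dict[str, str] = {
    "qa": "Context: {context}\nQuestion: {question}",
    "summary": "Summarize: {context}",
}


def register_model(name: str):
    """Decorator that adds a model callable to the registry under `name`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        MODELS[name] = fn
        return fn
    return decorator


@register_model("echo")
def echo_model(prompt: str) -> str:
    """Fake model: returns the prompt unchanged."""
    return prompt


@register_model("shout")
def shout_model(prompt: str) -> str:
    """Second fake model, to show that swapping is a one-word change."""
    return prompt.upper()


def run(model_name: str, prompt_name: str, **fields: str) -> str:
    """Look up a model and a prompt template by name, then call the model."""
    return MODELS[model_name](PROMPTS[prompt_name].format(**fields))


settings = AwsSettings.from_env({
    "AWS_ACCESS_KEY_ID": "id",
    "AWS_SECRET_ACCESS_KEY": "s3cret",
    "RAG_BUCKET": "docs",
})
result = run("shout", "summary", context="abc")
```

With this layout, choosing a different model or template becomes a configuration value (`"echo"` vs `"shout"`, `"qa"` vs `"summary"`) rather than a code change.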
File : pdf-process.py
Workflow :
- Run unstructured on the data (from S3), then start pdf_process (defined below).
- Extract the text, images, and data from the PDFs. Unstructured works well for this case.
- The image part is a bit tricky. We use LLaVA to interpret the data in the images; the tables and the text can be used as they are. Pydantic is used for validation.
- Various metadata are returned, and all of it is wrapped in a FastAPI wrapper.
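The routing described above can be sketched as follows. This is an illustration only, not the contents of pdf-process.py: the body of `pdf_process` here is hypothetical, the input dicts stand in for the elements that unstructured would return, `fake_llava` stands in for a real LLaVA call, and a standard-library dataclass with manual checks stands in for the Pydantic model so the example runs without third-party packages. It assumes that only images are sent to the vision model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

ALLOWED_KINDS = {"text", "table", "image"}


@dataclass
class ProcessedElement:
    """One validated piece of a PDF (dataclass stand-in for a Pydantic model)."""
    kind: str
    content: str
    page: int

    def __post_init__(self) -> None:
        # Manual validation; Pydantic would express these as field constraints.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown element kind: {self.kind!r}")
        if self.page < 1:
            raise ValueError("page numbers start at 1")
        if not self.content.strip():
            raise ValueError("content must not be empty")


def pdf_process(
    raw_elements: List[Dict],
    describe_image: Callable[[bytes], str],
) -> List[ProcessedElement]:
    """Send images to the vision model; pass text and tables through unchanged."""
    processed = []
    for element in raw_elements:
        if element["kind"] == "image":
            content = describe_image(element["data"])
        else:
            content = element["data"]
        processed.append(
            ProcessedElement(kind=element["kind"], content=content, page=element["page"])
        )
    return processed


def summarize(elements: List[ProcessedElement]) -> Dict[str, int]:
    """Example metadata: how many elements of each kind were extracted."""
    return dict(Counter(element.kind for element in elements))


def fake_llava(image_bytes: bytes) -> str:
    """Placeholder for a LLaVA call; reports only the payload size."""
    return f"image of {len(image_bytes)} bytes"


elements = pdf_process(
    [
        {"kind": "text", "data": "Quarterly revenue grew.", "page": 1},
        {"kind": "image", "data": b"\x89PN", "page": 1},
        {"kind": "table", "data": "Q1 | 10\nQ2 | 12", "page": 2},
    ],
    describe_image=fake_llava,
)
metadata = summarize(elements)
```

In a FastAPI wrapper, a list like `elements` and a dict like `metadata` would be what the endpoint serializes in its response.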
Integration Notes
- Source folder: /home/yashs/Documents/Docs/Obsidian/Research-Notes
- Local source: /home/yashs/Documents/Docs/Obsidian/Research-Notes/Notes/Solvendo - LLM.md
- Raw copy: raw/obsidian/research-notes/Notes/Solvendo - LLM.md