Article

Obsidian Source: Notes / Solvendo - LLM

Summary

Pending synthesis from local Obsidian source.

Original source title: Solvendo Llm

Extracted Preview

Stack : Python (FastAPI, AWS, LangChain, unstructured)

File : app.py

Workflow :

  • The demo app is a RAG-based (retrieval-augmented generation) system over the data we have in S3.
  • The tasks are to fetch the documents from S3, start the ingestion pipeline, and make calls to the language model (using a prompt template and an LLM already selected through LangChain). All of this is wrapped in a router class that acts as a module.
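
The workflow above can be sketched with the standard library alone. This is a minimal illustration, not the code in app.py: the class name `RagPipeline`, the `fetch_documents` and `call_llm` callables (stand-ins for the S3 download and the LangChain model call), the prompt template text, and the word-overlap retrieval are all assumptions made for the example. A real system would typically retrieve with embeddings and a vector store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RagPipeline:
    """Toy stand-in for the router module: fetch, ingest, retrieve, prompt, call."""

    fetch_documents: Callable[[], Dict[str, str]]  # hypothetical: would read from S3
    call_llm: Callable[[str], str]                 # hypothetical: would call the model
    prompt_template: str = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    chunk_size: int = 200

    def ingest(self) -> List[str]:
        """Split every fetched document into fixed-size character chunks."""
        chunks: List[str] = []
        for text in self.fetch_documents().values():
            for start in range(0, len(text), self.chunk_size):
                chunks.append(text[start:start + self.chunk_size])
        return chunks

    def retrieve(self, question: str, chunks: List[str], k: int = 2) -> List[str]:
        """Rank chunks by naive word overlap with the question."""
        q_words = set(question.lower().split())
        ranked = sorted(
            chunks,
            key=lambda c: len(q_words & set(c.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def answer(self, question: str) -> str:
        """Build the prompt from the top chunks and pass it to the model."""
        chunks = self.ingest()
        context = "\n---\n".join(self.retrieve(question, chunks))
        prompt = self.prompt_template.format(context=context, question=question)
        return self.call_llm(prompt)
```

Because the document source and the model are passed in as callables, the same class can be exercised with an in-memory dictionary and an echo function before being wired to S3 and a real LLM.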

Can be improved :

  • Credentials should be kept private, for example in a secure config file.
  • Custom classes would make it much easier to plug and play models, prompt templates, and other data.
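
Both improvements could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: environment variables stand in for the secure config file, `S3_BUCKET` is a hypothetical variable name, and `Settings`, `Registry`, and the `"qa"` prompt are names invented for the example. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS environment variable names.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class Settings:
    """Credentials loaded at startup instead of being hard-coded in app.py."""

    aws_access_key_id: str
    aws_secret_access_key: str = field(repr=False)  # keep the secret out of logs
    bucket: str = ""

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")
        missing = [name for name in required if name not in env]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
            aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
            bucket=env["S3_BUCKET"],  # hypothetical variable name
        )


class Registry:
    """Maps short names to factories so components can be swapped by name."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str):
        def decorator(factory):
            self._factories[name] = factory
            return factory
        return decorator

    def create(self, name: str):
        if name not in self._factories:
            raise KeyError(f"unknown component {name!r}; known: {sorted(self._factories)}")
        return self._factories[name]()


prompts = Registry()


@prompts.register("qa")
def qa_prompt() -> str:
    # Hypothetical prompt template used only for this example.
    return "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

With this shape, switching the prompt template or the model becomes a matter of changing a name in configuration, e.g. `prompts.create("qa")`, and a second `Registry` instance could hold model factories in the same way.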

File : pdf-process.py

Workflow :

  • Run unstructured on the data (from S3), then start pdf_process (defined below).
  • Extract the text, images, and data from the PDFs. unstructured handles this case well.
  • The image part is a bit tricky. We use LLaVA to interpret the data in the images; the tables and the text can be used as they are. Pydantic is used for validation.
  • Various metadata are returned, and all of it is exposed through a FastAPI wrapper.
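
The routing described above (images go through a vision model, tables and text pass through unchanged, every element is validated) can be sketched as follows. This is an assumption-laden illustration rather than the actual pdf_process: a standard-library dataclass stands in for the Pydantic model, the raw elements are plain dictionaries rather than unstructured's element objects, and `process_elements`, `Element`, and the `describe_image` callable (a placeholder for the LLaVA call) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

ALLOWED_KINDS = {"text", "table", "image"}


@dataclass
class Element:
    """One extracted PDF element; validation runs on construction."""

    kind: str
    content: str
    metadata: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported element kind: {self.kind!r}")
        if not isinstance(self.content, str) or not self.content.strip():
            raise ValueError("content must be a non-empty string")


def process_elements(
    raw_elements: Iterable[dict],
    describe_image: Callable[[str], str],  # hypothetical stand-in for LLaVA
) -> List[Element]:
    """Validate each element and replace images with a model-written description."""
    processed: List[Element] = []
    for raw in raw_elements:
        element = Element(
            kind=raw["kind"],
            content=raw["content"],
            metadata=dict(raw.get("metadata", {})),
        )
        if element.kind == "image":
            # Keep a pointer to the original image alongside its description.
            element = Element(
                kind="image",
                content=describe_image(element.content),
                metadata={**element.metadata, "source_image": element.content},
            )
        processed.append(element)
    return processed
```

The returned list, including each element's metadata, is the kind of payload a FastAPI endpoint could serialize and return.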

Integration Notes

  • Source folder: /home/yashs/Documents/Docs/Obsidian/Research-Notes
  • Local source: /home/yashs/Documents/Docs/Obsidian/Research-Notes/Notes/Solvendo - LLM.md
  • Raw copy: raw/obsidian/research-notes/Notes/Solvendo - LLM.md

Links Created Or Updated

Open Questions