NOTES

SETUP

The Makefile does something not quite right: requirements.in ends up containing unstructured[md]jupyter, which makes the later uv pip compile command fail with "Distribution not found". Strip that entry back to just unstructured and the rest of the commands will work.
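Concretely, the fix is a one-line edit to requirements.in; the surrounding contents of that file are an assumption here:

```
# requirements.in (assumed layout)
unstructured[md]jupyter    # merged entry -> `uv pip compile` fails with "Distribution not found"

# replace it with the plain package and the rest of the commands work
unstructured
```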

  • Prefer to use ollama; it is FOSS

Setting up ollama

  • Use wget https://github.com/ollama/ollama/releases/download/v0.3.4/ollama-linux-amd64 to get ollama
  • Use chmod u+x ollama (put in bin folder)
  • Use ./ollama serve (consider running it in the background). This starts the ollama API frontend, which serves requests over local HTTP (see the sketch after this list)
  • In a new terminal use ./ollama pull llama3.1:8b to get the llama3.1 model locally
  • ./ollama pull nomic-embed-text to get nomic text embedding locally
  • pip install unstructured[md]
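Once ollama serve is running, it listens on local HTTP (port 11434 by default), so anything that can make an HTTP request can talk to it. A minimal sketch, assuming the default host/port and the llama3.1:8b model pulled above:

```python
import json
import urllib.request

payload = {
    "model": "llama3.1:8b",   # pulled with `./ollama pull llama3.1:8b`
    "prompt": "Summarize what retrieval-augmented generation is in one sentence.",
    "stream": False,          # ask for a single JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```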

Langchain

Chunk size is defined empirically, as shown in the RAG notebook 3.0. A good rule of thumb: if you want granular access to information, use strategically small chunk sizes, then experiment across the metrics that matter to you, like relevancy, retrieving certain types of information, etc.
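A minimal sketch of that experiment using LangChain's RecursiveCharacterTextSplitter; the splitter choice, file path, and sizes are assumptions, not necessarily what notebook 3.0 uses:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = open("docs/some_page.md").read()          # hypothetical input document

for chunk_size in (200, 500, 1000):              # strategically small -> larger
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=50)
    chunks = splitter.split_text(text)
    print(f"chunk_size={chunk_size}: {len(chunks)} chunks")

# Score each setting against the metrics you care about
# (relevancy of retrieved chunks, recall of specific kinds of information, ...).
```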

Consider using llama-index

embeddings

  • vectorization of existing content as a "pre-analyzer" for the LLM proper
  • helps identify which files/data are most likely relevant to a given query
  • langchain helps build this and provides reasonable results
    • paths to files
    • content of files
    • question: is there a simple interface to get sections/anchor links from MD?
    • question: how can langchain load a pre-built database? (see the sketch after this list)
      • ultimately we'd like to have a chatbot that uses the docs site content as part of the vector store
      • it should build as part of ci/cd by cloning the docs repo, building the db, and hosting it on a server
      • the chatbot would then use that db for similarity search
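A sketch of both halves (build the store in CI/CD, reload it in the chatbot), assuming a Chroma backend (needs chromadb installed), the nomic-embed-text embeddings pulled above, unstructured[md] for the loader, and hypothetical paths:

```python
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")   # served by the local ollama instance

# Build step (e.g. in CI/CD after cloning the docs repo):
docs = DirectoryLoader("docs/", glob="**/*.md", loader_cls=UnstructuredMarkdownLoader).load()
Chroma.from_documents(docs, embeddings, persist_directory="./docs_db")

# Load step (the chatbot side): point Chroma at the pre-built directory.
db = Chroma(persist_directory="./docs_db", embedding_function=embeddings)
for hit in db.similarity_search("How do I configure the pipeline?", k=4):
    print(hit.metadata.get("source"), hit.page_content[:80])
```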

doc parsing

temperature

Closer to 0 is more "precise"; closer to 1 is more "creative".
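A quick sketch of setting temperature against the local model, assuming the langchain-ollama integration package is installed (model name and values are arbitrary):

```python
from langchain_ollama import ChatOllama

precise = ChatOllama(model="llama3.1:8b", temperature=0.0)    # terse, repeatable answers
creative = ChatOllama(model="llama3.1:8b", temperature=0.9)   # more varied, exploratory wording

print(precise.invoke("Name the capital of France.").content)
print(creative.invoke("Write a one-line slogan for a docs chatbot.").content)
```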

Tool Calling

  • This is a way of supplying the AI model with an API it can invoke: a message is supplied to the model, and it can use the supplied tool (function, DB query, etc.) as an additional source of information (see the sketch after this list).
  • This is an alternative to supplying it with pre-parsed documents
  • The LLM response may indicate or suggest a tool call be used, and provide arguments to use with that tool call.
  • Check 4.0 notebook
  • 8b model is less robust than 70b
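A minimal sketch of the flow, again assuming the langchain-ollama package; the get_page_url tool and its behavior are hypothetical:

```python
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

@tool
def get_page_url(topic: str) -> str:
    """Return the docs URL for a given topic."""
    return f"https://docs.example.com/{topic}"        # placeholder lookup

llm = ChatOllama(model="llama3.1:8b").bind_tools([get_page_url])

# The model does not execute the tool itself; its reply suggests a call plus arguments.
response = llm.invoke("Where can I read about authentication?")
for call in response.tool_calls:
    print(call["name"], call["args"])                 # e.g. get_page_url {'topic': 'authentication'}
    print(get_page_url.invoke(call["args"]))          # we run the tool and could feed the result back
```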

Chat Agent

  • Investigate 4.3
  • 5.0 brings everything together