diff --git a/notes.md b/notes.md
deleted file mode 100644
index 0ea5283ab65370f788f0abc278dbc9d6481b5cee..0000000000000000000000000000000000000000
--- a/notes.md
+++ /dev/null
@@ -1,72 +0,0 @@
-# NOTES
-
-## SETUP
-
-The makefile does something not quite right: `requirements.in` ends up with `unstructured[md]jupyter`, which causes the later `uv pip compile` command to fail with a "Distribution not found" error. Strip that entry back to just `unstructured` and the rest of the commands will work.
-
-- Prefer to use `ollama`; it is FOSS
-
-### Setting up ollama
-
-- Use `wget https://github.com/ollama/ollama/releases/download/v0.3.4/ollama-linux-amd64` to get ollama
-- Use `chmod u+x ollama` (put it in a bin folder)
-- Use `./ollama serve` (consider running it in the background). This starts the ollama API frontend, serving requests over local HTTP
-- In a new terminal, use `./ollama pull llama3.1:8b` to get the llama3.1 model locally
-- `./ollama pull nomic-embed-text` to get the nomic text embedding model locally
-- `pip install unstructured[md]`
-
-## Langchain
-
-- Provides a higher-level API interface to ollama
-- Has vector embedding models
-- Has a RAG model interface
-- There are data loaders for MD and HTML: <https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/>
-  - Tables?
-  - Chunking by section?
-  - What about section contexts?
-  - Will need to use some amount of empiricism to optimize
-    - Larger chunks give lower granularity and are harder to map back to the source, but carry more context
-    - Smaller chunks give higher granularity and are easier to map back to the source, but carry less context
-- Retrievers: <https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/>
-  - Lots of options here, worth examining
-- *Note*: using the 8B model for document examination in production is not recommended
-
-> it's empirically defined, like I showed in the RAG notebook 3.0, but a good rule of thumb is: if you want granular access to information, use strategically small chunk sizes, then experiment across the metrics that matter to you, like relevancy, retrieving certain types of information, etc.
-
-Consider using `llama-index`
-
-## embeddings
-
-- vectorization of existing content as a "pre-analyzer" for the LLM proper
-- helps identify which files/data are most likely relevant to a given query
-- langchain helps build this and provides reasonable results
-  - paths to files
-  - content of files
-  - **question**: is there a simple interface to get sections/anchor links from MD?
-  - **question**: how do we have langchain load a pre-built database?
-  - ultimately we'd like to have a chatbot that uses the docs site content as part of the vector store
-    - it should build as part of CI/CD by cloning the docs repo, building the DB, and hosting it on a server
-    - the chatbot would then use that DB for similarity search
-
-## doc parsing
-
-- open-source frontend: <https://github.com/nlmatics/llmsherpa>
-- the backend is open source as well: <https://github.com/nlmatics/nlm-ingestor>
-
-## temperature
-
-- Closer to 0 is more "precise"
-- Closer to 1 is more "creative"
-
-## Tool Calling
-
-- A way of supplying the AI model with an API call it can interact with: a message is sent to the model along with a tool definition (function, DB, etc.), and the model can use that tool as an additional source of information
-- This is an alternative to supplying it with pre-parsed documents
-- The LLM response may suggest that a tool call be used, and provide the arguments to pass to it
-- Check the 4.0 notebook
-- The 8B model is less robust than the 70B
-
-## Chat Agent
-
-- Investigate notebook 4.3
-- Notebook 5.0 brings everything together
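
The Langchain and embeddings sections of the deleted notes describe a small RAG pipeline: chunk the docs site, embed the chunks with `nomic-embed-text`, persist a vector store, and have the chatbot reload that store for similarity search. A minimal sketch of that flow, assuming `langchain-community`, `langchain-text-splitters`, and `chromadb` are installed and the docs repo has been cloned to `./docs`; the paths, chunk sizes, and the choice of Chroma are placeholders, not decisions recorded in the notes:

```python
# Sketch only: build a persistent vector store over local markdown docs,
# then reload it and run a similarity search against it.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every markdown file under ./docs (TextLoader keeps this independent of unstructured).
docs = DirectoryLoader("docs", glob="**/*.md", loader_cls=TextLoader).load()

# Chunking is empirical: smaller chunks give finer-grained retrieval but less context per hit.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed with the locally pulled nomic model and persist the store to disk.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")

# Later (e.g. in the chatbot), reload the pre-built database instead of rebuilding it.
db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
for hit in db.similarity_search("How do I set up the project?", k=3):
    print(hit.metadata.get("source"), hit.page_content[:80])
```

The `persist_directory` round trip is the part relevant to the "load a pre-built database" question in the embeddings section: a CI/CD job could build and host `./chroma_db`, and the chatbot would only need the reload step.

For the Tool Calling section, a similarly hedged sketch of the suggest-then-execute loop, assuming the separate `langchain-ollama` package (the notes only mention langchain in general), a running `./ollama serve`, and the `llama3.1:8b` model pulled above; `get_doc_section` is a made-up placeholder tool:

```python
# Sketch only: tool calling against a local llama3.1:8b served by ollama.
from langchain_core.tools import tool
from langchain_ollama import ChatOllama


@tool
def get_doc_section(section: str) -> str:
    """Return the raw text of a named docs section (placeholder lookup)."""
    return f"(contents of section '{section}' would go here)"


llm = ChatOllama(model="llama3.1:8b", temperature=0)

# The model does not run the tool itself: it replies with suggested calls and
# arguments, and the calling code is responsible for executing them.
response = llm.bind_tools([get_doc_section]).invoke(
    "What does the SETUP section of the notes say?"
)
for call in response.tool_calls:
    print(call["name"], call["args"])
    print(get_doc_section.invoke(call["args"]))
```

This is the alternative to pre-parsed documents that the notes describe: the model only proposes the call and its arguments, so robustness depends heavily on the model (hence the 8B vs 70B caveat in the Tool Calling section).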