# README
This repo is intended to serve as a simple example of how to use `ollama` to build a rudimentary Retrieval Augmented Generation (RAG) model using the UAB RC Documentation. It assumes you are using Cheaha at UAB (<https://rc.uab.edu>); if not, some of the commands will need to be modified.
There isn't anything particularly special or magical about RAG at a basic level. I found breaking through the terminology more challenging than actually building a conceptual understanding of what is happening. The basic recipe, sketched in code after the list below, is:
1. Take some data of interest, like the [UAB RC Documentation](https://docs.rc.uab.edu).
1. Generate a database of vectors embedded into some latent space. This database is generally referred to colloquially as the "Embedding". The embedding vectors are typically generated by an embedding-specific deep learning model, but this isn't necessary. It could be as simple as some old-school Natural Language Processing, though the results might not be great.
1. Create a prepared prompt that includes three parts:
    1. An engineered prompt suitable for your application. For a help-desk chat bot, you might instruct the model to reply as though it were a help-desk agent working at a call center.
    1. A "blank space" for the user-supplied prompt.
    1. A "blank space" for supporting data from the Embedding.
1. When the user submits a prompt, generate an embedding vector into the same latent space as the Embedding.
1. Use the prompt embedding vector to find nearest-neighbors in the embedding database. Select one or more of these to be the supporting data.
1. Using the prepared prompt, fill out the "blank spaces" with the user-supplied prompt and supporting data.
1. Submit the filled-out prepared prompt to the LLM and return the result.
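As a concrete illustration, here is a minimal sketch of the recipe above using the `ollama` Python client and `chromadb`, with a running `ollama` server assumed. The model names match those listed under Supporting Software Used; the document list and prompt text are placeholders, not the actual code from `main.ipynb`.

```python
import ollama
import chromadb

# Hypothetical stand-ins for the parsed documentation sections.
documents = [
    "Cheaha is UAB's high-performance computing cluster.",
    "Interactive Apps on rc.uab.edu include an HPC Desktop job.",
]

# Step 2: embed each document into the latent space and store the vectors.
client = chromadb.Client()
collection = client.create_collection(name="docs")
for i, doc in enumerate(documents):
    response = ollama.embeddings(model="bge-m3", prompt=doc)
    collection.add(ids=[str(i)], embeddings=[response["embedding"]], documents=[doc])

# Step 3: a prepared prompt with an engineered instruction and two "blank spaces".
PREPARED_PROMPT = (
    "You are a help-desk agent for UAB Research Computing. "
    "Answer the question using only the supporting data.\n\n"
    "Supporting data:\n{supporting_data}\n\n"
    "Question: {user_prompt}"
)

def answer(user_prompt: str) -> str:
    # Step 4: embed the user prompt into the same latent space.
    query = ollama.embeddings(model="bge-m3", prompt=user_prompt)
    # Step 5: nearest-neighbor lookup to select supporting data.
    results = collection.query(query_embeddings=[query["embedding"]], n_results=1)
    supporting_data = "\n".join(results["documents"][0])
    # Steps 6-7: fill out the blank spaces and submit to the LLM.
    filled = PREPARED_PROMPT.format(
        supporting_data=supporting_data, user_prompt=user_prompt
    )
    return ollama.generate(model="llama3.1", prompt=filled)["response"]

print(answer("How do I start an HPC Desktop job?"))
```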
While this is the most basic approach, there are several places where improvement can be made.
- More complex graph structures for the data making up the embedding. Having a graph structure in the database enables more complex lookup schemes that may produce more accurate results.
- Adding a quality-checking model to select supporting data.
- Tuning the level of granularity of data in the embedding. Sentences? Sections? Pages? (A section-splitting sketch follows this list.)
- See [Future Directions](#future-directions) for even more ideas of where to go next.
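For instance, chunking at the section level might look like the following sketch. The heading-based regex is an assumption about how the rendered pages are structured, not the parsing logic actually used in this repo.

```python
import re

def split_markdown_sections(text: str) -> list[str]:
    """Split a markdown document into chunks, one per heading-delimited section."""
    # Split immediately before any line that starts with a markdown heading.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [part.strip() for part in parts if part.strip()]

page = "# Cheaha\nIntro text.\n\n## Quotas\nStorage limits.\n\n## Jobs\nSlurm basics."
for section in split_markdown_sections(page):
    print(repr(section))
```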
## How to Use
It is recommended to use an HPC Desktop job in the Interactive Apps section of <https://rc.uab.edu>.
### One-time Setup
1. Clone this repository using `git clone`.
1. Create the `conda` environment.
    1. `module load Miniforge3`
    1. `conda env create --file environment.yml`
1. Obtain the rendered UAB RC Documentation pages by running `pull-site.sh`.
1. Set up `ollama` by running `setup-ollama.sh`.
1. Start the `ollama` server by running `./ollama serve`.
### Once-per-job Setup
1. Load the Miniforge module with `module load Miniforge3`.
1. Start the `ollama` server application with `./ollama serve`.
### To Run
1. Run the Jupyter notebook `main.ipynb`.
    - At the time of writing, the Documentation pages contain enough data that generating the embeddings takes about 7-10 minutes. Please be patient.
## Supporting Software Used
The `llama-index` framework is used for RAG data parsing. The embedding database is generated using a custom section-based approach with ChromaDB as the backend. Frameworks like `langchain` could be used instead for this purpose, but frankly, writing the custom code was simpler, easier to understand, and did not report errors and warnings.
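If you want the embedding database to survive across jobs, ChromaDB's persistent client is one option. This is a hedged sketch; the path and collection name are hypothetical choices, not the repo's actual configuration.

```python
import chromadb

# Store the collection on disk so embeddings are not regenerated every job.
# The path is a hypothetical choice; pick somewhere in your project space.
client = chromadb.PersistentClient(path="./chroma-db")
collection = client.get_or_create_collection(name="uab-rc-docs")

# On later runs the collection is loaded from disk; only embed when empty.
if collection.count() == 0:
    print("Empty collection: generate and add embeddings here.")
```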
The models are supplied by `ollama`.
- LLM: <https://ollama.com/library/llama3.1>
- Embedding: <https://ollama.com/library/bge-m3>
## Using other versions and models
Newer versions of `ollama` are distributed as `.tar.gz` archives on the GitHub releases page (<https://github.com/ollama/ollama/releases>). When modifying the `setup-ollama.sh` script to use a newer version, you will need to take this into account.
Changing to other models may require varying levels of modification in the Jupyter notebook depending on the model.
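For example, swapping the LLM can be as simple as pulling a different model and changing one name, as in this sketch. The `llama3.2` tag comes from the ollama library page; everything else is illustrative.

```python
import ollama

LLM_MODEL = "llama3.2"  # previously "llama3.1"

# Pull the model through the running ollama server, then use it as before.
ollama.pull(LLM_MODEL)
reply = ollama.generate(model=LLM_MODEL, prompt="Say hello.")
print(reply["response"])
```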
## Future Directions
- other models
  - LLM: there is llama 3.2, and there are models with more parameters
  - Embedding
- cloud deployment
- improve chunking strategy
  - probably too fine-grained right now using individual sections
  - try a hierarchical retrieval strategy
    - use full pages as initial pass
    - then use sections only from within that page as second pass
  - try a graph-based strategy
    - use internal linking to connect sections and pages in the database
- "BS" (hallucination) mitigation strategies
- improve embedding db persistence strategy
  - CI/CD triggered by docs changes
- mitigation for prompt injection attacks
  - <https://github.com/protectai/rebuff> not yet fully local
  - word count limits (start at 1k maybe?)
  - check if response is similar to system prompt; if so, emit a message (see the sketch after this list)
- server-client model
  - client should be a page sending queries to a server which runs the backend code
  - client should be very thin and light-weight
  - streamlit could be a starting point for prototyping: <https://docs.streamlit.io/develop/api-reference/chat>
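One way to implement the response-similarity check above: embed both the system prompt and the model's response, then compare them with cosine similarity. This is a sketch under assumed names; the threshold is a starting guess to tune, not a measured value.

```python
import math

import ollama

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def leaked_system_prompt(system_prompt: str, response: str, threshold: float = 0.8) -> bool:
    """Return True if the response looks suspiciously similar to the system prompt."""
    sp = ollama.embeddings(model="bge-m3", prompt=system_prompt)["embedding"]
    rp = ollama.embeddings(model="bge-m3", prompt=response)["embedding"]
    return cosine_similarity(sp, rp) >= threshold

if leaked_system_prompt("You are a help-desk agent...", "You are a help-desk agent..."):
    print("Sorry, I can't share that. Please rephrase your question.")
```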