Cloud-Free AI Experience: Building a Local RAG with Multi-Document Persistence Using LM Studio, LangChain, and Streamlit
To reach a broader audience, this article has been translated from Japanese.
You can find the original version here.
Introduction
In the previous article, we built a simple RAG (Retrieval-Augmented Generation) setup targeting a single text file (the Momotaro story).
This time, as an extension, we will create a persistent local RAG app that can load, retain, and delete multiple documents.
Overall Structure
Directory Structure
project/
├─ app2.py # Main application
├─ vectorstore/ # Directory for persisting the vector store
│ ├─ faiss_index/ # FAISS index files
│ ├─ metadata.json # List of loaded files
│ └─ temp_docs/ # Temporary storage for uploaded documents
In the new app (app2.py), the following enhancements are introduced:
Feature | Description |
---|---|
Multi-Document Support | Ability to ingest multiple files simultaneously, including PDF, Word, PowerPoint, and text |
Persistent Storage | Store the vector DB (FAISS) locally so that you don't need to rebuild after restart |
Individual File Deletion | Delete a specific file and rebuild the DB |
Enhanced UI | Use the Streamlit sidebar to list files, delete individual files, or delete all |
Improved Accuracy | Improve retrieval accuracy by converting queries into multiple variations using MultiQueryRetriever |
When you launch the app, the necessary folders under vectorstore/ are created automatically.
When you add files, the vector DB and metadata are persisted to disk.
Required Libraries
Install the required libraries with the following command:
pip install langchain langchain-openai langchain-community langchain-huggingface sentence-transformers streamlit faiss-cpu pypdf python-docx python-pptx pydantic cryptography unstructured docx2txt
Full Program
The main source code is as follows. Comments within the code describe each processing step. Since there are many lines, we'll explain the key parts of the program afterward.
import streamlit as st
import os
import tempfile
from pathlib import Path
import shutil
import json
import uuid
# --- Import LangChain-related Libraries ---
# Used for RAG (Retrieval-Augmented Generation) setup with LLM (large language model) and embeddings
from langchain_openai import ChatOpenAI # Connection to OpenAI-compatible LLM (e.g., LM Studio/Ollama)
from langchain_huggingface import HuggingFaceEmbeddings # HuggingFace embedding model (for vectorization)
from langchain.text_splitter import RecursiveCharacterTextSplitter # Split documents into chunks
from langchain_community.vectorstores import FAISS # Fast vector search engine (with local persistence)
from langchain.chains import RetrievalQA # RAG chain combining retrieval and answer generation
from langchain_community.document_loaders import (
PyPDFLoader, Docx2txtLoader, TextLoader, UnstructuredPowerPointLoader
) # Loaders for various file formats
from langchain.prompts import PromptTemplate # Prompt template for LLM
from langchain.retrievers.multi_query import MultiQueryRetriever # Improve retrieval accuracy with multiple query variations
# ============================================================
# Persistence-related path configuration
# ============================================================
DB_DIR = "vectorstore" # Directory for storing the vector store
DB_FAISS_PATH = Path(DB_DIR) / "faiss_index" # FAISS vector index files
DB_METADATA_PATH = Path(DB_DIR) / "metadata.json" # Metadata storage file
TEMP_DOCS_DIR = Path(DB_DIR) / "temp_docs" # Temporary storage for uploaded documents
# ============================================================
# Initialize the file_uploader key
# ============================================================
# Set a unique key to reinitialize file_uploader
if 'file_uploader_key' not in st.session_state:
st.session_state['file_uploader_key'] = str(uuid.uuid4())
# ============================================================
# Document loading function
# ============================================================
def load_document(file_path):
"""Load a document using the appropriate LangChain loader based on file extension"""
ext = os.path.splitext(file_path)[1].lower()
if ext == ".pdf":
loader = PyPDFLoader(str(file_path))
elif ext == ".docx":
loader = Docx2txtLoader(str(file_path))
elif ext == ".pptx":
loader = UnstructuredPowerPointLoader(str(file_path))
else:
loader = TextLoader(str(file_path), encoding="utf-8")
return loader.load() # Returns a list of LangChain Documents
# ============================================================
# Embedding model initialization (with caching)
# ============================================================
@st.cache_resource(show_spinner=False)
def get_embeddings():
"""Load and cache the Sentence-BERT embedding model from HuggingFace"""
return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# ============================================================
# Vector DB build and save function
# ============================================================
@st.cache_resource(show_spinner=False)
def build_and_save_db(documents, file_metadata_list):
"""Build a FAISS vector DB from a list of documents and save it along with metadata"""
if not documents:
# If no documents, remove existing DB
if Path(DB_DIR).is_dir():
shutil.rmtree(DB_DIR)
return None
# Split documents into smaller chunks (500 characters with 100 characters overlap)
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=100,
length_function=len
)
docs = text_splitter.split_documents(documents)
# Initialize embeddings model (using a Sentence-BERT-based model)
embeddings = get_embeddings()
# Build a new FAISS vector store
db = FAISS.from_documents(docs, embeddings)
# Create persistence directory and save
Path(DB_FAISS_PATH).parent.mkdir(parents=True, exist_ok=True)
db.save_local(str(DB_FAISS_PATH))
# Save metadata as JSON
with open(DB_METADATA_PATH, "w", encoding="utf-8") as f:
json.dump(file_metadata_list, f, ensure_ascii=False, indent=4)
return db
# ============================================================
# RAG chain creation function
# ============================================================
def create_rag_chain_from_db(db):
"""Create a RAG (Retrieval + LLM) chain from an existing DB"""
llm = ChatOpenAI(
model_name="local-model", # Specify the model loaded into LM Studio
openai_api_base="http://localhost:1234/v1", # LM Studio server URL
openai_api_key="not-needed", # No API key needed
temperature=0.1, # Favor higher probability tokens
max_tokens=512 # Limit to 512 tokens
)
# Use MultiQueryRetriever to improve search accuracy by generating multiple query variations
retriever = MultiQueryRetriever.from_llm(
retriever=db.as_retriever(search_kwargs={"k": 2}),
llm=llm
)
# Define the RAG prompt template (answer policy)
prompt_template = """
Based on the following reference documents, answer the question in English.
If the answer is not found in the reference documents, respond with "I don't know."
Reference Documents: {context}
Question: {question}
"""
PROMPT = PromptTemplate(
template=prompt_template,
input_variables=["context", "question"]
)
# Create a RetrievalQA chain (retrieval -> answer generation)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff", # Fill all retrieved results into the prompt
retriever=retriever,
return_source_documents=True, # Return source documents used for the answer
chain_type_kwargs={"prompt": PROMPT} # Apply custom prompt
)
return qa_chain
# ============================================================
# Single file deletion and DB rebuild
# ============================================================
def delete_single_file(file_id_to_delete):
"""Delete the specified file ID and rebuild the DB using remaining documents"""
if not DB_METADATA_PATH.exists():
st.error("Metadata not found. It is recommended to delete all data.")
return
with open(DB_METADATA_PATH, "r", encoding="utf-8") as f:
current_metadata = json.load(f)
# Identify the file to remove
file_to_remove = next((item for item in current_metadata if item["id"] == file_id_to_delete), None)
if not file_to_remove:
st.error("The specified file ID to delete was not found.")
return
# Delete the temporary file
temp_path = Path(file_to_remove["temp_path"])
if temp_path.exists():
os.remove(temp_path)
# Rebuild the DB with remaining files
new_metadata = [item for item in current_metadata if item["id"] != file_id_to_delete]
all_documents = []
with st.spinner(f"Deleting file and rebuilding AI with remaining data..."):
# Reload all remaining files
for item in new_metadata:
doc_path = Path(item["temp_path"])
if doc_path.exists():
documents = load_document(doc_path)
all_documents.extend(documents)
# Clear existing cache and rebuild DB
if "rag_chain" in st.session_state:
del st.session_state["rag_chain"]
build_and_save_db.clear()
# Build and save DB with remaining documents and metadata
db = build_and_save_db(all_documents, new_metadata)
# Update session state after rebuild
if db:
st.session_state.rag_chain = create_rag_chain_from_db(db)
st.session_state.current_files = new_metadata
st.toast(f"✅ File '{file_to_remove['name']}' deleted and DB updated.", icon="🗑️")
else:
# Cleanup after all files are deleted
keys_to_delete = ["rag_chain", "messages", "current_files", "file_identifiers"]
for key in keys_to_delete:
if key in st.session_state:
del st.session_state[key]
st.toast("✅ All files have been deleted.", icon="🗑️")
st.session_state['file_uploader_key'] = str(uuid.uuid4())
st.rerun() # Rerun Streamlit to update the UI
# ============================================================
# Delete all data
# ============================================================
def delete_all_data():
"""Delete the entire DB directory and clear session variables"""
if Path(DB_DIR).is_dir():
try:
shutil.rmtree(DB_DIR)
st.toast("✅ All loaded data has been deleted.", icon="🗑️")
except OSError as e:
st.error(f"Error occurred while deleting data: {e}")
return
# Clear session variables
keys_to_delete = ["rag_chain", "messages", "current_files", "file_identifiers"]
for key in keys_to_delete:
if key in st.session_state:
del st.session_state[key]
# Reset the file_uploader key
st.session_state['file_uploader_key'] = str(uuid.uuid4())
st.rerun()
# ============================================================
# Streamlit UI setup
# ============================================================
st.title("📄 Document Chatbot")
st.write("Upload multiple PDF, DOCX, PPTX, and text files and ask questions about their content.")
# --- Sidebar: File upload ---
uploaded_files = st.sidebar.file_uploader(
"Upload files (add or overwrite existing DB)",
type=["pdf", "docx", "pptx", "txt", "md"],
accept_multiple_files=True,
key=st.session_state['file_uploader_key']
)
# --- Check for existing DB ---
db_exists = DB_FAISS_PATH.exists()
metadata_exists = DB_METADATA_PATH.exists()
current_file_identifiers = [(f.name, f.size) for f in uploaded_files]
# ----------------------------------------------------------------------
# Initialization logic (merge existing data with new uploads)
# ----------------------------------------------------------------------
# 1. When new files are uploaded (add to or rebuild & save DB)
if uploaded_files:
# Load existing metadata (file list)
existing_metadata = []
if metadata_exists:
with open(DB_METADATA_PATH, "r", encoding="utf-8") as f:
existing_metadata = json.load(f)
# Create a set of existing file names for duplicate checking
existing_names = {meta['name'] for meta in existing_metadata}
# ----------------------------------------------------
# Process newly uploaded files
# ----------------------------------------------------
newly_uploaded_files = []
for uploaded_file in uploaded_files:
if uploaded_file.name not in existing_names:
newly_uploaded_files.append(uploaded_file)
else:
# Skip files with the same name as existing ones (no overwrite)
pass
if newly_uploaded_files:
all_documents = []
new_metadata_list = []
TEMP_DOCS_DIR.mkdir(parents=True, exist_ok=True)
with st.spinner("Loading new files..."):
for uploaded_file in newly_uploaded_files:
unique_id = str(uuid.uuid4())
# Save as a new temporary file
temp_path = TEMP_DOCS_DIR / f"{unique_id}_{uploaded_file.name}"
temp_path.write_bytes(uploaded_file.getvalue())
# Generate metadata for the new file
new_metadata_list.append({
"id": unique_id, # Unique ID
"name": uploaded_file.name,
"temp_path": str(temp_path)
})
documents = load_document(temp_path)
all_documents.extend(documents) # Add to document list
# ----------------------------------------------------
# Merge existing data with new data
# ----------------------------------------------------
# Reload existing files and add to the document list
for meta in existing_metadata:
doc_path = Path(meta["temp_path"])
if doc_path.exists():
documents = load_document(doc_path)
all_documents.extend(documents)
# Merge metadata lists
combined_metadata = existing_metadata + new_metadata_list
with st.spinner("Integrating and rebuilding AI data..."):
# Clear old cache
if "rag_chain" in st.session_state:
del st.session_state["rag_chain"]
build_and_save_db.clear()
# Rebuild and save the DB with all documents and metadata
db = build_and_save_db(all_documents, combined_metadata)
st.session_state.rag_chain = create_rag_chain_from_db(db)
st.session_state.current_files = combined_metadata # Save new file list in session
st.session_state.messages = []
st.info("New data has been added to existing data and AI is ready.")
# Reset uploader to clear the file list UI
st.session_state['file_uploader_key'] = str(uuid.uuid4())
st.rerun()
else:
# Files were uploaded but all had the same names as existing files
st.info("All uploaded files have names matching existing files, so processing was skipped.")
# Reset the uploader to prevent accidental deletions
st.session_state['file_uploader_key'] = str(uuid.uuid4())
st.rerun()
# 2. No uploads, existing DB exists, and rag_chain not in session (fast load)
elif not uploaded_files and db_exists and "rag_chain" not in st.session_state:
with st.spinner("Loading existing AI data (DB)..."):
# Load embedding model
embeddings = get_embeddings()
# Load existing FAISS DB from disk
db = FAISS.load_local(str(DB_FAISS_PATH), embeddings, allow_dangerous_deserialization=True)
st.session_state.rag_chain = create_rag_chain_from_db(db)
# Load existing metadata (file list)
if metadata_exists:
with open(DB_METADATA_PATH, "r", encoding="utf-8") as f:
loaded_metadata = json.load(f)
st.session_state.current_files = loaded_metadata
st.success("You can now chat with the existing documents.")
else:
st.session_state.current_files = []
st.warning("Documents are loaded, but the original file name list was not found.")
st.session_state.messages = []
# 3. Display initial message
if "rag_chain" not in st.session_state and not db_exists:
st.info("Upload files or existing saved data will be loaded automatically.")
# ----------------------------------------------------------------------
# Display logic: show loaded file names and delete buttons
# ----------------------------------------------------------------------
if "current_files" in st.session_state:
st.sidebar.markdown("---")
st.sidebar.subheader("Currently Loaded Files")
if st.session_state.current_files:
for file_meta in st.session_state.current_files:
col1, col2 = st.sidebar.columns([0.8, 0.2])
# Show file name
col1.markdown(f"- **{file_meta['name']}**")
# Individual delete button UI (popover for confirmation)
with col2.popover("🗑️", help="Delete this file"):
st.write(f"Delete file **{file_meta['name']}** and rebuild the DB?")
if st.button("Confirm Delete", key=f"delete_{file_meta['id']}", type="secondary"):
delete_single_file(file_meta['id'])
else:
st.sidebar.markdown("- No files.")
st.sidebar.markdown("---")
if Path(DB_DIR).is_dir():
# Delete all data button
if st.sidebar.button("🗑️ Delete All Loaded Data", type="secondary"):
delete_all_data()
# ============================================================
# Chat interaction
# ============================================================
if "rag_chain" in st.session_state:
# Show chat history
if "messages" in st.session_state:
for message in st.session_state.messages:
with st.chat_message(message["role"]):
st.markdown(message["content"])
# User input
if prompt := st.chat_input("Ask a question about the documents"):
with st.chat_message("user"):
st.markdown(prompt)
st.session_state.messages.append({"role": "user", "content": prompt})
rag_chain = st.session_state.rag_chain
with st.spinner("Thinking..."):
# Run RAG (retrieval + answer generation)
response = rag_chain.invoke({"query": prompt})
answer = response["result"]
# Display answer and expand source documents
with st.chat_message("assistant"):
st.markdown(answer)
with st.expander("Source Documents"):
for doc in response["source_documents"]:
st.markdown(f"--- \n {doc.page_content}")
st.session_state.messages.append({"role": "assistant", "content": answer})
Overall Code Structure
A high-level overview of the structure of app2.py is as follows:
1. Document loader functions
2. Vector DB build, save, and load functions
3. RAG chain creation function (with search accuracy improvement using MultiQueryRetriever)
4. File deletion logic
5. Streamlit UI setup
- Sidebar: upload, delete, list display
- Main screen: chat UI
This separation clearly divides the RAG processing and the UI, resulting in a highly extensible design.
Explanation of Key Components
In this chapter, we take a closer look at the internal structure of app2.py and explain how its functions and modules work together to build the RAG (Retrieval-Augmented Generation) environment.
1. Document Loader
This part functions as the system's "input gate." It receives uploaded files and converts them into Document objects that LangChain can handle, allowing subsequent vectorization and retrieval processes to be performed uniformly. Internally, it selects the appropriate loader class based on the file extension, supporting PDF, Word, PowerPoint, and text formats. The loader not only extracts text but also attaches metadata such as page information.
The load_document() function automatically detects and loads file formats such as PDF, DOCX, PPTX, and TXT by leveraging various LangChain loaders:
from langchain_community.document_loaders import (
PyPDFLoader, Docx2txtLoader, TextLoader, UnstructuredPowerPointLoader
)
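As a quick illustration, here is a minimal usage sketch of load_document(); the file name sample.pdf is hypothetical and used only for demonstration:

```python
# Minimal usage sketch: load one file and inspect the resulting Documents.
# "sample.pdf" is a hypothetical file name used only for illustration.
docs = load_document("sample.pdf")

print(len(docs))                   # for PDFs, typically one Document per page
print(docs[0].metadata)            # e.g., {'source': 'sample.pdf', 'page': 0}
print(docs[0].page_content[:200])  # first 200 characters of the extracted text
```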
2. Persistent Vector DB (FAISS)
This part forms the core of RAG. It consists of three processes (splitting, embedding, and saving) and transforms multiple uploaded documents into a searchable format.
- Splitting: Use RecursiveCharacterTextSplitter to chunk text while preserving natural context boundaries (a short sketch follows this list).
- Embedding: Use the Sentence-BERT model (all-MiniLM-L6-v2) via HuggingFaceEmbeddings to convert semantics into numerical vectors.
- Saving: Build the vector space using FAISS and save it locally as index files.
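The splitting step can be tried in isolation. Here is a minimal sketch using the same parameters as app2.py; the documents variable is assumed to come from load_document():

```python
# Minimal sketch of the splitting step, using the same parameters as app2.py.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # maximum characters per chunk
    chunk_overlap=100,   # overlap between adjacent chunks to preserve context
    length_function=len
)
chunks = splitter.split_documents(documents)  # `documents` from load_document()
print(len(chunks), len(chunks[0].page_content))
```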
By combining these processes, you achieve "persistent RAG": the same knowledge base can be reused immediately even after restarting the app. Document chunks are embedded (vectorized) and saved with FAISS in vectorstore/faiss_index/, while the metadata (UUID, file name, storage path) is stored in metadata.json.
Loading the embedding model and building the FAISS vector DB can be time-consuming. To avoid running these steps from scratch every time, we use Streamlit's @st.cache_resource decorator, which safely and efficiently caches resources such as models and databases.
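To confirm that the persisted index is actually reusable, you can reload it outside the Streamlit app. This is a minimal sketch using the same paths and embedding model as app2.py; the query string is only an example:

```python
# Standalone round-trip check: reload the persisted FAISS index and run a search.
from pathlib import Path
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
index_path = Path("vectorstore") / "faiss_index"

db = FAISS.load_local(str(index_path), embeddings, allow_dangerous_deserialization=True)
for doc, score in db.similarity_search_with_score("Who accompanied Momotaro?", k=2):
    print(round(score, 3), doc.metadata.get("source"), doc.page_content[:80])
```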
3. Improved Retrieval Accuracy with MultiQueryRetriever
In standard RAG, a user's question is converted into a single query for retrieval. In this program, however, we adopt LangChain's MultiQueryRetriever, which automatically generates multiple paraphrased queries using an LLM. For example, given the question "What is quality management?", the LLM internally generates multiple queries such as:
- What is the definition of quality management?
- How does it differ from software quality assurance?
- What are the methods for maintaining and improving quality?
This approach allows matching paraphrased expressions or different contexts in the documents, greatly improving retrieval accuracy.
from langchain.retrievers.multi_query import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(
retriever=db.as_retriever(search_kwargs={"k": 2}),
llm=llm
)
The "k" in search_kwargs={"k": 2}
indicates the number of similar document chunks the retriever will fetch (top K results). In other words, it is set to retrieve only the top 2 most relevant document chunks. LangChain's retriever compares the user's question and documents in vector space and returns the top K document chunks ordered by highest cosine similarity.
Parameter | Meaning |
---|---|
k | Retrieve the top K most similar document chunks |
search_kwargs | Dictionary of parameters that control search behavior |
db.as_retriever() | Configure the vector DB (e.g., FAISS) as a retriever |
Choosing k=2 optimizes the balance between retrieval accuracy, processing speed, and token consumption:
Aspect | Large k | Small k (e.g., 2) |
---|---|---|
Retrieval Accuracy | Can reference many documents but may include irrelevant ones | Limits to highly relevant contexts |
Processing Speed | Slower response (especially in local execution) | Fast and lightweight |
Token Consumption | Increased due to multiple document inputs | Efficient with fewer tokens |
Use Cases | Large-scale RAG or multi-domain documents | Small to medium-sized RAG (ideal for this app) |
Given the assumptions of local execution, FAISS persistence, and small to medium-sized documents, it is important to balance accuracy and speed without including excessive context:
- Adjust to fit within the context length of the LLM (e.g., LM Studio / Ollama)
- Reduce processing time to return responses smoothly in the Streamlit UI
- Remove noisy documents to generate more coherent answers
Tuning guidelines:
k Value | Characteristics | Intended Use |
---|---|---|
1 | Fastest, minimal tokens. Ideal for simple contexts. | Short text, single theme |
2–3 | Good balance of accuracy and speed. | Standard RAG (recommended for this app) |
5+ | Comprehensive but slow. Suitable for long-form or encyclopedia use. | Multi-domain, long-form RAG |
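While tuning k, it helps to see what MultiQueryRetriever actually does. A minimal sketch: enable its logger to inspect the query variations it generates (the retriever is the one built above; the question is illustrative):

```python
import logging

# MultiQueryRetriever logs the query variations it generates at INFO level;
# enabling this logger lets you inspect them while tuning k.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

docs = retriever.invoke("What is quality management?")  # logs the generated queries
print(len(docs), "chunks retrieved")
```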
4. File Deletion Operations
When handling persisted data, it is essential to control partial deletions and rebuilds. This app implements a two-stage deletion logic:
- Individual Deletion: Delete the temporary file and metadata corresponding to a specific file ID, then rebuild the DB from the remaining data.
- Delete All: Remove the entire vectorstore/ directory for a full reset.
These deletion operations are tied to Streamlit's session state, and st.rerun() is called after deletion to re-render the app so the UI updates immediately. The following table summarizes the operations:
Operation | Action |
---|---|
Individual Deletion | Remove a specific file and rebuild the DB with remaining data |
Delete All | Delete the entire vectorstore/ directory for a full reset |
After deletion, the UI is automatically refreshed and the state is updated.
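The core of the individual-deletion path boils down to a filter-and-rebuild pattern. The following is a condensed sketch of what delete_single_file() in app2.py does (existence checks and UI updates omitted):

```python
# Condensed sketch of delete_single_file(): drop the target from the metadata,
# reload the remaining source files, then rebuild and persist the FAISS index.
remaining = [m for m in current_metadata if m["id"] != file_id_to_delete]

all_docs = []
for m in remaining:
    all_docs.extend(load_document(m["temp_path"]))

db = build_and_save_db(all_docs, remaining)  # returns None when nothing is left
```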
5. Features of the Streamlit UI
This is the front-end part that handles the app's usability. It leverages Streamlit components to create a simple yet functional chat UI. Key points include:
- Sidebar: Centralized control for file management, deletion, and list display.
- Chat Area: Use st.chat_message to visualize the conversation history between the user and the AI in a dialog format.
- Reproducibility: Leverages Streamlit's session_state to maintain and reset state seamlessly.
Behind the chat, a RAG chain (RetrievalQA) performs retrieval and generation in two stages for each question.
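Stripped of the RAG details, the chat area follows a simple Streamlit pattern. Here is a minimal, self-contained sketch of it; the placeholder answer stands in for the rag_chain.invoke() call used in app2.py:

```python
# Minimal chat-loop sketch: history lives in st.session_state and is re-rendered
# on every rerun; in app2.py the answer comes from rag_chain.invoke({"query": prompt}).
import streamlit as st

if "messages" not in st.session_state:
    st.session_state.messages = []

for m in st.session_state.messages:
    with st.chat_message(m["role"]):
        st.markdown(m["content"])

if prompt := st.chat_input("Ask a question about the documents"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    answer = f"(placeholder answer for: {prompt})"
    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.rerun()
```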
Sidebar
- File upload (multiple files supported)
- List of currently loaded files
- Delete button for each file
- "Delete All" button
Main Screen
- Display of chat history
- Input field: "Ask a question about the documents"
- AI response and an expander panel showing "Source Documents"
How to Run
- Start the LM Studio server at http://localhost:1234 (make sure the LLM is loaded beforehand).
- Run the following in your terminal:
streamlit run app2.py
The app will open in your browser (usually at http://localhost:8501).
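Before launching the app, it can be handy to confirm that the LM Studio server is actually reachable. A minimal sketch using the OpenAI-compatible /v1/models endpoint (requires the requests package):

```python
# Quick connectivity check for the LM Studio server (OpenAI-compatible API).
import requests

resp = requests.get("http://localhost:1234/v1/models", timeout=5)
print(resp.status_code)  # 200 when the server is running
print(resp.json())       # lists the model(s) currently loaded in LM Studio
```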
On the first launch, you will see a message prompting you to upload files. When you upload a PDF or text file, vectorization and persistence happen automatically.
Example: RAG with Multiple Documents
In addition to the "Momotaro" story from the previous example, let's load the story of "Kaguya-hime" into the RAG.
The story of Kaguya-hime is as follows:
Once upon a time, an old man who made his living cutting bamboo and his wife lived together.
One day, the old man found a shining stalk of bamboo in the mountain, and when he split it open, a small girl appeared.
The old man took her home, and with his wife, named her "Kaguya-hime" and raised her with great care.
Kaguya-hime grew into a beautiful woman, and her beauty became known even in the capital.
Many nobles proposed marriage, but Kaguya-hime accepted none of their proposals and set them difficult tasks in search of precious treasures.
No one succeeded.
Eventually, even the emperor fell in love with her, but her heart belonged to the Moon Kingdom.
On the night of the full moon, a messenger from the moon came to take her back, and Kaguya-hime returned to the moon in tears.
The old man and his wife were deeply saddened and gazed at the night sky forever, thinking of Kaguya-hime.
When you upload the two texts "Momotaro" and "Kaguya-hime", the system can accurately answer questions that combine the content of both.
Example question: "What commonalities do Momotaro and Kaguya-hime share?"
The RAG system extracts similar contexts from each story and generates a comparative answer. The following answer was generated:
In the answer, the observation that "both ultimately return to their original homes" is interesting, as it links their respective endings.
(Momotaro returns to his grandfather and grandmother; Kaguya-hime returns to the moon.)
Conclusion
In this extended version, we have evolved the local RAG environment into a more practical form.
Improvement | Benefit |
---|---|
FAISS Persistence | Fast startup without needing to rebuild the DB after restart |
MultiQueryRetriever | Improved retrieval accuracy |
File Management UI | Visual control for uploading and deleting |
Multi-Format Support | Ability to ingest a mix of PDF, DOCX, PPTX, and TXT |
This makes it possible to implement a practical knowledge assistant that runs entirely in a local environment. Next time, we plan to extend it with document summarization and classification features and support for models beyond OpenAI-compatible APIs (e.g., Ollama, Llama.cpp).