搬瓦工 VPS 使用 LlamaIndex 构建 RAG 检索增强生成系统教程

LlamaIndex（原 GPT Index）是一个专注于连接大语言模型与外部数据的框架。它的核心功能是帮助开发者构建检索增强生成（RAG）系统，让 LLM 能够基于私有文档和数据回答问题。与 LangChain 相比，LlamaIndex 在文档索引和检索方面提供了更专业和深入的功能。本教程将介绍在搬瓦工 VPS 上使用 LlamaIndex 构建 RAG 系统的完整流程。

一、环境准备

操作系统：Ubuntu 20.04 或更高版本。
内存：至少 2GB（框架本身），配合本地模型需要更多。
Python：Python 3.9 或更高版本。

apt update && apt upgrade -y
apt install python3 python3-pip python3-venv git -y

mkdir -p /opt/llamaindex-app && cd /opt/llamaindex-app
python3 -m venv venv
source venv/bin/activate

二、安装 LlamaIndex

# 安装核心包
pip install llama-index

# 安装常用集成
pip install llama-index-llms-openai-like  # OpenAI 兼容接口
pip install llama-index-embeddings-huggingface  # 本地嵌入模型
pip install llama-index-vector-stores-chroma  # ChromaDB 向量存储
pip install chromadb  # 向量数据库

三、基础 RAG 示例

以下是一个最简单的 RAG 应用，从本地文档中检索信息并回答问题：

# 准备示例文档
mkdir -p /opt/llamaindex-app/data
cat > /opt/llamaindex-app/data/vps-guide.txt <<'EOF'
VPS 安全配置指南：
1. 修改 SSH 默认端口，避免使用 22 端口
2. 禁用 root 密码登录，使用 SSH 密钥认证
3. 配置防火墙，只开放必要端口
4. 定期更新系统和软件包
5. 安装 fail2ban 防止暴力破解
6. 配置自动安全更新
EOF

cat > /opt/llamaindex-app/basic_rag.py <<'EOF'
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# 配置本地 LLM
Settings.llm = OpenAILike(
    api_base="http://localhost:8080/v1",
    api_key="not-needed",
    model="local-model",
    is_chat_model=True,
    max_tokens=500,
    timeout=120
)

# 配置本地嵌入模型
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# 加载文档并创建索引
documents = SimpleDirectoryReader("/opt/llamaindex-app/data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 创建查询引擎并提问
query_engine = index.as_query_engine()
response = query_engine.query("如何防止 SSH 暴力破解？")
print(response)
EOF

python3 basic_rag.py

四、持久化向量索引

将索引持久化到磁盘，避免每次启动都要重新构建：

cat > /opt/llamaindex-app/persistent_index.py <<'EOF'
import os
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike

Settings.llm = OpenAILike(
    api_base="http://localhost:8080/v1", api_key="not-needed",
    model="local-model", is_chat_model=True
)
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

db_path = "/opt/llamaindex-app/chroma_db"
db = chromadb.PersistentClient(path=db_path)
chroma_collection = db.get_or_create_collection("documents")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

if chroma_collection.count() == 0:
    print("首次运行，构建索引...")
    documents = SimpleDirectoryReader("/opt/llamaindex-app/data").load_data()
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
else:
    print("加载已有索引...")
    index = VectorStoreIndex.from_vector_store(vector_store)

query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("VPS 安全配置有哪些建议？")
print(response)
EOF

五、多数据源加载

LlamaIndex 支持从多种数据源加载文档：

pip install llama-index-readers-web llama-index-readers-json

cat > /opt/llamaindex-app/multi_source.py <<'EOF'
from llama_index.core import SimpleDirectoryReader

# 支持的文件格式包括：.txt, .pdf, .docx, .csv, .md 等
documents = SimpleDirectoryReader(
    input_dir="/opt/llamaindex-app/data",
    recursive=True,
    required_exts=[".txt", ".pdf", ".md"]
).load_data()

print(f"加载了 {len(documents)} 个文档")
for doc in documents:
    print(f"  - {doc.metadata.get('file_name', 'unknown')}: {len(doc.text)} 字符")
EOF

六、构建聊天引擎

聊天引擎支持多轮对话，能够记住对话上下文：

cat > /opt/llamaindex-app/chat_engine.py <<'EOF'
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = OpenAILike(
    api_base="http://localhost:8080/v1", api_key="not-needed",
    model="local-model", is_chat_model=True
)
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

documents = SimpleDirectoryReader("/opt/llamaindex-app/data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 创建支持多轮对话的聊天引擎
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt="你是一位 VPS 技术专家，基于提供的文档回答问题。"
)

# 多轮对话示例
response1 = chat_engine.chat("SSH 安全有哪些建议？")
print("回答1:", response1)

response2 = chat_engine.chat("能详细解释第一条吗？")
print("回答2:", response2)
EOF

七、部署为 API 服务

pip install fastapi uvicorn

cat > /opt/llamaindex-app/api_server.py <<'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

app = FastAPI(title="LlamaIndex RAG API")

Settings.llm = OpenAILike(
    api_base="http://localhost:8080/v1", api_key="not-needed",
    model="local-model", is_chat_model=True
)
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

documents = SimpleDirectoryReader("/opt/llamaindex-app/data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)

class Query(BaseModel):
    question: str

@app.post("/query")
async def query_documents(q: Query):
    response = query_engine.query(q.question)
    return {"answer": str(response), "sources": [n.metadata for n in response.source_nodes]}

@app.get("/health")
async def health():
    return {"status": "ok"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8200)
EOF

八、配置 Systemd 服务

cat > /etc/systemd/system/llamaindex-api.service <<EOF
[Unit]
Description=LlamaIndex RAG API
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/llamaindex-app
ExecStart=/opt/llamaindex-app/venv/bin/python api_server.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable llamaindex-api
systemctl start llamaindex-api

九、LlamaIndex 与 LangChain 对比

两个框架各有侧重：LlamaIndex 在文档索引和检索方面更专业深入，提供了丰富的索引类型和查询策略；LangChain 功能范围更广，除了 RAG 外还支持 Agent、工具调用等场景。对于专注知识库问答的项目，LlamaIndex 通常是更好的选择。

十、常见问题

嵌入模型下载缓慢

首次运行时需要从 HuggingFace 下载嵌入模型。搬瓦工的海外节点通常下载速度很好，如果遇到问题可以手动预下载模型。

查询结果不准确

调整 similarity_top_k 参数增加检索数量，或改用更高质量的嵌入模型。文档分割的 chunk_size 也会影响检索质量。

总结

LlamaIndex 为构建 RAG 知识库系统提供了专业的工具。配合搬瓦工 VPS 上的本地模型和向量数据库（如 ChromaDB 或 Milvus），可以打造完整的私有知识问答系统。选购搬瓦工 VPS 请查看全部方案，使用优惠码 NODESEEK2026 可享受 6.77% 的折扣，购买链接：bwh81.net。