Sample Code for Building LangChain Applications: 26. A Fireworks + LangChain RAG Example


Hugo_Hoo · Published 2024-06-10 00:00:00

Fireworks.AI + LangChain + RAG

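Fireworks AI aims to provide the best experience when working with LangChain. Below is an example of running RAG with Fireworks + LangChain.

See our models page for the full list of models. In this tutorial we use accounts/fireworks/models/mixtral-8x7b-instruct for RAG.

As the RAG target document, we use the Gemma technical report: https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf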

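# Install the required libraries
%pip install --quiet pypdf chromadb tiktoken openai
# Uninstall any existing langchain-fireworks package
%pip uninstall -y langchain-fireworks
# Install langchain-fireworks in editable mode from a local LangChain checkout
# (this path is specific to the author's machine)
%pip install --editable /mnt/disks/data/langchain/libs/partners/fireworks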

# Import the fireworks library and print the module to confirm which install is in use
import fireworks
print(fireworks)
import fireworks.client

<module 'fireworks' from '/mnt/disks/data/langchain/.venv/lib/python3.9/site-packages/fireworks/__init__.py'>

# Load the required libraries
import requests
from langchain_community.document_loaders import PyPDFLoader

# Download the PDF from the URL and save it to a temporary location
url = "https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf"
response = requests.get(url, stream=True)
response.raise_for_status()  # stop early if the download failed
file_name = "temp_file.pdf"
with open(file_name, "wb") as pdf:
    pdf.write(response.content)

# Create a PyPDFLoader for the saved file
loader = PyPDFLoader(file_name)
# Load the PDF contents (one Document per page)
data = loader.load()

# Split the text
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a text splitter with a chunk size of 2000 characters and no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
# Split the loaded documents into chunks
all_splits = text_splitter.split_documents(data)
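To make the two splitter parameters concrete, here is a minimal sketch of what chunk_size and chunk_overlap control. It is a simplified fixed-window splitter written for illustration, not the algorithm RecursiveCharacterTextSplitter uses (which also tries to break on separators such as paragraphs and sentences); the function name split_text is hypothetical.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Simplified illustration only; it ignores natural separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
no_overlap = split_text(sample, chunk_size=4, chunk_overlap=0)
with_overlap = split_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With chunk_overlap=0, as in the tutorial, no text is duplicated across chunks, but a sentence that straddles a boundary is cut in two; a small overlap trades some duplication for more context at the edges.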

# Add the chunks to a vector database
from langchain_community.vectorstores import Chroma
from langchain_fireworks.embeddings import FireworksEmbeddings

# Create a Chroma vector store that embeds each chunk with FireworksEmbeddings
vectorstore = Chroma.from_documents(
    documents=all_splits,
    collection_name="rag-chroma",
    embedding=FireworksEmbeddings(),
)
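FireworksEmbeddings() is constructed here with no arguments, so the credentials have to come from somewhere else; the langchain-fireworks integration reads them from the FIREWORKS_API_KEY environment variable. A small guard like the sketch below can fail early with a clear message when the key is missing. The helper name require_env and the demo variable name are hypothetical and exist only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value


# Demo with a made-up variable so the sketch runs anywhere.
os.environ["DEMO_SERVICE_API_KEY"] = "demo-value"
key = require_env("DEMO_SERVICE_API_KEY")
print(key)
```

In the notebook itself the call would be require_env("FIREWORKS_API_KEY"), placed before the vector store is built.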

# Create a retriever that returns the chunks most similar to a query
retriever = vectorstore.as_retriever()
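Conceptually, the retriever embeds the query, compares it with the stored chunk embeddings, and returns the closest chunks. The sketch below illustrates that idea with a toy word-count embedding and cosine similarity. It is an assumption-laden stand-in, not how Chroma or FireworksEmbeddings work internally, and the names embed, cosine, and retrieve are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "gemma models are trained on text tokens",
    "the report lists evaluation benchmarks",
    "chroma stores embeddings for retrieval",
]
top = retrieve("which benchmarks does the report list", chunks, k=1)
print(top)  # ['the report lists evaluation benchmarks']
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a chunk that shares no words with it; the ranking step is the same in spirit.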

# Build the RAG prompt
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# RAG prompt template; {context} and {question} are filled in at run time
template = """
Answer the question based on the following context:
{context}

Question: {question}
"""
# Create a ChatPromptTemplate object from the template
prompt = ChatPromptTemplate.from_template(template)
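The template has two placeholders, and the prompt step does little more than substitute values into them. A minimal sketch of that substitution with plain string formatting follows. It assumes the retrieved chunks are plain strings joined by blank lines; in the actual chain the retriever returns Document objects. The names RAG_TEMPLATE and build_prompt are hypothetical.

```python
RAG_TEMPLATE = """Answer the question based on the following context:
{context}

Question: {question}
"""


def build_prompt(context_chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(context_chunks)
    return RAG_TEMPLATE.format(context=context, question=question)


filled = build_prompt(["chunk one", "chunk two"], "What is in the chunks?")
print(filled)
```

Because the model only sees this filled-in text, the quality of the answer depends directly on which chunks the retriever placed in the context slot.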

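# Create the language model (LLM) object.
# Note: this cell uses the Together integration (langchain_together) to serve
# Mixtral; it needs the langchain-together package and a Together API key.
from langchain_together import Together

llm = Together(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    temperature=0.0,
    max_tokens=2000,
    top_k=1,
)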

# Build the RAG chain: retrieve context, fill the prompt, call the LLM, parse to a string
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
)

# Invoke the RAG chain to answer a question
chain.invoke("What are the architectural details of Mixtral?")
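The chain passes the question through two branches in parallel (the retriever and a passthrough), then pipes the resulting dictionary through the prompt, the model, and the output parser. The sketch below mimics that data flow with plain functions. It is an illustration of the composition pattern only, not LangChain's implementation; parallel, pipe, and the fake_* stand-ins are hypothetical and make no network calls.

```python
from typing import Any, Callable


def parallel(branches: dict[str, Callable[[Any], Any]]) -> Callable[[Any], dict]:
    """Feed the same input to every branch and collect the results by name."""
    def run(value: Any) -> dict:
        return {name: fn(value) for name, fn in branches.items()}
    return run


def pipe(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right, like the | operator in the chain."""
    def run(value: Any) -> Any:
        for step in steps:
            value = step(value)
        return value
    return run


# Stand-ins for the retriever, prompt, and LLM.
def fake_retriever(question: str) -> list[str]:
    return [f"(chunk relevant to: {question})"]


def fake_prompt(inputs: dict) -> str:
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt_text: str) -> str:
    return "  ANSWER based on -> " + prompt_text + "  "


toy_chain = pipe(
    parallel({"context": fake_retriever, "question": lambda q: q}),
    fake_prompt,
    fake_llm,
    str.strip,  # plays the role of the output parser
)
result = toy_chain("What is Gemma?")
print(result)
```

The passthrough branch matters: without it the original question would be lost after retrieval, and the prompt would have nothing to put in its question slot.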

'\nAnswer: The architectural details of Mixtral are as follows:\n- Dimension (dim): 4096\n- Number of layers (n_layers): 32\n- Dimension of each head (head_dim): 128\n- Hidden dimension (hidden_dim): 14336\n- Number of heads (n_heads): 32\n- Number of kv heads (n_kv_heads): 8\n- Context length (context_len): 32768\n- Vocabulary size (vocab_size): 32000\n- Number of experts (num_experts): 8\n- Number of top k experts (top_k_experts): 2\n\nMixtral is based on a transformer architecture and uses the same modifications as described in [18], with the notable exceptions that Mixtral supports a fully dense context length of 32k tokens, and the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the "experts") to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token. Mixtral is pretrained with multilingual data using a context size of 32k tokens. It either matches or exceeds the performance of Llama 2 70B and GPT-3.5, over several benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.'

Trace link:
https://smith.langchain.com/public/935fd642-06a6-4b42-98e3-6074f93115cd/r

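Summary:

This document gives an example that combines Fireworks AI, LangChain, and retrieval-augmented generation (RAG). It downloads the Gemma technical report as a PDF, splits the text into chunks, adds the chunks to a vector database, and uses RAG to answer a question. The walkthrough shows how several libraries and tools fit together so that a language model can answer from retrieved passages rather than from its training data alone.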

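Further background:

Fireworks AI: an AI inference platform for serving generative models. It integrates with LangChain through the langchain-fireworks package and can be used as a component of a RAG pipeline. RAG (Retrieval-Augmented Generation) is a technique, not a single model, that combines retrieval and generation: relevant passages are retrieved first and supplied to the model as context, which strengthens the generated text.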
