
Build a RAG Application in 10 Minutes with Claude 3 and Hugging Face

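Among recent advances in artificial intelligence, RAG (retrieval-augmented generation) applications stand out as tools that are changing how many industries work. They give businesses stronger data analysis and prediction capabilities. By combining retrieval models with generation models, RAG applications streamline information retrieval and let researchers reach knowledge from many fields quickly.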

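Companies across industries are increasingly turning to RAG to get more out of AI. Pairing a retrieval component with a generation component helps them analyze large volumes of customer data, extract targeted insights, and drive personalized recommendations. This combination lets businesses apply generative AI to a wide range of use cases with more confidence. Architecturally, a RAG application couples information retrieval with a text generation model, so the model can draw on a database when it responds to user queries.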

By applying techniques such as embedding quantization and Retrieval Augmented Fine-Tuning (RAFT), developers can make RAG applications more robust and efficient.
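To make embedding quantization concrete, here is a minimal sketch of scalar (int8) quantization in plain Python. The function names and the toy vector are assumptions made for illustration; production systems typically rely on library support and calibrate the scale over a whole dataset rather than per vector.

```python
import math

def quantize_int8(vector):
    """Scale floats into the int8 range [-127, 127] using the vector's max magnitude."""
    scale = max(abs(x) for x in vector) or 1.0
    codes = [round(x / scale * 127) for x in vector]
    return codes, scale

def dequantize_int8(codes, scale):
    """Approximately recover the original floats from the int8 codes."""
    return [c / 127 * scale for c in codes]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embedding (hypothetical values).
embedding = [0.12, -0.50, 0.33, 0.08]
codes, scale = quantize_int8(embedding)
restored = dequantize_int8(codes, scale)

print(codes)                        # small integers instead of 32-bit floats
print(cosine(embedding, restored))  # stays very close to 1.0
```

Each dimension now fits in one byte instead of four, at the cost of a small rounding error that barely changes the similarity ranking.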

# How Claude Enhances RAG Models

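Claude 3, developed by Anthropic, is a family of advanced AI models that raised the bar for accuracy and context handling. The family includes three models, Haiku, Sonnet, and Opus, each designed for different needs. These models are known for strong factual accuracy and for keeping track of context over long conversations. Notably, according to Anthropic's published results, Opus outperforms GPT-4 on several benchmarks, including graduate-level reasoning and basic math. Claude 3 models accept a context window of up to 200,000 tokens, which lets them work with large text inputs such as many retrieved documents at once.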

[Figure: LLM benchmark comparison]

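Beyond performance, the Claude 3 models emphasize ethical AI use and safety. They are designed to reduce bias, protect user privacy, and respond accurately to complex prompts. They also accept both text and image input, which makes them versatile for applications such as content creation and data analysis. Thanks to strong partnerships and availability on multiple platforms, the Claude 3 models are well placed to support innovation and efficiency across many fields.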

# Why Hugging Face for RAG Models

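Hugging Face is a leading platform known for its extensive library of pre-trained models optimized for a wide range of natural language processing tasks, including retrieval-augmented generation (RAG). These models cover dense retrieval, sequence-to-sequence generation, and other advanced tasks, so developers can set up and experiment with different RAG configurations easily. The platform's user-friendly APIs and thorough documentation further simplify integration and fine-tuning, so even developers who are new to RAG can implement an effective system quickly. By combining Hugging Face models with tools such as LangChain and Ray, developers can build scalable, high-performance RAG systems that handle large-scale data and complex queries.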

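In this blog, we focus on Hugging Face embedding models. Hugging Face offers many embedding models, and they drive the first step of a RAG system: retrieving relevant documents. High-quality embeddings are what make retrieval accurate and efficient. With a good embedding model, the RAG system retrieves the context most relevant to a given query, which improves the accuracy of the generated response. This choice also lets us draw on the robustness and versatility of the Hugging Face ecosystem while tailoring the implementation to our specific embedding needs.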
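The retrieval step described above comes down to comparing a query vector with document vectors. The sketch below shows that idea with cosine similarity over hand-written three-dimensional vectors; the vectors and the `top_k` helper are illustrative assumptions, whereas a real embedding model produces vectors with hundreds of dimensions and a vector database performs the search.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" (hypothetical values, one vector per document).
doc_vecs = [
    [0.9, 0.1, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.7, 0.6, 0.1],  # doc 2
]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for index, score in results:
    print(f"doc {index}: {score:.3f}")
```

Documents whose vectors point in nearly the same direction as the query rank first, which is why embedding quality directly affects what context the language model sees.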


# Building a RAG Application with Claude 3 and Hugging Face Embeddings

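Now let's build a RAG application with Claude 3 and a Hugging Face embedding model. We will walk through setting up the environment, loading documents, creating embeddings, and using a Claude 3 model to generate answers from the retrieved documents.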

# Setting Up the Environment

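First, we need to install the necessary libraries. Run the command below to install the required packages. If they are already installed on your system, you can skip this step.

pip install langchain langchain-community langchain-huggingface langchain-text-splitters sentence-transformers anthropic wikipedia clickhouse-connect

This installs LangChain with its community, Hugging Face, and text-splitter packages, along with sentence-transformers and anthropic. The wikipedia and clickhouse-connect packages are needed by the Wikipedia loader and the MyScale vector store used later.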

# Setting Environment Variables

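Next, we set the environment variables for the MyScale and Claude 3 API connections. We use MyScaleDB as the vector database for this RAG application because it offers high-performance, scalable, and efficient data retrieval.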

import os
# Vector database connection settings
os.environ["MYSCALE_HOST"] = "your-host-name"
os.environ["MYSCALE_PORT"] = "443"
os.environ["MYSCALE_USERNAME"] = "your-user-name"
os.environ["MYSCALE_PASSWORD"] = "your-password-here"

# API key for Claude 3
os.environ["ANTHROPIC_API_KEY"] = "your-claude-api-key-here"

This script sets the environment variables required to connect to the MyScale vector database and the Claude 3 API.
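Because the values above are placeholders, it can help to fail early if one was left unchanged. The following is a small sketch of such a check; the `missing_settings` helper and the rule that values starting with `your-` count as placeholders are assumptions made for this example, not part of MyScale or the Anthropic SDK.

```python
import os

REQUIRED_VARS = [
    "MYSCALE_HOST",
    "MYSCALE_PORT",
    "MYSCALE_USERNAME",
    "MYSCALE_PASSWORD",
    "ANTHROPIC_API_KEY",
]

def missing_settings(env=os.environ):
    """Return the names of variables that are unset, empty, or still a 'your-...' placeholder."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value.startswith("your-"):
            missing.append(name)
    return missing

# Example with an explicit mapping instead of the real environment.
example_env = {
    "MYSCALE_HOST": "example-host",
    "MYSCALE_PORT": "443",
    "MYSCALE_USERNAME": "alice",
    "MYSCALE_PASSWORD": "your-password-here",  # placeholder left in by mistake
    "ANTHROPIC_API_KEY": "example-key",
}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no argument checks the real process environment, so it can run right after the assignments above.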

Note:

If you don't have a MyScaleDB account, you can sign up for a free one on the MyScale website and follow the quickstart guide. To use the Claude 3 API, you need to create an account on the Claude Console.

# Importing the Necessary Libraries

Then we import the libraries needed to load documents, split them, create embeddings, and interact with the Claude 3 model.

# Splits documents into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Uses MyScale as the vector database
from langchain_community.vectorstores import MyScale
# Creates embeddings with Hugging Face models
from langchain_huggingface import HuggingFaceEmbeddings
# Loads Wikipedia pages
from langchain_community.document_loaders.wikipedia import WikipediaLoader
# Official Anthropic SDK for calling Claude 3
import anthropic

These libraries help us manage documents, create embeddings, and interact with the Claude 3 model.

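# Initializing the Claude 3 Client

Initialize the client that communicates with the Claude 3 API. With no arguments, it reads the API key from the ANTHROPIC_API_KEY environment variable.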

client = anthropic.Anthropic()

# Loading Documents

For this example, we use WikipediaLoader to load documents from Wikipedia.

loader = WikipediaLoader(query="Fifa")

# Fetch the matching Wikipedia pages
docs = loader.load()

This loads the Wikipedia documents related to "Fifa".

# Splitting Documents

We use RecursiveCharacterTextSplitter to split the loaded documents into manageable chunks.

character_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
docs = character_splitter.split_documents(docs)

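This splits the documents into chunks of at most 1,000 characters, with up to 200 characters of overlap between neighboring chunks so that text cut at a boundary still appears in full in one of them.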
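To show how the chunk size and overlap interact, here is a deliberately simplified sketch that slices a string at fixed offsets. It is an illustration only and not how RecursiveCharacterTextSplitter works internally: the real splitter first tries to break on separators such as paragraph and line breaks. The `split_with_overlap` name and the sample string are assumptions.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Small numbers make the overlap easy to see.
sample = "".join(str(i % 10) for i in range(25))
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 10 and an overlap of 4, each chunk starts 6 characters after the previous one, so the last 4 characters of one chunk reappear at the start of the next.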

# Creating Embeddings

Next, we use a Hugging Face embedding model to create embeddings for our document chunks.

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
# Pass the embeddings object itself; MyScale reads its connection settings from the MYSCALE_* variables
docsearch = MyScale(embeddings)
docsearch.add_documents(docs)

This step generates embeddings for the document chunks and adds them to our MyScale vector store.

# Testing the Document Search

We test the document search to make sure the embeddings and the vector store work correctly.

query = "Who won Fifa 2022?"
docs = docsearch.similarity_search(query, k=3)
docs

This runs a similarity search for the query "Who won Fifa 2022?" and retrieves the three most relevant chunks. It returns documents similar to the following:

[Document(page_content="The FIFA World Cup, often called the World Cup, is an international association football competition among the senior men's national teams of the members of the Fédération Internationale de Football Association (FIFA), the sport's global governing body. The tournament has been held every four years since the inaugural tournament in 1930, with the exception of 1942 and 1946 due to the Second World War. The reigning champions are Argentina, who won their third title at the 2022 tournament.\\nThe contest starts with the qualification phase, which takes place over the preceding three years to determine which teams qualify for the tournament phase. In the tournament phase, 32 teams compete for the title at venues within the host nation(s) over the course of about a month. The host nation(s) automatically qualify for the group stage of the tournament. The competition is scheduled to expand to 48 teams, starting with the 2026 tournament.", metadata={'source': '<https://en.wikipedia.org/wiki/FIFA_World_Cup>', 'summary': "The FIFA World Cup, often called the World Cup, is an international association football competition among the senior men's national teams of the members of the Fédération Internationale de Football Association (FIFA), the sport's global governing body. The tournament has been held every four years since the inaugural tournament in 1930, with the exception of 1942 and 1946 due to the Second World War. The reigning champions are Argentina, who won their third title at the 2022 tournament.\\nThe contest starts with the qualification phase, which takes place over the preceding three years to determine which teams qualify for the tournament phase. In the tournament phase, 32 teams compete for the title at venues within the host nation(s) over the course of about a month. The host nation(s) automatically qualify for the group stage of the tournament. 
The competition is scheduled to expand to 48 teams, starting with the 2026 tournament.\\nAs of the 2022 FIFA World Cup, 22 final tournaments have been held since the event's inception in 1930, and a total of 80 national teams have competed. The trophy has been won by eight national teams. With five wins, Brazil  is the only team to have played in every tournament. The other World Cup winners are Germany and Italy, with four titles each; Argentina, with three titles; France and inaugural winner Uruguay, each with two titles; and England and Spain, with one title each.\\nThe World Cup is the most prestigious association football tournament in the world, as well as the most widely viewed and followed single sporting event in the world. The viewership of the 2018 World Cup was estimated to be 3.57 billion, close to half of the global population, while the engagement with the 2022 World Cup was estimated to be 5 billion, with about 1.5 billion people watching the final match.\\nSeventeen countries have hosted the World Cup, most recently Qatar, who hosted the 2022 event. The 2026 tournament will be jointly hosted by Canada, the United States and Mexico, which will give Mexico the distinction of being the first country to host games in three World Cups.\\n\\n", 'title': 'FIFA World Cup'}),
 Document(page_content="The 2026 FIFA World Cup, marketed as FIFA World Cup 26, will be the 23rd FIFA World Cup, the quadrennial international men's soccer championship contested by the national teams of the member associations of FIFA. The tournament will take place from June 11 to July 19, 2026. It will be jointly hosted by 16 cities in three North American countries: Canada, Mexico, and the United States. The tournament will be the first hosted by three nations and the first North American World Cup since 1994. Argentina is the defending champion.", metadata={'source': '<https://en.wikipedia.org/wiki/2026_FIFA_World_Cup>', 'summary': "The 2026 FIFA World Cup, marketed as FIFA World Cup 26, will be the 23rd FIFA World Cup, the quadrennial international men's soccer championship contested by the national teams of the member associations of FIFA. The tournament will take place from June 11 to July 19, 2026. It will be jointly hosted by 16 cities in three North American countries: Canada, Mexico, and the United States. The tournament will be the first hosted by three nations and the first North American World Cup since 1994. Argentina is the defending champion.\\nThis tournament will be the first to include 48 teams, expanded from 32. The United 2026 bid beat a rival bid by Morocco during a final vote at the 68th FIFA Congress in Moscow. It will be the first World Cup since 2002 to be hosted by more than one nation. With its past hosting of the 1970 and 1986 tournaments, Mexico will become the first country to host or co-host the men's World Cup three times. The United States last hosted the men's World Cup in 1994, whereas it will be Canada's first time hosting or co-hosting the men's tournament. The event will also return to its traditional northern summer schedule after the 2022 edition in Qatar was held in November and December.", 'title': '2026 FIFA World Cup'}),
 Document(page_content="The 2022 FIFA World Cup was the 22nd FIFA World Cup, the world championship for national football teams organized by FIFA. It took place in Qatar from 20 November to 18 December 2022, after the country was awarded the hosting rights in 2010. It was the first World Cup to be held in the Arab world and Muslim world, and the second held entirely in Asia after the 2002 tournament in South Korea and Japan.\\nThis tournament was the last with 32 participating teams, with the number of teams being increased to 48 for the 2026 edition. To avoid the extremes of Qatar's hot climate, the event was held in November and December instead of during the traditional months of May, June, or July. It was held over a reduced time frame of 29 days with 64 matches played in eight venues across five cities. Qatar entered the event—their first World Cup—automatically as the host's national team, alongside 31 teams determined by the qualification process.", metadata={'source': '<https://en.wikipedia.org/wiki/2022_FIFA_World_Cup>', 'summary': "The 2022 FIFA World Cup was the 22nd FIFA World Cup, the world championship for national football teams organized by FIFA. It took place in Qatar from 20 November to 18 December 2022, after the country was awarded the hosting rights in 2010. It was the first World Cup to be held in the Arab world and Muslim world, and the second held entirely in Asia after the 2002 tournament in South Korea and Japan.\\nThis tournament was the last with 32 participating teams, with the number of teams being increased to 48 for the 2026 edition. To avoid the extremes of Qatar's hot climate, the event was held in November and December instead of during the traditional months of May, June, or July. It was held over a reduced time frame of 29 days with 64 matches played in eight venues across five cities. 
Qatar entered the event—their first World Cup—automatically as the host's national team, alongside 31 teams determined by the qualification process.\\nArgentina were crowned the champions after winning the final against the title holder France 4–2 on penalties following a 3–3 draw after extra time. It was Argentina's third title and their first since 1986, as well as being the first nation from outside of Europe to win the tournament since 2002. French player Kylian Mbappé became the first player to score a hat-trick in a World Cup final since Geoff Hurst in the 1966 final and won the Golden Boot as he scored the most goals (eight) during the tournament. Mbappé also became the first player to score in two consecutive finals since Vavá of Brazil did the same in 1958 and 1962. Argentine captain Lionel Messi was voted the tournament's best player, winning the Golden Ball. The tournament has been considered exceptionally poetic as the capstone of his career, for some commentators fulfilling a previously unmet criterion to be regarded as one of the greatest players of all time. Teammates Emiliano Martínez and Enzo Fernández won the Golden Glove, awarded to the tournament's best goalkeeper; and the Young Player Award, awarded to the tournament's best young player, respectively. With 172 goals, the tournament set a record for the highest number of goals scored in the 32-team format, with every participating team scoring at least one goal. Morocco became the first African nation to top Group stages with 7 points.\\nThe choice to host the World Cup in Qatar attracted significant criticism, with concerns raised over the country's treatment of migrant workers, women, and members of the LGBT community, as well as Qatar's climate, lack of a strong football culture, scheduling changes, and allegations of bribery for hosting rights and wider FIFA corruption.", 'title': '2022 FIFA World Cup'})]
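Each retrieved document carries `title` and `source` metadata, as the output above shows, so an application can list where its answer came from. The sketch below collects unique sources in rank order. The `Doc` class is a hypothetical stand-in that mimics the `page_content` and `metadata` attributes so the example runs without LangChain, and `unique_sources` is an illustrative helper.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for a retrieved document (hypothetical, for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_sources(documents):
    """Return (title, source) pairs in first-seen order, skipping duplicates."""
    seen = set()
    sources = []
    for doc in documents:
        key = (doc.metadata.get("title"), doc.metadata.get("source"))
        if key not in seen:
            seen.add(key)
            sources.append(key)
    return sources

# Several chunks can come from the same page, so duplicates are expected.
retrieved = [
    Doc("...", {"title": "FIFA World Cup", "source": "https://en.wikipedia.org/wiki/FIFA_World_Cup"}),
    Doc("...", {"title": "2022 FIFA World Cup", "source": "https://en.wikipedia.org/wiki/2022_FIFA_World_Cup"}),
    Doc("...", {"title": "FIFA World Cup", "source": "https://en.wikipedia.org/wiki/FIFA_World_Cup"}),
]

for title, source in unique_sources(retrieved):
    print(f"{title}: {source}")
```

The same function should work on the real search results, since it only relies on the `metadata` dictionary of each item.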

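The next step is to use the Claude 3 model to get the answer we want from the retrieved context.

# Preparing the Query Context

We prepare the context for the Claude 3 model from the retrieved documents.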

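stre = "\n\n".join(doc.page_content for doc in docs)

This joins the content of the retrieved documents into a single string, with a blank line between documents so that passages do not run into each other.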
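When many chunks are retrieved, the joined context can grow large. One option, sketched below, is to add chunks in rank order until a character budget is reached. The `build_context` helper, its default budget, and its separator are assumptions for illustration; a token-based budget would be more precise but needs a tokenizer.

```python
def build_context(chunks, max_chars=6000, separator="\n\n---\n\n"):
    """Join chunks in rank order, stopping before the character budget is exceeded."""
    parts = []
    used = 0
    for chunk in chunks:
        # Every chunk after the first also pays for one separator.
        extra = len(chunk) + (len(separator) if parts else 0)
        if used + extra > max_chars:
            break
        parts.append(chunk)
        used += extra
    return separator.join(parts)

# Tiny budget so the cut-off is visible.
context = build_context(["alpha", "beta", "gamma"], max_chars=12, separator="|")
print(context)
```

Because the highest-ranked chunks come first, trimming from the end drops the least relevant text.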

# Setting the Model

Specify the model we want to use to generate the answer.

model = 'claude-3-opus-20240229'

# Generating the Answer

Finally, we use the Claude 3 model to generate an answer based on the provided context.

response = client.messages.create(
    system="You are a helpful research assistant. You will be shown data from a vast knowledge base. You have to answer the query from the provided context.",
    messages=[
        # The user turn carries both the retrieved context and the question
        {"role": "user", "content": "Context: " + stre + "\n\n Query: " + query},
    ],
    model=model,
    temperature=0,
    max_tokens=160,
)
response.content[0].text

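This script sends the context and the query to the Claude 3 model and retrieves the generated response. You will get output similar to the following:

'Based on the provided context, Argentina won the 2022 FIFA World Cup and are the reigning champions. The passage states:\n\n"The reigning champions are Argentina, who won their third title at the 2022 tournament."'

By following these steps, we have built a RAG application that uses Hugging Face embeddings and a Claude 3 model to retrieve information from a knowledge base and generate answers from it.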
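The whole flow can be viewed as two pluggable steps: retrieve passages, then generate from them. The sketch below expresses that shape with stub functions so it runs offline. The `answer`, `fake_retrieve`, and `fake_generate` names are assumptions; in the application above, the retrieve step would wrap `docsearch.similarity_search` and the generate step would wrap `client.messages.create`.

```python
def answer(query, retrieve, generate, k=3):
    """Retrieve k passages, build a prompt from them, and return the generated answer."""
    passages = retrieve(query, k)
    context = "\n\n".join(passages)
    prompt = f"Context: {context}\n\nQuery: {query}"
    return generate(prompt)

# Stubs standing in for the vector search and the model call.
def fake_retrieve(query, k):
    corpus = [
        "Argentina won the 2022 FIFA World Cup.",
        "Qatar hosted the 2022 tournament.",
    ]
    return corpus[:k]

def fake_generate(prompt):
    # A real implementation would send the prompt to the model; this stub echoes its first line.
    return prompt.splitlines()[0]

result = answer("Who won Fifa 2022?", fake_retrieve, fake_generate, k=1)
print(result)
```

Keeping the two steps as parameters makes it straightforward to swap the embedding model, the vector store, or the language model without touching the rest of the pipeline.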

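# Why MyScaleDB for RAG Models

MyScaleDB is a strong choice for RAG applications because it combines advanced vector search with SQL support. It is built on ClickHouse and integrates vector and structured data seamlessly, which allows complex queries to run efficiently. This integration simplifies data management and improves the performance and accuracy of a RAG system, making it easier to work with large volumes of data.

Another important advantage of MyScaleDB is its cost-effectiveness and scalability. It is designed to be one of the most affordable vector databases on the market while still delivering strong performance compared with competitors. It offers advanced vector indexing techniques such as Multi-Scale Tree Graph (MSTG), which reduces resource consumption and improves precision. Dynamic batching and multi-threading support highly concurrent requests while keeping latency low and throughput high. These features make MyScaleDB a reliable and efficient database for scalable RAG applications.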


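# Conclusion

Developing a simple RAG application is relatively easy, but optimizing it and keeping its responses accurate is challenging. That is why choosing the right tools, including the LLM, the vector database, and the embedding model, matters so much. Each component plays a key role in making the application run efficiently and return accurate, relevant responses.

The choice of vector database is especially important because it supplies the LLM with the context it needs. MyScaleDB has been shown to outperform leading vector databases in performance and accuracy. In addition, MyScaleDB gives new users free storage for 5 million vectors, so they can test its features and see the benefits for themselves. This makes it an excellent choice for anyone who wants to build a high-performance RAG system.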
