Pinecone

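The content of this site (springdoc.cn) comes from spring.io, and the original copyright belongs to spring.io. It is translated and organized by springdoc.cn. It may be used for personal study and research; without permission, any reproduction, commercial use, or related activity is prohibited. Trademark notice: Spring is a trademark of Pivotal Software, Inc. in the United States and other countries.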

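This section walks you through setting up the Pinecone VectorStore to store document embeddings and perform similarity searches.

Pinecone is a popular cloud-native vector database that supports efficient storage and search of vectors.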

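Prerequisites

Pinecone account: Before you start, sign up for a Pinecone account.
Pinecone project: Once registered, generate an API key and create an index. You will need these details for configuration.
An EmbeddingModel instance to compute the document embeddings. Several implementations are available.
- If the chosen EmbeddingModel requires one, an API key for it, so that it can generate the embeddings stored by the PineconeVectorStore.

To set up the PineconeVectorStore, gather the following details from your Pinecone account:

Pinecone API Key
Pinecone Index Name
Pinecone Namespace

This information is available in the Pinecone management console. Namespaces are not available in the Pinecone free tier.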

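Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. See the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Pinecone vector store. To enable it, add the following dependency to your project's Maven pom.xml file: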

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pinecone</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-pinecone'
}

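See the "Dependency Management" section to add the Spring AI BOM to your build file.

See the "Artifact Repositories" section to add Maven Central and/or snapshot repositories to your build file.

In addition, you need a configured EmbeddingModel bean. See the EmbeddingModel section for more information.

Here is an example of the required bean: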

@Bean
public EmbeddingModel embeddingModel() {
    // Can be any other EmbeddingModel implementation.
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}

To connect to Pinecone, you need to provide the access details for your instance. A simple configuration can be provided via Spring Boot's application.properties:

spring.ai.vectorstore.pinecone.api-key=<your api key>
spring.ai.vectorstore.pinecone.index-name=<your index name>

# API key if needed, e.g. OpenAI
spring.ai.openai.api-key=<api-key>

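See the list of configuration properties for the vector store to learn about the default values and configuration options.

Now you can auto-wire the Pinecone vector store in your application and use it: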

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = this.vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());
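To build intuition for what a similarity search with topK and a similarity threshold returns, here is a minimal, library-free sketch. It is a conceptual illustration only: the class and method names are hypothetical, it uses brute-force cosine similarity over an in-memory map, and it is not how Pinecone or Spring AI implement the search. It also assumes non-zero vectors of equal length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Spring AI or the Pinecone client.
class SimilaritySearchSketch {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Brute-force search: score every stored vector against the query,
     * drop scores below the threshold, and return the ids of the best topK,
     * ordered from most to least similar.
     */
    static List<String> search(double[] query, Map<String, double[]> store, int topK, double threshold) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, double[]> entry : store.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score >= threshold) {
                scores.put(entry.getKey(), score);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(topK, ids.size())));
    }

    public static void main(String[] args) {
        // Toy two-dimensional "embeddings"; real embeddings have hundreds or thousands of dimensions.
        Map<String, double[]> store = Map.of(
                "a", new double[] {1, 0},
                "b", new double[] {1, 1},
                "c", new double[] {0, 1});
        System.out.println(search(new double[] {1, 0}, store, 5, 0.5));
    }
}
```

In a real deployment the embedding model produces the vectors and Pinecone performs the nearest-neighbour lookup on the server side, so none of this code is needed in an application.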

Configuration properties

You can use the following properties in your Spring Boot configuration to customize the Pinecone vector store.

Property	Description	Default value
`spring.ai.vectorstore.pinecone.api-key`	Pinecone API Key	-
`spring.ai.vectorstore.pinecone.index-name`	Pinecone index name	-
`spring.ai.vectorstore.pinecone.namespace`	Pinecone namespace	-
`spring.ai.vectorstore.pinecone.content-field-name`	Name of the Pinecone metadata field used to store the original text content.	`document_content`
`spring.ai.vectorstore.pinecone.distance-metadata-field-name`	Name of the Pinecone metadata field used to store the computed distance.	`distance`
`spring.ai.vectorstore.pinecone.server-side-timeout`	Server-side timeout	20 sec.

Metadata filtering

You can use the generic, portable metadata filters with the Pinecone store.

For example, you can use the text expression language:

vectorStore.similaritySearch(
    SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression("author in ['john', 'jill'] && article_type == 'blog'").build());

or programmatically, using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression(b.and(
        b.in("author","john", "jill"),
        b.eq("article_type", "blog")).build()).build());

These filter expressions are converted into the equivalent Pinecone filters.
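To make "equivalent Pinecone filter" more concrete, the sketch below builds the kind of structure that Pinecone's metadata filter syntax uses for the expression above, with the `$and`, `$in`, and `$eq` operators. The helper class and its methods are hypothetical, and the exact structure that Spring AI's converter emits is an assumption here, not something this page confirms.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; Spring AI performs this conversion internally.
class PineconeFilterSketch {

    /** Builds {"field": {"$in": [values...]}}. */
    static Map<String, Object> in(String field, Object... values) {
        return Map.of(field, Map.of("$in", List.of(values)));
    }

    /** Builds {"field": {"$eq": value}}. */
    static Map<String, Object> eq(String field, Object value) {
        return Map.of(field, Map.of("$eq", value));
    }

    /** Builds {"$and": [clause, clause, ...]}. */
    @SafeVarargs
    static Map<String, Object> and(Map<String, Object>... clauses) {
        return Map.of("$and", List.of(clauses));
    }

    /** Pinecone-style form of: author in ['john', 'jill'] && article_type == 'blog'. */
    static Map<String, Object> example() {
        return and(in("author", "john", "jill"), eq("article_type", "blog"));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Because the portable expression is translated for you, application code normally never builds this structure by hand; it is shown only to clarify what the store receives.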

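Manual configuration

If you prefer to configure the PineconeVectorStore manually, you can do so with the PineconeVectorStore#Builder.

Add the following dependencies to your project:

OpenAI: required for computing embeddings.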

<dependency>
	<groupId>org.springframework.ai</groupId>
	<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Pinecone

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pinecone-store</artifactId>
</dependency>

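See the "Dependency Management" section to add the Spring AI BOM to your build file.

Sample code

To configure Pinecone in your application, you can use the following setup: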

@Bean
public VectorStore pineconeVectorStore(EmbeddingModel embeddingModel) {
    return PineconeVectorStore.builder(embeddingModel)
            .apiKey(PINECONE_API_KEY)
            .indexName(PINECONE_INDEX_NAME)
            .namespace(PINECONE_NAMESPACE) // the free tier doesn't support namespaces.
            .contentFieldName(CUSTOM_CONTENT_FIELD_NAME) // optional field to store the original content. Defaults to `document_content`
            .build();
}
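The constants used in the builder call above (PINECONE_API_KEY, PINECONE_INDEX_NAME, and so on) are not defined on this page. One possible way to supply them, sketched below, is to read them from environment variables and fail fast when a required one is missing. The class, the method names, and the environment variable names are all hypothetical choices for illustration.

```java
import java.util.Map;

// Hypothetical helper; the variable names below are assumptions, not Spring AI conventions.
class PineconeSettingsSketch {

    /** Returns a required setting, failing fast with a clear message if it is absent or blank. */
    static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required setting: " + name);
        }
        return value;
    }

    /** Returns an optional setting, or the fallback if it is absent or blank. */
    static String optional(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isBlank()) ? fallback : value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        // `document_content` is the documented default for the content field name.
        String contentField = optional(env, "PINECONE_CONTENT_FIELD", "document_content");
        System.out.println("content field: " + contentField);
        // In a real configuration class you would also call, for example:
        // String apiKey = required(env, "PINECONE_API_KEY");
    }
}
```

Passing the environment in as a map keeps the lookup easy to test; in a Spring Boot application, binding the values through configuration properties is usually the more idiomatic choice.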

In your code, create some documents:

List<Document> documents = List.of(
	new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
	new Document("The World is Big and Salvation Lurks Around the Corner"),
	new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

Add the documents to Pinecone:

vectorStore.add(documents);

Finally, retrieve documents similar to a query:

List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());

If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".

Accessing the native client

The Pinecone vector store implementation provides access to the underlying native Pinecone client (PineconeConnection) through the getNativeClient() method:

PineconeVectorStore vectorStore = context.getBean(PineconeVectorStore.class);
Optional<PineconeConnection> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    PineconeConnection client = nativeClient.get();
    // Use the native client for Pinecone-specific operations
}

The native client gives you access to Pinecone-specific features and operations that might not be exposed through the VectorStore interface.