Azure AI Service

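The content on this site (springdoc.cn) is sourced from spring.io, and the original copyright belongs to spring.io. It was translated and edited by springdoc.cn and is provided for personal study and research. Any reproduction, commercial use, or related activity without permission is prohibited. Trademark notice: Spring is a trademark of Pivotal Software, Inc. in the United States and other countries.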

This section walks you through setting up AzureVectorStore to store document embeddings and to perform similarity searches using the Azure AI Search service.

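Azure AI Search is a versatile, cloud-hosted information retrieval system that is part of Microsoft's AI platform. Among other features, it supports vector-based storage and querying.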

Prerequisites

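Azure subscription: You need an active Azure subscription to use any Azure service.
Azure AI Search service: Create an AI Search service. Once the service is created, obtain the admin apiKey from the "Keys" section under "Settings", and the endpoint from the "Url" field in the "Overview" section.
(Optional) Azure OpenAI service: Create an Azure OpenAI service. Note: you may need to fill out a separate application form to gain access to Azure OpenAI. Once the service is created, obtain the apiKey and endpoint from the "Keys and Endpoint" section under "Resource Management".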

Configuration

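On startup, AzureVectorStore can attempt to create a new index in your AI Search service instance if you opt in by setting the initialize-schema boolean property to true in the constructor or, when using Spring Boot, by setting …initialize-schema=true in your application.properties file.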

This is a breaking change! In earlier versions of Spring AI, this schema initialization happened by default.

Alternatively, you can create the index manually.

To set up an AzureVectorStore, you need the settings obtained from the prerequisites above, along with your index name:

Azure AI Search Endpoint
Azure AI Search Key
(Optional) Azure OpenAI API Endpoint
(Optional) Azure OpenAI API Key

You can provide these values as operating system environment variables.

export AZURE_AI_SEARCH_API_KEY=<My AI Search API Key>
export AZURE_AI_SEARCH_ENDPOINT=<My AI Search Endpoint>
export OPENAI_API_KEY=<My Azure AI API Key>  # optional
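As a minimal sketch, the settings above can be read from the environment and validated up front, so that a missing key fails fast at startup instead of on the first search. The `AzureSearchSettings` record and its `fromEnv` helper are hypothetical and not part of Spring AI; only the environment variable names come from the export lines above. The helper takes a `Map` so it can be called with `System.getenv()` or with a test map.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder for the connection settings listed above (not a Spring AI type). */
record AzureSearchSettings(String endpoint, String apiKey, Optional<String> openAiApiKey) {

    /** Reads the settings from a variable map such as System.getenv(); fails fast on required keys. */
    static AzureSearchSettings fromEnv(Map<String, String> env) {
        return new AzureSearchSettings(
                required(env, "AZURE_AI_SEARCH_ENDPOINT"),
                required(env, "AZURE_AI_SEARCH_API_KEY"),
                // The OpenAI key is optional: another embedding implementation may be used instead.
                Optional.ofNullable(env.get("OPENAI_API_KEY")).filter(v -> !v.isBlank()));
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}

class AzureSearchSettingsDemo {
    public static void main(String[] args) {
        try {
            AzureSearchSettings settings = AzureSearchSettings.fromEnv(System.getenv());
            System.out.println("Endpoint: " + settings.endpoint());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the required/optional split in one place mirrors the prerequisites: the AI Search endpoint and key are mandatory, while the OpenAI key only matters if you pick that embedding implementation.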

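You can replace the Azure OpenAI implementation with any valid implementation of the Embeddings interface. For example, you could use Spring AI's OpenAI or TransformersEmbedding implementation instead of the Azure one to compute the embeddings.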
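The substitution described above works because the vector store depends only on the embedding abstraction, not on a specific provider. The sketch below illustrates that idea with a hypothetical `TextEmbedder` interface, a hypothetical `ToyVectorStore`, and a toy letter-count embedder; in Spring AI the real abstraction is `EmbeddingModel`, and the toy implementations merely stand in for OpenAI, Azure OpenAI, or Transformers.

```java
import java.util.List;

/** Hypothetical stand-in for Spring AI's EmbeddingModel abstraction. */
interface TextEmbedder {
    float[] embed(String text);
}

/** Toy embedder (illustration only): counts letters a-z into a 26-dimensional vector. */
class LetterCountEmbedder implements TextEmbedder {
    @Override
    public float[] embed(String text) {
        float[] vector = new float[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                vector[c - 'a']++;
            }
        }
        return vector;
    }
}

/** Hypothetical store that only knows the interface, so any implementation can be injected. */
class ToyVectorStore {
    private final TextEmbedder embedder;

    ToyVectorStore(TextEmbedder embedder) {
        this.embedder = embedder;
    }

    int dimensionsFor(String text) {
        return embedder.embed(text).length;
    }
}

class EmbedderSwapDemo {
    public static void main(String[] args) {
        // A second toy embedder: a one-dimensional vector holding the text length.
        TextEmbedder lengthOnly = text -> new float[] {text.length()};
        // Swapping providers means passing a different TextEmbedder; the store code is unchanged.
        List<TextEmbedder> embedders = List.of(new LetterCountEmbedder(), lengthOnly);
        for (TextEmbedder embedder : embedders) {
            System.out.println(new ToyVectorStore(embedder).dimensionsFor("Spring AI rocks!!"));
        }
    }
}
```

Note that different embedding models generally produce vectors of different dimensions, so switching providers usually also means re-embedding and re-indexing the stored documents.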

Dependencies

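There has been a significant change in the artifact names of the Spring AI auto-configuration and starter modules. Please refer to the upgrade notes for more information.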

Add these dependencies to your project:

1. Select an Embeddings interface implementation. You can choose one of:

OpenAI Embedding
Azure AI Embedding
Local Sentence Transformers Embedding

<!-- OpenAI Embedding -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

<!-- Azure AI Embedding -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-azure-openai</artifactId>
</dependency>

<!-- Local Sentence Transformers Embedding -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-transformers</artifactId>
</dependency>

2. Azure (AI Search) Vector Store

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-store</artifactId>
</dependency>

Refer to the "Dependency Management" section to add the Spring AI BOM to your build file.

Configuration Properties

You can use the following properties in your Spring Boot configuration to customize the Azure vector store:

Property	Default value
`spring.ai.vectorstore.azure.url`
`spring.ai.vectorstore.azure.api-key`
`spring.ai.vectorstore.azure.useKeylessAuth`	false
`spring.ai.vectorstore.azure.initialize-schema`	false
`spring.ai.vectorstore.azure.index-name`	spring_ai_azure_vector_store
`spring.ai.vectorstore.azure.default-top-k`	4
`spring.ai.vectorstore.azure.default-similarity-threshold`	0.0
`spring.ai.vectorstore.azure.embedding-property`	embedding

Sample Code

To configure an Azure SearchIndexClient in your application, you can use the following code:

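import com.azure.core.credential.AzureKeyCredential;
import com.azure.search.documents.indexes.SearchIndexClient;
import com.azure.search.documents.indexes.SearchIndexClientBuilder;
import org.springframework.context.annotation.Bean;

@Bean
public SearchIndexClient searchIndexClient() {
  // Endpoint and admin key are read from the environment variables exported earlier
  return new SearchIndexClientBuilder().endpoint(System.getenv("AZURE_AI_SEARCH_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_AI_SEARCH_API_KEY")))
    .buildClient();
}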

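To create a vector store, you can use the following code, injecting the SearchIndexClient bean created in the sample above along with an EmbeddingModel provided by the Spring AI library that implements the desired Embeddings interface.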

@Bean
public VectorStore vectorStore(SearchIndexClient searchIndexClient, EmbeddingModel embeddingModel) {

  return AzureVectorStore.builder(searchIndexClient, embeddingModel)
    .initializeSchema(true)
    // Define the metadata fields to be used
    // in the similarity search filters.
    .filterMetadataFields(List.of(MetadataField.text("country"), MetadataField.int64("year"),
            MetadataField.date("activationDate")))
    .defaultTopK(5)
    .defaultSimilarityThreshold(0.7)
    .indexName("spring-ai-document-index")
    .build();
}

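You must explicitly list the names and types of all metadata fields used in filter expressions. The list above registers the filterable metadata fields country of type TEXT, year of type INT64, and activationDate of type DATE.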

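If the list of filterable metadata fields is extended with new entries, you have to (re)upload or update the documents that carry this metadata.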
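To make this rule concrete, the sketch below checks, before a search is issued, that every metadata key a filter refers to has been registered as a filterable field. It is a hypothetical helper, not part of Spring AI: the field names and types mirror the `filterMetadataFields` list in the example, and the `FieldType` enum is an assumed stand-in for the types behind `MetadataField`.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class FilterFieldCheck {

    /** Hypothetical subset of field types, mirroring MetadataField.text/int64/date in the example. */
    enum FieldType { TEXT, INT64, DATE }

    /** Returns the filter keys that were never registered as filterable metadata fields. */
    static Set<String> undeclaredKeys(Map<String, FieldType> declared, Set<String> keysUsedInFilter) {
        Set<String> missing = new LinkedHashSet<>(keysUsedInFilter);
        missing.removeAll(declared.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Map<String, FieldType> declared = Map.of(
                "country", FieldType.TEXT,
                "year", FieldType.INT64,
                "activationDate", FieldType.DATE);
        // "author" was never registered, so filtering on it would require extending the field list
        // and re-uploading the documents that carry it.
        System.out.println(undeclaredKeys(declared, Set.of("country", "author")));
    }
}
```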

Create some documents in your code:

List<Document> documents = List.of(
	new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("country", "BG", "year", 2020)),
	new Document("The World is Big and Salvation Lurks Around the Corner"),
	new Document("You walk forward facing the past and you turn back toward the future.", Map.of("country", "NL", "year", 2023)));

Add the documents to your vector store:

vectorStore.add(documents);

Finally, retrieve documents similar to a query:

List<Document> results = vectorStore.similaritySearch(
    SearchRequest.builder()
      .query("Spring")
      .topK(5).build());

If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".
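To illustrate what `topK` and the similarity threshold control, here is a brute-force, in-memory sketch: it scores each stored vector against the query with cosine similarity, discards scores below the threshold, and keeps at most `topK` results, best first. This is not how Azure AI Search executes the query (the service uses its own vector index and scoring), and the `BruteForceSearch` class, `Hit` record, and `search` method are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class BruteForceSearch {

    /** Hypothetical result type: a document id and its similarity score. */
    record Hit(String id, double score) {}

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every vector, drops hits below the threshold, sorts best-first and keeps at most topK. */
    static List<Hit> search(Map<String, double[]> vectors, double[] query, int topK, double threshold) {
        return vectors.entrySet().stream()
                .map(e -> new Hit(e.getKey(), cosine(e.getValue(), query)))
                .filter(hit -> hit.score() >= threshold)
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = Map.of(
                "doc-a", new double[] {1, 0},
                "doc-b", new double[] {0.6, 0.8},
                "doc-c", new double[] {0, 1});
        System.out.println(search(vectors, new double[] {1, 0}, 5, 0.5));
    }
}
```

With a threshold of 0.5, `doc-c` (orthogonal to the query) is dropped even though `topK` would allow it, which is why a high threshold can return fewer than `topK` documents.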

Metadata Filtering

You can also use the generic, portable metadata filters with AzureVectorStore.

For example, you can use either the text expression language:

vectorStore.similaritySearch(
   SearchRequest.builder()
      .query("The World")
      .topK(TOP_K)
      .similarityThreshold(SIMILARITY_THRESHOLD)
      .filterExpression("country in ['UK', 'NL'] && year >= 2020").build());

or build the filter programmatically with the expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(
    SearchRequest.builder()
      .query("The World")
      .topK(TOP_K)
      .similarityThreshold(SIMILARITY_THRESHOLD)
      .filterExpression(b.and(
         b.in("country", "UK", "NL"),
         b.gte("year", 2020)).build()).build());
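To see which of the sample documents this filter should select, the same condition can be written as a plain Java predicate over the metadata maps used earlier. This only illustrates the filter's semantics, under the assumption that a document lacking a key never matches; Spring AI does not evaluate filters this way (they are sent to Azure AI Search), and the `FilterSemanticsDemo` class and `SampleDoc` record are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class FilterSemanticsDemo {

    /** Hypothetical stand-in for a document: its text plus metadata. */
    record SampleDoc(String text, Map<String, Object> metadata) {}

    /** Equivalent of: country in ['UK', 'NL'] && year >= 2020 (missing keys never match). */
    static final Predicate<Map<String, Object>> FILTER = meta ->
            meta.get("country") instanceof String country
                    && Set.of("UK", "NL").contains(country)
                    && meta.get("year") instanceof Integer year
                    && year >= 2020;

    static List<String> matchingTexts(List<SampleDoc> docs) {
        return docs.stream().filter(d -> FILTER.test(d.metadata())).map(SampleDoc::text).toList();
    }

    public static void main(String[] args) {
        List<SampleDoc> docs = List.of(
                new SampleDoc("Spring AI rocks!! ...", Map.of("country", "BG", "year", 2020)),
                new SampleDoc("The World is Big and Salvation Lurks Around the Corner", Map.of()),
                new SampleDoc("You walk forward facing the past and you turn back toward the future.",
                        Map.of("country", "NL", "year", 2023)));
        System.out.println(matchingTexts(docs));
    }
}
```

Only the third document (NL, 2023) satisfies both conditions: the first has a qualifying year but country BG, and the second has no metadata at all.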

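Portable filter expressions are automatically converted into proprietary Azure Search OData filters. For example, the following portable filter expression: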

country in ['UK', 'NL'] && year >= 2020

is converted into the following Azure OData filter expression:

$filter search.in(meta_country, 'UK,NL', ',') and meta_year ge 2020
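The translation above can be sketched for this one expression shape. The snippet below is a hypothetical, deliberately tiny converter, not Spring AI's actual converter: it handles only an `in` over string values and a `>=` comparison joined by `and`, prefixes metadata keys with `meta_` as the example output does, and emits the filter body without the `$filter` prefix.

```java
import java.util.List;

class ToyODataConverter {

    /** Hypothetical node types covering just the example expression. */
    sealed interface Expr permits In, Gte, And {}
    record In(String key, List<String> values) implements Expr {}
    record Gte(String key, long value) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}

    /** Renders an expression as an Azure Search OData filter string. */
    static String toOData(Expr expr) {
        if (expr instanceof In in) {
            // search.in(field, 'v1,v2', ',') matches any of the comma-delimited values.
            return "search.in(meta_" + in.key() + ", '" + String.join(",", in.values()) + "', ',')";
        } else if (expr instanceof Gte gte) {
            return "meta_" + gte.key() + " ge " + gte.value();
        } else if (expr instanceof And and) {
            return toOData(and.left()) + " and " + toOData(and.right());
        }
        throw new IllegalArgumentException("Unsupported expression: " + expr);
    }

    public static void main(String[] args) {
        Expr expr = new And(new In("country", List.of("UK", "NL")), new Gte("year", 2020));
        // Prints: search.in(meta_country, 'UK,NL', ',') and meta_year ge 2020
        System.out.println(toOData(expr));
    }
}
```

A fuller converter would also need to escape single quotes in values, choose a delimiter that cannot appear in them, and cover the remaining operators and value types.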

Accessing the Native Client

The Azure vector store implementation provides access to the underlying native Azure Search client (SearchClient) through the getNativeClient() method:

AzureVectorStore vectorStore = context.getBean(AzureVectorStore.class);
Optional<SearchClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    SearchClient client = nativeClient.get();
    // Use the native client for Azure Search-specific operations
}

The native client gives you access to Azure Search-specific features and operations that might not be exposed through the VectorStore interface.