Streaming Large Files with WebClient

2023-09-15

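1. Introduction

In this tutorial, we'll learn how to stream a large file from a server with WebClient. To demonstrate, we'll create a simple controller and two clients. Finally, we'll see how and when to use Spring's DataBuffer and DataBufferUtils.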

2. Server

Let's create a simple controller that serves a file for download.

First, we build a FileSystemResource from a file path and return it as the body of a ResponseEntity:

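import java.nio.file.Paths;

import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/large-file")
public class LargeFileController {

    @GetMapping
    ResponseEntity<Resource> get() {
        return ResponseEntity.ok()
          .body(new FileSystemResource(Paths.get("/tmp/large.dat")));
    }
}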

Next, we need a sample file to download. Its contents don't matter, so we use fallocate to create a file of a given size on disk without writing any real data to it:

fallocate -l 128M /tmp/large.dat
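
On systems without fallocate, a similar sample file can be produced from Java. The following is a minimal JDK-only sketch; SampleFileCreator and its create() method are hypothetical names that are not part of the original project. Note that RandomAccessFile.setLength() only sets the file length: unlike fallocate, it may not reserve disk blocks, and the JDK leaves the contents of the extended region unspecified (on common file systems it reads back as zeros).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the original project
class SampleFileCreator {

    // Creates (or resizes) the file at path so that it is exactly sizeInBytes long
    static long create(String path, long sizeInBytes) {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.setLength(sizeInBytes);
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the path and size (128 MiB) used with fallocate
        String path = args.length > 0 ? args[0] : "/tmp/large.dat";
        long size = create(path, 128L * 1024 * 1024);
        System.out.println("created " + size + " bytes at " + path);
    }
}
```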

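Now we can start writing the clients.

3. WebClient with ExchangeStrategies for Large Files

We start with a simple but limited WebClient. We use ExchangeStrategies to raise the memory limit available to exchange() operations. This lets us handle more bytes, but we're still bound by the maximum memory available to the JVM.

We use bodyToMono() to get a Mono<byte[]> from the server: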

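import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.springframework.web.reactive.function.client.ExchangeStrategies;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class LimitedFileDownloadWebClient {

    public static long fetch(WebClient client, String destination) {
        Mono<byte[]> mono = client.get()
          .retrieve()
          .bodyToMono(byte[].class);

        // Blocks until the whole response body has been buffered in memory
        byte[] bytes = mono.block();

        Path path = Paths.get(destination);
        try {
            Files.write(path, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.length;
    }

    // ...
}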

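In short, this code reads the entire response body into a byte[], writes those bytes to path, and returns the number of bytes written.

Let's create a main() method to test it: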

public static void main(String... args) {
    String baseUrl = args[0];
    String destination = args[1];

    WebClient client = WebClient.builder()
      .baseUrl(baseUrl)
      .exchangeStrategies(useMaxMemory())
      .build();

    long bytes = fetch(client, destination);
    System.out.printf("downloaded %d bytes", bytes);
}

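The client takes two command-line arguments: the download URL and the local destination path. To avoid a DataBufferLimitException in the client, which is thrown when a response exceeds the in-memory buffer limit (256 KB by default), we also configure an exchange strategy that sets how many bytes can be loaded into memory. Instead of a fixed size, we use Runtime to get the maximum amount of memory configured for the application.

Note that this isn't recommended; we do it here only for demonstration: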

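private static ExchangeStrategies useMaxMemory() {
    long totalMemory = Runtime.getRuntime().maxMemory();
    // maxInMemorySize() takes an int, so cap the value to avoid overflow on heaps of 2 GiB or more
    int limit = (int) Math.min(totalMemory, Integer.MAX_VALUE);

    return ExchangeStrategies.builder()
      .codecs(configurer -> configurer.defaultCodecs()
        .maxInMemorySize(limit)
      )
      .build();
}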

To clarify, an exchange strategy customizes how the client processes requests. In this example, we use the codecs() method of the builder, so we don't override any other default settings.
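
One detail is worth illustrating: Runtime.maxMemory() returns a long, while maxInMemorySize() accepts an int, so a plain narrowing cast wraps around once the heap reaches 2 GiB. The JDK-only sketch below shows the wrap-around and one possible way to cap the value; MemoryLimit and clampToInt() are hypothetical names used only for illustration.

```java
// Hypothetical helper illustrating the long-to-int narrowing; not part of the original project
class MemoryLimit {

    // Caps a byte count at Integer.MAX_VALUE so that it fits into an int
    static int clampToInt(long bytes) {
        return (int) Math.min(bytes, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println((int) threeGiB);        // plain cast wraps to a negative number
        System.out.println(clampToInt(threeGiB));  // capped at Integer.MAX_VALUE
        System.out.println(clampToInt(Runtime.getRuntime().maxMemory()));
    }
}
```

With a small heap such as -Xmx256m the cast and the capped value are identical, so the difference only shows up on larger heaps.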

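3.1. Running the Client with Memory Adjustments

Next, we package the project as a jar at /tmp/app.jar and run the server on localhost:8081. Then, we define a few variables and run the client from the command line: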

limitedClient='com.baeldung.streamlargefile.client.LimitedFileDownloadWebClient' 
endpoint='http://localhost:8081/large-file' 
java -Xmx256m -cp /tmp/app.jar $limitedClient $endpoint /tmp/download.dat

Note that we use the -Xmx option to set the application's maximum heap size to 256 MB. The file downloads successfully, and the program prints:

downloaded 134217728 bytes

If we don't allocate enough memory, we get an OutOfMemoryError:

$ java -Xmx64m -cp /tmp/app.jar $limitedClient $endpoint /tmp/download.dat
reactor.netty.ReactorNetty$InternalNettyException: java.lang.OutOfMemoryError: Direct buffer memory

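This approach is limited: if the file is larger than the memory available to the application, the download fails with an out-of-memory error.

4. WebClient with DataBuffer for Files of Any Size

A safer approach is to stream the download with DataBuffer and DataBufferUtils, which never loads the whole file into memory.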
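
To see why streaming keeps memory usage flat, consider the following plain-JDK analogy. It is only a sketch of the general idea (moving data through a small, reusable buffer), not how DataBufferUtils is implemented; ChunkedCopy and copy() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Plain-JDK analogy; this is not how DataBufferUtils is implemented
class ChunkedCopy {

    // Copies in to out through one reusable buffer and returns the number of bytes copied
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream source = new ByteArrayInputStream(new byte[1_000_000]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(source, sink, 8 * 1024);
        System.out.println("copied " + copied + " bytes using an 8 KiB buffer");
    }
}
```

Apart from the source and the sink themselves, the copy only ever holds bufferSize bytes at a time, regardless of how much data flows through it.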

This time, we use bodyToFlux() to get a Flux<DataBuffer>, write it to our path, and return the size of the written file in bytes:

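import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.core.io.buffer.DataBufferUtils;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

public class LargeFileDownloadWebClient {

    public static long fetch(WebClient client, String destination) {
        Flux<DataBuffer> flux = client.get()
          .retrieve()
          .bodyToFlux(DataBuffer.class);

        Path path = Paths.get(destination);
        // Streams the buffers to the file instead of collecting the whole body
        DataBufferUtils.write(flux, path)
          .block();

        try {
            return Files.size(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ...
}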

We write a main() method that reads the command-line arguments, creates a WebClient, and downloads the file:

public static void main(String... args) {
    String baseUrl = args[0];
    String destination = args[1];

    WebClient client = WebClient.create(baseUrl);

    long bytes = fetch(client, destination);
    System.out.printf("downloaded %d bytes", bytes);
}

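With this approach, the size of the file we can download is no longer tied to the available memory. Let's set the maximum memory to 32 MB, a quarter of the file size, and run the client again: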

client='com.baeldung.streamlargefile.client.LargeFileDownloadWebClient'
java -Xmx32m -cp /tmp/app.jar $client $endpoint /tmp/download.dat

Again, the full file downloads successfully, with no error:

downloaded 134217728 bytes
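
The reported byte count matches the 128M passed to fallocate, since 128 × 1024 × 1024 = 134217728. As a small sanity check, the sketch below recomputes that value and compares it with the size of a file on disk; DownloadSizeCheck and its methods are hypothetical names, and /tmp/download.dat is the destination used in the commands above.

```java
import java.io.File;

// Hypothetical helper, not part of the original project
class DownloadSizeCheck {

    // Converts mebibytes (the unit behind fallocate's "M" suffix) to bytes
    static long mebibytesToBytes(long mebibytes) {
        return mebibytes * 1024 * 1024;
    }

    // True if the file at path exists and has exactly the expected length
    static boolean hasSize(String path, long expectedBytes) {
        File file = new File(path);
        return file.isFile() && file.length() == expectedBytes;
    }

    public static void main(String[] args) {
        long expected = mebibytesToBytes(128);
        System.out.println("expected " + expected + " bytes");
        System.out.println("size matches: " + hasSize("/tmp/download.dat", expected));
    }
}
```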

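5. Conclusion

In this article, we learned how to stream a large file download with WebClient. We first buffered the whole response in memory by raising the limit through ExchangeStrategies, and then used DataBuffer and DataBufferUtils to write the response to disk without holding the whole file in memory.

Reference: https://www.baeldung.com/webclient-stream-large-byte-array-to-file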