使用 Spring 和 Hibernate 进行表分区

简介

本文将带你了解如何使用 Spring 和 Hibernate 实现表分区。

表分区的目标是将一个大型表分割为多个较小的分区表,以便关联的表和索引记录可以放入内存缓冲池,从而实现更高效的查找或扫描操作。

使用 PostgreSQL 进行表分区

PostgreSQL 为 表分区 提供了三种策略:

  • 列表分区(List Partitioning)
  • 范围分区(Range Partitioning)
  • Hash 分区(Hash Partitioning)

本例使用列表分区,按大洲来划分数据表。

例如,users 分区如下:

CREATE TABLE users (
    id bigint NOT NULL,
    first_name varchar(255),
    last_name varchar(255),
    registered_on timestamp(6),
    partition_key varchar(255),
    PRIMARY KEY (id, partition_key)
)
PARTITION BY LIST (partition_key)
 
CREATE TABLE users_asia
PARTITION OF users FOR VALUES IN ('Asia')
 
CREATE TABLE users_africa
PARTITION OF users FOR VALUES IN ('Africa')
 
CREATE TABLE users_north_america
PARTITION OF users FOR VALUES IN ('North America')
 
CREATE TABLE users_south_america
PARTITION OF users FOR VALUES IN ('South America')
 
CREATE TABLE users_europe
PARTITION OF users FOR VALUES IN ('Europe')
 
CREATE TABLE users_australia
PARTITION OF users FOR VALUES IN ('Australia')

posts 表分区如下:

CREATE TABLE posts (
    id bigint NOT NULL,
    title varchar(255),
    created_on timestamp(6),
    user_id bigint,
    partition_key varchar(255),
    PRIMARY KEY (id, partition_key)
)
PARTITION BY LIST (partition_key)
 
CREATE TABLE posts_asia
PARTITION OF posts FOR VALUES IN ('Asia')
 
CREATE TABLE posts_africa
PARTITION OF posts FOR VALUES IN ('Africa')
 
CREATE TABLE posts_north_america
PARTITION OF posts FOR VALUES IN ('North America')
 
CREATE TABLE posts_south_america
PARTITION OF posts FOR VALUES IN ('South America')
 
CREATE TABLE posts_europe
PARTITION OF posts FOR VALUES IN ('Europe')
 
CREATE TABLE posts_australia
PARTITION OF posts FOR VALUES IN ('Australia')
 
ALTER TABLE IF EXISTS posts
ADD CONSTRAINT fk_posts_user_id
FOREIGN KEY (user_id, partition_key)
REFERENCES users

用 JPA 和 Hibernate 映射表分区

使用 Spring 和 Hibernate 实现表分区需要在读写分区表的记录时提供分区 Key。

首先,创建一个 PartitionAware 基类,每个映射到分区表的 JPA 实体都将继承该基类:

@MappedSuperclass
@FilterDef(
    name = PartitionAware.PARTITION_KEY,
    parameters = @ParamDef(
        name = PartitionAware.PARTITION_KEY,
        type = String.class
    )
)
@Filter(
    name = PartitionAware.PARTITION_KEY,
    condition = "partition_key = :partitionKey"
)
public abstract class PartitionAware<T extends PartitionAware> {
 
    public static final String PARTITION_KEY = "partitionKey";
 
    @Column(name = "partition_key")
    @PartitionKey
    private String partitionKey;
 
    public String getPartitionKey() {
        return partitionKey;
    }
 
    public T setPartitionKey(String partitionKey) {
        this.partitionKey = partitionKey;
        return (T) this;
    }
 
    public T setPartition(Partition partition) {
        this.partitionKey = partition.getKey();
        return (T) this;
    }
}
  • @Filter 注解允许动态启用分区过滤功能
  • @PartitionKey 注解是在 Hibernate 6.2 中引入的,它允定义一个分区 key。

Partition 对象是一个枚举,用于定义应用支持的分区:

public enum Partition {
    ASIA("Asia"),
    AFRICA("Africa"),
    NORTH_AMERICA("North America"),
    SOUTH_AMERICA("South America"),
    EUROPE("Europe"),
    AUSTRALIA("Australia"),
    ;
 
    private final String key;
 
    Partition(String key) {
        this.key = key;
    }
 
    public String getKey() {
        return key;
    }
}

@MappedSuperclass 注解用于表示实体类的父类。它可以用来定义多个实体类之间的公共映射和字段。它本身并不被视为一个实体,不能直接持久化到数据库中。

User 实体继承了 PartitionAware 类,如下:

@Entity
@Table(name = "users")
public class User extends PartitionAware<User> {
 
    @Id
    @GeneratedValue
    private Long id;
 
    @Column(name = "first_name")
    private String firstName;
 
    @Column(name = "last_name")
    private String lastName;
 
    @Column(name = "registered_on")
    @CreationTimestamp
    private LocalDateTime createdOn = LocalDateTime.now();
     
    // get / set 方法省略
}

Post 实体同样,如下:

@Entity
@Table(name = "posts")
public class Post extends PartitionAware<Post> {
 
    @Id
    @GeneratedValue
    private Long id;
 
    private String title;
 
    @Column(name = "created_on")
    @CreationTimestamp
    private LocalDateTime createdOn;
 
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id")
    private User user;
     
    // get / set 方法省略
}

现在,通过 Spring EntityManager 初始化机制在创建 JPA EntityManager 时默认启用分区过滤:

@Bean
public JpaTransactionManager transactionManager(
        EntityManagerFactory entityManagerFactory){
    JpaTransactionManager transactionManager = new JpaTransactionManager();
    transactionManager.setEntityManagerFactory(entityManagerFactory);
    transactionManager.setEntityManagerInitializer(entityManager -> {
        User user = UserContext.getCurrent();
        if (user != null) {
            entityManager.unwrap(Session.class)
                .enableFilter(PartitionAware.PARTITION_KEY)
                .setParameter(
                    PartitionAware.PARTITION_KEY,
                    user.getPartitionKey()
                );
        }
    });
    return transactionManager;
}

当前分区是根据当前登录的 User 设置的。由于启用了分区 Filter,每次实体查询都会从当前登录 User 的分区中选择 UserPost 实体。

创建以下 Hibernate PersistEventListener,在当前登录的 User 持久化的每个实体上设置 partition_key

public class PartitionAwareInsertEventListener
        implements PersistEventListener {
 
    public static final PartitionAwareInsertEventListener INSTANCE =
        new PartitionAwareInsertEventListener();
 
    @Override
    public void onPersist(PersistEvent event)
            throws HibernateException {
        final Object entity = event.getObject();
 
        if (entity instanceof PartitionAware partitionAware) {
            if (partitionAware.getPartitionKey() == null) {
                FilterImpl partitionKeyFilter = (FilterImpl) event
                    .getSession()
                    .getEnabledFilter(PartitionAware.PARTITION_KEY);
                partitionAware.setPartitionKey(
                    (String) partitionKeyFilter
                        .getParameter(PartitionAware.PARTITION_KEY)
                );
            }
        }
    }
 
    @Override
    public void onPersist(PersistEvent event, PersistContext persistContext)
            throws HibernateException {
        onPersist(event);
    }
}

使用以下 PartitionAwareEventListenerIntegrator 注册 PartitionAwareInsertEventListener

public class PartitionAwareEventListenerIntegrator
        implements Integrator {
 
    public static final PartitionAwareEventListenerIntegrator INSTANCE =
        new PartitionAwareEventListenerIntegrator();
 
    @Override
    public void integrate(
            Metadata metadata,
            BootstrapContext bootstrapContext,
            SessionFactoryImplementor sessionFactory) {
        sessionFactory
            .getServiceRegistry()
            .getService(EventListenerRegistry.class)
            .prependListeners(
                EventType.PERSIST,
                PartitionAwareInsertEventListener.INSTANCE
            );
    }
 
    @Override
    public void disintegrate(
        SessionFactoryImplementor sessionFactory,
        SessionFactoryServiceRegistry serviceRegistry) {
    }
}

PartitionAwareEventListenerIntegrator 是通过基于 Java 的属性配置提供给 Spring 的:

properties.put(
    EntityManagerFactoryBuilderImpl.INTEGRATOR_PROVIDER,
    (IntegratorProvider) () -> List.of(
        PartitionAwareEventListenerIntegrator.INSTANCE
    )
);

就这样!

测试

持久化一个 User 和 3 个 Post 实体:

final User vlad = new User()
    .setFirstName("Vlad")
    .setLastName("Mihalcea")
    .setPartition(Partition.EUROPE);
 
userRepository.persist(vlad);
 
UserContext.logIn(vlad);
 
forumService.createPosts(LongStream.rangeClosed(1, POST_COUNT)
    .mapToObj(postId -> new Post()
        .setTitle(
            String.format(
                "High-Performance Java Persistence - Part %d",
                postId
            )
        )
        .setUser(vlad)
    )
    .toList()
);

Hibernate 将生成以下 SQL INSERT 语句:

INSERT INTO users (
    registered_on,
    first_name,
    last_name,
    partition_key,
    id
)
VALUES (
    '2023-11-09 11:22:55.802704',
    'Vlad, Mihalcea',
    'Europe',
    1
)
INSERT INTO posts (
    created_on,
    partition_key,
    title,
    user_id,
    id
)
VALUES (
    '2023-11-09 11:19:55.856126',
    'Europe',
    'High-Performance Java Persistence - Part 1',
    1,
    1
), (
    '2023-11-09 11:19:55.856126',
    'Europe',
    'High-Performance Java Persistence - Part 2',
    1,
    2
),(
    '2023-11-09 11:19:55.856126',
    'Europe',
    'High-Performance Java Persistence - Part 3',
    1,
    3
)

检索 Post 实体:

List<Post> posts = forumService.findByIds(
    LongStream.rangeClosed(1, POST_COUNT).boxed().toList()
);

Hibernate 将根据当前登录 User 的分区进行过滤:

SELECT
    p.id,
    p.created_on,
    p.partition_key,
    p.title,
    p.user_id
FROM
    posts p
WHERE
    p.partition_key = 'Europe' AND
    p.id in (1, 2, 3)

在对该 SQL 查询运行 EXPLAIN ANALYZE 时,可以看到只有 posts_europe 表被扫描:

Seq Scan on posts_europe p 
  (cost=0.00..11.14 rows=1 width=1056)
  (actual time=0.022..0.023 rows=3 loops=1)
  Filter: (
    ((partition_key)::text = 'Europe'::text) AND
    (id = ANY ('{1,2,3}'::bigint[]))
  )

参考:https://vladmihalcea.com/table-partitioning-spring-hibernate/