For performance reasons, Hibernate manages entities so that they can be compared, cached, batched, and synchronized with the database only when needed. To identify changes in these managed entities, Hibernate uses a mechanism called Dirty Checking.

During the persistence process, Hibernate uses this mechanism to track changes made to the managed entities, enhancing performance by reducing the number of queries executed against the database.

In this article, we’ll delve deeper into how this mechanism works, how useful it is, and whether it is possible to disable it.

How does it work?

Every time an entity becomes managed inside the Persistence Context, a snapshot of it is taken. This snapshot is the baseline against which future changes to the entity will be identified.

The snapshot represents the initial image of the entity that is already persisted or will be persisted during the next flush operation.

Since each entity is uniquely represented inside the Persistence Context, any changes made to a managed entity are merged into that single managed instance, so the uniqueness constraint is respected.
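As a minimal illustration of this uniqueness (assuming a simple User entity with a Long id), loading the same row twice within one Persistence Context returns the same managed instance, so there is always a single copy to compare against the snapshot:

// Both calls happen within the same transaction / Persistence Context.
User first = entityManager.find(User.class, 1L);
User second = entityManager.find(User.class, 1L);

// The second find() does not create a new copy; it returns the already
// managed instance, so both variables point to the same object.
System.out.println(first == second); // true

// Any change is therefore applied to that single managed copy.
first.setName("Flavius");
System.out.println(second.getName()); // "Flavius"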

Let’s look at two examples of entities becoming managed.

1. When loading the entity from the database

The find() method can be used to load an entity from the database. When loading the entity like that, we are sure that, since it comes directly from the database, it is the initial version of the entity. This offers the possibility of easily identifying further changes made to it.

/*
* When using the find() method, the user with id 1 is loaded from the
* database and becomes managed in the Persistence Context.
*
* A new snapshot entity is created at this point, userSnapshot, being an
* exact copy of user entity.
*/
User user = entityManager.find(User.class, 1L);

/*
* The name is only set on user entity while userSnapshot remains unchanged.
*/
user.setName("Flavius");

In this scenario, the snapshot is built starting from the entity that is loaded directly from the database.

2. When persisting a new entity

Let’s suppose we want to persist a new entity. In this case, there is no version of the entity already stored in the database, so we can’t use it as an initial entity for comparison.

/*
* A new transient entity is created.
*/
User user = new User();
user.setName("Flavius");

/*
* The entity is set to be synchronized with the database at flush time.
*
* A new snapshot entity is created at this point, userSnapshot, being an
* exact copy of user entity.
*/
entityManager.persist(user);

/*
* The name is only set on user entity while userSnapshot remains unchanged.
*/
user.setName("Zichil");

In this scenario, the entity that is initially set to be persisted at flush time will become the initial entity used for identifying changes.

As you can see, in both scenarios, the snapshot is taken at the moment when the entity becomes managed inside the Persistence Context. Also, in both cases, we end up with two versions of the entity:

- the proxy entity (the snapshot)
- the entity with some changes

At flush time, the two versions of the entity will be compared to identify whether there are any changes between them. If so, they will be merged and an insert, update or delete operation will be performed. Otherwise, no operation will be executed.

(Figure: How the Dirty Checking mechanism works. When an entity is loaded from the database, a snapshot of it is also kept in the Persistence Context. When some entity fields are changed, the snapshot is not affected. Before committing the transaction, each entity is compared with its snapshot, and only the entities with changes generate SQL queries. In this example, only User 1 is updated in the database since its name was changed from Bob to Bill.)
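The scenario from the figure can be sketched in code as follows (assuming two existing User rows with ids 1 and 2, and a plain EntityManager-managed transaction):

entityManager.getTransaction().begin();

// Both users become managed and a snapshot is taken for each of them.
User user1 = entityManager.find(User.class, 1L); // name = "Bob"
User user2 = entityManager.find(User.class, 2L);

// Only user1 is modified; user2 stays identical to its snapshot.
user1.setName("Bill");

// At flush time, each entity is compared with its snapshot: a single
// UPDATE is issued for user1, while user2 generates no SQL at all.
entityManager.getTransaction().commit();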

Saving without saving! Is this magic?

The Dirty Checking mechanism can sometimes create strange situations for developers. Let’s have a look at the following code snippet.

What do you think the name of the user with id 1 will be after the current transaction is committed?

User user = entityManager.find(User.class, 1L); // User (id = 1, name = "Zichil")
user.setName("Flavius");

Even if the user was not explicitly persisted after the name was changed, due to the Dirty Checking mechanism and because the user entity is managed by the Persistence Context, at the end of the transaction, the name of the user will be Flavius. Strange, right? The code from above is equivalent to the following:

User user = entityManager.find(User.class, 1L); // User (id = 1, name = "Zichil")
user.setName("Flavius");

entityManager.persist(user);

So, in this case, the persist method (or save method if you use Spring Data JPA) is optional. Should we explicitly add it? This decision is a matter of preference.

Even though omitting it would push developers to understand how the Dirty Checking mechanism works, I prefer to use it because it makes the code easier to understand.

Performance issues

In the above examples, only one entity was used. But in real-world scenarios, a Persistence Context usually manages many entities at the same time. For each of them, a snapshot version is stored. This may cause performance issues regarding both the memory needed to store a copy of each entity and the processing time needed to compare every snapshot with its possibly modified entity.

When there are multiple managed entities inside a Persistence Context, even if only one of them has changed, all the entities will be compared with their snapshots.

A possible way of avoiding these drawbacks is to keep the units of work (a term often used in JPA/Hibernate for the scope of operations whose entities a single Persistence Context manages) as small as possible.

Of course, in complex applications, this is a challenging thing to do. In such cases, an alternative is to manually execute flush and clear operations from time to time: flushing forces the Persistence Context to synchronize with the database, while clearing detaches the already synchronized entities and discards their snapshots.
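A minimal sketch of this approach, assuming a batch insert of a users list and an arbitrarily chosen batch size of 50:

int batchSize = 50; // assumed value, tune it for your own use case

for (int i = 0; i < users.size(); i++) {
    entityManager.persist(users.get(i));

    if (i > 0 && i % batchSize == 0) {
        // Synchronize the pending changes with the database...
        entityManager.flush();
        // ...then detach the already synchronized entities, which also
        // discards their snapshots so later flushes won't re-check them.
        entityManager.clear();
    }
}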

Another optimization that can be done is to use read-only transactions or immutable entities in cases where there’s no need to modify the entity.

Read-only transactions

In some cases, the entities that are retrieved from the database are not meant to be modified. To prevent the Dirty Checking mechanism from being performed on such entities, we can set the transaction as read-only. In Spring, this can be done as shown below.

@Transactional(readOnly = true)
public User getUser(Long userId) {
    return userRepository.findById(userId)
            .orElseThrow();
}

Since such an entity will never be modified, there is no point in looking for changes in its fields. For this reason, no snapshot of the entity is created at all, reducing the memory and processing needed by the Dirty Checking mechanism.
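If only some of the entities in a transaction are read-only, the same idea can be applied per instance through the underlying Hibernate Session (a sketch that assumes the EntityManager is backed by Hibernate):

org.hibernate.Session session = entityManager.unwrap(org.hibernate.Session.class);

User user = entityManager.find(User.class, 1L);

// Mark only this instance as read-only: Hibernate skips it during
// Dirty Checking, so changes made to it won't be flushed to the database.
session.setReadOnly(user, true);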

Immutable entities

Hibernate offers the @Immutable annotation for marking an entity as immutable. Once an entity is marked this way, its state can no longer be changed through Hibernate.

@Entity
@Immutable
public class ImmutableUserEntity {

    @Id
    private Long id;

    private String name;
}

The Dirty Checking mechanism also ignores an immutable entity. The snapshot won’t be taken and any changes made to it won’t be detected by Hibernate.
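As an illustration (assuming the ImmutableUserEntity above is already stored with id 1 and, for the sake of the example, exposes a setter), modifying it inside a transaction produces no UPDATE statement:

entityManager.getTransaction().begin();

ImmutableUserEntity user = entityManager.find(ImmutableUserEntity.class, 1L);

// No snapshot was taken for this immutable entity, so this change is
// simply not detected by the Dirty Checking mechanism.
user.setName("Flavius");

// Committing the transaction issues no UPDATE for this entity.
entityManager.getTransaction().commit();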

Conclusion

We already know that the Persistence Context is a core component of Hibernate. To make it work correctly, Hibernate needs a mechanism for identifying the changes made to entities. This mechanism is called Dirty Checking, and it has both advantages and drawbacks, depending on how it is used.

In this article, we’ve learned what this mechanism is, how it works and we’ve gone through some optimization techniques that can increase the performance of your applications even more.

I hope this article helps you. Thank you for reading!
