Repositories – why and how not to use them?
It’s really a sin not to use a repository, but I can be forgiven if I don’t use it in an example.
Consider software that deals with customers and, internally, offers them different features based on when they signed up and what their status is. If a customer registered less than 5 days ago, you should display a big red button saying ‘buy this feature’, and in the back office you should prioritize the tickets these customers create. Let’s call these customers ‘leads’. You will most likely have at least two pages, one for the customer and one for the back office. Assuming at least MVC, the customer controller will execute something like:
statement.executeQuery("select id, name, status, signUpDate, DATEDIFF(now(), signUpDate) < 5 and status='ACTIVE' as isLead from customers where id = 12345");
And you display the Big Red Button. Next up, doing the same query in back office. Let’s indulge our minds with something hypothetical: you are not smart, and you will not extract this functionality into a separate class. Jumping into horror:
First you prepare the statement again in the back office controller. Then you execute it and prioritize support tickets. Business is happy. Now that you did this so fast, you can probably also add it to a cron job that sends promotional emails to these leads every day at 10am. And since you did it so fast once, they expect you to finish the feature by yesterday morning. You copy-paste the statement, create the cron job, send the email, and you still have some time to spare.
A few months later, a business analyst comes and says that the company has too many churners. They need to change the definition of a lead: now it’s 12 days since signup, and the status has to be either ACTIVE or PENDING_VERIFICATION. You need to estimate the cost of this change. If you are the one to do it, you can probably remember what the definition of a lead previously was. But, to turn the matter into true horror: someone has joined your team and been assigned this task. How error-prone is this? Let’s toss a percentage around: a 99.9999% chance that a bug will be introduced. And even if you do it yourself, would you remember all the places where you created the SQL queries?
Don’t wake up just yet…
After a few weeks, the change is done. Now the customers and account administrators complain that the system is just too slow. You debug and find out that the SQL query is taking too long. For the sake of the story, let’s say a caching mechanism is all it takes to speed the system up. How much does this change cost? Wouldn’t it be nice to do it in just one place? Yup. You extract this responsibility into a separate class and the horror ends.
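A minimal sketch of what that extraction could look like, assuming Java 16+ (records). All type names here (Customer, LeadPolicy, CustomerRepository and the two implementations) are hypothetical, the in-memory class stands in for the SQL-backed one, and the elapsed-time check only approximates the day-based DATEDIFF rule from the query above:

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical domain type; the fields mirror the columns selected earlier.
record Customer(long id, String name, String status, Instant signUpDate) {}

// The one place that knows what a "lead" is. Moving to 12 days and two
// statuses means changing only the arguments passed to this constructor.
final class LeadPolicy {
    private final Duration window;
    private final Set<String> leadStatuses;
    private final Clock clock;

    LeadPolicy(Duration window, Set<String> leadStatuses, Clock clock) {
        this.window = window;
        this.leadStatuses = leadStatuses;
        this.clock = clock;
    }

    boolean isLead(Customer customer) {
        Duration sinceSignUp = Duration.between(customer.signUpDate(), clock.instant());
        return sinceSignUp.compareTo(window) < 0 && leadStatuses.contains(customer.status());
    }
}

interface CustomerRepository {
    Optional<Customer> findById(long id);
}

// Stand-in for the SQL-backed implementation, so the sketch runs without a database.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Long, Customer> rows = new ConcurrentHashMap<>();
    int lookups = 0; // counts simulated database hits

    void save(Customer customer) { rows.put(customer.id(), customer); }

    @Override
    public Optional<Customer> findById(long id) {
        lookups++;
        return Optional.ofNullable(rows.get(id));
    }
}

// Caching added in one place, as a decorator. No eviction: a real cache would need it.
final class CachingCustomerRepository implements CustomerRepository {
    private final CustomerRepository delegate;
    private final Map<Long, Optional<Customer>> cache = new ConcurrentHashMap<>();

    CachingCustomerRepository(CustomerRepository delegate) { this.delegate = delegate; }

    @Override
    public Optional<Customer> findById(long id) {
        return cache.computeIfAbsent(id, delegate::findById);
    }
}
```

The customer page, the back office and the cron job would all ask the same LeadPolicy, and none of them would need to know whether a cache sits in front of the database.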
So, this is a repository?
Basically, yes. But…
There are just too many ways of doing it wrong. Let’s name a few:
Domain model is equal to database schema
Since the domain model IS the schema, you are bound by the restrictions of the database in question. Your domain model knows what the database is, how it stores the data, what the underlying relations are, etc. This ties the core of your software, the domain model and domain logic, to a specific database. If you use JPA, it’s exactly the same thing, except that you are tied to the relational model instead of a specific database.
I consider this an implementation detail. Domain model must not care about this, at all. But, should domain logic care about repositories and should we always use this notion of SomethingSomethingRepository? I’d say: not always. Sometimes, the domain procedures will name the repository for you, for example: A customer walks into the store, sits and takes the product catalogue to browse.
However, when breaking up with your bad past of using database schema as domain object, ORMs can cause you a lot of pain, especially when one aggregate references another, creating a large graph. Usually, the first step in refactoring is breaking these strong links with Identity fields, then mapping database objects to domain objects.
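As a rough illustration of that first refactoring step, here is a sketch assuming Java 16+ and entirely hypothetical types: OrderRow plays the database object, Order the domain object, and the link to the customer is held as a CustomerId value instead of an object reference. Storing money as cents in the row is also an assumption made for the example.

```java
import java.math.BigDecimal;

// Persistence side: shaped like the table, primitive columns only (hypothetical).
record OrderRow(long id, long customerId, String status, long totalCents) {}

// Domain side: identity fields replace object references to other aggregates.
record OrderId(long value) {}
record CustomerId(long value) {}
enum OrderStatus { OPEN, PAID, CANCELLED }
record Order(OrderId id, CustomerId customerId, OrderStatus status, BigDecimal total) {}

// The only class that knows both shapes.
final class OrderMapper {
    private OrderMapper() {}

    static Order toDomain(OrderRow row) {
        return new Order(
                new OrderId(row.id()),
                new CustomerId(row.customerId()),       // an id, not a loaded Customer graph
                OrderStatus.valueOf(row.status()),
                BigDecimal.valueOf(row.totalCents(), 2)); // 1999 cents becomes 19.99
    }

    static OrderRow toRow(Order order) {
        return new OrderRow(
                order.id().value(),
                order.customerId().value(),
                order.status().name(),
                order.total().movePointRight(2).longValueExact());
    }
}
```

Because Order only carries a CustomerId, loading an order no longer drags the customer, and everything the customer references, along with it.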
One repository per table or collection
I have yet to live to see the moment when the product owner comes to me and says: so, the customer walks into the store, buys something from the product table, then we join it with the customer information table, which we join with his payment history. If he has 20 rows in the payment history table that have the status ‘paid’ and that happened less than 10 days ago excluding today, then we update the checkout table with a discount of 20%, only if paid in cash.
By doing one repository per table, as in the case described above, the code will be procedural. The service that has to encapsulate this protocol will have too many dependencies. It’s error-prone. And nobody but you will understand it. And you will have to maintain that code. Instead, doing a repository per aggregate root allows us to interact with the domain model in a manner familiar to the product owner. In fact, in most cases, the P.O. could read the code and tell if it’s correct.
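To make the contrast concrete, here is a sketch of the same rule written against an aggregate root, assuming Java 16+. StoreCustomer, Payment, Checkout and StoreCustomers are hypothetical names, and the reading of the rule ("at least 20 paid payments within the 10 days before today") is my assumption about what the product owner meant.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

record Payment(LocalDate paidOn, boolean paid) {}

// Aggregate root: the customer owns the payment history.
final class StoreCustomer {
    private final String id;
    private final List<Payment> paymentHistory;

    StoreCustomer(String id, List<Payment> paymentHistory) {
        this.id = id;
        this.paymentHistory = List.copyOf(paymentHistory);
    }

    String id() { return id; }

    // Assumed rule: at least 20 paid payments in the 10 days before today, today excluded.
    boolean qualifiesForLoyaltyDiscount(LocalDate today) {
        LocalDate windowStart = today.minusDays(10);
        long recentPaid = paymentHistory.stream()
                .filter(Payment::paid)
                .filter(p -> p.paidOn().isBefore(today) && p.paidOn().isAfter(windowStart))
                .count();
        return recentPaid >= 20;
    }
}

final class Checkout {
    private int discountPercent = 0;

    void applyLoyaltyDiscount(StoreCustomer customer, boolean paidInCash, LocalDate today) {
        if (paidInCash && customer.qualifiesForLoyaltyDiscount(today)) {
            discountPercent = 20;
        }
    }

    int discountPercent() { return discountPercent; }
}

// One repository per aggregate root: it hands back the whole customer,
// payment history included, however many tables that takes underneath.
interface StoreCustomers {
    Optional<StoreCustomer> byId(String id);
}
```

The joins still exist, but they live behind StoreCustomers; the calling code reads as "if the customer qualifies and pays in cash, apply the discount".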
Technology specific repositories
I’m not sure about other languages, but in Java, we have Spring, so this might only be my rant against using Spring Data repositories as domain repositories. Why? Programmers, as a species, are the laziest species known to humanity. If there is a shortcut, be assured that over 90% of programmers have considered taking it, while well over 50% take it (instead of doing actual research, I took a shortcut and made these numbers up).
Back to the topic. There are a couple of downsides to this. Firstly, your domain layer suffers from immobility. It’s tied to the Spring Data implementation of the repository.
public interface CustomerRepository extends MongoRepository<Customer, String>
public interface CustomerRepository extends CrudRepository<Customer, String>
Just recently, we tried to integrate with a library, for which we had to connect our database to their repository. We used JPA 1, they used JPA 2.
Secondly, Spring Data repositories (as domain repositories) encourage using the domain model as the database schema. That makes it damn hard to separate out a layer of objects that describe what the schema looks like, and to keep the domain layer clean of database objects and of their transformations from and to domain objects.
And lastly, which is just a personal taste: naming. I find that there is absolutely no excuse, no excuse whatsoever, for writing code like customerRepository.findByOrderByOrderDateAsc() while describing a business process. That’s something that will never come out of the mouth of a product owner or business analyst.
All in all, use an adapter for Spring Data repositories. Spring Data is a great tool, but in the domain layer it fails miserably at layering, portability and, for me at least, naming.
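A sketch of such an adapter, assuming Java 16+. To keep it runnable without Spring on the classpath, SpringDataOrderStore is a hypothetical stand-in for an interface that, in a real project, would extend a Spring Data repository type; OrderBook, PlacedOrder and OrderDocument are likewise invented for the example.

```java
import java.time.LocalDate;
import java.util.List;

// Persistence shape, as the storage technology wants it (hypothetical).
record OrderDocument(String id, LocalDate orderDate) {}

// Domain shape, named the way the business talks about it (hypothetical).
record PlacedOrder(String number, LocalDate placedOn) {}

// Stand-in for the Spring Data interface; the derived query name stays down here.
interface SpringDataOrderStore {
    List<OrderDocument> findByOrderByOrderDateAsc();
}

// Domain repository: no framework types, no query-derivation names.
interface OrderBook {
    List<PlacedOrder> oldestFirst();
}

// The adapter lives in the infrastructure layer and is the only class
// that sees both the technology-specific store and the domain model.
final class SpringDataOrderBook implements OrderBook {
    private final SpringDataOrderStore store;

    SpringDataOrderBook(SpringDataOrderStore store) { this.store = store; }

    @Override
    public List<PlacedOrder> oldestFirst() {
        return store.findByOrderByOrderDateAsc().stream()
                .map(doc -> new PlacedOrder(doc.id(), doc.orderDate()))
                .toList();
    }
}
```

Domain code depends only on OrderBook, so swapping the storage technology means writing another adapter and leaving the business process untouched.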
In microservice environments, aggregates are separated by REST calls, or at least should be. What I’ve seen is people accessing these aggregates from ‘services’. Once upon a time, services were called managers. If you follow this logic, you will see that such a ‘service’ does not manage the data. That’s the responsibility of the other microservice. What it does is access it. Now this sounds like a repository… Why do we keep on calling them services?
But what about SPAs? Should services in, let’s say, Angular communicate with REST endpoints, or should they provide some business rules and delegate communication with REST services to repositories? I have exactly zero arguments for why we are not doing that right now. Yet there are some benefits to doing it. For example, say an Angular app is both a web app and a progressive web app. When the internet connection is unavailable, the repository (should) become local. Or provide some caching of the data. Like a true data access layer.
In the end, it’s all about maintainability and the cost of maintenance.
Repositories are a powerful pattern for encapsulating data access. The domain layer becomes oblivious to the actual storage engine, whether it is a relational database, a document database or a microservice. While it’s unlikely that you will just switch databases, you get the benefit of a clear and clean domain, good separation of concerns and, most importantly, understandable and maintainable code.