I wanted to put a marker online in case others are having similar doubts. I am currently in the midst of productionising my property platform, which has given me ample opportunity to reflect on the merits and demerits of some of my approaches. The most common thing I find on contracts is that clients play it so safe that delivery is impacted.
A simple example: when faced with a huge variety of CSV and XML files, I simply wrote my own automated loader to populate a DB and convert the types. It took me a couple of weeks, and the application isn't as fast at importing as SSIS, but I can import thousands of files and get on with other important things. Clients would never let you do this and, in turn, may spend thousands of man-hours writing SSIS packages.
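To give a flavour of what such a loader does, here is a minimal sketch in Python using the standard csv and sqlite3 modules. It is not my actual loader: the function names, the table name and the sample data are all made up for illustration, and it only handles CSV. It infers a column type from the values and lets the database convert them on insert.

```python
import csv
import io
import sqlite3


def infer_type(values):
    """Pick the narrowest SQLite type that fits every non-empty value."""
    def all_parse(cast):
        try:
            for v in values:
                if v != "":
                    cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "REAL"
    return "TEXT"


def load_csv(conn, table, text):
    """Create `table` from CSV text, inferring a type for each column.

    Table and column names are assumed to come from a trusted source.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    types = [infer_type([r[i] for r in data]) for i in range(len(header))]
    cols = ", ".join(f'"{h}" {t}' for h, t in zip(header, types))
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    # Empty strings become NULL; column affinity converts the rest on insert.
    cleaned = [[v if v != "" else None for v in r] for r in data]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', cleaned)
    return dict(zip(header, types))


conn = sqlite3.connect(":memory:")
# Hypothetical sample file contents.
sample = "postcode,bedrooms,price\nAB1 2CD,3,250000.50\nEF3 4GH,2,\n"
schema = load_csv(conn, "listings", sample)
print(schema)
print(conn.execute("SELECT bedrooms, price FROM listings").fetchall())
```

The point is not speed. Once the type inference exists, every new file shape costs nothing extra to import.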
Entity Framework is great
Entity Framework is a revelation. Version 6 and later lets you map objects more cleanly, performs all the wiring required, and does a great job of letting developers forget about the database. Personally, I think it is incredible, but it is misused, as we will see.
Entity Framework is great, but...
There are too many drawbacks to list, but I am going to cycle through a few of them. I am not going to be too cruel.
Availability bias or everybody is doing it, so we should do it too
There is a never-ending crowd of developers wanting Entity Framework on their CV. Then there are those who have already used it, so the team just goes with it. Similarly, it tends to be the go-to object-relational mapper because it exposes the model in a LINQ-style fashion and promises to try to query the database optimally.
Entity Framework doesn't respect SQL
Try doing window functions or subqueries. Yes, it can do EXISTS to a lesser extent, but correlated subqueries? Okay, so it does let you fall back on stored procedures and functions.
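For anyone unsure what a correlated subquery looks like, here is a small sketch. It runs against SQLite from Python purely so it is self-contained; the price_history table and its rows are invented for the example. The inner query refers to the row of the outer query, which is the part that is awkward to express through an ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.executescript("""
    CREATE TABLE price_history (
        property_id INTEGER, recorded_on TEXT, price INTEGER
    );
    INSERT INTO price_history VALUES
        (1, '2015-01-01', 200000),
        (1, '2015-06-01', 210000),
        (2, '2015-03-01', 150000),
        (2, '2015-02-01', 155000);
""")

# Latest price per property: the subquery is re-evaluated for each
# outer row because it references p.property_id.
latest_prices = conn.execute("""
    SELECT p.property_id, p.recorded_on, p.price
    FROM price_history AS p
    WHERE p.recorded_on = (
        SELECT MAX(q.recorded_on)
        FROM price_history AS q
        WHERE q.property_id = p.property_id
    )
    ORDER BY p.property_id
""").fetchall()
print(latest_prices)
```

On a database with window functions, the same question is usually answered with ROW_NUMBER() OVER (PARTITION BY property_id ORDER BY recorded_on DESC).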
Entity Framework encourages over-normalisation
The detail is outside the scope of this post; I have been working on a Database Domain Driven Design methodology over the last couple of years. However, the one clear challenge with EF is that it expects you to normalise data. That model may be diametrically opposed to what we see on screen, which is often the reason so much code exists in the different application layers.
Entity Framework encourages lazy developers who don't know databases
To me, this is one of the biggest travesties. There is nothing wrong with abstracting a database, as long as the abstraction can't undermine the integrity of the database. This isn't confined to relational databases; it applies to NoSQL stores such as Elasticsearch and MongoDB too, which is one reason I don't touch them: I would have to learn the underlying technology first. The typical symptoms are:
- Poor indexing strategies, if any at all.
- Table locking.
- No backup strategy.
- Use of dbInitialiser rather than a consistent, database-based release management approach.
The tightly coupled nature of Entity Framework
The main challenge I see with EF is how much of its functionality resides in large base classes that are not bound to an interface. This violates the single responsibility principle. More importantly, we tend to find very technical code and LINQ inside controllers, resulting in a kind of soup that is more about EF than the application.
It doesn't lend itself to dependency injection out of the box. We can't just say it is bad for that reason, but it does reduce testability and introduce challenges.
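As a rough illustration of what I mean by binding data access to an interface, here is a sketch in Python rather than C#. Every name in it (Listing, ListingRepository, PriceService) is hypothetical. The service only knows the interface, so a test can hand it an in-memory fake instead of a real database context.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Listing:
    listing_id: int
    price: int


class ListingRepository(Protocol):
    """The contract the application codes against (hypothetical)."""

    def get(self, listing_id: int) -> Optional[Listing]: ...

    def save(self, listing: Listing) -> None: ...


class InMemoryListingRepository:
    """Test double; a real implementation would talk to the database."""

    def __init__(self):
        self._rows = {}

    def get(self, listing_id):
        return self._rows.get(listing_id)

    def save(self, listing):
        self._rows[listing.listing_id] = listing


class PriceService:
    """Business logic that receives its repository from outside."""

    def __init__(self, repo: ListingRepository):
        self._repo = repo

    def reduce_price(self, listing_id, percent):
        listing = self._repo.get(listing_id)
        if listing is None:
            raise KeyError(listing_id)
        # Integer arithmetic keeps the example free of rounding surprises.
        listing.price = listing.price * (100 - percent) // 100
        self._repo.save(listing)
        return listing.price


repo = InMemoryListingRepository()
repo.save(Listing(1, 200000))
service = PriceService(repo)
new_price = service.reduce_price(1, 10)
print(new_price)
```

The controller, or whatever sits on top, never sees a query. It sees a contract.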
The free for all nature of EF
I am thinking in terms of containers, single responsibility and contracts, so to speak. The idea is that I know what I am getting. Who is to say somebody hasn't done all manner of things to a huge number of tables inside a controller?
EF encourages ad-hoc access, increasing trips to the DB
If, in the course of a method, you retrieve from multiple tables and perform multiple operations, you are going to the database multiple times. Sure, there may be connection pooling, but it is still an increase in the number of round trips.
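A small sketch of the difference, again in Python against an in-memory SQLite database. There is no network here, so the counter simply counts statements issued as a stand-in for round trips. The tables, data and CountingConnection wrapper are invented for the example.

```python
import sqlite3


class CountingConnection:
    """Wraps a connection and counts the statements sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.round_trips = 0

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params).fetchall()


raw = sqlite3.connect(":memory:")
# Hypothetical schema and data.
raw.executescript("""
    CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listings (
        listing_id INTEGER PRIMARY KEY, agent_id INTEGER, price INTEGER
    );
    INSERT INTO agents VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO listings VALUES (10, 1, 100), (11, 1, 200), (12, 2, 300);
""")

# Ad-hoc access: one query for the parents, then one more per parent.
db = CountingConnection(raw)
adhoc = {}
for agent_id, name in db.query("SELECT agent_id, name FROM agents"):
    rows = db.query(
        "SELECT COUNT(*) FROM listings WHERE agent_id = ?", (agent_id,)
    )
    adhoc[name] = rows[0][0]
adhoc_trips = db.round_trips

# Set-based access: the same answer from a single statement.
db = CountingConnection(raw)
joined = dict(db.query("""
    SELECT a.name, COUNT(l.listing_id)
    FROM agents AS a
    LEFT JOIN listings AS l ON l.agent_id = a.agent_id
    GROUP BY a.agent_id, a.name
"""))
joined_trips = db.round_trips

print(adhoc_trips, joined_trips, adhoc == joined)
```

With three agents the ad-hoc version issues four statements to the join's one, and the gap grows with every row in the parent table.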
The lack of transparency
Certainly, we can run SQL Server Profiler to trace query statements, but there is a fundamental issue with having an application whose code is only understood by the developers who wrote it. This is the same gripe I have with unit testing: only the developers know what the tests actually do, and even then the knowledge is highly esoteric.
Business analysts and testers are a good deal more likely to understand SQL than code written in Java or C#.
The harder nature to optimise
This is a big deal, even if it may not seem like much of a problem. In the old days, in the Microsoft world at least, people would create stored procedures which did some work and returned data. It was all very procedural and quite slow. It was found that too much business logic resided in the database and, to be fair, relational databases lacked the ability to express what the business model required. So a move occurred towards richer clients or application tiers, with the database treated as a model. In many ways, this was a good thing.
However, we now see five or six layers with logic existing across them. When we try to understand what an application is really doing with its data, the challenges begin.
Whilst not without its drawbacks, having single-responsibility functions and objects often gives a lot of clarity about how an application uses data. More importantly, these objects are compiled and are more readily handled by the query optimiser. Suddenly, the coder has to think about how their application may perform when interacting with data stores.
The lack of adherence to effective data modelling
Again, this is too much to cover in this post. I will probably be putting more about this on my limited company website, www.inforhino.co.uk, in the future. I always ask: has this problem been solved already?
In the case of data warehousing and record versioning, definitely. I won't say it is always appropriate, so before the Kimball zealots start screaming, take this with a pinch of salt.
However, when I try to think about how record life cycles can be managed effectively through an Entity Framework model, I can't see anything but challenges.
How I am handling record versioning
I tend to think of a stored procedure as an aggregate root, with a number of IEnumerable/array objects, tied to an event. The database takes care of this, and the application layer does what it needs to do in terms of validation and coordination.
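To make that a little more concrete, here is a heavily simplified sketch. SQLite has no stored procedures, so a Python function running inside a single transaction stands in for one. The listing_version table, the event names and the record_price_change function are all assumptions for illustration, not my production design. Each call is tied to a named event, retires the current version and appends a new one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical versioned table: one row per version of a listing.
conn.executescript("""
    CREATE TABLE listing_version (
        listing_id INTEGER, version INTEGER, price INTEGER,
        event TEXT, is_current INTEGER,
        PRIMARY KEY (listing_id, version)
    );
""")


def record_price_change(conn, listing_id, price, event):
    """Retire the current version and append a new one atomically."""
    with conn:  # commits on success, rolls back if anything raises
        last = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM listing_version "
            "WHERE listing_id = ?",
            (listing_id,),
        ).fetchone()[0]
        conn.execute(
            "UPDATE listing_version SET is_current = 0 "
            "WHERE listing_id = ? AND is_current = 1",
            (listing_id,),
        )
        conn.execute(
            "INSERT INTO listing_version VALUES (?, ?, ?, ?, 1)",
            (listing_id, last + 1, price, event),
        )
    return last + 1


record_price_change(conn, 1, 200000, "Listed")
record_price_change(conn, 1, 190000, "PriceReduced")
history = conn.execute(
    "SELECT version, price, event, is_current FROM listing_version "
    "WHERE listing_id = 1 ORDER BY version"
).fetchall()
print(history)
```

The application never updates a row in place. It raises an event, and the versioning rules live in one spot next to the data.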
I only wanted to write an article about Entity Framework not always being the go-to choice. In fact, it is obvious I feel a lot more strongly about it than I thought. Certainly, I haven't fleshed out these arguments to the point of incontrovertibility, and there were many more points I wanted to make but didn't take the time to.
I would say: always be wary of going all out with Entity Framework. It doesn't remove you from the database; it removes you from the data.