A phase of my Agile Blue/Green deployment project is now complete, as I have a Proof of Concept which has been tested successfully. As you may recall from prior posts, the Blue/Green technique features two databases, the “green” live one that users connect to (I’ve made an arbitrary assignment of colours), and the offline “blue” one which has a different version of the schema (a previous or future release).
The idea is to upgrade or downgrade a schema version very quickly by cutting over from Green to Blue. Fine – but a database is often coupled to multiple applications and data feeds, and too big for the upgrade to be done from a backup file within normal downtime. The situation is reminiscent of “Achilles and the Tortoise” from Zeno’s Paradoxes: while the Blue database is trying to catch up, the Green one receives new data, which will also have to be loaded into Blue before it goes live… and so on. The same applies in the other direction – once you’ve promoted the Blue database, unless the old Green one keeps catching up, there’s no going back.
In my Proof of Concept, each top-level Transact-SQL command issued against the Green database is recorded in a third database named “Journal”. This captures the DML operation type, the target table, and the date/time, and preserves the sequence in which the statements executed. With this information a stored procedure in the Blue database can “play back” the changes and catch up with the Green one. Performance testing has barely started, but I can at least say that inserts of 20M rows completed with no obvious problems.
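The capture side of this can be sketched in miniature. The snippet below uses Python’s sqlite3 as a stand-in for SQL Server (the real project uses Transact-SQL triggers and a separate Journal database); the names – `Customer`, `Journal`, `trg_Customer_Insert` – are illustrative, not taken from the project.

```python
# Minimal sketch of journalling DML via a trigger, using sqlite3
# as a stand-in for SQL Server. All table/column names are illustrative.
import sqlite3

green = sqlite3.connect(":memory:")
green.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")

# Journal table: operation type, target table, date/time, and an
# auto-incrementing sequence number that preserves statement order.
green.execute("""
    CREATE TABLE Journal (
        Seq         INTEGER PRIMARY KEY AUTOINCREMENT,
        Operation   TEXT NOT NULL,
        TargetTable TEXT NOT NULL,
        RecordedAt  TEXT NOT NULL DEFAULT (datetime('now')),
        RowId       INTEGER,
        Name        TEXT
    )
""")

# AFTER INSERT trigger: every insert into Customer is captured,
# including one issued directly against the table.
green.execute("""
    CREATE TRIGGER trg_Customer_Insert AFTER INSERT ON Customer
    BEGIN
        INSERT INTO Journal (Operation, TargetTable, RowId, Name)
        VALUES ('INSERT', 'Customer', NEW.Id, NEW.Name);
    END
""")

green.execute("INSERT INTO Customer (Name) VALUES ('Alice')")
green.execute("INSERT INTO Customer (Name) VALUES ('Bob')")

rows = green.execute(
    "SELECT Seq, Operation, TargetTable, Name FROM Journal ORDER BY Seq"
).fetchall()
print(rows)  # -> [(1, 'INSERT', 'Customer', 'Alice'), (2, 'INSERT', 'Customer', 'Bob')]
```

A real implementation would need UPDATE and DELETE triggers as well, and would capture the full row image rather than individual columns, but the principle – a monotonic sequence number plus enough data to re-execute the statement – is the same.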
I have used triggers for this. Any other approach could miss queries run directly against tables. Yes, all updates should be done through some kind of API, and no doubt they generally will be – but when is that rule most likely to be broken? Probably when the application itself is broken and needs fixing as soon as possible. Isn’t that also exactly when we most need a reliable back-out option if the fix doesn’t work? That’s why we can’t afford to miss any changes. Could we use log shipping, or a restore from the transaction log, instead? Either would be much more appealing than triggers – simpler, and a lot less coding. The problem is that by definition the Blue and Green schemas are different, which rules both techniques out.
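To see why statement-level replay survives a schema difference where log shipping cannot, here is a hypothetical replay sketch, again using Python’s sqlite3 as a stand-in. The Blue schema deliberately differs from Green (an added `Email` column is my invented example), and each journalled operation is re-applied in capture order rather than copied at the log level. All names are illustrative.

```python
# Hypothetical replay of journalled operations into the Blue database,
# whose schema differs from Green's. sqlite3 stands in for SQL Server.
import sqlite3

blue = sqlite3.connect(":memory:")
# Blue schema differs from Green: the new release adds an Email column.
blue.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")

# Journal rows as captured on Green: (Seq, Operation, TargetTable, RowId, Name).
journal = [
    (1, 'INSERT', 'Customer', 1, 'Alice'),
    (2, 'INSERT', 'Customer', 2, 'Bob'),
    (3, 'DELETE', 'Customer', 1, None),
]

def replay(conn, rows):
    """Apply journalled operations in capture (sequence) order."""
    for seq, op, table, row_id, name in sorted(rows):
        if op == 'INSERT':
            # Columns Green never had (here, Email) fall back to NULL/defaults.
            conn.execute(f"INSERT INTO {table} (Id, Name) VALUES (?, ?)", (row_id, name))
        elif op == 'DELETE':
            conn.execute(f"DELETE FROM {table} WHERE Id = ?", (row_id,))
    conn.commit()

replay(blue, journal)
print(blue.execute("SELECT Id, Name, Email FROM Customer").fetchall())
# -> [(2, 'Bob', None)]
```

Because the replay works at the level of logical operations rather than physical log records, the target schema only has to be able to *accept* each operation – it does not have to be byte-for-byte identical, which is precisely the assumption log shipping and transaction-log restores cannot drop.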
With a major architectural change it’s very important to widen the discussion early on. Any seasoned developer will want a second opinion on performance, and on whether the more obscure Transact-SQL statements will work as advertised. This is something the DBAs can help with.
More subtly, developers may realise that some of the new components could serve more than one purpose. In this case, the data capture tables could also act as a change log for auditors, and would provide a very realistic set of inputs for application testing and for troubleshooting in Production. Does any of this duplicate something we already have? Is it an opportunity that should be exploited – or just scope creep? An early discussion with the DBAs (and of course with the people commissioning a commercial project) can help avoid these problems. That is what I intend to do in the next stage of the project.
2 thoughts on “Agile Continuous Delivery of Databases (3)”
This is funny. In the new situation I’ve been thrown into, they’re using what one co-worker called a ‘bastardized form of Agile’ that they’re calling ‘Flexible delivery.’ I noticed that they have blue and green parts of the application but no one explained to me what that meant (naturally). I wonder, now, if it’s related to what you’re discussing in your article.
I think it might be – and I don’t think it’s yet possible to use Agile and Continuous Deployment in the fullest sense with a database. So if there’s been any thinking about how to switch features on and off selectively, that’s probably as good as it gets – certainly if it’s an “integration database” (per Martin Fowler), directly exposed to multiple applications rather than sitting behind a service layer.
UI and business-layer programmers enthuse about continuous deployment, but shuffle their feet and look at the floor when asked where the database changes fit in. The code isn’t the problem; it’s the data. They’ll want an “application database” where no-one else is using the data, which is OK as long as we still have “one version of the truth” in the company. If the “application database” contains personal data and your company serves the UK or the EU, it potentially makes things more difficult under GDPR, where you really have to know where all the data is to fulfil the company’s declared policy. It’ll be interesting to see what effect GDPR has on system architecture.