One of my steepest learning curves in the last year and a half has been really getting into containers on Kubernetes (K8s) and Docker, and what running databases within them means. I thought I’d share a general overview, and plan to follow this post with more specific and more technical posts.
Stateful vs. Ephemeral
One of the central concepts of containers is that they are often ephemeral to varying degrees. This means that in an ideal world, a container is instantiated to perform a particular task and then it goes away. On one extreme of the ephemeral spectrum are containers that literally run a single command and then are deleted. On the other are longer-lived containers that behave a bit more like application VMs, existing for weeks or months, but that can be very quickly replaced and scaled without modifications being required to other components. None of these containers is stateful. You can kill any of them and replace it with another container with no problems.
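The fully ephemeral extreme can be sketched as a Kubernetes Job, a minimal example with illustrative names; the pod runs one command, exits, and is cleaned up:

```yaml
# A run-once, fully ephemeral container: executes a single command,
# exits, and the finished pod is garbage-collected after the TTL.
apiVersion: batch/v1
kind: Job
metadata:
  name: one-shot-task          # illustrative name
spec:
  ttlSecondsAfterFinished: 60  # delete the finished pod automatically
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo performing one task"]
```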
Databases, on the other hand, involve stateful data. Often when we support a database server, the server exists for years and may not even be so much as rebooted for a year or two. High availability involves either copying the stateful data with very low latency or two database servers sharing the same storage. Disaster recovery involves copying the data with low latency to a different physical location. The storage for databases is persistent. But we can still manage persistent data storage with ephemeral containers, as long as we are cognizant of the edge cases. For example, when changing database server versions, we cannot generally upgrade the DBMS and just point it at exactly the same data. There is often a conversion process that must occur, one that may change the system catalog of the database or the format of things like transaction log files.
Additionally, we have to take specific steps to make sure that the storage is, indeed, persistent and very highly available.
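In Kubernetes terms, that persistence is usually requested through a PersistentVolumeClaim that the database pod mounts; the pod can be killed and rescheduled, but the volume outlives it. A minimal sketch, where the name, storage class, and size are placeholders:

```yaml
# Persistent storage for the database, decoupled from the pod's lifetime.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db2-data                      # illustrative name
spec:
  accessModes:
  - ReadWriteOnce                     # mounted by one node at a time
  storageClassName: fast-replicated   # placeholder: pick a highly available class
  resources:
    requests:
      storage: 500Gi                  # sized for the database, not the container
```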
Containers vs. Virtual Machines (VMs)
When DBAs made the switch from hardware to VMs, the change for the DBA was not huge. Yes, there were some additional things to understand, but largely we could just treat VMs as servers. For the highest performance databases, we still insist on dedicated hardware. We certainly have learned to demand well-architected virtual solutions, and many of us have learned the clues that distinguish well-architected, well-performing virtual infrastructure from virtual infrastructure that does not measure up.
Why to Containerize Db2
If databases don’t immediately fit the description of ephemeral containers, why should we even consider using containers? Besides the fact that most enterprises seem to be heading toward containerized solutions, there are some surprising advantages to running databases in containers.
Perception: RDBMSes are Obsolete
This is the top item on my list because I hear it all the time. The fact is that somewhere along the line, we have to have persistent data. We have to know the products being sold, or the customer details, along with a million other data points, and we need to store and manipulate this data in a consistent and performant format. Just last week, someone in my own company declared that the RDBMS was obsolete and complained about having to use one at all. But if you want consistent data and you need to be able to manipulate it in a variety of ways, the RDBMS is still often the best answer, and centralizing data is still not a bad idea. If we as DBAs continue to cling to the deployment methods we are used to and comfortable with, that only furthers this perception. Databases can be run in containers, and can be run in ways that facilitate applications that run in containers.
Easy Creation of Additional Environments
If your database is properly containerized and you have well-designed helm charts, then people who are responsible for the platform and/or the application can easily spin up a new environment. This means that, without much DBA involvement, there can be purpose-specific development environments, and even the transition between hosting providers becomes easier. Once I’ve built and validated a development environment in a new place, someone else often builds out the remaining environments, and only a database restore is required from me.
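In practice, a new environment often reduces to a small per-environment values file applied against the same chart. A sketch, assuming a hypothetical Db2 chart; every name and value here is illustrative:

```yaml
# values-qa2.yaml: per-environment overrides for a hypothetical Db2 helm chart
environment: qa2
image:
  repository: example.registry/db2   # placeholder image reference
  tag: "11.5.8.0"                    # illustrative version tag
storage:
  size: 200Gi                        # smaller than prod, sized for QA
resources:
  memory: 32Gi
```

Standing up the environment is then something like `helm install db2-qa2 ./db2-chart -f values-qa2.yaml`, which a platform team can run without the DBA.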
Easy Integration into Agile/DevOps
If your company is using CI/CD, or even just more frequent releases of application code than the old traditional waterfall approach, then fitting the database into that work stream is critical. If you can’t fit the database into it, then you will be seen as a blocker. The speed of this process also puts extreme demands on a DBA. Suddenly, fitting your change into the process requires a DBA to be available 24/7, whenever a deploy happens. While a lot of DBA enablement of this is tied to using some sort of database version control like Liquibase, it is also magical to be able to integrate database system changes into the process. Now you know when the “server” restarts with a deploy, and you can get your changes into that pipeline as well, without a DBA even in attendance.
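For the database version control side, the unit of change in Liquibase is a changeset in a changelog. A minimal sketch in Liquibase's YAML changelog format, where the table and column names are just examples:

```yaml
# changelog.yaml: a minimal Liquibase changeset (object names are examples).
# Each changeset is tracked, so the pipeline applies it exactly once.
databaseChangeLog:
  - changeSet:
      id: add-order-status
      author: dba
      changes:
        - addColumn:
            tableName: orders
            columns:
              - column:
                  name: status
                  type: varchar(20)
```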
Speed of Patching and Upgrades
Coming from a background as a systems DBA, this is perhaps the most magical advantage. Fix packs are so much easier: build a container on the new code ahead of time and test it out; then, when it’s time to apply, you just deploy a change to the helm chart to use the newer container, run any post-patching steps, and you’re done. No waiting for an installFixPack or db2iupdt to run. Backing out is just as easy. Upgrades are just slightly more involved. Ask me later in the year, and I’ll have more details on them.
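In chart terms, the fix pack becomes a one-line change to the image tag, and backing out is the same change in reverse. A sketch, with illustrative repository and tag values:

```yaml
# values.yaml fragment: applying a fix pack is a tag bump.
image:
  repository: example.registry/db2   # placeholder image reference
  tag: "11.5.8.0"                    # was "11.5.7.0"; helm upgrade rolls the pod
# Back-out: set the tag back to "11.5.7.0" and run helm upgrade again.
```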
Build Similar Databases at Scale
Supporting WebSphere (now HCL) Commerce databases, I’ve spent a fair amount of my career working with not just one stack (dev/qa/stage/prod) of similar databases, but in most companies multiple stacks of these, all of which need a very similar if not identical build process. Nearly identical builds are generally boring exercises in detail orientation. With containers, you have everything you used to build the last one, and assuming your helm charts are in good shape, building another one is a matter of an hour or two that may not even involve the DBA.
Why not to Containerize Db2
We have run into some reasons not to containerize Db2.
Resources Required by a Single Container
The largest single servers in many enterprises are often database servers. Database servers don’t traditionally do well with being split up into a bunch of other servers. If databases are split across many servers, it’s rare for those servers to be easy to spin up and add/remove. The size traditionally required by database servers may even be larger than the host nodes behind your K8s implementation, and a single container cannot span multiple nodes. You may need purpose-built larger nodes to live behind the database containers, or may need to consider a different cluster with larger nodes for your databases. While Db2 easily supports sub-capacity licensing that works well in containers, not every RDBMS does, so be cautious of this as well.
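What makes this concrete is the container's resource block: if the request exceeds any node's allocatable capacity, the pod simply never schedules. The figures below are examples, not recommendations:

```yaml
# Container resources for a large database pod. The request must fit
# within a single node's allocatable CPU and memory, or the pod stays
# Pending: a container cannot span nodes.
resources:
  requests:
    cpu: "16"
    memory: 128Gi
  limits:
    cpu: "16"
    memory: 128Gi   # matching request and limit gives predictable performance
```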
While this is improving, there is still overhead in containerization. For the largest, most critical databases today, it’s still better to have them on dedicated VMs or hardware. HOWEVER, this probably applies to maybe 5-10 of the thousands of different databases I’ve seen in my career. If your company’s name isn’t a household name, and you’re not paying for true four-nines availability across the board, it’s unlikely that you need this.
Containerization requires skills a DBA may not already have. This can be alleviated by partnering with a team that has these skills, and by training. Containerization requires a fair level of OS knowledge as well as knowledge of the containerization and orchestration layers. To do this, you have to have DBAs who are willing to learn, and you have to provide them with time and education. If your DBAs are already stressed and working 50+ hour weeks to keep up with work, you’ll have to hire. Even if they’re just fully utilized, acquiring this kind of knowledge takes time and dedication, and you’ll probably have to augment your staff to support it. You will also make mistakes along the way. There is no way to prevent the mistakes; they are just learning experiences that are part of a containerization journey.
Next week, I’ll add a post on where Db2 containers come from and ways they are used. I’d love to hear your thoughts on running Db2 in containers in the comments below.
Originally published on DataGeek