The power of Docker and containerization
If you’ve been in IT over the last few years you’ll have heard people shout “DOCKER” from the rooftops, demanding its immediate use everywhere in your technology infrastructure. And you may have said to yourself, “What on earth is a Docker?”
A few years ago I was just as confused by all the shouting, and could not tell the difference between an image and a container. But I’m now happy to say I’ve got a much better understanding of what Docker is, so I thought I’d create a few simple notes to help you avoid my confusion and understand the benefits (and potential pitfalls) of using Docker.
What is Docker?
Docker is a containerization product (and it’s also the name of the company that supplies the product). It’s open source, and you can do an awful lot with the Community Edition. There are paid-for versions as well if you want commercial support or enterprise features.
As well as Docker itself, there are products like Docker Swarm and Kubernetes (originally from Google) that help you manage Docker containers at large scale. That is a separate topic that we won’t talk about here.
So what are containers?
Containers take the idea of virtual machines one stage further. A bit of history might make this clearer.
When I started in IT we used to have big machines that ran lots of different workloads. As hardware became (a bit) cheaper we got smaller machines that each ran a specialised workload, e.g. accounts or email. However, there were a few different problems with this:
- If things got busy the machine could get slow, sometimes very slow…
- Trying to avoid congestion at peak times meant having to invest in spare capacity, which was usually wasted most of the time.
- If one process went “rogue” it could break everything else on the machine. This was a security problem as well as a reliability problem.
- Even if a process didn’t have a problem, a single process suffering heavy load could impact all the other services or processes running on the machine.
In the noughties we started using virtual machines (VMs). With VMs, we can isolate each workload inside its own private copy of the operating system. This is good because software can be isolated and moved around to different machines depending on demand. However, there are still a number of problems:
- It takes a lot of memory to run all those different copies of Windows or Linux. They also take a long time to start, or restart if they crash.
- The extra OS layers make the system harder to manage, and activities like rolling out security patches become much harder. Although there are tools to help with system configuration management at scale, it still takes time and effort.
- Moving and restarting virtual machines as demand changes is complex.
Over the last few years containerization has become popular, mainly thanks to Docker.
With Docker, each component of a software solution (the server, the database etc) is run in a container. In the container it looks like a complete and isolated computer and operating system, but in reality it’s a virtualized operating system. That is, one copy of Linux (or Windows) made to look like a dedicated copy of the operating system for each container.
So virtual machines make the hardware virtual, and now Docker does the same thing to the operating system.
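To make this concrete, here’s a small sketch of what running a container looks like in practice. It assumes you have Docker installed and uses the public nginx image purely as an example:

```shell
# Start an nginx web server in a container, in the background,
# mapping port 8080 on the host to port 80 in the container
docker run --detach --publish 8080:80 --name web nginx

# The container sees its own filesystem and network, but shares
# the host's kernel -- this prints the same kernel version as the host
docker exec web uname -r

# Stop and remove the container when done
docker stop web
docker rm web
```

Notice that there’s no operating system to boot: the container is just a set of isolated processes running on the host’s kernel.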
The advantages of using Docker
There are a number of advantages to using Docker:
- Containers are very quick to start (no operating system to boot)
- They use a lot less resources because you’re only running one copy of the operating system
- There are a lot of tools and resources to help you
- For a correctly designed solution, containers can provide much better resilience and scalability (at the cost of some complexity)
- It can be a lot easier to use similar software in production, test and development environments.
Do containers have any drawbacks?
All technology involves trade-offs and compromises, and Docker is no different:
- Security is still an evolving capability, and if you need solid security isolation you may need to continue to use virtual machines
- If you take advantage of Docker’s scalability features, you’ll need to support a corresponding increase in complexity requiring specialist skills.
What’s the difference between a Docker Image and a Docker Container?
(The following information is taken from a Docker workshop that I gave at Linux Conf Australia in 2019)
A Docker Image is the files and metadata needed to start and run a container. Think of it as “file system plus metadata”.
Note that it doesn’t contain an operating system, only the files needed for our process.
Typically a Docker Image contains:
- Software packages your program needs to run
- Your program and static config files
- Metadata about access to OS Level resources (networking, persistent storage, ENV variables, etc.)
- Default startup command
It’s NOT running processes, just the information you need to start and run the required processes.
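Those four ingredients map fairly directly onto the instructions in a Dockerfile, the recipe used to build an image. Here’s a minimal sketch (the file names, variable, and port are hypothetical examples, not from any real project):

```dockerfile
FROM python:3.12-slim           # base layer: software packages your program needs
COPY app.py config.yaml /app/   # your program and static config files
ENV APP_MODE=production         # metadata: an environment variable
EXPOSE 8000                     # metadata: OS-level networking resource
CMD ["python", "/app/app.py"]   # default startup command
```

Running `docker build` on this produces an image: a file system plus metadata, with no processes running yet.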
Compare that to a Docker Container, which refers to a running instance of an image: the processes and in-memory state created when the image is started under Docker.
It’s basically one or more processes using resources granted to it via Docker. A working application will usually require one or more containers (perhaps spread across different machines).
One image can be used to start many containers.
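For example, assuming the public nginx image again, you can start several independent containers from the one image:

```shell
# Three containers from the same image, each its own isolated
# set of processes, each mapped to a different host port
docker run -d --name web1 -p 8081:80 nginx
docker run -d --name web2 -p 8082:80 nginx
docker run -d --name web3 -p 8083:80 nginx

# List all containers started from the nginx image
docker ps --filter ancestor=nginx
```

The image itself never changes; each container gets its own writable layer on top of the shared image files.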
If you’d like to watch the Docker presentation I gave at Linux Conf Australia 2019, it’s right here in all its 99 minute glory!
How do I get started with Docker?
The Community Edition version of Docker can be downloaded for free. There are detailed setup instructions for various workstation platforms to get you going.
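Once installed, a quick way to verify everything is working is Docker’s own test image:

```shell
# Pulls a tiny test image and runs it; prints a welcome message
# confirming your Docker installation is working
docker run hello-world
```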
There are a number of online courses and videos to help. One we’ve used at PaperCut is Docker Mastery on Udemy.
So there you have it: a quick (and hopefully helpful) rundown of Docker and the potential of containerization.