Docker is a containerization platform that lets you package applications and their dependencies into isolated units called containers. These containers run consistently across environments by virtualizing the operating system instead of the hardware. Docker provides a lightweight, fast, and portable way to build, ship, and run applications, making it an essential tool for modern software development and microservices architectures.

Part 1: Understanding the Core Concepts

First, let’s understand what Docker is and look at its key components.

Docker vs. Virtual Machines (VMs)

Before diving into Docker’s architecture, it’s helpful to understand how it differs from traditional Virtual Machines (VMs).

  • VMs emulate an entire machine, including CPU, memory, storage, and a full guest operating system. They run on a hypervisor (like VMware or VirtualBox) and are great when you need strong hardware-level isolation or different OS environments.
  • Docker containers run on the host OS kernel and isolate at the process level. Containers virtualize the OS, not the hardware, which makes them incredibly lean and fast.

Docker Architecture Overview

Behind the scenes, Docker has a few key parts that work together:

  • Docker Client: The CLI tool docker that takes your commands and interacts with the Docker Daemon.
  • Docker Daemon: A background service (dockerd) that exposes a REST API and manages images, containers, networks, and volumes.
  • Docker Runtime: containerd manages the container lifecycle, while runc is the low-level runtime that actually executes OCI-compliant containers.
  • Docker Engine: The combination of the client, daemon and runtime.
  • Docker Image: A read-only template for creating containers. It’s built from a Dockerfile.
  • Docker Container: A runnable, isolated instance of an image.
  • Docker Registry: A storage system for images. Docker Hub is the default public registry.
+---------------------------+
|        Docker CLI         |
|     (docker commands)     |
+---------------------------+
|         dockerd           |
|   REST API, management    |
+---------------------------+
|         containerd        |
|     lifecycle + images    |
+---------------------------+
|           runc            |
|    low-level executor     |
+---------------------------+
|        Linux kernel       |
+---------------------------+

How it all works:

  1. You run a command like docker run nginx from your terminal.
  2. The Docker Client sends this command to the Docker Daemon.
  3. The Daemon checks if the nginx image exists locally. If not, it pulls it from a Registry (like Docker Hub).
  4. The Daemon then creates and starts a new Container from that image.
  5. Your NGINX web server is now running in an isolated container.
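
You can trace this flow yourself with a quick, throwaway example (nothing here is specific to NGINX; any image works):

# Start a container in the background; the image is pulled on first use
docker run -d --name demo nginx

# The daemon reports it as running
docker ps --filter name=demo

# Stop and remove it when you're done
docker rm -f demo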

Dockerfile: The Blueprint for Images

A Dockerfile is a text-based script that contains instructions for building a Docker image. It’s the foundation of infrastructure-as-code for container environments. Each instruction creates a new layer in the image, which helps with caching and efficient builds.

Most Common Instructions

  • FROM: Specifies the base image (e.g., node:18-alpine).
  • WORKDIR: Sets the working directory for subsequent instructions.
  • COPY: Copies files from your host into the image’s filesystem.
  • RUN: Executes commands during the build process (e.g., npm install).
  • EXPOSE: Documents which network ports the container listens on at runtime.
  • CMD: Provides the default command to execute when the container starts.
  • ENTRYPOINT: Configures the container to run as an executable.
  • ENV: Sets environment variables.
  • ARG: Defines build-time variables.
  • USER: Sets the user for running subsequent commands.

Example Dockerfile:

# Define a build-time variable for the Node.js version
ARG NODE_VERSION=18-alpine
# Use an official Node.js runtime as a parent image
FROM node:${NODE_VERSION}

# Set environment variables for the container
ENV NODE_ENV=production
ENV PORT=3000

# Create a non-root user to run the application
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Set the working directory in the container
WORKDIR /app

# Copy package.json and package-lock.json
# and change ownership to the non-root user
COPY --chown=appuser:appgroup package*.json ./

# Install application dependencies
RUN npm install --production

# Bundle app source
COPY --chown=appuser:appgroup . .

# Switch to the non-root user
USER appuser

# Expose the port the app runs on
EXPOSE ${PORT}

# Set the entrypoint for the container
ENTRYPOINT ["node"]
# Provide the default command to the entrypoint
CMD ["server.js"]

Shell vs. Exec Form

Instructions like RUN, CMD, and ENTRYPOINT can be specified in two forms:

  • Shell form: CMD node server.js. The command is run in a shell (/bin/sh -c), which lets you use shell features like variable substitution ($HOME) and command chaining (&&, ||). However, PID 1 is the shell, not your application, so signals may not reach the app.
  • Exec form: CMD ["node", "server.js"]. The command is run directly as a binary, bypassing the shell. This is the preferred form for CMD and ENTRYPOINT as it handles signals (e.g., SIGINT, SIGTERM) correctly and avoids shell-related issues.
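
As a rough side-by-side sketch (in a real Dockerfile only the last CMD takes effect):

# Shell form: PID 1 is /bin/sh, which wraps the node process
CMD node server.js

# Exec form: PID 1 is node itself, so SIGTERM reaches the app directly
CMD ["node", "server.js"]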

CMD vs. ENTRYPOINT

These two instructions define what the container runs on startup, but they have different purposes.

  • CMD: Sets a default command and/or parameters, which can be easily overridden from the docker run command line. This makes it great for general-purpose containers.
  • ENTRYPOINT: Configures a container that will run as an executable. Arguments from docker run are appended to the ENTRYPOINT. It can be overridden only with the --entrypoint flag.

The best practice is to use them together. Use ENTRYPOINT to specify the main executable and CMD to specify default arguments.

ENTRYPOINT ["ping"]
CMD ["localhost"]
  • docker run my-app executes ping localhost.
  • docker run my-app google.com executes ping google.com, overriding the CMD.

ARG vs. ENV

Both are used for variables, but they serve different purposes.

  • ARG: Defines variables that are only available during the image build. They act like parameters for the Dockerfile, used to customize the build process (for example, to specify which software version to install). They are not available inside the running container.
  • ENV: Sets environment variables that are available both during the build and, more importantly, to the application running inside the container. This is the standard method for providing runtime configuration, such as API keys, database credentials, or environment-specific settings.
# ARG is for build-time customization
ARG NODE_VERSION=18-alpine
FROM node:${NODE_VERSION}

# ENV is for runtime configuration
ENV NODE_ENV=production

# You can use ARG to set an ENV
ARG APP_PORT=3000
ENV PORT=$APP_PORT

You can set ARG values during the build (docker build --build-arg NODE_VERSION=20-alpine .) and override ENV values at runtime (docker run -e NODE_ENV=development my-app).

The .dockerignore File

A .dockerignore file allows you to exclude files and directories from being copied into your image (similar to .gitignore). This is crucial for:

  • Reducing Image Size: Prevents large files or development artifacts (like node_modules or .git) from being included.
  • Improving Build Speed: Avoids invalidating the build cache with irrelevant file changes.
  • Security: Keeps sensitive files and credentials out of the image.

Create a .dockerignore file in the same directory as your Dockerfile:

.git
node_modules
npm-debug.log
Dockerfile
.dockerignore

Multi-Stage Builds

Multi-stage builds allow you to use multiple FROM statements in a single Dockerfile. This is a powerful technique to create small, secure production images by separating build-time dependencies from runtime artifacts.

# Stage 1: Build the application
FROM node:18-alpine AS builder
WORKDIR /app
COPY . .
RUN npm install && npm run build

# Stage 2: Create the final production image
FROM nginx:alpine AS final
# Copy only the built assets from the 'builder' stage
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
  • Stage 1 (builder): This stage uses a Node.js image to install dependencies and build the application. It contains all the build tools and source code needed for compilation.
  • Stage 2 (final): This stage starts with a minimal NGINX image and copies only the compiled application assets from the builder stage. The resulting image is small and secure, as it contains no build tools or source code.

By default, docker build -t my-app . builds the final stage to create the optimized production image.
To inspect or debug an intermediate stage, you can build it specifically using the --target flag: docker build --target builder -t my-app-debug . (the trailing dot is the build context).


Docker Image: The Read-Only Template

A Docker image is a lightweight, standalone, executable package that includes everything needed to run a piece of software, including the code, a runtime, libraries, environment variables, and config files.

Image Layers and Caching

Images are built in layers. Each instruction in a Dockerfile creates a new layer that represents a filesystem change (e.g., adding a file, installing a package). This layered architecture is efficient because:

  • Caching: Docker reuses unchanged layers from previous builds, making subsequent builds much faster.
  • Sharing: Multiple images can share common base layers, saving disk space.
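
In practice this means ordering instructions from least to most frequently changed, as the example Dockerfile above already does. A minimal sketch of the pattern:

# This layer is reused from cache until package*.json changes
COPY package*.json ./
RUN npm install

# Source edits only invalidate this final layer, so rebuilds stay fast
COPY . .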

Building an Image

To build an image from a Dockerfile, use the docker build command. The -t flag tags the image with a name and version (e.g., my-app:1.0.0), and the . specifies the current directory as the build context (the Dockerfile and source files location).

# Build an image and tag it as 'my-app' with version '1.0.0'
docker build -t my-app:1.0.0 .

Docker Tags

A tag is a label applied to a Docker image to identify a specific version or variant. It’s the part after the colon in an image name (e.g., nginx:1.25.2).

  • Always use specific, immutable tags (like 1.0.0 or a git commit hash) to ensure your deployments are predictable and repeatable.
  • If no tag is specified, Docker defaults to latest.

You can apply multiple tags to the same image:

# Set tag during the build
docker build -t my-app:1.0.0 .

# Add another tag to the image
docker tag my-app:1.0.0 my-app:stable

Best Practices

  • Use multi-stage builds to reduce image size.
  • Minimize layers: combine related commands with && to reduce image bloat (e.g. RUN npm install && npm run build); see the sketch after this list.
  • Use .dockerignore to exclude unnecessary files (e.g., node_modules, .git).
  • Pin versions for reproducibility.
  • Avoid using the latest tag in production, since it can point to different image versions over time.
  • Run as non-root for security.
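
For example, on a Debian-based image you might collapse package installation and cleanup into a single RUN layer (the packages here are purely illustrative):

RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*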

Docker Container: The Running Instance

A Docker container is a live, running instance of a Docker image. It runs as an isolated process on the host machine’s kernel. You can create, start, stop, move, and delete containers. The OS-level virtualization makes containers much more lightweight than VMs.

How Containers Work

Containers use Linux kernel features like namespaces and control groups (cgroups) to achieve isolation.

  • Namespaces: Provide isolated environments for processes, networking, mounts, and users.
  • cgroups: Limit and monitor resource usage, such as CPU, memory, and I/O.
  • OverlayFS (the overlay2 storage driver): Provides the layered filesystem that images and containers are built on.

Containers are ephemeral by default. Any data written inside the container’s filesystem is lost when the container is removed.
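
A quick way to see this for yourself, using the small alpine image:

# Write a file inside a container, then remove the container
docker run --name tmp alpine sh -c 'echo hello > /data.txt'
docker rm tmp

# A fresh container from the same image has no trace of the file
docker run --rm alpine cat /data.txt   # error: No such file or directory

Volumes, covered in Part 2, are how you keep data beyond a container’s lifetime.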

Running a Container

To create and start a container from an image, you use the docker run command. This is the most fundamental command for interacting with containers.

# Run the 'my-app' image, name the container 'web'. Use -d to run in the background.
docker run -d --name web my-app

Part 2: Networking and Data

Now that you know how to build and run containers, let’s see how to connect them and manage their data.

Docker Networking

Docker networking allows containers to communicate with each other, the host machine, and the outside world. Each container gets its own isolated network stack (IP address, routing table, etc.).

Built-in Network Drivers

Docker uses network drivers to control how containers communicate:

  • bridge: Default driver. Creates a private, internal network on the host. Containers on the same bridge can communicate.
  • host: Removes network isolation. The container shares the host’s network stack.
  • none: Disables all networking for the container.
  • overlay: Used for multi-host communication in Docker Swarm mode.
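
For a quick feel of the difference, you can run a throwaway container against a couple of these drivers (host networking only behaves this way on a Linux host):

# host: the container sees the host's network interfaces directly
docker run --rm --network host alpine ip addr

# none: the container gets only a loopback interface
docker run --rm --network none alpine ip addr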

Creating a Custom Network

It’s a best practice to create custom bridge networks for your applications rather than relying on the default one. Custom networks provide better isolation and an embedded DNS server that allows containers to resolve each other by name. To create a network, use the docker network create command:

# Create a custom bridge network named 'my-app-net'
docker network create my-app-net

Container-to-Container Communication

Docker provides an embedded DNS server for name-based service discovery. Containers on the same user-defined bridge network can resolve each other by their container name, but they are isolated from containers on other networks. This is a powerful feature for security and network segmentation.

A single container can be connected to multiple networks, allowing you to create tiered architectures. For example, you can have a frontend-net for your web server and a backend-net for your database, with an API service connected to both.

# Run the database, connected only to the backend network
docker run -d --name db --network backend-net postgres

# Run the API on the backend network, then also attach it to the frontend network
docker run -d --name api --network backend-net my-api
docker network connect frontend-net api

# Run the web frontend, connected only to the frontend network
docker run -d --name web --network frontend-net my-web

In this setup, the web container can reach the api container (using the hostname api), and the api container can reach the db container. The web container, however, cannot reach the db container directly, because they share no network.
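
You can verify the isolation with a quick check, assuming the hypothetical my-api and my-web images ship with a shell and ping:

# From the api container, the db hostname resolves over backend-net
docker exec api ping -c 1 db

# From the web container, db is not resolvable at all (no shared network)
docker exec web ping -c 1 db   # fails: bad address 'db'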

External and Host Access

  • Exposing Ports: To make a container accessible from your host machine or the internet, you need to publish its ports using the -p or --publish flag.

    # Map port 80 in the container to port 8080 on the host
    docker run -d -p 8080:80 nginx
    

    You can now access NGINX at http://localhost:8080.

  • Accessing the Host: From inside a container, you can access services running on your host machine using the special DNS name host.docker.internal. This is extremely useful for local development when your container needs to talk to a database or API running directly on your laptop.

    • On Linux, this requires an extra flag: docker run --add-host=host.docker.internal:host-gateway my-container. It’s built-in on Docker Desktop (macOS/Windows).
    • For example, a connection string inside your container might look like: Server=host.docker.internal;Database=my_db;

Docker Volumes: Persisting Data

Since containers are ephemeral, you need a way to store data permanently. Docker volumes are the preferred mechanism for persisting data generated by and used by Docker containers.

  • Volumes are directories managed by Docker that are mounted into a container. They live on the host machine (typically in /var/lib/docker/volumes/ on Linux) but are isolated from the host’s core functionality.
  • Volumes are persistent, portable, and safe to share between containers. They survive container restarts and removals.
  • Volumes can be named or anonymous. Named volumes are reusable; anonymous ones are tied to container lifecycle.

Types of Mounts

  • Volumes: (Recommended) Managed by Docker. Stored on the host filesystem. Portable and efficient.
  • Bind Mounts: Maps a specific file or directory from the host machine into the container (e.g., -v /path/on/host:/path/in/container). Less portable and can have permission issues.
  • tmpfs Mounts: Stored in the host’s memory only. Data is non-persistent.

Creating and Using Volumes

To create a volume, use the docker volume create command:

# Create a named volume named 'my-db-data'
docker volume create my-db-data

You can mount volumes using two different flags: -v (or --volume) and --mount.

  • -v syntax (more concise): [SOURCE:]/path/in/container[:ro]
    • If SOURCE is a path on the host, it’s a bind mount.
    • If SOURCE is just a name, it’s a named volume.
    • If SOURCE is omitted, it’s an anonymous volume.
  • --mount syntax (more explicit): type=<volume|bind>,source=<NAME_OR_PATH>,target=/path/in/container[,readonly]

You can also make a volume read-only by appending :ro (for -v) or adding the readonly flag (for --mount), which is useful for sharing configuration without allowing the container to modify it.

# Run a container using the -v flag (creates a named volume implicitly if it doesn't exist)
docker run -d --name db1 -v my-db-data:/var/lib/postgresql/data postgres

# The same command using the more explicit --mount flag
docker run -d --name db2 --mount source=my-db-data,target=/var/lib/postgresql/data postgres

# Mount a config file from the host as read-only
docker run -d --name web --mount type=bind,source="$(pwd)"/nginx.conf,target=/etc/nginx/nginx.conf,readonly nginx

Part 3: The Most Common Docker Commands

Here is a quick reference for the commands you’ll use most often, with examples.

Building and Managing Images

  • Build an image from a Dockerfile. Use -t to name and tag the image, . to use the current directory as the build context, and -f to point to a specific Dockerfile.

    docker build -t my-app:1.0 .
    
  • List all images on your machine.

    docker images
    
  • Download an image from a registry.

    docker pull nginx:latest
    
  • Upload an image to a registry (after logging in).

    docker push your-repo/my-app:1.0
    
  • Remove one or more images.

    docker rmi my-app:1.0
    
  • Apply a new tag to an existing image.

    docker tag my-app:1.0 your-repo/my-app:latest
    

Running and Managing Containers

  • Create and start a new container from an image. Use -d to run in the background, --name to give it a name, and -p to map a port.

    docker run -d --name web -p 8080:80 nginx
    
  • List running containers. Use -a to see all containers (including stopped ones).

    docker ps -a
    
  • Gracefully stop a running container.

    docker stop web
    
  • Start a stopped container.

    docker start web
    
  • Restart a running container.

    docker restart web
    
  • Remove a stopped container.

    docker rm web
    
  • Forcefully stop a container by sending a SIGKILL signal.

    docker kill web
    

Interacting with Running Containers

  • Run a command inside a running container. The -it flags are for interactive access.

    docker exec -it web /bin/bash
    
  • View the logs of a container. Use -f to follow the log output.

    docker logs -f web
    
  • Copy files or folders from the host into a container’s filesystem.

    docker cp ./index.html web:/usr/share/nginx/html/index.html
    
  • Copy files or folders from a container to the host’s filesystem.

    docker cp web:/etc/nginx/nginx.conf ./nginx.conf
    
  • View detailed, low-level information about a Docker object.

    docker inspect web
    

System and Cleanup

  • Remove unused data (stopped containers, dangling images, unused networks). Add -a to also remove all unused images, not just dangling ones.

    docker system prune -a
    
  • Display system-wide information.

    docker info
    
  • Show the Docker version information.

    docker version
    

Part 4: Multi-Container Applications

Most real-world applications consist of multiple services (e.g., a web server, an API, a database). Docker Compose is the tool for defining and running these multi-container applications.

Docker Compose: Running Multi-Container Apps

Docker Compose uses a single YAML file (typically docker-compose.yml) to configure all of your application’s services, networks, and volumes. With one command, you can spin up (or tear down) your entire application stack.

Anatomy of a docker-compose.yml File

version: '3.8'

services:
  # The web frontend service
  web:
    image: nginx:latest
    ports: # Represents the -p flag
      - "8080:80" # Map port 80 in the container to port 8080 on the host
    volumes: # Represents the -v flag
      - ./nginx.conf:/etc/nginx/conf.d/default.conf # Mount a config file from the host
    networks: # Represents the --network flag
      - frontend-net # Connect to the frontend network
    depends_on:
      - api

  # The backend API service
  api:
    build: ./api # Build the image from the 'api' directory
    networks:
      - frontend-net
      - backend-net
    environment: # Represents the -e/--env flag (the env_file key is the Compose equivalent of --env-file)
      - DB_HOST=db
    depends_on:
      db:
        condition: service_healthy # Wait for the DB to be healthy

  # The database service
  db:
    image: postgres:15
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend-net
    environment:
      - POSTGRES_PASSWORD=mysecretpassword
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

# Define the networks
networks:
  frontend-net:
  backend-net:

# Define the volumes
volumes:
  db-data:

Key Concepts

  • services: Defines each container (e.g., web, api, db).
  • networks: Creates networks for services to communicate on.
  • volumes: Defines persistent data volumes.
  • depends_on: Controls the startup order of services. Using it with healthcheck ensures a service is actually ready before its dependents start.
  • healthcheck: A command Docker runs periodically to check if a container is still working correctly.

Running Docker Compose

  • docker compose up: Build, (re)create, start, and attach to containers for a service. Add -d to run in the background.
  • docker compose down: Stop and remove containers and networks. Add -v to also remove named volumes.
  • docker compose logs: View logs from all services.
  • docker compose exec <service_name> <command>: Execute a command in a running service.
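
A typical day-to-day workflow for the example stack above might look like this (service names match the docker-compose.yml shown earlier):

docker compose up -d --build              # build images and start the whole stack in the background
docker compose ps                         # list the running services
docker compose logs -f api                # follow the api service logs
docker compose exec db psql -U postgres   # open a psql shell inside the db container
docker compose down                       # tear everything down (add -v to also remove volumes)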

Part 5: Security & Advanced Topics

Finally, let’s cover some important topics related to security and advanced configuration.

Running Containers Securely: User Management

By default, containers run as the root user. This is a security risk, as a compromised container could potentially gain root access to the Docker host.

Best Practice: Always run your containers as a non-root user.

You can create and switch to a non-root user in your Dockerfile:

# Create a group and user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Tell Docker to run subsequent commands as this user
USER appuser

This simple step significantly improves your container’s security posture by limiting its privileges.


Managing Secrets in Docker

Never hardcode secrets like passwords, tokens, or API keys directly into your Dockerfile or source code.

  • For Local Development: Use environment variables passed via a .env file (which should be in your .gitignore). Docker Compose automatically picks up a .env file in the same directory.
  • For Production (Standalone Containers): Use the --env-file flag with docker run to load variables from a file.
  • For Production (Orchestration): When using orchestrators like Docker Swarm or Kubernetes, use their built-in secret management systems. Docker Swarm has Docker Secrets, which are encrypted and securely mounted into containers only when needed.
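
As a minimal local-development sketch (the variable names and values are placeholders):

# .env (keep this file out of version control)
DB_PASSWORD=supersecret
API_KEY=replace-me

# Load it for a standalone container at runtime
docker run -d --env-file .env my-app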

The Docker Socket: /var/run/docker.sock

This is a Unix socket file that the Docker daemon listens on. The docker CLI client communicates with the daemon via this socket.

Mounting this socket into a container (-v /var/run/docker.sock:/var/run/docker.sock) gives that container root-level access to the host. It can start/stop other containers, build images, and more. This is sometimes called “Docker-out-of-Docker”.

While powerful for CI/CD tools (like Jenkins or GitLab runners), it has major security implications. Only mount the Docker socket into containers you fully trust.


Distributing Images

To share your images, you push them to a registry. Docker Hub is the default public registry, but you can also use private registries from cloud providers (e.g., AWS ECR, Google Artifact Registry, Azure ACR) or host your own.

  • Log in to a Docker registry (required before pushing).

    # For CI/CD, use a non-interactive login
    echo "$DOCKER_PASSWORD" | docker login --username "$DOCKER_USERNAME" --password-stdin
    
  • Push your tagged image to the logged-in registry.

    docker push your-repo/my-app:latest
    

Configuring the Docker Daemon (daemon.json)

You can customize the behavior of the Docker Engine itself by editing the daemon.json configuration file.

  • Location: /etc/docker/daemon.json on Linux.
  • Use Cases: Configure logging drivers, set up registry mirrors to speed up pulls, change the storage location for Docker data, and more.
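
A small example daemon.json that sets log rotation and a registry mirror (the mirror URL is a placeholder):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "registry-mirrors": ["https://mirror.example.com"]
}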

Changes require a restart of the Docker daemon (sudo systemctl restart docker). This is an advanced topic, but it’s good to know it exists for fine-tuning your Docker setup.