UPDATE: At the time of publishing thie blog post, Microsoft has removed the
mssql-tools
from the container image.
Context
Searching on the internet about using databases in containers, they will find hoards of blogs, posts, and suggestions to not use databases in containers. But for testing deployment, configurations, and integration testing, it’s really convenient to spin up a temporary database in a container and run isolated tests. Recently, I was working on a project where multiple applications were manually deployed in the test environment. I was tasked to automate the deployment process, and this is the culmination of what I learned from that implementation.
Initial Attempt
Microsoft provides a very rudimentary sample (mssql-customize) for customizing Microsoft SQL Server container image from their Docker Hub image.
This is the Dockerfile for the custom image:
FROM mcr.microsoft.com/mssql/server:2017-latest
# Create a config directory
RUN mkdir -p /usr/config
WORKDIR /usr/config
# Bundle config source
COPY . /usr/config
# Grant permissions for our scripts to be executable
RUN chmod +x /usr/config/entrypoint.sh
RUN chmod +x /usr/config/db-restore.sh
ENTRYPOINT ["./entrypoint.sh"]
In the Dockerfile, we run the entrypoint.sh
which executes db-restore.sh
(and detaches it from then current process) and starts the database server.
#!/bin/bash
# Start the script to restore the database
/usr/config/db-restore.sh &
# Start SQL Server
/opt/mssql/bin/sqlservr
In the db-restore.sh
, it will try to connect with the database (with a fixed retry count) and run the query to restore from the backup file.
#!/bin/bash
# if the backup doesn't exist, we can skip this script
if [ ! -f "/home/Database.bak" ]
then
echo "SETUP :: database backup file not found, exiting"
exit 0
fi
SUCCESS=false
MAX_ATTEMPTS=60
ATTEMPT_COUNT=0
DBSTATUS=1
while [ $SUCCESS = false ] && [ $ATTEMPT_COUNT -le $MAX_ATTEMPTS ]; do
echo "SETUP :: looking for existing database"
DBSTATUS=$(/opt/mssql-tools/bin/sqlcmd -h -1 -t 1 -U sa -P $MSSQL_SA_PASSWORD -Q "SET NOCOUNT ON; SELECT COUNT(1) FROM sys.databases WHERE name = N'Database'")
if [ $? -eq 0 ]; then
SUCCESS=true
DBSTATUS=$(echo $DBSTATUS | tr -cd '[:digit:]')
break
fi
echo "SETUP :: Attempt $ATTEMPT_COUNT failed. Trying again..."
ATTEMPT_COUNT=$(( ATTEMPT_COUNT + 1 ))
echo SETUP :: sleeping
sleep 1
done
if [ $ATTEMPT_COUNT -ge $MAX_ATTEMPTS ]; then
echo SETUP :: max attempt count reached, exiting
exit 1
fi
if ! [ "$DBSTATUS" == "0" ]; then
echo "SETUP :: database already exists, exiting"
exit 0
fi
# Run the setup script to create the DB and the schema in the DB
echo "SETUP :: starting database restoration"
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $MSSQL_SA_PASSWORD -Q "RESTORE DATABASE [Database] FROM DISK = '/home/Database.bak' WITH RECOVERY"
echo "SETUP :: database restoration complete"
This might seem like a crude implementation, but it is simple and does what it was designed for. Now, we have a custom SQL Server image that can-
- automatically restore the database to a certain point to run specific tests
- setup mock data to run specific integration tests
The Conundrum
After integrating the Docker image with other apps in a Docker Compose, a new problem emerges. The database restoration was aborted after some time.
This was odd because it was working perfectly while running separately from the other apps.
Turns out, one of the apps was configured to run database migration at startup. When the Docker services started, that app was racing with the restoration script to run its migration. For some reason, the app was starting a bit later than the restoration process. But before the restoration process completes, that app would connect with the server and create an empty database immediately.
I have two guesses for the database to abort restoration-
- The database backup was taken before some of the current features were implemented. That means the newly created database would have a different schema than the restoring backup. Thus the backup’s schema didn’t match in the midway of the restoration.
- There was no database when the restoration started and it started to create files (data, logs, etc.). However, the app created an empty database, thus creating files that were meant to be created by the backup restoration. When it tried to copy files from the backup to an existing file, it aborted.
But, I can’t say any of those were the exact cause because I didn’t try to dig down any deeper into this one, and I had to finish the implementation as soon as possible. That’s why I went on a different route.
Manual Restoration
Instead of automatically restoring the database from inside the Docker container, we can trigger the restoration script from outside using docker exec
. The procedure would be somewhat like this-
- start services with
docker compose
- stop backend container
- restore database
- start the backend container again
This can be automated with a simple script, which can be executed on the setup phase of the container services.
Now, the ENTRYPOINT
of the Dockerfile file will be the SQL Server binary-
FROM mcr.microsoft.com/mssql/server:2022-latest
WORKDIR /home
# Bundle config source
COPY . /home
ENTRYPOINT ["/opt/mssql/bin/sqlservr"]
This is the updated db-restore.sh
-
#!/bin/bash
MSG_PREFIX=""
function msg {
echo "[ setup ][ $MSG_PREFIX ] ${1}"
}
function check_backup {
if [ ! -f "${1}" ]
then
return 1
fi
return 0
}
SUCCESS=false
MAX_ATTEMPTS=60
ATTEMPT_COUNT=0
DBSTATUS=1
MSG_PREFIX="init"
while [ $SUCCESS = false ] && [ $ATTEMPT_COUNT -le $MAX_ATTEMPTS ]; do
msg "looking for existing database"
/opt/mssql-tools/bin/sqlcmd -h -1 -t 1 -U sa -P "$MSSQL_SA_PASSWORD" -Q "SET NOCOUNT ON; SELECT COUNT(1) FROM sys.databases"
if [ $? -eq 0 ]; then
SUCCESS=true
break
fi
msg "attempt $ATTEMPT_COUNT failed. Trying again..."
ATTEMPT_COUNT=$(( ATTEMPT_COUNT + 1 ))
msg "sleeping"
sleep 1
done
if [ $ATTEMPT_COUNT -ge $MAX_ATTEMPTS ]; then
msg "max attempt count reached, exiting"
exit 1
fi
function restore_db {
echo ""
MSG_PREFIX="${1}"
if ! check_backup "${2}"
then
msg "backup file not found, skipping"
else
msg "starting database restoration"
cat &> "/tmp/${1}.sql" <<EOF
USE master
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'${1}')
ALTER DATABASE [${1}]
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE
DROP DATABASE [${1}]
GO
RESTORE DATABASE [${1}] FROM DISK = '${2}' WITH REPLACE
GO
EOF
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $MSSQL_SA_PASSWORD -i "/tmp/${1}.sql"
msg "database restoration complete"
fi
}
restore_db "Database" "/home/Database.bak"
MSG_PREFIX="init"
msg "done"
This time, the script will-
- make the database to
SINGLE_USER
mode, thus outside connections will be terminated - drop if the database was created
- restore the database from the backup file
Now, we can run run-db-restore.sh
to start the restoration process-
#!/bin/bash
if ! docker ps | grep "org/backend" &> /dev/null
then
echo "backend not running, exiting"
exit 1
fi
if ! docker ps | grep "org/mssql" &> /dev/null
then
echo "mssql server not running, exiting"
exit 1
fi
BACKEND_CONTAINER_ID=$(docker ps | grep "org/backend" | awk '{print $1}')
MSSQL_CONTAINER_ID=$(docker ps | grep "org/mssql" | awk '{print $1}')
echo "stopping backend"
docker stop "$BACKEND_CONTAINER_ID"
echo "running db-restore.sh script"
docker exec -it "$MSSQL_CONTAINER_ID" bash -c "/home/db-restore.sh"
echo "starting backend"
docker start "$BACKEND_CONTAINER_ID"
echo "done"
The Problem With Self-Signed Certificate Error
The sqlcmd
client in SQL Server 2022 container image, will throw an error because of the self-signed certificate. This is because the server creates a self-signed certificate if not configured.
As our goal is to spin up a container for testing, and not for production. We can bypass setting up the signed certificate for now.
For this, we can use the -C
parameter for sqlcmd
to disable certificate signing check-
-/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $MSSQL_SA_PASSWORD -i "/tmp/${1}.sql"
+/opt/mssql-tools/bin/sqlcmd -C -S localhost -U sa -P $MSSQL_SA_PASSWORD -i "/tmp/${1}.sql"
Thanks to this Reddit post for this solution.
Backup Restoration Time
Another big problem was the restoration time. The console prints that restoration was complete within 20-30 seconds. But, it takes 15-20 minutes for the sqlcmd
process to actually exit.
We can’t wait this much time only to set up an ephemeral test environment. For example, setting up
3 separate tests will take almost 45 minutes to 1 hour just for the database restoration.
Looking out for some time, I stumbled upon this StackOverflow question where this behavior was described in detail.
The solution is to install sqlcmd
from Microsoft’s repository instead of the one already installed in the SQL Server container image.
apt-get install curl -y
curl https://packages.microsoft.com/keys/microsoft.asc | tee /etc/apt/trusted.gpg.d/microsoft.asc
apt-get update
apt-get install mssql-tools18 unixodbc-dev -y
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
After installation, we have to use /opt/mssql-tools18/bin/sqlcmd
for the database restoration instead of using /opt/mssql-tools/bin/sqlcmd
.
Rootless Containers
The SQL Server 2022 Docker image is a rootless container, meaning the default user is not the root
user and sudo
or su
binary is not installed on the image.
To install the sqlcmd
tool, we need to tell docker exec
to execute as root
user-
-docker exec -it "$MSSQL_CONTAINER_ID" bash -c "/home/db-restore.sh"
+docker exec -it -u root "$MSSQL_CONTAINER_ID" bash -c "/home/db-restore.sh"
Have a look at this Github issue for more details on this.
Updated Codes
Final db-restore.sh
would be the following-
#!/bin/bash
MSG_PREFIX=""
function msg {
echo "[ setup ][ $MSG_PREFIX ] ${1}"
}
function check_backup {
if [ ! -f "${1}" ]
then
return 1
fi
return 0
}
SUCCESS=false
MAX_ATTEMPTS=60
ATTEMPT_COUNT=0
DBSTATUS=1
MSG_PREFIX="init"
while [ $SUCCESS = false ] && [ $ATTEMPT_COUNT -le $MAX_ATTEMPTS ]; do
msg "looking for existing database"
/opt/mssql-tools/bin/sqlcmd -h -1 -t 1 -U sa -P "$MSSQL_SA_PASSWORD" -Q "SET NOCOUNT ON; SELECT COUNT(1) FROM sys.databases"
if [ $? -eq 0 ]; then
SUCCESS=true
break
fi
msg "attempt $ATTEMPT_COUNT failed. Trying again..."
ATTEMPT_COUNT=$(( ATTEMPT_COUNT + 1 ))
msg "sleeping"
sleep 1
done
if [ $ATTEMPT_COUNT -ge $MAX_ATTEMPTS ]; then
msg "max attempt count reached, exiting"
exit 1
fi
# install mssql-tools18, which has the updated sqlcmd binary
apt-get install curl -y
curl https://packages.microsoft.com/keys/microsoft.asc | tee /etc/apt/trusted.gpg.d/microsoft.asc
apt-get update
apt-get install mssql-tools18 unixodbc-dev -y
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
function restore_db {
echo ""
MSG_PREFIX="${1}"
if ! check_backup "${2}"
then
msg "backup file not found, skipping"
else
msg "starting database restoration"
cat &> "/tmp/${1}.sql" <<EOF
USE master
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'${1}')
ALTER DATABASE [${1}]
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE
DROP DATABASE [${1}]
GO
RESTORE DATABASE [${1}] FROM DISK = '${2}' WITH REPLACE
GO
EOF
/opt/mssql-tools18/bin/sqlcmd -C -S localhost -U sa -P $MSSQL_SA_PASSWORD -i "/tmp/${1}.sql"
msg "database restoration complete"
fi
}
restore_db "Database" "/home/Database.bak"
MSG_PREFIX="init"
msg "done"
The final run-db-restore.sh
would be the following-
#!/bin/bash
if ! docker ps | grep "org/backend" &> /dev/null
then
echo "backend not running, exiting"
exit 1
fi
if ! docker ps | grep "org/mssql" &> /dev/null
then
echo "mssql server not running, exiting"
exit 1
fi
BACKEND_CONTAINER_ID=$(docker ps | grep "org/backend" | awk '{print $1}')
MSSQL_CONTAINER_ID=$(docker ps | grep "org/mssql" | awk '{print $1}')
echo "stopping backend"
docker stop "$BACKEND_CONTAINER_ID"
echo "running db-restore.sh script"
docker exec -it -u root "$MSSQL_CONTAINER_ID" bash -c "/home/db-restore.sh"
echo "starting backend"
docker start "$BACKEND_CONTAINER_ID"
echo "done"