Container Escapes 101 - Resource sharing

Our shared kernel

One of the fundamental tenets of containers is that a container is a process that shares the host kernel’s resources. It is not a virtual machine.

$ docker run -it redhat/ubi9:9.6
[root@cf0166412881 /]# uname -a
Linux cf0166412881 6.8.0-63-generic #66-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 20:09:49 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

$ docker run -it ubuntu:24.04
root@024fa13d4f18:/# uname -a
Linux 024fa13d4f18 6.8.0-63-generic #66-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 20:09:49 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

$ docker run -it ghcr.io/some-natalie/some-natalie/whoami:latest
5471916781e4:/$ uname -a
Linux 5471916781e4 6.8.0-63-generic #66-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 20:09:49 UTC 2025 aarch64 Linux

No matter which “base” we use as a container, the output of the kernel information is always the same. What else can we learn about the host? 😈

Many good things come from /proc

Everything is a file, so everything has a spot in a filesystem. The proc filesystem (procfs) is special. Mounted at /proc, it exposes data about running processes and system information as a hierarchical file structure. This provides a standard interface for accessing process data without kernel memory access or complex tracing. In Linux, procfs also enables runtime kernel parameter modification via sysctl.

It’s full of juicy information, and writing to it can change settings on the host. Let’s explore a bit.

$ docker run -it ubuntu:24.04
root@6f2ea7fe47c3:/# cat /proc/version
Linux version 6.8.0-63-generic (buildd@bos03-arm64-119) (aarch64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #66-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 20:09:49 UTC 2025
root@6f2ea7fe47c3:/# cat /proc/uptime
26396.31 52604.87

Exercise #1 - exploring /proc

❓ What filesystems does your container see?

hint You're looking for the mounts listing.
example answer
root@6f2ea7fe47c3:/# cat /proc/self/mounts
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/ADNX2GLUN545KN2F6WQE7YW2XQ:/var/lib/docker/overlay2/l/HZM7NJRAIEFSTGL5OLD5C4TXRG,upperdir=/var/lib/docker/overlay2/7b8b18b466eb271da137324273cbdb1d29d2c46602a393b6472bc552f290d82a/diff,workdir=/var/lib/docker/overlay2/7b8b18b466eb271da137324273cbdb1d29d2c46602a393b6472bc552f290d82a/work,nouserxattr 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,size=65536k,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k,inode64 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/resolv.conf ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/hostname ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/hosts ext4 rw,relatime 0 0
devpts /dev/console devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/interrupts tmpfs rw,nosuid,size=65536k,mode=755,inode64 0 0
tmpfs /proc/kcore tmpfs rw,nosuid,size=65536k,mode=755,inode64 0 0
tmpfs /proc/keys tmpfs rw,nosuid,size=65536k,mode=755,inode64 0 0
tmpfs /proc/latency_stats tmpfs rw,nosuid,size=65536k,mode=755,inode64 0 0
tmpfs /proc/timer_list tmpfs rw,nosuid,size=65536k,mode=755,inode64 0 0
tmpfs /proc/scsi tmpfs ro,relatime,inode64 0 0
tmpfs /sys/firmware tmpfs ro,relatime,inode64 0 0
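
That listing is easier to read for our purposes once it's filtered down to what the runtime has layered on top of /proc. Here's a minimal sketch of that filter. The function name `proc_overmounts` is made up for this example, and it assumes the listing arrives on stdin in the same format as /proc/self/mounts.

```shell
# proc_overmounts: hypothetical helper, not part of any tool.
# Reads a mounts listing on stdin (same format as /proc/self/mounts) and
# prints every mount point below /proc with its filesystem type and whether
# it is mounted read-only (ro) or read-write (rw).
proc_overmounts() {
    awk '$2 ~ /^\/proc\// {
        split($4, opts, ",")   # the first mount option is always ro or rw
        print $2, $3, opts[1]
    }'
}

# Usage inside a container:
#   proc_overmounts < /proc/self/mounts
```

In the listing above, the `ro` entries are paths we can still read but not write (like /proc/sysrq-trigger), and the tmpfs entries are mounted over paths such as /proc/kcore so the host's version is hidden from us.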


❓ CPU architecture could be valuable to planning our escape path. What information can we see about the CPU?

hint You're looking for cpuinfo
example answer
root@e36facaf3813:/# cat /proc/cpuinfo
processor	: 0
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint bf16 afp
CPU implementer	: 0x61
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0

processor	: 1
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint bf16 afp
CPU implementer	: 0x61
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0


❓ In general, there are two types of files of interest here. System files and info not attached to a specific process are in /proc. Processes have individual directories in /proc/PID. Pick a process … we’ll start one to use as an example. What can we tell about it?

hint Start a process with sleep 300 & and the system will return the PID as a number. Use that to explore /proc/PID for a while.
example answer
root@e36facaf3813:/# cat /proc/13/status
Name:	sleep
Umask:	0022
State:	S (sleeping)
Tgid:	13
Ngid:	0
Pid:	13
PPid:	1
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	256
Groups:	0
NStgid:	13
NSpid:	13
NSpgid:	13
NSsid:	1
Kthread:	0
VmPeak:	    2268 kB
VmSize:	    2268 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	    1152 kB
VmRSS:	    1152 kB
RssAnon:	       0 kB
RssFile:	    1152 kB
RssShmem:	       0 kB
VmData:	     208 kB
VmStk:	     132 kB
VmExe:	      20 kB
VmLib:	    1800 kB
VmPTE:	      48 kB
VmSwap:	       0 kB
HugetlbPages:	       0 kB
CoreDumping:	0
THP_enabled:	1
untag_mask:	0xffffffffffffff
Threads:	1
SigQ:	1/15160
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	00000000a80425fb
CapEff:	00000000a80425fb
CapBnd:	00000000a80425fb
CapAmb:	0000000000000000
NoNewPrivs:	0
Seccomp:	2
Seccomp_filters:	1
Speculation_Store_Bypass:	thread vulnerable
SpeculationIndirectBranch:	unknown
Cpus_allowed:	3
Cpus_allowed_list:	0-1
Mems_allowed:	00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	3
nonvoluntary_ctxt_switches:	5
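
That's a lot of fields to eyeball. A small sketch for pulling out one at a time, plus a translation of the Seccomp number into words, might look like the following. Both function names are made up for this example. The three Seccomp values are the ones listed in proc(5).

```shell
# status_field: hypothetical helper. Reads /proc/PID/status-style text on
# stdin and prints the value of the field named in the first argument.
status_field() {
    awk -F':[ \t]*' -v key="$1" '$1 == key { print $2 }'
}

# seccomp_mode: hypothetical helper. Translates the numeric Seccomp field
# into words, using the values documented in proc(5).
seccomp_mode() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;
    esac
}

# Usage inside a container, for the shell's own process:
#   seccomp_mode "$(status_field Seccomp < /proc/self/status)"
```

For the sleep process above, `Seccomp: 2` means a seccomp filter is attached, which is what we'd expect from a container runtime applying its default profile.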


❓ How long has the system been up?

hint You're looking for uptime but it's only given in seconds.
example answer
root@e36facaf3813:/# cat /proc/uptime
29879.96 59547.38


Uptime is an odd one, as there are two numbers. The first number is the one you’re looking for: the number of seconds the host system has been up. The second number is the time, in seconds, that the cores have spent idle, added together across all of them. Since most computers have multiple cores, this can be many times the first number.[1] We’ll talk more about what this number can imply about our isolation in a bit.
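
Seconds aren't the friendliest unit, so here's a rough sketch that turns the two numbers into something readable. The function names `uptime_human` and `idle_percent` are made up for this example, and the idle calculation assumes the second number is the idle time summed over every core.

```shell
# uptime_human: hypothetical helper. Takes the first number from
# /proc/uptime and prints it as days, hours, minutes and seconds.
# Fractions of a second are dropped.
uptime_human() {
    awk -v s="$1" 'BEGIN {
        s = int(s)
        printf "%dd %dh %dm %ds\n", int(s / 86400), int((s % 86400) / 3600), int((s % 3600) / 60), s % 60
    }'
}

# idle_percent: hypothetical helper. Takes uptime, total idle time and the
# number of cores, and prints how idle the machine has been on average.
idle_percent() {
    awk -v up="$1" -v idle="$2" -v cores="$3" 'BEGIN {
        printf "%.1f\n", 100 * idle / (up * cores)
    }'
}

# Usage inside a container:
#   read -r up idle < /proc/uptime
#   uptime_human "$up"
#   idle_percent "$up" "$idle" "$(grep -c '^processor' /proc/cpuinfo)"
```

With the numbers from the example answer and the two cores we saw in /proc/cpuinfo, that works out to a bit over 8 hours of uptime on a host that has been almost entirely idle.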

Exercise #2 - system requests

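One of the very interesting files in /proc relates to the “system request key” (SysRq). It’s a key combination that’ll do something when pressed, similar to CTRL + ALT + DEL in Windows. What that something is depends on the key we press … and the same commands can be sent without a keyboard by writing that key’s letter to /proc/sysrq-trigger. (The related /proc/sys/kernel/sysrq setting controls which functions the keyboard combination is allowed to invoke.) The full table of options is in the documentation for each version of the kernel.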

Let’s use this to nicely power off our host VM from within our container (knowing we could be a lot less nice).

$ docker run -it ubuntu:24.04
root@00290878e11f:/# echo o > /proc/sysrq-trigger
bash: /proc/sysrq-trigger: Read-only file system
root@00290878e11f:/# exit
exit

$ docker run --privileged -it ubuntu:24.04
root@5c434a1221af:/# echo o > /proc/sysrq-trigger
Read from remote host 127.0.0.1: Connection reset by peer
Connection to 127.0.0.1 closed.
client_loop: send disconnect: Broken pipe

Without the --privileged flag, we can’t write to the file we want to. This is by design, although we may find some ways around that moving forward. Sadly, privileged execution is quite common in the field so this is quite relevant.
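
Powering off the host is a loud way to find out whether a file is writable. A quieter probe is the shell's own `-w` test, sketched below. The function name `can_write` is made up for this example. Shells differ in how they implement this check for root, so treat a "writable" answer as a hint rather than proof.

```shell
# can_write: hypothetical helper. Reports whether the shell believes the
# current user could open the given path for writing. Nothing is written
# to the path, so it is safe to point at /proc/sysrq-trigger.
can_write() {
    if [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "not writable: $1"
    fi
}

# Usage inside a container:
#   can_write /proc/sysrq-trigger
```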

🎗️ Our host machine is now powered off, so please remember to power it back on before continuing.

❓ Does this work equally well from another container?

hint Try the UBI 9 image we were working with earlier, redhat/ubi9:9.6
example answer
$ docker run --privileged -it redhat/ubi9:9.6
[root@176c617f1af0 /]# echo o > /proc/sysrq-trigger
Read from remote host 127.0.0.1: Connection reset by peer
Connection to 127.0.0.1 closed.
client_loop: send disconnect: Broken pipe
Looks like it works just fine! 🙃


❓ Does this work if the container is privileged, but the user inside isn’t root?

hint Use the non-root image we'd been playing with earlier, ghcr.io/some-natalie/some-natalie/whoami:latest
example answer
$ docker run -it --privileged ghcr.io/some-natalie/some-natalie/whoami:latest
6b800b046fbe:/$ echo o > /proc/sysrq-trigger
/bin/sh: can't create /proc/sysrq-trigger: Permission denied
trying again another way
Let's run that nonroot-by-default container with the root user that we couldn't sudo into. ⛓️‍💥

$ docker run -it --privileged --user root ghcr.io/some-natalie/some-natalie/whoami:latest
538e5fec092c:/# echo o > /proc/sysrq-trigger
Read from remote host 127.0.0.1: Connection reset by peer
Connection to 127.0.0.1 closed.
client_loop: send disconnect: Broken pipe


In short, “privilege” means two different and not always identical things. Running a container with --privileged (or otherwise granting it privileged access) means running it with host-level privileges. Whether or not “that process with a shell we’re calling a container” has root access inside of itself is a separate question, though the two frequently overlap. An escape might involve escalating privileges inside this box, outside of it, or both.
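
To keep those two questions apart, here's a rough sketch that answers each one separately. The function names are made up for this example. It uses CAP_SYS_ADMIN (capability number 21 in linux/capability.h) in the bounding set as a stand-in for "this container was granted a lot". That stand-in is an assumption of this sketch, not a definition of privileged.

```shell
# has_cap: hypothetical helper. Succeeds if bit number $2 is set in the
# hexadecimal capability mask $1 (for example the CapBnd value from
# /proc/self/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# privilege_summary: hypothetical helper. Takes a numeric UID and a CapBnd
# mask and prints one answer per question: are we root inside, and is
# CAP_SYS_ADMIN (bit 21) in the bounding set?
privilege_summary() {
    uid=$1
    capbnd=$2
    if [ "$uid" -eq 0 ]; then
        echo "root inside: yes"
    else
        echo "root inside: no"
    fi
    if has_cap "$capbnd" 21; then
        echo "CAP_SYS_ADMIN in bounding set: yes"
    else
        echo "CAP_SYS_ADMIN in bounding set: no"
    fi
}

# Usage inside a container:
#   privilege_summary "$(id -u)" "$(awk '/^CapBnd:/ { print $2 }' /proc/self/status)"
```

Fed the `CapBnd` value from the unprivileged sleep process earlier (00000000a80425fb) and UID 0, it reports root inside but no CAP_SYS_ADMIN, which is the first of the two cases we tried.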

Exercise #3 - namespaces

“Namespace” is another term that means different things based on context. In Kubernetes, it’s a high-level abstraction that’s used to group resources together for convenience. In the Linux kernel, namespaces define what a process is allowed to see. They’re how the system shows resources to a process, and they can be shared with the host or dedicated to a process for increased isolation. There are eight at present.[2] At a high level:

  • cgroup - control groups
  • ipc - inter-process communication, does exactly what it sounds like
  • mount - controls mount points, enabling filesystem reads and writes
  • network - a virtual network stack, enabling network communication
  • process - process IDs
  • time - system time
  • uts - allows a process to know the hostname (stands for Unix Time-Sharing)
  • user - user IDs and mapping them between host and process

What namespaces can we see in /proc/self/ns/?

$ docker run -it ubuntu:24.04
root@e4ffffdb31b4:/# ls -lah /proc/self/ns
total 0
dr-x--x--x 2 root root 0 Jul  9 16:51 .
dr-xr-xr-x 9 root root 0 Jul  9 16:51 ..
lrwxrwxrwx 1 root root 0 Jul  9 16:51 cgroup -> 'cgroup:[4026532174]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 ipc -> 'ipc:[4026532172]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 mnt -> 'mnt:[4026532170]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 net -> 'net:[4026532175]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 pid -> 'pid:[4026532173]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 pid_for_children -> 'pid:[4026532173]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jul  9 16:51 uts -> 'uts:[4026532171]'

Okay, so we see something … symlinks to other parts of a filesystem, that is. Those numbers are namespace IDs for that process. By convention (although not strictly a guarantee), IDs above 0xf0000000 are user-created namespaces. Converting that number to decimal, any namespace ID below 4026531841 likely belongs to the host. In our case, that’s time (sharing the host’s clocks, but of little value to us in escape) and user.
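
Comparing ten-digit numbers by eye gets old fast. Here's a minimal sketch that applies that rule of thumb for us. The function name `classify_ns` is made up for this example, and the host/container labels are only as reliable as the convention itself.

```shell
# classify_ns: hypothetical helper. Reads link targets such as
# 'pid:[4026532173]' on stdin, one per line, and applies the rule of thumb
# from the text: IDs above 0xf0000000 (4026531840 in decimal) were probably
# created for the container, the rest probably belong to the host.
classify_ns() {
    sed -n 's/^\([a-z_]*\):\[\([0-9]*\)]$/\1 \2/p' |
    while read -r name id; do
        if [ "$id" -gt 4026531840 ]; then
            echo "$name $id container"
        else
            echo "$name $id host"
        fi
    done
}

# Usage inside a container:
#   for f in /proc/self/ns/*; do readlink "$f"; done | classify_ns
```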

Because the user namespace is shared, the process running as UID 1000 on the host is the same as UID 1000 in the guest. It’s handy for using containers as a way to pre-bundle a process’s dependencies. For our purposes, it may let us look around a bit more at what else this UID 1000 is running. To see this in action, let’s launch our VM.

# from the host VM
user@escapes:~$ id
uid=1000(user) gid=1000(user) groups=1000(user),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),101(lxd),990(docker)
user@escapes:~$ mkdir temp

# now launch our container
user@escapes:~$ docker run -it -v /home/user/temp:/home/user ghcr.io/some-natalie/some-natalie/whoami:latest
9cef1f0149b9:/$ cd ~
9cef1f0149b9:~$ echo "who am i really, though?" > question.txt
9cef1f0149b9:~$ exit

# from the host VM again
user@escapes:~$ cat temp/question.txt
who am i really, though?
user@escapes:~$ ls -lah temp/
total 16K
drwxrwxr-x 2 user user 4.0K Jul 10 02:56 .
drwxr-x--- 5 user user 4.0K Jul 10 02:54 ..
-rw------- 1 user user   60 Jul 10 02:56 .ash_history
-rw-r--r-- 1 user user   25 Jul 10 02:56 question.txt

Step by step, we’re

  1. Checking what user ID we are on the host (1000) and creating a temporary directory to play in.
  2. Launching the container, sharing that directory in as /home/user inside of it, then writing a file to it.
  3. Looking at the owner and permissions of the file we created from inside the container on the host.

This isn’t exactly a security decision, but it’s darn convenient to allow a process to write to storage and be a known user ID. It can bring about a lack of isolation … so be deliberate. 🤷🏻‍♀️
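
The kernel publishes the mapping it's using in /proc/self/uid_map, with three columns: the first ID inside the namespace, the first ID outside of it, and how many IDs the range covers. Below is a small sketch that translates a container UID into a host UID from that file. The function name `host_uid` is made up for this example.

```shell
# host_uid: hypothetical helper. Reads a uid_map on stdin (same format as
# /proc/self/uid_map: first ID inside the namespace, first ID outside it,
# and the length of the range) and prints the outside UID that the inside
# UID given as $1 maps to. Prints nothing if that UID is unmapped.
host_uid() {
    awk -v uid="$1" '
        uid + 0 >= $1 + 0 && uid + 0 < $1 + $3 {
            print $2 + (uid - $1)
            exit
        }
    '
}

# Usage inside a container:
#   host_uid "$(id -u)" < /proc/self/uid_map
```

A map of `0 0 4294967295` is the identity, meaning every UID inside is the same UID outside, which matches what we just saw with the file ownership. A remapped setup would show a different second column. For instance, a made-up map of `0 100000 65536` would turn UID 1000 inside into 101000 outside.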

Let’s see what happens when these two user IDs don’t match. Launch the same container using another user ID inside of it, as shown below.

user@escapes:~$ docker run -it --user 1001 -v /home/user/temp:/home/user ghcr.io/some-natalie/some-natalie/whoami:latest
3094ccfc413b:~$ whoami
whoami: unknown uid 1001
3094ccfc413b:~$ id
uid=1001 gid=0(root) groups=0(root)

❓ Try to read and write to the /home/user/question.txt file we created earlier. What happens?

hint You may need to change directories and list out the contents.
example answer
3094ccfc413b:~$ cd /home/user/
3094ccfc413b:/home/user$ ls -lah
total 16K
drwxrwxr-x    2 user     user        4.0K Jul 10 02:56 .
drwxr-xr-x    1 root     root        4.0K Jul  7 02:36 ..
-rw-------    1 user     user          60 Jul 10 02:56 .ash_history
-rw-r--r--    1 user     user          25 Jul 10 02:56 question.txt
3094ccfc413b:/home/user$ echo "\n i'm user 1001!\n" >> question.txt
/bin/sh: can't create question.txt: Permission denied
3094ccfc413b:/home/user$ cat question.txt
who am i really, though?


❓ Now let’s launch the container using root explicitly as our named user, even though it isn’t the default, then try the same tasks.

user@escapes:~$ docker run -it --user 0 -v /home/user/temp:/home/user ghcr.io/some-natalie/some-natalie/whoami:latest
1f8a94707abe:/# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)
hint You may need to change directories and list out the contents.
example answer
1f8a94707abe:/# cd /home/user
1f8a94707abe:/home/user# ls -lah
total 16K
drwxrwxr-x    2 user     user        4.0K Jul 10 02:56 .
drwxr-xr-x    1 root     root        4.0K Jul  7 02:36 ..
-rw-------    1 user     user          60 Jul 10 02:56 .ash_history
-rw-r--r--    1 user     user          25 Jul 10 02:56 question.txt
1f8a94707abe:/home/user# echo "i am roooooooooooot" >> question.txt
1f8a94707abe:/home/user# cat question.txt
who am i really, though?
i am roooooooooooot


❓ Now try to be a tiny bit mischievous and change the owner of question.txt to be root and not user, then verify on the host that it did what you expected.

example answer
1f8a94707abe:/home/user# chown root:root question.txt
1f8a94707abe:/home/user# exit
user@escapes:~$ ls -lah temp/
total 16K
drwxrwxr-x 2 user user 4.0K Jul 10 02:56 .
drwxr-x--- 5 user user 4.0K Jul 10 02:54 ..
-rw------- 1 user user   60 Jul 10 02:56 .ash_history
-rw-r--r-- 1 root root   50 Jul 10 04:10 question.txt    ‼️‼️ our shenanigans worked ‼️‼️
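
The names in these listings come from whichever /etc/passwd is doing the lookup, which is why the container couldn't name UID 1001 earlier. The numbers are what's actually stored on disk. Here's a minimal sketch for checking ownership by number on either side of the container boundary. The function name `numeric_owner` is made up for this example.

```shell
# numeric_owner: hypothetical helper. Prints the numeric UID and GID that
# own a path, as "uid:gid". The -n flag asks ls for numeric IDs, so no
# /etc/passwd or /etc/group lookup is involved.
numeric_owner() {
    ls -lnd "$1" | awk '{ print $3 ":" $4 }'
}

# Usage, on the host or inside the container:
#   numeric_owner temp/question.txt
```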


Lastly, let’s try to list all running processes. This shouldn’t work by default, as our pid namespace isn’t shared.

user@escapes:~$ docker run -it -v /home/user/temp:/home/user ghcr.io/some-natalie/some-natalie/whoami:latest
82bce6fcc430:/$ ps aux
PID   USER     TIME  COMMAND
    1 user      0:00 /bin/sh -l
    6 user      0:00 ps aux
82bce6fcc430:/$ exit

user@escapes:~$ docker run -it --user 0 -v /home/user/temp:/home/user ghcr.io/some-natalie/some-natalie/whoami:latest
b01303a6bb15:/# ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh -l
    7 root      0:00 ps aux
b01303a6bb15:/# exit

And as expected, it doesn’t work - not for user and not for root. This isn’t shared from the host, so the container process only has visibility of itself and its child processes. Let’s explicitly share the pid namespace and see what we can find out.

user@escapes:~$ docker run -it --pid=host ghcr.io/some-natalie/some-natalie/whoami:latest
5c66e602f6a6:/$ ps aux
PID   USER     TIME  COMMAND
    1    root      0:00 {systemd} /sbin/init
    2    root      0:00 [kthreadd]
    3    root      0:00 [pool_workqueue_]
    4    root      0:00 [kworker/R-rcu_g]
    1986 user      0:00 docker run -it --pid=host ghcr.io/some-natalie/some-natalie/whoami:latest
    2004 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 5c66e602f6a62cfed7af22e2e9adb8624723bce649341446c4f529dfd22d09f4 -address
    2029 user      0:00 /bin/sh -l
    2049 user      0:00 ps aux
# ... lots more here ...

Note how we can see our host’s processes, including us at PID 1986 and our child processes for /bin/sh and the command we just ran. This visibility is great if we can have it.
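
Not every image ships ps, but the same information lives in the /proc/PID directories we explored earlier. Here's a rough sketch of a stand-in. The function name `list_pids` is made up for this example, and it only prints the PID and the short command name from each process's comm file.

```shell
# list_pids: hypothetical helper. Walks a procfs-style directory (default
# /proc) and prints "PID NAME" for every entry that starts with a digit and
# has a readable comm file.
list_pids() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/comm" ] || continue
        echo "${dir##*/} $(cat "$dir/comm")"
    done
}

# Usage inside a container:
#   list_pids
```

If PID 1 turns out to be our own shell, we're probably in a private pid namespace. If it's systemd or another init, we're likely looking at the host's process list.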

📚 tl;dr - a lot gets shared by sharing a kernel, even if the specifics are up to the combination of configuration and runtimes. Next up … capabilities define what we can do, so how can we find out what capabilities we can work with?

Back to the index.


Footnotes

  1. Read more about /proc/uptime in the fine man pages  

  2. Kernel docs for namespaces namespaces(7) and user_namespaces(7)  

This post is licensed under CC BY-NC-SA 4.0 by the author.