Security: Contexts and Shared Resources
There’s a simple mental framework I find really useful when thinking about infrastructure and how to defend it. I call it “contexts and shared resources.”
A context is some place that code runs. It’s a box you draw around the running code. The scope is flexible. Use whatever makes sense to you in your situation. Here are a few examples:
- A running process.
- An “application,” whatever that means in your system.
- A container.
- A virtual machine.
- A physical machine.
- A rack.
- A datacenter.
- A “project,” or whatever the resource grouping abstraction is in your cloud of choice.
- A person having thoughts.
In the world of infrastructure, you never have just one instance of a given context, and a big part of the security job is to keep those contexts from interacting with each other in undesirable ways.
A resource is something that a context (or the stuff in it) needs to do its job. The definition is not particularly strict: sometimes a resource can also be a context. Here are some examples:
- A process depends on the kernel to deliver key system services, such as time, networking, access to the filesystem, etc.
- A VM depends on a hypervisor to manage it.
- An application depends on a set of credentials to access the database (it depends on the database too).
- A machine in a rack depends on a person to replace it when it breaks.
- A person depends on the air they need to vibrate to communicate those thoughts.
Here’s the problem: contexts share resources for all sorts of reasons:
- Two containers are on the same VM sharing the same kernel, because neither needs the entire machine to itself.
- Two machines in a rack share the same network switch, because neither produces enough traffic to saturate it.
- Two processes on different machines share the same identity credentials even though they have different jobs, because it was easier than setting up a new identity.
- Two virtual machines share the same cloud project because they’re managed by the same team.
- Two applications share the same database.
- Two people share the same room to have a conversation. There’s a third person listening, hidden.
Sometimes the reasons are reasonable, sometimes they are not. All times, sharing a resource creates a security risk.
All. Times. There’s no such thing as zero-risk sharing. Attacks proceed as a series of steps that escalate between contexts via the manipulation of shared resources. If two contexts share nothing, you can’t move between them. This is the basic physics of infrastructure security (the “advanced physics” is all of the incredibly clever and varied and occasionally beautiful ways someone can attack a shared resource; this is what makes security difficult).
Keep in mind that sharing is usually a good thing. Share a train with other people to save gas and save the planet. Share a meal with friends to build your relationship. Overcommit resources in your cloud fleet to make cheap compute available to everyone and provide major benefits to the global economy. Share networking and storage because you can’t do anything useful without being connected. Sharing (not money!) makes the world go round.
Attacks on infrastructure proceed as a series of steps that leverage one context’s access to shared resources to escalate to other contexts. Rinse and repeat until you reach your target. Attackers are constantly looking for ways to get from A to B. Is a shared resource you’re responsible for on that path? It might be. If you haven’t put yourself in the “attacker mindset,” tried to attack your own system, and thought about how to defend it, it probably is vulnerable.
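If you'd rather draw this out in code than on paper, here is a minimal sketch of the idea. It treats contexts and shared resources as a graph and uses a breadth-first search to find one way of stepping from a starting context to a target. Everything in it is made up for illustration: the `USES` inventory, the context and resource names, and the `escalation_path` helper are all hypothetical, and it assumes that any shared resource can be abused, which is far more pessimistic than reality.

```python
from collections import deque

# Hypothetical inventory: each context maps to the resources it depends on.
USES = {
    "web-container": {"node-kernel", "web-credentials"},
    "batch-container": {"node-kernel", "db-credentials"},
    "reporting-vm": {"db-credentials"},
    "build-machine": {"build-cache"},
}


def escalation_path(uses, start, target):
    """Return one shortest context/resource/context/... path, or None.

    Simplifying assumption: any resource shared by two contexts lets an
    attacker step from one context to the other.
    """
    if start == target:
        return [start]

    # Invert the inventory: resource -> set of contexts that use it.
    users = {}
    for context, resources in uses.items():
        for resource in resources:
            users.setdefault(resource, set()).add(context)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        # Sorting keeps the result deterministic across runs.
        for resource in sorted(uses.get(path[-1], ())):
            for neighbor in sorted(users[resource]):
                if neighbor in seen:
                    continue
                new_path = path + [resource, neighbor]
                if neighbor == target:
                    return new_path
                seen.add(neighbor)
                queue.append(new_path)
    return None


if __name__ == "__main__":
    print(escalation_path(USES, "web-container", "reporting-vm"))
    print(escalation_path(USES, "web-container", "build-machine"))
```

In this toy inventory, the web container reaches the reporting VM in two hops (the shared kernel, then the shared database credentials), while the build machine shares nothing with the others and so can't be reached at all. A real model would also need to weigh how hard each shared resource is to attack, which is exactly the part this sketch leaves out.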
From a practical standpoint, it takes a lot of effort to “harden” (make more secure) any given context, because contexts usually share lots of things and do lots of things. You have to do the work to understand and model all those things, make recommendations for how to reduce possible interactions to only what’s needed, negotiate on improvements that require nontrivial work from impacted teams, and agree on priorities based on risk. You never have the resources to fix everything, but you work really hard to fix the things that you think matter most. Aside: I’m very hopeful that AI will eventually be able to give live security feedback to development teams, because retrofitting security after the fact is just so much harder than having it built into the philosophy of the system and the philosophy of the team.
So security teams like to pick a few common, important contexts and shared resources that are natural places to “wrap” whatever needs doing, and we designate these wrappers “security boundaries.” Then we pour a bunch of effort into analyzing them, finding the weaknesses, getting issues fixed, building tests and tooling to make sure they stay fixed, running remediation programs to drive adoption of these boundaries, doing our best to guide teams to use them appropriately, and eventually (ideally) building them into frameworks that reduce the amount of security-sensitive decision-making teams need to do.
That’s what a “security boundary” is: a context and/or a set of resources you decided to put extra effort into protecting. As a result, these also become common recommendations for mitigating situations that carry additional attack risk, like running untrusted code in a VM as a cloud provider.
No “security boundary” is completely secure, and while they are powerful constructs to rally organizations around when trying to improve security at scale, they shouldn’t be sacred and unquestionable, and they only go so far as a barrier. There can be good reasons to do things differently. My recommendation: If you have a security team who has put a lot of work into creating security boundaries for your benefit, you should ask them for an opinion before you bypass those boundaries. They’ll want to work with you to understand your needs and help you design alternative mitigations for any additional risks that come from that bypass.
The framework of contexts and shared resources is not perfect (all frameworks are a simplification of reality) but it has still helped me a lot when trying to identify risk. I hope it helps you think more critically about what you need to do to make your systems safer. Take some time, put yourself in the attacker mindset, draw out your contexts and shared resources on a piece of paper, and see what you find. You might be surprised!
P.S. Since I spend the vast majority of my time working on Kubernetes security, here are just a few additional examples of contexts and shared resources in Kubernetes:
- Containers, which share the same kernel. Not much of a security boundary by default, because the kernel is written in C, a memory-unsafe language, and “container breakout” vulnerabilities are thus a dime a dozen. gVisor, as one example of a sandboxing technology, hardens this boundary by re-implementing select kernel functionality in Go, a memory-safe language.
- The “Node” (a VM or machine), which shares a cluster with other nodes and is shared by multiple containers. While some significant effort has been put into hardening the node boundary, it’s imperfect. Development teams currently need to be careful to align their application permissions with it, and security teams need to write policy to help development teams do that.
- The “Kubernetes Control Plane” (kube-apiserver and the various controllers that reconcile configuration), which is shared by the nodes and many containers in the cluster, and itself implements a significant set of security controls.
- The service account a workload runs as has, by nature, access to other resources. Sometimes service accounts end up shared by multiple different workloads out of convenience, though this is decidedly not a best practice. Over-granting access rights can threaten even “surrounding” boundaries by providing direct ways around them.
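To make the service account example a bit more concrete, here is a small sketch that flags service accounts used by more than one workload. It deliberately doesn't talk to a real cluster: the `WORKLOADS` list and the `shared_service_accounts` helper are hypothetical, and I'm assuming you've already exported (namespace, workload name, service account) tuples from somewhere. Service accounts are namespaced, so the grouping key includes the namespace.

```python
from collections import defaultdict

# Hypothetical export: (namespace, workload name, service account name).
WORKLOADS = [
    ("shop", "frontend", "default"),
    ("shop", "checkout", "payments-sa"),
    ("shop", "report-job", "payments-sa"),
    ("ops", "log-shipper", "default"),
]


def shared_service_accounts(workloads):
    """Map (namespace, service account) to the workloads sharing it.

    Only accounts used by two or more workloads are returned.
    """
    by_account = defaultdict(list)
    for namespace, name, service_account in workloads:
        by_account[(namespace, service_account)].append(name)
    return {
        account: sorted(names)
        for account, names in by_account.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    for (namespace, account), names in shared_service_accounts(WORKLOADS).items():
        print(f"{namespace}/{account} is shared by: {', '.join(names)}")
```

Here only `payments-sa` in the `shop` namespace gets flagged. The two `default` accounts live in different namespaces, so they're different service accounts and don't count as shared. A flagged account isn't automatically a problem, but it's a shared resource worth a second look.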
August 4, 2024