Rethinking Threat Models

You probably do not know much about threat modelling, so let us start from the basics.

The four basic questions of threat modelling (as popularized by Adam Shostack) are:

  1. What are we building?

  2. What can go wrong?

  3. What are we going to do about it?

  4. Did we do a good job?

"What are we building?"

I can draw an analogy to physical buildings such as homes or restaurants. The question "What are we building?" is similar to defining the goals of a physical building, which determine what options and investments we need to make. Are we building a home, a hotel, a café, government offices, a co-working space, or real estate such as a mall? The security of these buildings varies with the needs, budgets, compliance requirements, and laws involved.

Similarly, with software, we should understand what kind of system we are building: B2B, B2C, government-use software, SaaS, or on-premise. Depending on the system, the needs, budget, and compliance requirements vary. It makes little sense to invest in DDoS protection for an internal tool that sees little use; putting it behind a VPN is the better option.

Understanding these needs and the required level of security is important to grasp the lay of the land.

“Who knows the building well?”

Is it the security folks who know every nook and cranny of the building: how it was built, which cement was used, what quality of steel went in? No, it is the software engineers who know it best. In most cases there is no single person who knows everything, and the knowledge is spread asymmetrically across engineers. If you are lucky, you will find the golden duck who knows everything about the building.

Let us assume we identify a person who knows the software from A to Z. Now, how is he going to share that knowledge? "Let's hop on a call and walk through it." Is he going to deliver everything verbally? He has a database in his mind, and during the call you can query it with your questions. A call is the worst way to build an understanding of the building. It may be fine as a starting point, but not for getting the whole picture.

Suppose instead the golden duck writes about the building in ten pages. How do you hold all ten pages in a single mind? It is going to be difficult. Now comes diagrammatic representation: a software engineer can share a rough diagram laying out the basic structure of the software, and you can see how it is designed and how it runs. Combining writing and diagrams is the best way to get the big picture.

Diagrams

What is a model? A model is an understanding of something. That something can be a car, the workings of a car, the road the car drives on, the person driving the car, and so on. Understanding comes down to concepts, and conceptualization is about how we perceive the world and what we know. A model of a human tells you about humans, and it has its boundaries.

A representation of a model can be a diagram, writing, video, audio, or a database. A model of a human cannot serve every purpose: an MRI scan of the brain (a brain model, itself part of a human model) cannot by itself diagnose mental illness. That is why the saying "All models are wrong, but some are useful," attributed to the statistician George Box, exists: a model does not represent actual reality but is a simplification built for some purpose. Choosing the right model to understand the software is important.

“What are the models available to understand the software?”

Here comes conceptual modelling. Examples include UML diagrams (state diagrams, sequence diagrams), flowcharts, and DFDs. A conceptual model is a collection of entities, relationships, cardinality (the counting principle), and temporality (discrete or continuous events).

Models can be categorized into two types: static and dynamic. Static models cannot answer questions about past or future events; dynamic models capture behaviour over time. To query a model, you ask it questions. A model holds knowledge (facts, concepts). With some models, such as databases, inferences can be drawn from that knowledge automatically; with others, such as diagrams, the inference must be done manually.

Of a DFD, you can ask questions such as:

  1. How is the customer data flowing from Service 1 to Service 2?

  2. How is the customer data stored in Service 1?

  3. Why is user data flowing between two different VPC networks?

Answers to the simple questions 1 and 2 are easy to infer from a DFD. But question 3 a DFD cannot answer: you have to capture that knowledge elsewhere, or decide whether the information is really needed to answer such questions. A plain DFD cannot answer questions about security at all. Can it tell you whether Service 1's data is encrypted?

The way to model the DFD is to collect the questions you will probably ask, and make sure the model can answer them. For example, if you want to include encryption details, and you have a compliance standard that every service must encrypt data at rest with AES-256, then you could use a distinct colour or representation for the services that comply.

How do I infer which services are not encrypted? By seeing which services are left uncoloured.

Why not use a database? A database would be the best way to answer questions and make inferences automatically, but it would end up as a product, similar to GRC products: a global standard that might not fit our context.

So, it is important to select which model you want to use or create one to answer your important questions.

“What can go wrong?”

I believe "what can go wrong" cannot be a separate asset. The current practice is, for example, to draw a DFD and then brainstorm with a framework such as STRIDE, jotting the threats down in a separate asset (a document). A document about a model is a metamodel: a model about models. Here the document (the metamodel, a writing model) talks about the security threats of a UML diagram (the model).

The question to ask is: why choose writing for the metamodel? Why not a diagram? And if I capture "what can go wrong" as a separate asset, what is the purpose of "what are we building?" Is it to understand "what are we building" through models that were never meant for deriving "what can go wrong"? I believe the best approach is to combine the two rather than treat them as step-by-step processes.

The best case is to have a single model that represents "what are we building" and supports inference about "what can go wrong." If you separate the two, maintaining them is difficult, nearly impossible. Treated separately, the approach only works at a small scope.

What are we building? A signup page with some backend changes, represented by a state diagram.

What can go wrong? You run attack trees or STRIDE over the diagram.

Can "what are we building?" cover the whole product? The whole current product is already built, so the meaning and the scope no longer match. "What can go wrong," treated separately, can cover the entire scope, because it needs the entire scope. Treated together, "what can go wrong in what we are building?" narrows the meaning to the smaller scope of the change.

Treating them together is the better approach for incremental threat identification, rather than treating them separately.

I am not going to talk about "What are we going to do about it?" and "Did we do a good job?", as those are process-level matters with not much to discuss here.

Conclusion

Understanding conceptual modelling will improve your threat modelling and enable innovation. If needed, we should create a new model built specifically for security rather than relying on common models designed for other purposes.
