For a lot of us, intel somehow finds its way through the many nooks and crannies of our cyber operations pipeline and ends up doing some good stuff:
But as I've highlighted here and here, it is hard to know exactly what happens in the middle. We know what we want: to lower the operational risk of doing business for our companies, institutions, NGOs, etc. What's unclear is this: What are the many interacting pieces that make this happen? Where are the chokepoints? Where is the friction? How efficiently is decision-making pushed through the pipes? And finally, what is the insight that it generates?
In my many adventures into Active Defence, you've seen me enter a few mazes, unravel a few knots and unfold a new pathway to understand the relationships between threat intel, data pipelines, hunting, detection and deception. I've noticed a stable pattern across all these incursions: a holistic view of operational data workflows.
I'm not talking about data merely in terms of logging telemetry from your systems, debugging information for your software, or capturing metrics. I'm talking about the underlying operational architecture of how your data, no matter the source, is converted into actionable insight, enriched at each step of an interconnected web of streams that pulls the right levers and results in business decisions.
For data and information to turn into insight, they have to engage the main operative engine of any business: decision-making.
Furthermore, I will risk a bold definition here: decision-making is the act of transforming available information into actions that generate the conditions for more decisions to be made.
Think about it in terms of finite and infinite games (James P. Carse).
What? Consider the following: decisions are only possible within the context of having options. When you don’t have options, there is no decision; there is just inertia, the kind of inertia that makes a rock continue indefinitely in outer space.
A rock floating in the vacuum of space experiences very little friction or other forces. Because of inertia, if that rock is moving, it will continue to move in a straight line at a constant speed practically forever, unless something acts upon it.
The whole point of making a decision is to give you the option to make more decisions in the future. You don’t want to end up in a checkmate situation, with no further moves to make. This is the equivalent of having your entire network encrypted by ransomware with no backups and no viable decryption options.
Information is essential to this process. Information is an optionality token. It can generate options for you and your business. As long as you can ingest information and convert it into insight, you can feed the decision-making machine of the business. This can translate into increased profit and optimized productivity.
There is a direct correlation between how efficiently information is transformed into insight, and your ability to make decisions that secure or transform the business.
But how do you know whether your cybersecurity operations workflows are efficient enough to translate information into action in optimal ways? How do you measure the utilization of these optionality tokens that information brings to you? How do you know if you are paying opportunity costs by not utilizing information flowing through your operational pipelines?
You need to start thinking about your cyber operations as a pipeline, an interconnected web of information that leads to actions taken or not taken. This can mean the difference between being in the newspapers the next day because of a new data breach, and nothing more than an actively exploited vulnerability patched just in time.
There is a meta-data layer of sorts, something that helps you keep your finger on the pulse.
I’m talking about operational cybersecurity business workflows, the interfaces between the people, process and technology that make your cyber operations run smoothly.
A data-centric organization does not structure its data pipeline around existing functions but redefines the functions around the data pipeline (like in Threat Hunting vs Detection Engineering: The Saga Continues).
It's all about information.
And information is about people.
In the context of practical cybersecurity operations, the most important type of information you will ever encounter has a name: threat intelligence.
If you cannot design, capture and monitor your operational pipeline in a way that drives risk reduction and security control uplifts, all with a DevOps approach, then it will be hard to measure how intel translates into impact.
An Approach, the Git Way
So what is the best approach? What should your operational cyber pipeline look like? I came up with a few heuristics that might assist in the design process.
Intel and Impact Driven: Based on profiling and continuously collecting information on the most important threats to your organisation, and on clearly articulating how these threats drive impact by mapping them to your risk or control areas at the other end.
Threat Modelled: The ability to translate your existing intel into concrete attack paths that can compromise your assets or data based on all the nuances of your specific environment (and not just generic intel).
DevOps Centric: A measure of consistency, automation readiness and your ability to scale operations at machine speed.
Three to Five Key Milestones: The points in your pipeline where you keep your finger on the pulse, the stages that give you an indication of the state of information at a point in time; metrics should capture key risk and performance indicators at each stage.
Fractal: Anywhere you look in your pipeline, at any milestone, you should be able to zoom in and find another pipeline that feeds the original milestone. This enables a design pattern where different teams feed into each other.
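To make these heuristics a little more concrete, here is a minimal, purely illustrative sketch of what a pipeline definition honouring them might look like. Every stage name, metric and nested sub-pipeline below is a hypothetical placeholder of my own, not a prescribed schema.

```yaml
# Hypothetical pipeline definition illustrating the heuristics above.
# Stage names, metrics and the nested sub-pipeline are placeholders,
# not a prescribed schema.
pipeline: intel-to-impact
driven_by: threat-intel            # intel and impact driven
threat_model: environment-specific-attack-paths   # threat modelled
automation: ci-cd                  # DevOps centric: consistency at machine speed
milestones:                        # three to five key milestones
  - name: collection
    metrics: [sources_onboarded, intel_items_per_week]
  - name: discovery
    metrics: [hunts_completed, mean_time_to_triage]
    fed_by:                        # fractal: zoom in and find another pipeline
      pipeline: threat-hunting
      milestones: [hypothesis, hunt, findings]
  - name: development
    metrics: [detections_shipped, controls_uplifted]
```

The specific fields do not matter; what matters is that each heuristic can be written down, versioned and reviewed like any other piece of configuration.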
We can forever argue whether these heuristics are good or bad, but the point of a heuristic is that it works as a collective, battle-tested mental shortcut our brains use to simplify complex situations and make quicker decisions (heuristics can also lead to bias, by the way, so always reflect later on your quick decision).
I am not sure whether these rules of thumb work for everyone else, but they do for me. My main problem, though, has always been: how to visualize this pipeline? We are sensorial creatures, and I don't know about you, but I understand a topic much better when I can visualize it.
Is there a concept that can shape the mental model needed to even think of how a data-driven intel-to-impact pipeline should be represented? I've been searching for this for quite some time and until recently, my best option was a fishbone diagram style:
However, something was missing, because different teams or functions have dependencies on each other before they even feed the main pipe.
The graph would look weird when representing these dependencies along the timeline of the main pipe:
Finally, my inner DevOps gnome realised the answer was simpler than anticipated: you need to think of your data pipeline as a Git commit pipe and your team as a Repository.
Whoa, hold on! Git? Repositories? Is this getting too weird?
Don't worry, it's simpler than it sounds. The true power lies in how it mirrors the way a successful data pipeline should function in a data-centric organisation, like a well-oiled code-merging machine!
It doesn't matter whether you are an "operational" function or not...
Think of it this way: each team in your cyber organisation is like a branch in a Git repository. Your IR team has their branch, your Detection Engineering team has theirs, and so on. Each branch is working on its own BAU workload and features, gathering unique data, and generating valuable insights.
Now, imagine your main data pipeline as the "main" branch in your repository. Just like developers merge their code changes into the main branch to create a final product, each team "merges" their data and insights into the main data pipeline, at different stages of the pipeline.
The end goal? To create an impact on the opposite end of the pipe, where you influence your risk and control areas.
Now, think of your team or function as a "repository" – a central hub where all these changes are stored and shared. This means everyone is on the same page, working together towards a common goal.
You have different merge points where one function or team's output plugs into another team's input at a certain stage of the pipeline, effectively merging their operative outcomes with that team's operative input and triggering a process on their side.
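As a purely hypothetical illustration of what one such merge point could look like on paper (the team names, stage, artefact and trigger below are my own assumptions):

```yaml
# Hypothetical merge point: the Intel "branch" merges its output into the
# Threat Hunting "branch" at the discovery stage of the main pipeline.
# Team names, stage, artefact and trigger are illustrative assumptions.
merge_point:
  from_branch: intel
  to_branch: threat-hunting
  stage: discovery
  artefact: prioritised-threat-profile   # what gets "committed" across
  triggers: hunt-hypothesis-creation     # the process kicked off on the other side
```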
At different points in the pipeline, you can keep your finger on the pulse by capturing and reporting on some metrics that are meaningful to you. Each function or team can have their own and you can come up with merged stats for the main pipeline:
While Git's ability to rewind is cool, that's not the real magic when it comes to building a data-centric operations workflow. By conceptualising operations this way, we are not saying you can go back in time, but that your operational pipe becomes a DevOpsified version of itself.
Applying it to R1D3
Remember R1D3? Let's use that framework as an example of how we can apply this "Git" thinking to your operations model.
We could draw the main stages of R1D3 in the main pipeline, and then we can imagine how some teams or functions would contribute to it at different points in time.
The main things to notice in the above diagram are:
It's just a model, and as I always echo: all models are wrong, some models are useful
A function like Intel obtains real-time data from the DFIR & Monitoring function based on analysis of signals
Modelling done by Intel analysis feeds into your Discovery phase where people like your threat hunters could be working their magic
An imagined Vuln. Management function would feed their assessments of the system's landscape to your Discovery phase too
Your DFIR team will contribute and merge their efforts directly into the Disruption phase
Eventually, all teams converge at the Development stage where insights are translated into impact by materially improving your controls
Your pipeline works in terms of "Releases", where each release means that a certain number of actions have been performed in your network to drive controls and risk either up or down
There is a feedback loop, whereby all the artefacts released by your pipeline can feed back into your functions
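For the sake of illustration, here is a rough sketch of that diagram in the same YAML style used earlier. I am only including the stages and functions explicitly mentioned above; the field names and release semantics are my own assumptions, not part of the R1D3 framework itself.

```yaml
# Rough, illustrative mapping of the R1D3 diagram above.
# Only stages and functions mentioned in the text are included;
# field names are assumptions, not part of the framework.
main_branch: r1d3
cross_branch_dependencies:
  - from: dfir-and-monitoring
    to: intel
    artefact: real-time signal analysis
stages:
  - name: discovery
    merges_from: [intel-modelling, vuln-management]
  - name: disruption
    merges_from: [dfir]
  - name: development
    merges_from: [all-teams]        # insights converge into control uplifts
releases: actions performed in the network that move controls and risk
feedback_loop: released artefacts feed back into every contributing function
```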
Now, imagine if you could deploy such a pipeline by adapting your existing ITSM ticketing tools or workflow automation suites. What if you could actually have a repository where all your operational data is stored as YAML files or similar?
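Purely as food for thought, here is one hypothetical shape a single operational record in that repository could take; every field name and value below is an invented placeholder, not an existing schema.

```yaml
# Invented example of one operational record stored as a file in the repo.
# All field names and values are placeholders for illustration only.
kind: release
id: release-placeholder-01
intel_refs: [intel-item-placeholder]      # the intel that started the work
stages_completed: [discovery, disruption, development]
actions:
  - type: detection
    name: example-detection-rule
    status: deployed
  - type: control_uplift
    name: example-hardening-change
    status: in-change-management
metrics:
  time_from_intel_to_action_days: 9       # placeholder value
risk_direction: down
```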
I will leave that to your imagination ;)
Thanks for staying with me this long ;) I hope you enjoyed my musings