The Log Collection Paradox
How good data management applies to log collection.
I love data. I was a math geek growing up and turned my affinity for statistics into a career. Intuitively, most of us know that data drives informed decision making, which leads to better business outcomes. That only holds, however, if you do a good job of collecting, managing, and interpreting that data, and too often that isn’t the case. Data management best practices apply directly to log collection, because log collection is, at its core, a data management problem. The caveat is that failing to collect certain logs can have dire consequences. Fortunately, there are plenty of tools that can strengthen your security posture by drastically improving the way you collect and manage your logs.
When it comes to log collection, two approaches seem to dominate the marketplace. The first is, “Whatever we have to do to steer clear of negative consequences, particularly auditors.” These are the Minimalists, collecting as little as possible so that managing the whole system is as easy as possible. And who can blame them? Unless you are passionate about data management, you probably have dozens of other priorities you’d rather spend your valuable time on. There is a lot of inherent risk in this approach, though, because you probably will not have the information you need when the time comes. If you ride a motorcycle, or know somebody who does, you may have heard the saying, “There are two types of riders: those who have been in an accident, and those who will be.” The wisdom is that an accident is inevitable and it’s best to be prepared; motorcycles are dangerous, after all. The axiom applies equally to cybersecurity, because there are really only two types of businesses: those that have been breached, and those that will be. It’s one thing to pass an audit; it’s another beast entirely to cope with a breach.
The other common approach I often see when it comes to log collection is “D: All of the Above.” If you have unlimited resources, it’s an awfully attractive option. After all, why risk it? When push comes to shove, if you have all the data at your disposal, you know the answers are in your archives somewhere. While I don’t totally take issue with this “Maximalist” approach, I think there are a lot of ways to improve on it. For starters, even a small environment generates at least 5 GB of logs a day, which works out to roughly 125 logs per second, or about 10.8 million logs per day. That is a lot of data, and larger environments produce thousands of times more. All of it means more overhead, from hardware costs to SIEM costs. When your solution charges by the unit of data ingested, ingesting everything not only costs your business unnecessarily; for many organizations it makes their SIEM prohibitively expensive. We see it every day: a medium-sized enterprise goes with a market leader in security analytics only to see its bill go from a couple hundred thousand a year to several million. You only have to mingle at the RSA Conference to hear frustration after frustration, and it doesn’t have to be this way.
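If you want to sanity-check those figures for your own environment, a back-of-the-envelope calculation is enough. The average event size below is an assumption chosen purely for illustration; real event sizes vary widely by source and format.

```python
# Rough check of the log volume figures above.
LOGS_PER_SECOND = 125
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
AVG_EVENT_BYTES = 460                 # assumed average size per event

logs_per_day = LOGS_PER_SECOND * SECONDS_PER_DAY
bytes_per_day = logs_per_day * AVG_EVENT_BYTES

print(f"Events per day: {logs_per_day:,}")               # 10,800,000
print(f"Volume per day: {bytes_per_day / 1e9:.1f} GB")   # ~5.0 GB
```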
There are so many ways to focus log collection, but I have three favorites:
- Log Truncation
- Forensic Storage
- Tiered Analytics
Log truncation is simple enough; Snare has a whole paper on how it works. What has always surprised me is the reluctance to do it. Log truncation is the removal of superfluous text from Windows event logs. Every Windows event carries a string of verbose descriptive text that has no forensic value. There is no need to collect it, and it will only bog down your network and your storage. We’ve seen environments where upwards of 70% of log data was verbose text that, if cut from the logs, would have saved the organization considerable cost. Paying to move and store data that your analytics tool will eventually ignore is, quite simply, a waste of money.
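As a minimal sketch of the idea, assuming events arrive as Windows event XML with the verbose rendered description carried in a RenderingInfo block, truncation amounts to keeping the structured fields and dropping that descriptive text before the event ever hits the wire. Adjust the element matching to whatever export format you actually use.

```python
import xml.etree.ElementTree as ET

def truncate_event(event_xml: str) -> str:
    """Return the event with its verbose rendered-text block removed."""
    root = ET.fromstring(event_xml)
    for parent in root.iter():
        for child in list(parent):
            # Match with or without the Windows event XML namespace prefix.
            if child.tag.split("}")[-1] == "RenderingInfo":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```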
Forensic storage is a trending topic in our industry right now. Data volume is growing at an incredible rate, and we need much of that data to piece together what happened in the unfortunate event of a breach. The problem is that trying to detect a breach by sifting through all of that data all of the time increases mean-time-to-detection (MTTD), which in turn increases mean-time-to-response (MTTR). That’s where forensic storage comes in: forward only the critical event logs to your security analytics platform, and keep everything else with even a shred of forensic value on separate servers. Those logs are there if you need them, but they aren’t driving up hardware and software costs, and more importantly, they aren’t bogging down your analytics platform or your incident response team. Study after study on data management finds data scientists reporting that they spend over 70% of their time on data preparation.1 That is wild. Highly skilled, highly paid employees like data scientists should not be spending up to four-fifths of their time on menial tasks, yet that is what happens when you inundate your systems with data. Forensic storage is an easily implemented solution that not only improves security KPIs but saves you money as well.
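The routing logic itself can be very simple. The sketch below splits events into a hot path and a cold path; the event IDs and the two sink functions are illustrative placeholders, not a recommended policy, and your own split should be driven by your detection use cases.

```python
# Hot path: security-critical events go to the analytics platform.
# Cold path: everything else lands in cheaper forensic storage.
CRITICAL_EVENT_IDS = {4624, 4625, 4672, 4688, 4720}  # e.g. logons, privilege use, process creation

def send_to_siem(event: dict) -> None:
    print(f"-> SIEM: {event['EventID']}")              # stand-in for a real forwarder

def write_to_forensic_store(event: dict) -> None:
    print(f"-> forensic store: {event['EventID']}")    # stand-in for archival storage

def route_event(event: dict) -> None:
    if event.get("EventID") in CRITICAL_EVENT_IDS:
        send_to_siem(event)
    else:
        write_to_forensic_store(event)

route_event({"EventID": 4625, "Account": "jdoe"})   # critical -> SIEM
route_event({"EventID": 5447, "Account": "jdoe"})   # routine -> forensic store
```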
That brings us to Tiered Analytics. Tiered analytics is, data dork that I am, my favorite approach to data management, but it is also the most complex. While in theory companies of every size can take advantage of it, it becomes increasingly important as your organization grows. A lot of companies already do it to a degree: when branch offices or individual departments have their own KPIs and dashboards while that same data feeds executive-level dashboards at corporate, that is tiered analytics. This approach gives your business insight into the data it generates at multiple levels, with varying degrees of detail and from varied perspectives.
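A toy roll-up shows the shape of it. The branch names and metrics below are made up for illustration; the point is that each tier sees the level of detail it needs and nothing more.

```python
from collections import Counter

# Tier 1: each branch keeps its own detailed alert counts.
branch_alerts = {
    "branch-nyc": Counter({"malware": 12, "failed_login": 340}),
    "branch-lon": Counter({"malware": 3,  "failed_login": 95}),
}
for branch, counts in branch_alerts.items():
    print(branch, dict(counts))

# Tier 2: the executive dashboard sees only the organization-wide aggregate.
org_totals = sum(branch_alerts.values(), Counter())
print("org-wide:", dict(org_totals))
```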
SANS has a white paper written by Dave Shackleford that gives great examples for each tier. The dashboards and KPIs built for C-level executives, for example, would need to answer questions like:
- What is our overall risk posture?
- What are our high-value targets?
- What are the risks if our high-value targets are compromised?
- What are the most cost-effective ways of reducing risks?
While IT management would need analytics to answer questions like:
- What is really happening on the network?
- Are any systems operating outside of policy?
- Based on current system workloads, can anything be virtualized?
- What impact would an outage or downtime of a system have on the business?
- Can we decommission any assets?
Then, another tier down, you have various monitoring and response teams looking to answer an almost unlimited number of questions, requiring a number of purpose-built dashboards and potentially custom KPIs (a sketch of one such check follows this list):
- Are website links in various emails from a known list of bad websites?
- Are there network assets that should be monitored more frequently?
- Are there changes to any host configuration settings or files closely tied to a website visit?
- What is the impact on other network assets if this one is compromised?
- Have there been any unusual examples of port or protocol usage?
- Should we monitor some assets more frequently because of the amount of use they get?
- Is an employee who is supposed to be on vacation logged in at the office?
- Is there suspicious activity after a USB port was used?
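As a sketch of one check from the list, the snippet below flags office logons from employees recorded as being on vacation. The vacation calendar and logon events are hypothetical stand-ins for an HR or workplace management system and your authentication logs.

```python
from datetime import date

on_vacation = {"jdoe": (date(2018, 12, 17), date(2018, 12, 28))}   # hypothetical HR data

logon_events = [                                                   # hypothetical logon records
    {"user": "jdoe",   "host": "NYC-WKSTN-042", "when": date(2018, 12, 19)},
    {"user": "asmith", "host": "NYC-WKSTN-007", "when": date(2018, 12, 19)},
]

def suspicious_logons(events, vacations):
    for event in events:
        window = vacations.get(event["user"])
        if window and window[0] <= event["when"] <= window[1]:
            yield event   # logged in on-site while supposedly away

for event in suspicious_logons(logon_events, on_vacation):
    print("Investigate:", event)
```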
These questions are obviously geared toward reducing mean-time-to-detection and improving incident response. Some are also aimed at understanding the impact of individual assets on the business.2 Several of them require pulling together disparate data sources, not just logs. Workplace management software, for example, can help you identify when an employee who is on vacation in Barbados is also somehow at their machine in the office. STIX threat intelligence can help you correlate activity on the network with known malicious websites and IPs. Inundating your analytics tools with superfluous log data only makes it that much more tedious to bring in data from the rest of your business, which is already becoming an imperative. With a tiered analytics solution, you can pick and choose which data sources to bring in where, giving your teams easily digestible data sets to analyze and report on, increasing the efficiency of each business unit and drastically improving your security posture.
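The email-link question works the same way once the threat intelligence is in reach. The sketch below compares the domains of URLs seen in email against a set of known-bad domains, for example one extracted from a STIX indicator feed; the feed parsing itself is out of scope here, and the domain list is a placeholder.

```python
from urllib.parse import urlparse

known_bad_domains = {"evil.example", "phish.example"}   # placeholder indicators

def flag_bad_links(urls):
    for url in urls:
        domain = urlparse(url).hostname or ""
        if domain in known_bad_domains:
            yield url

email_urls = ["https://phish.example/reset", "https://intranet.example/hr"]
print(list(flag_bad_links(email_urls)))   # -> ['https://phish.example/reset']
```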
You have only to look at the most recent Verizon Data Breach Investigations Report (2018) to see how little progress we’ve made in uncovering breaches: 68% of breaches took months or longer to discover.3 There is so much an organization can do to improve its posture that it can be daunting to even begin. The first step is better data management: make life easier for the people living and working in all that data. The second is to prioritize and work toward more sophisticated approaches, action item by action item. Truncation is an easy first step, and forensic storage is a no-brainer. After that comes tiered security analytics, whose architecture will vary from company to company, but whose implementation is critical to improving both an organization’s security posture and the cost effectiveness of its security solutions. By improving the way we tackle today’s security challenges, we’ll be better equipped to meet tomorrow’s.
1 Press, G. (2016, March 23). Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes. Retrieved from https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#44e4090d6f63
2 Shackleford, D. (2016, January). Using Analytics to Prevent Future Attacks and Breaches. Retrieved December 18, 2018.
3 Verizon. (2018, March). 2018 Data Breach Investigations Report. Retrieved November 27, 2018, from https://www.verizonenterprise.com/resources/reports/rp_DBIR_2018_Report_execsummary_en_xg.pdf