Apr 02

Now, a few words on looking for things. When you go looking for something specific, your chances of finding it are very bad. Because of all the things in the world, you’re only looking for one of them. When you go looking for anything at all, your chances of finding it are very good. Because of all the things in the world, you’re sure to find some of them. – Daryl Zero

This sage advice from the greatest private detective in the world isn’t just applicable to figuring out who is blackmailing you; it’s also useful for general problem solving. We’ve had some great discussions lately with people wrestling with some really tough data problems. Mind you, these problems are all over the map: some are completely different from one another, and some overlap. One common trap in problem solving is imposing arbitrary restrictions on how a problem may be solved. People sometimes latch onto a particular way of thinking, and as my Uncle Olaf used to say, “When all you have is a hammer, everything looks like a nail…”. Take the following data for example. The first image below, with the single green line, looks pretty simple. It seems to tell you everything you need to know without a lot of complexity. The problem is that the next image, with the blue line, shows the actual data the first image was created from. Once you see both, it is clear that the single straight line could be misleading: the trend is actually heading down at the end of the detailed graph.

[Images: a single green trend line; the detailed blue data it was derived from]

Far more useful is the following image, which keeps the simplicity of the trend line but includes the detail needed for validation, showing both views of the data in relation to each other.

[Image: the trend line overlaid on the detailed data]
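The contrast the graphs illustrate can be sketched in a few lines of Python. All of the numbers here are invented for illustration; the point is only that a single least-squares trend line can stay positive even when the most recent detail is clearly declining.

```python
# Hypothetical sketch of the two views described above: a straight
# trend line fit over detailed data whose tail is actually declining.
# All data is invented for illustration.
import numpy as np

# 40 daily measurements: steady growth for 33 days, then a sharp
# decline over the final week.
days = np.arange(40)
detail = np.concatenate([
    10 + 0.8 * np.arange(33),   # the long, reassuring climb
    36 - 4.0 * np.arange(7),    # the downturn hiding at the end
])

# The "green line": one least-squares trend over all 40 points.
slope, intercept = np.polyfit(days, detail, 1)
print(f"overall trend slope: {slope:+.2f} per day")   # still positive

# The "blue line" detail tells a different story at the end.
recent_slope, _ = np.polyfit(days[-7:], detail[-7:], 1)
print(f"last-week slope:     {recent_slope:+.2f} per day")  # negative
```

The summary view and the detail view disagree about the direction things are heading, which is exactly why showing them together is the useful option.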
But this topic is really much bigger than this simple analysis; it is really about the value of detailed data. There is some very real criticism of large, complex volumes of data, famously articulated by Clay Shirky in his 2008 Web 2.0 Expo keynote, “It’s Not Information Overload. It’s Filter Failure”. Shirky’s talk is about the massive drop in the production cost of content on the Internet, and he is absolutely correct: if you have the ability to create the right filters, you don’t need all the detail. The problem is that that’s one BIG IF.

Just In Case

Even though we don’t completely understand the human brain’s ability to filter at multiple levels, we continue to find new and creative ways to simulate it; we’re just not there yet. This is why detail data is so necessary. We can provide some pretty good methods for determining what you probably want to hear, see, and read, but intelligent systems should always have a human seat belt to make the connections between seemingly unrelated events. As Isaac Asimov said, “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny…’”.

The only thing you can count on is…

Now let’s say a person comes up with a Clever Idea™ that provides a pretty good answer to a sufficiently difficult question…or even a pretty simple one, for that matter. Based on tons and tons of data, they have finally figured out a fantastic equation to predict that data’s future trends. I mean, the data isn’t THAT complex, so there isn’t any need to keep all that detail around, right? That sounds really swell. But this is the wrong place for a Clever Idea™, because even the best idea ever thought of demands some very large assumptions. The assumption is “the equation will always be right”. The assumption is “things will never change”. The assumption is “we won’t learn how to ask better questions”. The assumption is “Eh, 80% accuracy is good enough”. Since most of us pay taxes, we know those first assumptions are not only wrong but potentially dangerous.

There is no crying in baseball

That last assumption is actually very deceiving. In the graphs above, only 40 data points are summarized. If the data I’m looking at is the aggregate length of traffic delays per day, it would lead me to believe I need to add more road infrastructure to relieve congestion. But looking at the detail data might have caused me to ask more questions, and I might have figured out that the local baseball team winning its first title in 86 years is what caused that big spike for almost 3 weeks. After the win, the 5 best players quit, the manager went to another team, and the owner put his one-time-musician son in charge of the team. But since I didn’t look at the detail data to figure out what was going on, I spent a lot of unnecessary money on contractors who went 35% over budget and took twice as long to finish the project. As a result, I got fired, and now I man a toll booth for the city. Lesson learned? => Data Sampling Leads to Bad Decisions.
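The traffic-delay story can be made concrete with a toy calculation. All of these numbers are invented: a month of normal traffic, a three-week celebration spike, then a quiet month. The summary statistic screams "capacity problem" while the detail shows a bounded anomaly that has already passed.

```python
# Hypothetical daily traffic-delay data (all numbers invented) showing
# how a summary statistic hides a short-lived spike with a known cause.
daily_delay_minutes = (
    [12] * 30      # a normal month: ~12 minutes of delay per day
    + [75] * 21    # 3-week spike: championship celebrations downtown
    + [10] * 30    # after the roster implodes, traffic is light again
)

# The aggregate view: looks like chronic congestion.
average = sum(daily_delay_minutes) / len(daily_delay_minutes)
print(f"average delay: {average:.1f} min")

# The detail view: the bad days are a bounded event, not a trend.
spike_days = [d for d in daily_delay_minutes if d > 30]
print(f"days over 30 min: {len(spike_days)} of {len(daily_delay_minutes)}")

# And the most recent data is actually below the old baseline.
recent = daily_delay_minutes[-30:]
print(f"recent average: {sum(recent) / len(recent):.1f} min")
```

Anyone budgeting new roads off the average alone never sees that the last thirty days are quieter than they were before the spike.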

Insert Magic Here

To recap: we can’t constantly try to understand an avalanche of data, but at some point we may need all of it; we don’t want to burn all the forests to store it, but we also don’t want to end up crying alone in a toll booth. Now is the time for that Clever Idea™. More and more intellectual property is being put into No Joke technology that significantly cuts down on the storage costs of data (think 10x reductions). Being able to store a massive amount of data for a really long time, while constantly refining sophisticated algorithms that analyze it and give us better and better answers over time, sounds like what we want, right? Well, that’s the goal, and Big Data technology continues to evolve and improve.
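The storage-reduction point is easy to demonstrate with a quick sketch using Python’s standard zlib module. The 10x figure above is the author’s; real ratios depend entirely on the data, and the repetitive log-style records below are close to a best case.

```python
# Rough sketch of how redundant machine-generated data compresses.
# The record format below is invented; repetitive rows like this are
# a best case, so the ratio here will far exceed typical results.
import zlib

record = b"09:15:00 sensor=42 delay_minutes=12 status=OK\n"
raw = record * 10_000            # tens of thousands of near-identical rows

compressed = zlib.compress(raw, level=9)
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw):,} bytes, compressed: {len(compressed):,} bytes")
print(f"ratio: {ratio:.0f}x")
```

Real detail data varies more from row to row than this, which is why purpose-built columnar and delta-encoding schemes exist; but even general-purpose compression shows how keeping “all that data” is cheaper than it first appears.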

Yes, Technology Can Help

My advice to those trying to solve tough data problems is to look at the state of the art in the industry. Today, a telecom company is using sophisticated data analysis to provision more network capacity when a big baseball game is a few days away, and data scientists are predicting the number of days an individual will spend in the hospital next year based on that individual’s past insurance claims. So no matter how small your data analysis problem, there is probably some Crazy Brilliant™ technology already out there that can solve it very well. Data compression technology, data crunching speeds, and algorithm sophistication will continue to grow in new and exciting ways that you can use to make life easier for yourself and your clients.

Aaron Bawcom is the Chief Technology Officer for Reflex Systems, a provider of end-to-end virtualization management solutions based out of Atlanta, GA. Contact him at abawcom@reflexsystems.com.

