Category Archives: Big Data

Save The Planet With Machine Learning

I have a new car and I love it. To achieve better fuel efficiency, it tells me when to shift. I like to get to know my car, so I keep a close eye on how much fuel I use. The display can show me this in real time. While driving home yesterday I noticed something odd.

When I drove 130 km/h, the car used the same amount of fuel as when driving 100 km/h in the same gear (the gear suggested by the car). My assumption was that 100 km/h was too slow for that particular gear. I tested this assumption by shifting down a gear on the next 100 km/h stretch. Even though my car was telling me to shift to 6th gear, I found that in 5th gear the car used 0.3 l/100km less fuel. This morning I tried again, and found no difference between 5th and 6th gear. Apparently there are environmental factors (wind, incline, engine temperature, etc.) that influence which gear is most efficient. The algorithm in my car doesn’t take these into account. It just looks at speed and acceleration to determine the right gear.
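
The comparison I did by hand can be sketched in a few lines of Python, assuming the readings were logged as (speed, gear, consumption) tuples. The sample values and the function name mean_consumption_by_gear are made up for illustration. They only mirror the 0.3 l/100km gap I saw and are not real measurements.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trip log: (speed in km/h, gear, consumption in l/100km).
# These numbers are invented for illustration, not measured.
samples = [
    (100, 6, 5.6), (100, 6, 5.5),
    (100, 5, 5.3), (100, 5, 5.2),
    (130, 6, 5.6), (130, 6, 5.7),
]

def mean_consumption_by_gear(samples, speed_kmh):
    """Average the logged consumption per gear at one speed."""
    by_gear = defaultdict(list)
    for speed, gear, consumption in samples:
        if speed == speed_kmh:
            by_gear[gear].append(consumption)
    return {gear: round(mean(values), 2) for gear, values in by_gear.items()}

at_100 = mean_consumption_by_gear(samples, 100)
# The gear with the lowest average consumption at this speed.
best_gear = min(at_100, key=at_100.get)
print(at_100, "best gear at 100 km/h:", best_gear)
```

With only speed and gear in the log, a comparison like this cannot explain why the result changes from one day to the next. For that, the log would also need wind, incline, and engine temperature.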


We could try to make the algorithm smarter, but that is a flawed approach. The premise that we can design an algorithm up front that always picks the best gear is fundamentally wrong. This is a perfect case for Microsoft Azure Machine Learning. By learning from telemetry data, it can figure out which gear to use when. And not just for my car, but for all cars of the same model. There are approximately 1 billion cars in the world. Assuming these drive an average of 10,000 km a year, saving just 0.1 l/100km would save 10 billion liters of fuel per year.
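
As a quick check on that estimate, here is the arithmetic worked through in Python. The inputs are the rough figures from the paragraph above (1 billion cars, 10,000 km per car per year, 0.1 l/100km saved). The variable names are mine.

```python
# Rough inputs taken from the text above; all are estimates, not measured data.
cars = 1_000_000_000          # approximate number of cars worldwide
km_per_car_per_year = 10_000  # assumed average distance per car
saving_l_per_100km = 0.1      # assumed fuel saving

total_km = cars * km_per_car_per_year              # 10^13 km per year
litres_saved = total_km / 100 * saving_l_per_100km

print(f"{litres_saved:,.0f} liters per year")
```

That works out to about 10 billion liters per year, which is still a lot of fuel for such a small improvement per car.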

Book Review – Disruptive Possibilities: How Big Data Changes Everything

Even though it is only a little book (just 80 pages), Disruptive Possibilities: How Big Data Changes Everything (Jeffrey Needham, O’Reilly) is a very good and insightful read. Jeffrey Needham explains very well what Big Data is and how it differs from “traditional” computing. He effectively shows that you need to approach Big Data differently, because the “old school” approach to data just doesn’t scale. In that sense he echoes my view on the subject of data: not all data needs to be normalized and transactional, and you can save a lot of effort and money (on expensive hardware and software) by picking the right requirements for the types of data you are dealing with. For instance, you would need a pretty good reason to store a file in an RDBMS. Because most current IT staff have been brought up in the RDBMS paradigm, it is often the tool of choice, picked without a second thought. This book effectively breaks with that way of thinking, and I would encourage developers, architects, database administrators, and others to read it to get a sense of perspective. It would greatly help us avoid the mistake of tackling “new world” problems with “old world” solutions. The book is very easy to read, with good examples, funny stories, and insightful comments. There’s also a side step into neuroscience and the future of supercomputing, which is not only good to know, but interesting in itself.