Lately, I have seen a considerable number of announcements about big data for business intelligence, as if this trend were novel. Even the Financial Times has published a Special Report on the Connected Business, with an emphasis on the analysis of big data.
Years ago, while in middle school, I read Isaac Asimov’s Foundation series, first published in 1942, in which he described psychohistory: mathematical modelling of quadrillions of humans that predicted their future behavior. That is when I was first exposed to the concept of “big data”: large numbers provide enough data to predict behavior. In other words, the more discrete information you hold on each record and the larger the database, the more useful that information becomes. That theory still applies today: the more data you accumulate about a population, the more patterns of behavior you can identify when sifting through it.
But in my experience, the key to accumulating data is identifying the metadata needed and the granularity of the information attached to each record. Otherwise, it becomes a challenge to find the right record(s), and the information retrieved might contain holes that make the statistical analysis meaningless. In other words, the famous tenet applies: GIGO, or Garbage In, Garbage Out.
In setting up database systems, I repeatedly encountered challenges where GIGO would surface. In one petabyte-scale system, I saw substantial hurdles in accumulating multimedia records, identifying their content, and retrieving the appropriate records. One potential DoD client had been recording news broadcasts from around the world, in languages ranging from English to Japanese to Arabic. The challenge was that these video downloads had no metadata attached, yet the system needed to locate a specific record, for example Ms. Clinton’s appearance at a particular children’s hospital in Zimbabwe. Part of the file is a voice recording, and that recording could be in a local Zimbabwean language and idiom. To make this system work, I had to find a language translation system to convert that language into English, and another software application to cut and paste the right metadata for that particular file. The metadata had to include the source, date, location, and a brief description of the content. Video files also fill up storage systems quickly. And the information had to be parsed into pieces so that a particular file could be found.
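To make the metadata requirement concrete, here is a minimal sketch of what such a record and lookup might look like. The four fields (source, date, location, description) come from the project described above; everything else, including the `MediaRecord` and `find_records` names, the file names, and the sample values, is a placeholder I made up for illustration and not the actual system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MediaRecord:
    """Hypothetical metadata attached to one recorded broadcast."""
    file_name: str
    source: str
    broadcast_date: date
    location: str
    description: str


def find_records(records, *, location=None, keyword=None):
    """Return records matching an optional location and description keyword."""
    matches = []
    for record in records:
        if location and record.location.lower() != location.lower():
            continue
        if keyword and keyword.lower() not in record.description.lower():
            continue
        matches.append(record)
    return matches


# Placeholder sample data; the dates and sources are invented.
catalog = [
    MediaRecord("clip_001.mp4", "Example News Network", date(2000, 1, 1),
                "Zimbabwe", "Official visit to a children's hospital"),
    MediaRecord("clip_002.mp4", "Example News Network", date(2000, 1, 2),
                "Japan", "Evening business report"),
]

hits = find_records(catalog, location="Zimbabwe", keyword="hospital")
print([record.file_name for record in hits])
```

Without those fields filled in, the only way to find the first clip would be to watch every file, which is the GIGO problem in practice.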
Another system, which handled financial transactions, had been developed in considerable detail, but the calculations became problematic. The application handled billions of dollars of transactions. The financial instruments were being sold to other banks, so the system needed enough metadata to follow each transaction from origination to sale and to track the revenues derived, measured in basis points. Reports from the same database had to be printed on a weekly and monthly basis, and the cash flows had to be recalculated constantly. In other words, if the data had not been discrete enough, the system could not have achieved its objective of publishing the monthly reports and calculating profitability, all on billions of dollars.
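For readers unfamiliar with basis points, one basis point is one hundredth of a percent, so revenue is the transaction amount multiplied by the basis points and divided by 10,000. The sketch below shows only that arithmetic. The function name and the sample figures are hypothetical and are not taken from the system I worked on.

```python
from decimal import Decimal

# One basis point is 1/100 of a percent, i.e. 1/10,000 of the amount.
BASIS_POINTS_PER_UNIT = Decimal(10_000)


def revenue_from_basis_points(notional, basis_points):
    """Revenue earned on a transaction amount at a spread quoted in basis points."""
    return Decimal(notional) * Decimal(basis_points) / BASIS_POINTS_PER_UNIT


# Hypothetical example: a 250 million dollar sale earning 12.5 basis points.
fee = revenue_from_basis_points("250000000", "12.5")
print(fee)
```

I used `Decimal` rather than floating point here because small rounding errors add up quickly when the amounts run into the billions.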
Any database design must incorporate the basic information for each record, but it should also include incremental data points that might serve as a tool for marketing. For example, I recently met the CEO of a rug cleaning business. He does keep a record of his customers, but I suggested that by capturing some details on each customer (pricing, location, service requested), his database could home in on customer profiles per city or region. He could then offer discounted prices where sales are weaker, follow up at some future date, or identify which types of service are being requested in order to isolate customer trends. And the more discrete data he accumulates for each transaction, the more statistics he can use to increase sales, in the same vein as the psychohistory equations proposed by Asimov.
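As a rough illustration of the kind of analysis I had in mind for the rug cleaning business, the sketch below totals jobs and revenue per city and finds the most requested service. The three fields mirror the ones mentioned above (pricing, location, service requested); the city names, services, prices, and function names are all made up for the example.

```python
from collections import Counter, defaultdict

# Invented sample transactions; a real system would read these from the database.
transactions = [
    {"city": "Springfield", "service": "deep clean", "price": 120.0},
    {"city": "Springfield", "service": "stain removal", "price": 80.0},
    {"city": "Riverton", "service": "deep clean", "price": 110.0},
]


def summarize_by_city(rows):
    """Count jobs and total revenue for each city."""
    totals = defaultdict(lambda: {"jobs": 0, "revenue": 0.0})
    for row in rows:
        city_totals = totals[row["city"]]
        city_totals["jobs"] += 1
        city_totals["revenue"] += row["price"]
    return dict(totals)


def most_requested_service(rows):
    """Return the service that appears most often in the transactions."""
    counts = Counter(row["service"] for row in rows)
    return counts.most_common(1)[0][0]


summary = summarize_by_city(transactions)
weakest_city = min(summary, key=lambda city: summary[city]["revenue"])
print(summary)
print("Weakest city by revenue:", weakest_city)
print("Most requested service:", most_requested_service(transactions))
```

The weakest city is where a discount offer might go first, and the most requested service is the trend worth following up on.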
Every business should consider the many applications available for accumulating data on every daily transaction, and for sifting through that data to identify the right analyses and trends. There are many off-the-shelf software packages that can help.