Let’s explore Big Data from different perspectives over time, rather than comparing how organizations define it today. The evolution of the definition should paint an interesting picture of the accessibility (proliferation) and application of Big Data.
According to Gil Press, a contributor to Forbes, “The first documented use of the term ‘big data’ appeared in a 1997 paper by scientists at NASA, describing the problem they had with visualization (i.e. computer graphics) which ‘provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.’”1 This description focuses on the processing of similar data (data sets) and the difficulty of translating the resulting information into graphic format. That makes sense for the time period and the organization in which it was introduced.
NASA data sets would be scientific in nature, and most likely organized as standalone artifacts or products (e.g., the results of a design of experiments for rocket fuel mixture optimization, stored in a matrix or data frame containing factor levels and results). The definition also treats data sets as a single type of product with limited complexity, whereas data now arrives in many different forms and formats, including text and numbers, photographs and video, sound, etc. The internet was in its adolescent phase at best in the mid-to-late 90s, so “remote disk” interconnectivity must have presented numerous challenges as well, including connection speed and quality. This first definition is focused on size and visualization.
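The “problem of big data” described in that 1997 paper, data that will not fit in main memory, is still handled in essentially the same way when acquiring more resources is not an option: by streaming the data through memory one piece at a time. The following is a minimal Python sketch of that idea; the file name measurements.csv and the value column are hypothetical illustrations, not details from the NASA paper.

```python
import pandas as pd

# Hypothetical file assumed to be too large to load into main memory at once.
CSV_PATH = "measurements.csv"   # illustrative name, not from the NASA paper
CHUNK_ROWS = 100_000            # rows held in memory at any one time

running_sum = 0.0
running_count = 0

# read_csv with chunksize returns an iterator of DataFrames, so only one
# chunk is resident in memory at a time ("out-of-core" processing).
for chunk in pd.read_csv(CSV_PATH, chunksize=CHUNK_ROWS):
    running_sum += chunk["value"].sum()
    running_count += len(chunk)

if running_count:
    print("mean value:", running_sum / running_count)
```

The same pattern, summarize-as-you-read instead of load-then-summarize, is what later distributed frameworks scale out across many machines.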
Big Data is also defined as “data that cannot be stored in a single storage unit. Big Data refers to data that is arriving in many different forms, be they structured, unstructured, or in a stream.”1 This is an acceptable definition, given that information-gathering systems (including sensors and other input mechanisms) innovate and expand, both as offerings and as an industry, every year if not every month.

I have witnessed this as air freight and manufacturing clients evolved from the 1980s through today. Grainger (formerly W.W. Grainger) is a nationwide industrial supply company I worked with in the 80s and 90s. The original information systems they employed were manual: forms and file cabinets. In the late 80s, they updated to an elementary Material Requirements Planning (MRP) system at each location to balance inventory levels with demand and consumption. As we pressed into the mid-90s, bar coding and hand-held scanners were added, and a regional system was created to coordinate inventories more accurately across multiple locations. Now, a web-based system is employed that integrates store, district, area, region, and national data with logistics and customer information (including satisfaction and rating scores). It offers customizable visual reporting (dashboards) and near-real-time updates. Some areas have employed radio-frequency identification (RFID) to track inventory movements within locations, and delivery vehicles are monitored (speed, location, sudden stops/accident indicators, etc.). In this case, increased data is an indicator of increased potential capabilities. “Potential” becomes “actual” when the data is translated into actionable information, as it is in this specific case.
The Gartner, Inc. IT Glossary describes Big Data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”2 This explanation is the most comprehensive of the three. While early definitions focused on size and structure (big and diverse), Gartner includes the aspects of velocity, processing, decision making (driven by insight or analytics), and automation.
Using the example above (the Grainger micro case study), the number and frequency of transactions relayed to the data warehouse from the RFID units (covering more than 1.5 million product types) and the delivery vehicles are tremendous, and they occur in near-real time. As this data pours in, it is applied through analytics and presented through dashboards so that employees in the warehouse and executives in the boardroom can make decisions that positively impact the company. Now, consider that this information is also connected with the Salesforce.com Customer Relationship Management (CRM) system, and we might have to consider the term “Huge Data.” A simple sketch of this kind of stream processing follows.
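To give a flavor of what volume, velocity, and variety look like in code, here is a minimal Python sketch that folds a mixed event stream (RFID inventory scans and vehicle telemetry) into dashboard-style summaries as events arrive. The event fields, location codes, and speed threshold are hypothetical illustrations, not a description of Grainger’s actual systems.

```python
from collections import defaultdict

# Hypothetical mixed-variety event feed; field names are illustrative only.
events = [
    {"type": "rfid_scan", "sku": "ABC-123", "location": "DAL-07", "qty": -2},
    {"type": "vehicle",   "vehicle_id": "TRK-44", "speed_mph": 61.5},
    {"type": "rfid_scan", "sku": "ABC-123", "location": "DAL-07", "qty": 12},
]

inventory_delta = defaultdict(int)   # volume: net movement per location/SKU
speeding_alerts = []                 # velocity: flag events as they arrive

# In production this loop would consume a live stream rather than a list.
for event in events:
    if event["type"] == "rfid_scan":
        inventory_delta[(event["location"], event["sku"])] += event["qty"]
    elif event["type"] == "vehicle" and event["speed_mph"] > 60:
        speeding_alerts.append(event["vehicle_id"])

print("net inventory movement:", dict(inventory_delta))
print("vehicles flagged for speed:", speeding_alerts)
```

The point of the sketch is the shape of the problem: heterogeneous records arriving continuously, reduced on the fly into the small summaries a dashboard or an executive actually needs.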
Big Data is not a new idea or even a new term (as mentioned earlier, its first documented use was in 1997), but it does and will evolve. The concept of dealing with massive amounts of diverse information arriving at a rapid rate is part of the human condition, but it must be framed by context. For example, in the Bible we can read the account of Joseph in Egypt (popularized by Andrew Lloyd Webber’s musical, “Joseph and the Amazing Technicolor Dreamcoat”). Pharaoh recognized Joseph’s talent for strategic and operations management and positioned him as the COO of Egypt (Genesis 41:41-46). Utilizing historical harvest, weather, water management (the Nile), and other information, Joseph was able to calculate and plan a food reserve system that allowed the kingdom not only to survive a severe drought, but to prosper through the sale of some reserves to neighboring nations. While information technology was primitive at that point, the volume, velocity, and variety of data received, processed, and applied was substantial for the period.
Fast forward to the present. Volume, velocity, and variety have increased in sync with technological advances. From clay tablets and papyrus to nodes and cloud computing, as our capabilities advance, Big Data evolves. Much like data and data utilization, the definition of Big Data will continue to evolve along with its uses. It is an inherently complex concept, far beyond any tangible item or exercise, and the most important aspect today is the effective and efficient application of its capabilities, not a standardized description.
References:
1. Press, G. (2014, September 3). 12 Big Data Definitions: What's Yours? Forbes. Retrieved from http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#25b5ff4821a9
2. Sharda, R., Delen, D., and Turban, E. (2015). Business Intelligence and Analytics: Systems for Decision Support. Boston: Pearson.