
Impacts of Inappropriate Data Gathering


This research finds that ineffective or absent data collection processes can be destructive to an organization in three ways. First, the use of nonrepresentative or anecdotal sampling is a source of instability from an analytical perspective, and this injects uncertainty into the data itself, the associated analytical procedures, and the findings. Second, the failure to identify root causes rather than correlations and the inability to distinguish between root causes and symptoms cause confusion and waste resources through the implementation of ineffective solutions. Third, the inability or unwillingness to aggressively, or at least proactively, pursue new knowledge or understanding severely limits the potential quality of any decisions made. This research also finds that several statistical methods can be employed to mitigate the risks and damages associated with these failure modes. Design of Samples (DoS) or statistical sampling provides multiple advantages through the application of probabilities and reduces the uncertainty associated with nonrepresentative sampling. Design of Experiments (DoE) provides a method to discover and quantify the effects of root causes. Simulation allows analysts to pursue new knowledge and understanding through the deliberate generation of new but relevant and applicable data and the utilization of this data in mathematical models. Finally, this research finds that the application of Christian values benefits both the data collection process and the host organization. The instruction provided by God permeates both the Old and New Testaments. Specific Biblical guidance includes the avoidance of reliance on opinion (and bias), the pursuit and understanding of root causes, and the value of increased understanding and knowledge.

Impacts of Inappropriate Data Gathering

“The insights from data analysis are only as accurate as the data and the analysis behind them” (Bartlett, 2013, p. 50). Inappropriate (or absent) data collection can mislead and eventually destroy a business. The knowledge and replication of best data collection practices such as Design of Samples (DoS), Design of Experiments (DoE), and simulation provide some techniques to address this challenge, but Christian values provide the core guidance for avoiding specific pitfalls.

Bartlett (2013) describes six types of data collection within two categories (passive and active collection): observational (passive), census taking (active), anecdotal or nonrepresentative sampling (active), Design of Samples (DoS) or statistical sampling (active), Design of Experiments (DoE) or experimentation (active), and simulation (active) (pp. 198-200). Business leaders are called to “rely on a combination of data collection, descriptive analytics, predictive/explanatory modeling, and optimization” (Mehdizadeh et al., 2020). This rational approach outlines the analytical process wherein data collection (DoS and DoE) supports the analytics (DoE and simulation), and ultimately, analytical procedures are performed to understand performance history (descriptive statistics) or to predict future outcomes (predictive and prescriptive analytics).

This research explores the value of several of these methods, including DoS, DoE, and simulation. This research also considers the impact of the application of Christian values in this pursuit. The Holy Scripture warns its readers to avoid reliance on opinion (and bias), to pursue and understand root causes, and to work and pray for increased understanding and knowledge.

Design of Samples or Statistical Sampling

Statistical sampling “randomly selects elements from the population” (Bartlett, 2013, p. 199). This randomization “protects us from subjective bias” and “is the key ingredient that defines representative means of extracting information” (Bartlett, 2013, p. 202). This basic notion of a mathematically designated sampling method is not common among the customers we serve. The rule of thumb, albeit an illogical one, that many corporations follow is to gather whatever information is readily available and easy to retrieve as quickly as possible. This is a form of anecdotal or nonrepresentative sampling, a worst practice in the fields of data analytics and business management.
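The contrast between randomized selection and the "readily available" habit described above can be illustrated with a minimal Python sketch. The population, its upward trend, the sample size of 400, and the function names are all assumptions made for illustration; none of them come from Bartlett (2013). The interval uses a normal approximation and is only a rough guide.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 order values stored oldest first.
# Recent orders (the end of the list) are larger on average.
population = [random.gauss(100 + 0.01 * i, 15) for i in range(10_000)]
true_mean = statistics.mean(population)


def convenience_sample(data, n):
    """Anecdotal sampling: take the n most recent, easiest-to-reach records."""
    return data[-n:]


def simple_random_sample(data, n):
    """Statistical sampling: every record has the same chance of selection."""
    return random.sample(data, n)


def mean_with_interval(sample, z=1.96):
    """Sample mean with an approximate 95 percent confidence interval."""
    mean = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return mean, mean - half_width, mean + half_width


n = 400
conv_mean, conv_lo, conv_hi = mean_with_interval(convenience_sample(population, n))
rand_mean, rand_lo, rand_hi = mean_with_interval(simple_random_sample(population, n))

print(f"population mean:    {true_mean:.1f}")
print(f"convenience sample: {conv_mean:.1f} ({conv_lo:.1f} to {conv_hi:.1f})")
print(f"random sample:      {rand_mean:.1f} ({rand_lo:.1f} to {rand_hi:.1f})")
```

Under these assumptions, the convenience sample reports a narrow interval around the wrong value, because it only sees the most recent records. The random sample is noisier in appearance but is centered near the population mean.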

When comparing DoS to anecdotal or nonrepresentative sampling, “the difference is that an anecdotal sample provides insight into the possible; a statistical sample is representative and provides insight into the probable” (Bartlett, 2013, p. 200). According to Bartlett (2013), the probabilities assigned to each element in a DoS allow analysts to better understand the probability distribution or the shape of the population under study (p. 200). DoS provides organizations with the ability to designate the appropriate confidence level, e.g., 95 percent, and apply randomization to their collection process within a fixed sample (which is representative of the parent population). This allows leaders and practitioners to better understand the data that supports the analysis that in turn supports better decisions. After all, “the objective of data analysis is to provide facts that will support better judgment” (Bartlett, 2013, p. 50).

Despite this, some corporations continue to fail in this endeavor. “The most routine mistake is to depend on an anecdotal sample to be representative” (Bartlett, 2013, p. 200). This error may be caused by ignorance regarding statistical procedures but is often driven by an overreliance on opinion and the consequent injection of bias. The application of Christian values can overcome this challenge.

The Scriptures, including the Old and the New Testaments, confirm that different opinions abound amongst men. “One person esteems one day as better than another, while another esteems all days alike” (English Standard Version Bible, 2001, Romans 14:5). They also validate the importance of knowledge or understanding over these opinions. “A fool takes no pleasure in understanding, but only in expressing his opinion” (English Standard Version Bible, 2001, Proverbs 18:2). Industrial best practices align with these principles, reducing bias in statistical studies and discouraging the reliance on opinions embedded in organizational decision-making processes.

Design of Experiments and the Pursuit of Root Causes

“We predict the future in order to improve the present context” (Ahmed & Pathan, 2019, p. 312). Prediction is supported by regression through the application of confidence and prediction intervals, but prediction in this sense is not a causal investigation. It is a projection based on relationships or correlations and does not reveal root causes. Many of the corporations I have worked with are unaware of the difference between correlation and causation. Correlation is popularized in the field of regression and refers to the relationship(s) between multiple variables. Specifically, correlation is used to describe how a dependent variable changes as one or multiple independent variables change. This does not imply that a change in the independent variable(s) causes a change in the dependent variable, but rather that the changes are related somehow (perhaps through a lurking or unknown variable).

For example, both flip flop (sandal) sales and the instances of shark attacks may rise during the summer season, but that does not mean that flip flop sales cause an increase in these attacks. Regression reveals relationships, whereas DoE enables root cause analysis. “DoE is the most aggressive approach toward estimation, and it is the only way to solve the chicken-and-egg causal problems” (Bartlett, 2013, p. 199). This method does employ regression analysis, but it also utilizes “a controlled estimation of the relationship between a response (dependent) variable and other explanatory (independent) variables – called factors” (Bartlett, 2013, p. 199).
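The flip flop and shark attack example can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions: temperature plays the role of the lurking variable, and every coefficient, noise level, and variable name below is invented for illustration. The second half mimics an intervention in the spirit of DoE by setting sales independently of temperature.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical weekly observations. Temperature drives both series;
# neither series appears in the other's formula.
weeks = 520
temperature = [random.uniform(5, 35) for _ in range(weeks)]
sandal_sales = [20 * t + random.gauss(0, 60) for t in temperature]
shark_attacks = [5 + 0.3 * t + random.gauss(0, 1.5) for t in temperature]

r_observed = pearson(sandal_sales, shark_attacks)

# Intervention: sales are now set by the experimenter (for example, a
# randomly assigned promotion), so they no longer track temperature.
forced_sales = [random.uniform(100, 700) for _ in range(weeks)]
attacks_after = [5 + 0.3 * t + random.gauss(0, 1.5) for t in temperature]

r_intervention = pearson(forced_sales, attacks_after)

print(f"observed correlation:           {r_observed:.2f}")
print(f"correlation under intervention: {r_intervention:.2f}")
```

In the observed data the two series move together strongly even though neither influences the other. Once sales are assigned at random, the association largely disappears, which is the kind of evidence passive observation alone cannot supply.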

The confusion regarding the differences between regression and DoE extends into publications as well. “MSA and other tools can be used as sources to identify the sources of the problem, probably a potential problem” (Doshi & Desai, 2019). Measurement system analysis (MSA) is not an effective method for discovering the source (or cause) of a problem; instead, it is a method for identifying and quantifying the symptoms of potential problems, e.g., different methods being utilized by different operators. The why in this instance remains unknown without the appropriate application of DoE, which in turn requires a sampling method different from DoS.

“DoE facilitates influencing the information collected,” “viewing possible landscapes created by intervention,” and by employing this technique, “we can infer cause-and-effect relationships” (Bartlett, 2013, p. 201). DoS does not include these types of influential efforts, and instead “randomly selects sampling units to measure a multidimensional landscape as it naturally appears” (Bartlett, 2013, p. 201). DoE requires manipulation to produce effective experiments, and this manipulation of the combinations of factors and associated levels permits researchers to identify root causes.
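The manipulation of factors and levels described above can be sketched as a small two-factor, two-level full factorial experiment in Python. Everything specific here is an assumption for illustration: the factor names (temperature and pressure), the coded levels of -1 and +1, the five replicates, and the hypothetical process whose "true" effects are built into `run_process`. It is not a method taken from Bartlett (2013).

```python
import itertools
import random

random.seed(11)  # fixed seed so the sketch is repeatable


def run_process(temp, pressure):
    """Hypothetical process response at coded factor levels (-1 or +1).

    Assumed truth: temperature matters, pressure alone does not, and
    the two factors interact. Gaussian noise stands in for run-to-run
    variation.
    """
    signal = 50 + 4.0 * temp + 0.0 * pressure + 1.5 * temp * pressure
    return signal + random.gauss(0, 1.0)


levels = (-1, 1)
replicates = 5

# Full factorial design: every combination of levels, replicated.
runs = [
    combo
    for combo in itertools.product(levels, levels)
    for _ in range(replicates)
]
random.shuffle(runs)  # randomized run order guards against time trends

results = [(t, p, run_process(t, p)) for t, p in runs]


def effect(data, contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [y for t, p, y in data if contrast(t, p) == 1]
    low = [y for t, p, y in data if contrast(t, p) == -1]
    return sum(high) / len(high) - sum(low) / len(low)


temp_effect = effect(results, lambda t, p: t)
pressure_effect = effect(results, lambda t, p: p)
interaction_effect = effect(results, lambda t, p: t * p)

print(f"temperature effect: {temp_effect:.2f}")
print(f"pressure effect:    {pressure_effect:.2f}")
print(f"interaction effect: {interaction_effect:.2f}")
```

Because the experimenter sets each factor deliberately and randomizes the run order, the estimated effects can be read as causal within the tested range. With coded levels, each estimate should fall near twice the corresponding coefficient in the assumed process.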