Jim Bowie
- Aug 19, 2022
- 9 min read

Impacts of Inappropriate Data Gathering

Abstract

This research finds that ineffective or absent data collection processes can be destructive to an organization in three ways. First, the use of nonrepresentative or anecdotal sampling is a source of instability from an analytical perspective, and this injects uncertainty into the data itself, the associated analytical procedures, and the findings. Second, the failure to identify root causes instead of correlations and the inability to discern between root causes and symptoms cause confusion and waste resources through the implementation of ineffective solutions. Third, the inability or unwillingness to aggressively, or at least proactively, pursue new knowledge or understanding severely limits the potential quality of any decisions made. This research also finds that several statistical methods can be employed to mitigate the risks and damages associated with these failure modes. Design of Samples (DoS) or statistical sampling provides multiple advantages through the application of probabilities and reduces the uncertainty associated with nonrepresentative sampling. Design of Experiments (DoE) provides a method to discover and quantify the effects of root causes. Simulation allows analysts to pursue new knowledge and understanding through the deliberate generation of new but relevant and applicable data and the utilization of this data in mathematical models. This research also finds that the application of Christian values benefits the data collection process and the host organization as well. The instruction provided by God permeates both the Old and New Testaments. Specific Biblical guidance includes the avoidance of reliance on opinion (and bias), the pursuit and understanding of root causes, and the value of increased understanding and knowledge.

“The insights from data analysis are only as accurate as the data and the analysis behind them” (Bartlett, 2013, p. 50). Inappropriate (or absent) data collection can mislead and eventually destroy a business. The knowledge and replication of best data collection practices such as Design of Samples (DoS), Design of Experiments (DoE), and simulation provide some techniques to address this challenge, but Christian values provide the core guidance for avoiding specific pitfalls.

Bartlett (2013) describes six types of data collection within two categories (passive and active collection): observational (passive), census taking (active), anecdotal or nonrepresentative sampling (active), Design of Samples (DoS) or statistical sampling (active), Design of Experiments (DoE) or experimentation (active), and simulation (active) (pp. 198-200). Business leaders are called to “rely on a combination of data collection, descriptive analytics, predictive/explanatory modeling, and optimization” (Mehdizadeh et al., 2020). This rational approach outlines the analytical process wherein data collection (DoS and DoE) supports the analytics (DoE and simulation), and ultimately, analytical procedures are performed to understand performance history (descriptive statistics) or to predict future outcomes (predictive and prescriptive analytics).

This research explores the value of several of these methods, including DoS, DoE, and simulation. This research also considers the impact of the application of Christian values in this pursuit. The Holy Scripture warns its readers to avoid reliance on opinion (and bias), to pursue and understand root causes, and to work and pray for increased understanding and knowledge.

Design of Samples or Statistical Sampling

Statistical sampling “randomly selects elements from the population” (Bartlett, 2013, p. 199). This randomization “protects us from subjective bias” and “is the key ingredient that defines representative means of extracting information” (Bartlett, 2013, p. 202). This basic notion of a mathematically designated sampling method is not common among the customers we serve. The rule of thumb, albeit an illogical one, that many corporations follow is to gather whatever information is readily available and easy to retrieve as quickly as possible. This is a form of anecdotal or nonrepresentative sampling, a worst practice in the fields of data analytics and business management.

When comparing DoS to anecdotal or nonrepresentative sampling, “the difference is that an anecdotal sample provides insight into the possible; a statistical sample is representative and provides insight into the probable” (Bartlett, 2013, p. 200). According to Bartlett (2013), the probabilities assigned to each element in a DoS allow analysts to better understand the probability distribution or the shape of the population under study (p. 200). DoS provides organizations with the ability to designate the appropriate confidence level, e.g., 95 percent, and apply randomization to their collection process within a fixed sample (which is representative of the parent population). This allows leaders and practitioners to better understand the data that supports the analysis, which in turn supports better decisions. After all, “the objective of data analysis is to provide facts that will support better judgment” (Bartlett, 2013, p. 50).
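
The contrast between anecdotal and statistical sampling can be illustrated with a short Python sketch. Everything in it is an assumption made for demonstration: the population is synthetic, the upward trend in the records is invented, and the function names are hypothetical. It is not Bartlett's procedure, only one minimal way to show why readily available data may not be representative.

```python
import random
import statistics

def make_population(n=10_000, seed=42):
    """Hypothetical order values, stored in the order they were recorded.
    Later records are larger (an assumed trend), so the rows that are
    easiest to reach are not typical of the whole population."""
    rng = random.Random(seed)
    return [50 + 0.01 * i + rng.gauss(0, 10) for i in range(n)]

def convenience_sample(population, k):
    # Anecdotal sampling: take whatever is closest to hand (the first k rows).
    return population[:k]

def simple_random_sample(population, k, seed=7):
    # Statistical sampling: every element has the same chance of selection.
    return random.Random(seed).sample(population, k)

def mean_with_ci(sample, z=1.96):
    """Sample mean with an approximate 95 percent confidence interval."""
    m = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return m, m - half_width, m + half_width

population = make_population()
true_mean = statistics.mean(population)
conv = mean_with_ci(convenience_sample(population, 400))
rand = mean_with_ci(simple_random_sample(population, 400))

print(f"population mean: {true_mean:.1f}")
print(f"convenience:     {conv[0]:.1f}  CI [{conv[1]:.1f}, {conv[2]:.1f}]")
print(f"random:          {rand[0]:.1f}  CI [{rand[1]:.1f}, {rand[2]:.1f}]")
```

Under the assumed trend, the convenience sample produces a narrow interval that sits far from the population mean, while the random sample lands close to it. The interval around the convenience estimate looks precise, which is exactly what makes it misleading.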

Despite this, some corporations continue to fail in this endeavor. “The most routine mistake is to depend on an anecdotal sample to be representative” (Bartlett, 2013, p. 200). This error may be caused by ignorance regarding statistical procedures but is often driven by the overreliance on opinion and the consequential injection of bias. The application of Christian values can overcome this challenge.

The Scriptures, including the Old and the New Testaments, confirm that different opinions abound amongst men. “One person esteems one day as better than another, while another esteems all days alike” (English Standard Version Bible, 2001, Romans 14:5). They also validate the importance of knowledge or understanding over these opinions. “A fool takes no pleasure in understanding, but only in expressing his opinion” (English Standard Version Bible, 2001, Proverbs 18:2). Industrial best practices align with these principles, reducing bias in statistical studies and discouraging the reliance on opinions embedded in organizational decision-making processes.

Design of Experiments and the Pursuit of Root Causes

“We predict the future in order to improve the present context” (Ahmed & Pathan, 2019, p. 312). Prediction is supported by regression through the application of confidence and prediction intervals, but prediction in this sense is not a causal investigation. It is a projection based on relationships or correlations, not a revelation of root causes. Many of the corporations I have worked with are unaware of the difference between correlation and causation. Correlation is a familiar concept in regression and refers to the relationship(s) between multiple variables. Specifically, correlation is used to describe how a dependent variable changes as one or multiple independent variables change. This does not imply that a change in the independent variable(s) causes a change in the dependent variable, but rather that the changes are related somehow (perhaps through a lurking or unknown variable).

For example, both flip-flop (sandal) sales and the instances of shark attacks may rise during the summer season, but that does not mean that flip-flop sales cause an increase in these attacks. Regression reveals relationships; DoE, however, enables root cause analysis. “DoE is the most aggressive approach toward estimation, and it is the only way to solve the chicken-and-egg causal problems” (Bartlett, 2013, p. 199). This method does employ regression analysis, but also utilizes “a controlled estimation of the relationship between a response (dependent) variable and other explanatory (independent) variables – called factors” (Bartlett, 2013, p. 199).
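
The lurking-variable problem can be made concrete with a small Python sketch. All of the numbers are invented for illustration: temperature is assumed to drive both series, and the shark-attack figure is a stylized continuous index rather than a real count. The helper names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """What is left of y after removing a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(0)
# Assumed lurking variable: daily temperature drives BOTH series below.
temperature = [rng.uniform(5, 35) for _ in range(2000)]
sandal_sales = [20 + 3 * t + rng.gauss(0, 10) for t in temperature]
shark_index = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]

# Raw correlation: sandals and sharks move together.
raw_r = pearson(sandal_sales, shark_index)
# Partial correlation: the same comparison after removing temperature from both.
partial_r = pearson(residuals(sandal_sales, temperature),
                    residuals(shark_index, temperature))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In this constructed data set the raw correlation is strong, yet it largely disappears once temperature is accounted for. In practice the lurking variable is often unmeasured, which is why observational correlation alone cannot settle the causal question.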

The confusion regarding the differences between regression and DoE extends into publications as well. “MSA and other tools can be used as sources to identify the sources of the problem, probably a potential problem” (Doshi & Desai, 2019). Measurement system analysis (MSA) is not an effective method for discovering the source (or cause) of any problem, but is instead a method for identifying and quantifying the symptoms of potential problems, e.g., different methods being utilized by different operators. The why in this instance remains unknown without the appropriate application of DoE, which in turn requires a sampling method different from DoS.

“DoE facilitates influencing the information collected,” “viewing possible landscapes create[d] by intervention,” and by employing this technique, “we can infer cause-and-effect relationships” (Bartlett, 2013, p. 201). DoS does not include these types of influential efforts, and instead “randomly selects sampling units to measure a multidimensional landscape as it naturally appears” (Bartlett, 2013, p. 201). DoE requires manipulation to produce effective experiments, and this manipulation of the combinations of factors and associated levels permits researchers to identify root causes.
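
The manipulation of factors and levels described above can be sketched as a replicated two-factor, two-level full factorial experiment in Python. The process function, the factor names (`price_cut`, `free_shipping`), and the "true" effect sizes are all hypothetical assumptions chosen for the demonstration; a real experiment would replace `run_process` with actual measurements.

```python
import itertools
import random
import statistics

def run_process(price_cut, free_shipping, rng):
    """Hypothetical process under study. Factors use coded levels -1 / +1.
    The coefficients are assumed 'true' effects that the experiment should recover."""
    return (100
            + 8 * price_cut
            + 3 * free_shipping
            + 2 * price_cut * free_shipping
            + rng.gauss(0, 1))          # run-to-run noise

rng = random.Random(1)

# Full factorial design: every combination of levels, replicated 5 times.
design = list(itertools.product([-1, 1], repeat=2)) * 5
rng.shuffle(design)                     # randomize the run order

# The analyst SETS the factor levels for each run, then records the response.
results = [(a, b, run_process(a, b, rng)) for a, b in design]

def effect(results, contrast):
    """Mean response where the contrast is +1 minus mean response where it is -1."""
    high = [y for a, b, y in results if contrast(a, b) > 0]
    low = [y for a, b, y in results if contrast(a, b) < 0]
    return statistics.mean(high) - statistics.mean(low)

effects = {
    "price_cut": effect(results, lambda a, b: a),
    "free_shipping": effect(results, lambda a, b: b),
    "interaction": effect(results, lambda a, b: a * b),
}

for name, value in effects.items():
    print(f"{name:14s} estimated effect: {value:6.2f}")
```

Because the design is balanced, each estimated effect isolates one factor (or the interaction) while the others cancel out. With coded levels of -1 and +1, an estimated effect is roughly twice the corresponding coefficient in the assumed process.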

Identifying these causal roots is supported in Christian Scripture as well, including the causes of evil (1 Tim 6:10), death and salvation (Rom 6:23), and depression and happiness (Prov 12:25) (English Standard Version Bible, 2001). “For the love of money is a root of all kinds of evils. It is through this craving that some have wandered away from the faith and pierced themselves with many pangs” (English Standard Version Bible, 2001, 1 Timothy 6:10). In this passage, Christians are taught to avoid the love of money (the root cause) in order to avoid all kinds of evils (or pangs). Christian leaders are also aware of the cause of spiritual death, and at the other end of the spectrum, the root cause of eternal life. “For the wages of sin is death, but the free gift of God is eternal life in Christ Jesus our Lord” (English Standard Version Bible, 2001, Romans 6:23). A third example is found in Proverbs. “Anxiety in a man’s heart weighs him down, but a good word makes him glad” (English Standard Version Bible, 2001, Proverbs 12:25). Anxiety causes depression while a good word causes joy.

The application of these Christian principles reinforces the notion that discovering the root cause for any problem (or success - there are positive symptoms that warrant root cause analysis, too) is critical for the identification and selection of the right solution, which in turn supports better decisions. Additional statistical tools are available as well, including simulation. In many ways, simulation relates directly to DoE. In fact, “we can regard DoE as a simulation tool” (Bartlett, 2013, p. 199). However, additional simulation methods abound and are helpful in the collection and generation of data to support effective business analytics.

Simulation

“Simulation leverages a model that simulates a business situation we are trying to understand to generate outcomes data” (Bartlett, 2013, p. 199). If the data for certain population parameters is collected and understood, additional process and performance metrics can be generated or simulated (and collected) by analysts to recreate (or create) different business scenarios to increase organizational understanding or knowledge. The core simulation process is not complicated and “is typically performed in two steps: first, one samples (pseudo) random numbers; in the second step an algorithm transforms these random numbers into simulated physical events” (Otten et al., 2021).
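
The two-step process quoted above can be sketched in Python as a small Monte Carlo simulation. The business scenario is hypothetical: customer arrivals at an assumed average rate of 6 per hour over an 8-hour day, with an assumed capacity of 60 customers. Step one draws pseudo-random numbers; step two transforms them, via the inverse of the exponential distribution function, into simulated gaps between arrivals.

```python
import math
import random

def simulate_day(rate_per_hour, hours, rng):
    """Return the number of simulated customer arrivals in one working day."""
    clock, arrivals = 0.0, 0
    while True:
        u = rng.random()                           # step 1: pseudo-random number in [0, 1)
        gap = -math.log(1.0 - u) / rate_per_hour   # step 2: transform into an exponential gap (hours)
        clock += gap
        if clock > hours:
            return arrivals
        arrivals += 1

rng = random.Random(2022)

# Generate 5,000 simulated days under the assumed parameters.
days = [simulate_day(rate_per_hour=6, hours=8, rng=rng) for _ in range(5000)]

mean_arrivals = sum(days) / len(days)
# Share of simulated days on which arrivals exceed the assumed capacity of 60.
p_overload = sum(d > 60 for d in days) / len(days)

print(f"average arrivals per day: {mean_arrivals:.1f}")
print(f"share of days above 60:   {p_overload:.3f}")
```

The simulated days are new data that no historical record contains, yet they follow from parameters that were collected and understood beforehand. Changing the assumed rate or capacity lets an analyst explore scenarios that have not yet occurred.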

Big data analytics provides new simulation tools and techniques, but simulation itself is not new; nor is it widely applied. “This is a well-understood and under-applied approach,” perhaps because “it can be extremely challenging to build a relevant model” (Bartlett, 2013, p. 199). This message rings true throughout the history of statistics, notably in the words of George Box, a widely recognized contributor to the discipline. He stated, “all models are wrong, but some are useful” (Camacho et al., 2020). Simulations provide practitioners with the capability to gain knowledge and wisdom beyond historical data. This capability supports the Scriptural pursuit of knowledge and wisdom.

Christian values encourage Christ’s followers to cherish and seek out increased knowledge and wisdom, two key benefits of improved data collection, or in this case, data generation. We are instructed to “turn (our) heart(s) to know and to search out and to seek wisdom and the scheme of things” (English Standard Version Bible, 2001, Ecclesiastes 7:25). This seeking, this pursuit, leads to a positive form of change or transformation in the decision-making arena - improved discernment. “Be transformed by the renewal of your mind, that by testing you may discern what is the will of God” (English Standard Version Bible, 2001, Romans 12:2). Simulation assists Christian leaders in this endeavor and improves data collection by enabling the introduction of new, mathematically generated data to support advanced predictive and prescriptive analytics.

Conclusion

Inappropriate (or absent) data collection represents a critical risk for any organization. This risk may be mitigated in two ways. The first is technical and involves the replication of best data collection practices including DoS, DoE, and simulation. The second is foundational, and for Christian leaders, more important. Christian values are cited throughout the Old and New Testaments and provide instruction regarding the fallacy of reliance on opinion (and bias), the value in understanding root causes, and the importance of increased understanding and knowledge. The combination of these two risk mitigation solution sets provides a distinct advantage for Christian leaders and affords these individuals the opportunity to value and refine the source(s) of their information and the techniques they utilize to harvest its value.

References

Ahmed, M., & Pathan, A. (2019). Data analytics. CRC Press. https://doi.org/10.1201/9780429446177

Bartlett, R. (2013). A practitioner's guide to business analytics: using data analysis tools to improve your organization’s decision making and strategy. McGraw-Hill.

Camacho, J., Smilde, A. K., Saccenti, E., & Westerhuis, J. A. (2020). All sparse PCA models are wrong, but some are useful. Part I: Computation of scores, residuals and explained variance. Chemometrics and Intelligent Laboratory Systems, 196, 103907. https://doi.org/10.1016/j.chemolab.2019.103907

Doshi, J. A., & Desai, D. A. (2019). Measurement system analysis for continuous quality improvement in automobile SMEs: multiple case study. Total Quality Management & Business Excellence, 30(5-6), 626-640. https://doi.org/10.1080/14783363.2017.1324289

English Standard Version Bible. (2001). ESV Online. https://esv.literalword.com/

Mehdizadeh, A., Cai, M., Hu, Q., Alamdar Yazdi, M. A., Mohabbati-Kalejahi, N., Vinel, A., Rigdon, S. E., Davis, K. C., & Megahed, F. M. (2020). A review of data analytic applications in road traffic safety. Part 1: descriptive and predictive modeling. Sensors (Basel, Switzerland), 20(4), 1107. https://doi.org/10.3390/s20041107

Otten, S., Caron, S., de Swart, W., van Beekveld, M., Hendriks, L., van Leeuwen, C., Podareanu, D., Ruiz de Austri, R., & Verheyen, R. (2021). Event generation and statistical sampling for physics with deep generative models and a density information buffer. Nature Communications, 12(1), 2985. https://doi.org/10.1038/s41467-021-22616-z

ROXTAR
CONSULTING
