Abstract
Data envelopment analysis (DEA) is a technique for identifying the best practices of a given set of decision-making units (DMUs) whose performance is characterized by multiple performance metrics classified as inputs and outputs. Although DEA is regarded as non-parametric, the sample size can be of great importance in determining the efficiency scores of the evaluated units: empirically, using too many inputs and outputs relative to the number of DMUs may result in a significant number of DMUs being rated as efficient. In the DEA literature, empirical rules that relate the number of variables to the number of observations have been established to avoid too many DMUs being rated as efficient. When the number of DMUs falls below these empirical thresholds, the discriminatory power among the DMUs may weaken, making the data set unsuitable for traditional DEA models. In the literature, this lack of discrimination is often referred to as the “curse of dimensionality”. To overcome this drawback, we provide a simple approach to increase the discriminatory power between efficient and inefficient DMUs using the well-known pure DEA model, which considers either inputs only or outputs only. Three real cases, namely printed circuit boards, Greek banks, and quality of life in Fortune’s best cities, are discussed to illustrate the proposed approach.
1. Introduction
Data envelopment analysis (DEA) is an excellent management science tool that measures the relative performance of a set of entities or decision-making units (DMUs) with multiple performance measures classified as inputs and outputs. Nevertheless, problems of discrimination between efficient and inefficient DMUs often arise when the number of performance measures (variables) is relatively large compared to the number of DMUs; this may lead to efficient units being incorrectly classified as inefficient and inefficient units being misclassified as efficient. As Adler and Yazhemsky (2010, p. 283) showed, “the latter occurs particularly frequently with small data sets under the assumption of variable returns-to-scale”. In the literature, the lack of discrimination is often referred to as the “curse of dimensionality” (e.g., Adler & Golany, 2007; Daraio & Simar, 2007). The lack of discriminating power has important implications, as in practice it can limit the managerial insights that can be drawn (Ghasemi, Ignatius, & Rezaee, 2019). In this respect, regarding the number of DMUs (sample size), there are clear advantages to having larger data sets, since for a given number of DMUs the efficiency score of each DMU can depend heavily on the number of variables (inputs and outputs) (Cinca & Molinero, 2004); as such, the greater the number of variables, the less discerning the DEA analysis becomes (Jenkins & Anderson, 2003). Nevertheless, the literature offers some empirical rules relating the number of DMUs to the number of inputs and outputs.
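For concreteness, one commonly cited rule of thumb from the wider DEA literature (quoted here only as an illustrative assumption, not as part of this paper’s own development) requires the number of DMUs $n$, for $m$ inputs and $s$ outputs, to satisfy
\[
n \;\ge\; \max\{\, m \times s,\; 3(m+s) \,\},
\]
so that, for instance, a data set with $m = 4$ inputs and $s = 3$ outputs would call for at least $\max\{12, 21\} = 21$ DMUs.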