Abstract
1. Introduction
2. Machine learning approach
3. Industry codes
4. Data
5. Results
6. Supplementary analysis: Harvard Business School case
7. Discussion
Appendix
References
Abstract
The main aim and contribution of this study is to outline and demonstrate the usefulness of a machine learning approach to address prediction-based research problems in accounting, and to contrast this approach with the more conventional explanation-based approach familiar to most accounting scholars. To illustrate the approach, the study applies machine learning to predict a firm's industry sector using the firm's publicly available financial statement data. The results show that an algorithm can predict an industry sector from this data alone with a high degree of accuracy, especially if a non-linear classifier is used instead of a linear classifier. Additionally, the algorithms completed an industry-firm pairing exercise taken from introductory accounting textbooks and MBA cases with a high degree of accuracy. The study shows how machine learning approaches and algorithms can be valuable in a range of accounting domains where prediction, rather than explanation, of the dependent variable is the main concern.
Introduction
The main aim and contribution of this study is to outline and demonstrate the usefulness of a machine learning approach to address specific research problems in accounting, and to contrast this approach with the more conventional explanation-based approach familiar to most accounting scholars. To illustrate the approach, the study sets out to predict a firm’s industry sector, as specified by the North American Industry Classification System (NAICS), using the firm’s publicly available financial statement data. The results show that an algorithm can predict an industry sector from this data alone with a high degree of accuracy, especially if a non-linear classifier is used instead of a linear classifier.
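Why a non-linear classifier can outperform a linear one is easiest to see on a toy example. The sketch below contrasts a nearest-centroid classifier, whose decision boundaries are straight lines, with a 3-nearest-neighbours classifier, whose boundaries can bend. The two features, the sector labels "A" and "B", and all values are hypothetical; neither classifier is claimed to be one of the algorithms used in the study.

```python
import math
from collections import Counter

# Hypothetical two-feature data (e.g. two standardised financial ratios).
# Sector "A" firms sit at both extremes and sector "B" firms sit in between,
# so no single straight line separates the two sectors.
TRAIN = [
    ((0.0, 0.0), "A"), ((0.0, 0.5), "A"), ((0.5, 0.0), "A"),
    ((5.0, 5.0), "A"), ((5.0, 5.5), "A"), ((5.5, 5.0), "A"),
    ((2.0, 2.0), "B"), ((2.0, 2.5), "B"), ((2.5, 2.0), "B"),
]
TEST = [
    ((0.2, 0.2), "A"), ((5.2, 5.2), "A"),
    ((2.2, 2.2), "B"), ((2.3, 2.0), "B"),
]


def centroids(train):
    """Mean feature vector for each sector."""
    grouped = {}
    for x, label in train:
        grouped.setdefault(label, []).append(x)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*points))
        for label, points in grouped.items()
    }


def predict_centroid(cents, x):
    """Linear classifier: assign x to the sector with the nearest centroid."""
    return min(cents, key=lambda label: math.dist(cents[label], x))


def predict_knn(train, x, k=3):
    """Non-linear classifier: majority vote among the k nearest training firms."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, test):
    """Share of test firms whose sector is predicted correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


cents = centroids(TRAIN)
centroid_acc = accuracy(lambda x: predict_centroid(cents, x), TEST)
knn_acc = accuracy(lambda x: predict_knn(TRAIN, x), TEST)
print(f"nearest centroid (linear): {centroid_acc:.2f}")  # 0.75
print(f"3-NN (non-linear):         {knn_acc:.2f}")       # 1.00
```

The nearest-centroid model misclassifies the low-valued sector "A" firm because the average of the two "A" clusters lies away from both of them; the neighbour-based model follows the local structure instead.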
The main difference between a machine learning approach and a conventional approach is that a machine learning approach is prediction-orientated whereas the conventional approach is explanation-orientated. In other words, a machine learning approach focuses primarily on the out-of-sample prediction of the dependent variable rather than on the explanation of the dependent variable within-sample (Bao, Ke, Li, Yu, & Zhang, 2020). Prediction is not necessarily the same as explanation (Shmueli, 2010), and the machine learning approach is of value in a range of applications where prediction of a dependent variable is the main, and perhaps only, concern. Such applications are common in business and economics research (Kleinberg, Ludwig, Mullainathan, & Obermeyer, 2015). Success in prediction-orientated approaches is measured by out-of-sample prediction accuracy rather than by within-sample significance levels (p-values), and the theoretical specification of the conceptual model is, to a degree, determined by the algorithm rather than a priori by the researcher.
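The gap between within-sample fit and out-of-sample prediction can be illustrated with a deliberately over-flexible model. In this hypothetical sketch, a 1-nearest-neighbour classifier memorises its training firms, so its within-sample accuracy is perfect by construction, while accuracy on held-out firms shows how well it actually predicts. The data, the labels and the helper names are invented for illustration.

```python
# Hypothetical one-feature data: each firm is (feature value, sector label).
# The firms at 3 and 4 carry "noisy" labels that a flexible model will memorise.
TRAIN = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "B"), (6, "B")]
HOLDOUT = [(1.4, "A"), (2.8, "A"), (4.2, "B"), (5.6, "B")]


def predict_1nn(train, x):
    """Copy the sector label of the single closest training firm."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def accuracy_1nn(train, data):
    """Share of firms in `data` whose sector the 1-NN model predicts correctly."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)


within_sample = accuracy_1nn(TRAIN, TRAIN)     # each firm is its own nearest neighbour
out_of_sample = accuracy_1nn(TRAIN, HOLDOUT)   # firms the model has never seen
print(f"within-sample accuracy: {within_sample:.2f}")  # 1.00
print(f"out-of-sample accuracy: {out_of_sample:.2f}")  # 0.50
```

A prediction-orientated evaluation would report the second number, because only it reflects performance on firms the model was not fitted to.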
Results
The median values of the features for the target NAICS codes are displayed in Table 3. These descriptives are of interest because they provide insight into the potential predictive value of each feature. For example, some features are zero or near-zero for most firms: A6 Investments, L3 Unearned Revenue, and E2 Preferred Stock. These features are therefore unlikely to provide much predictive value in separating firms into industry sectors.
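A Table 3-style summary and a rough screen for near-zero features might be computed as in the sketch below. The firm records, the sector codes, the scaling by total assets and the `NEAR_ZERO` cut-off are all assumptions made for illustration; only the feature names are taken from the text.

```python
from statistics import median

FEATURES = ["A6_Investments", "L3_UnearnedRevenue",
            "E2_PreferredStock", "L5_TotalLongTermDebt"]

# Hypothetical firms: (sector code, feature values assumed scaled by total assets).
# These numbers are invented for illustration and are not the study's data.
FIRMS = [
    ("22", (0.00, 0.00, 0.00, 0.41)),
    ("22", (0.01, 0.00, 0.00, 0.38)),
    ("22", (0.00, 0.00, 0.02, 0.45)),
    ("44", (0.00, 0.01, 0.00, 0.12)),
    ("44", (0.00, 0.00, 0.00, 0.20)),
    ("44", (0.02, 0.00, 0.00, 0.15)),
]
NEAR_ZERO = 0.005  # assumed cut-off for a "zero or near-zero" median


def medians_by_sector(firms):
    """Median of each feature within each sector."""
    by_sector = {}
    for code, values in firms:
        by_sector.setdefault(code, []).append(values)
    return {code: [median(col) for col in zip(*rows)]
            for code, rows in by_sector.items()}


def near_zero_features(table, cutoff=NEAR_ZERO):
    """Features whose median is below the cut-off in every sector."""
    return [name for i, name in enumerate(FEATURES)
            if all(abs(row[i]) < cutoff for row in table.values())]


table = medians_by_sector(FIRMS)
for code, row in table.items():
    print(code, dict(zip(FEATURES, row)))
print("near-zero in every sector:", near_zero_features(table))
```

This is a heuristic rather than a formal test: a feature with a near-zero median can still be informative if a small group of firms in one sector holds large values.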
Correlations between the features are not tabulated for brevity, but it is apparent that some features are, virtually by definition, strongly correlated. For example, L5 Total Long-term Debt and R12 Long Term Debt/Capital are strongly correlated. Given their high correlation, these features could serve as potential candidates for exclusion in future analysis.
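Highly correlated feature pairs can be flagged with a simple pairwise Pearson correlation screen, sketched below. The column values are invented, and the 0.9 threshold is an assumed cut-off rather than one reported in the study.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical feature columns for six firms (not the study's data).
COLUMNS = {
    "L5_TotalLongTermDebt": [0.10, 0.25, 0.40, 0.05, 0.30, 0.55],
    "R12_LongTermDebtToCapital": [0.15, 0.33, 0.52, 0.08, 0.41, 0.68],
    "A6_Investments": [0.02, 0.00, 0.05, 0.01, 0.00, 0.03],
}


def highly_correlated(columns, threshold=0.9):
    """Feature pairs whose absolute correlation meets the assumed threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


for a, b, r in highly_correlated(COLUMNS):
    print(f"{a} ~ {b}: r = {r}")
```

Which member of a flagged pair to drop is a separate judgement; a screen like this only identifies candidates.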
It is common in the accounting literature to measure the performance of classifiers using the Receiver Operating Characteristic (ROC) curve (see e.g. Jackson & Wood, 2013). However, this is not done here because the standard ROC curve is defined for binary classifiers and does not directly apply to the multi-class problem studied here, in which each firm is assigned to one of several industry sectors. Instead, we follow the approach common in the machine learning literature (Geron, 2019) and work with confusion matrices, precision, recall, and the F1 score.
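For a multi-class problem, precision, recall and F1 can be computed per sector by treating each sector in turn as the positive class, and then averaged. The sketch below builds a confusion matrix and these scores from hypothetical predictions; the sector codes, the eight firms and the choice of macro-averaging are assumptions for illustration and may differ from how the study aggregates its scores.

```python
from collections import Counter

LABELS = ["22", "44", "52"]  # hypothetical NAICS sector codes used as class labels
# Hypothetical true and predicted sectors for eight held-out firms.
Y_TRUE = ["22", "22", "22", "44", "44", "44", "52", "52"]
Y_PRED = ["22", "22", "44", "44", "44", "52", "52", "52"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true sectors, columns are predicted sectors."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(y_true, y_pred, labels):
    """Precision, recall and F1 for each sector, treated one-vs-rest."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


matrix = confusion_matrix(Y_TRUE, Y_PRED, LABELS)
scores = per_class_scores(Y_TRUE, Y_PRED, LABELS)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(LABELS)

for label, row in zip(LABELS, matrix):
    print(label, row)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"macro-averaged F1: {macro_f1:.2f}")
```

Macro-averaging weights every sector equally regardless of size; a frequency-weighted average is a common alternative when sectors are imbalanced.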