Using an easily implemented methodology for detecting potential data errors, I identify and correct cases where Compustat miscodes its auditor variable. In this paper, I present the methodology and provide SAS code that implements it, enabling researchers to easily identify and correct auditor variable miscodings. Further, I provide a list of corrections for a sample of Compustat firms from 2001 to 2014. Auditor variable miscodings have implications for both audit-specific research and general capital markets research. I find that some of the miscodings arise because, following an auditor change, the previous auditor's report remains in a firm's 10-K, and Compustat occasionally codes the previous auditor as the current auditor. Aside from identifying and correcting miscodings, I also find that some firms change to a new auditor and then, after only one year with the new auditor, switch back to the prior auditor.
Compustat serves as a major tool in performing archival research in accounting and finance. While most research uses Compustat as is, recent research identifies limitations of the Compustat data (e.g., Boritz & No, 2013; Casey, Gao, Kirschenheiter, Li, & Pandit, 2016; Chychyla & Kogan, 2014; Heitzman & Lester, 2018; Keil, 2017; Mills, Newberry, & Novack, 2003). For example, Mills et al. (2003) note that Compustat sometimes miscodes net operating loss carryforwards (NOLs) as zero or missing when a disclosed value exists. Casey et al. (2016) establish an overall process for filling in missing Compustat values with an appropriate value, calculated from other information, or with zeros when appropriate. While Casey et al. (2016) note that Compustat goes through an extensive data validation process, this existing research shows that miscoding occasionally occurs in the Compustat data. The potential for miscoding is likely highest for information that only appears in footnotes rather than on the face of the financial statements (e.g., the NOL disclosures identified by Mills et al., 2003).
The auditor variable (Compustat variable AU) is one such variable subject to potential miscoding in Compustat. The auditor variable is important in a wide range of capital markets research, but especially in audit research focused on the Big N/non-Big N distinction (e.g., DeFond, Erkens, & Zhang, 2017; Lawrence, Minutti-Meza, & Zhang, 2011), industry specialization (e.g., Gaver & Utke, 2018; Minutti-Meza, 2013), auditor tenure (e.g., Gul, Fung, & Jaggi, 2009; Myers, Myers, & Omer, 2003), auditor changes (e.g., DeFond & Subramanyam, 1998), and related areas.
While some recent audit studies rely on Audit Analytics instead of Compustat, Compustat remains heavily used in both general capital markets research and audit research. For example, of the audit studies cited above, only DeFond et al. (2017) use Audit Analytics as their main data source. Further, Audit Analytics only covers years after 2000, making Compustat necessary for studies of earlier time periods, which continue to be conducted (e.g., Choi, Kim, & Raman, 2017; Jiang, Wang, & Wang, 2018; Kraft, Vashishtha, & Venkatachalam, 2018).