Abstract
Public awareness of, and concern about, companies’ social and environmental impacts has increased markedly over recent decades. In parallel, the quantity of relevant information has grown, as states pass laws requiring certain forms of reporting, researchers investigate companies’ performance, and companies themselves seek to gain a competitive advantage by being seen to operate fairly and transparently. However, this information is typically dispersed and non-standardized, making it difficult to collect and analyze. To address this challenge, the WikiRate platform aims to collect this information and store it in a standardized format within a centralized public repository, making it much more amenable to analysis. In the context of WikiRate, this article introduces easIE, an easy-to-use information extraction (IE) framework that leverages general Web IE principles for building datasets of environmental, social, and governance information from the Web. To demonstrate the flexibility and value of easIE, we built a large-scale corporate social responsibility database comprising 654,491 metrics related to 49,009 companies, spending less than 16 hours on data engineering, collection, and indexing. Finally, a data collection exercise involving 12 subjects was performed to showcase the ease of use of the developed framework.
Index Terms
- easIE: Easy-to-Use Information Extraction for Constructing CSR Databases From the Web