A concise introduction to the emerging field of data science, explaining its evolution, relation to machine learning, current uses, data infrastructure issues, and ethical challenges.

The goal of data science is to improve decision making through the analysis of data. Today data science determines the ads we see online, the books and movies that are recommended to us online, which emails are filtered into our spam folders, and even how much we pay for health insurance. This volume in the MIT Press Essential Knowledge series offers a concise introduction to the emerging field of data science, explaining its evolution, current uses, data infrastructure issues, and ethical challenges.

It has never been easier for organizations to gather, store, and process data. Use of data science is driven by the rise of big data and social media, the development of high-performance computing, and the emergence of such powerful methods for data analysis and modeling as deep learning. Data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting non-obvious and useful patterns from large datasets. It is closely related to the fields of data mining and machine learning, but broader in scope.

This book offers a brief history of the field, introduces fundamental data concepts, and describes the stages in a data science project. It considers data infrastructure and the challenges posed by integrating data from multiple sources, introduces the basics of machine learning, and discusses how to link machine learning expertise with real-world problems. The book also reviews ethical and legal issues, developments in data regulation, and computational approaches to preserving privacy. Finally, it considers the future impact of data science and offers principles for success in data science projects.