This self-contained guide introduces two pillars of data science, probability theory and statistics, side by side to illuminate the connections between statistical techniques and the probabilistic concepts they are based on. The topics covered in the book include random variables, nonparametric and parametric models, correlation, estimation of population parameters, hypothesis testing, principal component analysis, and both linear and nonlinear methods for regression and classification. Examples throughout the book draw from real-world datasets to demonstrate concepts in practice and confront readers with fundamental challenges in data science, such as overfitting, the curse of dimensionality, and causal inference. Code in Python reproducing these examples is available on the book's website, along with videos, slides, and solutions to exercises. This accessible book is ideal for undergraduate and graduate students, data science practitioners, and others interested in the theoretical concepts underlying data science methods.
Tensors are essential in modern-day computational and data sciences. This book explores the foundations of tensor decompositions, a data analysis methodology that is ubiquitous in machine learning, signal processing, chemometrics, neuroscience, quantum computing, financial analysis, social science, business market analysis, image processing, and much more. In this self-contained mathematical, algorithmic, and computational treatment of tensor decomposition, the book emphasizes examples using real-world downloadable open-source datasets to ground the abstract concepts. Methodologies for 3-way tensors (the simplest notation) are presented before generalizing to d-way tensors (the most general but complex notation), making the book accessible to advanced undergraduate and graduate students in mathematics, computer science, statistics, engineering, and physical and life sciences. Additionally, extensive background materials in linear algebra, optimization, probability, and statistics are included as appendices.
Recommender systems are ubiquitous in modern life and are one of the main monetization channels for Internet technology giants. This book helps graduate students, researchers and practitioners to get to grips with this cutting-edge field and build the thorough understanding and practical skills needed to progress in the area. It not only introduces the applications of deep learning and generative AI for recommendation models, but also focuses on the industry architecture of the recommender systems. The authors include a detailed discussion of the implementation solutions used by companies such as YouTube, Alibaba, Airbnb and Netflix, as well as the related machine learning framework including model serving, model training, feature storage and data stream processing.
This book introduces relevant and established data-driven modeling tools currently in use or in development, which will help readers master the art and science of constructing models from data and dive into different application areas. It presents statistical tools useful to individuate regularities, discover patterns and laws in complex datasets, and demonstrates how to apply them to devise models that help to understand these systems and predict their behaviors. By focusing on the estimation of multivariate probabilities, the book shows that the entire domain, from linear regressions to deep learning neural networks, can be formulated in probabilistic terms. This book provides the right balance between accessibility and mathematical rigor for applied data science or operations research students, graduate students in CSE, and machine learning and uncertainty quantification researchers who use statistics in their field. Background in probability theory and undergraduate mathematics is assumed.
Harnessing the power of data and AI methods to tackle complex societal challenges requires transdisciplinary collaborations across academia, industry, and government. In this compelling book, Munther A. Dahleh, founder of the MIT Institute for Data, Systems, and Society (IDSS), offers a blueprint for researchers, professionals, and institutions to create approaches to problems of high societal value using innovative, holistic, data-driven methods. Drawing on his experience at IDSS and knowledge of similar initiatives elsewhere, Dahleh describes in clear, non-technical language how statistics, data science, information and decision systems, and social and institutional behavior intersect across multiple domains. He illustrates key concepts with real-life examples from optimizing transportation to making healthcare decisions during pandemics to understanding the media's impact on elections and revolutions. Dahleh also incorporates crucial concepts such as robustness, causality, privacy, and ethics and shares key lessons learned about transdisciplinary communication and about unintended consequences of AI and algorithmic systems.
Published in collaboration with The British Universities Industrial Relations Association (BUIRA), this book critically reviews the future of Industrial Relations (IR) in a changing work landscape and traces its historical evolution. Essential for academics, students and trade unions, it explores IR's significant changes over the past decade and its ongoing influence on our lives.
It is impossible to view the news at present without hearing talk of crisis: the economy, the climate, the pandemic. This book asks how these larger societal issues lead to a crisis of work, making it ever more precarious, unequal and intense. Experts diagnose the nature of the problem and offer a programme for transcending the crises.
Offering theoretical frameworks from experts as well as practical examples to support women transitioning through menopause in the workplace, this is a go-to reference for academics and policy makers working in the field.
Introduction to Probability and Statistics for Data Science provides a solid course in the fundamental concepts, methods and theory of statistics for students in statistics, data science, biostatistics, engineering, and physical science programs. It teaches students to understand, use, and build on modern statistical techniques for complex problems. The authors develop the methods from both an intuitive and mathematical angle, illustrating with simple examples how and why the methods work. More complicated examples, many of which incorporate data and code in R, show how the methods are used in practice. Through this guidance, students get the big picture about how statistics works and can be applied. This text covers more modern topics such as regression trees, large scale hypothesis testing, bootstrapping, MCMC, time series, and fewer theoretical topics like the Cramer-Rao lower bound and the Rao-Blackwell theorem. It features more than 250 high-quality figures, 180 of which involve actual data. Data and R code are available on our website so that students can reproduce the examples and do hands-on exercises.
This guide illuminates the intricate relationship between data management, computer architecture, and system software. It traces the evolution of computing to today's data-centric focus and underscores the importance of hardware-software co-design in achieving efficient data processing systems with high throughput and low latency. The thorough coverage includes topics such as logical data formats, memory architecture, GPU programming, and the innovative use of ray tracing in computational tasks. Special emphasis is placed on minimizing data movement within memory hierarchies and optimizing data storage and retrieval. Tailored for professionals and students in computer science, this book combines theoretical foundations with practical applications, making it an indispensable resource for anyone wanting to master the synergies between data management and computing infrastructure.
Drawing examples from real-world networks, this essential book traces the methods behind network analysis and explains how network data is first gathered, then processed and interpreted. The text will equip you with a toolbox of diverse methods and data modelling approaches, allowing you to quickly start making your own calculations on a huge variety of networked systems. This book sets you up to succeed, addressing the questions of what you need to know and what to do with it, when beginning to work with network data. The hands-on approach adopted throughout means that beginners quickly become capable practitioners, guided by a wealth of interesting examples that demonstrate key concepts. Exercises using real-world data extend and deepen your understanding, and develop effective working patterns in network calculations and analysis. Suitable for both graduate students and researchers across a range of disciplines, this novel text provides a fast-track to network data expertise.
Active labour market policies aim to help people who are out of work into employment through a range of interventions including job search, training and in-work support and development. While policies and scholarship predominantly focus on jobseekers' engagement with these initiatives, this book sheds light for the first time on the employer's perspective.
The past two decades have seen an explosion both in the volume of data we use, and our understanding of its management. However, while techniques and technology for manipulating data have advanced rapidly in this time, the concepts around the value of our data have not. This lack of progress has made it increasingly difficult for organisations to understand the value in their data, the value of their data, and how to exploit that value.
Halo Data proposes a paradigm shift in methodology for organisations to properly appreciate and leverage the value of their data. Written by an author team with many years' experience in data strategy, management and technology, the book will first review the current state of our understanding of data. This opening will demonstrate the limitations of this status quo, including a discussion on metadata and its limitations, data monetisation and data-driven business models. Following this, the book will present a new concept and framework for understanding and quantifying value in an organisation's data and a practical methodology for applying it.
Ideal for data leaders and executives who are looking to leverage the data at their fingertips.
Based on the authors' extensive teaching experience, this hands-on graduate-level textbook teaches how to carry out large-scale data analytics and design machine learning solutions for big data. With a focus on fundamentals, this extensively class-tested textbook walks students through key principles and paradigms for working with large-scale data, frameworks for large-scale data analytics (Hadoop, Spark), and explains how to implement machine learning to exploit big data. It is unique in covering the principles that aspiring data scientists need to know, without detail that can overwhelm. Real-world examples, hands-on coding exercises and labs combine with exceptionally clear explanations to maximize student engagement. Well-defined learning objectives, exercises with online solutions for instructors, lecture slides, and an accompanying suite of lab exercises of increasing difficulty in Jupyter Notebooks offer a coherent and convenient teaching package. An ideal teaching resource for courses on large-scale data analytics with machine learning in computer/data science departments.
The third edition of this practical introduction to Python has been thoroughly updated, with all code migrated to Jupyter notebooks. The notebooks are available online with executable versions of all of the book's content (and more). The text starts with a detailed introduction to the basics of the Python language, without assuming any prior knowledge. Building upon each other, the most important Python packages for numerical math (NumPy), symbolic math (SymPy), and plotting (Matplotlib) are introduced, with brand new chapters covering numerical methods (SciPy) and data handling (Pandas). Further new material includes guidelines for writing efficient Python code and publishing code for other users. Simple and concise code examples, revised for compatibility with Python 3, guide the reader and support the learning process throughout the book. Readers from all of the quantitative sciences, whatever their background, will be able to quickly acquire the skills needed for using Python effectively.
Big data and algorithmic decision-making have been touted as game-changing developments in management research, but they have their limitations. Qualitative approaches should not be cast aside in the age of digitalisation, since they facilitate understanding of quantitative data and the questioning of assumptions and conclusions that may otherwise lead to faulty implications being drawn, and - crucially - inaccurate strategies, decisions and actions. This handbook comprises three parts: Part I highlights many of the issues associated with 'unthinking digitalisation', particularly concerning the overreliance on algorithmic decision-making and the consequent need for qualitative research. Part II provides examples of the various qualitative methods that can be usefully employed in researching various digital phenomena and issues. Part III introduces a range of emergent issues concerning practice, knowing, datafication, technology design and implementation, data reliance and algorithms, and digitalisation.
Data science is the foundation of our modern world. It underlies applications used by billions of people every day, providing new tools, forms of entertainment, economic growth, and potential solutions to difficult, complex problems. These opportunities come with significant societal consequences, raising fundamental questions about issues such as data quality, fairness, privacy, and causation. In this book, four leading experts convey the excitement and promise of data science and examine the major challenges in gaining its benefits and mitigating its harms. They offer frameworks for critically evaluating the ingredients and the ethical considerations needed to apply data science productively, illustrated by extensive application examples. The authors' far-ranging exploration of these complex issues will stimulate data science practitioners and students, as well as humanists, social scientists, scientists, and policy makers, to study and debate how data science can be used more effectively and more ethically to better our world.
Throughout the COVID-19 pandemic, flexible working has become the norm for many workers. This volume offers an original examination of flexible working using data from 30 European countries and drawing on studies conducted in Australia, the US and India. Rather than providing a better work-life balance, the book reveals how flexible working can lead to exploitation, which manifests differently for women and men, such as more care responsibilities or increased working hours.
As the world becomes increasingly connected, it is also more exposed to a myriad of cyber threats. We need to use multiple types of tools and techniques to learn and understand the evolving threat landscape. Data is a common thread linking various types of devices and end users. Analyzing data across different segments of cybersecurity domains, particularly data generated during cyber-attacks, can help us understand threats better, prevent future cyber-attacks, and provide insights into the evolving cyber threat landscape. This book takes a data-oriented approach to studying cyber threats. It shows in depth how traditional methods such as anomaly detection can be extended using data analytics, and it applies data analytics to non-traditional views of cybersecurity, such as multi-domain analysis, time series and spatial data analysis, and human-centered cybersecurity.
Healthcare has recently seen numerous exciting applications of artificial intelligence, industrial engineering, and operations research. This book, designed to be accessible to a diverse audience, provides an overview of interdisciplinary research partnerships that leverage AI, IE, and OR to tackle societal and operational problems in healthcare. The topics are drawn from a wide variety of disciplines, ranging from optimizing the location of AEDs for cardiac arrests to data mining for facilitating patient flow through a hospital. These applications highlight how engineering has contributed to medical knowledge, health system operations, and behavioral health. Chapter authors include medical doctors, policy-makers, social scientists, and engineers. Each chapter begins with a summary of the health care problem and engineering method. In these examples, researchers in public health, medicine, and social science as well as engineers will find a path to start interdisciplinary collaborations in health applications of AI/IE/OR.