What are the benefits of immunization of infants against measles? In particular, does immunization save lives? To answer that question you can use data on immunization rates and mortality in various countries in various years. International organizations collect such data on many countries for many years. The data is free to download, but it’s complex. How should you import, store, organize, and use the data to have all relevant information in an accessible format that lends itself to meaningful analysis? And what problems should you look for in the data, how can you identify those problems, and how should you address them?
Predicting whether people will repay their loans or default on them is important to a bank that sells such loans. Should the bank predict the default probability for applicants? Or, rather, should it classify applicants into prospective defaulters and prospective repayers? And how are the two kinds of predictions related? In particular, can the bank use probability predictions to classify applicants into defaulters and repayers, in a way that takes into account the bank’s costs when a default happens and its costs when it forgoes a good applicant?
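One way to connect the two kinds of prediction is to turn a predicted default probability into a classification by comparing it with a threshold derived from the two costs. The sketch below assumes the bank loses `cost_default` when an approved applicant defaults and `cost_forgone` when it rejects an applicant who would have repaid. The function names, cost values and probabilities are illustrative assumptions, not taken from the text.

```python
def optimal_threshold(cost_default: float, cost_forgone: float) -> float:
    """Probability above which rejecting has the lower expected loss.

    Expected loss of approving: p * cost_default
    Expected loss of rejecting: (1 - p) * cost_forgone
    Rejecting is cheaper when p > cost_forgone / (cost_forgone + cost_default).
    """
    return cost_forgone / (cost_forgone + cost_default)


def classify(probabilities, cost_default, cost_forgone):
    """Label each applicant from a predicted default probability."""
    threshold = optimal_threshold(cost_default, cost_forgone)
    return ["defaulter" if p > threshold else "repayer" for p in probabilities]


# Hypothetical costs: a default is ten times as costly as a forgone good loan.
labels = classify([0.05, 0.20], cost_default=10.0, cost_forgone=1.0)
```

With these assumed costs the threshold is 1/11, so even a 20 percent predicted default probability leads to a "defaulter" label. The more costly a default is relative to a forgone good applicant, the lower the threshold.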
The goal of the book as a whole is to ‘translate’ coin evidence for a new generation of historians. The work of Michael Crawford represented a major leap forward in the study of Roman republican coins during the twentieth century while building on the work of earlier generations. The major thematic structure of the book is summarized, and eight basic principles related to the use of coin evidence are laid out.
You want to predict the price of used cars as a function of their age and other features. You want to specify a model that includes the most important interactions and nonlinearities of those features, but you don’t know how to start. In particular, you are worried that you can’t start with a very complex regression model and use LASSO or some other method to simplify it, because there are way too many potential interactions. Is there an alternative approach to regression that includes the most important interactions without you having to specify them?
You want to understand whether and by how much online and offline prices differ. To that end you need data on the online and offline prices of the same products. How would you collect such data? In particular, how would you select the products for which to collect the data, and how could you make sure that the online and offline prices are for the same products?
You want to identify hotels in a city that are underpriced for their location and quality. You have scraped the web for data on all hotels in the city, including prices for a particular date, and many features of the hotels. How can you check whether the data you have is clean enough for further analysis? And how should you start the analysis itself?
Poor-quality measurements are likely to yield meaningless or unrepeatable findings. High-quality measurements are characterised by validity and reliability. Validity relates to whether the right quantity is measured and is assessed by comparing a metric with a gold-standard metric. Reliability relates to whether measurements are repeatable and is assessed by comparing repeated measurements. The accuracy and precision with which measurements are made affect both validity and reliability. A major source of unreliability in behavioural data comes from the involvement of human observers in the measurement process. Where trade-offs are necessary, it is better to measure the right quantity somewhat unreliably than to measure the wrong quantity very reliably. Floor and ceiling effects can make measurements useless for answering a question, even if they are valid and reliable. Outlying data points should only be removed if they can be proved to be biologically impossible or to result from errors.
Interpreting results correctly and communicating them honestly are vital parts of what scientists do. Incorrect interpretation of data often results from avoidable statistical mistakes. Common pitfalls arise from abuse of significance testing, misunderstanding of correlations and overgeneralisation of findings. Publishing peer-reviewed papers in scientific journals is the primary means by which researchers communicate their findings to other scientists. A scientific paper has an established basic format comprising title, abstract, introduction, methods, results and discussion. Open Science practices are an important part of the modern publication process. Non-technical (lay) summaries and press releases are tools for communicating behavioural research to journalists and the public. All science involves potential conflicts of interest, and their influence on scientific communication is an unresolved cause for concern. Several organisations oversee the integrity of science, but ultimately it is the personal responsibility of each individual researcher to behave with openness and integrity.
This chapter looks at cases where those subject to Roman hegemony attempted to throw off Roman control and also where the power of individuals within the state became so contested that it threatened the constitutional integrity of the republic. In the first half coin evidence is used to look at South Italian communities that sided with Hannibal during the Second Punic War, uprisings of enslaved peoples and Roman responses, and the failed attempt by Rome’s former Italian allies to set up a rival federal state. The second half examines what numismatic evidence can tell us about the autocratic ambitions of Marius, Sulla, and Pompey and ends with a close look at how Sulla’s memory was used during the period of Pompey’s ascendency.
Are larger companies better managed? To answer this question, you downloaded data from the World Management Survey. How should you describe the relationship between firm size and the quality of management? In particular, can you describe that with the help of a single number, or an informative graph?
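One candidate for the "single number" is the Pearson correlation coefficient between firm size and management score. The sketch below computes it from scratch. The variable names and the five data points are made up for illustration and are not World Management Survey values.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical firms: number of employees and a management quality score.
employees = [50, 120, 400, 1500, 5000]
management_score = [2.4, 2.7, 2.9, 3.3, 3.4]

r = pearson_correlation(employees, management_score)
```

A coefficient near 1 or -1 indicates a strong linear association and a value near 0 a weak one. It summarises only the linear part of the relationship, which is one reason a graph can be more informative than the single number.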
There is a substantial difference in the average earnings of women and men in all countries. You want to understand more about the potential origins of that difference, focusing on employees with a graduate degree in your country. You have data on a large sample of employees with a graduate degree, with their earnings and some of their characteristics, such as age and the kind of graduate degree they have. Women and men differ in those characteristics, which may affect their earnings. How should you use this data to uncover gender differences that are not due to differences in those other characteristics? And can you use regression analysis to uncover patterns of associations between earnings and those other characteristics that may help understand the origins of gender differences in earnings?
Public trust in science depends on scientists behaving legally and ethically. Ethical science is also often better science. To be ethical, research must be of sufficient quality to further scientific understanding and its potential benefits should outweigh the risks of harm to subjects or other stakeholders. All research must also be lawful. Conducting a harm–benefit analysis is central to ensuring that ethical standards are maintained in research and is required for the majority of behavioural studies. Formal ethical approval must be obtained before starting to collect data. Research on animals should minimise animal suffering by following the 3Rs principles of replacement, reduction and refinement. Humane end points should be used to limit unnecessary suffering. Research on humans should respect the autonomy and rights of participants and will generally require informed consent, the right to withdraw and debriefing. Deception is potentially harmful and should only be used following careful consideration.
You need to predict rental prices of apartments using various features. You don’t know how the various features may interact with each other in determining price, so you would like to use a regression tree. But you want to build a model that gives the best possible prediction, better than a single tree. What methods are available that keep the advantage of regression trees but give a better prediction? How should you choose from those methods?
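One family of such methods averages many trees, each fitted on a bootstrap resample of the data (bagging). The sketch below is a deliberately simplified stand-in: each "tree" is a one-split stump on a single feature, and the apartment sizes and rents are made-up values. It illustrates the averaging idea only and is not the specific method the chapter goes on to describe.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising the squared error."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < split]
        right = [y for x, y in zip(xs, ys) if x >= split]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    if best is None:
        # All x values identical in this sample: predict the overall mean.
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    split, left_mean, right_mean = stump
    return left_mean if x < split else right_mean


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Fit n_trees stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_bagged(trees, x):
    """Average the predictions of all stumps."""
    return mean(predict_stump(tree, x) for tree in trees)


# Hypothetical data: apartment size in square metres and monthly rent.
size = [30, 40, 50, 60, 70, 80]
rent = [500, 550, 700, 750, 900, 950]
trees = fit_bagged(size, rent)
```

Calling `predict_bagged(trees, 55)` returns an average over stumps that split at different points, so the ensemble prediction varies more smoothly with size than any single stump does. Choosing between ensemble methods is usually done by comparing prediction error on data not used for fitting.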