You want to identify hotels in a city that are good deals: underpriced for their location and quality. You have scraped the web for data on all hotels in the city, and you have cleaned the data. Your exploratory data analysis revealed that hotels closer to the city center tend to be more expensive, but there is a lot of variation in prices between hotels at the same distance. How should you identify hotels that are underpriced relative to their distance to the city center? In particular, how should you capture the average price–distance relationship that would provide a benchmark against which you can compare actual prices to find good deals?
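One simple way to build such a benchmark, sketched here under assumptions, is a straight-line regression of price on distance: the fitted line gives the average price at each distance, and the residual (actual price minus predicted price) flags hotels priced below that average. The hotel records and the `name`, `distance_km`, and `price` field names are hypothetical, and a linear fit may miss nonlinear patterns that a more flexible specification would capture.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rank_deals(hotels):
    """Sort hotel names from most to least underpriced relative to the fitted line.

    Returns the sorted names and a dict of residuals (actual minus predicted price).
    """
    a, b = fit_line([h["distance_km"] for h in hotels], [h["price"] for h in hotels])
    residuals = {h["name"]: h["price"] - (a + b * h["distance_km"]) for h in hotels}
    return sorted(residuals, key=residuals.get), residuals
```

Hotels at the top of the returned list have the most negative residuals, so they are the candidate good deals given their distance alone.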
Your task is to help the admissions staff at a university design the online advertising for a graduate program. They have several competing ideas about how the online ad should look. How would you design an experiment that could tell which idea would work best? In particular, how many subjects would you need, and how would you assign them into groups, reach the subjects, and measure the outcome? Once you have the data, how would you examine data quality and estimate the effect of showing one version versus another, and how would you use these results to answer the original question?
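For the "how many subjects" and "how to assign them" parts, a common back-of-the-envelope tool is the normal-approximation sample size formula for comparing two proportions (for example, the click-through rates of two ad versions), combined with random assignment. The sketch assumes a two-sided test; the rates in the usage line are hypothetical, and comparing several ad versions at once would call for adjustments not shown here.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Subjects needed per group to detect rate p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


def assign_groups(subject_ids, n_groups, seed=42):
    """Randomly assign subjects to n_groups of (nearly) equal size."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return {sid: i % n_groups for i, sid in enumerate(ids)}


# Hypothetical: baseline click rate 5%, hoping to detect an increase to 6%.
n_needed = sample_size_per_group(0.05, 0.06)
```

Under these assumptions the formula asks for a little over 8,000 subjects per group; smaller expected differences require many more.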
You want to know how the industrial production of your country is affected by changes in the import demand of your country’s largest trading partner. You have time series data on the industrial production of your country and total imports of its trading partner. How should you estimate this effect? Is there a way to get a reasonably precise effect estimate when your time series is not very long? In particular, can you use similar time series from similar countries to get a more precise estimate of the effect for your country?
Does smoking make you sick? And can smoking make you sick in late middle age even if you stopped years earlier? You have data on many healthy people in their fifties from various countries, and you know whether they stayed healthy four years later. You have variables on their smoking habits, their age, income, and many other characteristics. How can you use these data to estimate how much more likely non-smokers are to stay healthy? How can you uncover whether that depends on whether they never smoked or are former smokers? And how can you tell whether that association is the result of smoking itself or, instead, of underlying differences in smoking by education, income, and other factors?
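As a first descriptive step, the share of people who stayed healthy can be compared across smoking-status groups; differences in these shares measure how much more likely one group is to stay healthy than another. The sketch assumes hypothetical records with a `status` field (`"never"`, `"former"`, `"current"`) and a boolean `healthy_later`. These raw differences do not adjust for education, income, or other confounders; that would require a regression that conditions on them.

```python
from collections import defaultdict


def healthy_share_by_status(records):
    """Share of people still healthy at follow-up, by smoking status."""
    counts = defaultdict(lambda: [0, 0])  # status -> [number healthy, total]
    for r in records:
        counts[r["status"]][0] += int(r["healthy_later"])
        counts[r["status"]][1] += 1
    return {status: healthy / total for status, (healthy, total) in counts.items()}
```

Subtracting the share for current smokers from the shares for never and former smokers gives the unadjusted differences in the probability of staying healthy.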
Your task is to predict the daily number of tickets sold next year at a swimming pool in a large city. The swimming pool sells tickets through a sales terminal that records all transactions. You aggregate these data to daily frequency. How should you use the information on daily sales to produce your forecast? In particular, how should you model trend, and how should you model seasonality by months of the year and days of the week to produce the best prediction?
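A common way to capture trend and seasonality in a regression is a linear time trend plus dummy variables for months and for days of the week, each omitting one reference category. The sketch below only builds such regressors for a single date; the start date, the choice of January and Monday as reference categories, and the feature names are assumptions, and fitting and comparing alternative specifications is a separate step.

```python
from datetime import date

REFERENCE_MONTH = 1    # January: omitted month dummy (assumption)
REFERENCE_WEEKDAY = 0  # Monday (date.weekday() == 0): omitted day dummy (assumption)


def regressors(day, start):
    """Linear trend (days since start) plus month and day-of-week dummies."""
    row = {"trend": (day - start).days}
    for m in range(1, 13):
        if m != REFERENCE_MONTH:
            row[f"month_{m}"] = int(day.month == m)
    for w in range(7):
        if w != REFERENCE_WEEKDAY:
            row[f"weekday_{w}"] = int(day.weekday() == w)
    return row


# Hypothetical usage: regressors for one day, counting the trend from the first day of data.
row = regressors(date(2024, 7, 6), date(2023, 1, 1))
```

Each daily observation yields one such row; a regression of ticket counts on these rows estimates the trend and the seasonal effects relative to the reference month and weekday.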
How likely is it that you will experience a large loss on your investment portfolio of company stocks? To answer this, you have collected data on past returns of your portfolio and calculated the frequency of large losses. Based on this frequency, how can you tell what likelihood to expect in the coming calendar year? And can you quantify the uncertainty about that expectation in a meaningful way?
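The frequency of large losses is a sample proportion, so its uncertainty can be summarized with a standard error or with a bootstrap confidence interval. The sketch assumes a hypothetical large-loss threshold of -5 percent per period. Its simple bootstrap resamples periods independently and so ignores any dependence over time, and carrying the estimate over to the coming year assumes the future resembles the past.

```python
import math
import random


def loss_frequency(returns, threshold=-0.05):
    """Share of periods with a return at or below the (assumed) large-loss threshold."""
    return sum(r <= threshold for r in returns) / len(returns)


def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n observations."""
    return math.sqrt(p * (1 - p) / n)


def bootstrap_ci(returns, threshold=-0.05, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the large-loss frequency."""
    rng = random.Random(seed)
    n = len(returns)
    stats = sorted(
        loss_frequency(rng.choices(returns, k=n), threshold) for _ in range(reps)
    )
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return stats[lo_idx], stats[hi_idx]
```

With short return histories the interval is wide, which is itself useful information about how much to trust the point estimate.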
You want to uncover the effect of flexible work hours on employee retention: whether giving employees more freedom to choose their work hours makes them more likely to stay with their employer. You have observational data on firms for two years, and some firms introduced flexible work hours between those two years. How can you use these data to estimate the effect you are after?
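With two years of firm data, one standard approach is a difference-in-differences comparison: the average change in retention among firms that introduced flexible hours minus the average change among firms that did not. The field names below are hypothetical, and the result can be read as an effect only under the assumption that, without the policy, retention would have changed in parallel in the two groups.

```python
def diff_in_diff(firms):
    """Difference-in-differences estimate from two-period firm records.

    Each record (hypothetical keys): 'treated' (introduced flexible hours),
    'retention_before' and 'retention_after' (share of employees retained).
    """
    def mean_change(group):
        changes = [f["retention_after"] - f["retention_before"] for f in group]
        return sum(changes) / len(changes)

    treated = [f for f in firms if f["treated"]]
    control = [f for f in firms if not f["treated"]]
    return mean_change(treated) - mean_change(control)
```

Differencing within firms removes time-invariant firm characteristics, and subtracting the control group's change removes common shocks between the two years.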
You want to find out if the earnings of women and men tend to be different in your country in the occupation you are considering: market analysts. Analyzing data from a random sample of market analysts in the USA, you find that women earn less, by 11 percent on average. How much gender difference can you expect among all market analysts in the USA? In particular, is there a difference in the population, or is it just a chance result in your data? And can you generalize these results to the future or to other countries?
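Whether a gap like this reflects the population or chance can be judged with a standard error and a confidence interval for the difference in mean log earnings between women and men (a log difference of about -0.11 corresponds roughly to 11 percent lower earnings). The sketch uses unequal-variance standard errors and a normal-approximation 95 percent interval (z = 1.96), which suits large samples; the inputs are hypothetical, and the interval speaks only to the population the sample was drawn from, not to other countries or future years.

```python
import math


def mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def gap_with_ci(log_earn_women, log_earn_men, z=1.96):
    """Women-minus-men difference in mean log earnings, its SE, and an approximate CI."""
    mw, vw = mean_var(log_earn_women)
    mm, vm = mean_var(log_earn_men)
    diff = mw - mm
    se = math.sqrt(vw / len(log_earn_women) + vm / len(log_earn_men))
    return diff, se, (diff - z * se, diff + z * se)
```

If the interval excludes zero, a gap of that sign is unlikely to be a mere chance result of sampling.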
Life expectancy at birth shows how long residents of a country live; it is a summary measure of their health. Residents of richer countries tend to live longer, but you want to know the strength of that pattern. You also want to identify countries where people live especially long for the income level of their country, to start thinking about what may cause their exceptional health. You download cross-country data on life expectancy and GDP per capita, and you want to uncover the pattern of association between them. How would you do that in a way that accommodates potentially nonlinear patterns and, at the same time, produces results that you can interpret?
You are considering investing in a company stock, and you want to know how risky that investment is. In finance, a relevant measure of risk relates returns on a company stock to market returns: a company stock is considered risky if it tends to move in the direction of the market, and the more it moves in that direction, the riskier it is. You have downloaded data on daily stock prices for many years. How should you define returns? How should you assess whether and to what extent returns on the company stock move together with market returns?
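Two common building blocks here are simple returns computed from consecutive prices and the slope of a regression of stock returns on market returns, known in finance as beta; a beta above 1 means the stock tends to move more than one-for-one with the market. The sketch assumes aligned daily price series for the stock and a market index (both hypothetical); log returns, ln(p_t / p_{t-1}), are a common alternative definition.

```python
def simple_returns(prices):
    """Period-to-period percentage changes: (p_t - p_{t-1}) / p_{t-1}."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def beta(stock_returns, market_returns):
    """OLS slope of stock returns on market returns: cov(stock, market) / var(market)."""
    n = len(market_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var
```

Both series must cover the same dates; in practice that means aligning the stock and index prices by trading day before computing returns.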
You want to know whether online and offline prices differ in your country for products that are sold in both ways. You have access to data on a sample of products with their online and offline prices. How would you use these data to establish whether prices tend to differ or are the same for all products?
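Because each product has both prices, one natural sketch works with the paired differences: their mean and its standard error show whether prices tend to differ on average, and the share of zero differences shows how often the two prices are identical. The input lists are hypothetical and assumed to be aligned product by product.

```python
import math


def compare_prices(online, offline):
    """Summarize paired online-minus-offline price differences."""
    diffs = [a - b for a, b in zip(online, offline)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var / n)
    return {
        "mean_diff": mean,                                 # average price gap
        "se": se,                                          # standard error of the mean gap
        "t_stat": mean / se if se > 0 else float("nan"),   # test of zero mean gap
        "share_equal": sum(d == 0 for d in diffs) / n,     # share of identical prices
    }
```

A mean difference close to zero can hide many products with offsetting differences, which is why the share of identical prices is reported alongside it.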
You work for a company that wants to quantify the benefits of its online advertising: how many people buy its product because they see an ad posted online. How can you translate this question into something you can uncover using actual data? What kind of data do you need to get a good answer to this question? What would be the most important issues to consider with that data?