What are the benefits of immunizing infants against measles? In particular, does immunization save lives? To answer that question, you can use data on immunization rates and mortality across countries and years. International organizations collect such data on many countries for many years. The data is free to download, but it's complex. How should you import, store, organize, and use the data so that all relevant information is in an accessible format that lends itself to meaningful analysis? And what problems should you look for in the data, how can you identify those problems, and how should you address them?
Predicting whether people will repay their loans or default on them is important to a bank that makes such loans. Should the bank predict the default probability for applicants? Or should it instead classify applicants into prospective defaulters and prospective repayers? And how are the two kinds of predictions related? In particular, can the bank use probability predictions to classify applicants into defaulters and repayers in a way that takes into account the bank's costs when a default happens and its costs when it forgoes a good applicant?
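One standard way to link probability predictions to classification is an expected-loss argument, sketched here in Python as an illustration, not necessarily the approach the case study takes. Assume the bank loses `cost_default` when an approved applicant defaults and `cost_forgone` when it rejects an applicant who would have repaid; both names, and the example costs, are hypothetical. Rejecting an applicant with predicted default probability p has expected cost (1 − p) × cost_forgone, while approving has expected cost p × cost_default. Classifying the applicant as a defaulter is therefore cheaper whenever p exceeds cost_forgone / (cost_forgone + cost_default).

```python
from typing import List, Sequence


def loss_minimizing_threshold(cost_default: float, cost_forgone: float) -> float:
    """Return the default-probability cutoff that minimizes expected loss.

    Hypothetical cost inputs, in the same currency unit:
      cost_default: loss when an approved applicant defaults (missed defaulter)
      cost_forgone: loss when a would-be repayer is rejected (forgone good applicant)
    """
    if cost_default < 0 or cost_forgone < 0 or cost_default + cost_forgone == 0:
        raise ValueError("costs must be non-negative and not both zero")
    # Reject when p * cost_default > (1 - p) * cost_forgone,
    # which rearranges to p > cost_forgone / (cost_forgone + cost_default).
    return cost_forgone / (cost_forgone + cost_default)


def classify_applicants(
    default_probs: Sequence[float], cost_default: float, cost_forgone: float
) -> List[str]:
    """Turn predicted default probabilities into class labels.

    Ties (p exactly at the cutoff) have equal expected loss; here they go to "repayer".
    """
    threshold = loss_minimizing_threshold(cost_default, cost_forgone)
    return ["defaulter" if p > threshold else "repayer" for p in default_probs]


if __name__ == "__main__":
    # Illustrative numbers only: a default costs four times as much as a forgone applicant.
    print(loss_minimizing_threshold(cost_default=4.0, cost_forgone=1.0))
    print(classify_applicants([0.10, 0.25, 0.60], cost_default=4.0, cost_forgone=1.0))
```

With these illustrative costs the cutoff is 0.2, well below 0.5: because defaults are expensive relative to forgone applicants, the bank would reject some applicants who are more likely to repay than to default. Equal costs give the familiar 0.5 cutoff.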
You want to predict the price of used cars as a function of their age and other features. You want to specify a model that includes the most important interactions among those features and their most important nonlinearities, but you don't know how to start. In particular, you are worried that you can't start with a very complex regression model and use LASSO or some other method to simplify it, because there are far too many potential interactions. Is there an alternative approach to regression that includes the most important interactions without you having to specify them?
You want to understand whether, and by how much, online and offline prices differ. To that end, you need data on the online and offline prices of the same products. How would you collect such data? In particular, how would you select the products for which to collect data, and how could you make sure that the online and offline prices refer to the same products?
You want to identify hotels in a city that are underpriced for their location and quality. You have scraped the web for data on all hotels in the city, including prices for a particular date, and many features of the hotels. How can you check whether the data you have is clean enough for further analysis? And how should you start the analysis itself?
Are larger companies better managed? To answer this question, you have downloaded data from the World Management Survey. How should you describe the relationship between firm size and the quality of management? In particular, can you summarize it with a single number, or with an informative graph?
There is a substantial difference in the average earnings of women and men in all countries. You want to understand more about the potential origins of that difference, focusing on employees with a graduate degree in your country. You have data on a large sample of employees with a graduate degree, with their earnings and some of their characteristics, such as age and the kind of graduate degree they have. Women and men differ in those characteristics, which may affect their earnings. How should you use this data to uncover gender differences that are not due to differences in those other characteristics? And can you use regression analysis to uncover patterns of association between earnings and those other characteristics that may help us understand the origins of gender differences in earnings?
You need to predict rental prices of apartments using various features. You don't know how the various features may interact with each other in determining price, so you would like to use a regression tree. But you want to build a model that gives the best possible prediction, better than a single tree. What methods are available that keep the advantages of regression trees but give better predictions? How should you choose among those methods?
You want to predict rental prices of apartments in a big city using their location, size, amenities, and other features. You have access to data on many apartments with many variables. You know how to select the best regression model for prediction from several candidate models. But how should you specify those candidate models to begin with? In particular, which of the many variables should they include, in what functional forms, and in what interactions? More generally, how can you make sure that the candidates include the truly good predictive models?
You have a car that you want to sell in the near future. You want to know what price you can expect if you were to sell it. You may also want to know what you could expect if you were to wait one more year and sell your car then. You have data on used cars with their age and other features, and you can predict price with several kinds of regression models with different right-hand-side variables in different functional forms. How should you select the regression model that would give the best prediction?
A country experienced a major natural disaster in the recent past. You want to estimate its effect on total GDP in the year of the disaster and in the following few years. You have data on GDP and other macro variables for the country and several other countries for several years before and after the disaster. It's straightforward to show how total GDP changed after the disaster. But how should you use this data to estimate the counterfactual: how total GDP would have changed in the country without the natural disaster?
Many firms are owned by their founder or family members of their founder. You want to uncover whether such founder/family-owned firms are as well managed as other kinds of firms and, if there is a difference, how much of that is due to their ownership as opposed to something else. You have cross-sectional observational data on firms and their management practices, and you estimate a difference using simple regression. But is that difference due to founder/family ownership? In particular, can you use multiple regression to get a good estimate of the effect of founder/family ownership? If not, can you tell whether your estimate is larger or smaller than the true effect?
This textbook provides future data analysts with the tools, methods, and skills needed to answer data-focused, real-life questions; to carry out data analysis; and to visualize and interpret results to support better decisions in business, economics, and public policy. It comprehensively covers data wrangling and exploration, regression analysis, machine learning, and causal analysis, explaining when, why, and how the methods work and how they relate to each other. Running case studies, the most effective way to communicate data analysis, play a central role in this textbook. Each case starts with an industry-relevant question and answers it using real-world data and the tools and methods covered in the textbook. Learning is then consolidated by 360 practice questions and 120 data exercises. Extensive online resources, including raw and cleaned data and code for all analyses in Stata, R, and Python, are available at www.gabors-data-analysis.com.
Bayesian Econometric Methods examines principles of Bayesian inference by posing a series of theoretical and applied questions and providing detailed solutions to those questions. This second edition adds extensive coverage of models popular in finance and macroeconomics, including state space and unobserved components models, stochastic volatility models, ARCH, GARCH, and vector autoregressive models. The authors have also added many new exercises related to Gibbs sampling and Markov Chain Monte Carlo (MCMC) methods. The text includes regression-based and hierarchical specifications, models based upon latent variable representations, and mixture and time series specifications. MCMC methods are discussed and illustrated in detail, from introductory applications to those at the current research frontier, and MATLAB® computer programs are provided on the website accompanying the text. Suitable for graduate study in economics, the text should also be of interest to students studying statistics, finance, marketing, and agricultural economics.