Theory is the essential foundation on which an empirical network study is built. A network theory stipulates a carefully defined network and offers a reason why it relates to other variables. Pinning down the precise network of theoretical interest, and fleshing out why it matters, is the key preliminary work in empirical networks research design. It can be tempting to rush through this step, especially when data are readily available, but doing so comes with risks: design blunders are more debilitating in networks research than in other data collection endeavors. Thinking through all aspects of a theoretical setup takes time, but it is part of the real work of research design, and taking that time early is an investment in avoiding wasted effort later. This chapter presents a framework to help construct a theory that is maximally useful for guiding empirical research design.
Generalized linear models extend classical linear models in two ways: they allow a linear model to be fitted to a dependent variable whose expected values have been transformed using a "link" function, and they allow a range of error families other than the normal. They are widely used to fit models to count data and to binomial-type data, including models whose errors may exhibit extra-binomial or extra-Poisson variation. The discussion extends to models in the generalized additive model framework, and to ordinal regression models. Survival analysis, also referred to as time-to-event analysis, is principally concerned with the duration of a given condition, often but not necessarily sickness or death. In nonmedical contexts, it may be referred to as failure time or reliability analysis; applications include the failure times of industrial machine components, electronic equipment, kitchen toasters, light bulbs, businesses, loan defaults, and more. There is an elegant methodology for dealing with "censoring" – where all that can be said is that the event of interest occurred before or after a certain time, or in a specified interval.
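The handling of right-censoring mentioned above can be sketched with the classic Kaplan–Meier (product-limit) estimator. The sketch below is a hypothetical illustration, not taken from the chapter: the data, function name, and encoding (1 = event, 0 = censored) are all assumptions. Censored subjects leave the risk set but contribute no event to the survival curve.

```python
# Minimal Kaplan-Meier sketch: censored observations (event = 0) count
# toward the risk set up to their censoring time but never as "deaths".
# All data and names here are hypothetical, for illustration only.

def kaplan_meier(times, events):
    """times: observed durations; events: 1 = event occurred, 0 = censored."""
    # Distinct times at which an actual event occurred, in increasing order
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still in the risk set at t
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        s *= 1 - deaths / at_risk                    # product-limit step
        survival.append((t, s))
    return survival

# Toy failure-time data; the two 0s mark censored observations
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Note how the censored observation at time 3 still appears in the risk set for the event at time 3 (times are compared with `>=`), which is why the estimated survival there is 2/3 rather than the naive 3/6.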
Earlier chapters introduced modeling approaches for a continuous, normally distributed response. Biological data are often not so neat, and a common practice has been to transform continuous response variables until the assumption of normality is met. Other kinds of data, particularly presence–absence and behavioral responses and counts, are discrete rather than continuous and require a different approach. In this chapter, we introduce generalized linear models, and their extension to generalized linear mixed models, to analyze these response variables. We show how common techniques such as contingency tables, loglinear models, and logistic and Poisson regression can be viewed as generalized linear models, using link functions to create the appropriate relationship between response and predictors. The models described in earlier chapters can be reinterpreted as generalized linear models with the identity link function. We finish by introducing generalized additive models for situations where a linear model may be unsuitable.
There is a daunting array of statistical “methods” out there – regression, ANOVA, loglinear models, GLMMs, ANCOVA, etc. They are often treated as distinct data analysis approaches. We take a more holistic view: most methods biologists use are variations on the central theme of generalized linear models – relating a biological response to a linear combination of predictor variables. We show how several common “named” methods are related, based on classifying the biological response and predictor variables as continuous or categorical. We use simple regression, single-factor ANOVA, logistic regression, and two-dimensional contingency tables to show how these methods all represent generalized linear models with a single predictor. We describe how these models are fitted and outline their assumptions.
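The idea that logistic regression is a GLM with a logit link and binomial errors can be made concrete with a small Newton–Raphson (IRLS-style) fit for one predictor plus an intercept. This is a hypothetical sketch, not the chapter's own code; the toy presence–absence data are invented.

```python
import math

# Hypothetical sketch: logistic regression (GLM, logit link, binomial
# errors) fitted by Newton-Raphson for an intercept and one predictor.
# Data below are made up for illustration.

def fit_logistic(x, y, iters=25):
    b0, b1 = 0.0, 0.0                                # coefficients on the logit scale
    for _ in range(iters):
        # Accumulate the gradient g and Hessian H of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))  # inverse link: logit -> probability
            w = p * (1 - p)                          # binomial variance weight
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^(-1) g (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]   # presence-absence style response
b0, b1 = fit_logistic(x, y)
print(round(b0, 3), round(b1, 3))
```

At convergence the score equations are satisfied, so the fitted probabilities sum to the number of observed presences, the GLM analogue of residuals summing to zero in ordinary regression.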
We start by outlining how generalized linear models (GLMs) extend classical linear models, namely through the use of a link function that transforms the values of the response variable predicted by the model. We also present the types of statistical distribution we can choose for the unexplained (residual) variation and relate them to the most commonly encountered forms of biological data. The decomposition of the variation in the response variable, using the analysis of deviance, is described together with the concepts of maximum likelihood and of the null model. We also explain how to handle overdispersion, the larger-than-expected residual variation in GLMs with an assumed Poisson or binomial distribution. We show how to select predictors for inclusion in a model, focusing on the idea of model parsimony as measured by the AIC. The methods described in this chapter are accompanied by a carefully explained guide to the R code needed for their use.
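A common diagnostic for the overdispersion mentioned above is the Pearson statistic divided by the residual degrees of freedom: values near 1 are consistent with the assumed Poisson (or binomial) variance, while values well above 1 signal extra-Poisson variation. The sketch below is a hypothetical illustration with invented counts and fitted means, not an example from the chapter.

```python
# Hypothetical overdispersion check for a Poisson GLM: Pearson X^2
# divided by residual degrees of freedom. Counts and fitted means
# below are made-up numbers chosen to show clear overdispersion.

def dispersion(observed, fitted, n_params):
    pearson = sum((o - m) ** 2 / m for o, m in zip(observed, fitted))
    return pearson / (len(observed) - n_params)   # X^2 / residual df

counts = [0, 2, 1, 7, 3, 12, 5, 20]   # counts with more spread than Poisson allows
mu = [1, 2, 3, 4, 5, 6, 7, 8]         # fitted means from some hypothetical Poisson GLM
phi = dispersion(counts, mu, n_params=2)
print(round(phi, 2))
```

A dispersion estimate this far above 1 would typically prompt a switch to a quasi-Poisson or negative binomial model, which inflate or remodel the variance accordingly.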