This book is about doing variation analysis. My goal is to give you a step-by-step guide which will take you through a variationist analysis from beginning to end. Although I will cover the major issues, I will not attempt a full treatment of the theoretical issues nor of the statistical underpinnings. Instead, you will be directed to references where the relevant points are treated fully and in detail. In later chapters, I will discuss explicitly how different types of analysis challenge, contribute to or advance the basic theoretical issues. This is important for demonstrating (and encouraging) evolution in the field and for providing a sense of its ongoing development. Such a synthetic perspective is also critical for moving our research in the most interesting direction(s). In other words, this book is meant to be a learning resource which can stimulate methodological development, curriculum design and advances in the teaching and transmission of knowledge in variation analysis.
WHAT IS VARIATION ANALYSIS?
Variation analysis combines techniques from linguistics, anthropology and statistics to investigate language use and structure (Poplack 1993: 251). For example, a seven-year-old boy answers a teacher's question by saying, ‘I don't know nothing about that!’ A middle-aged woman asks another, ‘You got a big family?’ Are these utterances instances of dialect, slang, or simply performance errors, mistakes? Where on the planet were they spoken, why, by people of what background and character, in which sociocultural setting, under what conditions?
How do you collect data? This chapter will outline tried-and-true data collection techniques.
The most fundamental challenge for sociolinguistic research is how to obtain appropriate linguistic data to analyse. But how do you actually do it? The best exemplars that exist – Labov (1972a), Milroy (1987) and Sankoff (1973, 1974) – were written in the 1960s and 1970s. Detailed individual accounts are rarely published, except in dissertation methodology chapters. Some of the most memorable fieldwork tips I ever received came from chatting with sociolinguists at conferences (see also Feagin 2002: 37). In fact, fieldwork methods may be the best-kept secret of sociolinguistics. In this chapter, you will learn everything I know about how to collect data.
THE BASICS
The very first task is to design a sample that addresses ‘the relationship between research design and research objectives’ (Milroy 1987: 18, Milroy and Gordon 2003: 24). At the outset, a sociolinguistic project must have (at least) two parts: 1) a (socio)linguistic problem and 2) appropriate data to address it.
Perhaps the consensus on good practice in this regard is to base one's sampling procedure on ‘specifiable and defensible principles’ (Chambers 2003: 46). The question is: what are these principles, and how does one apply them?
DATA COLLECTION
According to Sankoff, the need for good data imposes three different kinds of decisions about data collection on the researcher: a) choosing what data to collect; b) stratifying the sample; and c) deciding on how much data to collect from how many speakers.
The variationist approach to sociolinguistics began during the 1960s, when Labov, working with Uriel Weinreich, developed a theory of language change (Weinreich et al. 1968). Thereafter, Labov continued to advance the method and analysis of language variation and change, which today is often referred to as variation theory (e.g. Labov 1963, 1966/1982).
In the 1970s, one of Labov's graduate students at the University of Pennsylvania was Shana Poplack. In 1981, Shana became a professor of Sociolinguistics at the University of Ottawa's Department of Linguistics, the same year I entered the MA programme. I was fortunate to be Shana's student until I completed my Ph.D. dissertation in 1991. Everything you will read in this book has come directly from what has been passed on from this lineage – training, techniques, insights, knowledge, and sheer passion for the field. The entire period from 1981 to 1995 was an invaluable apprenticeship through my studies with Shana and our many collaborations (e.g. Tagliamonte and Poplack 1988, Poplack and Tagliamonte 1989, 1991). I also benefited tremendously from the influence of David Sankoff, whose answers to my innumerable questions of method and analysis were invaluable.
Most knowledge and learning in variation theory has been acquired like this, passed on through word of mouth, from one researcher to the next (see also Guy 1988: 124). In fact, it has often been noted that the practical details of how to actually do variation analysis are arcane, largely unwritten and, for the most part, undocumented (but see Paolillo 2002).
What do you do with a linguistic variable once you've found one?
This chapter will provide a step-by-step procedure for setting up an analysis of a linguistic variable. It will detail the procedures for coding, how to illustrate the linguistic variable and how to test claims about one variant over another.
Once you have decided on a linguistic variable to study and have a good idea how you will circumscribe it, the next step is to begin the extraction phase.
DATA EXTRACTION
Data extraction refers to the process and procedures involved in sifting through your data in order to find and select the relevant tokens, i.e. each and every instance of each variant within the context of variation (Chapter 5), and place each token into a data file. A number of procedures have evolved over the years which greatly facilitate the extraction process. To be very practical about exactly how this is done, I will make use of the data shown earlier in Chapter 5, example (12). Suppose we were to extract the tokens of variable (ing) from this material; the data could be listed as in (1) with the word containing the variable along with some of the context in which it occurred displayed in each line. I have found that putting the word containing the variable in capital letters makes it easier to see what is going on in the data.
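The extraction procedure described above – pulling out every token of the variable with its surrounding context, and capitalising the word containing the variable – can be sketched in code. The following is a minimal sketch, assuming a plain-text transcript; the regular expression and the sample sentence are purely illustrative, not taken from the corpus discussed in Chapter 5.

```python
import re

def extract_ing_tokens(text, window=3):
    """Find candidate (ing) tokens and return each with surrounding
    context, the token itself in capitals for easy scanning."""
    words = text.split()
    tokens = []
    for i, w in enumerate(words):
        # two variants of (ing): velar -ing vs alveolar -in'
        # (this orthographic pattern is an illustrative assumption)
        if re.search(r"in(g|')$", w.lower()):
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            tokens.append(f"{left} {w.upper()} {right}".strip())
    return tokens

sample = "I was sittin' there watching TV and thinking about goin' home"
for line in extract_ing_tokens(sample):
    print(line)
```

Each printed line corresponds to one row of a token file such as the one shown in (1), ready for the coding step described below.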
How do you find a linguistic variable? This chapter will discuss the key construct in the variationist paradigm – the linguistic variable. It will detail the definition of a linguistic variable, describe what it is, how to identify it and how to circumscribe it.
DEFINING THE LINGUISTIC VARIABLE
The definition of a linguistic variable is the first and also the last step in the analysis of variation. It begins with the simple act of noticing a variation – that there are two alternative ways of saying the same thing.
(Labov to appear)
The most fundamental construct in variation analysis is the ‘linguistic variable’. The quote above is the most recent one I could find from Labov himself; turning back to the original definition of the linguistic variable you find something a little more complicated. In 1966, Labov (1966/1982: 49) says the linguistic variable must be ‘high in frequency, have a certain immunity from conscious suppression … [be] integral units of larger structures, and … be easily quantified on a linear scale’. Furthermore, the linguistic variable was required to be ‘highly stratified’ and to have ‘an asymmetric distribution over a wide range of age levels or other ordered strata of the society’ (Labov 1972c: 8). In this chapter, I shall ‘unpack’ what all this means. At the outset, however, the most straightforward and simple definition of the linguistic variable is simply ‘two or more ways of saying the same thing’ (Labov 1972c, Sankoff 1980: 55).
This chapter will discuss the relevant results for interpreting a variation analysis. What it all boils down to is ‘finding the story’.
There comes a time when the analyses must stop. The marginal data has been honed to perfection. The multivariate analyses have been run enough. The results are as they are. Now it is time to pause and reflect, to interpret them.
The essential task is to understand and explain the nature of variability in a data set. What constrains it? What underlying mechanism produced it? What grammatical work is the variable doing in the grammar? If two (or more) data sets are being compared, do they share an underlying grammar? To what extent is their grammar shared and, if only to a certain extent, how far? Is the variable stable or is it implicated in linguistic change? Can the path of its linguistic development be traced through the variable grammar? Is it an innovation, a re-analysis or a retention?
By the time you have reached this point, and if you have completed each of the exercises in this book, most of the work is already done. You have articulated the issues and posed the questions. You have collected the data, constructed the corpus, discovered the variable, and circumscribed, extracted and coded it. You have gone through the many-layered procedures of analysis, re-analysis, honing and refining. You know your data inside out, every cross-tabulated cell of it.
This chapter will illustrate the procedures for performing a multivariate analysis, particularly how to determine if the analysis is the ‘best’ one, how to look for and identify ‘interaction’, and what to do about it when you find it.
At this point, you are ready to move forward with a fully fledged variable rule analysis of your data. Goldvarb 2.1, Goldvarb 2001 and Goldvarb X permit variable rule analysis with binomial applications. Only one or two values can be declared as application values (Rand and Sankoff 1990: 24). The multinomial one-level option visible under the ‘CELLS’ drop-down menu in Goldvarb 2.1 and Goldvarb X has never been implemented. To my knowledge the only version of the variable rule program which permits more than binomial application is Varbrul 3, which permits the trinomial case. For further discussion, see Rousseau and Sankoff (1978a, 1978b). However, very few analyses in the field have used this type of analysis.
A further requirement for running the variable rule program is that the condition file you use produces marginal results with no singletons and no KnockOuts. Having worked through your analysis as described in Chapter 7, you should already have a condition file, or series of condition files, that meets this requirement. With one of these condition files and the data file open, load the cells to memory, and save them.
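A KnockOut is a factor for which the application value occurs 0% or 100% of the time, and a singleton is a factor group containing only one factor; either condition prevents the variable rule program from running. A quick pre-check can be sketched as follows. The token representation here (a variant label paired with a dictionary of factor groups) is an illustrative assumption, not the Goldvarb file format itself.

```python
from collections import defaultdict

def find_knockouts_and_singletons(tokens):
    """tokens: list of (variant, {factor_group: factor}) pairs.
    Returns factors with 0% or 100% application (KnockOuts) and
    factor groups containing only one factor (singletons)."""
    # group -> factor -> [applications, total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for variant, factors in tokens:
        for group, factor in factors.items():
            cell = counts[group][factor]
            cell[1] += 1
            if variant == "application":
                cell[0] += 1
    knockouts, singletons = [], []
    for group, factors in counts.items():
        if len(factors) == 1:
            singletons.append(group)
        for factor, (apps, total) in factors.items():
            if apps == 0 or apps == total:
                knockouts.append((group, factor))
    return knockouts, singletons

# hypothetical coded tokens, purely for illustration
sample_tokens = [
    ("application", {"style": "casual", "sex": "m"}),
    ("non-application", {"style": "careful", "sex": "m"}),
    ("application", {"style": "casual", "sex": "m"}),
]
print(find_knockouts_and_singletons(sample_tokens))
```

In this toy data, ‘casual’ is a KnockOut (100% application), ‘careful’ is a KnockOut (0% application), and ‘sex’ is a singleton; any of these would have to be recoded or removed before running the program.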
This chapter will introduce the statistical analysis of variable rules. It will cover the terms ‘input’, ‘log likelihood’ and ‘significance’, and describe what they mean.
The fact that grammatical structures incorporate choice as a basic building block means that they accept probabilization in a very natural way, mathematically speaking.
(Sankoff 1978: 235)
This chapter is written in two parts. The first part is theoretical, designed to tell you about the variable rule program, variable rules and their history. Importantly, I address the issues of why to use the variable rule program at all. The second part is practical, providing you with an overview of how the fundamental characteristics of the variable rule program function in the Goldvarb series of programs.
First, here is where you can download the variable rule program (Rand and Sankoff 1990, Robinson et al. 2001, Sankoff et al. 2005): http://www.crm.umontreal.ca/~sankoff/GoldVarb_Eng.html; http://www.york.ac.uk/depts/lang/webstuff/goldvarb/; http://individual.utoronto.ca/tagliamonte/Goldvarb/GV_index.htm (Goldvarb 2.1, Goldvarb 2001, Goldvarb X).
THEORY
Much of what is mysterious about variationist sociolinguistics comes from the often arcane technical descriptions of its primary analytic tool, the variable rule program. Like many things that involve numbers, reading about the variable rule program often incites a negative response. However, the variable rule program is an incredible tool, not only for conducting sophisticated statistical analyses, but also for helping you to make sense of linguistic data, and even for simply organising it.
This chapter will outline the method for reporting the results of a multivariate analysis, including Ns, %s and Total Ns, corrected mean, selected factors, etc.
The foundation of variation analysis is its attempt to discover not individual occurrences, not even overall rates of occurrence, but patterns of variability in the body (or bodies) of material under investigation. To aid in the interpretation of these patterns there are a number of different lines of evidence which arise from the statistical modelling techniques of multivariate analysis.
THREE LINES OF EVIDENCE
There are three levels of evidence (Poplack and Tagliamonte 2001: 92, Tagliamonte 2002: 731) available for interpreting the results of variation analysis as performed by the step-up/step-down method of multiple regression: 1) statistical significance, i.e. Which factors are statistically significant at the .05 level and which are not? 2) relative strength, i.e. Which factor group is most significant (largest range) or least (smallest range)? 3) What is the order (from more to less) of factors within a linguistic feature (constraint hierarchy)? Finally, bringing in the interpretative component of variation analysis, 4) Does this order reflect the direction predicted by one or the other of the hypotheses being tested? Each of these bits of information can, and should, be used to build your argumentation about the linguistic variable.
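Two of these lines of evidence – relative strength (the range) and the constraint hierarchy (the order of factors) – can be read mechanically off the factor weights that the variable rule program reports. The sketch below shows the arithmetic: the range is the highest factor weight in a group minus the lowest, conventionally expressed in whole points, and the hierarchy is simply the factors ordered by weight. The factor groups and weights here are invented for illustration only.

```python
def evidence_summary(factor_groups):
    """factor_groups: {group: {factor: weight}} with Goldvarb-style
    factor weights between 0 and 1. Reports, per group, the range
    (relative strength) and the constraint hierarchy (factor order)."""
    report = {}
    for group, weights in factor_groups.items():
        ordered = sorted(weights, key=weights.get, reverse=True)
        rng = round((max(weights.values()) - min(weights.values())) * 100)
        report[group] = {"range": rng, "hierarchy": ordered}
    return report

# invented weights, purely for illustration
results = evidence_summary({
    "grammatical person": {"first": 0.62, "second": 0.48, "third": 0.39},
    "sex": {"female": 0.55, "male": 0.45},
})
print(results)
```

Here ‘grammatical person’ has the larger range (23 vs. 10), so it would be interpreted as the stronger factor group; whether its hierarchy (first > second > third) matches the direction predicted by your hypothesis is the interpretative fourth step.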
In this chapter, we explore the notion of commodified identity and introduce a series of tools and frameworks by which to analyse its discursive constitution. We pursue four different interpretations of the term ‘commodified identities’:
Identities of consumers (accounts for and practices of consumption).
The process of identity commodification through acts of consumption (How do commercial discourses such as advertisements ‘speak’ to us and engage us with their message?).
Representations of identities in commodified contexts (for example, consumer femininity, commodified ‘laddism’).
Self-commodifying discourses (for example, personal advertisements, job applications/CVs/references, commercial telephone sex lines).
In order to address all of these connotations of ‘commodified identity’ we draw on critical discourse analysis and critical discursive psychology. In other words, we analyse the linguistic content of advertising or promotional material, but will, in a detailed case study of men's lifestyle magazines, relate this to in-depth interviews and reader-response exercises conducted with groups of male consumers. This kind of two-way analysis captures meanings at the interface between contexts of production, text and consumption and is allied to a growing tradition of research known as a ‘circuits of culture’ model central to contemporary cultural studies (for example, Du Gay et al. 1997; Johnson 1986). A circuits of culture model acknowledges the importance of a global consideration of all moments in the broader context of commercial culture (that is, production, text, consumption, lived identities of consumers) and the often complex ways in which they may intersect.
In this chapter, we consider how to define and analyse ‘institutional identities’. This is a less straightforward task than might initially seem the case. Does ‘institutional identity’ refer to fixed, pre-discursive and complementary pair roles, such as ‘doctor and patient’? Does it refer to any identity that is displayed in talk oriented to institutional goals or activities? Is it possible to identify ‘institutionality’ linguistically? Do we need prior knowledge of institutional encounters to understand them?
We discuss two main approaches to understanding the links between institutions, discourse and identity. Ethnomethodological and conversation analytic (CA) approaches argue that ‘institutionality’ or institutional identities are emergent properties of talk-in-interaction. In contrast, critical discourse analytic (CDA) accounts argue that the way people interact in social situations reflects existing macro-social forces. Any analysis of institutional interaction starts with a critique of institutions as structures that embed power relations within them. Institutional identity is therefore a function of these existing relations. The tension between these two approaches is summarised usefully by Mäkitalo and Saljö (2000: 48):
Analysts interested in institutional talk … face an interesting dilemma when it comes to the problem of how to account for the relationship between structural and enduring features of institutions and interactional dynamics. At a general level, this issue concerns how talk is occasioned by organizational structure, and precisely what is ‘institutional’ about talk. This relation between stable communicative practices and in situ talk is often understood as a matter of trying to connect ‘macro’ (social structure) with ‘micro’ (talk) or, alternatively, the ‘present’ with the ‘historical’.