Exercise: Perfect Progressive Copular Constructions

I’m working on a new case study on statistically underrepresented constructions to complement (or perhaps replace) the case study on negative evidence in Section 8.2.2.3 of CLGM. The case study involves perfect progressive passive constructions (inspired by the broader case study on progressive passives in Manfred Krug and Julia Schlüter’s Research Methods in Language Variation and Change, Cambridge, 2013). It is a complex case study and I’m not sure it will lead to anything, but it has yielded a by-product that might make an interesting exercise.

To get a first overview of perfect progressive passives, I did what Krug and Schlüter (and others) have done, and simply queried the BNC for the sequence “been being” (the CQP query I used was ⟨[word="been"%c] [pos="AV."]? [word="being"%c]⟩, allowing for the potential occurrence of an adverb). This yielded six hits:

(1) Because some people are making large profits, and raising the total of profits by so doing, it is proposed to come down on the ‘innocent’ businesses, whose profits may actually have gone down, who have been being ‘responsible’ and not increasing their prices; …. (BNC A69)
(2) I may have been being a bit selfish, but I could n’t bear to lose him in that way, and he seemed to be making such an effort himself, not ever putting weight on that leg and eating as much as he could. (BNC ASH)
(3) One of her major contributions to WACC has been being involved in the impetus for representation of women in decision-making at all levels …. (BNC EBE)
(4) I mean I may have been being a bit naive [unclear] as a general sort of, just a rep within NALGO, …. (BNC F7J)
(5) That was a time when the proudest moment of Denis’s young life had been being brought by his mother to Fitzgerald’s Park to see his father, brave and bold and handsome in his dress uniform, …. (BNC FRJ)
(6) That er, er, little action has been taken in the last thirty forty years since this has been being discussed, erm, I think the first international conference erm, produced their own report in nineteen sixty. (BNC JJG)

As others have noted, only one of these (example (6)) is a passive; examples (3) and (5), which may look like passives at first glance, are actually copular constructions with an adverbial clause as their complement. It is known that perfect progressive passives are rare, but that is a case study for another day.

What caught my eye were examples (1), (2) and (4) – these are regular copular constructions with adjectives as their subject complement. Three examples in a 100-million-word corpus is probably enough to convince us that perfect progressive copular constructions exist in English, but it is a number that suggests that (just like perfect progressive passives) these constructions are underrepresented.

It is an interesting exercise to try to determine whether this is indeed the case. In order to do so, we need the overall frequency of copular constructions with adjectival subject complements in the BNC, categorized by tense/aspect combinations. This will then allow us to derive expected frequencies, compare them to the observed frequencies, and test the difference for statistical significance.

How do we go about this? Well, thanks to the relatively fixed word order of English, such constructions are fairly easy to search for given a corpus that provides POS tags and lemmas (which the BNC does). We basically have to search for the lemma BE followed by an adjective – something like ⟨[hw="be"] [pos="AJ."%c]⟩.

It’s a bit more complicated, however, as adverbs and similar expressions can occur between the copula and the adjective. Thus, in a first step, I searched for the lemma BE, followed by up to four tokens that were neither an adjective nor a punctuation mark, followed by an adjective (the CQP query is ⟨[hw="be"] [pos!="(AJ.|PUN)"]{0,4} [pos="AJ."]⟩). This gave me a long list of expressions, like very, a bit, also, not, and many adverbs ending in -ly, some of which can be combined, some of which cannot. I decided that it would be safe to query for the individual words, allowing any number in any order, with the exception of a bit, a little, far from and more or less, which I only allowed in this exact sequence. The combined query looks as follows (note that I also allow an optional quotation mark before the adjective, as example (1) above shows that speakers sometimes put an adjective in quotation marks in copular constructions):

[Query not shown.]

This yielded 868,826 results. I then counted the different word forms of the copula BE (easily done in the Corpus Workbench by running the command count Last by word%c on match[0]). I then sorted the word forms according to the tense form they represent, to the extent that this is possible:

Infinitive (be): 144850
Simple present: 448594 (the sum of am (9085), ’m (23728), are (119228), ’re (17386), is (216132), and ’s (63035))
Simple past: 224126 (the sum of was (161852) and were (62274))
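The grouping of word forms into tense categories can be reproduced with a few lines of Python. This is only a sketch: the counts are the ones reported above, while the variable names and the mapping table are my own illustration, not part of the Corpus Workbench output.

```python
from collections import Counter

# Frequencies of the word forms of BE directly preceding the adjective
# (with optional intervening adverbs), as reported in the text.
# The clitic forms are written with a plain ASCII apostrophe here.
form_counts = {
    "be": 144850,
    "am": 9085, "'m": 23728, "are": 119228, "'re": 17386,
    "is": 216132, "'s": 63035,
    "was": 161852, "were": 62274,
}

# Hypothetical helper table: which tense category each form is assigned to.
tense_of = {
    "be": "infinitive",
    "am": "present", "'m": "present", "are": "present",
    "'re": "present", "is": "present", "'s": "present",
    "was": "past", "were": "past",
}

# Add up the counts per tense category.
tense_totals = Counter()
for form, n in form_counts.items():
    tense_totals[tense_of[form]] += n

for tense, n in tense_totals.items():
    print(f"{tense:<12}{n:>8}")
```

The forms been and being are deliberately left out of the mapping, since they cannot be assigned to a tense without looking at their left context.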

This left the forms been (35057) and being (14592), which do not allow us to assign an underlying tense directly. For been, we could check how many of them are preceded by have or has (present perfect) and how many by had (past perfect), using a query like the following:

[Query not shown.]

This only yields 32,329 hits instead of the 35,057 hits we’re looking for, as there are cases where the form of the lemma HAVE is separated from been by unexpected material (such as the appositions in examples (7) and (8)), or where it has been omitted (as in example (9)):

(7) Theatres were closed during the Cromwellian period, but with the restoration of the monarchy in 1660 came Court comedy and the beginning of the ‘comedy of manners’ which has, in one way or another, been popular right up to the present day. (BNC A06)
(8) They have also, when confronted with [any] critical research findings, been quick to use this power to neutralize the critical impact … (BNC A0K)
(9) Not putting yourselves out, I reckon, on account of him being a working fella. Been different if he was one of your upper crust. (BNC A73)

If we were interested in differences between past perfect and present perfect, we could solve this problem either by manually going through the 2728 hits that the query above misses (which is a lot of work), or by assuming that these cases are distributed across the past and present perfect in the same proportion as the cases that are captured by the query (which is a reasonable assumption, and one we could check by drawing a random sample of the missed cases and checking manually whether it conforms to the assumed distribution).
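The proportional allocation just described might be sketched as follows, using the frequencies of the forms of HAVE given in the footnote. Note the assumptions, which are mine and purely illustrative: I treat have, ’ve, of, has and ’s as present perfect (although have after a modal, as in may have been, is strictly an infinitive perfect), I treat had and ’d as past perfect, and I leave having aside.

```python
# Forms of HAVE preceding "been", as listed in the footnote.
have_counts = {
    "have": 13534, "of": 86, "'ve": 1205, "has": 6421, "'s": 871,
    "had": 9415, "'d": 622, "having": 175,
}

# ASSUMPTION (illustrative only): how the forms map onto the two perfects.
present_forms = {"have", "of", "'ve", "has", "'s"}
past_forms = {"had", "'d"}

# Hits with "been" that the HAVE + been query failed to capture.
total_been = 35057
missed = total_been - sum(have_counts.values())

present = sum(n for form, n in have_counts.items() if form in present_forms)
past = sum(n for form, n in have_counts.items() if form in past_forms)

# Distribute the missed cases in the same proportion as the captured ones.
present_share = present / (present + past)
est_present = missed * present_share
est_past = missed * (1 - present_share)

print(f"missed hits:              {missed}")
print(f"estimated present perfect: {est_present:.0f}")
print(f"estimated past perfect:    {est_past:.0f}")
```

A random sample of the missed cases could then be coded by hand and compared against these estimated shares.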

Since we are not interested in this difference, but rather in the combination of the perfect in general with the progressive, we will simply treat all cases with been as instantiating the perfect, without distinguishing further.¹

This leaves the crucial form being – this part of our results contains all progressives, including perfect progressives like those in examples (1) to (4) above, but it also contains many cases where the form being is not part of a progressive, as in the following cases:

(10) Being nervous and taking risks are two of the main things you will have to face as an aspiring actor. (BNC A06)
(11) BS is like a man who knows all about swimming, even to the point of being able to train the Olympic team, but who can not swim himself, and V is the man with the normal talent for swimming. (BNC A0T)
(12) I quickly became aware of what a ‘real polis’ was and more importantly what boundaries one had to cross to cease being real and in effect become unreal, inauspicious, and inhuman. (BNC A0K)

In order to find the progressives, we have to find those cases where being is preceded by another instance of the lemma BE. We can do this by using the following query, which looks for the lemma BE, optionally followed by one or more adverbs or similar expressions, followed by the word form being, followed optionally, again, by one or more adverbs or similar expressions:

[Query not shown.]

This query will miss a few cases where something other than one of the expressions in the query occurs between the lemma BE and the word form being (analogous to examples (7) and (8) above). A bit of manual checking did not turn up such cases, however, unlike in the case of HAVE + BEEN, so they are probably rare enough to ignore.

If we count the word forms for the lemma BE returned by the query just given, we can derive the frequencies for different types of progressives analogous to the way we derived the frequencies for simple tenses above:

Infinitive progressive (be): 16
Present progressive: 973 (the sum of am (27), ’m (198), are (179), ’re (194), is (229), and ’s (146))
Past progressive: 613 (sum of was (495) and were (118))
Perfect progressive (been): 4
“Double” progressive (being): 1

The cases of the perfect progressive found by the query are examples (1) to (4) above, i.e., they include example (3), which is a false hit. The “double” progressive is also a false hit from a transcript of spoken language, where the word being is simply repeated:

(13) I think we’ve tried to err being being reasonable.

It is an interesting question whether a “double” progressive actually exists. Structurally, it should be possible to say something like (14) (compare it to example (10) above):

(14) Being being nervous is quite taxing for the aspiring actor.

While being nervous in (10) would mean “experiencing nervousness”, (14) would perhaps mean “acting as though experiencing nervousness”. We will leave this question for another time.

We now finally have all the frequencies we need to construct a table crossing the variable ASPECT (with the values SIMPLE and PROGRESSIVE) with the variable TENSE-ASPECT (with the values INFINITIVE, PRESENT, PAST, PERFECT and PROGRESSIVE; the last of these, the form being, is labeled “continuous” in the tables below). Note that I am including the two false hits in the data, as we did not check the other cells for false hits either.

OBSERVED	simple	progressive	TOTAL
infinitive	144850	16	144866
present	448594	973	449567
past	224126	613	224739
perfect	35057	4	35061
continuous	14592	1	14593
TOTAL	867219	1607	868826

From this table, we can derive the expected frequencies (if you don’t know how, reread Section 7.1.3.1 of CLGM):

EXPECTED	simple	progressive	TOTAL
infinitive	144598.05	267.95	144866
present	448735.47	831.53	449567
past	224323.32	415.68	224739
perfect	34996.15	64.85	35061
continuous	14566.01	26.99	14593
TOTAL	867219	1607	868826
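For readers who would rather let a script do the arithmetic, here is a minimal sketch of how the expected frequencies can be derived from the observed table: each expected value is the row total times the column total, divided by the grand total. The data structure and variable names are my own choices for illustration.

```python
# Observed frequencies: (simple, progressive) for each row of the table.
observed = {
    "infinitive": (144850, 16),
    "present":    (448594, 973),
    "past":       (224126, 613),
    "perfect":    (35057, 4),
    "continuous": (14592, 1),
}

# Marginal totals.
row_totals = {label: sum(cells) for label, cells in observed.items()}
col_totals = tuple(sum(cells[i] for cells in observed.values()) for i in range(2))
grand_total = sum(col_totals)

# Expected frequency of a cell = row total * column total / grand total.
expected = {
    label: tuple(row_totals[label] * col / grand_total for col in col_totals)
    for label in observed
}

print(f"{'EXPECTED':<12}{'simple':>12}{'progressive':>13}")
for label, (simple, progressive) in expected.items():
    print(f"{label:<12}{simple:>12.2f}{progressive:>13.2f}")
```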

Clearly, perfect progressives are underrepresented in the data – there should be 65, but we only find 4 (actually, 3). Also underrepresented are the infinitive progressive (as in “Sometimes it may be appropriate to support the child who appears to be being facetious” (BNC HYA)), and the hypothetical “double” progressive (which actually does not occur at all). In contrast, the present progressive and the past progressive are slightly overrepresented.

The overall chi-square value is 437.57, which, at four degrees of freedom, is highly significant (p < 0.001); check the table of critical chi-square values on p. 447 of CLGM (note that it is unfortunately missing the column with the degrees of freedom, but check the fourth row). We may want to know more precisely where the effect comes from, so let us calculate the individual chi-square components for each cell and check them for significance (see the discussion in CLGM, pp. 202–204).

CHI-SQUARE	simple	progressive
infinitive	0.44	236.90
present	0.04	24.07
past	0.17	93.66
perfect	0.11	57.10
continuous	0.05	25.03
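The chi-square components and the overall value can be checked with the following sketch. It recomputes the expected frequencies from the observed table, so it runs on its own. The p-values use the closed-form survival functions of the chi-square distribution for four degrees of freedom and one degree of freedom. Two assumptions here are mine: each component is tested at one degree of freedom, and no correction for multiple testing is applied, so CLGM’s own procedure may differ in detail.

```python
import math

# Observed frequencies: (simple, progressive) for each row of the table.
observed = {
    "infinitive": (144850, 16),
    "present":    (448594, 973),
    "past":       (224126, 613),
    "perfect":    (35057, 4),
    "continuous": (14592, 1),
}

col_totals = tuple(sum(cells[i] for cells in observed.values()) for i in range(2))
grand_total = sum(col_totals)

# Chi-square component of a cell = (observed - expected)^2 / expected.
components = {}
for label, cells in observed.items():
    row_total = sum(cells)
    row_components = []
    for obs, col_total in zip(cells, col_totals):
        exp = row_total * col_total / grand_total
        row_components.append((obs - exp) ** 2 / exp)
    components[label] = tuple(row_components)

chi2_total = sum(sum(row) for row in components.values())


def p_four_df(x):
    """Survival function of the chi-square distribution with 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


def p_one_df(x):
    """Survival function of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2))


print(f"overall chi-square: {chi2_total:.2f}, p = {p_four_df(chi2_total):.3g}")
print(f"{'CHI-SQUARE':<12}{'simple':>10}{'progressive':>13}")
for label, (simple, progressive) in components.items():
    flag = "*" if p_one_df(progressive) < 0.001 else ""
    print(f"{label:<12}{simple:>10.2f}{progressive:>13.2f}{flag}")
```

An asterisk marks progressive cells whose component exceeds the critical value for p < 0.001 at one degree of freedom (10.83).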

As you can see, all over- and underrepresentations of the progressive contribute significantly to the overall chi-square value (check the table in CLGM p. 448).

We started out trying to determine whether perfect progressive instances of the copular construction with adjectival complements are less frequent than they should be, but we have now shown that there are three forms that are underrepresented: the infinitive progressive (to be being silly), the perfect progressive (have/has/had been being silly) and the hypothetical double progressive (being being silly). In contrast, two forms are overrepresented: the present progressive (am/are/is being silly) and the past progressive (was/were being silly).

If we ask ourselves why this is the case, we may notice that for the overrepresented tense-aspect combinations, the first word form of the lemma BE in the sequence does not have the regular verb stem be-, but is a suppletive form that is phonologically very dissimilar to the regular verb stem of the second occurrence in the sequence. In contrast, for the three underrepresented tense-aspect combinations, the first and the second occurrence of the lemma BE in the sequence both have the regular verb stem be-, leading to a repetition of phonological material. That speakers avoid such repetitions is well known – this principle is sometimes called “horror aequi” – see case study 8.2.4.3 in CLGM.

It seems that we have stumbled upon an instance of this principle. The principle is interesting because it shows that phonological surface form influences speakers’ willingness to use particular syntactic constructions – this should not be the case if you believe that language consists of independent modules (which I don’t, but many linguists do). Syntactically, there is no difference between Ashley was being silly and Ashley has been being silly – in both cases, the verb be occurs twice and in both cases the syntactic relationship between the two occurrences is the same. The only difference is that there is no phonological overlap between the two forms in the first case (which is more frequent than expected) but there is such an overlap in the second case (which is much less frequent than expected).

¹ If you want to make the distinction, here are the frequencies of the different word forms of have returned by the query above: have (13534), of (86), ’ve (1205), has (6421), ’s (871), had (9415), ’d (622), having (175).

Anatol Stefanowitsch

Corpus Linguistics for the Masses
