Author Archives: astefanowitsch

Drawing syntax trees with R

A while back I was looking into treebanks (something that a future edition of CLGM should probably spend more time on than the current one, which basically just points out that they exist). I created some small treebanks, trying out different parsers and manually correcting their output. In order to find errors in the parse, I used Yoichiro Hasebe’s great online tool RSyntaxTree – so named because it is written in Ruby, not, unfortunately, in R – to visualize the trees.

Then it struck me how great it would be if I could actually use R to draw the trees for me instead. I looked around for a package that would do this, and I don’t remember if I couldn’t find one or if I just didn’t like what I found. Anyway, I decided to come up with a way on my own – and I did, relying almost exclusively on existing packages. This post describes how.

Exercise: Perfect Progressive Copular Constructions

I’m working on a new case study on statistically underrepresented constructions to complement (or perhaps replace) the case study on negative evidence in Section 8.2.2.3 of CLGM. The case study involves perfect progressive passive constructions (inspired by the broader case study on progressive passives in Manfred Krug and Julia Schlüter’s Research Methods in Language Variation and Change, Cambridge, 2013). It is a complex case study and I’m not sure it will lead to anything, but it has yielded a by-product that might make an interesting exercise.

To get a first overview of perfect progressive passives, I did what Krug and Schlüter (and others) have done, and simply queried the BNC for the sequence “been being” (the CQP query I used was ⟨[word="been"%c] [pos="AV."]? [word="being"%c]⟩, allowing for the potential occurrence of an adverb). This yielded six hits:

Review of CLGM in the IJCL

Kevin Gerigk has reviewed Corpus Linguistics: A Guide to the Methodology for the International Journal of Corpus Linguistics. The review is open access, so you can read it here.

The review is very useful, because it draws attention to ways in which a future edition of the book might be improved. I would like to respond very briefly to three issues raised in the review.

Precision and recall

Precision and recall are discussed in Section 4.1.2 of “Corpus Linguistics: A Guide to the Methodology” (CLGM) (pp. 111–116). Students often seem to have more difficulty understanding these concepts than I would have expected, so I looked around to see how other people have explained them. A fishing metaphor is frequently used, which I quite like, as it has the potential to explain many other aspects of corpus linguistics. So I decided to write my own version of such a metaphorical explanation, which I may or may not include in a future edition of CLGM.
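
For readers who prefer code to metaphors, the two measures can be sketched in a few lines of Python. This is only a minimal illustration of the standard definitions, not the explanation from the book: the function name `precision_recall` and the numbers in the fishing example are invented for this sketch. Precision is the share of what the query retrieved that is actually relevant; recall is the share of what is relevant that the query actually retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item identifiers.

    retrieved: everything the query (the "net") brought up
    relevant:  everything that should have been found (the "fish in the pond")
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant  # relevant items that were retrieved
    # Guard against division by zero for empty inputs.
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical catch (made-up numbers): the pond holds 10 fish, the net
# brings up 6 of them plus an old boot and a tin can.
fish_in_pond = set(range(1, 11))
caught = set(range(1, 7)) | {"boot", "can"}

precision, recall = precision_recall(caught, fish_in_pond)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this made-up catch, 6 of the 8 things in the net are fish, and 6 of the 10 fish in the pond were caught, so the two measures differ even though they share a numerator.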

Review of CLGM in the Časopis pro moderní filologii

Lucie Lukešová reviewed Corpus Linguistics: A Guide to the Methodology for the Časopis pro moderní filologii last year. If you read Czech or if you, like me, are willing to trust Google Translate, you can read the text here.

The review is very positive overall, concluding as follows: “I dare say that the author succeeded in what he set out to do – to create a textbook that was lacking in the market. It is full of information, and yet the reader does not find themselves lost or overwhelmed. That is why I am happy to recommend it not only to all my students, but also to colleagues who, like me, sometimes need a reliable beacon (and sometimes a lifeline) in the stormy waters of corpus data.”

I sometimes dream of living in an old lighthouse on the Baltic Sea coast – it will always remain a dream, as I am very much an urbanite who gets nervous when he is more than a few hours away from a major city, but it certainly lets me appreciate Lucie’s maritime metaphor!

A message to my readers

My open-access textbook Corpus Linguistics: A Guide to the Methodology, which took me 15 years to write, was finally published in early 2020, just as the COVID pandemic hit the world.

I had planned to launch the book together with a companion website containing additional resources, study questions, exercises and the like. But like many colleagues, I was overwhelmed by the sudden COVID-induced task of moving my teaching and my administrative duties online, which disrupted all of the comfortable work routines I had adopted to leave time for things like research, family life, and setting up companion websites for textbooks.

As I do not see an end to the pandemic, let alone to the disruptions it has caused, I have decided to launch the website in blog form. On the one hand, this format is more modest than what I had originally planned, as it means that the website will remain perpetually incomplete, growing toward a more complete version of itself post by post whenever I find the time.

On the other hand, this format is more aspirational than I had originally envisioned, as enough time has passed since the publication of the book for me to start thinking about where it might be improved, and this blog will be a place not only for the exercises and study questions I had originally planned, but also (or perhaps, instead) for revisions and additional material to be included in a second edition (which, however, should not be expected for at least another three years, so please keep using the current edition)!