A Comparative Corpus Analysis Tells Us Nothing We Don't Already Know

Readers alerted me to an article in the journal Cognition entitled Poor Writing, Not Specialized Concepts, Drives Processing Difficulty in Legal Language (here). It’s also the subject of this item in MIT News.

This article tells us nothing we don’t already know.

It’s by Eric Martinez, a recent law school graduate and licensed attorney who is now a graduate student in brain and cognitive sciences at MIT; Frank Mollica, a former visiting researcher at MIT who is now a lecturer in computational cognitive science at the University of Edinburgh; and Edward Gibson, an MIT professor of brain and cognitive sciences. Here’s the abstract:

Despite their ever-increasing presence in everyday life, contracts remain notoriously inaccessible to laypeople. Why? Here, a corpus analysis (n ≈10 million words) revealed that contracts contain startlingly high proportions of certain difficult-to-process features–including low-frequency jargon, center-embedded clauses (leading to long-distance syntactic dependencies), passive voice structures, and non-standard capitalization–relative to nine other baseline genres of written and spoken English. Two experiments (N=184) further revealed that excerpts containing these features were recalled and comprehended at lower rates than excerpts without these features, even for experienced readers, and that center-embedded clauses inhibited recall more-so than other features. These findings (a) undermine the specialized concepts account of legal theory, according to which law is a system built upon expert knowledge of technical concepts; (b) suggest such processing difficulties result largely from working-memory limitations imposed by long-distance syntactic dependencies (i.e., poor writing) as opposed to a mere lack of specialized legal knowledge; and (c) suggest editing out problematic features of legal texts would be tractable and beneficial for society at-large.

The authors compared a bunch of contracts to some Wall Street Journal articles, TV/movie scripts, spoken language, newspaper articles, blogs, magazine articles, and web pages. They say, “Our study provides the first large-scale systematic account of the presence of all of these features in legal texts, both overall and relative to a baseline.” From their findings, the authors conclude that contracts are poorly written and that society at large would benefit if contracts were easier to read.

But that adds nothing of value. We already know what contract prose looks like. I’ve chronicled in excruciating detail the features discussed in the article, and a lot more besides. I’ve done so because changing contract language requires hacking through the jungle, one contract usage at a time. In the MIT News item, Gibson says, “In this study, we’re documenting in detail what the problem is,” but the study does nothing of the sort. Instead, it offers a bird’s-eye view. A satellite’s view.

The data are of no particular value either, because there’s no prospect of crunching these numbers to derive anything other than what we already know. For one thing, even clear and concise contracts would likely rate as less easy to read than text written for general readers: contracts will always be more limited, more stylized, and more complicated. And you can’t use the data for comparison—it just says “worse.” If you want data about how easy or hard contracts are to read, a better option would be readability scores, but they too offer only the coarsest sort of assessment. I wrote about them in this 2006 blog post, and I haven’t used them since.

Here are a few additional thoughts:

The article says that contracts “are at once ubiquitous and impenetrable, read by virtually everyone yet understood by seemingly no one, except lawyers.” That’s a common misconception. In fact, far from being in command of traditional contract language, lawyers are among the bamboozled. And the article assumes that lawyers actually write contracts. In fact, they largely copy-and-paste them. It might be helpful to take these factors into account if you want to “understand why lawyers choose to write in such an esoteric manner in the first place.”

The article offers the following “before” and “after” example:

In the event that any payment or benefit by the Company (all such payments and benefits, including the payments and benefits under Section 3(a) hereof, being hereinafter referred to as the “Total Payments”), would be subject to excise tax, then the cash severance payments shall be reduced.
In the event that any payment or benefit by the Company would be subject to excise tax, then the cash severance payments shall be reduced. All payments and benefits by the Company shall hereinafter be referred to as the “Total Payments.” This includes the payments and benefits under Section 3(a) hereof.

Their “after” example is still a mess. That’s because the article looks at four features of contract prose, and the “before” example includes just one of them. The article doesn’t come close to capturing the dysfunction of contract language.

The “before” example includes a clumsy defined-term parenthetical creating what I call an “integrated definition.” I’ve seen it suggested that one piece of useful advice one could derive from the article is not to “center-embed” (to use the article’s jargon) defined-term parentheticals. That’s exactly the kind of advice you cannot derive from this article. Defined terms are a useful tool in contract drafting, and integrated definitions are a useful vehicle for creating defined terms, so it would be preposterous to dispense with them if they appear other than at the end of a sentence. It’s not a good idea to get advice on watchmaking from someone looking at the world through a telescope.

3 thoughts on “A Comparative Corpus Analysis Tells Us Nothing We Don’t Already Know”

Brian D. Day
11 March 2022 at 5:39 pm
I have found that the smuggest, most condescending people tend to be (i) atheists, (ii) vegans, and (iii) people who bemoan the current state of legal writing. I’m at least 1.75 of those things, but those people still irritate me.
Anecdotally, I’ve found that contract drafting has generally (if very gradually) improved over the past decade or so (probably coinciding with the publishing of the first edition of the MSCD). At the same time, more and more people are rending their garments over the state of contract writing. Where were these people in the 80s and early 90s, which seemed to be the high-water mark for deplorable contracts?
It seems to me that there are two general ways contracts go awry: (i) they are clumsy and difficult to read and (ii) they don’t say what they need to say. The first is irritating; the second is catastrophic. But most contract complainers focus almost solely on the first to the exclusion (if not detriment) of the second. I’ve read a lot of contracts that gloss over key principles in an effort to be more readable. If I had to choose, I’d pick enforceability over readability every day. This is why I like the MSCD, because it’s careful to address both, with probably most emphasis being on enforceability and avoiding ambiguity.
All that said, a lot of contracts still aesthetically suck. The biggest complaint that I have with contract readability is that the drafters often try to cram too many ideas into a single sentence. Fix that, and a lot of drafting problems will go away.
All of this is a long-winded way of saying that I agree with Ken’s assessment of the MIT article.
- Ken Adams
  11 March 2022 at 5:52 pm
  Phew! I thought you were going to end the first paragraph by saying “And Adams is the worst of the lot!”
AWrightBurkeMPhil
13 March 2022 at 8:36 pm
A few thoughts:
1/ Certain mentalities never go away, they just take new forms. Nineteenth-century grammarians (‘Never end a sentence with a preposition’) are now a certain kind of plain-language censors.
2/ There are no atheists. One worships either oneself or one or more other entities.
3/ It’s a temptation to think the world would be better off if people in other fields acted in the light of what I think my field has established. So cognitive science specialists think they know what contract drafters should do. Maybe they’re right, but we should all, especially scientists, be humble and open to correction by the facts. Not smug.
4/ I have long thought that the law, compared to, say, subatomic particle physics, contains nothing ordinary intelligence cannot apprehend.
5/ For example, the notion that the burden of harm should be borne by whoever caused the harm isn’t, um, rocket surgery. But its simplicity is often hidden by the need to use specialized language to work things out: duty of care, negligence, recklessness, actionable, delegation, presumption, immunity, qualified immunity, statute of limitations, affirmative defense, summary judgment, appellate issue, and perhaps one or two (hundred thousand) other terms.
6/ Even if a journey of a thousand miles consists of little steps, each may be vital. I think of contract language as those steps. Each should be taken carefully to help the parties achieve the goals for which they made the deal.
7/ Since human material progress depends in large part on the efficient use of resources, and peaceful voluntary commerce among people making and performing agreements may be the best way to achieve that, then developing and polishing language that makes agreements work more smoothly is a service to humanity, a contribution to human thriving.
8/ So maybe I should adjust my former opinion that Ken’s outstanding achievement is ‘the disciplined use of “shall”’. Maybe it’s devoting a career to removing sand from the gears of the division of labor. –Wright
