A Comparative Corpus Analysis Tells Us Nothing We Don’t Already Know

Readers alerted me to an article in the journal Cognition entitled “Poor Writing, Not Specialized Concepts, Drives Processing Difficulty in Legal Language” (here). It’s also the subject of this item in MIT News.

This article tells us nothing we don’t already know.

It’s by Eric Martinez, a recent law school graduate and licensed attorney who is now a graduate student in brain and cognitive sciences at MIT; Frank Mollica, a former visiting researcher at MIT who is now a lecturer in computational cognitive science at the University of Edinburgh; and Edward Gibson, an MIT professor of brain and cognitive sciences. Here’s the abstract:

Despite their ever-increasing presence in everyday life, contracts remain notoriously inaccessible to laypeople. Why? Here, a corpus analysis (n ≈10 million words) revealed that contracts contain startlingly high proportions of certain difficult-to-process features–including low-frequency jargon, center-embedded clauses (leading to long-distance syntactic dependencies), passive voice structures, and non-standard capitalization–relative to nine other baseline genres of written and spoken English. Two experiments (N=184) further revealed that excerpts containing these features were recalled and comprehended at lower rates than excerpts without these features, even for experienced readers, and that center-embedded clauses inhibited recall more-so than other features. These findings (a) undermine the specialized concepts account of legal theory, according to which law is a system built upon expert knowledge of technical concepts; (b) suggest such processing difficulties result largely from working-memory limitations imposed by long-distance syntactic dependencies (i.e., poor writing) as opposed to a mere lack of specialized legal knowledge; and (c) suggest editing out problematic features of legal texts would be tractable and beneficial for society at-large.

The authors compared a bunch of contracts to some Wall Street Journal articles, TV/movie scripts, spoken language, newspaper articles, blogs, magazine articles, and web pages. They say, “Our study provides the first large-scale systematic account of the presence of all of these features in legal texts, both overall and relative to a baseline.” From their findings, the authors conclude that contracts are poorly written and that society at large would benefit if contracts were easier to read.

But that conclusion adds nothing of value. We already know what contract prose looks like. I’ve chronicled in excruciating detail the features discussed in the article, and a lot more besides. I’ve done so because changing contract language requires hacking through the jungle, one contract usage at a time. In the MIT News item, Gibson says, “In this study, we’re documenting in detail what the problem is,” but the study does nothing of the sort. Instead, it offers a bird’s-eye view. A satellite’s view.

The data are of no particular value either, because there’s no prospect of crunching these numbers to derive anything other than what we already know. For one thing, even clear and concise contracts would likely rate as less easy to read than text written for general readers: contracts will always be more limited, more stylized, and more complicated. And you can’t use the data to compare one contract with another. All the data say is “worse.” If you want data about how easy or hard contracts are to read, a better option would be readability scores, but they too offer only the coarsest sort of assessment. I wrote about them in this 2006 blog post, and I haven’t used them since.

Here are a few additional thoughts:

The article says that contracts “are at once ubiquitous and impenetrable, read by virtually everyone yet understood by seemingly no one, except lawyers.” That’s a common misconception. In fact, instead of lawyers being in command of traditional language, they’re among the bamboozled. And the article assumes that lawyers actually write contracts. In fact, they largely copy-and-paste them. It might be helpful to take these factors into account if you want to “understand why lawyers choose to write in such an esoteric manner in the first place.”

The article offers the following “before” and “after” example:

In the event that any payment or benefit by the Company (all such payments and benefits, including the payments and benefits under Section 3(a) hereof, being hereinafter referred to as the “Total Payments”), would be subject to excise tax, then the cash severance payments shall be reduced.

In the event that any payment or benefit by the Company would be subject to excise tax, then the cash severance payments shall be reduced. All payments and benefits by the Company shall hereinafter be referred to as the “Total Payments.” This includes the payments and benefits under Section 3(a) hereof.

Their “after” example is still a mess. That’s because the article looks at four features of contract prose, and the “before” example includes just one of them. The article doesn’t come close to capturing the dysfunction of contract language.

The “before” example includes a clumsy defined-term parenthetical creating what I call an “integrated definition.” I’ve seen it suggested that one piece of useful advice one could derive from the article is not to “center-embed” (to use the article’s jargon) defined-term parentheticals. That’s exactly the kind of advice you cannot derive from this article. Defined terms are a useful tool in contract drafting, and integrated definitions are a useful vehicle for creating defined terms, so it would be preposterous to dispense with them if they appear other than at the end of a sentence. It’s not a good idea to get advice on watchmaking from someone looking at the world through a telescope.

