Using Topic-Modeling in Legal History, with
an Application to Pre-Industrial English Case
Law on Finance

PETER GRAJZL AND PETER MURRELL

The last few decades have seen the ever-increasing importance of quan-
titative empirical methods in historical studies in general, and in economic
history in particular. However, these methods have made few inroads into
pre-twentieth-century, and especially pre-industrial, legal history, despite
the central place of law in the history of world economic development.1

Law and History Review May 2022, Vol. 40, No. 2
© The Author(s), 2022. Published by Cambridge University Press on behalf of the American Society for Legal
History. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence
(https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction
in any medium, provided the original work is properly cited.
doi:10.1017/S0738248022000153

Peter Grajzl is in the Department of Economics, The Williams School of
Commerce, Economics, and Politics, Washington and Lee University,
Lexington, Virginia, and CESifo, Munich, Germany <grajzlp@wlu.edu>. Peter
Murrell is in the Department of Economics, University of Maryland, College
Park, Maryland <pmurrell@umd.edu>. They thank two referees for their insightful
comments, and especially the editor, Gautham Rao, for his encouragement and
insights. They are also grateful to Jonathan Gong for research assistance and to
participants at the AI4Law workshop in February 2021 at the University of
Oxford for helpful comments.

1. S. Robertson, “Searching for Anglo-American Digital Legal History,” Law and History
Review 34 (2016): 1047–69, noting that “as the fields of digital humanities and digital his-
tory have grown in scale and visibility since the 1990s, legal history has largely remained on
the margins of those fields.” There are some important very recent examples for the nine-
teenth century, such as K. Funk and L.A. Mullen, “The Spine of American Law: Digital
Text Analysis and U.S. Legal Practice,” American Historical Review 123 (2018): 132–64.
In recent years, a number of empirical articles have appeared that use data from the eighteenth
century made available by the Old Bailey Proceedings project. See T. Hitchcock,
R. Shoemaker, C. Emsley, S. Howard, and J. McLaughlin, “The Proceedings of the Old
Bailey, 1674-1913” www.oldbaileyonline.org (accessed April 2021). Both of these works
rely on the types of computational advances that we highlight in this article and that we
feel will lead to a quiet revolution in legal historical studies. Existing, more traditional stud-
ies on the period before the nineteenth century usually contain very small samples or few
variables, implying that there is a limited ability to apply the types of empirical methods
that are now commonplace in economic history. E. Cavell, “The Measure of Her Actions:
A Quantitative Assessment of Anglo-Jewish Women’s Litigation at the Exchequer of the
Jews, 1219-81,” Law and History Review 39 (2021): 135–72 provides a recent example

https://doi.org/10.1017/S0738248022000153 Published online by Cambridge University Press



No doubt the relative absence of such quantitative legal history is because
the legal record is mostly in words, the processing of which requires com-
putational power that is orders of magnitude beyond that needed for num-
bers. However, with huge increases in computer power in recent years and
the associated development of desktop text-analyzing software, the menu
of research methods and results available to all legal historians is now rap-
idly changing. Text can be processed and analyzed as quickly and easily as
numbers were two decades ago.2 Libraries of readily usable computational
packages are available for the statistical analysis of texts. We now have the
possibility of using the text of centuries ago as data.3

The objective of the present article is to convey to traditional legal his-
torians the role that these new computational techniques can play in legal-
historical research. We do so by presenting an example of the types of
results that can be produced with these new tools.4 As we present the
example, we outline the steps that must be taken in the computational-
statistical process. But our presentation does not require readers to be con-
versant with the intricacies of such methods. We provide verbal, intuitive
descriptions of the methods used and the tasks that must be accomplished.

of a very interesting exercise in early legal history that is, understandably, limited by a small
sample with few variables. D. Klerman, “Settlement and the Decline of Private Prosecution
in Thirteenth-Century England,” Law and History Review 19 (2001): 1–65 is notable in pro-
viding a very early example of pre-industrial legal history that is exceptional for the central-
ity of empirical methods in its contribution.
2. The general problem is usefully captured as “How do you write a national history that
was the product of lawmaking in 50 separate jurisdictions?” as posited by E. Nystrom and
D. Tanenhaus, “The Future of Digital Legal History: No Magic, No Silver Bullets,”
American Journal of Legal History 56 (2016): 150–67. This problem is multiplied in case
law where one is studying hundreds of years and thousands of cases. The methods we
describe in this article almost completely remove the sample-size and limited-observations
constraint referred to in the previous footnote. Notably, the general project that includes
the current article did not rely on any extramural funding, emphasizing that the techniques
we describe are within the reach of all scholars.
3. See, for example, J. Grimmer and B. M. Stewart, “Text as Data: The Promise and Pitfalls
of Automatic Content Analysis Methods for Political Texts,” Political Analysis 21 (2013):
267–97; M. Gentzkow, B. Kelly, and M. Taddy, “Text as Data,” Journal of Economic Literature 57
(2019): 535–74; and M.A. Livermore and D. N. Rockmore, eds., Law as Data: Computation,
Text, and the Future of Legal Analysis (Santa Fe: SFI Press, 2019).
4. In legal history, two early examples of the use of the new sets of computational tools
are provided by D. Tanenhaus and E. Nystrom, “Let’s Change the Law: Arkansas and the
Puzzle of Juvenile Justice Reform in the 1990s,” Law and History Review 34 (2016):
957–97; and C. Romney, “Using Vector Space Models to Understand the Circulation of
Habeas Corpus in Hawai’i, 1852–92,” Law and History Review 34 (2016): 999–1026. In
contrast to the exercise reported in this article, these two examples do not use the computa-
tional methods to drive an empirical exercise but rather use these methods as search proce-
dures to find those legal materials on which a more traditional analysis should be focused.



Our view is that what we offer in this article will be instructive for scholars
currently using traditional legal-historical approaches. If the past two
decades of methodological developments in the humanities, the social sciences,
and law teach us anything, it is that the new computational methods will,
more or less inevitably, become a part of the toolkit of legal history.

Importantly, we do not argue that the new computational approach will
replace existing methods. In fact, we do the opposite. As we present our exam-
ple, we detail many instances in which our use of the existing work of tradi-
tional legal historians has played an absolutely vital role in our ability to
produce any novel insights from our application of the new tools. Thus, through
the use of example, we hope to show how traditional and computational legal
history can complement each other as the field of legal history moves into the
new age in which the use of computational methods will become standard. In
doing so, we are also able to pinpoint where each of the traditional and
computational approaches to legal history has its comparative advantage.

To make this article more accessible to those unfamiliar with any of these
new methods, we focus on only one, topic-modeling, which indeed is one of
the most popular machine-learning techniques that has been applied in his-
tory, law, and social science.5 Concentrating on one method allows us to
focus on the essential characteristics of machine-learning and to discuss
them in intuitive, non-technical ways, addressing our exposition not to

5. On the popularity of topic-modeling, see Gentzkow et al., “Text as Data” and J. Guldi
and B. Williams “Synthesis and Large-Scale Textual Corpora: A Nested Topic Model of
Britain’s Debates over Landed Property in the Nineteenth Century,” Current Research in
Digital History 1 (2018), https://doi.org/10.31835/crdh.2018.01 (accessed April 2021).
Several other machine-learning and related computational approaches have been utilized to
investigate law-as-data. Machine-learning methods have been used, for example, to predict
court outcomes; see for example, D.M. Katz, M.J. Bommarito, and J. Blackman, “A
General Approach for Predicting the Behavior of the Supreme Court of the United States,”
PLoS One 12 (2017): e0174698. Word and document embedding models represent words
and documents as numerical scores for a long list of variables, thereby helping to quantify
the meaning of words and documents on the basis of their proximity to other words and doc-
uments in the corpus; see, for example, E. Ash and D.L. Chen, “Case Vectors: Spatial
Representations of the Law Using Document Embeddings,” in Law as Data, ed. M.A.
Livermore and D.N. Rockmore (Santa Fe: SFI Press, 2019), 313–37. Embedding approaches
have been employed, for example, to investigate the presence of racial bias in judicial opin-
ions; see, for example, D. Rice, J.H. Rhodes, and T. Nteta, “Racial Bias in Legal
Language,” Research & Politics April-June (2019), 1–7. For an overview of the use of
machine-learning and computational methods in the emerging research field of computational
analysis of law-as-data, see J. Frankenreiter and M.A. Livermore, “Computational Methods in
Legal Analysis,” Annual Review of Law and Social Science 16 (2020): 39–57. For innovative
applications of computational methods to legal-historical themes, but not focusing on English
case law, see, for example, S. Klingenstein, T. Hitchcock, and S. DeDeo, “The Civilizing
Process in London’s Old Bailey,” Proceedings of the National Academy of Sciences 111
(2014): 9419–24 and Funk and Mullen, “The Spine of American Law”.



those who want to learn the details of the computational analysis but rather to
those who want to understand the types of insights that computational-text-
analysis can bring to substantive domain-specific research. As Grimmer,
Roberts, and Stewart argue, “machine learning is as much a culture defined
by a distinct set of values and tools as it is a set of algorithms.”6

This article is written by economists. One additional impetus for writing
it was our observation that the mainstream economics literature has tended
to ignore the insights of historians of case law, while traditional case law
historians hardly refer to the
methods and findings of economists.7 Perhaps this is because economists
are more moved by quantitative evidence, which is not easily found in
the history of case law. This article is an attempt to straddle the two fields,
to show that there can be strong complementarities between them.

To illustrate the power of topic-modeling for legal history, we provide
new quantitative information on developments in English case law and
legal ideas from the mid-sixteenth century to the mid-eighteenth century.
Thus, central to our approach in this article is showing the usefulness of
the computational methods by providing an example of their application
to ongoing debates in legal history. In contrast to much existing work in
digital history, we do not argue for the productiveness of the computational
methods by focusing on the methods themselves. Rather, we endeavor to
make the case by providing an example of the contribution of the methods
to an understanding of the past that is directly relevant to the disciplines of
legal history and economics.8

This era of English law has been of particular interest to both legal histo-
rians and economists, for related reasons: for the former because much law

6. J. Grimmer, M. E. Roberts, and B. Stewart, “Machine Learning for Social Science: An
Agnostic Approach”, Annual Review of Political Science 24 (2021): 395–419.
7. R. Harris, “The Encounters of Economic History and Legal History,” Law and History
Review 21 (2003): 297–346 identified this separation of these fields, and his conclusions still
seem to apply today.
8. On these points more generally, see S. Robertson and L. Mullen, “Arguing with Digital
History: Patterns of Historical Interpretation,” Journal of Social History 54 (2021): 1005–22,
who argue that “Digital history has only rarely contributed interpretative or argumentative
scholarship that contributes to disciplinary understandings of the past,” largely because of
its focus on the methodological. Beyond the example appearing here, further uses of the methods
outlined in this article and of the data set discussed here are provided in several additional
articles that contribute to disciplinary understandings of the past: see P. Grajzl and
P. Murrell, “A Machine-Learning History of English Caselaw and Legal Ideas Prior to
the Industrial Revolution II: Applications,” Journal of Institutional Economics 17 (2021):
201–16; P. Grajzl and P. Murrell, “A Macrohistory of Legal Evolution and Coevolution:
Property, Procedure, and Contract in Pre-Industrial English Caselaw,” https://dx.doi.org/10.2139/ssrn.4005612;
and P. Grajzl and P. Murrell, “Of Families and Inheritance: Law and
Development in Pre-Industrial England,” https://dx.doi.org/10.2139/ssrn.3975015.



relevant to the modern era was created then; for the latter because of the pos-
sible connection between legal developments and the rise of Britain as the
first industrial power. In particular, the progress of the financial sector in
the decades preceding the Industrial Revolution has received much attention
in economics. However, the work of economists on pre-industrial finance
has placed little emphasis on case law, which for many is the defining char-
acteristic of the English legal family. We show how topic-modeling can use
the case law record to cast new light on the patterns and sources of
finance-related legal developments in England from the middle of the six-
teenth century to the Industrial Revolution. In doing so, we find invaluable
the accumulated insights of legal historians, echoing the views of users of
topic models in other fields who emphasize how the traditional “close” read-
ing of texts must be used alongside the “distant” reading provided by
machine-learning.9 The outputs generated on the basis of the new methods
are the complements of traditional legal-historical research. The results from
topic-modeling are not replacements for the detailed, and immensely valu-
able, contextual analysis of traditional legal historians, but instead simply
offer a different sort of lens for studying legal-historical phenomena.

The focus is on the use of the quantitative output produced by one existing
topic-modeling exercise, that of Grajzl and Murrell, henceforth referred
to as GM.10,11 By building on an existing implementation of topic-

9. The useful distinction between close and distant reading arose among scholars of liter-
ature in what has become known as the “digital humanities,” in which debates about the use-
fulness of computational methods, particularly topic-modeling, were both early and very
spirited. For the digital humanities, see, for example, A. Goldstone and T. Underwood,
“The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could
Tell Us,” New Literary History 45 (2014): 359–84. For history, see the very early study by
S. Block “Doing More with Digitization: An Introduction to Topic Modeling of Early
American Sources,” Commonplace 6 (2006), http://commonplace.online/article/doing-more-with-digitization/,
accessed April 2006 and more recently J. Guldi, “Critical Search:
A Procedure for Guided Reading in Large-Scale Textual Corpora,” Journal of Cultural
Analytics 3 (2018), https://doi.org/10.22148/16.030, (accessed April 2021). For the same
emphasis in political science, see Grimmer and Stewart, “Text as Data” and, in a joint product
of a sociologist and two computer scientists, P. DiMaggio, M. Nag, and D. Blei, “Exploiting
Affinities Between Topic Modeling and the Sociological Perspective on Culture: Application
to Newspaper Coverage of U.S. Government Arts Funding,” Poetics 41 (2013): 570–606.
For legal history, see Robertson, “Searching”.
10. See P. Grajzl and P. Murrell, “A Machine-Learning History of English Caselaw and
Legal Ideas Prior to the Industrial Revolution I: Generating and Interpreting the Estimates,”
Journal of Institutional Economics 17 (2021): 1–19; and P. Grajzl and P. Murrell, “A
Machine-Learning History of English Caselaw and Legal Ideas Prior to the Industrial
Revolution II: Applications,” Journal of Institutional Economics 17 (2021): 201–16.
11. The work on English case reports is part of a much larger project on using computational
and statistical techniques to understand English history. The earliest products of this
project combined legal history and intellectual history, with two articles addressing



modeling, this article can omit descriptions of the technical nuances and
the details of data construction, making it accessible to a wider audience.
We do, however, provide an intuitive description of the methods used to
generate the raw quantitative output of the topic model, a description
that is intended to be accessible to those not versed in the details of
computational-statistical modeling.

We then use that intuitive description of topic modeling to describe its
data outputs. Importantly, the outputs of a topic model are not the endpoint
of such an exercise. Rather, they constitute data that can be productively
employed as an input into subsequent analyses. Thus, we turn to examples
of the substantive insights that can be generated from the data set produced
by the machine learning. We highlight the types of information that can be
generated and made readily available to other scholars. That information
can be easily used by those who have no intention of implementing the
methods themselves but rather are interested in the types of substantive
results that can be generated by the data that are the output of a topic
model.

Section I presents the informal overview of topic-modeling. It begins
with a brief history of how this tool has been used in the humanities,
law, and the social sciences, showing that the particular exercise that this
article presents is a natural outgrowth of two decades of development
and application of topic-modeling. This short history argues that topic-
modeling should not be regarded as immediately alien to legal history in
view of the fact that it has been applied in fields whose objects of study
share many features with the history of the law.

Then, we proceed with a non-technical discussion of the assumptions,
methods, and outputs of topic-modeling. This informal overview has the
advantage that it lays bare the types of assumptions about texts that
machine-learning uses, so that the weaknesses of the new approaches
can be clearly seen.

The raw data used for the topic model discussed here are virtually all
reports on cases heard before 1765 that appear in the English Reports, a
corpus comprising 52,949 reports.12 Topic-modeling produces parsimoni-
ous summaries of this enormous amount of text information, which

understanding more general sets of ideas, focused on Francis Bacon and Edward Coke. See
P. Grajzl and P. Murrell, “Toward Understanding 17th Century English Culture: A Structural
Topic Model of Francis Bacon’s Ideas,” Journal of Comparative Economics 47 (2019): 111–
35; and P. Grajzl and P. Murrell, “Characterizing a Legal-Intellectual Culture: Bacon, Coke,
and Seventeenth-Century England,” Cliometrica 15 (2021): 43–88.
12. The digitized copies of the English Reports were purchased from a publishing company
domiciled in South Africa. It is beyond the scope of this article to provide the many
details of the initial processing of these digital copies, and the cleaning of them. Suffice it



comprises 31,057,596 words. Because topic-modeling is an unsupervised
machine-learning technique, the shape of the summaries themselves is
not produced in order to answer a particular question or to test a particular
hypothesis. Rather, the text-data themselves shape their own synopsis.

The summaries are in the form of 100 “topics,” as if the computational
methods had produced a new digest of English law, divided into 100 sections.
This is the “dimensionality reduction” aspect of machine-learning, producing
an organized summary of an enormous amount of text that no human being
could possibly hope to read (or at least retain and organize in memory).13 In
this respect, topic-modeling dovetails with one of the central concerns of his-
torians: to provide compelling narratives. The computer is essential to the
production of the narrative because so much information is captured and con-
densed. This is especially the case when the attempt is to capture ebbs and
flows over centuries: Guldi and Armitage emphasize the potential in big
data to return historical studies to the longue durée.14

In the case of topic-modeling, the computer output itself is only the
beginning, and much interpretation is needed. The sections of the digest
come without names; one just knows which case reports feature a particular
digest section most prominently and which vocabulary that section most
favors. The detailed work of legal historians over the centuries then pro-
vides the background for analysis of this information, enabling the
researcher to understand which areas of law a particular section of the
digest contains, thereby driving the crucial step of topic naming. Close

to say that a very large proportion of GM’s labor time devoted to the pertinent research pro-
jects was consumed in all of these tasks. For more details, see GM and M. Schmidt,
“Institutional Persistence and Change in England’s Common Law: 1700-1865” (PhD
diss., University of Maryland, 2015). Because a central objective of GM was to include
as many reports as possible, which necessarily implied computational processing of all
reports, there was a need to exclude a small percentage of reports with too many words
that did not have a counterpart in either modern English or standard Latin. Chiefly, this
had the effect of excluding reports in Law French. There is no doubt that this is a blemish
on the application of the computational methods. Initially, there were 60,249 pre-1765
reports in the data set, but 6,917 were dropped because they were in Law French and a fur-
ther 383 were removed because they contained too many unrecognizable words. This left
52,949 reports.
13. The summary does not rely on any existing classifications: we return to this point in
the conclusion.
14. See J. Guldi and D. Armitage, The History Manifesto (Cambridge: Cambridge
University Press, 2014), emphasizing that “Over the last decade, the emergence of the digital
humanities as a field has meant that a range of tools are within the grasp of anyone, scholar
or citizen, who wants to try their hand at making sense of long stretches of time. Topic mod-
eling software can machine read through millions of government or scientific reports and
give back some basic facts about how our interest in ideas have changed over decades
and centuries.”



reading undertaken in the context of traditional legal history is essential to
interpret the output of the computer’s distant reading.
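
The information available at the topic-naming step can be illustrated with a minimal sketch. For one unnamed topic, it lists the words the topic favors most and the reports in which the topic is most prominent. Every word, report label, and number below is hypothetical and chosen only for illustration; none is drawn from GM's estimates.

```python
# Toy illustration only: all words, labels, and numbers are hypothetical.

# Probability that each unnamed topic assigns to a few vocabulary words.
topic_word = {
    0: {"promise": 0.30, "assumpsit": 0.25, "consideration": 0.20, "trust": 0.05},
    1: {"trust": 0.35, "trustee": 0.25, "estate": 0.20, "promise": 0.02},
}

# Share of each report devoted to each topic (each row sums to 1).
report_topic = {
    "report_A": {0: 0.80, 1: 0.20},
    "report_B": {0: 0.10, 1: 0.90},
    "report_C": {0: 0.55, 1: 0.45},
}


def top_words(topic, n=3):
    """Return the n words to which the topic gives the highest probability."""
    weights = topic_word[topic]
    return sorted(weights, key=weights.get, reverse=True)[:n]


def top_reports(topic, n=2):
    """Return the n reports in which the topic has the largest share."""
    return sorted(report_topic, key=lambda r: report_topic[r][topic], reverse=True)[:n]


print(top_words(0))
print(top_reports(0))
```

A researcher who sees the vocabulary and the leading reports for topic 0 might, after reading those reports closely, name it something like "assumpsit." That naming decision rests on legal-historical knowledge, not on the computation.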

Importantly, the naming of topics can be done by any researcher who
has obtained the data produced by the topic model: it is undertaken
quite separately from the computational analysis. This possibility for the
sharing of the generated data is one of the most important contributions
that the new methods can offer to legal history and legal-historical
research: the results that are the output of the topic-modeling exercise
can be made available and used as inputs by all researchers.

Once the underlying nature of each topic is understood, the researcher
can then proceed to analyze the vast amount of quantitative information
produced by the computational methods. This article provides an example
of how such information can be used: its input data are the output data of
GM and we use those data to present new results and provide insights that
any readers could have produced had they availed themselves of the same
data.

Each of Sections III, IV, and V is built around just one evocative figure
intended to provide an interpretation of the development of case law and
legal ideas relevant to finance in pre-industrial England. The origins of
our interest in these developments lie in our background as economists.
In Section II, we review the debates that have made the history of
English law on finance important in that discipline, and explain how the
combination of machine-learning methods and the prior insights of legal
historians offers new information pertinent to those debates. We ask and
answer the following questions. Which time periods evidence the most
intense development of that area of law and legal ideas? Which pre-
existing elements of law, such as property or contract, were most important
as inputs into this development, and when? What were the relative roles of
common law and equity in spurring these developments?

Section III introduces the 15 of the 100 GM-estimated topics that are
most relevant to finance, the most salient sections of the machine-produced
digest. These fifteen topics were identified by the authors on the basis of
topic content, and therefore the overall category of finance is not an entity
produced by the topic modeling itself. This is just one of the many exam-
ples we provide in this article of the fact that the modeling of the topics
themselves is not the endpoint of the analysis, but rather provides the
data that the researcher uses, in combination with existing information
and judgment, to proceed to real substance.

The periods of the most intense development of the relevant case law
become evident by examining timelines that show when these fifteen topics
are most prevalent in the English Reports. For example, the timelines show
what will be very familiar to legal historians: that ideas on assumpsit



developed in the early seventeenth century. But the timelines can add to
these insights by demonstrating that attention to assumpsit peaked around
1630, while the development of ideas on the validity of contracts, for
example, was largely a product of the 1690s and later. Cumulatively, our
timelines of the finance-related areas of law suggest that the seventeenth
century witnessed many advances in case law that became relevant to
eighteenth-century finance.
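
As a rough sketch of how such a timeline might be built, the example below averages one topic's share across all reports dated to the same year and then finds the year in which that average peaks. The years and shares are invented for illustration, and the plain yearly mean is our simplifying assumption here; it is not necessarily the procedure behind the timelines reported in Section III.

```python
from collections import defaultdict

# Hypothetical (year, share of the report devoted to one topic) pairs.
reports = [(1629, 0.40), (1629, 0.20), (1630, 0.50), (1630, 0.70), (1695, 0.10)]


def yearly_prevalence(rows):
    """Average the topic's share over all reports dated to each year."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, share in rows:
        totals[year] += share
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


timeline = yearly_prevalence(reports)
# The year with the highest average share marks the peak of attention.
peak_year = max(timeline, key=timeline.get)
print(timeline)
print(peak_year)
```

With real data, years containing few reports would give noisy averages, so some smoothing across neighboring years would probably be needed before reading off a peak.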

Section IV considers connections among the developments that take place
in differing areas of law. Any case report usually incorporates ideas from var-
ied legal domains even if one specific issue is central to the case: a single case
is indexed within many sections of the digest. The topic-modeling produces
data on the proportion of each of the 100 estimated topics that is present in
each of the 52,949 reports of cases. Thus, one can find, for example, whether
a case that is very much centered on trusts tends to emphasize contract issues
or property considerations. By examining such connections in general, one
can make conclusions about the legal ideas in one domain that were relevant
to, and possibly fed into, the legal ideas in another domain. In Section IV, we
identify the links among the fifteen topics (the digest sections) identified with
finance, as well as links between these fifteen topics and ones not classified
within finance. We find, for example, that early-seventeenth-century developments
in the case law of contracts had significant effects on later developments
in case law relevant to finance.
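
The kind of question posed above, whether reports centered on trusts lean toward contract or toward property, can be sketched with a toy table of topic proportions. The report labels, the three topic names, the shares, and the 0.5 cutoff for calling a report "centered" on a topic are all hypothetical assumptions made for illustration. Section IV does not necessarily measure links in this way.

```python
# Hypothetical topic shares for four reports; each row sums to 1.
shares = {
    "r1": {"trusts": 0.6, "contract": 0.3, "property": 0.1},
    "r2": {"trusts": 0.5, "contract": 0.2, "property": 0.3},
    "r3": {"trusts": 0.1, "contract": 0.7, "property": 0.2},
    "r4": {"trusts": 0.7, "contract": 0.2, "property": 0.1},
}


def companions(focus, threshold=0.5):
    """Average share of every other topic in reports centered on `focus`.

    A report counts as centered on `focus` when that topic's share is at
    least `threshold` (an illustrative cutoff, not an estimated quantity).
    """
    centered = [row for row in shares.values() if row[focus] >= threshold]
    if not centered:
        return {}
    others = [topic for topic in centered[0] if topic != focus]
    return {t: sum(row[t] for row in centered) / len(centered) for t in others}


result = companions("trusts")
print(result)
```

In this toy table, the reports centered on trusts devote more of their remaining text to contract than to property, which is the sort of pattern one would then examine against the legal-historical literature.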

Section V examines the relative importance of common law and equity
in producing law relevant to finance. Although case reports are unambig-
uously assignable to courts and although specific legal notions were
often the particular province of either common law or equity, each type
of court absorbed ideas from the other. For example, a case on trusts in
Chancery (an equity court) could well use ideas on contracts developed
in Common Pleas or King’s Bench (common-law courts). The develop-
ment of ideas in a given legal domain can then be ultimately viewed as
reflecting debates in both common law and equity. We examine the relative
importance of law and equity for each topic related to finance.
Interestingly, our evidence shows that many of the critical areas of law
on finance were a product of equity, and not of the common law. To
state the implied conclusion in its most contentious form, Britain might
never have been economically powerful enough to spread its common
law around the world had it relied solely on the common law at the time
that it began spreading its system of law around the world.

Section VI concludes, providing reflections on both the promise of com-
putational text analysis for legal history and its pitfalls. We comment on
what topic-modeling can and cannot do. We emphasize that topic-
modeling can provide new sources of data for other researchers: once a



massive volume of texts is summarized, the quantitative summaries themselves
can provide inputs into further research. Peering into the future, one
can detect signs that unsupervised machine-learning might be gradually
changing the research perspectives of social science, with descriptive anal-
yses now becoming more acceptable. The almost exclusive emphasis on
the hypothetico-deductive method is waning (very slightly at the moment)
and exercises in the inductive spirit are gaining credibility. This change
would naturally lead to much more complementarity between traditional
legal historians and those who favor the use of computational and statistical
methods in the social sciences.

I. An Introduction to Topic-Modeling

The techniques that we describe here are descendants of the seminal paper
by Blei, Ng, and Jordan,15 particularly the structural topic model by
Roberts, Stewart, and Airoldi, which is the version of topic-modeling
used by GM to produce their results.16 Topic-modeling originated in com-
puter science, in pursuit of using computational methods to summarize

15. D.M. Blei, A.Y. Ng, and M.I. Jordan, “Latent Dirichlet Allocation,” Journal of
Machine Learning Research 3 (2003): 993–1022. One measure of the prominence of this
contribution is that this is the seventh most cited article in computer science that was pro-
duced this millennium (https://citeseerx.ist.psu.edu/stats/articles). To be sure, there were a
number of similar algorithms developed before the Blei et al. contribution, but early excite-
ment about these methods seems to have focused on Blei et al., perhaps because of the acces-
sible software developed for implementation. See A.K. McCallum, “MALLET: A Machine
Learning for Language Toolkit,” (http://mallet.cs.umass.edu, accessed April 2021). In their
note explaining this software, S. Graham, S. Weingart, and I. Milligan, “Getting Started with
Topic Modeling and MALLET,” (https://programminghistorian.org/en/lessons/topic-modeling-and-mallet,
accessed April 2021), state: “You will sometimes come across the term
‘LDA’ when looking into the bibliography of topic modeling. LDA and Topic Model are
often used synonymously, but the LDA technique is actually a special case of topic model-
ing created by David Blei and friends. . . . It was not the first technique now considered topic
modeling, but it is by far the most popular. . .They all work in much the same way.” One
such earlier algorithm was used in a study by Newman and Block in the first history publication
to use topic-modeling. See D. J. Newman and S. Block, “Probabilistic Topic
Decomposition of an Eighteenth-Century American Newspaper,” Journal of the American
Society for Information Science and Technology 57 (2006): 753–67.
16. See M.E. Roberts, B.M. Stewart, and E.M. Airoldi, “A Model of Text for
Experimentation in the Social Sciences,” Journal of the American Statistical Association
111 (2016): 988–1003, whose general approach is very similar to that of Blei et al., but
has an emphasis on incorporating document meta-information (such as date of publication)
directly into the analysis. Small details would have changed had we used LDA, but we are
sure the overall picture would have remained the same. For copious detail on the
structural-topic-model approach to topic-modeling, including how to get started on imple-
mentation, see https://www.structuraltopicmodel.com (accessed June 2019).

large amounts of text information. Within the social sciences and human-
ities, the field in which topic-modeling first flourished was the digital
humanities, particularly literature, obviously a field for which text is central.
In that discipline, the rise in popularity was probably fueled by forceful
claims about the advantages of distant reading over traditional
close reading, and by the ensuing debates. With text, rather than numbers, providing
much of the core data in politics, political science was the next major
discipline to see the advantages of the new machine-learning approaches,
particularly topic-modeling. It is much more difficult to find applications
in political theory, which is perhaps the closest analog in political science
to case law.17 Political science was naturally followed by law, also presum-
ably because much of its data are texts, but legal history has been slow to
follow. Digital humanities, political science, and law seem to be the three
major non-computational-science disciplines in which applications using
topic-modeling, and related techniques, appear regularly in the top journals
and are cited regularly within the mainstream of the field.
Economics and history, particularly legal history, the disciplines
reflected in this article, are ones in which the application of topic models
has lagged. In economics, this is readily explained by the enormous influ-
ence of the hypothetico-deductive paradigm, with its emphasis on the test-
ing of hypotheses concerning isolated causal facts rather than an interest in
broad narrative.18 The uses of topic-modeling in economics most usually
focus on new measurements of highly specific phenomena, to fit into a par-
ticular implementation of that paradigm.19 Our use of topic-modeling is
therefore rather different from the few applications in the mainstream of
our field: our objective is to provide a broad narrative of finance-related
English case law over two centuries. To the extent that we match our
data to specific hypotheses, it is because we came to realize after the con-
struction of our narrative how our narrative naturally reflected on these
hypotheses, not because we aimed originally to test them.

17. On this point, see H. Bonin, “From Antagonist to Protagonist: ‘Democracy’ and ‘peo-
ple’ in British Parliamentary Debates, 1775–1885,” Digital Scholarship in the Humanities
35 (2020): 759–75. One example using a topic-model-type method is L. Blaydes,
J. Grimmer, and A. McQueen, “Mirrors for Princes and Sultans: Advice on the Art of
Governance in the Medieval Christian and Islamic Worlds,” Journal of Politics 80
(2018): 1150–67.
18. Applications in sociology have also lagged, perhaps because the hypothetico-deductive
method has had increasing sway in that field as well. For the lag in sociology,
see N.C. Lindstedt, “Structural Topic Modeling for Social Scientists: A Brief Case Study
with Social Movement Studies Literature, 2005–2017,” Social Currents 6 (2019): 307–18.
19. For example, S. Hansen and M. McMahon, “Shocking Language: Understanding the
Macroeconomic Effects of Central Bank Communication,” Journal of International
Economics 99 (2016): S114–S133.

The reason for history’s lag in applying machine-learning in general, and
topic-modeling in particular, is less clear, to us at least.20 As already men-
tioned, topic-modeling leads naturally to a historical narrative. But Guldi
and Armitage, who emphasize this point also, argue that research in history
has turned away from exercises that examine long time periods and expan-
sive subjects, exactly the areas in which machine-learning can contribute.
The high degree of technical complexity in existing applications of topic-
modeling to history might also have discouraged some researchers.21

However, as we hope to show in this article, researchers interested in
using the output of topic models do not themselves have to engage in all
the complexities of producing topic model estimates. If that output is freely
available to all, as is the case with GM, it is enough for subsequent
researchers to understand how to interpret that output when using it as a
source of data for further exploration. An analogy is helpful
here. Economic historians using estimates of national income are not
required to produce those estimates themselves, or even to grasp all the
complexities of data gathering and index number construction. As we
will show, by example, the output data of topic-modeling can be used in
an exactly analogous way as input data for further exercises.

A. The Topic Model

The algorithms producing topic-modeling estimates begin with a conceptu-
alization of the process of document (in our context, case report) genera-
tion that is extremely crude, but lends itself to formalization in a
statistical model. It is the explicitness of the conceptualization that facili-
tates interpretation of the results of the analysis, producing the insights
that legal historians might appreciate. Such an interpretation is often not
possible with the results of other machine-learning techniques, such as neu-
ral networks, in which the focus is on prediction or problem-solving, rather

20. Stephen Robertson emphasizes that text-analysis in history has been held back simply by
the limited availability of a large stock of digital texts. S. Robertson, “The Differences between Digital
Humanities and Digital History,” Debates in Digital Humanities (2016)
(https://dhdebates.gc.cuny.edu/read/untitled/section/ed4a1145-7044-42e9-a898-5ff8691b6628#ch25m, accessed
March 2021). This constraint is rapidly being relaxed. Indeed, one of the contributions of
GM is to make machine readable, cleaned versions of the English Reports available for scholars
in general. See GM and the concluding section of this article for more details.
21. For interesting articles of this kind, see A. Barron, J. Huang, R. Spang, and
S. DeDeo, “Individuals, Institutions, and Innovation in the Debates of the French
Revolution,” Proceedings of the National Academy of Sciences 115 (2018): 4607–12; and
A. Rule, J. Cointet, and P. Bearman, “Lexical Shifts, Substantive Changes, and
Continuity in State of the Union Discourse, 1790–2014,” Proceedings of the National
Academy of Sciences 112 (2015): 10,837–44.

than description. But the relative ease of interpretation comes with a cost:
the simple conceptualization will surely foster a general skepticism.22 We
give an unvarnished view here to emphasize limitations, and why they
arise.
The process of generating case reports envisaged by topic-modeling may
be summarized as follows. An author (in our context, a legal reporter) is
viewed as beginning with a fixed number of topics, essentially lodged in
his or her brain and available for use when writing. Topics might be well-
identified legal concepts, such as assumpsit or habeas corpus, or ideas that
cut across many domains of law, such as revocation, or even a particular
reporting style.23 When a particular topic is used, the author simply has
a greater preference for the vocabulary more closely associated with that
topic than for other words. For example, when the author refers to the
topic assumpsit, the author will have a greater likelihood of using the
word “promise”; similarly, mention of bail will be frequent when using
the topic habeas corpus. The production of a document, a case report,
then entails the author choosing to emphasize some topics less and some
more, depending on the general context of that report. A document will
be a mixture of topics. Thus, a particular case report might tend to empha-
size, for example, both assumpsit and habeas corpus because the defendant
was in debtor’s prison as a result of a case involving non-payment of a
contractual debt. The words “promise” and “bail” would then appear
prominently in this case report, but words such as “daughter” or “wife”
would hardly appear because they are associated with topics that empha-
size estates or wills, which are of no relevance for these particular types
of cases.
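
The generative process just described can be made concrete with a small simulation. What follows is only a minimal sketch, not the estimation code used by GM: the three toy topics, their vocabularies and word probabilities, the prior weights, and the function names are all hypothetical, chosen to mirror the assumpsit and habeas corpus example above.

```python
import random
from collections import Counter

# Hypothetical toy topics (not GM's estimates): word -> probability within topic.
TOPICS = {
    "assumpsit": {"promise": 0.5, "debt": 0.3, "contract": 0.2},
    "habeas_corpus": {"bail": 0.5, "prison": 0.3, "writ": 0.2},
    "wills": {"daughter": 0.4, "wife": 0.3, "estate": 0.3},
}

def sample_topic_mixture(alphas, rng):
    """Draw topic proportions from a Dirichlet prior via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_report(n_words, alphas, rng):
    """Generate one bag-of-words 'case report' as a mixture of topics."""
    names = list(TOPICS)
    mixture = sample_topic_mixture(alphas, rng)
    words = []
    for _ in range(n_words):
        # First pick a topic according to the report's mixture ...
        topic = rng.choices(names, weights=mixture)[0]
        # ... then pick a word from that topic's vocabulary.
        vocab = TOPICS[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return dict(zip(names, mixture)), Counter(words)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Assumed prior weights: heavy on assumpsit and habeas corpus, light on wills.
mixture, counts = generate_report(200, alphas=[5.0, 5.0, 0.1], rng=rng)
print(mixture)
print(counts.most_common())
```

With prior weights of this kind, words such as "promise" and "bail" will typically dominate the simulated report, while "daughter" and "wife" will rarely appear. Estimation runs this logic in reverse, inferring the topics and mixtures from the observed word counts.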
Thus, a topic model views a document as one created in a process in
which the author has chosen to emphasize certain topics, which in turn
emphasize their own characteristic vocabularies. Consistent with this view, a document
is fed into the statistical analysis as a bag of words that has been stripped of
all syntactic and sentence structure. However, each word choice is based on
the emphasized topics and the vocabulary emphasized by these topics. This

22. As counterpoint to this apology for simplification, see S. Robertson, “Digital
Humanities” in The Oxford Handbook of Law and Humanities, ed. S. Stern, M. Del Mar,
and B. Meyler, (Oxford: Oxford University Press, 2019), emphasizing that “If humanities
scholars chafe at such simplification, it is worth noting that narrative, the favored represen-
tational model of humanities scholars, is a deliberately simplified account that is illuminating
because of, not despite, its simplification.”
23. The productive use of machine-learning to detect style was emphasized by Matthew
L. Jockers, one of the most forceful advocates of machine-learning in the digital humanities;
see M.L. Jockers, Macroanalysis: Digital Methods and Literary History (Urbana: University
of Illinois Press, 2013).

conceptualization then views semantic content as becoming embedded in a
report via word choices, which will be highly correlated across related
reports. For example, “contract” and “promise” will appear frequently
together, but their presence will be negatively correlated with the appear-
ance of the words “daughter” and “will,” which in turn frequently
co-occur. The topic-modeling algorithm produces results that reflect
semantic content because it leverages these patterns of correlations across
documents. In the phrasing of Mohr and Bogdanov, “relationality trumps
syntax.”24 Similarly, topic-modeling is able to “see” through polysemy
because meanings are embodied in combinations of word usage, not in single
words.25 The model will reflect the sense of “extent” in “he is not
bound to prove the whole extent of a debt” very differently from the one
in “the Crown may not proceed against its debtor either by extent or
scire facias,” because of the repetition of the accompanying words across
many cases.
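
The bag-of-words representation and the co-occurrence patterns it preserves can be illustrated briefly. This is a minimal sketch under stated assumptions: the four miniature "case reports" are invented for illustration, and the helper names bag_of_words and co_occurrence are hypothetical rather than part of any topic-modeling library.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase, split into word tokens, and count; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Hypothetical miniature "case reports", invented for illustration only.
reports = [
    "The defendant made a promise and the contract was for a debt.",
    "A promise was made upon the contract; the debt remained unpaid.",
    "By his will the estate passed to his daughter.",
    "The daughter claimed the estate under the will.",
]
bags = [bag_of_words(r) for r in reports]

def co_occurrence(bags, w1, w2):
    """Number of documents in which both words appear at least once."""
    return sum(1 for b in bags if b[w1] > 0 and b[w2] > 0)

print(co_occurrence(bags, "contract", "promise"))   # 2
print(co_occurrence(bags, "daughter", "will"))      # 2
print(co_occurrence(bags, "contract", "daughter"))  # 0

# Syntax is lost: reordering the words leaves the bag unchanged.
print(bag_of_words("promise of debt") == bag_of_words("debt of promise"))  # True
```

Even in this tiny corpus, "contract" and "promise" travel together and never meet "daughter" or "will". It is patterns of this kind, across tens of thousands of reports, that the estimation exploits.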
The bag-of-words assumption is obviously a stylization that does no justice
to the process of writing. One should note, however, that this assumption
is partially a consequence of current limitations in computational power.
With expected increases in computational power, much more acceptable
characterizations of the process of authoring a document will become avail-
able when using techniques that are descendants of the ones described here.26

One final step in the conceptualization of the document-writing process
is to acknowledge that different authors of case reports have different
characteristics, and that indeed the same author will be influenced by
circumstances such as the timing of the case and the court adjudicating
it. This can be explicitly incorporated into the estimation process when
using the structural topic model. In the application reported in this article,
the author of a specific case report is viewed as being influenced by the year
in which the report was written and the court in which the case was heard.
At this stage, we would imagine that readers unversed in topic-modeling,
and in machine-learning methods more broadly, are immensely
skeptical. We were too when we began using such techniques. But after
several years of poring over results, comparing those results to existing
ideas in the literature, and seeing the added value of insights that were
not possible to reach when utilizing conventional approaches to the

24. J. W. Mohr and P. Bogdanov, “Introduction−Topic Models: What They Are and Why
They Matter,” Poetics 41 (2013): 545–69.
25. DiMaggio et al., “Exploiting Affinities”.
26. One could instead view documents as collections of two- or three-word chunks, or
even larger phrases. But the required processing power increases proportionately with the
number of distinct phrases, which increases exponentially with the number of words allowed
to be in a phrase.

analysis of texts, we saw how topic-modeling can provide a powerful com-
plement to the traditional work of historians. Moreover, “legal history is
better positioned for a digital turn than most historical fields when it
comes to the amenability of legal sources to computational analysis”
because reporters followed consistent forms of presentation using special-
ized vocabulary, where the correspondence between words and meaning
remained much more constant both over time and among individuals
than would have been the case for ordinary language.27

B. Estimation

Estimation begins with the observations that are available to the
researcher: the documents (case reports) and the information that charac-
terizes authors. The researcher must decide on the number of topics; that
is, the number of sections of the digest. Using statistical criteria and a more
subjective evaluation of the coherence and meaning of the produced topics,
GM judged that 100 different topics adequately captured the salient emphases
in the reports on pre-1765 cases.28 This element of human judgment is part of
the process of validating the overall topic model: “Researchers must also
interpret the topic model output, probably iteratively, so that a best fit can
be found between the number of topics and an overall level of
interpretability.”29

Given the estimated topics, machine-learning provides a measure of the
importance of each vocabulary word to each topic. In the current example,
this is the proportion of each of the 41,174 distinct vocabulary words in
each of the 100 topics. The estimation also predicts the proportion of
any given one of the 52,949 documents that can be attributed to the use
of each topic. And given that each document is labeled as reporting on a
case heard at a specific time in a specific court, the estimates provide information
on how the use of various topics varies with those characteristics,
namely year and court.
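
To show how the document-level output described above can serve as input data for later work, the following minimal sketch averages topic proportions by court and by year. It is an after-the-fact summary, not the structural topic model's own treatment of year and court, which enters during estimation. The four records, the three-topic proportions, and the function name mean_topic_share are hypothetical; the actual output would have 52,949 documents and 100 topics.

```python
from collections import defaultdict

# Hypothetical document-topic proportions; each "theta" row sums to 1.
doc_topics = [
    {"court": "Chancery", "year": 1700, "theta": [0.7, 0.2, 0.1]},
    {"court": "Chancery", "year": 1720, "theta": [0.5, 0.3, 0.2]},
    {"court": "King's Bench", "year": 1700, "theta": [0.1, 0.6, 0.3]},
    {"court": "King's Bench", "year": 1720, "theta": [0.3, 0.4, 0.3]},
]

def mean_topic_share(docs, key):
    """Average the topic proportions of all documents sharing a value of `key`."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[key]].append(d["theta"])
    return {
        g: [sum(col) / len(rows) for col in zip(*rows)]
        for g, rows in groups.items()
    }

by_court = mean_topic_share(doc_topics, "court")
by_year = mean_topic_share(doc_topics, "year")
print(by_court)
print(by_year)
```

In this toy example the first topic accounts on average for about 60 percent of the Chancery reports but only about 20 percent of the King's Bench reports, the kind of contrast that underlies comparisons between equity and common law.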

27. See Robertson, “Searching”. On this point, see also P. Grajzl and P. Murrell, “Lasting
Legal Legacies: Early English Legal Ideas and Later Caselaw Development During the
Industrial Revolution,” Review of Law & Economics (2022), pre-publication online version,
https://doi.org/10.1515/rle-2021-0070 (accessed April 16, 2022).
28. As evidenced by the large, related literature, computational scientists and statisticians
usually emphasize rule-based criteria for model choice, relying solely on numerical information
derived from the estimating process or the output data. In contrast, practitioners emphasize
the element of subjective judgment, which would take into account the perceived quality
of the topics reflecting both the uses to which they are to be put and the nature of the text
data that is used in estimation. See, for example, Gentzkow et al., “Text as Data”; DiMaggio
et al., “Exploiting Affinities”; and Mohr and Bogdanov, “Introduction−Topic Models.”
29. See Mohr and Bogdanov, “Introduction−Topic Models,” 560.

C. What Are the Topics?

Topic-modeling is an unsupervised machine-learning exercise. The estima-
tion of the topics is not guided by any objective to match topics to
pre-existing ideas about what is in the law. Thus, the produced objects,
the topics, come unlabeled. The researcher must provide titles for the sec-
tions of the machine-produced digest. This requires applying insights from
existing legal-historical research. The information described in the previous
paragraph is matched against those insights. One examines closely the
vocabulary most used by a topic and one closely reads those case reports
in which the topic is most prominent. This is an extremely laborious task,
but GM found that it was not conceptually difficult to identify the idea or
ideas underlying each and every one of their 100 topics.
One important part of the general methodology to emphasize here is that
the identification of what a topic refers to cannot rely solely on a perusal of
the vocabulary or words that a topic most uses, even those words that a topic
most uses relative to other topics. It is absolutely essential to read the doc-
uments in which a topic is most prominent. The labeling of a topic must
make sense in relation to the content of all the other estimated topics,
because the specific emphasis in one topic might only be clear when con-
trasting that topic to a closely related one with a different emphasis.
The reason to highlight this point is that many, probably a large majority,
of the articles that have used topic-modeling to date base the interpretation
of topics only on perusal of the words that a topic most uses. A reading
of the documents requires domain-specific knowledge, and in the case of
pre-industrial English history, it certainly requires struggling with a very dif-
ferent form of English. That is one reason why we emphasize that modern
machine-learning and traditional doctrinal text analyses are complements.
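
The two sources of evidence used in naming a topic, its most-used words and the documents in which it is most prominent, can both be read off the estimated output. The following is a minimal sketch with invented numbers: the six-word vocabulary, the matrices beta and theta, the report identifiers, and the helper names are all hypothetical.

```python
vocab = ["promise", "contract", "bail", "writ", "daughter", "will"]

# beta[t][w]: hypothetical share of word w in topic t (each row sums to 1).
beta = [
    [0.40, 0.35, 0.05, 0.05, 0.05, 0.10],
    [0.05, 0.05, 0.45, 0.35, 0.05, 0.05],
]

# theta[d][t]: hypothetical share of document d attributed to topic t.
theta = {
    "report_A": [0.9, 0.1],
    "report_B": [0.2, 0.8],
    "report_C": [0.6, 0.4],
}

def top_words(topic, k=2):
    """The k words with the largest share in the given topic."""
    order = sorted(range(len(vocab)), key=lambda w: beta[topic][w], reverse=True)
    return [vocab[w] for w in order[:k]]

def top_documents(topic, k=2):
    """The k documents in which the given topic is most prominent."""
    return sorted(theta, key=lambda d: theta[d][topic], reverse=True)[:k]

print(top_words(0))      # ['promise', 'contract']
print(top_documents(0))  # ['report_A', 'report_C']
print(top_words(1))      # ['bail', 'writ']
print(top_documents(1))  # ['report_B', 'report_C']
```

The word lists alone would suggest a label, but it is the list of top documents that tells the researcher which case reports to read closely before settling on one.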
This painstaking naming process is an essential ingredient of the validation
of a topic model exercise: simply making sure that its results provide
an intuitive, coherent whole, both within topics and across topics.30 The
relative ease, in the conceptual sense, of topic naming in GM does suggest
that their whole topic-modeling exercise has high validity. If many topics
were simply mysterious, then one would conclude that the specific features
of the machine-learning process were not well suited to the texts being
analyzed.
Some of GM’s topics fit snugly within existing concepts in the legal, historical,
and traditional text-analysis literature. For example, the topic names
“Assumpsit,” “Bankruptcy,” and “Uses” resonate closely with legal concepts

30. See Grimmer et al., “Machine Learning for Social Science,” stating: “Rather than
place our trust fully in models and fit statistics, we argue that human feedback is essential
for judging the quality of model results used for discovery.”

and instruments covered at length in textbooks on the history of English
law.31 Other sets of topics split a single broad subject into several constituent
areas (e.g., Implementing Ambiguous Wills, Contingency in Wills, Validity
of Wills). Yet further types of topics encompass substantive issues that cut
across many substantive areas of law (e.g., Revocation, Determining
Damages and Costs) or refer to general legal ideas and modes of reasoning
about cases as opposed to specific domains of application (e.g., Coke
Reporting).32 This is an example of topic-modeling as an exercise in discovery,
rather than an exercise in prediction or hypothesis testing, which would
instead be focused on a search for anticipated patterns in case law or legal
ideas.33

When economists name topics in such an analysis, there is undoubtedly
a tendency to focus on the functional domain to which the law is applied. A
legal scholar would probably focus more on the legal doctrines captured in
a topic and the historical origins of those doctrines. We are therefore sure
that legal historians would have chosen a slightly different set of names
than GM did for at least a subset of the 100 topics, probably finding labels
that resonate more with internal characteristics of the legal system and less
with outward effects on economic agents.34

The fact that the list of topic names, the titles of the sections of the digest
of pre-1765 English case law and associated legal ideas, only partially

31. In order to distinguish our topic names clearly in the remainder of this article, we cap-
italize them.
32. An additional type of topic is identified by DiMaggio et al. who in “Exploiting
Affinities,” argue that “Topic models often shunt noisy data into uninterpretable topics in
ways that strengthen the coherence of topics that remain.” In fact, our experience is not
that the topics are uninterpretable, per se, but rather that the interpretation means that the
topic tells one nothing about the substantive inquiry in question. For example, GM find a
topic that they call Non-Translated Latin. Sixteenth- and seventeenth-century lawyers not
only had their own version of English, but their Latin was also highly idiosyncratic. The
text preparation procedures were able to handle idiosyncratic English and standard Latin,
but not idiosyncratic Latin.
33. This point is much emphasized in the literature, something to which we return more
fully in the Conclusion. See, for example, DiMaggio et al., “Exploiting Affinities”; Mohr
and Bogdanov, “Introduction−Topic Models”; A. Goldberg, “In Defense of Forensic
Social Science,” Big Data & Society (2015), July-Dec: 1–3; and L.K. Nelson, “Leveraging
the Alignment Between Machine Learning and Intersectionality: Using Word Embeddings
to Measure Intersectional Experiences of the Nineteenth Century U.S. South,” Poetics 88
(2021): 101539, 1–18.
34. Since the topic-modeling algorithm produces unlabeled topics and since the data output
from that algorithm can easily be transmitted, other researchers could easily produce
their own set of names for the 100 topics. Indeed, doing so could inspire much more research
that benefits from topic models. One research team produces the output data from the topic
model, which can then easily be the input data for the work of other researchers.

matches the chapter and section headings of a legal history textbook can be
viewed as either a vice or a virtue, depending on the reader’s perspective. It
might be troubling for some readers to look at a topic like Geographic
Jurisdiction of Laws and realize that this topic is prominent in case reports
that deal with such divergent areas of law as the relations between parishes
and the legal status of individual citizens of belligerent nations. This topic
appears prominently in case reports that span the whole time period covered
by our data, and it appears in cases heard in all of the major courts. Thus,
some areas of emphasis suggested by topic-modeling do not fit comfortably
within existing classifications based on more traditional techniques. But this,
in fact, shows the power of these machine-learning methods, highlighting
how legal ideas can appear in many different types of cases. By covering
the gamut of case reports in a particular time period, topic-modeling is an
exercise in discovery, unearthing substantive patterns and connections
between seemingly disparate notions that would likely remain unnoticed
with the use of traditional methods restricted by the limits of human memory
and reason. We return to this point in the Conclusion, where we comment on
how machine-learning is changing the research practices in several fields,
lessening the hold of the hypothetico-deductive method, and opening up
possibilities for inductive exercises.

II. The Elements of English Legal History Emphasized in Mainstream Economics

In this section, we explain why we, as economists, chose to focus on the
law relevant to finance in articulating the properties, promise, and pitfalls
of topic-modeling. Ideas about the history of the law have made a differ-
ence in economics. Some of the conventional wisdom that drives important
areas of mainstream economics reflects on subjects that are of great interest
to those legal historians studying developments before the twentieth cen-
tury. However, there appears to have been little cross-fertilization between
the literatures of the two fields, certainly as far as those literatures focus on
the case law of the pre-industrial era.35 Hence, the specific ideas embraced

35. Some economic historians have been very aware of detailed developments in the legal
sphere, but such economic historians seem to have had little effect on the
perspectives on English legal history that are dominant in the mainstream of economic anal-
ysis, as exemplified in the works to be discussed in the ensuing paragraphs. R. Harris, in
“The Encounters,” was early in making a case for productive exchange between legal history
and economics, stressing that legal historians did not pay sufficient attention to the economic
history literature. We are more concerned here with the lack of interchange in the reverse
direction.

by mainstream economists do not always match the legal history that has
been developed by those researchers whose primary audience is legal
scholars and who approach the study of legal history with traditional
text-analysis methods.
Finance is not normally a category or an immediate domain of interest
within the legal-history literature of the pre-industrial era.36 However,
this area of law is vital to economic history because England’s financial
revolution preceded, and was perhaps a key input into, the Industrial
Revolution. Ideas about English legal history have been influential in
areas of economics as diverse as the regulation of modern financial mar-
kets, protection of investor rights, and the relief of poverty in the poorest
countries. This is no doubt due largely to the global influence of Britain
from the eighteenth century on, the importance of the British financial
and industrial revolutions, and the spread of the common law around the
globe. It is also certainly due to the fact that understanding the sources
of economic development is often considered the most important question
of economics, and Britain led the world in political and economic develop-
ment for more than two centuries.
Our focus here is on the two most influential strains of thought that are
driven by interpretations of English legal history and that have had wide
currency in mainstream economics. Given this focus, we unfortunately
cannot do justice to the many authors, particularly those studying institu-
tional and economic history, who challenge these views, and offer nuanced
caveats.37 The two legal-history-based paradigms are those following the

36. The word “finance” appears only twice in J.H. Baker, An Introduction to English
Legal History, fifth edition (Oxford: Oxford University Press, 2019) and the pertinent issues
are in separate discussions, included under property and contract.
37. Some salient critiques are N. Sussman and Y. Yafeh, “Institutional Reforms, Financial
Development and Sovereign Debt: Britain 1690–1790,” Journal of Economic History 66
(2006): 906–35; P. Murrell, “Design and Evolution in Institutional Development: The
Insignificance of the English Bill of Rights,” Journal of Comparative Economics 45
(2017): 36–55; L. Neal, “How It All Began: The Monetary and Financial Architecture of
Europe During the First Global Capital Markets, 1648–1815,” Financial History Review 7
(2000): 117–40; P. O’Brien, “The Nature and Historical Evolution of an Exceptional
Fiscal State and Its Possible Significance for the Precocious Commercialization and
Industrialization of the British Economy from Cromwell to Nelson,” Economic History
Review 64 (2011): 408–46; S. Ogilvie and A.W. Carus, “Institutions and Economic
Growth in Historical Perspective,” in Handbook of Economic Growth, ed. P. Aghion and
S.N. Durlauf (Amsterdam: Elsevier, 2014), 403–513; D. Coffman, A. Leonard, and
L. Neal (eds.), Questioning Credible Commitment: Perspectives on the Rise of Financial
Capitalism (Cambridge: Cambridge University Press, 2013); and G.M. Hodgson, “1688
and All That: Property Rights, the Glorious Revolution and the Rise of British
Capitalism,” Journal of Institutional Economics 13 (2017): 79–107.


seminal articles by North and Weingast (henceforth NW)38 and La Porta,
Lopez-de-Silanes, Shleifer, and Vishny (henceforth LLSV).39 Both para-
digms focus on high-level, even constitutional, elements of the legal sys-
tem rather than on the information that occupies most of English legal
history and which provides the data for this article; that is, the vast collec-
tion of reports on the deliberations within the courts. Both sets of works
have had enormous influence in economics, in areas far removed from
their original domain of application.40

The approach of NW is that “the institutional changes of the Glorious
Revolution permitted the drive toward British hegemony and dominance
of the world.”41 In emphasizing the effects of constitutional measures, par-
ticularly the Bill of Rights and the Act of Settlement, NW are followed by
the influential works of Acemoglu and Robinson42 and North, Wallis, and
Weingast.43

LLSV also emphasize overarching features of the legal system.44 Their
focus is on the overall characteristics of law-making and legal adjudication
and how these produce different types of legal processes in common-law

38. NW, “Constitutions and Commitment: The Evolution of Institutions Governing Public
Choice in Seventeenth-Century England,” Journal of Economic History 49 (1989): 803–32.
39. LLSV, “Legal Determinants of External Finance,” Journal of Finance 52 (1997):
1131–50; and LLSV, “Law and Finance,” Journal of Political Economy 106 (1998):
1113–55.
40. A computational search of JSTOR reveals how unusual these two works are in their
spread across the whole of economics. NW, “Constitutions and Commitment,” appears in the
Journal of Economic History and is referred to in JSTOR thirteen times as often as the typ-
ical article published at the same time in that journal. The references to NW, “Constitutions
and Commitment,” are twice as common in the journals outside economic history as in eco-
nomic history journals, while for the typical article published in the same journal at the same
time, the ratio is 0.6. Similarly, LLSV, “Legal Determinants,” appears in the Journal of
Finance and is referred to in JSTOR twenty times as often as the typical article published
at the same time in the same journal. The references to LLSV, “Legal Determinants,” are
twice as common in the journals outside of finance as in finance journals, while for the typ-
ical article at the same time in the same journal, the ratio is 0.24.
41. NW, “Constitutions and Commitment,” 830.
42. D. Acemoglu and J.A. Robinson, Why Nations Fail: The Origins of Power, Prosperity
and Poverty (New York: Crown Business, 2012). See, for example, reiteration that “The
Glorious Revolution limited the power of the king and the executive, and relocated to
Parliament the power to determine economic institutions . . .The Glorious Revolution was the
foundation for creating a pluralistic society. . .The government. . .steadfastly enforced property
rights. . . Historically unprecedented was the application of English law to all citizens.
Arbitrary taxation ceased, and monopolies were abolished almost completely. . .” at 102.
43. D.C. North, J.J. Wallis, and B.R. Weingast, Violence and Social Orders: A
Conceptual Framework for Interpreting Recorded Human History (Cambridge:
Cambridge University Press, 2009).
44. See LLSV, “Legal Determinants”; and LLSV, “Law and Finance”.


and civil-law countries. In many works following on the original articles,
summarized by La Porta, Lopez-de-Silanes, and Shleifer,45 the authors,
and others, bring enormous amounts of modern data at a very detailed
level to bear on their work. But to the extent that they engage with legal
history it is at the level of the approaches to law that were developed in
England and France and the effect of these approaches on system-wide
characteristics such as judicial independence, the use of juries, organization
of the legal system, and the sources of law.
The reader will notice from the foregoing summary that the two influential
legal-history paradigms that have had a broad influence across a swathe
of economics do not rest on detailed examinations of the vast number of
routine developments in the law that are the stuff of the history emphasized
by traditional legal historians. These paradigms do not invoke characteriza-
tions of the development of English law in the period 1550–1750, which
are based on the records of the courts and apply to domains that are crucially
important for a capitalist economy: contract, property, and tort. They do not
reflect the painstakingly slow developments occurring in procedures, prece-
dent, and forms of legal action, which affected how the courts functioned
and how litigants could use the law. In short, within the two institutional nar-
ratives that have been most successful in using English legal history to influ-
ence the way economists think about the world, the work of scholars within
traditional legal history is largely missing.
A machine-learning history of English case law offers the chance to bridge
the fields of economics and legal history. By using as input the reports on
tens of thousands of historical cases, it absorbs, albeit imperfectly, the
most important information used by legal historians, the micro-level case-
report data that are far removed from the macro-level constitutional and legal-
system arrangements emphasized by NW and LLSV. By interpreting the
results of the analysis using centuries of insights developed by scholars
who have focused on case law, a machine-learning approach incorporates ele-
ments of traditional legal-historical research and complements existing exege-
ses on legal development. At the same time, a machine-learning history offers
the type of broad narrative about case law that would be so difficult for an
outsider to the field of legal history to grasp without access to the results
of the topic model, even with the use of such a superb textbook as that by
Baker.46 In the ensuing sections, we illustrate the power of topic-modeling
in the context of the developments and features of case law and legal ideas
pertinent to finance.

45. R. La Porta, F. Lopez-de-Silanes, and A. Shleifer, “The Economic Consequences of
Legal Origins,” Journal of Economic Literature 46 (2008): 285–332.
46. See Baker, An Introduction.


III. Characterizing Temporal Change: The Development of Case Law
and Legal Ideas Relevant to Finance

Within the sections of the 100-topic machine-produced digest of pre-1765
English case law and associated legal ideas, we identified fifteen topics as
pertinent to finance. Topic-modeling does not tell us which topics to des-
ignate as relevant to finance: this is our judgment based on an understand-
ing of the content of all topics estimated by GM. The topics we designated
as finance ones are: Arbitration and Umpires, Assumpsit, Bankruptcy,
Bonds, Claims from Financial Instruments, Contract Interpretation and
Validity, Executable Purchase Agreements, Execution and Administration
of Estates, Identifying Contractual Breach, Implementing Trusts,
Mortgages, Negotiable Bills and Notes, Pleadings on Debt, Prioritizing
Claims, and Repaying Debt. Table 1 contains a brief description of these
topics, focusing on select key words (or rather their stems) and the top
case reports identified by topic-modeling.47

Figure 1 presents timelines for these fifteen topics over the years 1550–
1750. To interpret these figures, it is best to focus on a particular example,
so we will use Assumpsit. Taking a particular year, say 1600, the figure
indicates that the topic Assumpsit occupied roughly 3% of the attention
in the reports of cases heard in that year.48 These timelines reflect a feature
of topic-modeling that has been much emphasized in the literature. They
capture the changing amount of attention in English courts over a very
long time period reflecting thousands of cases, focusing not on landmark
rulings, but rather on overall trends reflecting data that might be only a
tiny part of each individual case. As Goldstone and Underwood found
for the digital humanities, “Quantitative methods may be especially useful
for characterizing long, gradual changes, because change of that sort is
otherwise difficult to grasp.”49
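The construction of such a timeline can be sketched in a few lines of Python. The sketch is illustrative only. It assumes that each report comes with a year and a set of estimated topic shares, and it smooths the yearly averages with a simple centered moving average. The function names, the toy reports, and the three-year window are hypothetical, and the estimator and window actually used by GM are not stated in this article.

```python
from collections import defaultdict


def yearly_topic_share(reports, topic):
    """Average share of `topic` across the reports dated to each year."""
    totals, counts = defaultdict(float), defaultdict(int)
    for year, shares in reports:
        totals[year] += shares.get(topic, 0.0)
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


def moving_average(series, window=3):
    """Centered moving average; years without any reports are simply skipped."""
    half = window // 2
    smoothed = {}
    for year in series:
        nearby = [series[y] for y in range(year - half, year + half + 1) if y in series]
        smoothed[year] = sum(nearby) / len(nearby)
    return smoothed


# Hypothetical toy data: (year, {topic name: share of the report}).
reports = [
    (1599, {"Assumpsit": 0.02}),
    (1600, {"Assumpsit": 0.05}),
    (1600, {"Assumpsit": 0.01}),
    (1601, {"Assumpsit": 0.04}),
]

raw = yearly_topic_share(reports, "Assumpsit")
smooth = moving_average(raw, window=3)
for year in smooth:
    print(year, round(raw[year], 3), round(smooth[year], 3))
```

In this toy example the height of the smoothed line in 1600 is the average of the three yearly values around it, which is how a timeline value such as "roughly 3% of attention" would be read.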

47. Many more details on these topics can be found in GM and the corresponding appen-
dices. See note 10.
48. Of course, despite the large number of reports used to produce the data, any given year
might have only a few cases. Therefore, the figures are moving averages, which smooth the
timelines and, in particular, remove prominent idiosyncrasies arising in years when the data are sparse.
Additionally, such figures are usually accompanied by confidence intervals that indicate how
imprecise the estimate of the timeline is in any given year. In our applications, those intervals
are very narrow for all the timelines. Thus, it is sufficient to focus only on the averages that
appear in the diagram.
49. Goldstone and Underwood, “The Quiet Transformations,” 379. This resonates with
comments in Guldi and Armitage, The History Manifesto, and in Flanders on what computers
can do: J. Flanders, “Detailism, Digital Texts, and the Problem of Pedantry,” TEXT Technology
2 (2005): 41–70.


Table 1. The fifteen finance topics briefly described.

Arbitration & Umpires: 0.67% Key word-stems include ‘award’, ‘arbitr’, ‘umpir’,
‘arbitra’, ‘attach’, ‘releas’, ‘perform’. Top reports revolve around whether the
arbitrators made timely decisions and had chosen an umpire.

Assumpsit: 1.51% Key word-stems include ‘assumpsit’, ‘promis’, ‘indebitatus’,
‘consider’, ‘forbear’, ‘indebt’, ‘debt’. Top reports focus on whether an assumpsit had taken
place and whether an action of assumpsit is allowed.

Bankruptcy: 0.48% Key word-stems include ‘bankrupt’, ‘creditor’, ‘assigne’, ‘debt’,
‘bankruptci’, ‘assign’, ‘commiss’. Top reports focus on the assignment of the
bankrupt’s estate.

Bonds: 1.29% Key word-stems include ‘bind’, ‘condit’, ‘oblig’, ‘debt’, ‘perform’,
‘void’, ‘sureti’. Top reports concern bonds, focusing on the obligations of the bonds
and whether they were satisfied.

Claims from Financial Instruments: 0.75% Key word-stems include ‘annuiti’, ‘cent’,
‘annum’, ‘southsea’, ‘ayear’, ‘stock’, ‘dividend’. Top reports describe instances of
resolving monetary claims concerning bonds, stocks, dividends, mortgages, annuities.

Contract Interpretation & Validity: 0.56% Key word-stems include ‘agreement’,
‘contract’, ‘bargain’, ‘write’, ‘agre’, ‘specif’, ‘sign’. Top reports revolve around
interpretation of the meaning of a contract in a given setting.

Executable Purchase Agreements: 0.72% Key word-stems include ‘purchas’, ‘sell’,
‘convey’, ‘fraud’, ‘deed’, ‘conceal’, ‘reliev’. Top reports concern contractual transfers
of property rights and what renders the contract executable.

Execution & Administration of Estates: 1.12% Key word-stems include ‘executor’,
‘administr’, ‘testat’, ‘asset’, ‘executrix’, ‘administratrix’, ‘probat’. Top reports involve
the actions of administrators or executors of estates.

Identifying Contractual Breach: 0.64% Key word-stems include ‘breach’, ‘coven’,
‘perform’, ‘nonpay’, ‘evict’, ‘break’, ‘refus’. Top reports are about ascertaining and
clarifying whether breach of contract has occurred in a given situation.

Implementing Trusts: 0.62% Key word-stems include ‘trust’, ‘estat’, ‘chariti’, ‘profit’,
‘decre’, ‘convey’, ‘beneficiari’. Top reports concern the implementation of trusts, and rules
to determine what is permissible in implementation.

Mortgages: 0.56% Key word-stems include ‘mortgag’, ‘mortgagor’, ‘redempt’, ‘equiti’,
‘encumbranc’, ‘interest’, ‘foreclos’. Top reports depict disputes pertaining to rights
and obligations of mortgagors, mortgagees, and impacted parties.

Negotiable Bills & Notes: 0.59% Key word-stems include ‘bill’, ‘note’, ‘accept’,
‘endorse’, ‘promissory’, ‘merchant’, ‘exchange’. Top reports describe the use of bills
of exchange and promissory notes, focusing on their negotiability.

Pleadings on Debt: 0.99% Key word-stems include ‘plea’, ‘obligatori’, ‘behalf’,
‘aforesaid’, ‘premis’, ‘verifi’, ‘attorney’. Top reports focus on the various pleadings to
which creditor and debtor have access.

Prioritizing Claims: 0.83% Key word-stems include ‘estat’, ‘debt’, ‘person’, ‘shall’,
‘payment’, ‘creditor’, ‘asset’. Top reports focus on who should be paid when claims
exceed available funds.

Repaying Debt: 1.59% Key word-stems include ‘payment’, ‘interest’, ‘due’, ‘repay’,
‘discharge’, ‘indebt’, ‘lend’. Top reports lay out the details of paying back a sum of
money that is owed, often with a focus on interest and often via complex transactions.

Note: The percentage figures are the proportions of the topic in the whole corpus. The mean topic
proportion in the whole corpus of reports is 1.0% and the median is 0.81%. The mean topic
proportion of the 15 finance topics is 0.85%, the median is 0.67%, and their sum is 12.8%.

There is a crucial question of how to interpret the meaning of that
amount of attention, which is a central concern of GM. It is natural to
think that the height of a timeline reflects how much a certain area of
law is used or not, and this is exactly the assumption in the oft-used word-
frequency analysis. The fallacy of such an approach becomes evident on
examining our example topic, Assumpsit. Its timeline exhibits an inverted
U, with attention to the topic almost vanishing from case reports during the
eighteenth century. But we know from the careful work of legal historians
that the idea of assumpsit was thoroughly embodied in law by that time. So
the height of the timeline does not show how much litigants and judges
actually depend on a particular idea at a given point in time.50 We know
in fact from the detailed legal history that assumpsit was more and more
accepted in the late sixteenth century, became authoritative early in the
seventeenth century, and was elaborated in many cases in subsequent
decades. Therefore, the height of the timeline in a particular year is infor-
mative of the rate of development of doctrines in that year rather than the
use of the doctrine. This is a reflection of the obvious: litigants do not
waste time litigating elements of the law that are accepted by all; judges
emphasize the matters that are in dispute; and writers of case reports attract
readers by telling them something new, rather than by rehashing settled
matters.

50. Note that it is entirely possible that the topic Assumpsit vanished from cases in the
early eighteenth century while the word “assumpsit” was used in a considerable number
of case reports from that era. This is possible because topics reflect the co-occurrence of
related words rather than only the frequency of single words. When the word “assumpsit”
is used in later cases it might be invoked very briefly to reference a huge area of law without
being accompanied by many words that were necessary to use in earlier cases, before the
notion of assumpsit became readily accepted.

To explore this logic, GM build a simple evolutionary model of the
production of case reports. Here the logic can be easily explained using a sim-
ple analogy with a subject that is painfully familiar to us all. The spread of
an idea is like the spread of a virus. The inverted U is like the pandemic
curve that we all want to see flattened. Case reports will show a lot of
attention to an idea when it is relatively new and becoming more important,
just as the count of positive tests for the presence of a virus will rise when
the pandemic is becoming very serious. Once an idea is old, it will not
show up in the body of case reports, just as there will no longer be
many new infections when herd immunity arrives.
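One way to make the analogy concrete is a logistic diffusion curve. This is only an illustrative sketch and not the evolutionary model that GM build. The symbols F, a, r, and t_0 are assumptions introduced here: F(t) is the share of an idea's eventual development completed by year t, a(t) is the attention the idea receives in case reports, r is a speed parameter, and t_0 is the year of fastest development.

```latex
F(t) = \frac{1}{1 + e^{-r\,(t - t_0)}}, \qquad
a(t) = \frac{dF}{dt} = r\,F(t)\,\bigl(1 - F(t)\bigr).
```

Under these assumptions attention a(t) is an inverted U: it peaks at t = t_0, where F = 1/2 and a = r/4, and it falls toward zero once the idea is fully accepted (F close to 1), even though the idea is then in general use.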

Figure 1. Finance topics over time.

Thus the timelines provide a very simple answer to the question of when
various aspects of legal development occurred. They are crude, missing
many nuances of the legal record, but that is the cost of trying to summa-
rize masses of data in a parsimonious way. That element of simplicity is
present in all statistical work endeavoring to extract simple core facts
from masses of data. A non-machine-learning approach to answering this
question would necessarily involve deeper investigation into how the lan-
guage was being used in individual reports and how those reports resonated
with the wider context. As in many instances in this article, we emphasize
that the two different approaches, our statistical distant reading, and the
more traditional close reading, are complements. The former is much
more likely to reflect the development of ideas within a broad swathe of
all cases, including lesser ones. The latter would naturally reflect a nar-
rower set of cases found to be especially influential.
Given that we can view the height of the timeline at any moment as capturing
the incremental rate of development of legal ideas, there is an even
simpler way to summarize the cumulative development of the law. This
will be especially useful in interpreting the information that appears
in the next two sections. Given the evolutionary logic, for any specific
topic, one can calculate the year that marks the passing of the halfway
mark of all legal development that did occur during 1550–1750. (Think
of the virus analogy when a vaccine is not available: we could find the pre-
cise year in which the proportion of the population that had been infected
passed 50%.) We have made this calculation for all fifteen topics included
in Figure 1, and the relevant years are marked on that figure with vertical
lines. To take the example of Assumpsit again, the vertical line is placed at
1631, indicating that half of the legal development pertinent to Assumpsit
that would occur during 1550–1750 actually had occurred by 1631. Even
though the landmark decision, in Slade’s case, was rendered in 1602, our
data summary suggests that much development of related law still occurred
after that decision. This is not surprising: landmark cases establish a prin-
ciple that needs to be fully articulated in a variety of settings.
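The halfway-year calculation can be sketched as a cumulative sum over a timeline. The code below is a minimal illustration under stated assumptions: the timeline is taken to be a mapping from years to attention shares, the function name is hypothetical, and the toy values are invented for the example rather than taken from Figure 1.

```python
def halfway_year(timeline):
    """First year by which cumulative attention reaches half of the period total."""
    total = sum(timeline.values())
    if total <= 0:
        raise ValueError("timeline must contain some positive attention")
    running = 0.0
    for year in sorted(timeline):
        running += timeline[year]
        if running >= total / 2:
            return year


# Hypothetical inverted-U timeline (year -> attention share, in percent).
toy_timeline = {1600: 1.0, 1610: 3.0, 1620: 4.0, 1630: 3.0, 1640: 1.0}
print(halfway_year(toy_timeline))
```

For the toy timeline the total is 12, and the running sum first reaches 6 in 1620, so 1620 is the halfway year. The vertical line at 1631 for Assumpsit in Figure 1 has the same interpretation.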
One of the findings that is immediate from a quick perusal of the timelines
and dates in Figure 1 is that several pertinent areas of law were substantially
settled well before 1688, the period typically given short shrift in
the study of English financial arrangements in the economics literature.
Significantly, even late developers such as Implementing Trusts and
Negotiable Bills and Notes show spikes in attention during the third quarter
of the seventeenth century. Well before the Glorious Revolution, there was
broad acceptance by the legal profession of many of the ideas relevant to
modern finance. The financial revolution in England was occurring
throughout the seventeenth century, at least as far as the development of
pertinent legal ideas was concerned. This is decidedly not the picture
that emerges from the main strands of the relevant literature in economics.


At the same time, it would be difficult to draw this precise conclusion from
the traditional legal-historical literature alone: we are not aware of any
scholar who has stated this conclusion, let alone documented it in as pre-
cise a manner as our use of topic-modeling data does.
Examining the early and late developing topics in Figure 1, it is clear that
the areas of law that developed early are rather broad, in the sense that they are
not about specific financial instruments per se, but rather about more general
areas of law, where progress is perhaps a pre-condition for the use of specific
financial instruments. The earliest developing areas are Assumpsit, Bonds,
Identifying Contractual Breach, and Pleadings on Debt, all of which are rel-
evant to a wide spectrum of economic activity. In contrast, the areas of law
that developed later pertain to much more specific financial arrangements
such as Bankruptcy, Mortgages, and Negotiable Bills and Notes.
More generally, for the reader interested in areas of law beyond finance,
recall that Figure 1 focuses on just 15 of the 100 topics. Many different
lessons on the development of various areas of law could be extracted
from the complete set of timelines presented in GM.

IV. Uncovering Interconnections: The Links between Finance and
Other Areas of Law

We know that a report of a case will normally refer to many different legal
ideas, even though the decision in a particular case usually hinges on one
particular aspect of law.51 Detailed rules on Repaying Debt are formulated
in the context of earlier developments in Assumpsit and Bonds, for exam-
ple. Therefore important insights about legal development can be obtained
by examining whether case reports emphasizing one particular topic also
emphasize other specific topics. Co-occurrence of two topics at the case-
report level is evidence of complementarity in the use of legal ideas. It
shows that the corresponding topics aid each other in expressing a specific
set of ideas, indicating a shared conceptual foundation.
This is (positive) topic correlation, a measure of the degree to which a pair
of topics tend to be mentioned in the same case reports. Finding those topic
pairs with the largest positive correlations is a first step in detecting associ-
ations between different areas of legal development. If one finds that topics

51. DiMaggio et al., “Exploiting Affinities,” 582, point out, in a rather different context,
that topic-modeling’s assumption of many ideas mixed in a single text provides a significant
advantage: “[A] virtue of topic modeling is its deep affinity to the central insight in the soci-
ology of culture that texts do not necessarily reflect a single perspective but are often char-
acterized by heteroglossia, the co-presence of competing ‘voices’—perspectives or styles of
expression—within a single text.”


X and Y are highly correlated and, furthermore, that X developed earlier
than Y, then that is suggestive of causality, with X an input into Y rather
than vice versa. For example, the development of law relevant to Bonds is
more likely to have provided input into the development of law on
Repaying Debt than vice versa, given that these topics are strongly positively
correlated and given the information on their timing in Figure 1.
To illustrate these considerations, consider the justifiably uncelebrated
case of Alcock v Blowfield, heard by the King’s Bench in the third year of
the reign of Charles I.52 The case report is an unusual one because one
topic dominates: Assumpsit accounts for 69% of the case according to the
GM topic-model estimates. Procedural Rulings on Actions accounts for a
further 5% of the case report. If this pattern were repeated over a sufficient
number of case reports, then one would find that these two topics, one con-
tract and the other procedural, would be correlated with each other. This is
indeed the case, with these two topics exhibiting a correlation of 0.25, a
rather high level of inter-relationship. However, since the corresponding
areas of law were developing at the same time (see Figure 1), we have no
strong indication of the direction of causality for this particular topic pair.
It is worth emphasizing that Alcock v Blowfield is just one of the 52,949
case reports in the data. The type of information given in the previous
paragraph is available for all cases. Because the GM topic model produces
data on the proportion of each report occupied by each of the 100 topics, it
is then straightforward to compute correlations in topic usage across reports. By
providing information about the connection between apparently disparate
cases, the statistical analysis offers clues that might ultimately be helpful
to the more traditional type of analysis usually undertaken by legal histo-
rians. Moreover, if the correlations are based on subtle connections
between topics that appear in many cases, their existence might be very dif-
ficult to detect without quantitative tools: the computer is “a device that
extends the range of our perceptions to phenomena too minutely dissemi-
nated for our ordinary reading. The computer is. . .being asked to help the
researcher perceive patterns at a finer-than-human level of granularity.”53
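A topic correlation of this kind can be sketched as follows. The sketch assumes a plain Pearson correlation between the shares of two topics across reports, which may differ from the estimator used by GM. The function names are hypothetical. The first toy report borrows the 69% and 5% shares quoted above for Alcock v Blowfield, while every other number is invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of topic shares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant topic share")
    return cov / sqrt(var_x * var_y)


def topic_correlations(doc_topics, topics):
    """Correlation for every unordered pair of topics, computed across reports."""
    columns = {t: [doc.get(t, 0.0) for doc in doc_topics] for t in topics}
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(topics, 2)}


topics = ["Assumpsit", "Procedural Rulings on Actions", "Mortgages"]
# Toy reports: each dict gives the share of the report devoted to each topic.
docs = [
    {"Assumpsit": 0.69, "Procedural Rulings on Actions": 0.05, "Mortgages": 0.00},
    {"Assumpsit": 0.40, "Procedural Rulings on Actions": 0.04, "Mortgages": 0.01},
    {"Assumpsit": 0.02, "Procedural Rulings on Actions": 0.01, "Mortgages": 0.50},
    {"Assumpsit": 0.00, "Procedural Rulings on Actions": 0.00, "Mortgages": 0.30},
]

corr = topic_correlations(docs, topics)
for pair, r in corr.items():
    print(pair, round(r, 2))
```

With only four invented reports the correlations are far more extreme than the 0.25 reported in the text. The point is only that topics appearing together in the same reports correlate positively, and topics that crowd each other out correlate negatively.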

A. The Criteria for Displaying Connections and the Resultant Network of
Legal Ideas

With 100 topics, there are 4,950 distinct correlations and therefore there is a
need to focus on the most important. We consider only correlations that are
greater than 0.15, of which there are only eighty-five: these are the strongest

52. Alcock v. Blowfield (1627) 95 E.R. 74, 1061.
53. Flanders, “Detailism,” 57.


roughly 2% of the correlations. We are interested primarily in the fifteen finance-related
topics. Nevertheless, in examining the development of law related to finance it
is important to focus not only on these fifteen topics, but also on any topics that
are related to them, since law outside finance can surely influence the develop-
ment of finance-related law. In examining correlations, we therefore include all
topics related to a finance topic via at most two steps: a non-finance topic is
included if it has a correlation greater than 0.15 with any topic that has a cor-
relation of greater than 0.15 with a finance-related topic. This leaves us with
fifty-seven links to study, half of which are direct links to the finance topics
themselves. From this fact alone, an interesting observation arises.
Two-thirds of the most important links in our data, fifty-seven of eighty-five,
connect to finance, and one third are directly connected to finance topics. In
contrast, finance topics are only 15% of all topics. This is evidence that the
development of law related to finance is at the center of English legal develop-
ments in the period under study.
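The selection rule can be sketched as a small graph computation. With 100 topics there are 100 × 99 / 2 = 4,950 distinct pairs, and 85 of 4,950 is about 1.7%. The code below is an illustration under stated assumptions: the function names, the topic labels "Topic A" to "Topic C", and all correlation values are hypothetical, and retaining only links whose two endpoints both survive the two-step rule is our reading of the criterion rather than a confirmed detail of the procedure.

```python
from math import comb


def strong_links(correlations, threshold=0.15):
    """Unordered topic pairs whose correlation exceeds the threshold."""
    return {frozenset(pair) for pair, r in correlations.items() if r > threshold}


def within_two_steps(links, finance):
    """Finance topics plus every topic reachable from them via at most two strong links."""
    def neighbors(group):
        return {topic for link in links if link & group for topic in link}

    one_step = set(finance) | neighbors(set(finance))
    return one_step | neighbors(one_step)


print(comb(100, 2))  # number of distinct topic pairs among 100 topics

# Hypothetical correlations among two finance topics and three other topics.
toy = {
    ("Bonds", "Repaying Debt"): 0.30,
    ("Bonds", "Topic A"): 0.20,
    ("Topic A", "Topic B"): 0.18,
    ("Topic B", "Topic C"): 0.16,
    ("Bonds", "Topic C"): 0.05,
}
finance = {"Bonds", "Repaying Debt"}

links = strong_links(toy)
kept_topics = within_two_steps(links, finance)
kept_links = {link for link in links if link <= kept_topics}
print(sorted(kept_topics), len(kept_links))
```

In the toy network Topic C is three steps from any finance topic, so it is dropped along with its link to Topic B, leaving four topics and three links.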
Focusing on the top 2% of correlations is a very stringent criterion, forced
upon us by a combination of two factors. First, parsimony is essential to
extracting lessons from overwhelming amounts of data. Second, we are
examining an area of law that seems to have many connections with other
areas of law. However, if a reader were interested in burrowing down into
an area of law that was much less broadly connected with other areas, a
weaker criterion for the size of the correlation could be used: the narrowness
of the area of the law would provide its own parsimony.54

Even fifty-seven correlations are hard to parse if one solely focuses on a list
of topics and their associated correlations. In this case a picture is certainly
worth a thousand words. We present our findings with the aid of Figure 2.
All relevant topics and connections, given the above criteria, appear in the dia-
gram: there are fifteen finance topics, twenty-four non-finance topics that are
related to the fifteen finance topics, and fifty-seven connections, indicated by
dashed lines. The names of the fifteen finance topics are capitalized to distin-
guish them.55 The topic names are accompanied by the estimate of the mid-
year of topic development discussed in the previous section.

54. For example, if one were interested in the workings of the Poor Laws one might want to
examine topics related to Geographic Settlement of Children. Then one would be led to examine
a narrow but interesting set of topics: Reviewing Local Orders, Employment of Apprentices and
Servants, Decisions after Criminal Conviction, and Clarifying Legislative Acts.
55. For an understanding of what the finance topic names signify, the reader is directed to
Table 1. For reasons of brevity, similar discussions of the topic names for non-finance topics
are omitted, with the reader referred to the relevant elements of GM. After the publication of
GM, one topic name that appears in Figure 2 was reconsidered and changed. Interacting in
Court has been changed to Decisional Logic, with the renaming prompted by a further read-
ing of the case reports that most use this topic.


B. Insights from the Network of Topics Related to Finance

Such a diagram makes it easy to detect patterns that indicate broad
lessons in the development of the law. These patterns are readily found
in Figure 2 and they are not difficult to interpret.
The core finance-related topics are in a block in the lower left of the dia-
gram, with many interconnections between them. To the right of these are a
set of topics whose development was concentrated in the first half of the
seventeenth century. These topics are related most closely to contract
law and to procedural developments relevant to litigants pursuing contract
cases in court. The fact that Assumpsit, an early topic, is connected with
procedural topics suggests that the procedural rigor of early common law
was of key importance in addressing matters of debt. Above these topics
is a small block of very early developing areas of law connected to transfer
of ownership of property or transfer of the right to use the property, for
example on leases. The reason for the connection between these and the
broader elements of contract law is transparent.

Figure 2. Interconnections of finance topics.

The largest contrast is between the topics in the lower right of the diagram,
connected to contract, and those in the upper left of the diagram.
The latter group focuses on inheritance and wills. Those are topics
whose development came much later in the seventeenth century than the
contract-related topics discussed in the previous paragraph. The topics in
the upper left focus on inheritance and wills and mainly concern property
issues, as is inevitable given the importance of land as the basis of family
relationships at that time. And given the importance of trusts in dealing
with these complicated family-inheritance relationships it is not surprising
that the topic Implementing Trusts should be intimately connected to this
block of topics.
If one wanted to tell an overarching story of development of case law
and legal ideas relevant to finance that is evoked by this figure but removed
from the nuances of specifics, it would be the following. Early stirrings of
an agricultural revolution and the growth of the rural textile industry stim-
ulated a market in the transfer of land-use rights. This led to cases concern-
ing disputes on leases and rentals, which in turn spurred refinements in
contract law. Such refinements were closely associated with the develop-
ment of court procedures that channeled contract disputes as they entered
the court system. These developments naturally fed into the law relevant
to the exchange of financial property and to the debts that arose as a result.
But a separate relationship was with the law relevant to both property and
the family because the types of arrangements that are so important for
finance, trusts and mortgages, for example, were intimately connected
with the way in which English families were trying to structure their inher-
itance arrangements. Given the timing of events, it seems that the two areas
of law, finance and family-inheritance, were developed in tandem, rather
than one obviously being the precursor of the other.
For the reader interested in examining interconnections among different
areas of law, we must emphasize that we have only provided one example
of many different analyses that could be carried out using as data the cor-
relations derived from the topic-modeling exercise. As far as we are aware,
there exists no network analysis on any subject in the pre-industrial legal
history literature that is similar to the one explored in Figure 2, even though
some aspects of the connections appearing in that figure have certainly
been known to legal historians. Topic-modeling goes beyond what already
exists in the legal history literature because it is a tool for telling a
broader story: it leverages a comprehensive set of cases, picks up patterns
that might be reflected only in the repetition of thousands of minute sections
of text, introduces easily understood quantifications, and facilitates
visualizations that aid the genesis of fresh legal-historical insights.
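
For readers who wish to experiment with this kind of analysis, the following minimal sketch shows one way that pairwise topic correlations could be turned into a network of the general sort shown in Figure 2. It is an illustration under stated assumptions, not the procedure used to produce that figure: the topic labels, the prevalence values, the helper names `pearson` and `topic_network`, and the cutoff of 0.3 are all invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def topic_network(prevalence, threshold):
    """Return {(topic_1, topic_2): correlation} for pairs at or above threshold.

    `prevalence` maps a topic label to its share in each document
    (one number per document, same document order for every topic).
    """
    edges = {}
    for a, b in combinations(sorted(prevalence), 2):
        r = pearson(prevalence[a], prevalence[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges


# Invented toy data: three placeholder topics across four hypothetical reports.
toy_prevalence = {
    "Topic A": [0.5, 0.4, 0.1, 0.0],
    "Topic B": [0.4, 0.5, 0.0, 0.1],
    "Topic C": [0.1, 0.1, 0.9, 0.9],
}

# Hypothetical cutoff; any real analysis would need to justify its own choice.
edges = topic_network(toy_prevalence, threshold=0.3)
for (a, b), r in edges.items():
    print(f"{a} -- {b}: r = {r:.2f}")
```

In this toy example only the first two placeholder topics are linked, because they tend to appear in the same hypothetical reports, while the third is negatively correlated with both. The resulting dictionary of edges could then be passed to any graph-drawing tool to produce a diagram.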

V. Law Versus Equity in Case Law and Legal Ideas on Finance

In examining the development of legal doctrines, legal historians are very
careful to differentiate between law and equity, between the activities of
the common-law courts and those outside this system, particularly the Court
of Chancery.56 Nevertheless, this distinction is not made as clearly as it
should be in the related economics literature, especially when interpreting
the development of law on finance and understanding the strengths of the
English legal system. Our met