Ewing, NJ - A statistical method known as stylometry has been used by a group of researchers to show that the Consolatio published in 1583 as a rediscovered work of the Roman orator Marcus Tullius Cicero (106-43 BC) is a forgery. The researchers published their results in December in the journal Literary and Linguistic Computing.

Many would recoil at the suggestion that "literary style" could be captured by numbers, yet statisticians and computer scientists have now developed techniques that do seem to provide a quantitative assessment of style. One such method, stylometry, measures how often an author uses frequently used words, especially "function words" such as conjunctions and prepositions, and treats those rates as markers of authorship. The technique works because authors use function words at characteristic rates, largely subconsciously, so those rates tend to differ from one author to another.
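As a rough illustration of the idea, the Python sketch below turns a text into a function-word profile: the rate of each chosen word per 1,000 words of text. The word list is a short hypothetical sample of common Latin words, not the list the researchers used, and the tokenizer is an assumption that simply lower-cases the text and keeps runs of letters.

```python
import re
from collections import Counter

# Hypothetical, illustrative function-word list; NOT the study's 46 words.
FUNCTION_WORDS = ["et", "in", "non", "ut", "cum", "ad", "sed", "quod"]


def tokenize(text):
    """Lower-case the text and split it into runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def function_word_profile(text, function_words=FUNCTION_WORDS):
    """Return each function word's rate per 1,000 tokens of the text."""
    tokens = tokenize(text)
    if not tokens:
        return {w: 0.0 for w in function_words}
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / len(tokens) for w in function_words}
```

Rates rather than raw counts are used so that texts of different lengths can be compared on the same scale.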

To bridge the gap between statisticians and humanities scholars, some of whom have been suspicious of attempts to quantify literary style, David Holmes, an assistant professor of mathematics and statistics at The College of New Jersey, works closely with scholars trained in the humanities. In his most recent research on Cicero, he collaborated with Emily K. Tse of the Department of Classics at UCLA and Richard S. Forsyth of the Bristol Stylometry Research Unit at the University of the West of England. He has previously teamed with an expert on Stephen Crane in an attempt to determine the authorship of newspaper articles thought to have been written by the journalist and author.

"Stylometry can be a powerful tool, as J.F. Burrows in 1992 was able to show by using the technique on the works of the Bronte sisters and obtaining three distinct clusters," according to Holmes. This is remarkable, Holmes added, because the three sisters were linked by heredity and upbringing, lived isolated lives, and wrote in the same genre.

Given such results, the researchers judged the 1583 Consolatio a good candidate for analysis, because the current scholarly consensus that it is a forgery rests only on circumstantial evidence. A secondary objective was to test whether stylometry, previously applied almost exclusively to English texts, would work in a language with a different structure, since the Consolatio and all of the control texts were written in Latin.

The Consolatio and the Latin language

When Cicero's daughter died in 45 BC, Cicero composed a philosophical work now known as the Consolatio. Despite the Consolatio's reputation in the classical world, only fragments of the text are known to have survived the fall of the Roman Empire. In 1583, however, the prominent humanist scholar Carlo Sigonio printed a book purporting to be the rediscovered Consolatio. Some of Sigonio's contemporaries voiced doubts about its authenticity, and scholarly opinion on whether the published work is genuine has been divided ever since.

"In assessing the text, it was important for us to understand the Latin language," Holmes said. "We worked closely with experts from Classical Studies in determining three major (and several minor) phases of the language between the time of Cicero and that of Sigonio."

Classical Latin covers the period from about 100 BC until about 250 AD. It was already something of an artificial construct by the middle of the first century AD, and after the fall of the Western Roman Empire in the fifth century AD it ceased to be a living language except in law, diplomacy, scholarship and theology. In Western Europe, all who were educated wrote in Latin. The Latin of this period, which lasted more than a thousand years, is termed medieval Latin; it was predominantly ecclesiastical in nature because teaching was left almost entirely in the hands of churchmen. The third phase of the language, Neo-Latin, dates from the fourteenth century, with the revival of humanism and the renaissance of classical learning. Although Neo-Latin was an attempt to re-establish the Latin of the Golden Age of Classical Latin, it could never reproduce that language, according to Holmes, because technology and society had changed too much in the interim and it was no one's native tongue. In fact, many of its users could write it proficiently but not speak it.

These phases of the language are central to determining the authorship of the Consolatio. "It is often tempting to combine all authors into one single, large multivariate analysis," Holmes said. "However, because the authors of the third age of Latin, sixteen centuries later, tried to mimic authors of Classical Latin, the results of one large analysis are confusing. It is only by breaking the analysis into tasks that the results become feasible," he added. If the Consolatio were genuine, it would have been written by one of the foremost stylists of Classical Latin; if it were a forgery, it would most likely have been written by an imitator working in Neo-Latin. At the heart of the problem, therefore, according to Holmes, is finding a way to discriminate between Cicero and Ciceronianism, the Neo-Latin imitation of Classical Latin.
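The staged strategy Holmes describes, deciding the period first and only then comparing authors within it, can be sketched as follows. This is a minimal sketch under stated assumptions: each text is represented as a vector of function-word rates, and a nearest-centroid rule (Euclidean distance to each group's average profile) stands in for the discriminant analyses the researchers actually used. The function and argument names are hypothetical.

```python
import math


def centroid(profiles):
    """Average a list of equal-length function-word profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def nearest_group(sample, groups):
    """Return the label whose centroid is closest to the sample (Euclidean distance).

    Nearest-centroid is an illustrative stand-in, not the study's discriminant analysis.
    """
    return min(groups, key=lambda label: math.dist(sample, centroid(groups[label])))


def two_stage_attribution(sample, periods, authors_by_period):
    """Stage 1 picks a period; stage 2 compares only the authors of that period.

    periods: {period_label: [profile, ...]} pooling all texts of each period.
    authors_by_period: {period_label: {author: [profile, ...]}}.
    """
    period = nearest_group(sample, periods)
    author = nearest_group(sample, authors_by_period[period])
    return period, author
```

Once stage 1 has placed a sample in the Neo-Latin group, stage 2 cannot assign it to a classical author, which avoids mixing imitators and their models in a single comparison.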

The Method

In consultation with Classical Studies experts, the group selected five or six authors from each of the two periods, Classical Latin and Neo-Latin, and seventy works in all. Cicero and Sigonio were among the authors chosen for their respective periods. Twelve works, one by each author, served as the basis for choosing the function words to compare: with the help of a Latinist, the researchers selected the forty-six most frequent Latin words in these works that were not content words.
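The word-selection step can be sketched as below, under assumptions: each base text contributes equally (its word counts are converted to relative frequencies before pooling, so a long work cannot dominate), and the Latinist's judgment is represented by a hypothetical, hand-supplied set of content words to exclude. The study's actual word list and exclusion decisions are not reproduced here.

```python
import re
from collections import Counter


def most_frequent_non_content_words(texts, content_words, n=46):
    """Return the n words with the highest summed relative frequency across texts,
    skipping any word in the hand-supplied content_words set."""
    scores = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            scores[word] += count / len(tokens)  # relative frequency within this text
    candidates = [w for w in scores if w not in content_words]
    # Sort by descending score; break ties alphabetically for reproducibility.
    candidates.sort(key=lambda w: (-scores[w], w))
    return candidates[:n]
```

In practice the exclusion set would come from a specialist's review of the ranked list rather than from a fixed rule.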

Profiling the Classical Latin texts on the forty-six function words revealed clear patterns of usage, and the method distinguished Cicero from the other writers of the period. The same function words, applied to the Neo-Latin authors, likewise produced clear patterns that separated Sigonio from his contemporaries.

Having checked that the set of 46 common words discriminated well within both sets of control texts, the researchers turned to the main question: was the Consolatio a genuine work of Cicero? Using genuine Cicero and genuine Sigonio texts as the defined groups, they ran a stepwise discriminant analysis on the data. Four words emerged as the best discriminators between Cicero and Sigonio, and the resulting discriminant function classified the known texts correctly nearly 94% of the time, without cross-validation. The Consolatio was then split into two parts and a discriminant function score was computed for each; both parts were assigned to the Sigonio group.
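A stripped-down version of this procedure is sketched below in plain Python. It is a sketch under stated assumptions rather than the researchers' code: the two groups are lists of function-word profiles, Fisher's linear discriminant with a pooled covariance matrix and equal priors stands in for the full discriminant analysis, and the stepwise stage is simplified to greedy forward selection of a fixed number of words, each step adding the word that most increases the squared Mahalanobis distance between the group means (no F-to-enter or removal tests). All function names are hypothetical.

```python
def mean(rows):
    """Column means of a list of equal-length row vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(g1, g2):
    """Pooled within-group covariance matrix of two groups of row vectors."""
    p = len(g1[0])
    s = [[0.0] * p for _ in range(p)]
    for group in (g1, g2):
        m = mean(group)
        for row in group:
            d = [row[i] - m[i] for i in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(g1) + len(g2) - 2
    return [[v / dof for v in r] for r in s]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_lda(g1, g2):
    """Return (weights, midpoint, D2) of the two-group linear discriminant."""
    m1, m2 = mean(g1), mean(g2)
    diff = [a - b for a, b in zip(m1, m2)]
    w = solve(pooled_covariance(g1, g2), diff)
    d2 = sum(wi * di for wi, di in zip(w, diff))  # squared Mahalanobis distance
    mid = [(a + b) / 2 for a, b in zip(m1, m2)]
    return w, mid, d2


def score(x, w, mid):
    """Discriminant score: positive favours group 1, negative group 2 (equal priors)."""
    return sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))


def forward_select(g1, g2, k):
    """Greedily add, k times, the variable that most increases D2 between the groups."""
    chosen, remaining = [], list(range(len(g1[0])))

    def d2_with(j):
        cols = chosen + [j]
        sub1 = [[row[c] for c in cols] for row in g1]
        sub2 = [[row[c] for c in cols] for row in g2]
        try:
            return fisher_lda(sub1, sub2)[2]
        except ValueError:  # singular covariance: this variable adds nothing usable
            return float("-inf")

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=d2_with)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

In the study's terms, g1 would hold Cicero samples and g2 Sigonio samples, k would be 4, and each half of the Consolatio would be restricted to the chosen words and passed to `score`; a negative score would place that half with Sigonio.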

As a check, because previous studies have shown that chronological and genre effects are often strong enough to partly mask authorship, the researchers also ran a discriminant analysis on two defined groups: Classical Latin texts and Neo-Latin texts. This model assigned more than 94% of the texts to the correct period, without cross-validation. When the Consolatio was then allocated to one of the two groups, it was classed as Neo-Latin.
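The phrase "without cross-validation" matters: accuracy measured on the same samples used to build a classifier (resubstitution) tends to be optimistic. The sketch below contrasts it with leave-one-out cross-validation, in which each sample is held out in turn and classified by a model fitted to the rest. To keep it short, a nearest-centroid rule stands in for the discriminant function; this substitution, the data layout ({label: [profile, ...]}) and the function names are assumptions for illustration.

```python
import math


def centroid(rows):
    """Column means of a list of equal-length profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def classify(x, groups):
    """Assign x to the non-empty group with the nearest centroid (stand-in classifier)."""
    candidates = {label: rows for label, rows in groups.items() if rows}
    return min(candidates, key=lambda label: math.dist(x, centroid(candidates[label])))


def resubstitution_accuracy(groups):
    """Fit on every sample, then classify those same samples."""
    hits = total = 0
    for label, rows in groups.items():
        for x in rows:
            hits += int(classify(x, groups) == label)
            total += 1
    return hits / total


def leave_one_out_accuracy(groups):
    """Hold each sample out in turn, refit on the rest, and classify the held-out sample."""
    hits = total = 0
    for label, rows in groups.items():
        for i, x in enumerate(rows):
            reduced = dict(groups)
            reduced[label] = rows[:i] + rows[i + 1:]
            hits += int(classify(x, reduced) == label)
            total += 1
    return hits / total
```

On small samples the two figures can differ noticeably, because a borderline sample can pull its own group's average toward itself when it is included in the fit.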

These two analyses support the opinion of Latin scholars that the Consolatio of 1583 is a work of Neo-Latin and therefore not a rediscovery of Cicero's long-lost text.

To shed further light on its authorship, the researchers used stylometry to try to assign the text to one of four prominent Neo-Latin authors of the time: Sigonio, Muretus, Riccoboni or Vettori. On the variables that best differentiate Sigonio from the other three, the Consolatio is clearly closer to Sigonio than to any of them. On this evidence, Riccoboni and Vettori were ruled out as authors of the Consolatio; Muretus, however, could not be entirely ruled out. A stepwise linear discriminant analysis restricted to Muretus and Sigonio, which assigned all seven Muretus samples and all seven Sigonio samples to their correct author, attributed both halves of the Consolatio to Sigonio. Taken together, these analyses show that the Consolatio is a work of Neo-Latin and not a genuine work of Cicero. The evidence that Sigonio himself wrote it is strong but not conclusive. This is a tribute to Sigonio's skill as a Ciceronian imitator. Finally, the results show that stylometry can be used to investigate the authorship of works written in Latin, not only in English.
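Attribution tests like these operate on samples of comparable size: the Consolatio was scored in two halves, and the Muretus-Sigonio comparison used seven samples per author. The researchers' exact sampling scheme is not described here; the helper below is a hypothetical sketch that simply cuts a tokenized text into k consecutive blocks of nearly equal length.

```python
def split_into_samples(tokens, k):
    """Cut a token list into k consecutive blocks whose lengths differ by at most one."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    n = len(tokens)
    # Integer boundaries 0 = b_0 <= b_1 <= ... <= b_k = n, spread as evenly as possible.
    bounds = [i * n // k for i in range(k + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(k)]
```

With k=2 this yields two halves, as used for the Consolatio; each block would then be profiled and scored separately.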

David Holmes can be reached by e-mail or by phone at 609/771-2164.

The College of New Jersey provides academically prepared students with a challenging undergraduate education and a rewarding residential experience, small classes, and a prestigious faculty. TCNJ has been recognized nationally for its excellence: it was rated a top-10 "Best Buy" in all nine years that Money magazine published its survey, and it has been recognized in U.S. News and World Report, The Fiske Guide to Colleges, Barron's Profiles of American Colleges, Kiplinger's and Peterson's Competitive Colleges. The College of New Jersey is located on 289 tree-lined acres in suburban Ewing, NJ, off Rt. 31 (Pennington Rd.), approximately 1.5 miles north of Olden Ave. and 1.5 miles south of Rt. 95.