Thursday, April 29, 2004

Using Winzip on Pauline authorship

We had a fascinating seminar yesterday in the Graduate Institute for Theology and Religion here in Birmingham. Andy Pryke, from the School of Computer Science, gave a paper called:

"Who wrote Paul? Can text analysis based on data compression techniques
(like "winzip") add to our knowledge?"
I will present some preliminary research which applies techniques from computer science and genetic analysis to the text of the letters attributed to Paul. The presentation will show visual representations of the relationships between these documents, and no background in computing is required. Feedback is welcome, particularly on (i) the utility of the method and (ii) the relationship of these results to those of traditional scholarship.
The talk was concise and clearly presented, patiently explaining the computing side of things so that we could all understand, and Andy wanted to enlist the help of those present on the Biblical scholarship side of things.

As I understood it, Andy had applied compression techniques to all the letters in the New Testament in order to ascertain how similar each text was to each other text, so that one could see -- for example -- how similar 1 Corinthians is to each other letter in the New Testament, then how similar Romans is to each other letter in the New Testament and so on. Compression technology like Winzip is useful in this context because it compresses texts by looking for repeated patterns, allowing one to express the degree of compression as a number, e.g. "The cat sat on the mat" could be represented as "Θ c@ s@ on Θ m@", thereby reducing the number of necessary symbols (not counting spaces) from 17 to 10, a ratio of 0.59. One can then make a direct comparison with another text using the same code, Θ = the, @ = at, and see how similar the chosen text is. "Born of the flesh" could be represented as "Born of Θ flesh" using the same code, reducing the number of necessary symbols from 14 to 12, a ratio of 0.86, so (obviously) quite different from "the cat sat on the mat". Likewise in the New Testament letters, each text was tested for its relationship to each other text and the degrees of similarity ascertained. The results of the 400+ different relationships can be plotted visually so that one could see where the clustering of similar texts occurred.
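
For anyone who wants to play with the idea, here is a minimal Python sketch of the toy example above, followed by one common way of turning a real compressor into a similarity measure. To be clear about what is assumed: the function names are my own, the toy code (Θ = the, @ = at) is just the illustration from the paragraph above, and the second half uses Python's zlib and the "normalized compression distance" formula as a stand-in. I do not know that this is the measure or the compressor used in the talk. Note that 12/14 comes to 0.86 when rounded.

```python
import zlib

# Toy code from the example above: one symbol per repeated pattern.
# (Illustration only; a real compressor builds its own dictionary.)
TOY_CODE = {"the": "Θ", "at": "@"}


def symbol_count(text):
    """Count symbols, ignoring spaces, as the figures in the post do."""
    return sum(1 for ch in text if not ch.isspace())


def toy_encode(text, code=TOY_CODE):
    """Lower-case the text and replace each pattern with its symbol."""
    encoded = text.lower()
    for pattern, symbol in code.items():
        encoded = encoded.replace(pattern, symbol)
    return encoded


def toy_ratio(text, code=TOY_CODE):
    """Encoded symbol count divided by original symbol count."""
    return symbol_count(toy_encode(text, code)) / symbol_count(text)


def compressed_size(data):
    """Size in bytes of the zlib-compressed data (hypothetical helper)."""
    return len(zlib.compress(data, 9))


def ncd(text_a, text_b):
    """Normalized compression distance between two texts.

    Lower values mean the texts share more repeated patterns.
    Assumption: this formula is a stand-in, not taken from the talk.
    """
    a, b = text_a.encode("utf-8"), text_b.encode("utf-8")
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


if __name__ == "__main__":
    print(toy_encode("The cat sat on the mat"))            # Θ c@ s@ on Θ m@
    print(round(toy_ratio("The cat sat on the mat"), 2))   # 0.59
    print(round(toy_ratio("Born of the flesh"), 2))        # 0.86
```

A real compressor adds a fixed overhead to every file, so a measure like `ncd` only becomes meaningful on longer texts such as whole letters, not on single sentences like the ones above.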

The results were interesting. The seven undisputed Paulines appeared to cluster together as very similar texts, but 2 Thessalonians was right there in the mix with them. Ephesians and Colossians were both a bit further away, though similar to each other, and Colossians more similar to the undisputed Paulines than was Ephesians. The Pastorals were way off -- more similar to non-Pauline texts like Hebrews, 1 and 2 Peter than to the undisputed Paulines. And the Pastorals clustered together as similar to each other. The one real anomaly in the results was provided by 1 and 2 John, both of which came out as similar to the undisputed Paulines and less like the other letters in the NT.
