#07 How AI is Clarifying Scripture Authorship History with Dr. Maciej Eder


1h 59m
2.8K views

About This Episode

Can AI reveal who really wrote our ancient texts? 🤯📜 In this episode, Dr. Maciej Eder breaks down how stylometry — the science of authorship detection — is uncovering the hidden identities behind religious, literary, and historical documents. From the Bible to Plato, even the smallest words can betray an author’s voice.

We explore:
How machine learning detects hidden authors
Why some ancient works may not be what they seem
The blurry lines between storytelling, memory, and manipulation

What if AI is the key to rewriting history? 💬 Share your thoughts in the comments!

#Stylometry #AncientTexts #Authorship #Plato #BibleHistory #MaciejEder #LanguageScience #MachineLearning #TextAnalysis #ForensicLinguistics #AuthorshipDetection #HistoryRevealed #Unexplained #DataScience #AIHistory #CrypticTexts #LostKnowledge #NewTestament #Shakespeare #literarymystery

Chapters
00:00 Exploring Authorship and the Bible
05:55 Gender Differences in Writing Styles
12:04 Historical Context of Authorship and Textual Analysis
18:01 The Evolution of Stylometry Through History
24:00 Practical Applications of Stylometry
29:48 The Future of Stylometry and Its Implications
44:15 The Power of Word Length and Function Words
49:30 The Uniqueness of Style and Its Influences
53:34 The Impact of Computational Power on Stylometry
55:31 Stylometry in Authorship Attribution
57:18 The Availability of Texts and Data Collection
01:00:11 Challenges in Accessing Ancient Texts
01:02:11 The Role of Punctuation in Stylometry
01:03:49 Limitations of LLMs in Authorship Attribution
01:04:18 Understanding Dendrograms in Text Analysis
01:09:15 Evidence in Authorship: The Case of Gallus Anonymous
01:15:09 Stylometry and the Controversy of New Testament Authorship
01:23:58 The Evolution of Oral Tradition to Written Texts
01:24:11 The Significance of Oral Agreements in Historical Contexts
01:25:15 The Complexity of Authorship in Religious Texts
01:27:22 The Art of Letter Writing in Historical Contexts
01:29:01 The Collaborative Nature of Ancient Texts
01:31:35 Cultural Shifts from Oral to Written Traditions
01:32:25 Romanticism and the Concept of Individual Authorship
01:33:32 Understanding Historical Texts in Context
01:36:26 The Role of Authors in Shaping History
01:39:29 Advancements in Stylometry and Textual Analysis
01:43:21 Emerging Fields in Literary Studies
01:45:03 Mixed Authorship and Historical Figures
01:48:24 Quantitative Analysis in Textual Studies
01:52:10 The Intersection of Data and Interpretation
01:56:18 The Future of Textual Analysis and AI Contributions

Topics

stylometry
authorship detection
Maciej Eder
Gallus Anonymous
Prometheus Bound
ancient texts
literary analysis
forensic linguistics
AI authorship
Plato authorship
biblical authorship
New Testament controversy
machine learning history
text analysis
authorship algorithm
Shakespeare authorship
stylometric analysis
computational linguistics
cryptic texts
history podcast
linguistic AI
literary mystery
lost authors
language science
data-driven history

Full Transcript

...about the Bible, God's words, and you've got two different versions and they are not in agreement: who is the author of this missing passage? They started collecting by buying, stealing, whatever the source was.

Matt and I first heard about Dr. Maciej Eder when I was downloading his software package to analyze the books of the New Testament, to try to figure out who actually wrote them. This sounds like a crazy story, but it's true. We saw some young guys on TikTok (we'll leave a link below) who were claiming they'd had some breakthroughs on the authorship of the New Testament. How did they figure it out? Well, they were using a statistical method called Eder's distance equation, I believe it's called, and there was an open-source software package you could use to run the same test they were running. So I did. And then I found out that there was a Dr. Eder who actually wrote the package. So we got hold of him and asked him all sorts of questions about ancient documents. It turns out he was trying to figure out who wrote the first document in his native language of Polish when he realized that with computer algorithms you can now analyze the smallest words in a text, "and," "the," "but," "of," the little conjoining words, and they actually leave a better fingerprint of an author than the big words do, which is what we had always assumed about authorship in church history. So he's been applying this new statistical method to old documents, and the results are fascinating. If you're into old stuff, or you're into new stuff, this one's for you. Welcome to the Austin and Matt podcast.

Welcome, Dr. Maciej Eder, to the Austin and Matt podcast. We are so pumped that you are joining us. Thanks for being here.

Hello. Yeah, thanks for having me. It's a big day for me to be with you. Yeah.
So the way I found out about you is I've started to dive into this field of stylometry, and it seems to be quite vast and deep, and it seems like you are one of the godfathers of it in some way. You wrote an algorithm that others are using. One of the things I found interesting, just to introduce the concept of stylometry: it's essentially using math, algorithms, and machine learning to determine the authorship of ancient texts. Do I have that right? Does that sound accurate?

That's a very short description of the core of the concept and of the core of the subdiscipline, which is a part of computational linguistics, or quantitative linguistics. When you try to model language, you want to test whether the language can be predicted or modeled, or whether any peculiarities or idiosyncrasies can be found. Because if you think of a language, we all use the same language, say English, but each of us has his or her own peculiarities or idiosyncrasies, and stylometry is one of the ways of measuring those features of language, very systematic but not systemic, that are typical of each of us, or of some groups of us. So stylometry might ask a question about the differences between female and male language, for example, if we believe in this binary model. Or stylometry can ask whether some diseases or disorders, such as Alzheimer's or Parkinson's, can be measured, can be reflected in style, by measuring the countable features of language.
So stylometry is about measuring peculiarities and idiosyncrasies.

Specifically of writing, right? Not necessarily verbal? It's more writing?

Well, it's language, but the nicest insight into language is through writing, of course. If it is spoken language that is written down and then analyzed by computers, no problem at all. But what it measures is written language.

Yeah. So you could use stylometry, for example, in what you just said. Let's say we have a bunch of writings by men and a bunch of writings by women, and we put them in two buckets, and an algorithm runs through them, and we say, "Hey, can you tell that there's a difference, and is there a difference here?" And then we could feed it a new piece of text and ask, "Was this written by a man or a woman?" And based on the data it has, it would give you some sort of probability that it was written by a man or a woman. Is that about right?

Yeah, that's exactly the case. Except that there's plenty of assumptions we have to make. Of course we'll be seeing some differences, but the differences in language are not only genetic, or dependent on our individuality or gender or whatever, but also on sociolinguistic factors: the environment we live in puts pressure on how we write and how we speak. It is relatively simple to tell apart female and male writing in 19th-century English literature. Very simple. You can just tell them apart, just like that. And then you'll discover that on the male side there will be vocabulary very typical of the male authors, such as "honour," "bottle," "wig," something like that. And on the female side there will be feelings, "to feel," this kind of vocabulary. And it's really stereotypical.
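The two-bucket experiment described here can be sketched in a few lines of code. This is a toy illustration, not Dr. Eder's actual method or the stylo package: the mini-corpora below are invented (reusing the stereotyped marker words from the conversation), and a simple nearest-profile rule stands in for a trained model.

```python
from collections import Counter

def profile(texts, features):
    """Mean relative frequency of each feature word across a bucket of texts."""
    totals = Counter()
    n_words = 0
    for t in texts:
        words = t.lower().split()
        totals.update(w for w in words if w in features)
        n_words += len(words)
    return [totals[f] / n_words for f in features]

def classify(text, buckets, features):
    """Assign a text to the bucket whose frequency profile is closest
    (Manhattan distance between profiles)."""
    target = profile([text], features)
    def dist(name):
        ref = profile(buckets[name], features)
        return sum(abs(a - b) for a, b in zip(target, ref))
    return min(buckets, key=dist)

# Invented toy corpora, one text per bucket for brevity
buckets = {
    "male":   ["upon my honour the wig and the bottle sat on the table"],
    "female": ["she began to feel that to feel deeply was to live"],
}
features = ["the", "to", "feel", "honour"]
print(classify("the bottle and the wig upon the shelf", buckets, features))  # prints "male"
```

In a real study each bucket would hold many long texts, the feature list would be hundreds of function words, and the classifier would report a probability rather than a hard label.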
It's embarrassingly stereotypical. But is it a difference between the brains? I don't think it is. It's a sociolinguistic factor that is very strong here: the expectations of the society for female authors to write this way rather than that way, and for male authors to write that way rather than this way.

So you might say, all right, in the 1800s we can differentiate whether this is a man or a woman, but we would be cautious to say it's genetic, like, you're a man, so genetically you write this way. You would say it might actually be cultural: men in the 1800s would write a certain way because of their values, and women would write a different way. So you can't necessarily conclude that it's genetic, but it still gives you the right answer pretty well.

Yeah, that's a very good description indeed.

Would it be easy to forge a document? Let's say you were a woman in the 1800s and you wanted to write like a man, because you thought it would be perceived a different way. Would that be difficult for a single human being, without the knowledge of stylometry, without the knowledge of advanced statistics even? Would it be easy to forge a different writing style?

Well, if you knew the system, then you can attack the system, if you know the back door, something like that. And there were some good examples. The Brontë sisters were very ambitious, and they wanted to be in literature generally rather than pigeonholed as examples of female literature, and they started writing as, you know, normal authors, but it was totally difficult for them to publish their work under their original names. So they decided to publish under male names.
So, the Bell brothers: that's how they started their career, and their respective given names began with the same letters as their original female names. Right? So if you know how to do that, yes, you can attack the system. Of course, it's much more difficult to forge someone else's style, if we are talking not about gender but about individuality. But there are some authors who are particularly talented at doing so. One of my students was working on the French author Marcel P., because he wrote a parody of Émile Zola and of someone else, I don't remember exactly, and it turns out that he was quite good at imitating one of the authors he was trying to forge, and not so good at forging the other one. So it depends, right?

Yeah. How did you get into this? Because, and I have the bio here, you're the director of the Institute of Polish Language at the Polish Academy of Sciences, and you're an associate professor at the Pedagogical University in Kraków. Is that right?

Yep, that's correct.

How many grad students do you have underneath you?

At the university, plenty; at the institute of the Academy of Sciences, none. Grad students rather than undergrads, so it's like 20 grad students.

How did you get interested? Because what's interesting about this is it's sort of taking a math approach to language. Were you more of a language guy, or more of a math guy?

None of the above, I'm afraid. My background is literature: early modern literature, a bit of ancient literature, but mostly early modern Latin and Polish literature, the 15th, 16th, 17th centuries. That's where I come from. And like 20 years ago I was doing this stuff.
My PhD was on 17th-century literature, but then I discovered more and more anonymous cases to be cracked, in Polish literature, in Latin, a little bit of Greek too. It dragged me into the field of stylometry, assessing the authorship of anonymous texts by doing statistical analysis of style, and then I had to catch up with maths and with linguistics, so that I became a linguist. But my original field is literature rather than linguistics. And to address your question: I'm more of a linguist than a mathematician, which I'm not at all.

So in your studies, it seems like there were a lot of texts where we just didn't know the author. Is that common, at least in your experience? How often can we say, this is definitely the author, and how much is under the iceberg, tons more texts where we just don't know who wrote anything?

Yeah, that's a good description. It's of course much more complex, because once we start asking what the concept of authorship is and how it was shaped in previous centuries, that's a big mess. But if we just stay with this intuitive understanding of authorship, then there's plenty of texts, from ancient Latin poetry and prose, from ancient Greek, from biblical studies, that are anonymous and have been attributed by scholars using different kinds of evidence.
It can be external evidence, say, chemical analysis of the parchment or analysis of the ink, you name it. Or, for example, for letters, for epistles written by someone to someone else, you can estimate the date and then argue that the author needs to be that person rather than this person, because of the dates: matching the dates, matching events that are mentioned in the literature or in those letters. This also helps us link the dates to the person. It's detective-style evidence that you can use to try to attribute some texts to some people. And that's what we had been doing for centuries in classical philology, and generally in history. That's the normal thing. But then in the 19th century the idea emerged that the style itself can betray the authorship, if you analyze it with statistical methodologies. Back then, in the 19th century, those approaches were very simple, both in terms of the linguistic features to be analyzed and the mathematical procedures to be applied. But still, that was the beginning of the thinking that the style itself can betray the authorship. And one of the founders of stylometry, actually the person who coined the term "stylometry," was Wincenty Lutosławski, a Polish philosopher of the turn of the 19th century.

Ah, cool. Poland coming in hot with the stylometry.

Here in Kraków, where I am right now. So he did a study on Plato. We can discuss that later.
He did a study on Plato in which he introduced the idea of stylometry, and he said: if we in the 19th century can send someone behind bars because they faked someone's check, and we just use the handwriting to tell the authors apart, then come on, it should be even simpler with style. So he said, so he naively assumed. Now we know it's a bit naive to assume that, but still, that was the beginning of this thinking.

It kind of makes sense, too, because there's just more data. One signature, that's not much. But if you write 10,000 words and I write 10,000 words, at least the assumption is, that's a lot of data we can compare to see if there are similarities.

Yeah, indeed. And the assumption turns out to be verified in lab experiments. What I mean by lab experiments: there is plenty of work done in machine learning communities. Basically, machine learning is about doing lab experiments, because what you basically do is train a model, whatever this model is, and then you take testing data. You take some samples of whatever the task is and you anonymize them, make them blind for the computer, right? Then you make the computer guess, and you count how many times the computer was right. Then you update the model, tweak the parameters, learn something about the language of whatever you are going to measure, update the model again, and observe the outcomes: how many good guesses you record. And that's how you learn how to extract this signature, or whatever it is. In the case of language, there are plenty of those lab tests on texts of known authorship: you just anonymize them, let the computer guess, and then test which language features work best for this kind of task. Right?
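The train-and-guess loop described here can be sketched as a leave-one-out experiment: hide each known-author sample in turn, attribute it from the remaining samples, and count the correct guesses. A minimal sketch with invented mini-texts and a tiny function-word list; real experiments use long texts, hundreds of features, and proper classifiers.

```python
from collections import Counter

FEATURES = ["the", "of", "and", "to", "a"]  # a tiny function-word list

def freqs(text):
    """Relative frequency of each feature word in one text."""
    words = text.lower().split()
    c = Counter(w for w in words if w in FEATURES)
    return [c[f] / len(words) for f in FEATURES]

def nearest_author(sample, corpus):
    """Attribute a sample to the author with the closest frequency profile."""
    sf = freqs(sample)
    def dist(item):
        return sum(abs(a - b) for a, b in zip(sf, freqs(item[1])))
    return min(corpus, key=dist)[0]

def leave_one_out_accuracy(corpus):
    """corpus: list of (author, text). Hide each text, guess it from the rest,
    and report the fraction of correct guesses."""
    hits = 0
    for i, (author, text) in enumerate(corpus):
        rest = corpus[:i] + corpus[i + 1:]
        hits += (nearest_author(text, rest) == author)
    return hits / len(corpus)

# Invented corpus of known authorship for the lab test
corpus = [
    ("X", "the cat the dog the bird sat"),
    ("X", "the sun and the moon the sky"),
    ("Y", "of mice of men of old tales"),
    ("Y", "of gold of silver of bronze coins"),
]
print(leave_one_out_accuracy(corpus))  # prints 1.0
```

The score is what lets you compare feature sets: rerun the loop with different FEATURES and keep whichever set records the most good guesses.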
So that's why we know it works. It does work, with some side notes that we'll probably be discussing later today.

Yeah. So basically you can take known works and tune the model to them: hey, we know these are all written by the same person, so let's tweak the variables until the model shows they're written by the same person, because we know that's true. At least we're dialing in the variables to be fine-tuned. And now that we've done that, we can introduce other texts, and we have a higher degree of confidence in the attributions, because we tuned the model against an answer key we already knew for other texts. Does that sound right?

Yep. That's exactly the procedure, the procedure used widely across all machine learning applications, except that here we are talking about language and language features, but outside this world it can be image features, whatever features.

Yeah, that's machine learning 101, I guess, but now it's doing it with words. That's so wild. What do you think triggered the early stylometrists to start evaluating works? Because I'm guessing they would already have had an oral tradition of who different texts were written by. What started them questioning different groups of texts?

Yeah, if I may share my personal view, I think it was back in the third-ish century BC, in Greek Alexandria, when they started building the biggest library ever, just ever. And the idea was to collect all the works that had ever been written in the world. And they started collecting by buying, stealing, whatever the source was. They collected masses of papyri. And then they started discovering that the 12 or 15 versions of the Odyssey or the Iliad they had collected were different.
And it led to the question: wow, what are these differences about? Are they original? If there is a longer version and a shorter version, how do you attribute that difference? Is it original or not? That's the moment when the question arose of who is the author of those differences, of some passage that is missing here and added there.

That's so interesting, because I compare it to today. We have social media, TikTok, Instagram, and very often somebody takes someone else's work, puts themselves on top of it, and talks about it or critiques it or whatever. And that's now a new work, right? But it's a derivative of the original person who made the first version. And because that didn't exist 2,000 years ago, the TikTok of the day was writing. So it seems quite normal, just as humans do today: back then you could take a writing like the Odyssey, modify it, throw your own flair on top of it, and you might try to pass it off as the original author, or you might say, here's my critique. They were doing it with text, not with TikTok. It's just getting my mind into that space.

Remember, though, that the stuff being broadcast on TikTok is just random people. If it is about the Bible, God's words, and you've got two different versions and they are not in agreement, then the question of who is the author of this missing passage, well, it's kind of important, isn't it?

Yeah. Well, can I ask you to speculate on that?
Do you think there was ever a time when a group of people said, "Wow, we have these different copies of these texts, and for political or social reasons it would be nice to unify them, create one text, and get rid of extraneous copies, because we don't want people getting confused"? Do you think there would have been power in doing that, at a time when the written word was really the only way to move information around the world?

Well, classical philology has a long tradition of doing exactly that. Out of the extant manuscripts of, say, Cicero or Caesar, philologists try to link them into a more or less consistent version, a composite, because that's the only way of making the real classical material available. The medieval material is different. Here we've got two or three different manuscripts containing the same text, but, for example, in slightly different dialects, if we're talking about, say, medieval French, or medieval English for that matter. And those manuscripts each preserve the whole story, the whole microcosm, the whole universe, by themselves, and making a mixture, a composite, out of them is probably not the best way of thinking about these things. So the other tradition is not to mix them up but to keep them separate, as three or four different versions.

How well do stylometrists need to understand the base language they're studying in order to run these analyses?

Well, if you just want to run these analyses, the only knowledge you need is how to click the run button, right?
But with the stylo tool that you created...

You're touching a good question: is it really enough to just run a tool, if you want to be confident about the outcomes of your analysis? That's a different question. But doing stylometry today is not really difficult, I mean, technically.

Right, because you created the stylo R package, which everyone can download and use to run these models. Had you already thought through the different languages and created settings for them? Is that why?

Well, yes. It was a side project, a private project. I wanted to do some analyses in a systematic way, testing out different parameters, and I wrote a script for myself, because that was the easiest and fastest way to get the thing done, unlike an Excel spreadsheet, which can also do the same thing but was 10 times, 100 times slower. So I decided to write the script in a certain language, and I decided it would be R, but that was a random choice, to be honest.

It's a machine learning language. Yeah.

Right. And then a colleague of mine from the university said, hey, we can extend it, generalize it, and then throw it at our students. And he did, and I got some feedback, and that's how it started to become something bigger than just a script for personal use.

Did you have to teach yourself how to program in order to do that?

Yeah. I'm a self-taught programmer.

Was that a long process? Because you had all these texts you knew you wanted to analyze, but you hadn't written... What year did you start working on the stylo package?

To be very honest, I'm old enough to remember the 80s and the ZX Spectrum computer that I got when I was a kid.

No way. That's cool.
I used it mostly to play, of course, but not only, because you get bored very easily with those very simple games. So I started to do some kind of para-programming, first attempts at programming, in BASIC, in Logo, whatever it offered. So the general idea of programming, of writing an algorithm that is then runnable, was familiar to me, and then in the 90s, when I got my first PC, it was easier to do a little bit of programming. So I have always been close to programming. I have never been trained in it, but it was not a totally new world for me, to be very honest.

Yeah. Can you explain the algorithm that you used? We call it Dr. Eder's algorithm around here. How did you come to it? We can go into the weeds as well, but generally, how did you work towards it, and how did you come to have confidence in the one you're using?

Okay. To start with: if you want to compare two different texts, two different entities, and we are not talking about programming right now but about the mathematical algorithm to compare two texts, then in the machine learning world you have to make a representation of those texts. This representation is usually a set of word frequencies. So you measure the frequencies. We'll be talking in a couple of minutes, I guess, about the features, but basically you analyze the frequencies of the first word, the second, the third, and so on. And then you have to compare those frequency profiles.
You take, say, 100 of those frequencies, of words like "the," "and," "of," "mother," "father," whatever, and then you've got a profile for each of the two or more texts you want to compare against one another.

So if I understand that correctly, one method is basically to count up all the uses of the most common, highest-frequency words. So it's not usually the subject-matter words like "New Testament"; it's probably "a," "and," "the," "me," "you," "he," all the common words. And by adding up the frequencies of those words, we may all have our own kind of footprint, both in frequency and in how those words are distributed through the text. Do I have that right?

That's exactly the case. And it has nothing to do with the algorithm you were mentioning; this is a general rule that was discovered 50 or 70 years ago, by some of the first stylometrists, back in the 50s and 60s: Mosteller and Wallace. They were exploring the Federalist Papers and the question of the authorship of some debated papers in the collection. Some we knew were written by Hamilton, some by Madison, but there was a bunch whose authorship we knew nothing about, whether it was this one or that one. Mosteller and Wallace discovered that a very good predictor is the difference between the frequencies of "on" versus "upon." This single predictor alone is a very strong predictor for that particular case. But what is important here is that it led them to the assumption that maybe it is not the content words but the function words, the grammatical words, that contain quite a lot of information about this authorial uniqueness, and they started counting the frequencies of function words like "on." They collected a few thousand of those and started comparing the frequencies of those words alone.
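The "on" versus "upon" discriminator boils down to a rate comparison: how often does each candidate use the word per thousand words of running text? A toy sketch; the sentence below is invented, not actual Federalist text.

```python
def rate_per_1000(text, word):
    """Occurrences of `word` per 1,000 words of running text."""
    words = text.lower().split()
    return 1000 * words.count(word) / len(words)

# Invented 10-word sample for illustration
candidate = "upon this question the powers conferred upon the union depend"
print(rate_per_1000(candidate, "upon"))  # prints 200.0 (2 hits in 10 words)
print(rate_per_1000(candidate, "on"))    # prints 0.0
```

In the actual study, this kind of rate was computed for each disputed paper and compared against the known Hamilton and Madison rates; a high "upon" rate pointed one way, a near-zero rate the other.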
So that was a very big step forward in this methodology. And now the big question, and here we come to this algorithm of mine, is how to compare these frequency profiles if you've got two of them. Because we know that language is not a random phenomenon: the frequencies of the frequent words are not just slightly higher than those of the other words, they are radically higher. There's a very big difference between the frequency of "the" and a word like "microphone" or "keyboard." The difference is radical, orders of magnitude. And now, if you just count the frequencies and compare them side by side without taking this fact into consideration, then in fact you're only measuring the frequencies of the three or four or maybe five most frequent words. And the algorithm that I suggested adds an additional weight to those infrequent words, so that they have more space to say something about the differences they produce. So that we can compare the frequency of the word "the" with the frequency of "upon," because the relative difference between them is huge.

You're telling me that we betray our identity in our own writing? That every human has their own sort of footprint, a digital fingerprint, whenever they write? And that while humans have been trying to figure out who wrote what for thousands of years, and we've reached certain levels of understanding, computers, by being able to analyze thousands and thousands of variables of my writing, mean I betray my identity? You could almost say that my style is unique to me. And it's not even only in the infrequent words; it's in the frequent words, "a" and "the," those things I'm just programmed to use. Those will all appear at a specific sort of clip, and that can reduce my anonymity in my writing.
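The weighting described here is the idea behind the Delta family of measures (Burrows's Delta and its variants, several of which are implemented in stylo): standardize each word's frequency into a z-score across the corpus, so that "upon" carries as much weight as "the," then average the absolute differences between two texts. This is a sketch of that general idea under invented numbers, not the exact formula from stylo.

```python
from statistics import mean, pstdev

def zscores(profiles):
    """Standardize each feature column: (freq - corpus mean) / corpus std-dev.

    profiles: dict of text name -> list of relative frequencies,
    all in the same feature order. After standardization, a rare word
    like "upon" spans the same numeric range as "the".
    """
    cols = list(zip(*profiles.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return {name: [(v - m) / s for v, m, s in zip(vec, mus, sds)]
            for name, vec in profiles.items()}

def delta(z1, z2):
    """Burrows-style Delta: mean absolute difference of z-score profiles."""
    return mean(abs(a - b) for a, b in zip(z1, z2))

# Invented relative frequencies for two features, say "the" and "upon",
# across three texts; "the" is ~20x more frequent, so an unweighted
# comparison would be dominated by it.
profiles = {
    "A": [0.060, 0.0030],
    "B": [0.058, 0.0001],
    "C": [0.030, 0.0029],
}
z = zscores(profiles)
```

Without the z-scoring step, the raw difference on "upon" (thousandths) would be invisible next to the raw difference on "the" (hundredths); after it, both features contribute on the same scale to `delta`.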
Is that kind of the idea?

Yeah, and not only this. Because if the sample is big enough, then not only the vocabulary betrays your identity, but also the syntax, the grammatical features, the grammatical categories you're using. All of those language layers betray your identity, provided that the sample is long enough, of course. That's a big question mark here: if you take just one stanza of a poem, it's just not enough.

You need enough data. Yeah. Have you ever heard of Satoshi Nakamoto? He is the supposed inventor of Bitcoin. He, she, they, we don't know anything about him or her or them. They released the Bitcoin white paper, and everybody wants to know who the author of Bitcoin is. So it seems like what you're saying is there's a world where we could take the Bitcoin white paper, whose author we don't know, and say we have the 10 highest suspects of who we think could be Satoshi. If we had enough writings from all of those people, we could probably figure out whether one of them wrote the Bitcoin white paper. Is that right?

Yeah, I think so. However, there is this "if": we would have to have reliable samples of their writings, long enough to train the model, surely.

Wow. Is it possible for a person to do that?

Well, you could do that. However, we would have to assume that this manifesto is single-authored, but it could be a group of people.

Hey, it sounds like even if we don't get all the way to the answer, maybe we could learn something by running stylometry on the Bitcoin white paper. That kind of idea.

Oh yeah, that for sure.
But again, I hardly believe this is a single-authored paper. It might contain lots of quotations, or hidden quotations, that mess with the style and blur it a bit, but still, that could be telling too. How do you normally deal with quotations within a text? You just remove them? I do remove them, and all the prefaces and afterwords, that kind of thing, and footnotes if possible; I take them out. But if this is a part of it, we are touching here a very big question of what authorship is, after all. To start with an example that's not absurd, just extreme: in the 17th century there is a cycle of very short poems, a cycle whose construction is very important, it has its climax, and the final piece, the final three stanzas, is pure plagiarism. The guy simply took it from someone else, so it's not written by him, but it fits ideally into that climactic moment of the cycle. It's very well done plagiarism. It is. And my question is: is he not an author? After all, he kind of is, because he knew how to put this final piece in the right place. Right. So, back to the quotations: if there's a person who quotes a lot, but this is to support the argument, maybe this is a part of the authorship. Think again not of the Bible itself, but of the Church Fathers and the theologians of past centuries.
They were plagiarizing very often, because if you see someone else's work from past centuries and you think it is right, then why argue with the Fathers? It's good words. And you don't know there are going to be computers in a thousand years that can analyze what you did; you're not taking that into consideration if someone did a good job. Yeah, exactly; you incorporate it. So what was the moment where you realized that this was powerful, that it was really truth-telling? Weren't you analyzing 16th- versus 17th-century Polish texts? When was the moment you ran your first dendrogram, got the results back, and just said, "No way, this is incredible"? Did you have a moment like that? I did, but it wasn't there from the very beginning, when I first felt this wow thing, because the first piece I wanted to attribute was actually a failure, because of the sample size. With a colleague of mine from the Jagiellonian University here in Kraków, we were trying to address the 17th-century Polish writer Mikołaj Sęp Szarzyński and his very vivid erotic poems that are sometimes attributed to him, sometimes not. He's one of the metaphysical poets, so you can compare him to George Herbert and John Donne, those big metaphysicals: basically the same time, basically the same topics, and equally good, I would say, if not better. And the question of whether or not he wrote those erotic, obscene poems was quite important for the interpretation of his attitude to life. The outcomes were undecided and blurry. Oh, okay, not conclusive. Not conclusive at all.
So in the following study I wanted to address the specific question: how big does a sample need to be to finally see the signal? I collected 60 or 70 longer novels, English, Polish, some other languages, five languages in all, and I did it independently for each language. Then, in the machine-learning way of doing lab tests, I know the answer, the computer does not: I anonymized all the samples and watched how it did the job under certain conditions, those conditions being that I was gradually spoiling the samples, making them shorter and shorter, in order to see when it collapses. Of course it collapses, but I wanted to see the point, and that was a big wow moment in my life, when I realized that, hey, for long samples it really works. It really shows the authorship across those dozens of novels. That was a wow moment in my life. So you set up an unsupervised model, so the math doesn't have any leading toward what's what; it just says find patterns. Then you took the text and kept increasing the text block size, and it was nothing, nothing, nothing, and then it got stronger, and at a certain size, boom, it all locked together and showed authorship; you see it sort of materialize. Is that what happened? Yeah, except that it wasn't a sudden collapse; it was declining, and then it collapsed. But still, to see the results and to see them meaningful, that's a meta-layer on top: it shows that the analysis is working. It was actually the moment in my life when I realized that it really works, if some conditions are met.
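The protocol described here (anonymized samples, gradually shrinking sample sizes, watching when attribution collapses) can be sketched roughly as below. The leave-one-out nearest-neighbour classifier and the two synthetic "authors" are my own assumptions for illustration, not the guest's actual setup:

```python
import random
from collections import Counter

def chunk(words, size):
    """Split a word list into equal-sized, non-overlapping samples."""
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]

def profile(sample, vocab):
    """Relative frequencies of the vocab words in one sample."""
    c = Counter(sample)
    return [c[w] / len(sample) for w in vocab]

def dist(p, q):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def attribution_accuracy(texts, size, top_n=20):
    """Leave-one-out nearest-neighbour attribution on samples of a given size."""
    vocab = [w for w, _ in Counter(
        w for t in texts.values() for w in t).most_common(top_n)]
    samples = [(author, profile(s, vocab))
               for author, words in texts.items()
               for s in chunk(words, size)]
    hits = 0
    for i, (author, p) in enumerate(samples):
        _, nearest = min((dist(p, q), a)
                         for j, (a, q) in enumerate(samples) if j != i)
        hits += nearest == author
    return hits / len(samples)

# Two synthetic "authors" with different function-word habits (pure invention).
random.seed(0)
WORDS = ["the", "of", "and", "to", "in"]
texts = {
    "A": random.choices(WORDS, weights=[6, 1, 1, 1, 1], k=4000),
    "B": random.choices(WORDS, weights=[1, 1, 1, 1, 6], k=4000),
}
acc_long = attribution_accuracy(texts, size=1000)   # long samples: clear signal
acc_short = attribution_accuracy(texts, size=5)     # tiny samples: noisy
```

Sweeping `size` downward and plotting the accuracy reproduces the "declining, then it collapses" curve the guest describes.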
So humans have been trying to figure out who wrote what for thousands of years, and it sounds like we've been taking a data-driven approach, or at least a data-thoughtful approach, for at least 150 years. How has stylometry progressed through the ages? Because I was reading some critiques of it. We have human linguists offering opinions on who wrote what, and stylometry is much more about math and algorithms giving you cold-blooded answers, so I would think there'd be a lot of pushback from humans around stylometry. What's been the history of that sentiment over the last hundred years? Some of the criticism is, or has been, justified, because the first stylometric methods were kind of crude, very simple, which I understand might raise some questions today. If you look back to the very first ideas of how stylometry could be used, even without naming it that way, that was in the 15th century. There was a guy named Lorenzo Valla, an Italian humanist who was absolutely fluent in Latin, as one was at the time, but also fluent in ancient Greek, which is important here. And he took onto his desk the Donation of Constantine: the emperor Constantine allegedly gave power over Rome to the popes forever. That was the document, and the popes of course liked to use it. They loved that document. Yeah, they loved the document. Sure, to justify their right to the land.
And Valla looked at that text and said: well, look, this language is very vulgar, and it's hardly possible it was written in the 4th century when Constantine was alive. And here is where the stylometric moment starts: he started to count the instances where, for example, feudal terminology was used. In the 4th century there were no feudal countries. Right, there was no feudalism in the 4th century. Not at all. So you're talking about the subject matter, and also the grammatical mistakes, mistakes that could not have been made in the 4th century but could have been made in the 12th century, that kind of thing. And he started just counting this evidence. To me this is the first moment when this stylometric thinking was around: let's take the evidence, let's count it, and let's see if the evidence is big enough. But that's really just a feel; that's not really math. As a critic of that, I could say, okay, they made those mistakes in the 12th century, but it's almost interpretive. If you're making a bunch of grammatical mistakes, I can see how it's still a crude evaluation before you get computers involved. Yeah, I get the point. But this idea of gathering evidence, that was quite something. Yeah, that's an interesting idea. Ultimately you can never say for sure, 100% probability. You can say 99-something, but it's never 100%.
But the more evidence you acquire, the more difficult it is to destroy it or put up a counterargument. So gathering evidence per se, that's a big thing. And it sounds like his intuition was the same as mine would have been when I started, which would be to use the big words: count the biggest words, the ones that seem most subject-oriented. And that's what's so interesting about your tweak on everything: I guess later on people were counting the small words, but then you figured out a way to match the small words and the big words, to find a sweet spot that really creates that fingerprint. That's correct. So in the 19th century there were a bunch of scholars mostly interested in a couple of topics, one of them being Shakespeare and the Shakespearean canon. Shakespeare, of course. You've got to see if he really wrote all those. Of course, Shakespeare being topic number one. Topic number two being the Bible, of course: the New Testament and the Pauline epistles. And topic number three, which is not a big thing today but used to be, is Plato and Plato's dialogues.
The authorship of Plato's dialogues, and the dating of particular dialogues, was important for proving some things about Plato. And these 19th-century guys, Augustus De Morgan, Conrad Mascol, and Wincenty Lutosławski, whom I mentioned a few minutes ago, were trying to find some language characteristics, or language features as we would say today, to describe this uniqueness. And they switched from those very typical words to some non-lexical patterns: for example, average word length as a predictor. Not a very good one, but still, that was something. Average word length, that's pretty good actually. Average sentence length, another predictor. And they started observing, for example in the case of Plato, that the endings of sentences tend to be rhythmical, so they started to count those rhythmical patterns at the end of each clause or sentence. That's for Plato; for Shakespeare there were other things like that. So it doesn't need to be the lexical level; it can be other predictors they were thinking about. Where the lexical level is concerned, they were going in circles, stuck in the thinking that the typical words are the thing to measure. It was the guys in the 1960s studying the Federalist Papers, Mosteller and Wallace, who discovered the power of the grammatical words; they flipped the game. They were studying the Federalist Papers. Got it. And then some studies, not really in authorship but in the cognitive existence of these very tiny grammatical words. For example, there is a test; I wish I had the sentence for you, but I will send it to you later.
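The early non-lexical predictors mentioned here, average word length and average sentence length, are one-liners to compute. The naive period-based sentence splitting below is my simplification:

```python
def avg_word_length(text):
    """Mean word length in characters, one of the earliest stylometric predictors."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

def avg_sentence_length(text):
    """Mean sentence length in words, splitting naively on periods."""
    sentences = [s for s in text.split(".") if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

w = avg_word_length("to be or not to be")
s = avg_sentence_length("To be. Or not to be.")
```

As the guest notes, these are weak predictors on their own; they mattered historically because they showed style could be measured at all.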
You get a sentence in English, and the goal is to count all the instances of the letter F. You've got about five seconds, and the best performers get maybe six or seven, whereas in fact there are ten of them, because we skip the word "of"; we skip the function words as we read. Right, we naturally jump over those. Of course we process them subconsciously, but that's the case: we subconsciously use those words, but we don't pay much attention to them when we deliberately want to write something down. Wow. And that's the beginning of this thinking: let's measure the function words rather than the content words, also because of that property. Oh, I love it. I think that's so cool. Not only that: there is another theory, a very strong one by the way. If you count the word frequencies, as I said a couple of minutes ago, the frequency of the most frequent word, "the," is much higher than the second on the list, which might be, say, "in." There is a mathematical model that can be used to describe that property. It's referred to as Zipf's law, because George Zipf discovered it in the 1940s, and he linked it, and that's the interesting part, to the theory of least effort. Our brains are very lazy, so we optimize for energy by using words from our cache memory, our rapid-access memory. These are the function words that you don't really think about deliberately when you're using them.
That's another reason for measuring these words rather than the content words, which you deliberately have to drag out from your memory by spending some energy, which you don't with a word like "the." That makes total sense. Again, you betray your own identity in the way your lazy brain uses all the filler words to keep everything going; we all do that very uniquely, and with enough data an algorithm can pick you out. Yeah, that's exactly it. This uniqueness, of course, is affected by many different factors, sociolinguistic factors. As I said, society expects you to be quite elegant in style when you receive your Nobel Prize, for example; that's a different style than you use in other situations. That's one of the factors, but these factors are plenty, and they include your education, the books you've read, your teachers, your parents. So it's not that this uniqueness of style is purely genetic; it's not in your DNA. It's a combination of those different layers and factors, and the brain's laziness is also affected by external factors as well. So, if I could recap that, you're saying the key to the small words is that the way you learned them through every step of your life, whether it's what was easiest to pronounce as a kid, or what you heard the most from adults, or how you were taught in school, or what literature you read, all of those things get fed into one brain. And that's probably the hardest part to change when you're writing, because it's been baked in since you first learned how the smallest words in your language work. You could change it if you made an effort, but you don't make an effort if you just have to write something. Yeah. Right.
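Zipf's rank-frequency relationship mentioned above can be checked on any text in a few lines; the toy sentence below is mine, constructed so the counts follow the law exactly:

```python
from collections import Counter

def zipf_check(text, ranks=5):
    """Compare observed top-rank counts with Zipf's prediction f(r) ≈ f(1) / r."""
    counts = Counter(text.lower().split()).most_common(ranks)
    top = counts[0][1]
    # Each row: (word, observed count, count predicted by Zipf's law)
    return [(word, freq, round(top / (rank + 1)))
            for rank, (word, freq) in enumerate(counts)]

# Counts 4, 2, 1 match the Zipfian predictions 4/1, 4/2, 4/3 (rounded)
rows = zipf_check("a a a a b b c", ranks=3)
```

On real prose the fit is approximate rather than exact, but the steep drop from rank to rank is exactly why frequency profiles need rescaling before comparison.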
And there were some writers who deliberately adjusted their own style. There's an example of a Polish writer, Bolesław Prus, whose diaries were recently discovered, and he writes to himself: I'm using too many adjectives, I've counted them, so I should decrease the number of adjectives in what follows, something like that. This is a very rare example of people being aware of their style. Yeah, most people aren't thinking about how they're delivering a message; they're just thinking about the message they need to deliver and trying to get it out. So it sounds like there was a long history of gathering data on different texts, and then in the 1960s there was this breakthrough where people started measuring the small words, with the idea that maybe the small words create a better fingerprint. And I'm guessing that's where you started as well, since that was the latest research. What intuition did you have that made you think, man, I need to create a new algorithm to measure documents? Well, the intuition is very simple, and it boils down to the fact that some words are very frequent and some are less frequent, and there are plenty of algorithms around that. And there was a very fine and heavy thinker, John Burrows, from Newcastle, Australia.
He died a few years ago; I actually met him, a very nice guy. He was an English literature scholar, mostly a literature guy with some understanding of math, not a deep one, but with very good intuition, and he was the first to bring this methodology back to literary studies with his study of Jane Austen's style. He was the first to discover that the very frequent words should be scaled down. His idea of scaling, in mathematical terms the z-score, or z-scoring procedure, is a very simple thing, but there needs to be one person to first realize: hey, let's use this procedure. After he used this procedure, scaling down the "the" and "of" and scaling up some other words like "on" and "upon," the not-so-frequent ones, things started to happen. That was another tacit breakthrough in my eyes, and I just fine-tuned this idea of his by further tuning down the frequencies of the most frequent words, so that each of the words has something to say. My assumption was that if you look through the list of all the frequencies, from the very frequent to the very infrequent, the signal lives somewhere close to the top of the list rather than spread evenly across it, because what Burrows did was flip it over and give each word the right to vote, like in an election. It was revolutionary, but it was slightly too much of a revolution for me. Right. So would it be a fair statement to say that stylometry has come a long way in the last ten years? How much is computational power helping stylometry? How much is machine learning? Because we've had some of these algorithms for fifty years, right?
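Burrows's z-scoring idea discussed here is the heart of his Delta measure; a compact sketch might look like the following. The tiny "poet" and "engineer" corpora are invented for illustration, and real implementations (such as the stylo package) do considerably more:

```python
import statistics
from collections import Counter

def rel_freqs(text, vocab):
    """Relative frequencies of the vocab words in a text."""
    words = text.lower().split()
    c = Counter(words)
    return [c[w] / len(words) for w in vocab]

def burrows_delta(corpus, disputed, top_n=5):
    """Rank candidate authors by Burrows's Delta: the mean absolute
    difference of z-scored relative word frequencies."""
    all_words = " ".join(corpus.values()).lower().split()
    vocab = [w for w, _ in Counter(all_words).most_common(top_n)]
    profiles = {a: rel_freqs(t, vocab) for a, t in corpus.items()}
    # z-scoring is Burrows's key move: it stops "the" from drowning
    # out the signal carried by the merely-frequent words
    means = [statistics.mean(p[i] for p in profiles.values())
             for i in range(len(vocab))]
    sds = [statistics.pstdev(p[i] for p in profiles.values()) or 1.0
           for i in range(len(vocab))]
    def z(p):
        return [(p[i] - means[i]) / sds[i] for i in range(len(vocab))]
    dz = z(rel_freqs(disputed, vocab))
    return sorted((statistics.mean(abs(a - b) for a, b in zip(z(p), dz)), author)
                  for author, p in profiles.items())

# Invented miniature corpora: one author fond of "upon", the other of "and"
corpus = {
    "poet": "upon the moor the lady walked upon the hill upon a dream",
    "engineer": "the engine and the piston and the valve and a gear",
}
ranking = burrows_delta(
    corpus, "upon the cliff the traveller gazed upon a star upon the sea")
```

The lowest Delta score names the most plausible author; Eder's refinement described above further down-weights the words at the very top of the frequency list.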
So to what degree has stylometry gotten a bump in the last ten years? The step is huge indeed, on two different levels. One is our understanding of language and this fingerprint, because now, with the advent of large language models and tools like that, we can very easily annotate texts not only for vocabulary but also for grammatical categories, for syntax, for whatever, and then we can enrich and enhance our language input radically. That's one element of this revolution: going beyond the lexical level with the help of tools such as LLMs. It's like our vision into texts has gone from 480p to 4K; we can suddenly see the grain of the letters in a way we never could, and it enhances the input data dramatically. Well, and I had read that stylometry was used to out J.K. Rowling, right? Because she was Robert Galbraith: the Cormoran Strike series was written by J.K. Rowling, but Robert Galbraith, her pseudonym, is the author on the cover. And stylometry was used to show they're both J.K. Rowling; you take both authors, and she couldn't hide from it. Yes. "The Cuckoo's Calling" was published under that Robert Galbraith pseudonym, unknown to the public, and after the stylometric test, when she was faced with that picture, she said, well, I have to admit that was myself. And then the sales rate skyrocketed, of course. That's incredible.
This feels like arguments that have been going on for thousands of years over so many texts, and math is finally hitting a point where it's really helpful, taking opinions off the table because it's hard data. There's still interpretation, and to your point it's hard to say it's 100%, but it really is incredible that we're introducing machine learning at scale into text analysis. It makes sense: as humans we can only hold maybe ten or twenty variables in our brains when evaluating something, and machines can do a million, ten million variables, so if we can point them in the right direction, they'll be helpful. Yeah, and this is not our last word, I'm afraid. What are some other breakthroughs? Are there other text analyses that have surprised people, that have been used to figure out different authors? The other breakthrough: I remember the times, like ten years ago, when if you collected a hundred novels across one tradition, that was a big wow; that was really shocking for some people. What do you mean? I mean the availability of texts. Oh, you could just get the texts, a couple of orders of magnitude more. Data collection has increased totally. Yeah. Think of a scenario where you've got your Federalist Papers and that's your whole corpus: you're inferring whatever you want to infer, testing your methodologies, building a theory around that very small number of texts. Compare that to the millions of texts you can just grab and test your theories and methodologies against. That's a new world. Where do you get ancient texts from?
This is one thing I've been wondering: if you're looking for Plato's original works, where do you go? What's a trusted source? The Perseus Project, that's my suggestion. The Perseus Project. Yeah. So that's a known resource inside academia. Is it just disparate resources you have to know about as an academic? There are quite a number of them, and back in time every single scholar had their own tiny, very small and tidy collection on their own floppy disk, not to be shared with anyone. Now the world has changed radically, and on GitHub you can find quite a lot of resources. However, they very rarely speak to one another, and the formats are not identical or not compliant with the same standard. So there are some issues, but there are some respected sources like the Perseus Project, which was started, if I'm not mistaken, back in the 90s. It's a very stable and solid resource with everything you might want if you do text analysis. And there are a number of similar libraries for later Greek literature, like the patristic literature, and for the biblical sources; there are a number of sites where you might want to go. It feels like we're missing a central repository for humanity. I think Tufts University runs the Perseus Project, right? So what happens if Tufts goes bankrupt or goes away? Does it all disappear? Do we all just have forks on GitHub? It feels like we need a central package manager. I also want to be able to say, load in all of Plato's texts, and have it say: here are all the trusted sources, in the same format, and here's the format.
I guess tools like that don't quite exist yet, do they? They don't, and the question of going bankrupt is not an absurd question indeed, because we don't really have a backup of the internet. That might be an issue at some point. Not only that, those resources are living creatures, because whenever you change something, add a text, make a correction, it's not identical with what you got a month ago. It's version control. If you want to freeze it and record it on your floppy disks, which version should you stick to? So that's one of the challenges. Because I see all these YouTube videos of people saying, look at this ancient text, look at what so-and-so said 500 years ago, and I always think: where did you get the document? How do I know the document you have means anything, or whether it's a copy of a copy? That's a different question, because not only do we deal with electronic copies of an edition, but the edition itself was prepared in the 19th or 20th century. It might be a combination of many different manuscripts or papyri; there are plenty of editorial decisions that editors introduce along the line. For example, that's the reason that a very strong predictor in stylometry is punctuation. Punctuation marks are a very strong thing, but the first thing I do is get rid of the punctuation whenever I work on ancient texts, because the punctuation was introduced by 19th-century editors, added later, so you'd see a nice signal produced by the editors rather than the authors. Oh, the editors leave a signal. Yeah.
If you were using the punctuation, you would be figuring out which editor it had, not anything about the author, which could be a different test. Yeah. It muddies the data, basically. Wow, so interesting. And not only particular editors, but also editorial traditions: French punctuation is slightly different from English punctuation, and from German. You can see those traditions shining through if you don't get rid of the punctuation in ancient texts. Well, I'm wondering, do you think LLMs will get to the point where they can just look at a text, and you can ask, was this Plato, and they'll answer better than a proper stylometric analysis? Or will it always come across as somewhat baseless, because you don't really know how the LLM got to its conclusion? First and foremost, there's an assumption that LLMs are going to be ultimately better than human beings at solving those problems, which might or might not be true, because LLMs are only as good as their training data, and for Plato we are not going to have millions of Platos. So the model, in my view, is not going to be radically better than what we have now, if we restrict ourselves to Plato. It's a data limitation, basically. Yeah. And the other question embedded in yours is that some of the disputed cases are disputed exactly because they're borderline cases: Plato had a worse day or a better day, and it just doesn't really fit. There is no clean answer; it's on the line, and there's not enough to know. So, we've been playing around with this, and that's how we found out about you.
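The punctuation-stripping preprocessing discussed just above is trivial to implement; a minimal version (assuming the text already uses plain ASCII punctuation) might be:

```python
import string

def strip_punctuation(text):
    """Drop punctuation so editorial conventions (often added centuries
    after the author) don't leak an editor's signal into the analysis."""
    return text.translate(str.maketrans("", "", string.punctuation))

cleaned = strip_punctuation("Arms, and the man I sing; who, forc'd by fate...")
```

For real Greek or Latin corpora one would extend the character set beyond `string.punctuation` to cover Unicode marks, but the principle is the same: strip anything the editors, rather than the author, put there.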
But I think we asked whether you had a dendrogram you'd be willing to share, because I've looked at these dendrograms, but it's just my own brain looking at them. For me, and for anyone else watching, can you show us a dendrogram and explain how you got to that point? Then we'll walk through the branches and what that looks like. Oh yes, totally, I'll be happy to. And while you're pulling that up, I'll explain. Basically, you load up a bunch of documents and you run this algorithm on all those word frequencies we were talking about; you chunk the texts up, and it takes those chunks and organizes and clusters them according to how similar they are to each other. So a dendrogram is a clustering of text chunks. For example, if you were going to do it on the Bible, you could do it by chapters: it would take 1 Corinthians 1, 1 Corinthians 2, and if they seem like they're by the same author, it'll cluster them together. So a dendrogram is a visual representation of how these clusterings of texts are related to each other, or aren't. Yeah, that's exactly what it is: a visual representation, one of the methods of showing the similarities computed by one of the algorithms we mentioned; there are plenty of other algorithms as well. This final dendrogram, this tree-like structure, is the final representation, and as you can see in the picture, I hope you can see it, it is divided into leaves, which are particular texts, and branches; the bigger the branch, the more texts are clustered within it as similar to one another.
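The chunk-and-cluster procedure described above can be sketched without any plotting: build a frequency profile per text, then merge the closest clusters until the tree's top-level groups emerge. The four miniature "texts" and the naive single-linkage clustering are my own illustrative assumptions:

```python
from collections import Counter

def profile(text, vocab):
    """Relative frequencies of the vocab words in one text."""
    words = text.lower().split()
    c = Counter(words)
    return [c[w] / len(words) for w in vocab]

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster(profiles, k):
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters (single linkage) until only k clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(manhattan(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Four invented snippets: two share one "voice", two share another
texts = {
    "chronicle_1": "the king and the bishop and the sword",
    "chronicle_2": "the duke and the knight and the banner",
    "translatio_1": "upon the sea upon the ship upon a wave",
    "translatio_2": "upon the shore upon the sail upon a tide",
}
vocab = ["the", "and", "upon", "a"]
profiles = {name: profile(t, vocab) for name, t in texts.items()}
groups = cluster(profiles, k=2)
```

A dendrogram is simply the full record of these merges drawn as a tree; here we only keep the final partition, which already shows the same-voice texts landing on the same branch.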
Here we've got something very important for the Polish tradition, because this is the question: who was Gallus Anonymus? Gallus Anonymus is the beginning of Polish literature, even if it is not in Polish but in Latin; that's how all our literature starts, so it has been important for centuries who the author was. We know that he's anonymous, "Gallus," so maybe from Gaul, from France, but there are different ideas: that this person might actually have been from Hungary, or, and this is the hypothesis I test in the picture you're seeing, the Venetian hypothesis, which says that this person might be identical with the Monk of Lido in Venice, who wrote the Translatio Sancti Nicolai, the translation, or transportation, of the relics of St. Nicholas. And here we've got different texts; I'm not sure if you can see my cursor, but whatever. And just for anybody listening to the audio, it's a document that says "Gallus Anonymus" at the top. So we're analyzing an author, Gallus, and trying to see whether the hypothesis is true or not. Imagine a family tree where you have the patriarch at the top and it splits and splits and splits, but turned 90 degrees to the left, so the top of the family tree is on the left side and it branches to the right, and we're seeing clusters of different titles on certain branches. That's basically what a dendrogram looks like. Yeah, absolutely. And I've collected here, because stylometry is always about comparing, you have to compare your thing against other similar things, a bunch of similar texts from the 12th century. And we see that another translatio, the transportation of the bones of St. Agatha, is very similar to a monastery chronicle.
These are found to be similar. And then the next most similar to these two is Bernardus Silvestris's Life of Malachy. And then we've got an interesting cluster with Gallus Anonymus's Cronica Polonorum, the Chronicle of the Poles, and this translatio, the transportation of St. Nicholas's bones, very close to one another. Even if I tweak the parameters — let me just switch from one picture to another; the picture is slightly different because I tweaked the parameters, and the final picture is different — still, whatever I do, I cannot kill the signal of the Monk of the Lido in Venice being very, really similar to Gallus Anonymus. So that's the evidence. That's not the final word, not the smoking gun, of course, but it's very strong evidence. Here I'm just showing you the tip of an iceberg, because I used some other machine learning methods to test to what extent those two are similar to one another. And it turns out that the similarity is very, very strong.
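The robustness check being described — tweak the settings and see whether the link survives — can be sketched like this. The texts and names below are invented stand-ins, and nearest-neighbor distance on most-frequent-word profiles is a deliberately simplified proxy for the full clustering.

```python
from collections import Counter

import numpy as np

def profile(text, vocab):
    """Relative frequencies of the given words in one text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return np.array([counts[w] / total for w in vocab])

# Invented stand-ins: two texts sharing a "style", plus two controls.
texts = {
    "gallus":  "the king and the duke and the bishop rode to the castle gate",
    "monk":    "the abbot and the monk and the pilgrim rode to the holy shrine",
    "other-1": "of glory of battle it tells of honour it tells of fame",
    "other-2": "of sorrow of exile it sings of loss it sings of home",
}

# Steelman: vary the number of most-frequent words and ask, each time,
# which text "gallus" lands closest to. A stable answer is the point.
corpus_counts = Counter(" ".join(texts.values()).lower().split())
nearest = {}
for n_mfw in (3, 5, 8, 12):
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]
    vecs = {name: profile(t, vocab) for name, t in texts.items()}
    dists = {name: np.linalg.norm(vecs["gallus"] - v)
             for name, v in vecs.items() if name != "gallus"}
    nearest[n_mfw] = min(dists, key=dists.get)
print(nearest)  # the signal survives every setting in this toy case
```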
So this is really strong evidence in favor of the hypothesis that these two guys are the same author — that we are dealing here with the same person, in fact.

And I think that's a really good point: this isn't just, hey, you see a dendrogram and oh, these are similar, so they're the same. Part of it is the process as well. You actually want to tweak the variables to see if you can break the connection, right? Let's steelman the opposite idea and go, okay, the monk and the Translatio — can we tweak it enough that those things break apart? If they keep clustering together under reasonable modifications of the variables, then we get more and more confidence that those two are the same. Versus, if I keep dialing the variables and the texts jump all over the place depending on the settings, maybe we can't conclude that those two texts are so similar, because it depends on how we tweak the data. Does that sound right?

Yeah, that's a very good description. And let me switch to another example. Here's the picture — but before we go to the picture: this is Greek, ancient Greek. Greek tragedy, and comedy, in this very picture. As we remember from our secondary schools, there were three authors of the tragedies that are extant: Aeschylus, Sophocles, and Euripides. Aeschylus was the author of seven works that are extant, Sophocles somewhat more, and Euripides twenty-something — I don't remember the exact number. But there has always been a problem with Prometheus Bound, that work attributed to Aeschylus, and its piety towards Zeus the god. In his remaining six works, Zeus is a very good god, like a good father, something like that. Whereas in Prometheus Bound he's a mean personality. I mean, we don't like Zeus in that one.
So that's argument number one, put forward by the classical philologists. Argument number two is metrical patterns, or versification patterns. As you probably know, Greek tragedy, and Greek poetry in general, use set patterns: certain patterns are allowed and certain are not, but you've got some flexibility. And within this flexibility, those six works by Aeschylus tend to use slightly different patterns than Prometheus Bound. So that's the second argument. And the third is that we know Aeschylus was the author who introduced the second actor on stage. And here in Prometheus Bound, we've got one scene where we basically need three actors to perform it. Of course, one of those is mute, does not speak at all, but still, it's very likely that we need three actors to perform that scene. That's argument number three. And now, if we perform stylometric similarity here — this is a dendrogram, except that it is an enhanced dendrogram.
I've used different settings in different particular dendrograms, and this is like a summarization — dendrograms of dendrograms, something like that — a consensus tree, as it is referred to. This consensus tree summarizes the information from many different particular dendrograms. And as you can see, we've got a very nice branch for Sophocles, we've got a very nice branch for Euripides — with some work by Euripides clustering with Aeschylus, which is interesting, very unlikely, but it is what it is — we've got Aeschylus, and we've got someone who is from a different world, comedy, but still from roughly the same period: Aristophanes with his comedies. So as we can see, those works that we expect to be guessed correctly are, and this is Prometheus Bound, which sticks out.

Yeah, and that's an argument — again, not a smoking gun. But now, what if I follow this procedure of introducing some more noise — by more noise I mean introducing some other works that have nothing in common...

Oh, cool. Wow. So for those who are not watching, who are just listening: I love the color coding. The different works, by who we think are the different authors, have been color-coded — Aristophanes, Euripides — and they're basically all clustering together. All the different colors are clustering together to show these authorships look real. But the one, Aeschylus's Prometheus Bound, has its own branch with nothing attached to it, and it sticks out like a sore thumb a little bit when you do this.

That's it. Yeah. So to me that's strong evidence. But as I said, it's not a smoking-gun situation.

But so — the qualitative points that you made before were that it takes three actors instead of two, and usually he does it the other way.
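A consensus tree of the kind shown can be approximated by counting, across many clustering runs with different settings, how often each pair of texts ends up in the same branch. Below is a toy sketch of that idea; the five "plays" are invented miniature stand-ins, with one deliberate odd-one-out.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

texts = {
    "aesch-1":    "the fire and the wrath and the thunder fall on the proud",
    "aesch-2":    "the sea and the storm and the night come on the host",
    "soph-1":     "of fate of duty it speaks of law it speaks of right",
    "soph-2":     "of pride of ruin it warns of kings it warns of men",
    "prometheus": "we bound him to rock we chained him to stone for stolen flame",
}

def profile(text, vocab):
    """Relative frequencies of the given words in one text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return np.array([counts[w] / total for w in vocab])

# Re-cluster under several vocabulary sizes; vote on co-clustered pairs.
corpus_counts = Counter(" ".join(texts.values()).lower().split())
settings = (3, 5, 8, 12)
pair_votes = Counter()
for n_mfw in settings:
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]
    X = np.array([profile(t, vocab) for t in texts.values()])
    labels = dict(zip(texts, fcluster(linkage(X, method="ward"),
                                      t=3, criterion="maxclust")))
    for a, b in combinations(texts, 2):
        if labels[a] == labels[b]:
            pair_votes[(a, b)] += 1

# Consensus: pairs that stay together in at least half of the runs.
consensus = sorted(p for p, v in pair_votes.items() if v / len(settings) >= 0.5)
print(consensus)  # "prometheus" joins no consensus pair: it sticks out
```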
So we can actually look at it qualitatively, which has been done — that was already done, for thousands of years. So we have a suspicion that Prometheus Bound is not written by him. But now that we throw math on top of it, it's just one more data point: we keep comparing it to everything else, and Prometheus Bound is just sticking out on its own branch. Seems like a strong argument to me.

To me, that's a strong argument. And to me, that's how this inference should look, because we will never have the smoking gun.

Yeah. Because we don't have a time machine. Can I just go back to the first dendrogram, the narrative there? So you had Gallus Anonymus — are you saying that was the first Polish author, writing in Latin? Did I understand that right?

Yeah, that's correct. Polish literature starts somewhere around the 13th, 14th century, with very short Polish texts. But earlier — the country was of course around for hundreds of years; the state was established at the end of the 10th century, with some documents sent from the papacy, from Rome to be correct, to the Polish kings. There are some very scarce documents, but the first solid narrative text is this chronicle from the beginning of the 12th century. So we treat it as a holy grail, so to speak. That's the text. It's written in Latin because that was the trade language of the time — well, not official, because there was no official language, but the communication language of the elites, as it was across the whole of Europe.

Right. But you have this one document, and it's about Polish history or something?
How does it — Yeah, it's about the dynasty of the Polish kings.

Okay. So I would guess kids that grow up in Poland would learn about Gallus Anonymus, at least cursorily: okay, there was this person that gave us the first description of the Polish kings.

Yeah.

And now you're trying to figure out who the author was. And so you are able to take old Latin texts written by people that we know lived in Rome, or maybe they were priests or archbishops or whatever, and you can compare them. And you found that there's a monk in Italy who, it looks like, maybe is actually the author.

Yeah, that's a fair description — except that I was not the first to suggest that very text to compare Gallus Anonymus against, because this hypothesis has been around for a few decades now.

Well, I think that's also worth noting: a lot of this, at least all the stuff that I've come across — these are ongoing arguments. With these ideas of authorship, there are not a lot of gotchas; these things have just been argued for so long, and everybody's been looking for extra points to weigh in on one side versus the other. So a lot of this is: let's go back to the arguments and see if the math can help solve them, rather than creating our own new arguments.

Yeah, that's a very fair description. But as I said at the beginning, the notion of authorship itself is kind of a wobbly concept, right? After Romanticism, when the idea of authorship was indeed romanticized: the single author that has a vision, and they write down whatever they think.
It didn't really work that well in the Middle Ages, when mixed authorship — incorporating someone else's stuff into your own — was a norm rather than an exception. So we have to be careful here whenever we speak about single authorship of those things.

Can we — that might bring us to a good, slightly more controversial topic. If we go into the New Testament: we found these guys who wrote a book called Christ Before Jesus, and they essentially have tried to use stylometry and your algorithm to look at the authorship of the New Testament, and their findings are pretty controversial, to say the least. Are you aware — I think I brought it up to you when we met a couple weeks ago — have you looked into that at all? Or, before we dive into it more, do you have any ideas around what you would expect out of comparing these? Because I know the general mainstream idea is, you know, Paul wrote at least certainly these four specific epistles, but there are 13 that are being argued about, and then Matthew, Mark, Luke, and John. What do you think about the idea that these dendrograms are coming out looking controversial?

Yeah, that's a big topic. I haven't read the book, but I kind of know the argument. Well, good for them that they were exploring these things, of course, and I hope they took into consideration those assumptions that we covered here today. We have to remember all the time that the New Testament is a composite thing. It's a mixture of everything.
It is primarily oral literature that has been written down maybe years, maybe decades — decades rather than centuries — later. I would say decades.

Okay, that's a fair estimate. Sure.

It's written in ancient Greek — I hesitated for a second because it's a special version of ancient Greek, a simpler one, like a pidgin ancient Greek, which is referred to as Koine, a common language; the word means "common". So a simpler version of Greek. But the philologists can see — I cannot, I don't speak Hebrew or Aramaic — the biblical scholars can see through this style the actual sayings of Jesus Christ, originally in Aramaic and Hebrew; they can see the Aramaic syntax and Hebrew syntax shining through the New Testament text. So that's one of the things. The other thing is that those sayings, and the whole story, were around in oral tradition for maybe decades, but maybe longer; we don't know, depending on what we think was the moment when it was written down — whether the gap was like 12 months, or 12 years, or more. We don't —

Can I stop right there? So you're saying that the linguists know that Jesus's words were actually spoken in Aramaic, even though the original copy we have is in Koine Greek. So then we don't know how long it was written and copied in Aramaic before it was originally translated to Greek?

It probably was not written in Aramaic at all; it was preserved in oral tradition until it was written down in a different language — meaning Greek, right?
I mean, the apostles were using Aramaic as their day-to-day communication language, and Jesus was also using that language — except that when he was prosecuted and questioned by Pilatus, or Pilate, or however you pronounce it in English, it was probably in Hebrew. But we don't know; it might have been in Aramaic — the biblical scholars know better. It was alive in the oral tradition until it was written in Greek. So there was a translation involved at the very beginning. The first written version was already in Greek, but the original — the oral tradition — was in Aramaic.

Right. So that was the first element of this composite situation.

Yeah. Their conclusion is quite controversial. They seem to indicate that the New Testament was pretty much written in the second century, after the year 144, and they have both this stylometry and qualitative and quantitative data to go over it. And it's not a new argument, to your point; this has been argued about for thousands of years. They're just picking up the ball. But that's their conclusion: that all of this stuff was written after 144 AD, which, if true, is quite a long time after the events. And maybe it just makes you think about taking them super literally versus kind of getting the vibe of them, and which one's maybe the wiser choice of the two.

Yeah, but I don't know. Honestly, I can live with 144 and the oral tradition in between.

You can? That sounds fine? Do you think that's because people were maybe more accustomed to oral tradition, so it would have been —

Oh, definitely. Yes.

So there is some validity to that argument, you think?
There was even this kind of thinking that what is oral is more important than what is in script — unlike today, because today when we do a deal, we say the deal is not really executable until we sign it. It was the reverse back in time. There was one of the meetings of the church, in Nicaea or somewhere, one of those first gatherings, where they had to stand up one after another and say the oath — whatever they had agreed on — orally, and only then did it become the thing, right? So that's a different way of thinking in the oral-tradition world. So I wouldn't be disappointed if — it's reasonable. That's okay.

Yeah, that's interesting. I think that's fair. Okay, Roger that.

Counterargument, however: the so-called synoptic gospels — this is Matthew, everything except John —

Matthew, Mark, Luke.

Yeah, Mark, Luke. They share like 50% of the material. It's difficult to share that big a portion of material just orally for 144 years. So that would be a counterargument. But let's say, you know —

Yeah, that's true. Well, one of the things — I think we ran one, and I was looking for it and didn't find it; we've got to run it again. When we ran a dendrogram on the New Testament, one of the things I remember was: Acts 1 and Acts 28 cluster together, outside of the rest of Acts. And so the argument would be, if that's what this is telling us, that someone had a different text, and then they wrote the intro and the outro — the first part and the last part — on their own and added them to this letter they already had.

Yeah. And if I was going to do that, right? I mean, if you've got a letter and you want to add on to it.
I mean, parchment paper — you don't go to Walmart and buy 500 sheets for five bucks. Paper's pretty rare, and expensive, and it's the tech of the day; it's the iPhone. So it makes sense that you would try to be very careful with it. But why not just take the same text you have and add the introduction on the front and the exit at the end, just to put your framing on the topic, to make sure your audience knows what you want them to see? So even seeing some of that stuff seems okay to me, but I think a lot of people might get upset about that.

You've got a homework assignment and you look at your empty sheet of paper for hours and hours and you don't know where to start. Why don't you translate the first paragraph from elsewhere?

Totally normal. It would have been the smart thing to do, rather than starting from zero.

Surely so. You've got a church to run, and you have to write a letter to the worshippers of your church. You have no time to waste.

Part of me thinks that's just really beautiful, actually, if that's what happened, because it's everybody trying to say what they think is helpful and trying to lead everybody. But I don't know when it got switched over to: this is a work of strict non-fiction. And whether that was the way to take it when it was being written — I don't know. It just brings up questions.

Well, it does. Some years — we don't know how many years — after that happened and was written down, it became the thing, right? Plus the assumption that God themselves kind of guided the hand — that makes a difference, right?
If you think of this holy scripture as something at least inspired by the Holy Spirit, if not more, then every single word — words taken out, or extra words written down — does make a difference. So I understand those heated debates.

Well, one of the questions that I have is: the authors of these different books — did they know, when they were writing, that in a thousand years we were all going to have it bound on our bookshelves and it was going to be holy scripture? When they're writing a letter? If you don't know you're writing holy scripture — maybe they wrote it to help everybody out, and then a hundred years later or whatever, everyone goes, yeah, let's canonize that. And now it takes on a new meaning that even the author hadn't necessarily meant when he wrote it. It was just a letter, and now it's doctrine. And I don't know if that's what he was doing when he was writing that letter. Maybe.

Totally with you on that. Yeah. Let's not forget that most of those epistles, but also the gospels themselves, were probably written kind of collaboratively. I mean, you can have one person holding the pen, but they would ask their colleagues: what did Jesus say on that day when we met him?

Yeah, right. It's kind of backwards, right? Because paper's expensive, you don't want to do one draft and then, hey, everybody read this, we're going to write another one, and then another one. It reminds me of painters: when they're going to paint something, they sketch it out in pencil first, and they change it, and they do so much pre-work before they put paint to canvas — because it's a huge canvas and they want to get it right.
And you would think about these documents and letters exactly the same way. Hey, should I say it this way or this way? Maybe you have scrap stuff over here, so the one you write on is the final one. Seems like a reasonable way to put these letters together.

That's exactly the case. Plus, let's not forget about the idea of sketching out the letter and then giving it to your secretary to elaborate on it, or copy the middle.

Right? If I write the first paragraph and the last paragraph, let's get a scribe to put that middle part in — I've got other things to do. Just, hey, copy that in, throw my thing on the front and back, and that's the letter going out. It's an email; it's our Friday update on things. I don't know. I could see that, but it is interesting. Do you have any intuition for when cultures became less oral-tradition focused and started writing things down? When it changed in importance — when the writings became more important than the oral tradition? Or was that different across cultures, maybe?

It depends on the particular culture, but in Europe — and I take America as being part of Europe, culturally — in Europe it must have been Middle Ages-ish. For elsewhere I cannot say for sure, but in Eastern Europe — maybe not even Poland, but Lithuania, those areas — it came later, like a century or two later. You can still see, for example, in the Polish literature of the 16th century, there are some guys whose work is definitely written down, but you can see in the way they write that they think of it being read out, or something like that, because they are very concerned with saying what the next paragraph is going to be about.
So they provide you guidance all the time, so that you don't lose the track — as if it was meant to be read out aloud rather than read silently by particular readers. So you can see those traces of the oral cultures up to the 16th, 17th centuries in some countries in Europe. So yeah, it depends. It really does.

Oh, that's interesting. You said something earlier that was really interesting: did you say that it was a very romantic idea, attached to the Romantic period in history, that one person would write an entire text? Even just the idea that I have some brilliant breakthrough, and I've got to get it on paper, and I do that solo — that concept really wasn't even considered until the Romantic movement, basically?

Even more so in the 16th, 17th centuries, in the early modern period — in the Renaissance, for example, and in the Baroque — there was this idea of imitation, and you were expected to show how good you were at reading the ancient authors by imitating them in the right way. So it was welcome to have some pieces from the auctores, as they are referred to in Latin — the Authors, capitalized, of course. The more you show your knowledge of the auctores, the better for you.

It shows you just have to be really careful reading any text that is more than 100 years old. We're reading it anachronistically if we don't take all of these things into account. To read any article from a thousand years ago and think, oh, this is like a university textbook — without the proper context, you're going to read it framed completely wrong. That's crazy.

Very fair assessment. Yeah. You just think they're writing it.
They might be writing in a culture where it's actually better to copy people, because you're showing that you know all these different authors and can incorporate all their ideas. That might be the thing that makes great writing at the time — not some romantic ideal of a beautiful vision that I have to get out on paper. That is so fascinating.

There are theoretical takes on that. There are some authors — for example Isidore of Seville; you'd have to check which century it is, it's the early Middle Ages — the author of the first encyclopedia, because he was just gathering every single piece of knowledge he could and putting it into one very, very long work. And one of his topics was the idea of authorship, and he said there are four ways of being an author. One is that you just write down someone else's words, word by word — you're a scribe. The second: you are elaborating on someone's notes, or you're a translator — you're an author by translating from one language to another. The third way: you're a commentator. So we've got, say, holy scripture, and in the margins you add commentaries. And only the fourth way was that you write something yourself, taking the material from your predecessors. So there was no idea at all that you create something from nowhere, from nothing.

That's amazing. I wonder if some of these scribes sat down, knowing that they were some of the only people thinking about how their time period was going to be recorded and how it interacted with older documents — I wonder if some of them felt like they held the reins of history in their hands. Like, man, I get to dictate to the rest of humanity what history was. I get to choose how this goes down. They knew they were writing the history book.

Yeah. Right. And that's what they did. Yeah.
I'm kind of curious, Dr. Eder — just out of your own personal curiosity, given all the things you've looked at, if you could crack one mystery in all of history, what would it be? You have one pass to actually see the truth. Which one is it going to be?

Yeah, I would love to go to this concept of authorship as linguistic uniqueness. Can it be boiled down to some general laws of nature? Can we somehow link the way in which we think — these frequencies of words, these kinds of things we covered already — is it a derivative of any bigger laws of nature?

What do you mean by laws of nature? Like living in cities and getting out of nature, or what do you mean?

No. What I mean by that is: if you, for example, look at the sizes of cities — you mentioned cities — they follow some mathematical distribution. Somehow we humans know how to build cities of a certain size. Naturally, if we look at the nests of ants, they also follow some distribution; somehow nature knows how to self-organize itself.

Yeah.

And my question would be whether this uniqueness of language, and language itself, is also part of this bigger law of nature, of self-organization. So that would be one of the mysteries.

Emergent — are there emergent properties of language that come out as humanity scales up? Whoa, that's a good idea. That's really interesting. And does it apply across all languages? Have you ever gotten a chance to look at any Chinese texts or any —

Yeah, Chinese is difficult, because here not only do we have the language barrier, but the script is different too.
I mean, not only a different grammar, but the script is different, and then it's difficult to define a word. So here's what I did — because I was asked this question: with some students we did some experiments with Chinese. We just switched from words to characters, or to pairs of characters — character bigrams, or trigrams, something like that — to work around the problem of word delimitation. There are some languages in which the question of the word itself is not trivial at all. But it seems that some of those laws, and the authorial signal, work across all the languages — with some tweaks; for some languages you have to make some additional effort, that's true — but it more or less works universally.

That's also cool. What do you think is left in stylometry? What other juice can we squeeze out of this thing, now that computing is coming up? Is it better algorithms? Obviously more data is always helpful. How do you see stylometry continuing to get better from here?

Yeah. Mathematically we are quite far already, but I think that better models will be able to see more, for sure. I mean, the statistical separation starts being visible when you use very primitive methods, but the more nuanced methods — nonlinear methods, or probabilistic methods — you use, the better the separation. So I think we have not said the last word here. That's my intuition; I cannot prove it. So that would be one of the directions. But my big hope is using technology to enhance the texts themselves. Because right now, what can we do? We can just take the text as a string of characters.
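Taking the text as a raw string of characters is exactly what the character n-gram workaround for Chinese, mentioned above, relies on: when word boundaries are unreliable, slide a window over the character string and count overlapping pairs (or triples) instead of words. A minimal sketch — the example strings are mine:

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Overlapping character n-grams of a string with spaces removed."""
    s = text.replace(" ", "")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

# Word segmentation is hard; bigram counting is not.
bigrams = char_ngrams("天下太平天下歸心", n=2)
print(bigrams)         # ['天下', '下太', '太平', '平天', '天下', '下歸', '歸心']
profile = Counter(bigrams)
print(profile["天下"])  # 2 — these counts feed the same frequency tables as words do
```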
We can divide it into words by using a space or something similar as a delimiter, and then we can analyze this very shallow representation of language.

Sure. Yeah.

And I think quite a lot is going to happen in enhancing this shallow representation: guessing not only the division into words, but also the division into morphemes, or phonemes, or something like that, which will enhance what we are seeing.

Right. Breaking up the words even — going even further into detail could yield more insight. And I love the idea you had of using LLMs to tag different texts with whatever you can ask an LLM that it could answer correctly, and then throwing that into the data set to get more angles on it.

I don't really hope that using LLMs directly — saying, hey, tell me who wrote this or that — will solve our problems. But using them as an intermediate step to enhance the data —

That's right. It can analyze text, and the beauty is it never gets tired; it can do a million pages in like an hour. Maybe that's the thing: it can do all the hard work, in that it doesn't get tired. It can go through all the tedium of tagging every bit of text that we want it to, with whatever we want to tag it with. And that's really valuable, because it never gets annoyed when you ask the same questions, and to some degree this is just work, right? Tag this, tag all the stuff, because we've got to get the data to analyze it properly. If you don't have good data — clean data in, clean conclusions out; dirty data in, dirty conclusions. So LLMs can be really helpful in cleaning up the data, but let's still run the machine learning after that. Will we be able to detect plagiarism? Will we be able to detect that an LLM wrote something, using stylometry, or will LLMs instantly know how to bypass that?
Listen, it gets better and better every single month — the LLMs do.

Yeah, they do.

With my colleagues, we did some preliminary tests by translating, with Google Translate, some works, say from French to English. And in the English you could still see the original signal shining through, but you were also able to see the Google Translate signal — the specific translator signal.

Oh, it's like the editor showing himself — the Google, or whatever. Oh my god.

But with LLMs it's getting more and more difficult every single month.

That's amazing. They're getting better, way faster. That is amazing. Holy cow. What are you working on now? What do your grad students work on? Does everybody pick the texts they want to go after? How do people study in this field? What does that look like?

That's a good question indeed, because this field does not really exist as a specific subfield. It's just a bunch of people coming from different directions: the linguists interested in how language works; literary guys that want to solve particular problems. But there is also a new field emerging: computational literary studies, CLS.

There's got to be a math side coming up in academia for it, right?

Not only solving the authorship questions, but also these other layers that can be traced in text, like gender: does gender exist — did it exist in 19th-century literature, does it exist in the contemporary literature? These kinds of questions, right?
Or in the linguistic subfield of stylometry, they tend to ask questions about disease: is Alzheimer's leaving a trace in the language we speak? Or, you know, personality: rather than authorship attribution, can we do authorship profiling? Can we profile someone in terms of their extraversion, these kinds of psychological traits? Can we trace them, or not that much? So these are the questions that are being addressed. Depending on where you are from, you might be interested in some of the questions of that not really well-defined subfield of study. Man, that's so interesting. Weren't you telling me a different story? It was about a guy, and the person who was taking care of him was helping with his notes or something. Do you know that one? Ah, okay. Yeah, you were asking about mixed authorship and different approaches to mixed authorship. Yeah, absolutely. I was talking about Francis Bacon, the famous Francis Bacon, the philosopher of the 17th century, a big English name, who wrote lots of things, except that he didn't write it all himself. He kind of did. Yeah. Okay, spoiler. He kind of did. Spoiler. Okay, what happened? Yeah, it turns out that the older he was, the more he was dictating rather than writing himself. He would just stroll along the Thames, the river Thames, with his secretary and sketch out the general ideas for the secretary to elaborate on and, you know, write down on paper in the final form. And because he was involved in many different governmental roles, I don't remember exactly which, he was a busy person. He was a busy guy. He was a sir, Sir Francis Bacon. Absolutely, yes. He had a lot of knightly duties. That's right. So he had less and less time for this kind of entertainment, what we all mean by writing interesting books.
So his most famous book, New Atlantis, was not written by him at all. It turns out he was dictating, or sketching out the general ideas, to his secretary. Let me say something about another very famous person, from the 12th century: Hildegard of Bingen, another big name, a visionary, in music as well, and, you know, lots of roles. And she was also not very confident about her Latin. She was of course very fluent, but, just as I'm kind of fluent in English while it is not my native language, so I would rather ask someone to cross-check my writing, whether the articles are correct and so on, the usual: she did the same. We didn't invent it. She actually hired a person, better in Latin than her, to correct her Latin. And the older she was, the more power she transferred to her secretaries. And we know the last secretary by name: it was Guibert of Gembloux, to whom she gave all the power, all the responsibility to write, because she wrote explicitly: you will know better what I think. She would just, you know, sketch the general ideas and he would take over, and we can trace that stylometrically as well. Right, in her last works it was like a mixture of the two. I mean, that is fascinating, that you could take Sir Francis Bacon's works, where the first one is all his own writing, put them next to each other over time, and actually see the two authors, one going down and one going up, to where his last book was not him at all. It was his secretary. And you hear that and you go, of course. Of course, he is getting old. He's famous. He's rich. He's done a bunch of things, and someone else is doing it for him, and it's okay.
It's just, you know... and of course they were his ideas, and maybe he sketched them out, but how interesting that we finally have math that can show that. We can have these ideas, but now we have concrete algorithms that can run it all through and go, yeah, there are two authors here. And in fact, at first one is at 100% and one is at zero, and by the end they've flip-flopped. We can show that. That's incredible. It is, except that you have to be quite resistant to the bitter truth sometimes. Oh, you've got to hold... Yeah, fair. I'm not talking about Sir Francis Bacon, but the biblical stuff, that's, you know... Yeah, it might get tough at times. Yeah, it was tough, because I see this dendrogram and a lot of the books are all overlapping. They're all jumbled, unlike that one dendrogram you just showed us with Aeschylus, where they all clustered perfectly: all the greens were together, all the reds were together, except for Prometheus Bound, which was on its own. And when you see the dendrogram from the New Testament, it's sort of all over the place, so to me that's a sign that the signal is not strong here, and the final result should be kind of mitigated at times. What do you mean by that? The signal might be there, but, you know, we cannot put too much weight on it. Yeah. Well, what would you expect to see? If everything were written by Paul, wouldn't you expect everything to cluster in one branch, if it's all a single author? Yes. Plus, the genre is also a factor here. So the epistles should cluster by definition: all the epistles, with, you know, Peter, John, whoever else wrote those epistles, as opposed to Revelation and the gospels. Where the gospels are concerned, we know that they should be clustered together, the synoptic gospels, Mark, Luke, and Matthew. Yep. And John being separate. And that's what you're likely to be seeing.
And we did see that Matthew, Mark, and Luke are kind of clustering more, and then John is kind of off on its own. Not surprised. Yeah, not surprised at all. The good question is whether John, the author of Revelation, is the same John as the John of the gospel, and stylometrically, not at all. Not at all. Yeah. Also the biblical scholars kind of agree that no, this is a different person. So it just shows, yeah, whoever wrote John didn't write Revelation. I mean, they just split it into two different Johns, and that's okay. But it's just, yeah, I've said this a few times, but it feels like humans have been arguing over this, or, you know, debating it, for thousands of years, and then a robot came in and went, boop, there are the answers, here you go. And I don't know if everybody knows what to do with that. I don't know what someone who has studied this their whole life does with it; maybe it's cognitive dissonance, and I'm not saying it's all the way true on this side either, but it's new data, data that's much less opinionated than all the other data we've had before, because it's just a computer making correlations. And I don't know, it's interesting, but it doesn't sound like that knocks you off your rocker. In fact, it feels like you've been swimming in this for decades. Yeah. The only thing being that if the outcomes are a bit wobbly, you shouldn't put too much weight on what you're seeing, right?
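The clustering behind such a dendrogram needs a distance between texts, and a standard choice in stylometry is Burrows' Delta: the mean absolute difference of z-scored word frequencies. Below is a toy-sized Python sketch of it, with invented mini-texts and a hand-picked five-word vocabulary standing in for the hundreds of most frequent words a real study would use; it is an illustration of the measure, not the actual New Testament analysis discussed here:

```python
import math
from collections import Counter

def profile(text, vocab):
    # Relative frequency of each function word in one text.
    toks = text.lower().split()
    c = Counter(toks)
    return [c[w] / len(toks) for w in vocab]

# Toy stand-ins: two texts in one "voice", one in another.
t1 = "the ship sailed and the crew sang as the wind rose over the sea"
t2 = "the storm broke and the sailors cried as the mast fell into the sea"
t3 = "a garden of roses where a child wandered with a small red kite"

vocab = ["the", "a", "and", "as", "of"]  # hypothetical function-word list
profs = [profile(t, vocab) for t in (t1, t2, t3)]

# Corpus-wide mean and standard deviation per word, for z-scoring.
cols = list(zip(*profs))
means = [sum(c) / len(c) for c in cols]
stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)]

def delta(p, q):
    # Burrows' Delta: mean absolute difference of z-scored frequencies.
    def z(f, m, s):
        return (f - m) / s if s > 0 else 0.0
    return sum(abs(z(fp, m, s) - z(fq, m, s))
               for fp, fq, m, s in zip(p, q, means, stds)) / len(p)

same_author = delta(profs[0], profs[1])
diff_author = delta(profs[0], profs[2])
print(same_author, diff_author)
```

Feeding a matrix of such pairwise distances into hierarchical clustering is what produces the dendrograms: texts with small Delta end up on neighboring branches, and a text whose voice differs, like Revelation versus the gospel of John in the discussion above, lands elsewhere.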
That's the single tiny side note that you should keep in mind. Totally, yeah, because it's a process of experimentation, so it takes a lot of tuning, and then you build your confidence as you tune and as you ask the different questions. I guess it's not just a black and white yes or no. It can be black and white at times, but, you know, think of actual fingerprints. Of course, in the police academy you see very clear-cut fingerprints, but when you gather evidence in the field, it's a fragment, two fragments, from which you have to extrapolate. Sometimes the evidence is strong, but usually it is not. Same here, right? This idea of the fingerprint exists, it's there, we can prove it. But in fact, when you analyze, you know, very short fragments that were probably written collaboratively, with the help of someone else, then don't expect the signal to be there. Yep. Well, and that was interesting. One of the things these guys mentioned, and they showed it in their data, is that there's a book, Philemon, and it's the shortest book. It's, like, a couple of paragraphs, and they basically said Philemon jumps around all over the place depending on how we tune the algorithm, because it's such a small amount of data. So we can't say anything about Philemon, because there's not enough data to draw any serious conclusions. To your point, as they keep tweaking it, if the clusters stay similar, that's a stronger signal, and if you can keep tweaking it and things keep jumping around, then overall it's a weaker signal in some ways. Yeah. To me that's a very good proxy for the extent to which the data is stable indeed. Yeah. And they also sliced it a different way.
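The "keep tweaking it and see if the answer holds" heuristic can be sketched directly. The code below reruns a naive nearest-neighbor attribution with feature sets of different sizes, in the spirit of the consensus approaches used in stylometry; the candidate texts, the disputed text, and the feature-set sizes are all invented for illustration, and a real study would vary far more settings over far more data:

```python
from collections import Counter

def top_words(texts, n):
    # Pool all texts and take the n most frequent words as features.
    joint = Counter()
    for t in texts:
        joint.update(t.lower().split())
    return [w for w, _ in joint.most_common(n)]

def dist(t1, t2, vocab):
    # L1 distance between relative word-frequency profiles.
    def prof(t):
        toks = t.lower().split()
        c = Counter(toks)
        return [c[w] / len(toks) for w in vocab]
    return sum(abs(x - y) for x, y in zip(prof(t1), prof(t2)))

# Hypothetical candidate authors and a disputed text (toy data).
candidates = {
    "A": "the king rode and the queen sang as the sun set over the hills and the river",
    "B": "a rose in a garden with a thorn upon a stem of a flower",
}
disputed = "the knight fought and the horse fell as the rain came over the field and the road"

# Rerun the attribution with differently sized feature sets; a verdict that
# survives every setting is a stronger signal than one that flips around.
votes = []
for n in (3, 5, 8):
    vocab = top_words([disputed] + list(candidates.values()), n)
    winner = min(candidates, key=lambda name: dist(disputed, candidates[name], vocab))
    votes.append(winner)

print(votes, "stable" if len(set(votes)) == 1 else "unstable")
```

A very short text like Philemon would give the opposite picture: with only a couple of paragraphs the profiles are so noisy that the votes flip as the settings change, which is exactly the instability described above.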
They said that they got a bunch of first-century Greek, second-century Greek, third-century Greek, and fourth-century Greek literature, and then they clustered those and took the whole New Testament and asked: what's the writing style like? Because, right, if I were to write a paragraph today and someone in 1925 were to write a paragraph in English, they'd be very different. And they found that the whole New Testament clustered around the second century. That's a very fair assumption, to be honest, because language is developing and you do see the temporal signal in the data very often. Sure. However, let's not forget that the New Testament was super influential, and the texts from the second or third century might have been inspired, one way or the other, by this very influential text. Yep. So if we can rule out this factor, then, you know, I like these outcomes. Otherwise, I would put this question to the authors. And I think that's a good summary. You know, this is giving us hard quantitative data, but we also need to take into account qualitative data as well, like how it is being collected, and actually use our brains, not just say, well, the math says this, so this is it, but interpret it alongside everything else we already know. That kind of provides the nuance. So that's why you really have to get your brain around a specific topic. Even, to your point, the idea that a single author would write something, that that's what makes good writing, was a notion that didn't take hold until the Enlightenment, basically. So why would anybody write that way 2,000 years ago? It wasn't even the way they wrote. So, you know, I think even that sort of education is helpful for analyzing all these texts as well. There's tons around it that you have to know. Yeah.
And a friend of mine uses this analogy, or metaphor, of taking blood samples. If you go to the doctor, you give a blood sample, a urine sample, whatever, and it goes to the lab. So the math does some things for you, and then it is inspected manually by the doctor. The final word belongs to someone who has a brain, right? That's right, they're interpreting the data. Yeah. I like that. Man, this was great. Dr. Eder, thanks for spending some time with us. This was fantastic. Yeah, that was a big day for me as well. Yeah. I mean, it's interesting. It seems to me a huge breakthrough in textual analysis that computers are now contributing, and I just think more people should know about that, because to me it answers more questions than it raises. And in this world of trying to figure things out, that's really helpful. Well, that's what academic life is about: more questions than answers, so that we have some jobs left. Yeah, it's true. And there always are. I don't think we're ever going to run out of questions. No. But it is so interesting to learn how computers can now help us analyze texts and contribute to authorship debates, and it's also pretty controversial. You're one of the experts in the field. You've developed one of the algorithms that's being used by other people. You've open-sourced it, and you're very open to sharing that knowledge. So I really appreciate it. I wanted to understand it, and I want other people to understand it, to get it, but we can't all take decades of our lives, like you, and dive deep into it. So thank you for sharing with us, because it's really helpful to understand what's new in the world of textual analysis and debate. Thank you. Thank you.
You're totally welcome. Yeah, on behalf of all the curious-minded, we appreciate all the contributions you've made, because we're just sitting with questions all the time. It's really nice to hear that people are working on answers somewhere in the corners of the world. So, thank you. Oh, and if anybody wants to make a dendrogram, we actually made a tiny piece of software. It's at words.austinmatt.com if anybody wants to go. Indeed, I want to recommend that one. I've seen it work, and it was quite impressive. Thanks. Yeah, we just threw it together. There's no login, no nothing. And, you know, we'll keep it alive as long as we can, but it's interesting to pick different things, create a dendrogram, and see what it looks like. So, for those who haven't checked it out, that's one place you can go, or just Google dendrograms and look at what people have done. But yeah, this has been great. We really appreciate your time. All right, I think we're going to call it a wrap. Dr. Eder, thanks so much. And I guess you're in Poland, so anybody out there in Poland, go visit Dr. Eder. He's the man. That's great. Okay, thank you so much. We appreciate it. We're going to call it a wrap. Thanks, everybody. Thanks for coming, Dr. Eder. Thanks a lot. See you. Bye.