What Indo-Europeanists do

My friends often ask me what I do for my studies as an Indo-Europeanist, and I think I’ve found a good way to formulate it. Basically, what we’re trying to do is solve a puzzle with, let’s say, a million pieces. Each piece represents a unique word form – usually in a dead Indo-European language, but some living languages, such as Lithuanian, are studied too. Generally we look at the oldest material available: for example, for Greek we’re interested in the Iliad and Odyssey and old inscriptions, and not as much in Classical Greek or later texts. As a specific data point, the number of puzzle pieces coming from Homer is around 30,000.

For every puzzle piece, we’re trying to figure out how it fits with other pieces. If it’s a derived form (as solved is derived from solve), we want to know the exact rule through which it was derived. For the rules, we try to understand how they came to be (the –ed ending, for example, is thought to originate from a form of ‘to do’ that got attached to the verbal root), and what the grammatical category corresponding to the rule means exactly. For a base form of a word – we call it a lemma – we look for an etymology, explaining how this word was created, and maybe what meaning it had in the proto-language. (Solve ultimately derives from the Proto-Indo-European root usually reconstructed as *lewh₃ ‘to wash’.)

For 95% of the puzzle pieces, the answers to these questions are obvious. These are the forms students are asked to analyze or derive in classes and during exams. The next 4.5% have agreed-upon answers; you’re not expected to be able to figure these answers out yourself, but you can look them up in a dictionary or a textbook. Finally, the remaining 0.5% do not fit, and this is what everyone writes their papers about. Most of this 0.5% consists of very old and very common words, such as “foot” and “nose”.

That they do not fit is a fact of life. Language is messy. There is no grand beautiful system into which everything fits that will eventually reveal itself. You have to shuffle the same pieces around until you find a solution that you consider less bad than others – and there’s a whole set of (contradictory) guidelines to decide which one is less bad. The number of word forms on which a theory hinges is often less than 10. And by the way, some of the puzzle pieces might have been broken in transit, and you don’t know which ones: most of the texts have reached us in manuscripts written down many centuries after the works were initially composed, so scribal error is always a factor to keep in mind. Nearly every theory that proposes to fit some pieces in a new way has to explain away the counterevidence, usually by suggesting that some forms were created by analogical replacement. We know for a fact that analogy is a big factor in language change, but we often argue about exactly which forms are original and which are analogical creations. (Of the two past forms of to dive, one would think that dove is old and dived was created by analogy, but in reality it’s the other way round.)

Most of the puzzle pieces have been on the table for at least the last 200 years. Homer, the Avesta, the Rigveda, the Iguvine Tablets and the Gothic Bible have been studied for a very long time, with solid results. The 20th century brought us Hittite, Tocharian and Mycenaean Greek, and we still haven’t reached agreement on how to fit some parts of them (such as the Hittite verb) into the system. Nowadays, new puzzle pieces are few and far between: all we have is new Hittite tablets (we recently managed to discover a whole new language in them) and occasional discoveries for other Anatolian languages and Mycenaean. We mostly have to keep shuffling the pieces we already have.

With Etymograph, I’m trying to build a system that can keep track of (ideally all) the puzzle pieces and how different scholars have tried to connect them. It will then allow you to move large groups of puzzle pieces around and quickly see if some part of them no longer fits after your suggested move. Will this help achieve major discoveries? I don’t think so. It’s also a ton of work. But it feels deeply wrong to me as a software developer that such a tool doesn’t exist, so I keep building.

What’s the point of it all? Not that much, actually. Being able to read Hittite and Mycenaean is extremely valuable for ancient historians, and understanding Hittite wouldn’t have been possible without knowing the Indo-European roots that Hittite words derive from. Theoretically, Proto-Indo-European linguistics can also help us shed light on the question of where we all came from (we call this “linguistic paleontology”, and I’ve got a paper to write about it this semester). In practice, however, it can only answer some of the questions. We’ve learned a lot about the migrations of Proto-Indo-Europeans in the last decade or two, and the reason we were able to learn this is that we can now analyze ancient DNA, not that we got better at doing linguistics.

And other than that, we’re doing this because it’s fun. So now I’ll thank the 19th-century German dudes who legitimized this as an area of study, and the university that lets me study this with some of the best scholars in the world, and go read some papers about Indo-European words for horses and wagons. See you in the next one!


