The Rise of the Em Dash in Ecology Abstracts
ChatGPT loves to put em dashes (—) everywhere. Anyone who uses it daily will have noticed. I just read an interesting blog post that made a compelling case that this could result not only from more em dashes in the training data but also from the way language models are trained. The argument goes that shorter token sequences reduce loss, and that em dashes are an efficient, low-token alternative to more verbose punctuation. As a result, newer models seem to be leaning hard into the dash.
OK, but if AI-generated text is full of em dashes, and if scientists are increasingly using tools like ChatGPT in their writing and editing, could this typographic tic already be showing up in the scientific literature? And specifically, in my own field of ecology?
A quick scan of recent literature
To get a rough answer, I did a little experiment. I’ve already had the opportunity to do a few scientometric experiments on this blog (see Changes in number of authors in ecology journals over time). I always find it fascinating to explore my field of research through this lens.
So, I used OpenAlex to pull 10,000 (that should be enough) English-language ecology abstracts from two years: 2021 (before ChatGPT) and 2025 (now). I filtered for abstracts of moderate length (600 to 3500 characters) and computed the frequency of various punctuation marks and special characters, normalized by total character count. I didn’t focus on the em dash alone: I wanted to see how it compared with a wider set of typographic characters, so I also included commas, colons, asterisks, ampersands, question marks, etc. The code is available here.
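For readers who want to try something similar, here is a minimal sketch of the counting step. It is not the code linked above. It assumes that abstracts arrive in the inverted-index form OpenAlex uses (a mapping from each word to its positions), that "normalized by total character count" means pooling all retained abstracts, and that the tracked character list is only illustrative. The names `rebuild_abstract`, `char_frequencies` and `TRACKED` are made up for this example.

```python
from collections import Counter

# Illustrative character set (an assumption, not the exact list used in the post).
# "\u2014" is the em dash.
TRACKED = ["\u2014", ",", ":", ";", "*", "&", "?"]


def rebuild_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style {word: [positions]} mapping."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))


def char_frequencies(abstracts, min_len=600, max_len=3500, tracked=TRACKED):
    """Occurrences of each tracked character divided by the total number of
    characters, pooled over abstracts whose length is within [min_len, max_len]."""
    counts = Counter()
    total_chars = 0
    for text in abstracts:
        if not (min_len <= len(text) <= max_len):
            continue  # skip abstracts that are too short or too long
        total_chars += len(text)
        counts.update(ch for ch in text if ch in tracked)
    if total_chars == 0:
        return {ch: 0.0 for ch in tracked}
    return {ch: counts[ch] / total_chars for ch in tracked}
```

Running `char_frequencies` once on the 2021 abstracts and once on the 2025 abstracts gives two dictionaries that can be compared character by character. Pooling weights long abstracts more heavily than short ones; averaging per-abstract frequencies would be an equally defensible choice.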
The result is shown in the bar plot above. The em dash definitely stands out. Its relative frequency more than doubled over the four-year period, and no other character came close to that magnitude of change. A few, like question marks, ampersands, and semicolons, showed some decline (I’m not sure why; it could also be related to how people are using LLMs to clean or revise their text). But the em dash was the only character to show such a clear, steep rise.
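The comparison between the two years boils down to a ratio of normalized frequencies. Here is a small sketch of that arithmetic, assuming two dictionaries of per-character frequencies (one per year); the function name `relative_change` is hypothetical and no real values from the analysis are reproduced here.

```python
def relative_change(freq_before, freq_after):
    """Ratio of the later frequency to the earlier one for each character.

    Returns None for a character whose baseline frequency is zero,
    since the ratio is undefined in that case.
    """
    ratios = {}
    for ch, before in freq_before.items():
        after = freq_after.get(ch, 0.0)
        ratios[ch] = after / before if before > 0 else None
    return ratios
```

A ratio above 2 corresponds to "more than doubled", a ratio below 1 to a decline, and a ratio near 1 to no real change.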
This is bothering me more than it should
This result does not prove causation, but it strongly suggests that LLMs have already left their mark on the literature. There have already been quite a few reports on how LLMs affect vocabulary.
Now, is this the precursor to the infernal loop evoked by the author of the post quoted in the introduction, which will lead us to the collapse of the LLMs and the end of the world? I have no idea. What bothers me more prosaically is that if nothing is done the use of em dash is likely to become suspect, just as the use of the verb to delve has become. I couldn’t care less about the verb to delve, but I do love the em dash—a typographic character of rare elegance. Well, maybe LLMs have good taste after all. They just need to be a bit more subtle so as not to upset my habits, and destroy themselves in the process.