The title of this blog post is a phrase coined by Paul Wouters and Rodrigo Costas in their 2012 publication Users, Narcissism and Control: Tracking the Impact of Scholarly Publications in the 21st Century. By “technologies of narcissism”, Wouters and Costas mean tools that allow individuals to rapidly assess the impact, usage and influence of their publications without much effort. One of the main points Wouters and Costas convey is that individuals using technologies of narcissism must exercise “great care and caution” because of the individual-level focus of altmetrics.

I first recall noticing altmetrics associated with one of my papers after publishing “Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities” in PLoS Computational Biology in 2005. The Public Library of Science (PLoS) was one of the first publishers to collect and display views and downloads of papers, and I remember being intrigued the first time I noticed my paper’s statistics. Eventually I developed the habit of revisiting the paper’s website frequently to check “how we were doing”. I’m probably responsible for at least a few dozen of the downloads that led to the paper making the “top ten” list the following year; PLoS Computational Biology even published a paper displaying paper rankings by downloads. Looking back, while PLoS was a pioneer in developing technologies of narcissism, it was the appetite for them from individuals such as myself that drove the proliferation of new metrics and of companies devoted to disseminating them. For example, a few years later, in 2012, right when Wouters and Costas were writing about technologies of narcissism, Altmetric.com was founded; today it is a business with millions of dollars in revenue, dozens of employees, and a name synonymous with the metrics it measures.

Today Altmetric.com’s “Attention Score” is prominently displayed alongside articles in many journals (for example, all of the publications of the Microbiology Society), and even bioRxiv displays them. In fact, the importance of altmetrics to bioRxiv is evident from the very first entry of the bioRxiv FAQ, which states that “bioRxiv provides usage metrics for article views and PDF downloads, as well as altmetrics relating to social media coverage. These metrics will be inaccurate and underestimate actual usage in article-to-article comparisons if an article is also posted elsewhere.” Altmetric.com has worked hard to make it easy for anyone to embed the “Altmetric Attention Score” on their website, and some professors now do so:

[Screenshot: a professor’s website embedding Altmetric Attention Score badges]

What does Altmetric.com measure? The details may surprise you. For example, Altmetric tracks article mentions in Wikipedia, but only in the English, Finnish and Swedish Wikipedias. Tweets, retweets and quoted tweets of articles are tracked, but not all of them. This is because Altmetric.com was cognizant from the outset of the possibility of “gaming the system”, and it therefore looks for “evidence of gaming” to defend against manipulation of its scores. The “Gaming altmetrics” blog post by founder Euan Adie in 2013 is an interesting read; clearly there have been numerous attempts to manipulate the metrics the company measures. He writes: “We flag up papers this way and then rely on manual curation (nothing beats eyeballing the data) to work out exactly what, if anything, is going on.”

Is anything going on? It’s hard to say. But here are some recent comments on social media:

can I now use (download) statistics to convince Nature Methods to take my paper…?

[Screenshot: the Twitter exchange]

Perhaps this exchange is tongue-in-cheek, but notice that by linking to a bioRxiv preprint, the second tweet in the series above actually affected the Altmetric Attention Score of that preprint:

[Screenshot: the Altmetric Attention Score page for the preprint]

Here is another exchange:

[Screenshot: another Twitter exchange]

Apparently they are not the first to have this idea:

[Screenshot: earlier tweets proposing the same idea]

The last tweet is by a co-founder of bioRxiv. These “jokes” (I suppose?) about altmetrics are in response to a recent preprint by Abdill and Blekhman (note that I’ve just upped the Altmetric Attention Score of the preprint by linking to it from this blog; Altmetric.com tracks links from manually curated lists of blogs). The Abdill-Blekhman preprint includes an analysis showing a strong correlation between preprint downloads and the impact factor of the journals where the preprints were eventually published:

[Figure: downloads per preprint vs. impact factor of the journal of publication (Abdill and Blekhman)]

The analogous plot, showing the correlation between tweets and citations per preprint (averaged by the journal where the preprints were eventually published), was made by Sina Booeshaghi and Lynn Yi last year (GitHub repository here):

[Figure: tweets vs. citations per preprint, averaged by journal of publication (Booeshaghi and Yi)]

There are some caveats to the Booeshaghi-Yi analysis (the number of tweets per preprint is capped at 100), but it shows a similar trend to the Abdill-Blekhman analysis. One question these data raise (and that the convenient API of Abdill and Blekhman makes it possible to study) is: what is the nature of this correlation? Are truly impactful results in preprints being recognized as such, independently, by both journal reviewers/editors and by scientists on Twitter, or are the altmetrics of preprints directly affecting where they end up being published? The latter possibility is disturbing if true. Twitter activity is highly biased and associated with many factors that have nothing to do with scientific interest or relevance. For example, women are less influential on Twitter than men (men are twice as likely to be retweeted as women). Thus, the question of causation in this case is not just of academic interest; it is also of career importance for individuals, and important for science as a whole. The data of Abdill and Blekhman will be useful in studying the question, and are an important starting point to assimilate and build on. I am apparently not the only person who thinks so; Abdill and Blekhman’s preprint is already “highly downloaded”, as can be seen on Rxivist, a companion website to the preprint.
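
As a sketch of the kind of journal-level aggregation described above (an illustration, not the Booeshaghi-Yi code), assume a hypothetical file preprints.csv with one row per preprint and columns journal, tweets, and citations:

```python
# Hypothetical illustration: correlate average tweets with average citations
# per preprint, grouped by the journal where preprints were eventually published.
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("preprints.csv")            # hypothetical input file
df["tweets"] = df["tweets"].clip(upper=100)  # cap tweets per preprint at 100

# Average tweets and citations per preprint for each journal of publication.
by_journal = df.groupby("journal")[["tweets", "citations"]].mean()

rho, p = spearmanr(by_journal["tweets"], by_journal["citations"])
print(f"Spearman correlation across journals: rho = {rho:.2f} (p = {p:.2g})")
```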

The Rxivist website is useful for browsing the bioRxiv, but it does something else as well: for the first time, it makes two altmetric statistics (paper downloads and tweets) accessible via “author leaderboards”. Unlike Altmetric.com, which appears to carefully (and sometimes manually) curate the statistics that go into its final score, and which guards against manipulation of tweets by filtering for bots, the Rxivist leaderboards are based on raw data. This is what a “papers downloaded” leaderboard looks like:

[Screenshot: the Rxivist “papers downloaded” leaderboard]

Reaction was swift:

[Screenshot: tweet by Stephen Floor]

The fact is that Stephen Floor is right; it is already accepted that the number of times a preprint has been downloaded is relevant and important.

But this raises a question: are concerns about gaming the system overblown, or a real problem? How hard is it, really, to write a bot to boost one’s download statistics? Has someone done it already?

Here is a partial answer to these questions in the form of a short script that downloads any preprint (the script, together with the required companion chromedriver binary, is available in the blog’s GitHub repository):

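A minimal sketch of the approach, assuming selenium with a headless Chrome driver and a placeholder preprint URL (an illustration of the idea; the repository version may differ):

```python
# Repeatedly fetch a preprint's PDF, each time in a fresh browser session,
# so that every fetch registers as a new download.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PREPRINT_PDF = "https://www.biorxiv.org/content/<doi>.full.pdf"  # placeholder URL

def download_once():
    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Chrome(options=opts)  # requires chromedriver on the PATH
    try:
        driver.get(PREPRINT_PDF)  # the page load counts as a PDF download
        time.sleep(5)             # give the request time to complete
    finally:
        driver.quit()             # discard the session before the next fetch

if __name__ == "__main__":
    for _ in range(100):  # repeat as desired
        download_once()
```

The point is not the specifics of the script but how little effort is required.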

This script exposes a significant vulnerability in raw altmetric scores such as the number of times a preprint or paper has been downloaded. The concern is not theoretical: in the course of just three days, the script raised the number of downloads of the least downloaded Blekhman et al. preprint (relative to its age) from 477 to 33,540, making it one of the top ten most downloaded preprints of all time. This is not a parlor trick or a stunt. Rather, it demonstrates how trivially raw altmetrics can be manipulated, and it emphasizes that if individuals are to be ranked (preferably not!) and those rankings broadly distributed, the rankings should at least be based on metrics that are robust to trivial attacks and manipulations. For the record, I am no fan of the h-index, and while it too can be manipulated, it is certainly much harder to manipulate than the number of downloads, or tweets, of a paper.
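
For comparison, the h-index of an author is the largest h such that h of their papers have at least h citations each; inflating it requires accumulating citations across many papers rather than incrementing a single counter:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # prints 4
```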

The scientific community would be remiss to ignore the proliferation of technologies of narcissism. These technologies can have real benefits, primarily by providing novel ways to help researchers identify interesting and important work relevant to the questions they are pursuing. Furthermore, one of the main advantages of the open licensing of resources such as bioRxiv or PLoS is that it permits natural language processing to facilitate automatic prioritization of articles, search for relevant literature, and mining for specific scientific terms (see, e.g., Huang et al. 2016). But I am loath to accept a scientific enterprise that rewards winners of superficial, easily gamed popularity contests.