The title of this blog post is a phrase coined by Paul Wouters and Rodrigo Costas in their 2012 publication Users, Narcissism and Control—Tracking the Impact of Scholarly Publications in the 21st Century. By “technologies of narcissism”, Wouters and Costas mean tools that allow individuals to rapidly assess the impact, usage and influence of their publications without much effort.  One of the main points that Wouters and Costas try to convey is that individuals using technologies of narcissism must exercise “great care and caution” due to the individual level focus of altmetrics.

I first recall noticing altmetrics associated with one of my papers after publishing the paper “Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities” in PLoS Computational Biology in 2005. Public Library of Science (PLoS) was one of the first publishers to collect and display views and downloads of papers, and I remember being intrigued the first time I noticed my paper statistics. Eventually I developed the habit of revisiting the paper website frequently to check “how we were doing”. I’m probably responsible for at least a few dozen of the downloads that led to the paper making the “top ten” list the following year. PLoS Computational Biology even published a paper where they displayed paper rankings (by downloads). Looking back, while PLoS was a pioneer in developing technologies of narcissism, it was the appetite for them from individuals such as myself that drove a proliferation of new metrics and companies devoted to disseminating them. For example, a few years later in 2012, right when Wouters and Costas were writing about technologies of narcissism, Altmetric was founded, and today it is a business with millions of dollars in revenue, dozens of employees, and a name that is synonymous with the metrics it measures.

Today the Altmetric “Attention Score” is prominently displayed alongside articles in many journals (for example all of the publications of the Microbial Society) and even bioRxiv displays it. In fact, the importance of altmetrics to bioRxiv is evident as they are mentioned in the very first entry of the bioRxiv FAQ, which states that “bioRxiv provides usage metrics for article views and PDF downloads, as well as altmetrics relating to social media coverage. These metrics will be inaccurate and underestimate actual usage in article-to-article comparisons if an article is also posted elsewhere.” Altmetric has worked hard to make it easy for anyone to embed the “Altmetric Attention Score” on their website, and in fact some professors now do so:

Screenshot 2019-01-29 06.16.21.png

What does Altmetric measure? The details may surprise you. For example, Altmetric tracks article mentions in Wikipedia, but only in the English, Finnish and Swedish Wikipedias. Tweets, retweets and quoted tweets of articles are tracked, but not all of them. This is because Altmetric was cognizant of the possibility of “gaming the system” from the outset and it therefore looks for “evidence of gaming” to try to defend against manipulation of its scores. The “Gaming altmetrics” blogpost by Altmetric founder Euan Adie in 2013 is an interesting read. Clearly there have been numerous attempts to manipulate the metrics they measure. He writes “We flag up papers this way and then rely on manual curation (nothing beats eyeballing the data) to work out exactly what, if anything, is going on.”

Is anything going on? It’s hard to say. But here are some recent comments on social media:

can I now use (download) statistics to convince Nature Methods to take my paper…?

screenshot 2019-01-29 06.49.38

Perhaps this exchange is tongue-in-cheek but notice that by linking to a bioRxiv preprint the second tweet in the series above actually affected the Altmetric Attention Score of that preprint:

screenshot 2019-01-29 06.52.20

Here is another exchange:

screenshot 2019-01-29 06.57.48

Apparently they are not the first to have this idea:

screenshot 2019-01-29 06.59.07

The last tweet is by a co-founder of bioRxiv. These recent “jokes” (I suppose?) about altmetrics are in response to a recent preprint by Abdill and Blekhman (note that I’ve just upped the Altmetric Attention Score of the preprint by linking to it from this blog; Altmetric tracks links from manually curated lists of blogs). The Abdill-Blekhman preprint included an analysis showing a strong correlation between paper downloads and the impact factor of journals where they are published:

screenshot 2019-01-29 07.04.15

The analogous plot showing the correlation between tweets and citations per preprint (averaged by journal where they ended up being published) was made by Sina Booeshaghi and Lynn Yi last year (GitHub repository here):


There are some caveats to the Booeshaghi-Yi analysis (the number of tweets per preprint is capped at 100) but it shows a similar trend to the Abdill-Blekhman analysis. One question which these data raise (and which the convenient API by Abdill and Blekhman makes it possible to study) is: what is the nature of this correlation? Are truly impactful results in preprints being recognized as such (independently) by both journal reviewers/editors and scientists via twitter, or are the altmetrics of preprints directly affecting where they end up being published? The latter possibility is disturbing if true. Twitter activity is highly biased and associated with many factors that have nothing to do with scientific interest or relevance. For example, women are less influential on twitter than men (men are twice as likely to be retweeted as women). Thus, the question of causation in this case is not just of academic interest; it is also of career importance for individuals, and important for science as a whole. The data of Abdill and Blekhman will be useful in studying the question, and are an important starting point to assimilate and build on. I am apparently not the only person who thinks so; Abdill and Blekhman’s preprint is already “highly downloaded”, as can be seen on a companion website to the preprint called Rxivist.

The Rxivist website is useful for browsing the bioRxiv, but it does something else as well. For the first time, it makes accessible two altmetric statistics (paper downloads and tweets) via “author leaderboards”. Unlike Altmetric, which appears to carefully (and sometimes manually) curate the statistics going into its final score, and which acts against manipulation of tweets by filtering for bots, the Rxivist leaderboards are based on raw data. This is what a leaderboard of “papers downloaded” looks like:

screenshot 2019-01-29 07.46.40

Reaction was swift:

Screenshot 2019-01-29 07.43.33.png

The fact is that Stephen Floor is right; it is already accepted that the number of times a preprint has been downloaded is relevant and important:

But this raises a question: are concerns about gaming the system overblown, or are they a real problem? How hard is it, really, to write a bot to boost one’s download statistics? Has someone done it already?

Here is a partial answer to the questions above in the form of a short script that downloads any preprint (also available on the blog GitHub repository where the required companion chromedriver binary is also available):

screenshot 2019-01-29 09.31.48

This script highlights a significant vulnerability in raw altmetric scores such as “number of times downloaded” for a preprint or paper. The proof is that in the course of just three days the script was able to raise the number of downloads for the least downloaded Blekhman et al. preprint (relative to its age) from 477 to 33,540. This now makes it one of the top ten downloaded preprints of all time. This is not a parlor trick or a stunt. Rather, it reveals a significant vulnerability in raw altmetrics, and emphasizes that if individuals are to be ranked (preferably not!) and those rankings broadly distributed, at least the rankings should be done on the basis of metrics that are robust to trivial attacks and manipulations. To wit, I am no fan of the h-index, and while it is a metric that can be manipulated, it is certainly much harder to manipulate than the number of downloads, or tweets, of a paper.

The scientific community would be remiss to ignore the proliferation of technologies of narcissism. These technologies can have real benefit, primarily by providing novel ways to help researchers identify interesting and important work that is relevant to the questions they are pursuing. Furthermore, one of the main advantages of the open licensing of resources such as bioRxiv or PLoS is that they permit natural language processing to facilitate automatic prioritization of articles, search for relevant literature, and mining for specific scientific terms (see e.g., Huang et al. 2016). But I am loath to accept a scientific enterprise that rewards winners of superficial, easily gamed, popularity contests.

The Journal Impact Factor (JIF) was first proposed by Eugene Garfield of Institute for Scientific Information (ISI) fame in 1955. It is a journal-specific yearly citation measure, defined to be the average number of citations received in a given year per paper, taken over the papers the journal published in the preceding two years. Obsession with the impact factor in the face of widespread recognition of its shortcomings as a tool for judging the value of science is an unfortunate example of “the tragedy of the commons”.
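The two-year definition above can be written as a short calculation. This is only an illustrative sketch: the function name and all of the counts are made up for the example, and it assumes that citations are counted in a single year against the papers published in the two preceding years.

```python
def journal_impact_factor(year, citations_in_year, papers_published):
    """Two-year impact factor for `year`.

    citations_in_year: maps publication year -> citations received during
        `year` by the papers published in that publication year.
    papers_published: maps publication year -> number of papers published.
    """
    window = (year - 1, year - 2)  # the two preceding years
    citations = sum(citations_in_year[y] for y in window)
    papers = sum(papers_published[y] for y in window)
    if papers == 0:
        raise ValueError("no papers published in the two-year window")
    return citations / papers


# Hypothetical journal: counts invented purely for illustration.
citations_in_2015 = {2014: 300, 2013: 500}
papers_published = {2014: 100, 2013: 100}

jif = journal_impact_factor(2015, citations_in_2015, papers_published)
print(f"{jif:.3f}")  # 4.000
```

Because the result is a mean over all papers in the window, a handful of very highly cited papers can dominate it, which is one of the shortcomings alluded to above.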

Leaving aside for a moment the flaws of the JIF, one may wonder whether journals do in fact have any impact. By “impact” one might imagine something along the lines of the simple definition in the Merriam-Webster Dictionary: “to have a strong and often bad effect on (something or someone)” and as an object for the impact one could study the researchers who publish, the scientific community as a whole, or the papers themselves. On the question of impact on papers, common sense suggests that publishing in a high profile journal helps a paper succeed and there is pseudoscience to support that case. However, there is little in the way of direct measurement. Twitter to the rescue.

At the end of last year my twitter account was approaching 5,000 followers. Inspired by others, I found myself reflecting on this “milestone” and in anticipation of the event, I started to ponder the scientific utility of amassing such a large number of followers. There is, of course, a lot of work being done on natural language processing of twitter feeds, but it struck me that with 5,000 followers I was in a position to use twitter for proactive experimentation rather than just passive mining. Impact factors, followers, and twitter… it was just the right mix for a little experiment…

In my early tweeting days I encountered a minor technical issue with links to papers: it was unclear to me whether I should use link shorteners (and if so which service?) or include direct links to articles in my tweets. I initially thought that using link shorteners would save me characters but I quickly discovered that this was not the case. Eventually, following advice from fellow twitterati, I began tweeting articles only with direct links to the journal websites. Last year, when twitter launched free analytics for all registered users, I started occasionally examining the stats for article tweets, and I began to notice quantitatively what I had always suspected intuitively: tweets of Cell, Nature and Science (CNS) articles were being circulated much more widely than those of other journals. Having used link shorteners, the natural question to ask was: how do tweets of journal articles with the journal names compare to tweets with anonymized links?

Starting in August of 2015, I began occasionally tweeting articles about 5 minutes apart, using the exact same text (the article title or brief description) but doing it once with the article linked via the journal website so that the journal name was displayed in the link and once with an anonymized link that revealed nothing about the journal source. Twitter analytics allowed me to see, for each tweet, a number of (highly correlated) tweet statistics, and I settled on measuring the number of clicks on the link embedded in the tweet. By switching the order of named/anonymized tweets I figured I could control for a temporal effect in tweet appearance, e.g. it seemed likely that users would click on the most recent links on their feed, resulting in more views/clicks etc. for later tweets that were identical except for link type. Ideally this control would have been performed by A/B testing but that was not a possibility (see Supplementary Materials and Notes). I did my tweeting manually, generally waiting a few weeks between batches of tweets so that nobody would catch on to what I was doing (and thereby ruin the experiment). I was eventually caught, forcing me to end the experiment, but not before I squeezed in enough tests to achieve a significant p-value for something.

I hypothesized that twitter users will click on articles when, and only when, the titles or topics reflect research of interest to them. Thus, I expected not to find a difference in analytics between tweets made with journal names as opposed to anonymized links. Strikingly, tweets of articles from Cell, Nature and Science journals (CNS) all resulted in more clicks on the link showing the journal name than on the anonymized link (p-value 0.0078). The average effect was a ratio of 2.166 between clicks on links with the journal name and clicks on anonymized links. I would say that this number is the real journal impact factor of what are now called the “glamour journals” (I’ve reported it to three decimal digits to be consistent with the practice of most journals in advertising their JIFs). To avoid confusion with the standard JIF, I call my measured impact factor the RIF (relative impact factor).

Untitled 3
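The RIF arithmetic described above can be sketched in a few lines. Everything here is an assumption made for illustration: the click counts are invented (the real data are in the table linked in the supplement), the RIF is taken to be the mean of the per-pair click ratios, and the p-value is computed with a one-sided sign test, which the post does not name as the test that was used. Under that assumption, seven out of seven pairs favoring the journal link gives (1/2)^7, which rounds to 0.0078.

```python
from math import comb


def relative_impact_factor(pairs):
    """Mean of per-pair ratios (journal-link clicks / anonymized-link clicks).

    pairs: list of (clicks_on_journal_link, clicks_on_anonymized_link);
    assumes every anonymized-link count is greater than zero.
    """
    ratios = [named / anonymized for named, anonymized in pairs]
    return sum(ratios) / len(ratios)


def sign_test_one_sided(wins, n):
    """P(X >= wins) for X ~ Binomial(n, 1/2): the chance of at least `wins`
    pairs favoring the journal link if link type made no difference."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical click counts, invented purely for illustration.
pairs = [(30, 10), (20, 20), (10, 20)]
print(f"{relative_impact_factor(pairs):.3f}")  # 1.500

# Seven of seven pairs favoring the journal link (an assumed sample size).
print(f"{sign_test_one_sided(7, 7):.4f}")  # 0.0078
```

Averaging ratios rather than taking the ratio of total clicks gives each tweet pair equal weight, so one unusually popular article does not dominate the result.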

One possible objection to the results reported above is that perhaps the RIF reflects an aversion to clicking on anonymized links, rather than a preference for clicking on (glamour) journal links. I decided to test that by performing the same test (journal link vs. anonymized link) with PLoS One articles:

Untitled 4

Strikingly, in three out of the four cases tested users displayed an aversion to clicking on PLoS One links. Does this mean that publishing in PLoS One is career suicide? Certainly not (I note that I have published PLoS One papers that I am very proud of, e.g. Disordered Microbial Communities in Asthmatic Airways), but the PLoS One RIF of 0.877 that I measured (average ratio of clicks, as explained above) is certainly not very encouraging for those who hope for science to be journal name blind. It also suggests that the RIF of glamour journals does not reflect an aversion to clicking on anonymized links, but rather an affinity for… what else to call it but… glamour.

Academics frequently complain that administrators are at fault for driving researchers to emphasize JIFs, but at the recent Gaming Metrics meeting I attended, UC Davis University Librarian MacKenzie Smith pointed out something which my little experiment confirms: “It’s you!”

Supplementary Material and Notes

The journal Nature Communications is not obviously a “glamour journal”; however, I included it in that category because the journal link name began… Removing the Nature Communications tweet from the glamour analysis increases the glamour journal RIF to 2.264.

The ideal platform for my experiment is an A/B testing setup, and as my former coauthor Dmitry Ryaboy, head of the experimentation team at twitter, explains in a blog post, twitter does perform such testing on users for internal purposes. However, I could not perform A/B testing directly from my account, hence the implementation of the design described above.

I tried to tweet the journal/anonymized tweets exactly 5 minutes apart, but once or twice I got distracted reading nonsense on twitter and was delayed by a bit. Perhaps if I’d been more diligent (and been better at dragging out the experiment) I’d have gotten more and better data. I am comforted by the fact that my sample size was >1.

Twitter analytics provided multiple measures, e.g. number of retweets, impressions, total engagements etc., but I settled on link clicks because that data type gave the best results for the argument I wanted to make. The table with the full dataset is available for download from here (or in pdf). The full list of tweets is here.

