The Journal Impact Factor (JIF) was first proposed by Eugene Garfield of Institute for Scientific Information (ISI) fame in 1955. It is a journal-specific yearly citation measure, defined to be the average number of citations received in a given year by the papers the journal published in the preceding two years. Obsession with the impact factor in the face of widespread recognition of its shortcomings as a tool for judging the value of science is an unfortunate example of “the tragedy of the commons”.
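For concreteness, here is a minimal sketch of that computation (the numbers are made up for illustration; this is just the definition, not ISI's actual pipeline):

```python
# Minimal sketch of the two-year JIF, directly from the definition above.
# The citation and paper counts are hypothetical.

def journal_impact_factor(citations_this_year_to_prev_two_years, papers_in_prev_two_years):
    """JIF for year Y: citations received in Y by papers the journal
    published in Y-1 and Y-2, divided by the number of such papers."""
    return citations_this_year_to_prev_two_years / papers_in_prev_two_years

# e.g. 20,000 citations in 2015 to 600 papers published in 2013-2014
print(journal_impact_factor(20000, 600))  # 33.333...
```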
Leaving aside for a moment the flaws of the JIF, one may wonder whether journals do in fact have any impact. By “impact” one might imagine something along the lines of the simple definition in the Merriam-Webster Dictionary: “to have a strong and often bad effect on (something or someone)”, and as an object for the impact one could study the researchers who publish, the scientific community as a whole, or the papers themselves. On the question of impact on papers, common sense suggests that publishing in a high profile journal helps a paper succeed, and there is pseudoscience to support that case. However, there is little in the way of direct measurement. Twitter to the rescue.
At the end of last year my twitter account was approaching 5,000 followers. Inspired by others, I found myself reflecting on this “milestone”, and in anticipation of the event I started to ponder the scientific utility of amassing such a large number of followers. There is, of course, a lot of work being done on natural language processing of twitter feeds, but it struck me that with 5,000 followers I was in a position to use twitter for proactive experimentation rather than just passive mining. Impact factors, followers, and twitter… it was just the right mix for a little experiment…
In my early tweeting days I encountered a minor technical issue with links to papers: it was unclear to me whether I should use link shorteners (and if so, which service?) or include direct links to articles in my tweets. I initially thought that using link shorteners would save me characters, but I quickly discovered that this was not the case. Eventually, following advice from fellow twitterati, I began tweeting articles only with direct links to the journal websites. Last year, when twitter launched free analytics for all registered users, I started occasionally examining the stats for article tweets, and I began to notice quantitatively what I had always suspected intuitively: tweets of Cell, Nature and Science (CNS) articles were being circulated much more widely than those of other journals. Having used bit.ly before, the natural question to ask was how tweets of journal articles displaying the journal names compare to tweets with anonymized links.
Starting in August of 2015, I began occasionally tweeting articles about 5 minutes apart, using the exact same text (the article title or a brief description) but doing it once with the article linked via the journal website, so that the journal name was displayed in the link, and once with a bit.ly link that revealed nothing about the journal source. Twitter analytics allowed me to see, for each tweet, a number of (highly correlated) tweet statistics, and I settled on measuring the number of clicks on the link embedded in the tweet. By switching the order of named/anonymized tweets I figured I could control for a temporal effect in tweet appearance, e.g. it seemed likely that users would click on the most recent links in their feed, resulting in more views/clicks etc. for the later of two tweets identical except for link type. Ideally this control would have been performed by A/B testing, but that was not a possibility (see Supplementary Materials and Notes). I did my tweeting manually, generally waiting a few weeks between batches of tweets so that nobody would catch on to what I was doing (and thereby ruin the experiment). I was eventually caught, forcing me to end the experiment, but not before I squeezed in enough tests to achieve a significant p-value for something.
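Schematically, each article produced one paired observation; a minimal sketch of what one such record might look like (the field names here are illustrative, and the actual data table is linked in the supplement):

```python
# Schematic of the paired design: identical tweet text, two link types,
# order alternated across pairs to control for recency effects.
# Fields are illustrative, not the actual dataset.

from dataclasses import dataclass

@dataclass
class TweetPair:
    article: str          # title or short description, identical in both tweets
    journal_first: bool   # which link type was tweeted first (alternated)
    clicks_journal: int   # link clicks on the journal-URL tweet
    clicks_bitly: int     # link clicks on the anonymized bit.ly tweet

    @property
    def ratio(self) -> float:
        """Per-article ratio of journal-link clicks to bit.ly-link clicks."""
        return self.clicks_journal / self.clicks_bitly
```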
I hypothesized that twitter users would click on articles when, and only when, the titles or topics reflected research of interest to them. Thus, I expected to find no difference in analytics between tweets made with journal names and tweets made with bit.ly links. Strikingly, tweets of Cell, Nature and Science (CNS) articles all resulted in more clicks on the journal-named link than on the anonymized link (p-value 0.0078). The average effect was a ratio of 2.166 between clicks on links displaying the journal name and clicks on the corresponding bit.ly links. I would say that this number is the real journal impact factor of what are now called the “glamour journals” (I’ve reported it to three decimal digits to be consistent with the practice of most journals in advertising their JIFs). To avoid confusion with the standard JIF, I call my measured impact factor the RIF (relative impact factor).
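For those wondering how such a small experiment yields p = 0.0078: the number is consistent with a one-sided sign test in which all seven glamour pairs favored the named link (0.5^7 ≈ 0.0078). A minimal sketch of the RIF and that test, with made-up click counts standing in for the real data (which is linked in the supplement):

```python
# Sketch of the RIF and the sign-test p-value. Click counts are made-up
# placeholders; the one-sided sign test is a reconstruction of how a
# p-value of 0.0078 (= 0.5**7) arises from seven pairs all favoring the
# journal-named link.

from scipy import stats

# (journal-link clicks, bit.ly-link clicks) for each of the 7 glamour pairs
pairs = [(60, 25), (45, 22), (80, 40), (30, 15), (50, 20), (70, 35), (40, 18)]

rif = sum(j / b for j, b in pairs) / len(pairs)      # mean per-pair ratio
wins = sum(j > b for j, b in pairs)                  # pairs where the name "won"
p = stats.binomtest(wins, n=len(pairs), p=0.5, alternative='greater').pvalue

print(f"RIF = {rif:.3f}, sign-test p = {p:.4f}")     # p = 0.0078 when wins == 7
```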
One possible objection to the results reported above is that perhaps the RIF reflects an aversion to clicking on bit.ly links, rather than a preference for clicking on (glamour) journal links. I decided to test that by performing the same test (journal link vs. bit.ly link) with PLoS One articles:
Strikingly, in three out of the four cases tested users displayed an aversion to clicking on PLoS One links. Does this mean that publishing in PLoS One is career suicide? Certainly not (I note that I have published PLoS One papers that I am very proud of, e.g. Disordered Microbial Communities in Asthmatic Airways), but the PLoS One RIF of 0.877 that I measured (average ratio of journal:bit.ly clicks, as explained above) is certainly not very encouraging for those who hope for science to be journal-name blind. It also suggests that the RIF of the glamour journals does not reflect an aversion to clicking on bit.ly links, but rather an affinity for… what else to call it but… glamour.
Academics frequently complain that administrators are at fault for driving researchers to emphasize JIFs, but at the recent Gaming Metrics meeting I attended, UC Davis University Librarian MacKenzie Smith pointed out something which my little experiment confirms: “It’s you!”
Supplementary Material and Notes
The journal Nature Communications is not obviously a “glamour journal”; however, I included it in that category because the journal link began with nature.com/… Removing the Nature Communications tweet from the glamour analysis increases the glamour journal RIF to 2.264.
The ideal platform for my experiment is an A/B testing setup, and as my former coauthor Dmitry Ryaboy, head of the experimentation team at twitter, explains in a blog post, twitter does perform such testing on users for internal purposes. However, I could not perform A/B testing directly from my account, hence the implementation of the design described above.
I tried to tweet the journal/bit.ly tweets exactly 5 minutes apart, but once or twice I got distracted reading nonsense on twitter and was delayed by a bit. Perhaps if I’d been more diligent (and been better at dragging out the experiment) I’d have gotten more and better data. I am comforted by the fact that my sample size was >1.
Twitter analytics provided multiple measures, e.g. number of retweets, impressions, total engagements etc., but I settled on link clicks because that data type gave the best results for the argument I wanted to make. The table with the full dataset is available for download from here (or in pdf). The full list of tweets is here.
22 comments
February 23, 2016 at 4:18 am
Odd one
Fantastic study! However here is a data set of n=1 for you:
– I rarely click on cell.com, nature.com, etc. because I don’t like encouraging the glamor;
– I also rarely click on link-shortener links because they track my web usage;
– but I am very happy to click on links to plos.org, peerj.org, etc.
I know that makes me an odd one out, but then so do many other things.
February 23, 2016 at 12:50 pm
NLP
This to me is absurdity in its own right. You shouldn’t avoid a paper based on the journal, just as you shouldn’t like a paper based purely on where it’s published.
February 23, 2016 at 4:56 am
Andre
Can you rule out the possible influence of tweet promotion by Twitter based on tweet content? If Twitter ranks tweets with glamour journal urls higher in one’s newsfeed, you can expect them to be clicked more often.
February 23, 2016 at 6:44 am
brembs
Very cool idea!
I’m usually a sucker for confirmation bias and this result is straight down my alley. However, this specific result doesn’t seem particularly rock-solid, not even for a confirmation bias sucker like me, no matter to how many digits you present it. It looks like an N=4-7 to me (alright, more than N=1) with two groups, Glam vs. non-Glam. Given the variability in article content, I’d hazard a tentative guess that a different set of 11 articles may have yielded the opposite result.
Strikingly, I sense competence and a slight hint of self-deprecating humor/irony in your post that makes me believe you are already quite aware of all that 🙂
In which case it was very entertaining, thanks so much!
February 23, 2016 at 7:11 am
Hilda Bastian (@hildabast)
What an interesting idea! But I wish you’d mentioned the sample size in the post (4 PLOS One tweets and 7 for the glams). Given that the PLOS One tweets were also sent at a time of year when traffic might have slowed down, that could matter a lot. Were your engagements still at the same level overall?
Another confounder is if the particular articles in the glam journals were more interesting – a big risk with such a small number, and that the glam journals do get more of those. Did you think they were all equally interesting?
February 23, 2016 at 7:37 am
jishnu
Good to know that my prediction was correct.
As always, this is very interesting!
February 23, 2016 at 3:34 pm
Craig Kaplan
Hee- I wonder who else noticed
February 23, 2016 at 8:21 am
Daniel Weissman
Nice idea! I’m not sure I would have used this design with tweeting each paper twice. You might have had a better shot at evading detection if you had just tweeted each paper once, randomly assigning it to have either a direct or shortened url, and either kept time/day constant or assigned it randomly too.
Looking at the data, I’m a bit surprised not to see any obvious priority effect.
February 23, 2016 at 12:14 pm
Lenny Teytelman
I didn’t want to spoil this for people on Twitter, but if someone is reading the comments, I assume they went through the whole post, including the “supplementary.”
I have been arguing for years that relegating data and methods to supplementary information is a travesty. But I haven’t had anything concrete to point to until this absolutely brilliant post from Lior.
February 23, 2016 at 12:23 pm
Daniel Weissman
Ha, I can’t believe I missed that! Wow…
But looking at the data table, it doesn’t seem so bad.
February 23, 2016 at 3:50 pm
Lenny Teytelman
It’s not bad. It’s genius. It’s N=1 comparisons. 🙂 So “Warning against confirmation bias is missed because of confirmation bias” http://www.thespectroscope.com/read/warning-against-confirmation-bias-is-missed-because-of-confirmation-bias-by-lenny-teytelman-353
February 23, 2016 at 3:34 pm
jishnu
I have enjoyed Lior’s blogs for a variety of reasons, one of which is the fact that the post always has several layers (e.g. bonus question in the P value prize post). When I read this post earlier today, 2 things came to mind (what I found “interesting”):
1. Did Lior really stop the experiment because of one tweet from me, or was it part of the elaborate smokescreen/double experiment? I hadn’t checked the raw data till your post, but I had seen the “that data type gave the best results for the argument I wanted to make” line in the supplementary. For me, that gave it away – while the relationship between JIF and attention received may be true, this experiment doesn’t really prove it.
2. I am not sure if there is wordplay on “RIF”. I had googled it and the second result that came up is “Reading is Fundamental” (name of an organization). I have been wondering if that is Lior’s hidden way of telling us that reading the whole post is indeed fundamental to truly understanding what he is trying to say!
February 23, 2016 at 2:49 pm
Z (@nevercube)
I will risk making a fool of myself. I saw this http://www.thespectroscope.com/read/warning-against-confirmation-bias-is-missed-because-of-confirmation-bias-by-lenny-teytelman-353 and now I feel I should ask: wouldn’t the ratio “link clicks over impressions” be a better measure?
February 23, 2016 at 4:05 pm
Saket
Nice post! This made me wonder if the actual *impact* of a paper should be scaled by the JIF, call it SIF: $$\text{SIF} = \frac{\text{no. of citations}}{\text{JIF}}.$$ For example, 57/32.242 < 10/5.578.
February 23, 2016 at 7:30 pm
eratosignis
Hey guys,
I think the statistical critics are a bit harsh. Maybe they’re too used to “big data” analyses. Yes, the numbers of papers he tested before being rumbled are low, but my intuition is that he’s got a really strong result.
Let’s exclude the Nat Commun (as they insist on calling themselves) from glamour journals, though I do agree with Lior’s comment that the “nature.com” in the web address probably does the trick here. Nature cleverly trades on its glamour name to boost a rival PLoS One-type journal where they accept (almost) anything. So I think there’s a lot of confusion about Nature Communications among its readership. (Though to be fair, I’ve reviewed papers that end up being rejected by both Nat Commun and PLoS One).
Even without Nat Commun, all 6 out of 6 glamour journals get higher numbers of clicks when cited by the journal’s own web link. The log likelihood ratio compared with the equal null (i.e. a 3/6 expectation) is a pretty hefty −4.16, which for 1 d.f. corresponds to a “significance” of P = 0.004. Even using the rather more conservative X² test (i.e. (o−e)²/e), one still ends up with P = 0.0143.
But this is just a sign-test, essentially. The contrast with the PLoS One results in a 2×2 contingency-table (i.e. 6, 0, 1, 3) would give even smaller likelihood ratios, and more extreme significance. (Just checked).
And many of the individual results for the papers are strong deviants from binomial. Lior’s post is tongue-in-cheek, but his results are very convincing.
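For anyone who wants to check these numbers, a minimal sketch (my own, using scipy; the input is just the 6-out-of-6 sign pattern described above):

```python
# Sketch reproducing the statistics quoted above: 6/6 glamour "wins"
# against a fair-coin null, via the log likelihood (G statistic),
# Pearson X^2, and the exact binomial probability.

import math
from scipy import stats

wins, n = 6, 6
expected = n / 2

log_lik = wins * math.log(0.5)            # ln(1/64) ≈ -4.16
G = -2 * log_lik                          # likelihood-ratio statistic ≈ 8.32
p_G = stats.chi2.sf(G, df=1)              # ≈ 0.004

X2 = (wins - expected) ** 2 / expected + (0 - expected) ** 2 / expected
p_X2 = stats.chi2.sf(X2, df=1)            # X2 = 6, P ≈ 0.0143

p_exact = 0.5 ** n                        # exact one-sided sign test: 1/64 ≈ 0.0156
print(p_G, p_X2, p_exact)
```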
February 23, 2016 at 9:52 pm
gasstationwithoutpumps
The statistics here must be wrong, since the chance of 6 coin flips coming up all heads is 1/64, P=0.0156, which is bigger than the P-values from either of the approximate tests eratosignis suggests.
February 24, 2016 at 7:56 pm
eratosignis
Correct. ln(1/64) = −4.16, as I said, and with 1 d.f. this seems to me unlikely. My calculation was merely of the log likelihood, i.e. the log of the probability of the data given the hypothesis of p = 0.5. This is not an approximation. 1/64 seems unlikely to me, doesn’t it to you?
Derived P-values assuming a chi-square continuous approximation converge with large N, but here we have rather small N, so the convergence is poor with different tests, as expected. So to me these “P-values” from different tests are all in roughly the same ball-park. For instance, the X^2 value gives P=0.0143, not very different from your “exact test” binomial P = 0.0156.
But I reiterate that these are just sign tests, with considerable loss of information. What we really have is much richer data in terms of actual numbers of people clicking. So there is more power in the data than suggested by Lior’s t-test (below).
February 24, 2016 at 5:50 am
Joshua Plotkin
I don’t really know what “impressions” means on Twitter. But if impressions are really opportunities for users to click (i.e. the number of times the tweet was seen at all by anyone), then the RIF should indeed be based on clicks/impressions, in which case the average Glam RIF seems to be about 1.5, and the average PLOS RIF about 0.9. And clearly these averages are not significantly different based on 11 data points (or a t-test).
February 24, 2016 at 6:29 am
Lior Pachter
For clicks/impressions the group averages are, as you say, 1.5478 and 0.9107 and the one-sided t-test gives a p-value of 0.06.
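For anyone who wants to redo this, it is an ordinary two-sample, one-sided t-test on the per-article clicks/impressions ratios. A minimal sketch with placeholder ratios (the real per-tweet values are in the data table linked in the supplement, so this will not reproduce the exact p-value):

```python
# Sketch of the one-sided, two-sample t-test on clicks/impressions RIFs.
# The per-article ratios are placeholders chosen only to sit near the
# reported group means (1.5478 glamour, 0.9107 PLoS One); they are not
# the actual data and will not reproduce the reported p-value of 0.06.

from scipy import stats

glam = [2.8, 0.7, 2.3, 0.9, 1.8, 1.1, 1.2]   # 7 glamour-journal articles
plos = [0.50, 1.30, 0.70, 1.15]              # 4 PLoS One articles

t, p = stats.ttest_ind(glam, plos, alternative='greater')
print(f"t = {t:.2f}, one-sided p = {p:.3f}")
```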
February 24, 2016 at 7:25 am
Daniel Weissman
I think this is too conservative. If there are systematic differences in the number of impressions for direct vs indirect links for a journal, then that’s an important signal to capture. (And there are plausible mechanisms: for instance, direct links could be more likely to be retweeted than indirect links, or more likely to be retweeted by people with many followers.)
On the other hand, if there are no systematic differences in the numbers of impressions, then only looking at clicks just adds some noise to your clicks/impressions measurement, which I don’t think is as much of a problem.
February 24, 2016 at 3:08 pm
Newbie
Thanks for the forest (which I missed until the giveaway in supp) and the trees which made a great read getting there. Did you go into your experiment and data analysis to test the glamhumping hypothesis?
February 25, 2016 at 4:43 pm
David
Cool analysis, but for what it’s worth, I think an important point is that you’re actually estimating the glam factor _conditional_ on a referral by a leading figure in the field. One could speculate that this attenuates the effect of the actual journal name (If Lior Pachter himself thinks this paper is worthy, I should probably read it regardless of where it was published).