P-values derived from the continuous chi-square approximation converge only at large N. Here N is rather small, so, as expected, the different tests agree only loosely. To me, these “P-values” from the different tests are all in roughly the same ballpark. For instance, the X^2 value gives P = 0.0143, not very different from your “exact test” binomial P = 0.0156.

But I reiterate that these are just sign tests, with considerable loss of information. What we really have is much richer data in terms of actual numbers of people clicking. So there is more power in the data than suggested by Lior’s t-test (below).

On the other hand, if there are no systematic differences in the numbers of impressions, then only looking at clicks just adds some noise to your clicks/impressions measurement, which I don’t think is as much of a problem.

I think the statistical critics are a bit harsh. Maybe they’re too used to “big data” analyses. Yes, the numbers of papers he tested before being rumbled are low, but my intuition is that he’s got a really strong result.

Let’s exclude Nat Commun (as they insist on calling themselves) from the glamour journals, though I do agree with Lior’s comment that the “nature.com” in the web address probably does the trick here. Nature cleverly trades on its glamour name to boost a rival PLoS One-type journal where they accept (almost) anything. So I think there’s a lot of confusion about Nature Communications among its readership. (Though to be fair, I’ve reviewed papers that ended up being rejected by both Nat Commun and PLoS One.)

Even without Nat Commun, all 6 out of 6 glamour journals get higher numbers of clicks when cited by the journal’s own web link. The log likelihood ratio compared with the equal null (i.e. 3/6 expectation) is a pretty hefty -4.16 (so G = 2 × 4.16 = 8.32), which for 1 d.f. corresponds to a “significance” of P = 0.004. Even using the rather more conservative X^2 test (i.e. the sum of (o-e)^2/e), one still ends up with P = 0.0143.
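For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three calculations for the 6-out-of-6 result. The variable names and the helper `chi2_sf_1df` are my own, for illustration only. The helper relies on the identity that, for 1 d.f., the chi-square upper tail equals erfc(sqrt(x/2)), so nothing beyond the standard library is needed.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

# 6 glamour journals: own-link citation got more clicks in 6, fewer in 0.
observed = [6, 0]
expected = [3.0, 3.0]  # equal null: 3/6 each way

# Exact one-sided sign test: chance of 6 wins out of 6 under a fair coin.
p_binomial = 0.5 ** 6

# Pearson X^2: sum of (o - e)^2 / e over both cells.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_x2 = chi2_sf_1df(x2)

# Log likelihood ratio of the null against the observed split.
# A cell with zero observations contributes nothing to the sum.
log_lr = -sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
g = -2.0 * log_lr  # G statistic, referred to chi-square with 1 d.f.
p_g = chi2_sf_1df(g)

print(f"binomial P = {p_binomial:.4f}")
print(f"X^2 = {x2:.2f}, P = {p_x2:.4f}")
print(f"log LR = {log_lr:.2f}, G = {g:.2f}, P = {p_g:.3f}")
```

The spread between the three P-values is the small-N effect: the X^2 and G statistics are both referred to the same continuous chi-square distribution, which is only an approximation with six observations.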

But this is just a sign-test, essentially. The contrast with the PLoS One results in a 2×2 contingency-table (i.e. 6, 0, 1, 3) would give even smaller likelihood ratios, and more extreme significance. (Just checked).

And many of the individual results for the papers are strong deviants from binomial. Lior’s post is tongue-in-cheek, but his results are very convincing.
