Algorithmic bias is a term used to describe situations where an algorithm systematically produces outcomes that are less favorable to individuals within a particular group, despite there being no relevant properties of individuals in that group that should lead to distinct outcomes from other groups. As “big data” approaches become increasingly popular for optimizing complex tasks, numerous examples of algorithmic bias have been identified, and sometimes the implications can be serious. As a result, algorithmic bias has become a matter of concern, and there are ongoing efforts to develop methods for detecting it and mitigating its effects. However, despite increasing recognition of the problems due to algorithmic bias, sometimes bias is embraced by the individuals it benefits. For example, in her book Weapons of Math Destruction, Cathy O’Neil discusses the gaming of algorithmic ranking of universities via exploitation of algorithmic bias in ranking algorithms. While there is almost universal agreement that algorithmic rankings of universities are problematic, many faculty at universities that do achieve a top ranking choose to ignore the problems so that they can boast of their achievement.
Of the algorithms that are embraced in academia, Google Scholar is certainly among the most popular. It’s used several times a day by every researcher I know to find articles via keyword searches, and Google Scholar pages have made it straightforward for researchers to create easily updatable publication lists. These now serve as proxies for formal CV publication lists, with the added benefit (?) that citation metrics such as the h-index are displayed as well (Jacsó, 2012). Provided as an option along with publication lists, the Google Scholar coauthor list of a user can be displayed on the page. Google offers users who have created a Google Scholar page the ability to view suggested coauthors, and authors can then choose to add or delete those suggestions. Authors can also add as coauthors individuals not suggested by Google. The Google Scholar coauthor rankings and the suggestion lists are generated automatically by an algorithm that has not, to my knowledge, been disclosed.
Google Scholar coauthor lists are useful. I occasionally click on the coauthor lists to find related work, or to explore the collaboration network of a field that may be tangentially related to mine but that I’m not very familiar with. At some point I started noticing that the lists were almost entirely male. Frequently, they were entirely male. I decided to perform a simple exercise to understand the severity of what appeared to me to be a glaring problem:
Let the Google Scholar coauthor graph be a directed graph GS = (V,E) whose vertices correspond to authors in Google Scholar, and with an edge from v to w if author w is listed as a coauthor on the main page of author v. We define an author v to be manlocked (terminology thanks to Páll Melsted) if its out-degree is at least 1, and if every vertex w that it is adjacent to (i.e., for which (v,w) is an edge) and that is ranked among the top twenty coauthors by Google Scholar (i.e., w appears on the front page of v) is male.
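The definition can be made concrete with a small sketch. This is only an illustration under stated assumptions: the graph is stored as a dictionary mapping each author to the ordered list of coauthors shown on their page, gender labels are supplied by hand in a second dictionary, and the names `is_manlocked`, `coauthors`, `is_male` and the toy authors "A" through "D" are all hypothetical. Nothing here reflects how Google Scholar stores or ranks coauthors.

```python
from typing import Dict, List

TOP_K = 20  # the definition only looks at the top twenty displayed coauthors


def is_manlocked(author: str,
                 coauthors: Dict[str, List[str]],
                 is_male: Dict[str, bool],
                 top_k: int = TOP_K) -> bool:
    """Return True if `author` displays at least one coauthor and every one
    of the first `top_k` displayed coauthors is labeled male."""
    listed = coauthors.get(author, [])[:top_k]
    if not listed:  # out-degree 0: not manlocked by definition
        return False
    return all(is_male[w] for w in listed)


# Hypothetical toy data, invented purely for illustration.
coauthors = {
    "A": ["B", "C"],  # both displayed coauthors are labeled male
    "B": ["A", "D"],  # D is not labeled male
    "C": ["A"],
    "D": [],          # no displayed coauthors
}
is_male = {"A": True, "B": True, "C": True, "D": False}

manlocked = [v for v in coauthors if is_manlocked(v, coauthors, is_male)]
print(manlocked)
```

In this toy graph only A and C are manlocked: B displays D, and D displays no coauthors at all.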
For example, the Google Scholar page of Steven Salzberg is not manlocked: of the 20 coauthors listed on the Scholar page, only 18 are men. However, several of the vertices it is adjacent to, for example the one corresponding to the Google Scholar page of Ben Langmead, are manlocked. There are so many manlocked vertices that it is not difficult, starting at a manlocked vertex, to embark on a long manlocked walk in the GS graph, hopping from one manlocked vertex to another. For example, starting with the manlocked Dean of the College of Computer, Mathematical and Natural Sciences at the University of Maryland, we find a manlocked walk of length 14 (I leave it as an exercise for the reader to find the longest walk that this walk is contained in):
Amitabh Varshney → Jihad El Sana → Peter Lindstrom → Mark Duchaineau → Alexander Hartmaier → Anxin Ma → Roger Reed → David Dye → Peter D Lee → Oluwadamilola O. Taiwo → Paul Shearing → Donal P. Finegan → Thomas J. Mason → Tobias Neville
A country is doubly landlocked when it is surrounded only by landlocked countries. There are only two such countries in the world: Uzbekistan and Liechtenstein. Motivated by this observation, we define a vertex in the Google Scholar coauthor graph to be doubly manlocked if it is adjacent only to manlocked vertices.
Open problem: determine the number of doubly manlocked individuals in the Google Scholar coauthor graph.
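Given the coauthor graph and gender labels, counting doubly manlocked vertices is mechanical; the hard part of the open problem is obtaining the data. The sketch below is a minimal illustration, not a solution. It assumes the graph is a dictionary from each author to their ordered list of displayed coauthors, that gender labels are supplied by hand, and that a doubly manlocked vertex must display at least one coauthor (the post's definition does not say how to treat out-degree 0, so that choice is mine). All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def manlocked_set(coauthors: Dict[str, List[str]],
                  is_male: Dict[str, bool],
                  top_k: int = 20) -> Set[str]:
    """Authors with at least one displayed coauthor, all of whom
    (among the first `top_k`) are labeled male."""
    result = set()
    for v, listed in coauthors.items():
        shown = listed[:top_k]
        if shown and all(is_male[w] for w in shown):
            result.add(v)
    return result


def doubly_manlocked_set(coauthors: Dict[str, List[str]],
                         is_male: Dict[str, bool],
                         top_k: int = 20) -> Set[str]:
    """Authors whose displayed coauthors are all manlocked."""
    manlocked = manlocked_set(coauthors, is_male, top_k)
    result = set()
    for v, listed in coauthors.items():
        shown = listed[:top_k]
        # Assumption: require out-degree >= 1, mirroring the manlocked definition.
        if shown and all(w in manlocked for w in shown):
            result.add(v)
    return result


# Hypothetical toy data, invented purely for illustration.
coauthors = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": [],
}
is_male = {"A": True, "B": True, "C": True, "D": False}

print(sorted(manlocked_set(coauthors, is_male)))
print(sorted(doubly_manlocked_set(coauthors, is_male)))
```

Here A and B are manlocked, and B is the only doubly manlocked vertex: A displays C, which is not manlocked because C displays D.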
Why are there so many manlocked vertices in the Google Scholar coauthorship graph? Some hypotheses:
- Publications by women are cited less than those of men (Aksnes et al. 2011).
- Men tend to publish more with other men and there are many more men publishing than women (see, e.g. Salerno et al. 2019, Wang et al. 2019).
- Men who are “equally contributing” co-first authors are more “equal” than their women co-first authors (Broderick and Casadevall 2019). Google Scholar’s coauthor recommendations may give preference to the first-listed co-first author.
- I am not privy to Google’s algorithms, but Google Scholar’s coauthor recommendations may also be biased towards coauthors on highly cited papers. Such papers will be older papers. While the gender ratio today is heavily skewed towards men, it was even more so in the past. For example, Steven Salzberg, who is a senior scientist mentioned above and lists 18 men coauthors out of twenty on his Google Scholar page, has graduated 12 successful Ph.D. students in the past, 11 of whom are men. In other words, the extent of manlocked vertices may be the result of algorithmic bias that is inadvertently highlighting the gender homogeneity of the past.
- Many successful and prolific women may not be using Google Scholar (I can think of many in my own field, but was not able to find a study confirming this empirical observation). If this is true, the absence of women on Google Scholar would directly inflate the number of manlocked vertices. Moreover, in surveying many Google Scholar pages, I found that women with Google Scholar pages tend to have more women as coauthors than the men do.
- Even though Google Scholar allows for manually adding coauthors, it seems most users are blindly accepting the recommendations without thinking carefully about what coauthorship representation best reflects their actual professional relationships and impactful work. Thus, individuals may be supporting the algorithmic bias of Google Scholar by depending on its automation. Google may be observing that users tend to click on coauthors that are men at a high rate (since those are the ones being displayed), and that click data would in turn reinforce the choices of the coauthorship algorithm.
The last point above raises an interesting general issue with Google Scholar. While Google Scholar appears to be fully automated, and indeed, in addition to suggesting coauthors automatically the service will also automatically add publications, the Google Scholar page of an individual is completely customizable. In addition to the coauthors being customizable, the papers that appear on a page can be manually added or deleted, and in fact even the authors or titles of individual papers can be changed. In other words, Google Scholar can be easily manipulated, with authors using “algorithmic bias” as a cover (“oops, I’m so sorry, the site just added my paper accidentally”). Are scientists actually doing this? You bet they are (I leave it as an exercise for the reader to find examples).
11 comments
November 18, 2019 at 10:30 am
salzberg1
You made an error: my Google Scholar page (including the one you link to on the web archive) has 2 women in the co-authors list: Mihaela Pertea and Daniela Puiu. (And note that one of the men listed there is you!) Also since you mentioned my Ph.D. students, I would add that I have 5 students at present, of whom 3 are women. I strongly support women in engineering and science, and I wish the balance were more equal than it is today (especially in Computer Science, from which most of my past students came). I suggest that you add to this blog post the numbers of male and female students you’ve graduated as well.
November 18, 2019 at 10:42 am
Lior Pachter
Thanks for finding the typo; I’ve fixed the post. Since you asked about my own students, the full list is here:
https://pachterlab.github.io/group.html
You’ll see that I’ve graduated 27 Ph.D. students. Of them 7 have been women. Of the 18 postdocs I’ve mentored, 8 have been women.
November 18, 2019 at 6:55 pm
salzberg1
the error appeared in 2 places – also in the enumerated list, item 4.
November 18, 2019 at 11:50 pm
Lior Pachter
Thanks- corrected.
November 19, 2019 at 2:32 am
Julien Roux (@_julien_roux)
Thanks for drawing our attention to this. I’m noticing only two women in my co-author list (as visible on the webpage), but six when I click “EDIT” to see the full list of approved co-authors. The metric used to do the ranking is not clear to me, but seems to depend at least on the number of citations of shared papers and the number of joint papers. I haven’t found a way to modify the ranking or display all approved co-authors…
November 19, 2019 at 8:14 am
Lior Pachter
I think that if you add coauthors manually using the EDIT button they appear in the order in which you add them.
November 19, 2019 at 5:25 pm
Matthew MacLeod
Thanks for this. I was able to un-manlock my profile. It turns out one of my female co-authors does have a profile, but we only co-authored under a last name she no longer uses, so Google didn’t flag her as a recommendation. This presumably disproportionately affects women as well.
November 25, 2019 at 6:06 pm
Wim Crusio
I had a look at my own profile and saw that one of the women listed was a co-author on just 1 paper (out of >200). Two other women with whom I have published many more articles are not listed, the difference being that the first has a GScholar profile but the latter two don’t. Given that we men are known to be more vain than our female colleagues, another factor here may be that male authors are more likely to have a profile than female authors…
December 27, 2019 at 12:53 am
STEM Caveman
> “numerous examples of algorithmic bias have been identified”
Is there a list of examples somewhere? There’s a tendency to take any algorithm-related thing some people don’t like, call it bias, and demand funding to research and solve the supposed problem. But instances satisfying the definition at the top of this blogpost are rare and debatable.
e.g., the example cited from Cathy O’Neil’s book is just something she doesn’t like (“wrong” university rankings), not a case of unfairness as defined in this post.
The COMPAS recidivism predictor is another non-example. It has been the subject of several pearl-clutching articles but the statistical analysis put out by the company that produces the software was a lot more convincing than the FUD put out by critics, and the critics did not have an answer other than to change the subject.
It’s hard to find serious examples of algorithmic unfairness because the algorithms do tend to be fair, more so than most non-algorithmic processes, under ordinary definitions. When people dislike the results but lack a principled reason for the dislike, they then stretch or ignore definitions so as to call the result “biased”. People like Cathy O’Neil are trying to make a career of that.
January 6, 2020 at 12:10 am
Kerem Camsari
Another common trick in Google Scholar (GS) is to merge different papers to boost one’s profile.
One legitimate reason for this function is to combine arXiv postings with published versions of an article (sometimes GS’s algorithm is smart enough to remove duplicate posts of the same paper but if there is a title change in the published version merging has to be done manually).
I have seen authors merging “related” but totally different papers to combine a highly cited paper (where they are second or third authors) with a lowly cited paper (where they are first or first co-authors) to make it look like their first author work received more citations than it really did.
Fortunately GS marks merged papers with an asterisk so it’s all there for people to see.
All marked papers should be examined to see exactly what they are merging.
September 10, 2020 at 2:14 pm
Paul Farrar
Google Scholar isn’t completely customizable. If you don’t have an email with an approved ending (say, because you’re retired, or a consultant), they won’t let you have a page.