Algorithmic bias is a term used to describe situations where an algorithm systematically produces outcomes that are less favorable to individuals within a particular group, despite there being no relevant properties of individuals in that group that should lead to distinct outcomes from other groups. As “big data” approaches become increasingly popular for optimizing complex tasks, numerous examples of algorithmic bias have been identified, and sometimes the implications can be serious. As a result, algorithmic bias has become a matter of concern, and there are ongoing efforts to develop methods for detecting it and mitigating its effects. However, despite increasing recognition of the problems due to algorithmic bias, sometimes bias is embraced by the individuals it benefits. For example, in her book Weapons of Math Destruction, Cathy O’Neil discusses the gaming of algorithmic ranking of universities via exploitation of algorithmic bias in ranking algorithms. While there is almost universal agreement that algorithmic rankings of universities are problematic, many faculty at universities that do achieve a top ranking choose to ignore the problems so that they can boast of their achievement.

Of the algorithms that are embraced in academia, Google Scholar is certainly among the most popular. It’s used several times a day by every researcher I know to find articles via keyword searches, and Google Scholar pages have made it straightforward for researchers to create easily updatable publication lists. These now serve as proxies for formal CV publication lists, with the added benefit (?) that citation metrics such as the h-index are displayed as well (Jacsó, 2012). Provided as an option along with publication lists, the Google Scholar coauthor list of a user can be displayed on the page. Google offers users who have created a Google Scholar page the ability to view suggested coauthors, and authors can then choose to add or delete those suggestions. Authors can also add as coauthors individuals not suggested by Google. The Google Scholar coauthor rankings and the suggestion lists are generated automatically by an algorithm that has not, to my knowledge, been disclosed.

Google Scholar coauthor lists are useful. I occasionally click on the coauthor lists to find related work, or to explore the collaboration network of a field that may be tangentially related to mine but that I’m not very familiar with. At some point I started noticing that the lists were almost entirely male. Frequently, they were entirely male. I decided to perform a simple exercise to understand the severity of what appeared to me to be a glaring problem:

Let the Google Scholar coauthor graph be a directed graph GS = (V,E) whose vertices correspond to authors in Google Scholar, and with an edge (v_1,v_2) \in E from v_1 \in V to v_2 \in V if author v_2 is listed as a coauthor on the main page of author v_1. We define an author v to be manlocked (terminology thanks to Páll Melsted) if its out-degree is at least 1, and if every vertex w that it is adjacent to (i.e., for which (v,w) is an edge) and that is ranked among the top twenty coauthors by Google Scholar (i.e., w appears on the front page of v) is male.
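The definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy graph, the `front_page` and `is_male` dictionaries, and the function name `is_manlocked` are all hypothetical, and the gender labels are assumed to be given (Google Scholar does not provide them, and the coauthor lists would have to be collected by hand or by other means).

```python
from typing import Dict, List

TOP_K = 20  # number of coauthors shown on the front page, per the definition

def is_manlocked(v: str,
                 front_page: Dict[str, List[str]],
                 is_male: Dict[str, bool],
                 top_k: int = TOP_K) -> bool:
    """Return True if v has out-degree >= 1 and every one of its
    top_k listed coauthors is male."""
    listed = front_page.get(v, [])[:top_k]
    return len(listed) >= 1 and all(is_male[w] for w in listed)

# Hypothetical toy data: an edge (v, w) exists if w is listed on v's page.
front_page = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],        # out-degree 0, so not manlocked by definition
    "d": ["a"],
}
is_male = {"a": True, "b": True, "c": True, "d": False}

manlocked = [v for v in front_page if is_manlocked(v, front_page, is_male)]
print(manlocked)
```

Note that in this toy example the vertex "d" is manlocked even though "d" is not male: the definition constrains only the listed coauthors, not the author.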

For example, the Google Scholar page of Steven Salzberg is not manlocked: of the 20 coauthors listed on the Scholar page, only 18 are men. However, several of the vertices it is adjacent to, for example the one corresponding to the Google Scholar page of Ben Langmead, are manlocked. There are so many manlocked vertices that it is not difficult, starting at a manlocked vertex, to embark on a long manlocked walk in the GS graph, hopping from one manlocked vertex to another. For example, starting with the manlocked Dean of the College of Computer, Mathematical and Natural Sciences at the University of Maryland, we find a manlocked walk of length 14 (I leave it as an exercise for the reader to find the longest walk that this walk is contained in):

Amitabh Varshney → Jihad El Sana → Peter Lindstrom → Mark Duchaineau → Alexander Hartmaier → Anxin Ma → Roger Reed → David Dye → Peter D Lee → Oluwadamilola O. Taiwo → Paul Shearing → Donal P. Finegan → Thomas J. Mason → Tobias Neville

A country is doubly landlocked when it is surrounded only by landlocked countries. There are only two such countries in the world: Uzbekistan and Liechtenstein. Motivated by this observation, we define a vertex in the Google Scholar coauthor graph to be doubly manlocked if it is adjacent only to manlocked vertices.

Open problem: determine the number of doubly manlocked individuals in the Google Scholar coauthor graph.
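On a graph small enough to hold in memory, the count is a direct application of the two definitions. The sketch below is a minimal illustration, not a solution to the open problem: the toy graph and the names `front_page`, `is_male`, and `count_doubly_manlocked` are hypothetical, and it assumes (by analogy with the manlocked definition) that a doubly manlocked vertex must have out-degree at least 1, which the definition above does not state explicitly.

```python
from typing import Dict, List

def is_manlocked(v: str, front_page: Dict[str, List[str]],
                 is_male: Dict[str, bool], top_k: int = 20) -> bool:
    """Out-degree >= 1 and all top_k listed coauthors are male."""
    listed = front_page.get(v, [])[:top_k]
    return len(listed) >= 1 and all(is_male[w] for w in listed)

def is_doubly_manlocked(v: str, front_page: Dict[str, List[str]],
                        is_male: Dict[str, bool], top_k: int = 20) -> bool:
    """Adjacent only to manlocked vertices.
    Assumption: out-degree >= 1 is required, to avoid vacuous truth."""
    listed = front_page.get(v, [])[:top_k]
    return len(listed) >= 1 and all(
        is_manlocked(w, front_page, is_male, top_k) for w in listed
    )

def count_doubly_manlocked(front_page: Dict[str, List[str]],
                           is_male: Dict[str, bool]) -> int:
    """Count doubly manlocked vertices among authors with a page."""
    return sum(is_doubly_manlocked(v, front_page, is_male) for v in front_page)

# Hypothetical toy data: an edge (v, w) exists if w is listed on v's page.
front_page = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["b"],
    "d": ["a", "e"],  # "e" is not manlocked, so "d" is not doubly manlocked
    "e": [],
}
is_male = {"a": True, "b": True, "c": True, "d": False, "e": False}

print(count_doubly_manlocked(front_page, is_male))  # prints 3
```

The real obstacle is not the computation but the data: the full coauthor graph and reliable gender labels are not, to my knowledge, publicly available.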

[Figure: map of landlocked countries]

Why are there so many manlocked vertices in the Google Scholar coauthorship graph? Some hypotheses:

  1. Publications by women are cited less than those of men (Aksnes et al. 2011).
  2. Men tend to publish more with other men and there are many more men publishing than women (see, e.g. Salerno et al. 2019, Wang et al. 2019).
  3. Men who are “equally contributing” co-first authors are more “equal” than their women co-first authors (Broderick and Casadevall 2019). Google Scholar’s coauthor recommendations may give preference to first co-first authors.
  4. I am not privy to Google’s algorithms, but Google Scholar’s coauthor recommendations may also be biased towards coauthors on highly cited papers. Such papers will be older papers. While the gender ratio today is heavily skewed towards men, it was even more so in the past. For example, Steven Salzberg, who is a senior scientist mentioned above and lists 18 men coauthors out of twenty on his Google Scholar page, has graduated 12 successful Ph.D. students in the past, 11 of whom are men. In other words, the extent of manlocked vertices may be the result of algorithmic bias that is inadvertently highlighting the gender homogeneity of the past.
  5. Many successful and prolific women may not be using Google Scholar (I can think of many in my own field, but was not able to find a study confirming this empirical observation). If this is true, the absence of women on Google Scholar would directly inflate the number of manlocked vertices. Moreover, in surveying many Google Scholar pages, I found that women with Google Scholar pages tend to have more women as coauthors than the men do.
  6. Even though Google Scholar allows for manually adding coauthors, it seems most users are blindly accepting the recommendations without thinking carefully about what coauthorship representation best reflects their actual professional relationships and impactful work. Thus, individuals may be supporting the algorithmic bias of Google Scholar by depending on its automation. Google may be observing that users tend to click on coauthors that are men at a high rate (since those are the ones being displayed), thus reinforcing the choices of the coauthorship algorithm with its own data.

The last point above (#6) raises an interesting general issue with Google Scholar. While Google Scholar appears to be fully automated, and indeed, in addition to suggesting coauthors automatically the service will also automatically add publications, the Google Scholar page of an individual is completely customizable. In addition to the coauthors being customizable, the papers that appear on a page can be manually added or deleted, and in fact even the authors or titles of individual papers can be changed. In other words, Google Scholar can be easily manipulated, with authors using “algorithmic bias” as a cover (“oops, I’m so sorry, the site just added my paper accidentally”). Are scientists actually doing this? You bet they are (I leave it as an exercise for the reader to find examples).