
Algorithms are now used in a wide variety of sectors, from hiring[2] to elections[3] to credit scoring.[4] Although algorithms have the potential to increase efficiency and even reduce discrimination, critics have raised the specter of “algorithmic bias.” Algorithms have a veneer of objectivity, operating on hard numbers and producing quantifiable results, but the way they are created and the way they operate can lead to discriminatory and biased results.[5] In particular, scholars have identified three main mechanisms through which algorithmic bias arises: selection of the input variables, selection of the output variables, and construction of the algorithm itself (the formula that relates the input variables to the output variables).[6]

Central to algorithmic bias is the difficult question of how to quantify the bias an algorithm produces. How should we measure this bias, and how much bias should be enough to allow a claimant to mount a legal challenge?

In Bias In, Bias Out, Sandra Mayson illustrates this fundamental problem using COMPAS, an algorithm that predicts the likelihood that a criminal defendant will become a recidivist.[7] Courts use COMPAS risk assessment predictions in decisions ranging from setting bond amounts to determining sentence lengths.[8] In 2016, ProPublica published a scathing report criticizing COMPAS for producing significant racial disparities.[9] COMPAS, in response, rejected ProPublica’s criticisms.[10] As Mayson explains, however, the disagreement is actually attributable to the fact that COMPAS and ProPublica used different definitions of equality.[11]

Figure 1.[12] The results of an algorithm predicting rearrest.

Table 1. Summary of Figure 1.

Figure 1 shows a hypothetical COMPAS scenario; Table 1 provides a numerical summary of Figure 1. For each individual, the COMPAS algorithm predicts whether that individual will be rearrested. In Figure 1, the algorithm predicts that 4 of the 10 Gray individuals and 2 of the 10 Black individuals will be rearrested. These predictions may or may not prove correct: only 2 Gray individuals and only 1 Black individual were actually rearrested. Table 2 summarizes the different types of errors. Because there are no false negative errors in Figure 1, every error is a false positive: 2 in the Gray group and 1 in the Black group.
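The counts described above can be turned into the four outcome categories with a few lines of arithmetic. The following is a minimal sketch, not part of Mayson's article: the `Confusion` class and the `confusion_from_summary` helper are hypothetical names introduced for illustration, and the only inputs are the numbers stated in the text (group size, predicted rearrests, actual rearrests, and zero false negatives).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Outcome counts for one group (hypothetical helper type)."""
    tp: int  # predicted rearrest, actually rearrested
    fp: int  # predicted rearrest, not rearrested
    tn: int  # predicted no rearrest, not rearrested
    fn: int  # predicted no rearrest, actually rearrested


def confusion_from_summary(total, predicted_pos, actual_pos, false_neg=0):
    """Derive the four cells from summary counts.

    Assumes the number of false negatives is known (zero in Figure 1).
    """
    tp = actual_pos - false_neg      # rearrested people the algorithm caught
    fp = predicted_pos - tp          # predicted rearrests that did not occur
    tn = total - tp - fp - false_neg # everyone else
    return Confusion(tp=tp, fp=fp, tn=tn, fn=false_neg)


# Numbers taken from the description of Figure 1.
gray = confusion_from_summary(total=10, predicted_pos=4, actual_pos=2)
black = confusion_from_summary(total=10, predicted_pos=2, actual_pos=1)
print("Gray:", gray)
print("Black:", black)
```

Under these assumptions, each group's only errors are false positives, which is why the choice of error measure matters so much in the comparison that follows.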

Table 2. Types of Errors

Table 3.[13] Different equality measures for the algorithm results of Figure 1. 


As summarized in Table 3, there are several different definitions of equality. COMPAS used predictive parity, while ProPublica used the false positive rate.[14] Under COMPAS’s definition of equality (predictive parity), the COMPAS algorithm treats the Gray and Black groups equally; under ProPublica’s definition of equality (false positive rate), however, the algorithm treats the Gray group worse than the Black group. That the two measures produce different results necessarily follows from the fact that the two groups have different base rates.[15]
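To make the disagreement concrete, the measures named in the text can be computed directly from the Figure 1 counts. This is an illustrative sketch rather than either party's actual methodology: the outcome counts are those implied by the text (4 and 2 predicted rearrests, 2 and 1 actual rearrests, no false negatives), and the function names are assumptions. The last function encodes a standard algebraic identity linking the false positive rate to the base rate, the positive predictive value, and the false negative rate, which shows why equal predictive value cannot coexist with equal false positive rates when base rates differ and the false negative rates are the same.

```python
from fractions import Fraction

# (tp, fp, tn, fn) for each group, as implied by the description of Figure 1.
GROUPS = {"Gray": (2, 2, 6, 0), "Black": (1, 1, 8, 0)}


def measures(tp, fp, tn, fn):
    """Return exact-fraction equality measures for one group."""
    total = tp + fp + tn + fn
    return {
        "base_rate": Fraction(tp + fn, total),           # actually rearrested
        "statistical_parity": Fraction(tp + fp, total),  # predicted rearrested
        "predictive_parity": Fraction(tp, tp + fp),      # correct among predicted
        "false_positive_rate": Fraction(fp, fp + tn),    # wrongly flagged
        "false_negative_rate": Fraction(fn, tp + fn),    # wrongly cleared
    }


def fpr_from_identity(base_rate, ppv, fnr):
    """FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), an algebraic identity."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


results = {name: measures(*counts) for name, counts in GROUPS.items()}
for name, m in results.items():
    print(name, {k: f"{float(v):.1%}" for k, v in m.items()})
    # The identity reproduces the directly computed false positive rate.
    assert fpr_from_identity(
        m["base_rate"], m["predictive_parity"], m["false_negative_rate"]
    ) == m["false_positive_rate"]
```

With these counts, both groups have a predictive value of 1/2, while the false positive rates are 1/4 for the Gray group and 1/9 for the Black group, mirroring the split between the two definitions of equality.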

Beyond the three measures summarized in Table 3, there are many other statistical measures of equality.[16] Interestingly, the Supreme Court has tackled this issue: in Wards Cove Packing Co. v. Atonio, the Supreme Court rejected a disparate impact claim because the wrong statistical measure of disparity had been used to establish a prima facie case of disparate impact.[17] Further, the Equal Employment Opportunity Commission’s 4/5ths rule, which establishes the statistical disparity sufficient for a prima facie disparate impact case, seems to use statistical parity as its measure of equality.[18] Still, there is as yet no consensus on which statistical measure to use to quantify equality. Given the proliferation of algorithms, a working definition of equality is critically important. This is an active area of research for statisticians, computer scientists, policy experts, and lawyers.
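The 4/5ths rule itself is simple to express as a comparison of selection rates. The sketch below follows the wording of the guideline quoted in note 18 (a group's selection rate is compared with that of the group with the highest rate), but everything else is an assumption made for illustration: the function names, the group labels, and the applicant and selection counts are all hypothetical and do not come from any real case.

```python
from fractions import Fraction

FOUR_FIFTHS = Fraction(4, 5)


def impact_ratios(selected, applicants):
    """Each group's selection rate divided by the highest group's rate.

    Assumes at least one group has a nonzero selection rate.
    """
    rates = {g: Fraction(selected[g], applicants[g]) for g in applicants}
    highest = max(rates.values())
    return {g: rate / highest for g, rate in rates.items()}


def flagged_groups(selected, applicants):
    """Groups whose rate is less than four-fifths of the highest rate."""
    ratios = impact_ratios(selected, applicants)
    return sorted(g for g, ratio in ratios.items() if ratio < FOUR_FIFTHS)


# Hypothetical hiring data, invented for this example only.
applicants = {"group_a": 100, "group_b": 100}
selected = {"group_a": 50, "group_b": 30}

print(impact_ratios(selected, applicants))
print(flagged_groups(selected, applicants))
```

Because the rule looks only at how often each group is selected, and not at whether the selections turned out to be correct, it tracks statistical parity rather than predictive parity or the false positive rate.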



[1] Geoffrey Xiao,

[2] Claire Cain Miller, Can an Algorithm Hire Better than a Human?, N.Y. Times (June 25,


[3] Stefania Milan & Claudio Agosti, Personalisation Algorithms and Elections: Breaking Free of the Filter Bubble, Internet Policy Rev. (Feb. 7, 2019),

[4] Mikella Hurley & Julius Adebayo, Credit Scoring in the Era of Big Data, 18 Yale J.L. & Tech. 148 (2016).

[5] Solon Barocas & Andrew D. Selbst, Big Data’s Disparate Impact, 104 Cal. L. Rev. 671, 677-693 (2016); Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, & Cass R. Sunstein, Discrimination in the Age of Algorithms, 10 J. Legal Analysis 113, 138-141 (2019).

[6] Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, & Cass R. Sunstein, Discrimination in the Age of Algorithms, 10 J. Legal Analysis 113, 138-141 (2019).

[7] Sandra G. Mayson, Bias In, Bias Out, 128 Yale L. J. 2218, 2233-2252 (2019).

[8] Id.

[9] Id.

[10] Response to ProPublica: Demonstrating accuracy equity and predictive parity, equivant (Dec. 1, 2018),

[11] Sandra G. Mayson, Bias In, Bias Out, 128 Yale L. J. 2218, 2233-2252 (2019).

[12] Id. at 2235 fig. 2 (adapted from this source).

[13] Id. at 2236 tbl. 1 (adapted from this source).

[14] Id. at 2233-2235.

[15] Sandra G. Mayson, Bias In, Bias Out, 128 Yale L. J. 2218, 2233-2252 (2019).

[16] Sahil Verma & Julia Rubin, Fairness Definitions Explained, 2018 ACM/IEEE International Workshop on Software Fairness (May 29, 2018); Aaron Roth, (un)Fairness in Machine Learning; Sandra G. Mayson, Bias In, Bias Out, 128 Yale L. J. 2218, 2233-2252 (2019).

[17] Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 655 (1989) (The Court held that comparing the percentage of nonwhite workers in cannery and noncannery positions was not the appropriate measure of disparity. Rather, the relevant measure was a comparison “between the racial composition of [the noncannery positions] and the racial composition of the qualified . . . population in the relevant labor market.”).

[18] Sandra G. Mayson, Bias In, Bias Out, 128 Yale L. J. 2218, 2242 (2019). “A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact.” EEOC Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607.4(D) (2018).