In July 2018, the ACLU built a database of 25,000 publicly available arrest photos and used Amazon’s commercial facial recognition service, Rekognition, to compare every member of Congress against it. The test used the default confidence threshold (80%) — the same threshold Amazon’s documentation recommended for law enforcement use at the time.
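For context, a minimal sketch of what such a query might look like against Rekognition's API via boto3. The ACLU has not published its exact code; the collection ID below is hypothetical, standing in for the 25,000-image database:

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def find_matches(portrait_bytes: bytes, threshold: float = 80):
    """Search a pre-indexed face collection for candidate matches.

    FaceMatchThreshold=80 mirrors the default the ACLU test used;
    "arrest-photos" is a hypothetical collection ID.
    """
    response = rekognition.search_faces_by_image(
        CollectionId="arrest-photos",
        Image={"Bytes": portrait_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=5,
    )
    return response["FaceMatches"]  # each match carries a "Similarity" score
```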

Rekognition flagged 28 sitting members of Congress as matching criminals in the arrest database. All 28 were false positives.

The Bias Problem

The errors were not evenly distributed. Twenty-eight of 535 members is a roughly 5% false positive rate — but 11 of the 28 misidentified were people of color: about 39% of the false matches, even though people of color made up only around 20% of Congress at the time.
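The arithmetic behind those figures, spelled out:

```python
members = 535
false_matches = 28
false_positive_rate = false_matches / members       # ≈ 0.052, roughly 5%

poc_false_matches = 11
poc_share_of_errors = poc_false_matches / false_matches   # ≈ 0.39
poc_share_of_congress = 0.20                        # approximate 2018 baseline

print(f"{false_positive_rate:.1%} of Congress falsely matched")    # 5.2%
print(f"{poc_share_of_errors:.0%} of errors hit people of color")  # 39%
# ≈ 2x overrepresentation relative to the ~20% baseline
```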

The pattern mirrored findings from independent academic research: facial recognition systems trained primarily on lighter-skinned faces consistently perform worse on darker-skinned faces, particularly women of color. The 2018 Gender Shades study by Joy Buolamwini and Timnit Gebru found commercial face analysis systems misclassified darker-skinned women at error rates as high as 34.7%, versus less than 1% for lighter-skinned men.

Amazon’s Defense

Amazon pushed back on the ACLU’s methodology, arguing that law enforcement deployments should use a 99% confidence threshold, not the default 80%. The ACLU countered that Amazon’s own documentation did not make this distinction clear for policing applications, and that real-world deployments would not necessarily use higher thresholds.
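To see what the threshold dispute amounts to in practice, here is a toy filter over Rekognition-style match records; the similarity scores are invented for illustration:

```python
def matches_at(face_matches, threshold):
    """Keep only candidates at or above `threshold` percent similarity."""
    return [m for m in face_matches if m["Similarity"] >= threshold]

# Invented similarity scores for three candidate matches:
candidates = [{"Similarity": 81.4}, {"Similarity": 92.7}, {"Similarity": 99.2}]

print(len(matches_at(candidates, 80)))  # 3 — all pass the default threshold
print(len(matches_at(candidates, 99)))  # 1 — only the strongest survives
```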

Amazon did not dispute that the false matches occurred.

Why It Happened

Facial recognition models learn from the data they are trained on. When training sets are dominated by certain demographics, the model develops finer-grained representations for those groups and coarser, less reliable representations for underrepresented groups.
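One practical consequence: aggregate accuracy can mask exactly this failure mode, which is why evaluations need to be disaggregated by group. A minimal sketch, assuming a hypothetical record format of (group, was_flagged, is_true_match):

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """False positive rate per demographic group.

    `records` is a hypothetical format: (group, was_flagged, is_true_match).
    An overall rate that looks acceptable can decompose into very
    different per-group rates — the pattern Gender Shades documented.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, was_flagged, is_true_match in records:
        totals[group] += 1
        if was_flagged and not is_true_match:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}
```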

The 80% confidence threshold compounded the issue. At scale — a city’s worth of surveillance cameras, a large protest — a 5% false positive rate does not produce a handful of manageable errors. It produces thousands of leads pointing at the wrong people.
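A back-of-the-envelope calculation makes the point; the daily search volume here is a hypothetical figure, not a measured one:

```python
searches_per_day = 10_000      # hypothetical volume for a city-wide system
false_positive_rate = 0.05     # the rate observed in the ACLU test

false_leads_per_day = searches_per_day * false_positive_rate
print(f"{false_leads_per_day:,.0f} innocent people flagged per day")  # 500
print(f"{false_leads_per_day * 7:,.0f} per week")                     # 3,500
```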

📋 DISASTER DOSSIER

Date: July 26, 2018
System: Amazon Rekognition (commercial facial recognition API)
Test Operator: ACLU
Subjects: All 535 sitting members of the U.S. Congress
Database: 25,000 publicly available arrest photos
Confidence Threshold Used: 80% (Amazon's default recommendation)
False Matches: 28 (all false positives)
Racial Disparity: ~39% of false matches were people of color, versus ~20% of Congress at the time
Amazon's Response: Disputed methodology; suggested 99% threshold for law enforcement (not documented at the time)
Consequence: Congressional scrutiny; 2020 moratorium on police use of Rekognition, later extended indefinitely
Audacity Level: 🤖🤖🤖🤖 (Sold to police departments at the wrong threshold)

What Changed

The test drew congressional scrutiny: members of Congress whose faces had been misidentified wrote letters demanding answers. In June 2020, Amazon announced a one-year moratorium on police use of Rekognition, citing the need for stronger federal regulation, and later extended it indefinitely. Microsoft made a similar pledge, and IBM exited the general-purpose facial recognition business altogether.

Several cities, including San Francisco and Boston, banned government use of facial recognition outright. Federal legislation remained stalled, but the moratorium shifted the industry’s public posture.

Key Takeaway

A facial recognition system that performs reasonably well on average can still be dangerously unreliable for specific demographic groups. When deployed in high-stakes contexts — policing, security screening, border control — that uneven reliability does not distribute its harms evenly either.


Sources: ACLU, "Amazon's Face Recognition Falsely Matched 28 Members of Congress With Mugshots" (2018); Joy Buolamwini and Timnit Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification" (MIT Media Lab, 2018); Amazon Rekognition documentation.