In litigation analytics, we have daily challenges related to processing, sanitizing, and ultimately transforming unstructured litigation data into actionable insights. I’ve experienced how difficult it is to take dirty, ahistorical data and produce accurate and reliable data points. And, as a member of this rare community, I’d like to be the first to welcome Senator Tom Cotton to the club! This requires a bit of explanation.

During Judge Ketanji Brown Jackson’s Senate confirmation hearing, we observed the following exchange. Senator Cotton was asking Judge Brown Jackson whether she had ever represented any Guantanamo Bay detainees after leaving the public defender’s office:

Cotton: “Did you ever represent any of the detainees at Guantanamo Bay when you were not a public defender?”

Brown Jackson: “I left the federal defender’s office, I joined a law firm and one of the people that I represented was now at that law firm. They had him as a client.”

Cotton: “That’s Mr. al Sawam.”

Brown Jackson: “al Qahtani”

Cotton: “al Qahtani was the one.”

Cotton: “What about Mr. al Sawam?”

Brown Jackson: “I don’t know what happened to Mr. al Sawam.”

Cotton: “You were listed as counsel for two years during your time at Morrison & Foerster.”

Brown Jackson: “What happens is when you leave from any place, firms or government service, um, you have to let the Court know. Or, their records, their records reflect where you are in the system, and not so much the case in terms of your address.”

Senator Cotton was trying to catch Judge Brown Jackson in a contradiction here. The PACER data for al Sawam’s habeas petition shows that (then-attorney) Brown Jackson was listed as counsel for the case, with an email address, long after she left the public defender’s office for private practice, and that she remained attached to the case until April 20, 2009.

Cotton (or presumably his staffer) was attempting to assemble a timeline of Judge Brown Jackson’s work history from PACER records. Unfortunately, this approach does not work with PACER attorney data, for two reasons. First, the attorney data attached to earlier cases can change when an attorney updates their information. Second, that data can quickly become out of date when nobody updates it. Either way, what a docket shows for an attorney is not a reliable record of where that attorney worked while the case was active, which is why you can’t determine attorney/firm relationships reliably this way.

Judge Brown Jackson’s explanation is a little hard to follow, but her characterization of PACER attorney data being “system-wide” as opposed to “case-specific” is more accurate. As a seasoned litigator and experienced judge, she has no doubt encountered these discrepancies before.
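
To make the “system-wide” versus “case-specific” distinction concrete, here is a minimal sketch in Python. It is an illustrative toy model, not PACER’s actual schema: the class names, fields, the firm_on_docket helper, and the attorney, firms, and case in the example are all hypothetical. The only assumption it encodes is the one described above, that a case points at a single shared attorney record rather than keeping its own snapshot.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Attorney:
    """Hypothetical system-wide attorney record, shared by every case."""
    name: str
    firm: str
    email: str


@dataclass
class Appearance:
    """Hypothetical link between a case and an attorney.

    It stores a reference to the shared Attorney record,
    not a copy of the attorney's details at the time of filing.
    """
    case_name: str
    attorney: Attorney
    terminated: Optional[date] = None


def firm_on_docket(appearance: Appearance) -> str:
    # Returns the attorney's *current* firm, whatever it was
    # when the work on this case was actually done.
    return appearance.attorney.firm


# A made-up attorney appears in a case while at a public defender's office.
attorney = Attorney("Jane Doe", "Public Defender's Office", "jdoe@defender.example")
appearance = Appearance("Example habeas petition", attorney)
before = firm_on_docket(appearance)

# She changes jobs and updates her details once, system-wide.
attorney.firm = "Private Firm LLP"
attorney.email = "jdoe@privatefirm.example"
after = firm_on_docket(appearance)

# Her withdrawal from the old case is only entered later.
appearance.terminated = date(2010, 1, 15)

print(before)  # Public Defender's Office
print(after)   # Private Firm LLP
```

In this toy model the old case now appears to have been handled from the private firm, right up to the termination date, even though nothing about the case itself changed. Reading a work history off records like these produces exactly the kind of timeline confusion seen in the exchange above.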

At Lex Machina, we are very aware of the challenges presented by the enormous amount of litigation data in PACER’s dockets and documents. We use a combination of approaches: automated filtering in our Attorney Data Engine; our Signature Block Analyzer, which identifies attorneys from their filings; and manual curation by our data team to address any discrepancies. That means our attorney-law firm associations are more accurate than what you get from PACER alone.

Leonard Park is a Product Manager at Lex Machina who focuses on new practice area development. His domains of expertise include federal patent, copyright and trademark law, and PTAB. Leonard holds a J.D. with High Tech Law Certificate from Santa Clara School of Law, and a B.A. in Physics from Oberlin College.