AI and Legal: A Data Scientist and a Lawyer Walk Into a Bar

Stop us if you’ve heard this one: A data scientist and a lawyer walk into a bar… we’d continue the joke, but it’s such a statistical improbability that we’ll just stop right there.

But should it be such an improbable occurrence?

Attorneys have great expertise in understanding and synthesizing large amounts of information when building fact-based cases. Data scientists, in turn, excel at taking vast quantities of data and translating them into meaningful, actionable outcomes.

Different, yet the same. Two sides of a similar coin. Each highly trained in their areas of expertise, but in their normal day-to-day routines, usually swimming in very different lanes. When those lanes do happen to converge, it’s usually regarding very sensitive subjects, such as what constitutes fairness in AI as seen in the eyes of state and federal courts.

Having worked with both sides of that coin, we’ve observed how data scientists and lawyers can partner together effectively. But there’s often a wall of experience separating the two. So, if you are a data scientist working with your company’s lawyer to talk about a Machine Learning or Generative AI model, how can you break the wall and swim in the same lane?

What the Attorney is Trying to Understand

What is the model used for: This can be deceptively complex and have significant legal repercussions. Does the model generate a continuous score? Is it a classifier? Are cutoffs superimposed on top of a continuous outcome? Are these cutoffs known? Does the output of this model feed into another model as an input? Does the model affect underwriting of loans or is it for marketing only?

All of these are important factors a lawyer will take into consideration. Context is key in fair lending; additional information about model usage could render otherwise benign factors problematic or vice versa. For example, there may be disparate impact in a model that generates a continuous score which evaporates when the intended cutoffs are applied.
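The cutoff effect described above can be sketched with made-up numbers: a visible gap in raw scores between two groups that largely disappears at the approval level once the intended cutoff is applied. The group labels, score distributions, and cutoff below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous credit scores for two groups (numbers are
# invented for illustration; this is not a real analysis).
scores_a = rng.normal(loc=650, scale=50, size=10_000)  # reference group
scores_b = rng.normal(loc=640, scale=50, size=10_000)  # comparison group

# On the raw score, there is a visible gap between group means...
mean_gap = scores_a.mean() - scores_b.mean()

# ...but once the intended approval cutoff is applied, approval rates
# can be nearly identical, and the apparent disparity evaporates.
cutoff = 500
rate_a = (scores_a >= cutoff).mean()
rate_b = (scores_b >= cutoff).mean()
adverse_impact_ratio = rate_b / rate_a  # a common fairness screen

print(f"mean gap: {mean_gap:.1f} points")
print(f"approval rates: {rate_a:.3f} vs {rate_b:.3f}, AIR = {adverse_impact_ratio:.3f}")
```

Here the roughly ten-point score gap translates into nearly identical approval rates, which is why a lawyer will want to know whether cutoffs exist and where they sit before drawing conclusions from the raw score.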

Are the outcomes rankable: If a model operates in a sensitive area — such as consumer credit — an attorney will need to know whether the model creates rankable outcomes. Outcomes are considered rankable if some outcomes are objectively better than others from the consumers’ standpoint. For example, if an underwriting model rejects some applicants and approves others for loans, those applicants receiving loans receive an objectively favorable outcome. Credit card offers provide another common example; if segmentation exists such that some applicants receive a $100 statement credit as a sign-up bonus and others receive $200, the outcomes are rankable. If, however, the model segmentation results in some customers who receive double cashback on groceries and some who receive double cashback on gas, the outcomes would be unrankable because a consumer could feasibly prefer either outcome. As a data scientist, this means you need to think hard about your model’s output and how that output will be used. Ask yourself whether your model will give better outcomes to some people over others. And remember that while you as a consumer may regard receiving a credit card offer as “junk mail,” from your lawyer’s perspective, choosing to solicit someone for an offer of credit is a favorable outcome, so make sure to disclose if this is how the model is being used.

Is there any human-in-the-loop in the outcome: An attorney will want to know if human decisions exist at any point in the process. Even if that discretion occurs outside the model, but somewhere else in the decision pipeline, the lawyer will need to know this information. For example, it is common for credit applications rejected by a model to undergo another review by a human.

What features does your model use: Your attorney needs to know what your features mean and how you create heavily engineered features. The lawyer will look for certain features that regulations outright prohibit (such as sex, in the context of credit) and features that might create statistical proxies for prohibited bases. ZIP+4 commonly draws the ire of lawyers because of its strong correlation with race.
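As a rough illustration of how a data scientist might pre-screen a feature before that conversation, here is a minimal correlation check on synthetic data. The engineered feature, its relationship to the protected-class flag, and all numbers are invented; a real proxy analysis is more involved and belongs with your legal and compliance teams.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: a binary protected-class indicator and an
# engineered geographic feature that partly encodes it (both made up).
protected = rng.integers(0, 2, size=n).astype(float)
geo_feature = 0.6 * protected + rng.normal(size=n)

# Simple proxy screen: correlation between the feature and the
# protected characteristic. A high |r| flags the feature for legal
# review; it is a starting point for discussion, not a verdict.
r = np.corrcoef(geo_feature, protected)[0, 1]
print(f"correlation with protected class: r = {r:.3f}")
```

A screen like this will not catch every proxy (combinations of individually benign features can also encode a protected characteristic), but it gives you concrete numbers to bring to the meeting.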

Where does your data come from: If the lawyer identifies a concern about a feature, they need to know how to get more information about it. They will typically ask questions such as:

  • Were the features made by an in-house feature engineering team?
  • Were they made by a third-party vendor? For example, is it a third-party score?
  • If so, did the vendor provide an attestation that they excluded protected class characteristics in creating the feature(s)?

This information may originate from outside of your area of influence. After all, at a large organization, the creation of features generally falls under the purview of specialized data engineers. The lawyer may not be aware of this distinction. If possible, reach out to the relevant data engineer to see if you can get documentation on the feature engineering process before speaking with the lawyer.

What an Attorney Cares Less About

Model architecture and technical details: Whether your model is XGBoost or LightGBM will be less of a concern to a lawyer. Focus instead on understanding and translating the implications of the architecture and technical details. For example, logistic regression models cannot process null values. If fitting a logistic model required dropping or imputing null values, how many values were dropped or imputed? Were the null values concentrated among certain features? Missing values may be distributed differently across protected classes, which can itself cause disparities.
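A minimal sketch of the kind of missingness check described above, on invented data. The column names, group labels, and missingness rate are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1_000

# Hypothetical applicant data (all values made up): 'income' goes
# missing far more often for group B than for group A.
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n),
    "income": rng.normal(60_000, 15_000, size=n),
})
df.loc[(df["group"] == "B") & (rng.random(n) < 0.30), "income"] = np.nan

# Before dropping or imputing nulls for a logistic model, quantify
# where they fall: missingness concentrated in one group can itself
# produce disparities once those rows are dropped.
missing_by_group = df["income"].isna().groupby(df["group"]).mean()
print(missing_by_group)
```

A table like this, showing the missingness rate per group, is far more useful to a lawyer than a note that "nulls were imputed."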

Model performance: While it may feel like anathema to a data scientist who painstakingly crafted a model for maximum performance, on initial review a lawyer will prioritize areas of risk over model performance, since their role centers on identifying and mitigating risk. When discussing model performance with a lawyer, try to contextualize performance metrics. Showing a drop in KS (the Kolmogorov–Smirnov statistic) is less useful than saying, “removing that feature results in a 12% reduction to KS, which means that the planned campaign is no longer expected to be profitable.”
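To make a KS comparison concrete, here is a small sketch on synthetic scores: it computes the two-sample KS statistic by hand and expresses the change between a baseline model and an alternative as a percentage, the kind of contextualized number a lawyer can act on. All distributions and the size of the drop are invented.

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample KS statistic: max gap between empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(3)

# Hypothetical scores for "good" and "bad" accounts under the original
# model and under an alternative with a contested feature removed
# (all numbers invented for illustration).
good_full, bad_full = rng.normal(0.70, 0.1, 5_000), rng.normal(0.50, 0.1, 5_000)
good_alt, bad_alt = rng.normal(0.66, 0.1, 5_000), rng.normal(0.54, 0.1, 5_000)

ks_full = ks_stat(good_full, bad_full)
ks_alt = ks_stat(good_alt, bad_alt)
drop_pct = 100 * (ks_full - ks_alt) / ks_full
print(f"KS: {ks_full:.3f} -> {ks_alt:.3f} ({drop_pct:.0f}% drop)")
```

The percentage drop only becomes meaningful when you translate it into a business consequence, as in the quoted example above.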

Disparate Impact Analysis

After reviewing the basics of the model, the lawyer will decide whether a model needs to undergo disparate impact analysis. Models typically require disparate impact analysis if they:

  1. Operate in a highly regulated space such as credit, housing, or employment.
  2. Make decisions with a more profound effect on the individual (e.g., underwriting vs. marketing).
  3. Impact a larger number of consumers.
  4. Include rankable outcomes.

If the disparate impact analysis finds no disparities, no protected class features or potential proxies for them, and no potential for bias in the application of the model, then the review is typically over. However, if the review detects disparities or other concerns, then an alternatives analysis may occur. This involves the creation of alternate models. If a less discriminatory model exists which still achieves business needs, the lawyer may decide to adopt the alternative model instead of the baseline.
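The alternatives analysis can be sketched as a simple selection rule: among candidate models that still meet the business need, prefer the least discriminatory one. Everything below (the model names, the AUC floor, and the AIR values) is hypothetical; real alternatives searches weigh many more factors.

```python
# Toy alternatives analysis: candidate models scored on performance
# (AUC) and fairness (adverse impact ratio, AIR). All names and
# numbers are made up for illustration.
candidates = [
    {"name": "baseline",    "auc": 0.74, "air": 0.78},
    {"name": "alt_no_geo",  "auc": 0.73, "air": 0.95},
    {"name": "alt_minimal", "auc": 0.69, "air": 0.97},
]

MIN_AUC = 0.72  # assumed performance floor for the business case

# Keep only candidates that still achieve the business need, then
# prefer the least discriminatory (highest AIR) among them.
viable = [m for m in candidates if m["auc"] >= MIN_AUC]
best = max(viable, key=lambda m: m["air"])
print(best["name"])  # -> alt_no_geo
```

Note that the most fair candidate overall ("alt_minimal") loses out here because it no longer meets the performance floor; that trade-off is exactly what the lawyer and data scientist negotiate together.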

How to Make a Lawyer Love You

Creating robust documentation of the model building process is key from an attorney’s perspective. If the model is ever under scrutiny during litigation or a regulatory investigation, having contemporaneously written explanations is invaluable. Plus, regulation aside, good model documentation is simply good practice that will save you and your coworkers a lot of headaches in the long run.

Outlining a Model

Here’s a rough outline of items to document as you are building a model, although necessary documentation can vary by industry, company and model purpose. You should consult your company’s compliance department or legal team to determine what documentation you should be creating contemporaneously as you build your model.

Model Purpose

  • What is the intended use of this model?
  • Is it providing a final decision, or used in conjunction with other tools as a part of a pipeline?
  • Does the model feature any segmentation?
  • Are there any testing or holdout groups?
  • If there is segmentation, are some groups’ outcomes more favorable than others?
  • Even if no firm plans exist currently, could this model be repurposed in the future? For example, if you are developing a model for marketing purposes, is there a chance it could be used for underwriting at some point?

Timeline

  • Is the model already deployed?
  • If the model is not deployed yet, what is the expected launch date?
  • How long will the model be used?
  • If the model is in deployment, how long has it been deployed?
  • If deployed, has data collected during deployment ever been analyzed for model performance or fairness?
  • Will the model be monitored and reviewed for performance and fairness degradation over time, and if so at what frequency?

Features

  • Compile a list of all features used in the model. For each feature, include a description of what that feature measures. Note the data source for each feature, whether an internal data stream, a credit bureau, or a third-party vendor.
  • Are any of the following features used in your data? Note that depending on the usage (industry, model purpose, etc.) these features may be allowed; however, it is critical that you check with your company’s compliance team. In addition to these explicit features, think about any features that could serve as statistical proxies (for example, length of account history could be a proxy for age, as someone would have to be at least 68 years old to have 50 years of account history). Keep in mind that this is not an exhaustive list; only your legal and compliance team will know what to watch for in your particular use case:
      • Age (apart from using it as a filter to remove individuals too young to enter into legal contracts, or in other specific circumstances)
      • Sex/gender
      • Race
      • Ethnicity
      • National origin
      • Marital status
      • Familial status (presence of children in a household or the expectation of children)
      • Disability
      • Receipt of public assistance

Model Deployment

  • Will materials associated with the model deployment (e.g., marketing emails for a marketing model or loan servicing documentation for underwriting models) be available in any language other than English?
  • Will model deployment be constrained to a certain geography? Common geographies include: states or collections of states, regions (e.g., Northeast, Midwest), and Metropolitan Statistical Areas (MSAs).
  • Will the model cover (or treat differently) individuals living in Hawaii, Alaska, DC, or any U.S. territory (Puerto Rico, the U.S. Virgin Islands, American Samoa, Guam, and the Northern Mariana Islands)?
  • Will human-in-the-loop play any part in the model outcome? For example, are applications initially rejected by the model reviewed by humans for a second look?

So let’s recap: A data scientist and a lawyer walk into a bar. They’re discussing the legal repercussions of a credit application that was rejected by a model based on ZIP code. Hilarious, right?

For more information or to speak with a member of our team, email us at info@solas.ai.