We introduce the problem of risk analysis for Intellectual Property (IP) lawsuits. Specifically, we estimate the risk for the participating parties using solely prior factors, i.e., the historical and concurrent behavior of the entities involved in the case. This work is a first step towards a comprehensive legal risk assessment system for parties involved in litigation. Such a system would allow parties to optimize their case parameters to minimize their own risk, or to settle disputes out of court and thereby ease the burden on the judicial system. It could also help U.S. courts detect and correct inherent biases in the system.
We model risk estimation as a relational classification problem, using conditional random fields [6] to jointly estimate the risks of concurrent cases. We evaluate our model on data collected by the Stanford Intellectual Property Litigation Clearinghouse, comprising over 4,200 IP lawsuits filed across 88 U.S. federal districts over a span of 8 years, probably the largest legal data set reported in data mining research. Despite being agnostic to the merits of the case, our best model achieves a classification accuracy of 64%, a 22% relative improvement over the majority-class baseline.
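To make the relative-improvement figure concrete, the following minimal sketch back-computes the majority-class baseline implied by the two reported numbers. The function names are hypothetical, and because both reported figures are rounded, the result is only an approximation (roughly 52%), not a value stated in the text.

```python
def implied_baseline(accuracy: float, relative_gain: float) -> float:
    """Return the baseline b satisfying accuracy = b * (1 + relative_gain)."""
    return accuracy / (1.0 + relative_gain)


def relative_gain(accuracy: float, baseline: float) -> float:
    """Return the relative improvement of accuracy over baseline."""
    return (accuracy - baseline) / baseline


# Figures reported in the text (both rounded).
model_accuracy = 0.64
reported_gain = 0.22

# Assumption: "22% (relative) higher" means accuracy = baseline * 1.22.
baseline = implied_baseline(model_accuracy, reported_gain)
absolute_gain = model_accuracy - baseline

print(f"implied majority-class baseline: {baseline:.3f}")
print(f"absolute gain over baseline:     {absolute_gain:.3f}")
```

Under this reading, the absolute gain is about 11 to 12 percentage points, which is why the improvement is reported in relative terms.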