Training a text classifier to predict the outcome of legal cases using the Active Learning approach.

Machine Learning is used in various parts of our lives, and text analytics is no exception. Text analytics enables us to explore, process, and analyze all kinds of text documents to provide valuable insights for a specific business task or research question. Such text documents can be reviews, speeches, medical reports, and more.

Text classification is one of the most common but also one of the most challenging tasks in the natural language processing domain. Classifying text is often complicated by the domain-specific nature of language: a classifier trained on movie reviews may not do a good job of classifying medical records, for example. In addition, creating large labeled datasets for training is time-consuming, and pre-labeled datasets for specific problems are often impossible to find.

To solve this issue, Dcipher Analytics offers a dedicated Active Learning workbench to train classification models for any domain. The trained models can then be saved and used as a part of text analytics pipelines.

Today’s blog post focuses on one specific domain: classifying legal cases. We train a text classifier to label legal cases with predefined tags describing possible case outcomes in the legal context, such as "referred," "applied," "distinguished," "considered," and "discussed." If you want to know more about the analysis and the results, keep reading!

1. Create a new project

The first step, once we've entered the platform, is to click "Create a new project." Several Project templates are then available to choose from; we select the "New blank project" template since we want to start from scratch.

Now, name the project to complete the setup. We set the "Project name" to "Legal Case Classification." If needed, a Project description can also be added in this tab.

2. Import the dataset

After creating the project, determine the source of the data to import. We choose the "Import from file" option and select the data file we want to classify, then click "Import" to start uploading the file to the platform. For this blog post, we used an open-source Kaggle dataset called "Legal Citation Text Classification" and sampled 5,000 cases. You can also follow the steps in the video below.
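If you'd rather prepare the sample outside the platform first, a minimal pandas sketch could look like the following. The file names are assumptions based on the Kaggle download; adjust them to your own copy of the dataset.

```python
import pandas as pd

# Hypothetical file name -- use whatever your Kaggle download is called.
df = pd.read_csv("legal_text_classification.csv")

# Draw a random sample of 5,000 cases, as we did for this post.
sample = df.sample(n=5000, random_state=42)
sample.to_csv("legal_cases_sample.csv", index=False)
```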

3. Create an Active Learning workbench

Training a classifier model on unlabeled or partially labeled data requires a specific Dcipher workbench called Active Learning. To create it, click the "Add workbench" button above the Schema. The Active Learning workbench supports training classifiers on long-text fields. Select the text field you wish to classify and drag it to the middle of the newly created Active Learning workbench, which in this case sits below the Schema. Drop it on the workbench once the "Text source" section is displayed.

To get detailed information regarding Active Learning, visit the help center article here.

After selecting the text field you want to classify, the next step is to select the classification mode in the "Choose your model mode" dropdown. There are two options: Categorization and Tagging. In the Categorization mode, each text gets a single label, which is suited to training text classifiers. In the Tagging mode, each text may get multiple labels, which is suited to training taggers. Since each case in our problem has a single outcome, we choose the Categorization mode. After giving our classifier model a name, we simply proceed by clicking "Continue."
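To make the distinction concrete, here is a small scikit-learn illustration of single-label versus multi-label targets. It is only a sketch of the two encodings, not the platform's internals; the example labels are our five case outcomes.

```python
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Categorization: every text gets exactly one label.
single_labels = ["referred", "applied", "referred", "discussed"]
y_single = LabelEncoder().fit_transform(single_labels)
print(y_single)   # [2 0 2 1] -- one class index per text

# Tagging: every text may carry several labels at once.
multi_labels = [["referred"], ["applied", "considered"], ["discussed", "referred"]]
y_multi = MultiLabelBinarizer().fit_transform(multi_labels)
print(y_multi)    # binary indicator matrix, one column per label
```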

Optionally, an "Existing labels field" can also be added. If an existing labels field is provided, the Active Learning model uses the labeled texts as ground truth and starts training itself on those labels. Existing labels are not required for all texts: with a partially labeled dataset, Active Learning uses only the labeled texts for the initial training and then suggests labels for the unlabeled ones. You can leave this option as "None" if you don't have any labels yet, which is the case for our use case as well.
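Conceptually, a partially labeled dataset splits into a labeled seed set, which drives the initial training, and an unlabeled pool, for which the model will suggest labels. A quick pandas sketch, assuming a hypothetical `label` column that is empty for unlabeled rows:

```python
import pandas as pd

# Hypothetical columns: "case_text" holds the text, "label" is missing for unlabeled rows.
df = pd.read_csv("legal_cases_sample.csv")

seed = df[df["label"].notna()]   # labeled texts: used for the initial training
pool = df[df["label"].isna()]    # unlabeled texts: the model suggests labels for these
print(f"{len(seed)} labeled, {len(pool)} unlabeled")
```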

4. Add labels

Adding the correct labels for classification is essential to train the new model properly. There are five labels used for this classification: "referred", "applied", "considered", "distinguished", and "discussed", all of which describe how a court treats an earlier case. A referred case is one the court cites or refers to without analyzing it in any detail. In applied cases, the court applies dicta or principles from a previous decision to facts that are materially different from those of the earlier case. In a considered case, the court considers an earlier decision but does not follow, apply, or distinguish it. In distinguished cases, the court decides that it need not follow a previous case by which it would otherwise be bound because there is some salient difference, e.g. of a fact or the terms of a document, between the previous case and the one before it. Lastly, in discussed cases the court reviews the earlier decision in some detail without applying, following, or distinguishing it. To get more information, you can check this source that we used.

We start evaluating the texts and adding the corresponding label to classify the legal outcome of each case. To add a label, click the "+" sign and simply write the label in the given field. Similarly, to remove a label, click the "x" sign. You can also follow the labeling process in the video below.

After labeling some of the texts in the "Text Column" of the Active Learning workbench, we click the "Update" button at the top right corner of the workbench. This transmits all the labeling information to the model so it can train itself, as you will see in the next step; the "Update" button is the key to training the Active Learning model on the assigned labels.

5. Train the model

Each time we click the "Update" button, the model retrains itself and suggests labels for the remaining texts, along with a likelihood score for each suggested label. After the initial training, we continue training the model by accepting or rejecting these suggestions based on their accuracy. If the correct label is not suggested, you can add it by clicking the "+" sign and selecting the correct tag. After tagging, click the "Update" button again. You can continue this process, which you can also follow in the video below, until the suggestions are accurate enough for your case.
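Under the hood, this accept-or-correct cycle is essentially pool-based active learning with uncertainty sampling. The sketch below shows the general idea using scikit-learn; it is an illustration of the technique rather than Dcipher's actual model, the file and column names are assumptions carried over from the earlier sketches, and it presumes a handful of texts have already been labeled by hand (covering at least two classes).

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def ask_human_for_labels(texts):
    """Placeholder for the manual step: accept, reject, or correct suggested labels."""
    return [input(f"Label for: {text[:80]!r} -> ") for text in texts]


# Hypothetical file with a "case_text" column and a partially filled "label" column.
df = pd.read_csv("legal_cases_sample.csv")
labeled = df["label"].notna()

vectorizer = TfidfVectorizer(max_features=20_000)
X = vectorizer.fit_transform(df["case_text"])

for _ in range(10):                                  # each round ~ one click on "Update"
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[labeled.to_numpy()], df.loc[labeled, "label"])

    # Score the unlabeled pool and pick the texts the model is least sure about.
    pool_idx = df.index[~labeled.to_numpy()]
    proba = clf.predict_proba(X[~labeled.to_numpy()])
    uncertainty = 1 - proba.max(axis=1)
    query = pool_idx[np.argsort(uncertainty)[-20:]]  # 20 most uncertain texts

    # Label the queried texts (in the workbench: accept/reject/correct suggestions).
    df.loc[query, "label"] = ask_human_for_labels(df.loc[query, "case_text"])
    labeled = df["label"].notna()
```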

Lastly, once you are satisfied with the classification accuracy for your use case, you can finalize the trained model. When you click "Finalize" in the top right corner of the Active Learning workbench, the "Output Field name" section appears. Here, you can name the output field and click "Apply" to save the model.

6. The final classification

After finalizing the model, the "Tags" field is added to the Schema, and the labels you introduced or approved from the suggestions are written to the source dataset under this field. You can view the tags by dragging the tag field to the Table View workbench on the right side of the Schema, as you can see in the picture below.

The trained model allowed us to classify the legal texts and predict the outcomes of the cases based on their legal context. The model can now be applied to other legal cases, yielding accurate classification results in a short amount of time via the Classify text (AL models) operation, which applies trained AI models to other datasets (sketched conceptually below). You can use the same process to classify other types of text data. Let us know the cases where you would need a specific model, and to not miss out on our future posts, make sure to follow us on our LinkedIn page as well.
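Outside the platform, the equivalent of the Classify text (AL models) step would be to persist the trained pipeline and apply it to a new dataset. A minimal joblib sketch, reusing the hypothetical `vectorizer` and `clf` from the loop above and an assumed `new_legal_cases.csv` file:

```python
import joblib
import pandas as pd

# Persist the vectorizer and classifier trained in the sketch above.
joblib.dump({"vectorizer": vectorizer, "classifier": clf}, "legal_case_classifier.joblib")

# Later, or in another script: apply the saved model to a new dataset.
model = joblib.load("legal_case_classifier.joblib")
new_cases = pd.read_csv("new_legal_cases.csv")
features = model["vectorizer"].transform(new_cases["case_text"])
new_cases["predicted_outcome"] = model["classifier"].predict(features)
new_cases.to_csv("new_legal_cases_classified.csv", index=False)
```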

Additionally, if you'd like to learn more about other types of charts used in different examples, you can also check out our previous blog posts.

Get started!

To access our text analytics toolbox and try out analyzing free-form text responses in Dcipher Analytics, sign up for a free trial.

Book a demo