DESIGNING A CONTINUOUS LEARNING FRAMEWORK

Suchismita Sahu · Published in Analytics Vidhya · Sep 7, 2021

We know a lot about various Machine Learning models, how they learn, and how to deploy them. But the question remains: how do you ensure your predictions continue to be accurate? How do you keep your models up-to-date with new training data?

Finding a data set that gives us accurate predictions is a great start, but how long will that data continue to provide accurate predictions?

So, in this article, we are going to design a Continuous Learning Framework. What does that mean? It refers to a framework where the models are continuously fed new, updated data that has been verified and validated by Industry Experts or the end users.

USE CASE:

Consider the case of a Web Application where the user provides data in order to create receipts. Each receipt then goes through various Workflow Activities, where it gets reviewed by the respective users driven by various Roles, and the end objective of the application is to submit Industry-standard regulatory reports to the various Regulatory Authorities. This is a simple use case adopted in the Healthcare Industry.

As part of creating a receipt, we receive information from various sources, and the receipt creation process should be automated, besides having a way of creating the receipt manually. When information arrives via fax, email, or as data embedded in PDFs or images, it should be automatically extracted and fed to the parent web application for receipt creation. A separate AI/ML team, with the required infrastructure and model engines, is dedicated to this. Once the data has been extracted by the PDF data extractor (using Tesseract OCR, OpenCV, or NLP applied to the unstructured data), the receipt gets created in the parent application, which again goes through all the previously defined workflow activities where the respective users review, approve, and perform the required actions.

As part of this process, the Quality Control reviewers or the Medical Reviewers, who are Industry-expert Healthcare Professionals, perform the required updates on the created receipts. Sometimes the data extracted by the above-mentioned process is not correctly captured in the required fields, or issues are found as part of some other assessment. So here the system obtains corrected and verified data for the created receipt from the Experts. This is the newly created feedback data, which will be fed back to the NLP model described above so that the model learns from it, which in turn increases the accuracy of the model.
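To make the extraction step concrete, here is a minimal sketch of OCR-ing a scanned document with OpenCV and Tesseract (via pytesseract), as mentioned above. The input file name and the simple preprocessing are illustrative assumptions, not the actual pipeline used by the AI/ML team.

```python
# Minimal sketch: OCR a scanned receipt image with OpenCV + Tesseract,
# then hand the raw text on for NLP attribute extraction.
# The file path below is a hypothetical placeholder.
import cv2
import pytesseract

def extract_text(image_path: str) -> str:
    """Read a scanned page and return the OCR'd text."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Light thresholding usually improves Tesseract's accuracy on scans.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)

if __name__ == "__main__":
    raw_text = extract_text("scanned_receipt.png")  # hypothetical input file
    print(raw_text[:500])  # this text would then go to the NLP attribute extractor
```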

DESIGNING THE CONTINUOUS LEARNING FRAMEWORK

The way to keep our models up-to-date is to have an automated system that continuously evaluates and retrains them. This is called a Continuous Learning Framework.

The following steps show how the continuous learning process works when you build your models in Jupyter notebooks (a code sketch of the full loop follows the list):

1. We start by storing our training data in a table in an AWS Redshift data warehouse or in AWS S3. When we are ready to train our model, we pull the training data into a Jupyter Notebook.

2. In the notebook, we build our NLP model using AWS SageMaker.

3. We deploy our NLP model in a Docker container and tell it where to find the training data (a table in the data warehouse) as well as where it will later find feedback data (another table in the warehouse) for evaluation.

4. For continuous learning to be effective, we need some type of automated process for consuming new data. This could be a REST API, a script that downloads data nightly, or any other process that gathers new data; this new data is referred to as feedback data. When new feedback data is received, it is sent to the NLP model.

5. When we determine that we have gathered enough feedback data to test, we instruct the NLP model, via a REST API, to start feedback evaluation.

6. The model service pulls any new feedback data and runs predictions against the current model.

7. After feedback evaluation completes, the measured accuracy is compared against a configured accuracy threshold. If the accuracy falls below that threshold, retraining is triggered.

8. The NLP pipeline then pulls in all training data and all feedback data, builds a new model, and measures its accuracy.

9. If the new model’s accuracy exceeds the original model’s accuracy, then the new model is automatically deployed.
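Since steps 4 to 9 describe a concrete decision loop, here is a minimal sketch of it in Python. It is a simplification under stated assumptions: the training and feedback tables are stood in for by CSV files in S3, the model is a small scikit-learn text classifier rather than a SageMaker-hosted NLP model, and the bucket name, keys, column names, and accuracy threshold are all placeholders.

```python
# Minimal sketch of the feedback-evaluation / retraining loop (steps 4-9).
# Bucket, keys, columns, threshold and the simple scikit-learn pipeline are
# illustrative assumptions; the article's setup builds and hosts the model
# with AWS SageMaker instead.
import boto3
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

ACCURACY_THRESHOLD = 0.90          # retrain when feedback accuracy drops below this
BUCKET = "my-continuous-learning"  # hypothetical S3 bucket

s3 = boto3.client("s3")

def load_csv(key: str) -> pd.DataFrame:
    """Pull a CSV stored in S3 (stand-in for a Redshift/S3 table)."""
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    return pd.read_csv(obj["Body"])

def train(df: pd.DataFrame):
    """Train a simple text classifier on (text, label) columns."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(df["text"], df["label"])
    return model

def feedback_cycle(current_model):
    """Steps 6-9: evaluate on feedback data, retrain and promote if needed."""
    feedback = load_csv("feedback/latest.csv")
    current_acc = accuracy_score(feedback["label"],
                                 current_model.predict(feedback["text"]))
    if current_acc >= ACCURACY_THRESHOLD:
        return current_model                      # model is still good enough

    # Step 8: retrain on all training data plus all feedback data.
    combined = pd.concat([load_csv("training/all.csv"), feedback],
                         ignore_index=True)
    new_model = train(combined)
    # In practice you would score on a held-out split rather than on the
    # same feedback the challenger was trained on.
    new_acc = accuracy_score(feedback["label"],
                             new_model.predict(feedback["text"]))

    # Step 9: promote the challenger only if it beats the current model.
    return new_model if new_acc > current_acc else current_model
```

In the real framework, "deploying" the winning model would mean updating the SageMaker endpoint or swapping the Docker image, rather than returning an in-memory object as this sketch does.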

What’s next…?

VISUALIZING THE MODEL PERFORMANCE USING A DASHBOARD

Now, we have to visualize the performance of each model through a Dashboard.

Objective of the Dashboard: The Continuous Learning Framework, along with the Dashboard, will be deployed in the AWS environment for each tenant. The admin of each tenant will be able to choose the required NLP models for their production environment.

Designing the Dashboard: Here, I am considering a simple Dashboard, for our understanding, covering the different NLP models built from the historical data. The architecture of each model differs slightly in how it is retrained with the new feedback data.

There are 2 widgets:

  • Attribute Level: It shall display the performance of the NLP model for each attribute extracted from the unstructured text, for the current time period (a sketch of computing these per-attribute metrics follows this list).
      • X-axis: Model Performance Evaluation Metrics, for each selected attribute:
          • Accuracy
          • Precision
          • Recall
          • Support
          • True Positive
          • True Negative
          • False Positive
          • False Negative
      • Attributes extracted by the NLP model:
          • Product
          • Indication
          • Adverse Event
          • Patient Age
          • Medical History etc…
      • Y-axis: Values of the Performance Evaluation Metrics.
  • Periodic Level: It shall display the performance of a single attribute over a period of time. The X-axis will be divided into a number of time periods, and each time period shall contain the Performance Evaluation Metrics for the selected attribute.
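To make the Attribute Level widget concrete, here is a minimal sketch of how the per-attribute metrics could be computed with scikit-learn before being plotted. Treating each attribute extraction as a binary correct/incorrect outcome, the toy labels, and using the number of evaluated samples as Support are my illustrative assumptions.

```python
# Minimal sketch: compute the dashboard metrics for each extracted attribute.
# The toy ground-truth/prediction labels below are placeholders.
import pandas as pd
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)

def attribute_metrics(y_true, y_pred) -> dict:
    """Metrics for one attribute (1 = attribute extracted correctly, 0 = not)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, zero_division=0),
        "Recall": recall_score(y_true, y_pred, zero_division=0),
        "Support": len(y_true),       # here: number of evaluated samples
        "True Positive": int(tp),
        "True Negative": int(tn),
        "False Positive": int(fp),
        "False Negative": int(fn),
    }

# Toy example: ground truth vs. model output for two attributes.
labels = {
    "Product":       ([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]),
    "Adverse Event": ([1, 0, 1, 1, 1], [1, 0, 1, 0, 1]),
}
dashboard_df = pd.DataFrame(
    {attr: attribute_metrics(y_true, y_pred)
     for attr, (y_true, y_pred) in labels.items()}
).T
print(dashboard_df)   # one row per attribute, one column per metric
```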

CONCLUSION:

In this article, we learnt what a Continuous Learning Framework is and why it is needed, we designed the framework, and we visualized the performance of each model in a Dashboard.

So, what is there for you to do…? I have the following tasks for you:

  • Build a notebook where we take a data set, clean it, and then upload it to the AWS warehouse (a starter snippet for the upload step follows this list).
  • Pull the data from the warehouse, train the models, and deploy them.
  • Finally, upload feedback data to the warehouse, kick off feedback evaluation, and watch the continuous learning process in action.
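For the first task, a minimal sketch of the clean-and-upload step with pandas and boto3 is shown below. The file names and bucket are hypothetical placeholders, and loading from S3 into Redshift (for example via a COPY command) would be a further step.

```python
# Minimal sketch of the first task: clean a data set and upload it to S3.
# File names and the bucket are hypothetical placeholders.
import boto3
import pandas as pd

df = pd.read_csv("raw_data.csv")            # hypothetical raw data set
df = df.dropna().drop_duplicates()          # stand-in for your real cleaning steps
df.to_csv("training_data_clean.csv", index=False)

s3 = boto3.client("s3")
s3.upload_file("training_data_clean.csv",
               "my-continuous-learning",              # hypothetical bucket
               "training/training_data_clean.csv")    # destination key
```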

See you in our next article…Till then, Stay Tuned…
