Effortless NLP Model Deployment With HuggingFace and Streamlit

Updated: Jan 10


Natural language processing (NLP), one of the fastest-growing areas in artificial intelligence (AI), saw massive growth in 2021. Through state-of-the-art language models, NLP demonstrated unprecedented capabilities in natural language generation and in downstream tasks. These advances have driven wider adoption and further NLP research, along with a number of recent low-code tools, frameworks, and MLOps systems that now enable innovative use cases across industries. This trend will likely continue in 2022.

At Quansight, we frequently deliver projects that apply state-of-the-art NLP research to innovative use cases for solving our clients’ business problems. Inspired by a recent project, this post walks you through a simple example of how to use a pretrained model from Hugging Face Hub—one of the most popular open-source NLP frameworks—and deploy it with Streamlit, an open source app framework for machine learning (ML) and data science. In the process, we’ll explain the relevant NLP concepts regarding training approaches.

Data-Centric Approach to Model Training

A data-centric approach has become prevalent in all subfields of AI, particularly in NLP, shifting the focus from models to the data that goes into them. Due to the complexity of natural languages, data quality is critical to achieving good model performance. The performance of NLP models is constrained by both the quality and the quantity of the data used to train them. Large language models require a substantial amount of data, and in general the quantity of data must scale with model complexity for a model to learn properly. In the low-data, high-noise regime, it can be very difficult to attain a well-performing model.

More recently, NLP practitioners have been paying more attention to data quality and discovering promising improvements in training and modeling approaches. Recent advancements in transfer learning and in specific techniques, such as zero-shot learning, make it possible, in many cases, to generate a well-performing model even when working with small datasets or no data at all. For most NLP tasks, there are several options for generating a suitable model:

  • Train a new model from scratch with your own data

  • Use an existing model (possibly for zero-shot learning)

  • Alter an existing model to suit specific needs (transfer learning)

When quality data is combined with an appropriate training approach, NLP applications can deliver impressive results, bringing our models closer to human performance in natural language understanding.

The Benefits of Using An Open Source Library for Your NLP Project

In a previous post, we outlined some of the popular open source NLP frameworks. A core feature of many of today’s NLP libraries is that they provide users with pretrained models for a wide range of NLP tasks and the ability to choose any model that serves their purpose. In this post, we demonstrate how easy it is to use and deploy a model from Hugging Face Hub.

With open source libraries like Hugging Face, sharing, using, and fine-tuning existing pretrained models are often straightforward operations for someone who has a cursory knowledge of NLP. By harnessing the structure of pretrained models, we can train the output layers for our specific task with less data and compute time than needed to train a completely new model. Below, we briefly touch on two approaches: using an existing model as-is and fine-tuning one for a specific task.

Training Approaches And Using an Existing NLP Model

With new NLP models trained daily, there is now a wealth of open source models for a broad range of NLP tasks, easily accessible through NLP frameworks. As we write this post, there are currently 22,887 open source models on the Hugging Face Hub that anyone can download and use. This number is largely attributable to Hugging Face making it easy for contributors to add their models to the Hub. Furthermore, you can experiment with these pretrained models using various training approaches, including supervised, semi-supervised, few-shot, and zero-shot learning.

While supervised and semi-supervised learning are relatively well-established approaches, few-shot and zero-shot learning are newer, promising techniques that are now widely being experimented with in NLP. Particularly with the proliferation of large language models, such as GPT-3 and, more recently, Megatron-Turing NLG, few-shot learning is now one of the most popular approaches in NLP. The few-shot learning approach classifies text (or images) based on a small number of labeled samples.
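As a rough illustration of few-shot learning with a large language model, the handful of labeled samples can be placed directly in the prompt. The sketch below only builds such a prompt; the function name, the prompt layout, and the example texts are hypothetical, and sending the prompt to a model is deliberately left out.

```python
def build_few_shot_prompt(examples, query, labels):
    """Build a classification prompt from a few labeled examples.

    `examples` is a list of (text, label) pairs; `query` is the text
    we want the language model to label.
    """
    lines = [f"Classify each text as one of: {', '.join(labels)}.", ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The prompt ends with an open "Label:" for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical labeled samples, for illustration only.
examples = [
    ("I can't wait for the weekend!", "positive"),
    ("This is the worst service I've ever had.", "negative"),
]
prompt = build_few_shot_prompt(
    examples, "The food was wonderful.", ["positive", "negative"]
)
print(prompt)
```

The model sees only these two examples, which is what distinguishes this setup from supervised training on a large labeled dataset.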

Many of the existing models on the Hugging Face Hub are amenable to zero-shot learning, a technique that is still in its early stages and is often built on natural language inference (NLI) models. Its primary goal is to tackle the prevalent problem of lacking enough labeled data for training AI models. Having seen promising results in computer vision, zero-shot learning has recently become an active research area for NLP as well. The method tasks a model with classifying categories that were not present during training, so predictions can be made for target classes without any labeled data from those classes. With zero-shot learning, you can use an existing model for a classification task without retraining the model. Check out Hugging Face’s zero-shot learning demo here.
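A minimal sketch of zero-shot classification with the Hugging Face `transformers` library might look like the following. It assumes the `zero-shot-classification` pipeline and the `facebook/bart-large-mnli` checkpoint, neither of which is used in the app in this post; the helper names are hypothetical.

```python
def best_label(result):
    """Return the (label, score) pair with the highest score.

    `result` is the dictionary returned by the zero-shot pipeline, which
    holds parallel "labels" and "scores" lists.
    """
    scores = dict(zip(result["labels"], result["scores"]))
    label = max(scores, key=scores.get)
    return label, scores[label]


def classify_zero_shot(text, candidate_labels,
                       model_name="facebook/bart-large-mnli"):
    """Score `text` against labels the model was never trained on."""
    # Imported lazily: transformers is a heavy dependency, and the
    # checkpoint is downloaded on first use.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    return classifier(text, candidate_labels=candidate_labels)


# Example usage (downloads the model, so it is not run here):
# result = classify_zero_shot("I love learning about NLP",
#                             ["education", "sports", "politics"])
# print(best_label(result))
```

The candidate labels are supplied at inference time, which is why no retraining is needed when the set of categories changes.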

Fine-Tuning an Existing Model to Suit Specific Needs

Pretrained models can be repurposed for downstream tasks, using transfer learning, which is, in fact, the underlying technique behind training approaches, such as zero-shot and few-shot learning discussed above. The main idea in transfer learning is that the initial layers in deep learning models tend to generate feature spaces for a given domain that are useful beyond the scope of the specific task. If we focus on just retraining the output layers, we can generate a powerful model that’s well suited for our task at hand, without the challenge of constructing a model from scratch and without the large expense of training it.
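To make the idea of retraining only the output layers concrete, here is a hedged sketch using `transformers`. The `distilbert-base-uncased` checkpoint and the helper names are assumptions chosen for illustration, and the training loop itself is omitted.

```python
def freeze_base(model):
    """Freeze the pretrained encoder so only the task head stays trainable.

    Returns the names of the parameters that will still be updated.
    """
    for param in model.base_model.parameters():
        param.requires_grad = False
    return [name for name, p in model.named_parameters() if p.requires_grad]


def build_classifier(num_labels, checkpoint="distilbert-base-uncased"):
    """Load a pretrained encoder with a fresh classification head."""
    # Imported lazily: this downloads the checkpoint on first use.
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    freeze_base(model)
    return model


# Example usage (downloads the checkpoint, so it is not run here):
# model = build_classifier(num_labels=3)
```

With the encoder frozen, an optimizer only updates the small classification head, which is what keeps the data and compute requirements low.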

Interfacing NLP Models

NLP practitioners are placing more emphasis on deploying their models and taking their expertise beyond data gathering and training models. With this, the need for easy deployment options surfaced, allowing the practitioners to deploy their models without having to be a DevOps expert. Deploying small models and prototypes easily through web application interfaces is now becoming a critical part of the toolkit of practitioners.

To illustrate the process of deploying a model on the web for anyone to experiment with it, we deployed a toy sentiment analysis app (see the repo here). For this demo, we used Streamlit. Streamlit is a project that affords fast development and free deployment of small apps. It is an excellent tool for prototyping apps. This simple interface requires very little code:

import streamlit as st
import numpy as np
import en_textcat_goemotions
from emoji_dict import emojis


def load_model():
    # Load the pretrained spaCy emotion classification pipeline
    return en_textcat_goemotions.load()


def classify(text):
    # Run the pipeline and return the highest-scoring category
    out = nlp(text)
    keys = list(out.cats.keys())
    values = list(out.cats.values())
    max_val = np.argmax(values)
    return keys[max_val]


nlp = load_model()

title = st.title(':watermelon: Emotion Detection :watermelon:')
text = st.text_area('write a sentence to classify')

# st.text_area returns an empty string until the user types something
if text:
    category = classify(text)
    st.markdown(f'{emojis[category]} {category}')

For this task, we used a pretrained model. We searched Hugging Face for emotion classification models and picked this model, which was created by the spaCy developers and trained on the GoEmotions dataset that Google recently open sourced. For a popular use case like emotion recognition, the collaboration between Hugging Face and spaCy expedites the process of generating and interacting with the model, and abstracts away the details of word embeddings and sparse vectors, among other technicalities of NLP models. In the above app, we need to load the model and have it classify the input text. To load the model, we need only call en_textcat_goemotions.load(), which returns a spaCy model of type spacy.lang.en.English:

>>> type(nlp)
<class 'spacy.lang.en.English'>

To use the model for prediction, we call it directly on the input text (this invokes its __call__ method). To get the classification, we use the out.cats attribute, a dictionary that maps each category to its score for the given input. We use the category with the highest score as the classification.

>>> out = nlp('I love learning about NLP')
>>> out.cats
{'admiration': 0.011484717018902302,
 'amusement': 0.011577117256820202,
 'anger': 0.003952520899474621,
 'annoyance': 0.007863284088671207,
 'approval': 0.01222476176917553,
 'caring': 0.005067039746791124,
 'confusion': 0.0035391286946833134,
 'curiosity': 0.004292341414839029,
 'desire': 0.01035382691770792,
 'disappointment': 0.004042243119329214,
 'disapproval': 0.007767093367874622,
 'disgust': 0.0030901883728802204,
 'embarrassment': 0.0016432638512924314,
 'excitement': 0.002745988080278039,
 'fear': 0.004036883357912302,
 'gratitude': 0.0029807265382260084,
 'grief': 0.0028782000299543142,
 'joy': 0.01333794929087162,
 'love': 0.9983959794044495,
 'nervousness': 0.008615669794380665,
 'optimism': 0.0036486624740064144,
 'pride': 0.0028471113182604313,
 'realization': 0.010369624942541122,
 'relief': 0.0012223649537190795,
 'remorse': 0.0022894563153386116,
 'sadness': 0.004058834630995989,
 'surprise': 0.005428586155176163,
 'neutral': 0.005726550240069628}
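Because the model returns a score for every category, it is easy to go beyond the single best label. The helper below is a small sketch, not part of the original app; `top_categories` is a hypothetical name, and the input is a rounded subset of the scores shown above.

```python
def top_categories(cats, k=3):
    """Return the k highest-scoring (category, score) pairs."""
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Rounded subset of the scores printed above, for illustration.
scores = {
    "admiration": 0.0115,
    "approval": 0.0122,
    "joy": 0.0133,
    "love": 0.9984,
    "neutral": 0.0057,
}
print(top_categories(scores))
# → [('love', 0.9984), ('joy', 0.0133), ('approval', 0.0122)]
```

In the app, the same helper could be called on out.cats to display a few runner-up emotions alongside the top one.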

Streamlit works by rerunning the script from top to bottom whenever a value changes. On the first run, we load the model and set up each of the widgets. We can use ‘st.cache’ (or, in newer Streamlit releases, ‘st.cache_resource’) to prevent Streamlit from reloading the model on every run. When the value of ‘st.text_area’ changes, Streamlit reruns the code and renders the classification result as markdown.

To run on the Streamlit servers, we need to provide an environment file. Here's our requirements.txt:


Combining the pretrained models of Hugging Face with the fast development/deployment cycle afforded by Streamlit, we were able to go from a simple idea of an NLP-based application to a fully deployed prototype in the time that it takes to 1) search Hugging Face for a suitable model, 2) install an environment, and 3) write ~15 lines of code. For a different task, we could have employed zero-shot learning with minimal time/effort. If we had some data that we wanted to model, we also could have used transfer learning for our specific task. Using these two frameworks together, all roads are conceptually clear and relatively quick.


With frameworks like Hugging Face and Streamlit, it’s easier than ever to quickly experiment with and deploy NLP models. Streamlit works well for small applications and for prototyping larger ones. The extensive repository of open source models on the Hugging Face Hub makes it simple to find existing models that can be used either for the task they were originally trained for or for a new task via transfer learning. When combined, these two frameworks make it possible to develop and deploy practical NLP-based applications very quickly.
