Here, by leveraging the powerful Universal Sentence Encoder library built on the transformer architecture, we will improve our model’s ability to comprehend sentences and distinguish between them.
TensorFlow + JavaScript. The most popular, cutting-edge AI framework now supports the most widely used programming language on the planet. So let’s make text and NLP (Natural Language Processing) chatbot magic happen through Deep Learning right in our web browser, GPU-accelerated via WebGL using TensorFlow.js!
You are welcome to download the project code.
In our first version of text emotion detection, we simply marked which words were contained in the sentence from a vocabulary "bag of words" to train our neural network. This method has some drawbacks that might be already evident to you.
One word can have drastically different meanings depending on its context. For example, phrases "this strange seashell is pretty"
and "this seashell is pretty strange"
have different meanings that would be hidden in just a collection of the words: [ "is"
, "pretty"
, "seashell"
, "strange"
, "this"
].
This is where transformers can come in handy. Transformers are behind the state-of-the-art neural network architecture running famous text generation models like GPT-2 and GPT-3, as well as the latest breakthroughs in high-quality language translations. It is also the technology powering the sentence-level embedding functionality that we’ll use to improve our emotion detection.
Some great resources explain transformers in-depth. For this project, it will be enough to know that they are a neural network model that analyzes both the word and its position in a sentence to infer its meaning and importance. If you are interested in reading up on transformers, here are some places you can start:
Setting Up TensorFlow.js Code with Universal Sentence Encoder
This project is very similar to our first emotion detection code, so let’s use the initial codebase as our starting point, with the word embeddings and prediction parts removed. We’ll add one important and incredibly powerful library here, Universal Sentence Encoder, which is a pre-trained transformer-based language processing model. This library is what we will use to generate much more meaningful input vectors for the AI from the text.
<html>
<head>
<title>Detecting Emotion in Text: Chatbots in the Browser with TensorFlow.js</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
</head>
<body>
<p id="text"></p>
<h1 id="status">Loading...</h1>
<script>
const emotions = [
"admiration",
"amusement",
"anger",
"annoyance",
"approval",
"caring",
"confusion",
"curiosity",
"desire",
"disappointment",
"disapproval",
"disgust",
"embarrassment",
"excitement",
"fear",
"gratitude",
"grief",
"joy",
"love",
"nervousness",
"optimism",
"pride",
"realization",
"relief",
"remorse",
"sadness",
"surprise",
"neutral"
];
function setText( text ) {
document.getElementById( "status" ).innerText = text;
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
(async () => {
let data = await fetch( "web/emotions.tsv" ).then( r => r.text() );
let lines = data.split( "\n" ).filter( x => !!x );
shuffleArray( lines );
const numSamples = 200;
let sentences = lines.slice( 0, numSamples ).map( line => {
let sentence = line.split( "\t" )[ 0 ];
return sentence;
});
let outputs = lines.slice( 0, numSamples ).map( line => {
let categories = line.split( "\t" )[ 1 ].split( "," ).map( x => parseInt( x ) );
let output = [];
for( let i = 0; i < emotions.length; i++ ) {
output.push( categories.includes( i ) ? 1 : 0 );
}
return output;
});
const model = tf.sequential();
model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ 512 ] } ) );
model.add(tf.layers.dense( { units: 50, activation: "relu" } ) );
model.add(tf.layers.dense( { units: 25, activation: "relu" } ) );
model.add(tf.layers.dense( {
units: emotions.length,
activation: "softmax"
} ) );
model.compile({
optimizer: tf.train.adam(),
loss: "categoricalCrossentropy",
metrics: [ "accuracy" ]
});
const xs = tf.stack( vectors.map( x => tf.tensor1d( x ) ) );
const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
await model.fit( xs, ys, {
epochs: 100,
shuffle: true,
callbacks: {
onEpochEnd: ( epoch, logs ) => {
setText( `Training... Epoch #${epoch} (${logs.acc})` );
console.log( "Epoch #", epoch, logs );
}
}
} );
setInterval( async () => {
let line = lines[ Math.floor( Math.random() * lines.length ) ];
let sentence = line.split( "\t" )[ 0 ];
let categories = line.split( "\t" )[ 1 ].split( "," ).map( x => parseInt( x ) );
document.getElementById( "text" ).innerText = sentence;
}, 5000 );
})();
</script>
</body>
</html>
GoEmotion Dataset
We’ll use the same data we’d used in the first version, from the GoEmotions dataset available on the Google Research GitHub repository, to train our neural network. It consists of 58 thousand English Reddit comments labeled with 27 emotion categories. You can use the full set if you’d like, but we only need a small subset for this project, so downloading this smaller test set will work. Place the downloaded file in the project folder where your web page can retrieve it from the local web server such as "web"
.
Universal Sentence Encoder
Universal Sentence Encoder (USE) is "a [pre-trained] model that encodes text into 512-dimensional embeddings." It’s based on the transformer architecture, and has been trained on a vocabulary of 8 thousand words that generate embeddings (sentence encoding vectors) that can map any sentence by how similar it is to any other potential sentence. As I mentioned earlier, using transformers, this model takes into account not just the words in the sentence but also their position and significance.
We can generate a unique ID for any sentence while calculating how similar it is to other sentences. This ability is quite powerful for NLP tasks because we can combine textual datasets with the embeddings as input into our neural networks to learn and extract different characteristics, such as inferring emotion from the text .
The USE is quite easy and straightforward to work with. Let’s load it up in our code before we define our network model and generate embeddings for each of the sentences.
setText( "Loading USE..." );
let encoder = await use.load();
setText( "Loaded!" );
let embeddings = await encoder.embed( sentences );
Training the AI Model
We don’t have to change our model at all - but here it is again, to refresh your memory.
We define a network with three hidden layers outputting a categorical vector with a length of 27 (the number of emotion categories), where the index of the maximum value is our predicted emotion identifier.
const model = tf.sequential();
model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ allWords.length ] } ) );
model.add(tf.layers.dense( { units: 50, activation: "relu" } ) );
model.add(tf.layers.dense( { units: 25, activation: "relu" } ) );
model.add(tf.layers.dense( {
units: emotions.length,
activation: "softmax"
} ) );
model.compile({
optimizer: tf.train.adam(),
loss: "categoricalCrossentropy",
metrics: [ "accuracy" ]
});
And we can update the training input to directly take the sentence embeddings because they are already returned as tensors.
const xs = embeddings;
const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
await model.fit( xs, ys, {
epochs: 50,
shuffle: true,
callbacks: {
onEpochEnd: ( epoch, logs ) => {
setText( `Training... Epoch #${epoch} (${logs.acc})` );
console.log( "Epoch #", epoch, logs );
}
}
} );
Let’s Detect Emotions
It’s time to see how well our upgraded emotion detector performs.
Inside the 5-second timer where we predict the emotion category, we can now generate a sentence embedding with USE for input.
Here’s what that looks like.
setInterval( async () => {
let line = lines[ Math.floor( Math.random() * lines.length ) ];
let sentence = line.split( "\t" )[ 0 ];
let categories = line.split( "\t" )[ 1 ].split( "," ).map( x => parseInt( x ) );
document.getElementById( "text" ).innerText = sentence;
let vector = await encoder.embed( [ sentence ] );
let prediction = await model.predict( vector ).data();
let id = prediction.indexOf( Math.max( ...prediction ) );
setText( `Result: ${emotions[ id ]}, Expected: ${emotions[ categories[ 0 ] ]}` );
}, 5000 );
Finish Line
That’s all. Are you impressed yet? Here is our final code:
<html>
<head>
<title>Detecting Emotion in Text: Chatbots in the Browser with TensorFlow.js</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
</head>
<body>
<p id="text"></p>
<h1 id="status">Loading...</h1>
<script>
const emotions = [
"admiration",
"amusement",
"anger",
"annoyance",
"approval",
"caring",
"confusion",
"curiosity",
"desire",
"disappointment",
"disapproval",
"disgust",
"embarrassment",
"excitement",
"fear",
"gratitude",
"grief",
"joy",
"love",
"nervousness",
"optimism",
"pride",
"realization",
"relief",
"remorse",
"sadness",
"surprise",
"neutral"
];
function setText( text ) {
document.getElementById( "status" ).innerText = text;
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
(async () => {
let data = await fetch( "web/emotions.tsv" ).then( r => r.text() );
let lines = data.split( "\n" ).filter( x => !!x );
shuffleArray( lines );
const numSamples = 200;
let sentences = lines.slice( 0, numSamples ).map( line => {
let sentence = line.split( "\t" )[ 0 ];
return sentence;
});
let outputs = lines.slice( 0, numSamples ).map( line => {
let categories = line.split( "\t" )[ 1 ].split( "," ).map( x => parseInt( x ) );
let output = [];
for( let i = 0; i < emotions.length; i++ ) {
output.push( categories.includes( i ) ? 1 : 0 );
}
return output;
});
setText( "Loading USE..." );
let encoder = await use.load();
setText( "Loaded!" );
let embeddings = await encoder.embed( sentences );
const model = tf.sequential();
model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ 512 ] } ) );
model.add(tf.layers.dense( { units: 50, activation: "relu" } ) );
model.add(tf.layers.dense( { units: 25, activation: "relu" } ) );
model.add(tf.layers.dense( {
units: emotions.length,
activation: "softmax"
} ) );
model.compile({
optimizer: tf.train.adam(),
loss: "categoricalCrossentropy",
metrics: [ "accuracy" ]
});
const xs = embeddings;
const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
await model.fit( xs, ys, {
epochs: 50,
shuffle: true,
callbacks: {
onEpochEnd: ( epoch, logs ) => {
setText( `Training... Epoch #${epoch} (${logs.acc})` );
console.log( "Epoch #", epoch, logs );
}
}
} );
setInterval( async () => {
let line = lines[ Math.floor( Math.random() * lines.length ) ];
let sentence = line.split( "\t" )[ 0 ];
let categories = line.split( "\t" )[ 1 ].split( "," ).map( x => parseInt( x ) );
document.getElementById( "text" ).innerText = sentence;
let vector = await encoder.embed( [ sentence ] );
let prediction = await model.predict( vector ).data();
let id = prediction.indexOf( Math.max( ...prediction ) );
setText( `Result: ${emotions[ id ]}, Expected: ${emotions[ categories[ 0 ] ]}` );
}, 5000 );
})();
</script>
</body>
</html>
What’s Next?
By leveraging the powerful Universal Sentence Encoder library built on the transformer architecture, we improved our model’s ability to comprehend sentences and distinguish between them. Could this also apply to answering trivia questions?
Find out in the next article of this series, Improved Trivia Expert: Chatbots in the Browser with TensorFlow.js.
Raphael Mun is a tech entrepreneur and educator who has been developing software professionally for over 20 years. He currently runs Lemmino, Inc and teaches and entertains through his Instafluff livestreams on Twitch building open source projects with his community.