{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
""
]
},
{
"cell_type": "code",
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-9J37ylJuw12",
"outputId": "2a1bff3b-237d-4e52-c07e-f690fe70ac00"
},
"id": "-9J37ylJuw12",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "opJDPXu9sje3"
},
"source": [
"## Table of Contents\n",
"- [1 - Import Libraries and try out Trax](#1)\n",
"- [2 - Importing the Data](#2)\n",
"    - [2.1 - Loading in the Data](#2-1)\n",
"    - [2.2 - Building the Vocabulary](#2-2)\n",
"    - [2.3 - Converting a Tweet to a Tensor](#2-3)\n",
"    - [2.4 - Creating a Batch Generator](#2-4)\n",
"- [3 - Defining Classes](#3)\n",
"    - [3.1 - ReLU Class](#3-1)\n",
"    - [3.2 - Dense Class](#3-2)\n",
"    - [3.3 - Model](#3-3)\n",
"- [4 - Training](#4)\n",
"    - [4.1 - Training the Model](#4-1)\n",
"    - [4.2 - Practice Making a Prediction](#4-2)\n",
"- [5 - Evaluation](#5)\n",
"    - [5.1 - Computing the Accuracy on a Batch](#5-1)\n",
"    - [5.2 - Testing your Model on Validation Data](#5-2)\n",
"- [6 - Testing with your Own Input](#6)\n",
"- [7 - Word Embeddings](#7)"
],
"id": "opJDPXu9sje3"
},
{
"cell_type": "code",
"source": [
"cd /content/drive/MyDrive/Code_files_and_data/"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IQptnNSWuunW",
"outputId": "dde778b3-75bd-49be-a69e-7d6122933048"
},
"id": "IQptnNSWuunW",
"execution_count": 77,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"/content/drive/MyDrive/Code_files\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IOK4n9JEjCVs"
},
"source": [
"\n",
"## 1 - Import Libraries and try out Trax\n",
"\n",
"- Let's import libraries and look at an example of using the Trax library."
],
"id": "IOK4n9JEjCVs"
},
{
"cell_type": "code",
"source": [
"!pip install -q -U trax\n",
"import trax"
],
"metadata": {
"id": "VbdGmrkjumHk"
},
"id": "VbdGmrkjumHk",
"execution_count": 16,
"outputs": []
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"id": "WOTfm2P0jCVt"
},
"outputs": [],
"source": [
"import os\n",
"import shutil\n",
"import random as rnd\n",
"\n",
"# import relevant libraries\n",
"import trax\n",
"import trax.fastmath.numpy as np\n",
"from trax import layers as tl\n",
"from trax import fastmath\n",
"\n",
"# import Layer and the tweet helpers from the utils3.py file\n",
"from utils3 import Layer, load_tweets, process_tweet"
],
"id": "WOTfm2P0jCVt"
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "EyMnUt38jCVw",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
},
"outputId": "42add9cf-99e5-4a0f-edc0-5aeaa4f91ef8"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray(5., dtype=float32, weak_type=True)"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n"
]
}
],
"source": [
"# Create an array using trax.fastmath.numpy\n",
"a = np.array(5.0)\n",
"\n",
"# View the returned array\n",
"display(a)\n",
"\n",
"print(type(a))"
],
"id": "EyMnUt38jCVw"
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"id": "J2RUtDtrjCV0"
},
"outputs": [],
"source": [
"# Define a function that will use the trax.fastmath.numpy array\n",
"def f(x):\n",
"\n",
"    # f = x^2\n",
"    return (x**2)"
],
"id": "J2RUtDtrjCV0"
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "qvUd-xzqjCV4",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "58c03de7-f314-4b6d-dc52-6a9ea624afe0"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"f(a) for a=5.0 is 25.0\n"
]
}
],
"source": [
"# Call the function\n",
"print(f\"f(a) for a={a} is {f(a)}\")"
],
"id": "qvUd-xzqjCV4"
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"id": "2Im5Hkc9jCV8",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "5ea03a9c-8da6-4de1-c7fa-a3f512cf4a03"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"function"
]
},
"metadata": {},
"execution_count": 21
}
],
"source": [
"# Directly use trax.fastmath.grad to calculate the gradient (derivative) of the function\n",
"grad_f = trax.fastmath.grad(fun=f) # df / dx - Gradient of function f(x) with respect to x\n",
"\n",
"# View the type of the returned object (it's a function)\n",
"type(grad_f)"
],
"id": "2Im5Hkc9jCV8"
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"id": "0lDIVvx3jCV_",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "34a88070-56ee-4203-bb89-905af882d3b9"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray(10., dtype=float32, weak_type=True)"
]
},
"metadata": {}
}
],
"source": [
"# Call the newly created function and pass in a value for x (the DeviceArray stored in 'a')\n",
"grad_calculation = grad_f(a)\n",
"\n",
"# View the result of calling the grad_f function\n",
"display(grad_calculation)"
],
"id": "0lDIVvx3jCV_"
},
{
"cell_type": "markdown",
"metadata": {
"id": "CZ8RUynQsktn"
},
"source": [
"\n",
"## 2 - Importing the Data\n",
"\n",
"\n",
"### 2.1 - Loading in the Data\n",
"\n",
"Import the data set.\n",
"- You may recognize this from earlier assignments in the specialization.\n",
"- The `process_tweet` function is defined in the `utils3.py` file imported above."
],
"id": "CZ8RUynQsktn"
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"deletable": false,
"editable": false,
"id": "h5ClwIOSuLJh"
},
"outputs": [],
"source": [
"## DO NOT EDIT THIS CELL\n",
"\n",
"# Uses load_tweets, imported above from the utils3.py file\n",
"\n",
"def train_val_split():\n",
"    # Load positive and negative tweets\n",
"    all_positive_tweets, all_negative_tweets = load_tweets()\n",
"\n",
"    # View the total number of positive and negative tweets.\n",
"    print(f\"The number of positive tweets: {len(all_positive_tweets)}\")\n",
"    print(f\"The number of negative tweets: {len(all_negative_tweets)}\")\n",
"\n",
"    # Split positive set into validation and training\n",
"    val_pos = all_positive_tweets[4000:] # generating validation set for positive tweets\n",
"    train_pos = all_positive_tweets[:4000] # generating training set for positive tweets\n",
"\n",
"    # Split negative set into validation and training\n",
"    val_neg = all_negative_tweets[4000:] # generating validation set for negative tweets\n",
"    train_neg = all_negative_tweets[:4000] # generating training set for negative tweets\n",
"\n",
"    # Combine training data into one set\n",
"    train_x = train_pos + train_neg\n",
"\n",
"    # Combine validation data into one set\n",
"    val_x = val_pos + val_neg\n",
"\n",
"    # Set the labels for the training set (1 for positive, 0 for negative)\n",
"    train_y = np.append(np.ones(len(train_pos)), np.zeros(len(train_neg)))\n",
"\n",
"    # Set the labels for the validation set (1 for positive, 0 for negative)\n",
"    val_y = np.append(np.ones(len(val_pos)), np.zeros(len(val_neg)))\n",
"\n",
"    return train_pos, train_neg, train_x, train_y, val_pos, val_neg, val_x, val_y"
],
"id": "h5ClwIOSuLJh"
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "lQOynZRtsjfD",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "918abf1b-7605-43eb-8b67-6e1b3f789662"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The number of positive tweets: 5000\n",
"The number of negative tweets: 5000\n",
"length of train_x 8000\n",
"length of val_x 2000\n"
]
}
],
"source": [
"train_pos, train_neg, train_x, train_y, val_pos, val_neg, val_x, val_y = train_val_split()\n",
"\n",
"print(f\"length of train_x {len(train_x)}\")\n",
"print(f\"length of val_x {len(val_x)}\")"
],
"id": "lQOynZRtsjfD"
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"id": "2bRX6aPDjCWH",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c038056e-5300-4de0-e308-d35bc0a925de"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"original tweet at training position 0\n",
"#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members in my community this week :)\n",
"Tweet at training position 0 after processing:\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['followfriday', 'top', 'engag', 'member', 'commun', 'week', ':)']"
]
},
"metadata": {},
"execution_count": 25
}
],
"source": [
"# Try out function that processes tweets\n",
"print(\"original tweet at training position 0\")\n",
"print(train_pos[0])\n",
"\n",
"print(\"Tweet at training position 0 after processing:\")\n",
"process_tweet(train_pos[0])"
],
"id": "2bRX6aPDjCWH"
},
{
"cell_type": "markdown",
"metadata": {
"id": "ac4D5WSUAVub"
},
"source": [
"\n",
"### 2.2 - Building the Vocabulary\n",
"\n",
"Now build the vocabulary.\n",
"- Map each word in each tweet to an integer (an \"index\").\n",
"- The following code does this for you, but please read it and understand what it's doing.\n",
"- Note that you will build the vocabulary based on the training data.\n",
"- To do so, you will assign an index to every word by iterating over your training set.\n",
"\n",
"The vocabulary will also include some special tokens:\n",
"- `__PAD__`: padding\n",
"- `__</e>__`: end of line\n",
"- `__UNK__`: a token representing any word that is not in the vocabulary."
],
"id": "ac4D5WSUAVub"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rQaHKs7kAVuc"
},
"outputs": [],
"source": [
"# Build the vocabulary\n",
"# Unit Test Note - There is no test set here only train/val\n",
"def get_vocab(train_x):\n",
"\n",
"    # Include special tokens\n",
"    # started with pad, end of line and unk tokens\n",
"    Vocab = {'__PAD__': 0, '__</e>__': 1, '__UNK__': 2}\n",
"\n",
"    # Note that we build vocab using training data\n",
"    for tweet in train_x:\n",
"        processed_tweet = process_tweet(tweet)\n",
"        for word in processed_tweet:\n",
"            if word not in Vocab:\n",
"                Vocab[word] = len(Vocab)\n",
"\n",
"    return Vocab\n",
"\n",
"Vocab = get_vocab(train_x)\n",
"\n",
"print(\"Total words in vocab are\",len(Vocab))\n",
"display(Vocab)"
],
"id": "rQaHKs7kAVuc"
},
{
"cell_type": "markdown",
"metadata": {
"id": "0x8pND8tAVuf"
},
"source": [
"\n",
"### 2.3 - Converting a Tweet to a Tensor\n",
"\n",
"Write a function that will convert each tweet to a tensor (a list of unique integer IDs representing the processed tweet).\n",
"- Note, the returned data type will be a **regular Python `list()`**\n",
"    - You won't use TensorFlow in this function\n",
"    - You also won't use a numpy array\n",
"    - You also won't use a trax.fastmath.numpy array\n",
"- For words in the tweet that are not in the vocabulary, set them to the unique ID for the token `__UNK__`.\n",
"\n",
"##### Example\n",
"Input a tweet:\n",
"```CPP\n",
"'@happypuppy, is Maria happy?'\n",
"```\n",
"\n",
"The `tweet_to_tensor` function will first convert the tweet into a list of tokens (including only relevant words):\n",
"```CPP\n",
"['maria', 'happi']\n",
"```\n",
"\n",
"Then it will convert each word into its unique integer ID:\n",
"\n",
"```CPP\n",
"[2, 56]\n",
"```\n",
"- Notice that the word \"maria\" is not in the vocabulary, so it is assigned the unique integer associated with the `__UNK__` token, because it is considered \"unknown.\"\n",
"\n"
],
"id": "0x8pND8tAVuf"
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "Ft1zNGMaAVuf"
},
"outputs": [],
"source": [
"# CANDIDATE FOR TABLE TEST - If a student forgets to check for unk, there might be errors or just wrong values in the list.\n",
"# We can add those errors to check in autograder through tabled test or here student facing user test.\n",
"\n",
"# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: tweet_to_tensor\n",
"def tweet_to_tensor(tweet, vocab_dict, unk_token='__UNK__', verbose=False):\n",
"    '''\n",
"    Input:\n",
"        tweet - A string containing a tweet\n",
"        vocab_dict - The words dictionary\n",
"        unk_token - The special string for unknown tokens\n",
"        verbose - Print info during runtime\n",
"    Output:\n",
"        tensor_l - A python list of the unique integer IDs representing the processed tweet\n",
"\n",
"    '''\n",
"    ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
"    # Process the tweet into a list of words\n",
"    # where only important words are kept (stop words removed)\n",
"    word_l = process_tweet(tweet)\n",
"\n",
"    if verbose:\n",
"        print(\"List of words from the processed tweet:\")\n",
"        print(word_l)\n",
"\n",
"    # Initialize the list that will contain the unique integer IDs of each word\n",
"    tensor_l = []\n",
"\n",
"    # Get the unique integer ID of the __UNK__ token\n",
"    unk_ID = vocab_dict[unk_token]\n",
"\n",
"    if verbose:\n",
"        print(f\"The unique integer ID for the unk_token is {unk_ID}\")\n",
"\n",
"    # for each word in the list:\n",
"    for word in word_l:\n",
"\n",
"        # Get the unique integer ID.\n",
"        # If the word doesn't exist in the vocab dictionary,\n",
"        # use the unique ID for __UNK__ instead.\n",
"        word_ID = vocab_dict.get(word, unk_ID)\n",
"\n",
"        # Append the unique integer ID to the tensor list.\n",
"        tensor_l.append(word_ID)\n",
"    ### END CODE HERE ###\n",
"\n",
"    return tensor_l"
],
"id": "Ft1zNGMaAVuf"
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"id": "ze0Zx_5UjCWU",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "478c5e29-2973-4c8a-ba61-5f80b5c9efc2"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Actual tweet is\n",
" Bro:U wan cut hair anot,ur hair long Liao bo\n",
"Me:since ord liao,take it easy lor treat as save $ leave it longer :)\n",
"Bro:LOL Sibei xialan\n",
"\n",
"Tensor of tweet:\n",
" [1064, 136, 478, 2351, 744, 8149, 1122, 744, 53, 2, 2671, 790, 2, 2, 348, 600, 2, 3488, 1016, 596, 4558, 9, 1064, 157, 2, 2]\n"
]
}
],
"source": [
"print(\"Actual tweet is\\n\", val_pos[0])\n",
"print(\"\\nTensor of tweet:\\n\", tweet_to_tensor(val_pos[0], vocab_dict=Vocab))"
],
"id": "ze0Zx_5UjCWU"
},
{
"cell_type": "markdown",
"metadata": {
"id": "rwAZZIYYAVuj"
},
"source": [
"\n",
"### 2.4 - Creating a Batch Generator\n",
"\n",
"In natural language processing, and in AI in general, models are usually trained on batches of examples.\n",
"- If you trained a model on one example at a time instead of on batches, training would take a very long time.\n",
"- You will now build a data generator that takes in the positive/negative tweets and returns a batch of training examples. It returns the model inputs, the targets (positive or negative labels) and the weight for each target (for example, this lets us treat some examples as more important to get right than others, but commonly the weights will all be 1.0).\n",
"\n",
"Once you create the generator, you could include it in a for loop:\n",
"\n",
"```CPP\n",
"for batch_inputs, batch_targets, batch_example_weights in data_generator:\n",
"    ...\n",
"```\n",
"\n",
"You can also get a single batch like this:\n",
"\n",
"```CPP\n",
"batch_inputs, batch_targets, batch_example_weights = next(data_generator)\n",
"```\n",
"The generator returns the next batch each time it's called.\n",
"- This generator returns the data in a format (tensors) that you could directly use in your model.\n",
"- It returns a triplet: the inputs, targets, and loss weights:\n",
"    - Inputs is a tensor that contains the batch of tweets we put into the model.\n",
"    - Targets is the corresponding batch of labels that we train to generate.\n",
"    - Loss weights here are just 1s with the same shape as targets. Next week, you will use them to mask input padding."
],
"id": "rwAZZIYYAVuj"
},
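{
"cell_type": "markdown",
"metadata": {
"id": "padBatchSketch01"
},
"source": [
"Tweets in a batch have different lengths, so the generator below pads the shorter ones with zeros before stacking them into one array. The following is a minimal sketch of that step in plain Python, not the graded implementation: the helper name `pad_batch` and the token IDs are made up for the illustration, and the pad value 0 is assumed to match the `__PAD__` index in the vocabulary.\n",
"\n",
"```python\n",
"def pad_batch(batch, pad_id=0):\n",
"    # Hypothetical helper: right-pad every list of IDs to the longest length in the batch\n",
"    max_len = max(len(t) for t in batch)\n",
"    return [t + [pad_id] * (max_len - len(t)) for t in batch]\n",
"\n",
"# Made-up token IDs standing in for two positive and two negative tweets\n",
"batch = [[3, 4, 5], [10, 11], [57, 29, 37, 8], [2]]\n",
"padded = pad_batch(batch)\n",
"\n",
"# First half of the batch is positive (1), second half negative (0)\n",
"targets = [1] * 2 + [0] * 2\n",
"\n",
"# Every example gets the same weight\n",
"example_weights = [1] * len(targets)\n",
"\n",
"for row, target, weight in zip(padded, targets, example_weights):\n",
"    print(row, target, weight)\n",
"```\n",
"\n",
"Padding to the longest tweet in the batch, rather than to a fixed global length, keeps each batch only as wide as it needs to be."
],
"id": "padBatchSketch01"
},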
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"id": "fPd9HNT7AVuk"
},
"outputs": [],
"source": [
"# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED: Data generator\n",
"def data_generator(data_pos, data_neg, batch_size, loop, vocab_dict, shuffle=False):\n",
"    '''\n",
"    Input:\n",
"        data_pos - Set of positive examples\n",
"        data_neg - Set of negative examples\n",
"        batch_size - number of samples per batch. Must be even\n",
"        loop - True or False\n",
"        vocab_dict - The words dictionary\n",
"        shuffle - Shuffle the data order\n",
"    Yield:\n",
"        inputs - Subset of positive and negative examples\n",
"        targets - The corresponding labels for the subset\n",
"        example_weights - An array specifying the importance of each example\n",
"\n",
"    '''\n",
"\n",
"    # make sure the batch size is an even number\n",
"    # to allow an equal number of positive and negative samples\n",
"    assert batch_size % 2 == 0\n",
"\n",
"    # Number of positive examples in each batch is half of the batch size\n",
"    # same with number of negative examples in each batch\n",
"    n_to_take = batch_size // 2\n",
"\n",
"    # Use pos_index to walk through the data_pos array\n",
"    # same with neg_index and data_neg\n",
"    pos_index = 0\n",
"    neg_index = 0\n",
"\n",
"    len_data_pos = len(data_pos)\n",
"    len_data_neg = len(data_neg)\n",
"\n",
"    # Get an array with the data indexes\n",
"    pos_index_lines = list(range(len_data_pos))\n",
"    neg_index_lines = list(range(len_data_neg))\n",
"\n",
"    # shuffle lines if shuffle is set to True\n",
"    if shuffle:\n",
"        rnd.shuffle(pos_index_lines)\n",
"        rnd.shuffle(neg_index_lines)\n",
"\n",
"    stop = False\n",
"\n",
"    # Loop indefinitely\n",
"    while not stop:\n",
"\n",
"        # create a batch with positive and negative examples\n",
"        batch = []\n",
"\n",
"        # First part: Pack n_to_take positive examples\n",
"\n",
"        # Start from 0 and increment i up to n_to_take\n",
"        for i in range(n_to_take):\n",
"\n",
"            # If the positive index goes past the positive dataset,\n",
"            if pos_index >= len_data_pos:\n",
"\n",
"                # If loop is set to False, break once we reach the end of the dataset\n",
"                if not loop:\n",
"                    stop = True\n",
"                    break\n",
"                # If user wants to keep re-using the data, reset the index\n",
"                pos_index = 0\n",
"                if shuffle:\n",
"                    # Shuffle the index of the positive sample\n",
"                    rnd.shuffle(pos_index_lines)\n",
"\n",
"            # get the tweet at pos_index\n",
"            tweet = data_pos[pos_index_lines[pos_index]]\n",
"\n",
"            # convert the tweet into tensors of integers representing the processed words\n",
"            tensor = tweet_to_tensor(tweet, vocab_dict)\n",
"\n",
"            # append the tensor to the batch list\n",
"            batch.append(tensor)\n",
"\n",
"            # Increment pos_index by one\n",
"            pos_index = pos_index + 1\n",
"\n",
"        ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
"\n",
"        # Second part: Pack n_to_take negative examples\n",
"\n",
"        # Using the same batch list, start from 0 and increment i up to n_to_take\n",
"        for i in range(n_to_take):\n",
"\n",
"            # If the negative index goes past the negative dataset,\n",
"            if neg_index >= len_data_neg:\n",
"\n",
"                # If loop is set to False, break once we reach the end of the dataset\n",
"                if not loop:\n",
"                    stop = True\n",
"                    break\n",
"\n",
"                # If user wants to keep re-using the data, reset the index\n",
"                neg_index = 0\n",
"\n",
"                if shuffle:\n",
"                    # Shuffle the index of the negative sample\n",
"                    rnd.shuffle(neg_index_lines)\n",
"\n",
"            # get the tweet at neg_index\n",
"            tweet = data_neg[neg_index_lines[neg_index]]\n",
"\n",
"            # convert the tweet into tensors of integers representing the processed words\n",
"            tensor = tweet_to_tensor(tweet, vocab_dict)\n",
"\n",
"            # append the tensor to the batch list\n",
"            batch.append(tensor)\n",
"\n",
"            # Increment neg_index by one\n",
"            neg_index = neg_index + 1\n",
"\n",
"        ### END CODE HERE ###\n",
"\n",
"        if stop:\n",
"            break\n",
"\n",
"        # Get the max tweet length (the length of the longest tweet)\n",
"        # (you will pad all shorter tweets to have this length)\n",
"        max_len = max([len(t) for t in batch])\n",
"\n",
"        # Initialize tensor_pad_l, which will\n",
"        # store the padded versions of the tensors\n",
"        tensor_pad_l = []\n",
"        # Pad shorter tweets with zeros\n",
"        for tensor in batch:\n",
"\n",
"            ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
"            # Get the number of positions to pad for this tensor so that it will be max_len long\n",
"            n_pad = max_len - len(tensor)\n",
"\n",
"            # Generate a list of zeros, with length n_pad\n",
"            pad_l = [0] * n_pad\n",
"\n",
"            # concatenate the tensor and the list of padded zeros\n",
"            tensor_pad = tensor + pad_l\n",
"\n",
"            # append the padded tensor to the list of padded tensors\n",
"            tensor_pad_l.append(tensor_pad)\n",
"\n",
"        # convert the list of padded tensors to a numpy array\n",
"        # and store this as the model inputs\n",
"        inputs = np.array(tensor_pad_l)\n",
"\n",
"        # Generate the list of targets for the positive examples (a list of ones)\n",
"        # The length is the number of positive examples in the batch\n",
"        target_pos = [1] * n_to_take\n",
"\n",
"        # Generate the list of targets for the negative examples (a list of zeros)\n",
"        # The length is the number of negative examples in the batch\n",
"        target_neg = [0] * n_to_take\n",
"\n",
"        # Concatenate the positive and negative targets\n",
"        target_l = target_pos + target_neg\n",
"\n",
"        # Convert the target list into a numpy array\n",
"        targets = np.array(target_l)\n",
"\n",
"        # Example weights: Treat all examples as equally important.\n",
"        example_weights = np.ones(targets.shape, dtype=int)\n",
"\n",
"        ### END CODE HERE ###\n",
"\n",
"        # note we use yield and not return\n",
"        yield inputs, targets, example_weights"
],
"id": "fPd9HNT7AVuk"
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"id": "iIwM4YHtAVum",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b9bf707a-fa6d-460b-db55-3955e040c8cf"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Inputs: [[2005 4450 3200 9 0 0 0 0 0 0 0]\n",
" [4953 566 2000 1453 5173 3498 141 3498 130 458 9]\n",
" [3760 109 136 582 2929 3968 0 0 0 0 0]\n",
" [ 249 3760 0 0 0 0 0 0 0 0 0]]\n",
"Targets: [1 1 0 0]\n",
"Example Weights: [1 1 1 1]\n"
]
}
],
"source": [
"# Set the random number generator for the shuffle procedure\n",
"rnd.seed(30)\n",
"\n",
"# Create the training data generator\n",
"def train_generator(batch_size, train_pos, train_neg, vocab_dict, loop=True, shuffle=False):\n",
"    return data_generator(train_pos, train_neg, batch_size, loop, vocab_dict, shuffle)\n",
"\n",
"# Create the validation data generator\n",
"def val_generator(batch_size, val_pos, val_neg, vocab_dict, loop=True, shuffle=False):\n",
"    return data_generator(val_pos, val_neg, batch_size, loop, vocab_dict, shuffle)\n",
"\n",
"# Create the test data generator (it reads the validation data and does not loop by default)\n",
"def test_generator(batch_size, val_pos, val_neg, vocab_dict, loop=False, shuffle=False):\n",
"    return data_generator(val_pos, val_neg, batch_size, loop, vocab_dict, shuffle)\n",
"\n",
"# Get a batch from the train_generator and inspect.\n",
"inputs, targets, example_weights = next(train_generator(4, train_pos, train_neg, Vocab, shuffle=True))\n",
"\n",
"# this will print a list of 4 tensors padded with zeros\n",
"print(f'Inputs: {inputs}')\n",
"print(f'Targets: {targets}')\n",
"print(f'Example Weights: {example_weights}')"
],
"id": "iIwM4YHtAVum"
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"id": "mcDOyrx9jCWh",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ca374f99-93ed-44b7-90c7-1c3a6160a6a0"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The inputs shape is (4, 14)\n",
"input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1; example weights 1\n",
"input tensor: [10 11 12 13 14 15 16 17 18 19 20 9 21 22]; target 1; example weights 1\n",
"input tensor: [5737 2900 3760 0 0 0 0 0 0 0 0 0 0 0]; target 0; example weights 1\n",
"input tensor: [ 857 255 3651 5738 306 4457 566 1229 2766 327 1201 3760 0 0]; target 0; example weights 1\n"
]
}
],
"source": [
"# Test the train_generator\n",
"\n",
"# Create a data generator for training data,\n",
"# which produces batches of size 4 (for tensors and their respective targets)\n",
"tmp_data_gen = train_generator(batch_size = 4, train_pos=train_pos, train_neg=train_neg, vocab_dict=Vocab)\n",
"\n",
"# Call the data generator to get one batch and its targets\n",
"tmp_inputs, tmp_targets, tmp_example_weights = next(tmp_data_gen)\n",
"\n",
"print(f\"The inputs shape is {tmp_inputs.shape}\")\n",
"for i,t in enumerate(tmp_inputs):\n",
" print(f\"input tensor: {t}; target {tmp_targets[i]}; example weights {tmp_example_weights[i]}\")"
],
"id": "mcDOyrx9jCWh"
},
{
"cell_type": "markdown",
"metadata": {
"id": "TcWUXFaPzS-m"
},
"source": [
"\n",
"## 3 - Defining Classes\n",
"\n",
"### 3.1 - ReLU Class\n",
"You will now implement the ReLU activation function in a class below. The ReLU function is defined as follows:\n",
"\n",
"$$ \\mathrm{ReLU}(x) = \\mathrm{max}(0,x) $$\n"
],
"id": "TcWUXFaPzS-m"
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"id": "VGE5zZ5mzF9x"
},
"outputs": [],
"source": [
"# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: Relu\n",
"class Relu(Layer):\n",
"    \"\"\"Relu activation function implementation\"\"\"\n",
"    def forward(self, x):\n",
"        '''\n",
"        Input:\n",
"            - x (a numpy array): the input\n",
"        Output:\n",
"            - activation (numpy array): all positive or 0 version of x\n",
"        '''\n",
"        ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
"\n",
"        activation = np.maximum(x, 0)\n",
"\n",
"        ### END CODE HERE ###\n",
"\n",
"        return activation"
],
"id": "VGE5zZ5mzF9x"
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"id": "hVQ3YtoZ1uYP",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "fdff6b5f-8116-45ca-f7f9-8014efd23093"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Test data is:\n",
"[[-2. -1. 0.]\n",
" [ 0. 1. 2.]]\n",
"Output of Relu is:\n",
"[[0. 0. 0.]\n",
" [0. 1. 2.]]\n"
]
}
],
"source": [
"# Test your relu function\n",
"x = np.array([[-2.0, -1.0, 0.0], [0.0, 1.0, 2.0]], dtype=float)\n",
"relu_layer = Relu()\n",
"print(\"Test data is:\")\n",
"print(x)\n",
"print(\"Output of Relu is:\")\n",
"print(relu_layer(x))"
],
"id": "hVQ3YtoZ1uYP"
},
{
"cell_type": "markdown",
"metadata": {
"id": "XepjDxCQ1G8p"
},
"source": [
"\n",
"### 3.2 - Dense Class\n",
"\n",
"Implement the forward function of the Dense class.\n",
"- The forward function multiplies the input to the layer (`x`) by the weight matrix (`W`)\n",
"\n",
"$$\\mathrm{forward}(\\mathbf{x},\\mathbf{W}) = \\mathbf{xW} $$\n",
"\n",
"- You can use `numpy.dot` to perform the matrix multiplication.\n",
"\n",
"Note that for more efficient code execution, you will use Trax's `fastmath` module, which includes Trax versions of `numpy` and `random`.\n",
"\n",
"Implement the weight initializer `new_weights` function\n",
"- Weights are initialized with a random key.\n",
"- The second parameter is a tuple for the desired shape of the weights (num_rows, num_cols)\n",
"- The number of rows for weights should equal the number of columns in x, because for forward propagation, you will multiply x times weights.\n",
"\n",
"Please use `trax.fastmath.random.normal(key, shape, dtype=tf.float32)` to generate random values for the weight matrix. The key difference between this function\n",
"and the standard `numpy` randomness is the explicit use of random keys, which\n",
"need to be passed. While it can look tedious at first sight to pass the random key everywhere, you will learn in Course 4 why this is very helpful when\n",
"implementing some advanced models.\n",
"- `key` can be generated by calling `random.get_prng(seed=)` and passing in a number for the `seed`.\n",
"- `shape` is a tuple with the desired shape of the weight matrix.\n",
"    - The number of rows in the weight matrix should equal the number of columns in the variable `x`. Since `x` may have 2 dimensions if it represents a single training example (row, col), or three dimensions (batch_size, row, col), get the last dimension from the tuple that holds the dimensions of x.\n",
"    - The number of columns in the weight matrix is the number of units chosen for that dense layer. Look at the `__init__` function to see which variable stores the number of units.\n",
"- `dtype` is the data type of the values in the generated matrix; keep the default of `tf.float32`. In this case, don't explicitly set the dtype (just let it use the default value).\n",
"\n",
"Set the standard deviation of the random values to 0.1\n",
"- The values generated have a mean of 0 and standard deviation of 1.\n",
"- Set the default standard deviation `stdev` to be 0.1 by multiplying each of the values in the weight matrix by the standard deviation."
],
"id": "XepjDxCQ1G8p"
},
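{
"cell_type": "markdown",
"metadata": {
"id": "denseShapeSketch1"
},
"source": [
"The shape rule and the scaling step above can be checked with a small sketch. It is written in plain Python with the standard `random` module as a stand-in for `trax.fastmath` (an assumption made so the snippet runs without Trax); the helper names `init_weights` and `forward` and the seed are made up for the illustration.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"random.seed(0)  # arbitrary seed for the illustration\n",
"\n",
"def init_weights(n_in, n_units, stdev=0.1):\n",
"    # Hypothetical helper: draw N(0, 1) samples and scale them by stdev,\n",
"    # giving a weight matrix with n_in rows and n_units columns\n",
"    return [[stdev * random.gauss(0.0, 1.0) for _ in range(n_units)] for _ in range(n_in)]\n",
"\n",
"def forward(x, w):\n",
"    # Compute xW for a batch x (list of rows) and a weight matrix w\n",
"    n_units = len(w[0])\n",
"    return [[sum(x_i * w[i][j] for i, x_i in enumerate(row)) for j in range(n_units)] for row in x]\n",
"\n",
"x = [[2.0, 7.0, 25.0]]  # one example with 3 features\n",
"\n",
"# Rows of w equal the last dimension of x; columns equal the number of units\n",
"w = init_weights(n_in=len(x[0]), n_units=10)\n",
"out = forward(x, w)\n",
"\n",
"print(len(w), len(w[0]))      # rows and columns of the weight matrix\n",
"print(len(out), len(out[0]))  # rows and columns of the layer output\n",
"```\n",
"\n",
"Multiplying standard normal samples by 0.1 keeps the mean at 0 and shrinks the standard deviation to 0.1, which is the same scaling the `Dense` layer applies."
],
"id": "denseShapeSketch1"
},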
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"id": "6reTe6asjCWt",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 139
},
"outputId": "ea8b1fee-77fd-4786-918d-e7fd172009e5"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The random seed generated by random.get_prng\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([0, 1], dtype=uint32)"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"choose a matrix with 2 rows and 3 columns\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"(2, 3)"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Weight matrix generated with a normal distribution with mean 0 and stdev of 1\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([[ 0.95730704, -0.9699289 , 1.0070665 ],\n",
" [ 0.3661903 , 0.1729483 , 0.29092234]], dtype=float32)"
]
},
"metadata": {}
}
],
"source": [
"# See how the trax.fastmath.random.normal function works\n",
"tmp_key = trax.fastmath.random.get_prng(seed=1)\n",
"print(\"The random seed generated by random.get_prng\")\n",
"display(tmp_key)\n",
"\n",
"print(\"choose a matrix with 2 rows and 3 columns\")\n",
"tmp_shape=(2,3)\n",
"display(tmp_shape)\n",
"\n",
"# Generate a weight matrix\n",
"# Note that you'll get an error if you try to set dtype to tf.float32, where tf is tensorflow\n",
"# Just avoid setting the dtype and allow it to use the default data type\n",
"tmp_weight = trax.fastmath.random.normal(key=tmp_key, shape=tmp_shape)\n",
"\n",
"print(\"Weight matrix generated with a normal distribution with mean 0 and stdev of 1\")\n",
"display(tmp_weight)"
],
"id": "6reTe6asjCWt"
},
{
"cell_type": "markdown",
"metadata": {
"id": "IpiJ87L9jCWw"
},
"source": [
"\n",
"#### Dense\n",
"\n",
"Implement the `Dense` class."
],
"id": "IpiJ87L9jCWw"
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"id": "783FfWt70660"
},
"outputs": [],
"source": [
"# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: Dense\n",
"class Dense(Layer):\n",
"    \"\"\"\n",
"    A dense (fully-connected) layer.\n",
"    \"\"\"\n",
"\n",
"    # __init__ is implemented for you\n",
"    def __init__(self, n_units, init_stdev=0.1):\n",
"\n",
"        # Set the number of units in this layer\n",
"        self._n_units = n_units\n",
"        self._init_stdev = init_stdev\n",
"\n",
"    # Please implement 'forward()'\n",
"    def forward(self, x):\n",
"\n",
"        ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
"\n",
"        # Matrix multiply x and the weight matrix\n",
"        dense = np.dot(x, self.weights)\n",
"\n",
"        ### END CODE HERE ###\n",
"        return dense\n",
"\n",
"    # init_weights\n",
"    def init_weights_and_state(self, input_signature, random_key):\n",
"\n",
"        ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
"        # The input_signature has a .shape attribute that gives the shape as a tuple;\n",
"        # its last dimension is the number of rows needed in the weight matrix\n",
"        input_shape = input_signature.shape[-1]\n",
"\n",
"        # Generate the weight matrix from a normal distribution with mean 0,\n",
"        # scaled to a standard deviation of 'self._init_stdev'\n",
"        w = self._init_stdev * trax.fastmath.random.normal(key=random_key, shape=(input_shape, self._n_units))\n",
"\n",
"        ### END CODE HERE ###\n",
"        self.weights = w\n",
"        return self.weights"
],
"id": "783FfWt70660"
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"id": "vw-z6n8SAVuy",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4f2b0ba3-a8f1-4c38-b4f9-1a2ed00d255e"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Weights are\n",
" [[-0.02837107 0.09368163 -0.10050073 0.14165013 0.10543301 0.09108127\n",
" -0.04265671 0.0986188 -0.05575324 0.0015325 ]\n",
" [-0.2078568 0.05548371 0.09142365 0.05744596 0.07227863 0.01210618\n",
" -0.03237354 0.16234998 0.02450039 -0.13809781]\n",
" [-0.06111237 0.01403725 0.08410043 -0.10943579 -0.1077502 -0.11396457\n",
" -0.0593338 -0.01557651 -0.03832145 -0.11144515]]\n",
"Foward function output is [[-3.0395489 0.9266805 2.5414748 -2.0504727 -1.9769386 -2.5822086\n",
" -1.7952733 0.94427466 -0.89803994 -3.7497485 ]]\n"
]
}
],
"source": [
"# Testing your Dense layer \n",
"dense_layer = Dense(n_units=10) #sets number of units in dense layer\n",
"random_key = trax.fastmath.random.get_prng(seed=0) # sets random seed\n",
"z = np.array([[2.0, 7.0, 25.0]]) # input array \n",
"\n",
"dense_layer.init(z, random_key)\n",
"print(\"Weights are\\n \",dense_layer.weights) #Returns randomly generated weights\n",
"print(\"Foward function output is \", dense_layer(z)) # Returns multiplied values of units and weights"
],
"id": "vw-z6n8SAVuy"
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"id": "OCVwFVacsjfT",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "40d451ac-5f3f-4e00-e35e-fe097a6cf4cc"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Weights are\n",
" [[ 0.10545162 -0.09692886 -0.05946021 -0.00318857 0.24109332]\n",
" [-0.18784492 -0.07847696 -0.03137084 0.03337089 0.17677036]\n",
" [-0.10277646 0.14111719 -0.05084971 -0.05263775 0.05031504]\n",
" [ 0.10549793 -0.00874073 0.07958167 0.26565617 -0.05822906]]\n",
"Foward function output is [[-1.4564111 -0.7315444 0.14366013 1.6651783 1.2354649 ]]\n"
]
}
],
"source": [
"# Testing your Dense layer \n",
"dense_layer = Dense(n_units=5) #sets number of units in dense layer\n",
"random_key = trax.fastmath.random.get_prng(seed=0) # sets random seed\n",
"z = np.array([[-1.0, 10.0, 0.0, 5.0]]) # input array \n",
"\n",
"dense_layer.init(z, random_key)\n",
"print(\"Weights are\\n \",dense_layer.weights) #Returns randomly generated weights\n",
"print(\"Foward function output is \", dense_layer(z)) # Returns multiplied values of units and weights"
],
"id": "OCVwFVacsjfT"
},
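{
"cell_type": "markdown",
"metadata": {
"id": "sketchDenseNumpy"
},
"source": [
"As a rough cross-check of the two tests above, here is a minimal sketch of the same computation in plain NumPy. It is not the graded `Dense` class: the random generator, the seed and the names `rng`, `w` and `out` are assumptions made for illustration, and NumPy is imported as `onp` so that it does not shadow the notebook's `np`. It only shows that an input of shape `(1, 4)` multiplied by a weight matrix of shape `(4, n_units)` gives an output of shape `(1, n_units)`.\n",
"\n",
"```python\n",
"import numpy as onp  # plain NumPy, named onp to avoid shadowing the notebook's np\n",
"\n",
"rng = onp.random.default_rng(0)  # hypothetical seed, unrelated to the Trax PRNG key\n",
"init_stdev = 0.1\n",
"n_units = 5\n",
"\n",
"# One example with four input features: shape (1, 4)\n",
"x = onp.array([[-1.0, 10.0, 0.0, 5.0]])\n",
"\n",
"# One column of weights per unit: shape (n_features, n_units)\n",
"w = init_stdev * rng.standard_normal((x.shape[-1], n_units))\n",
"\n",
"# The forward pass is a single matrix product: shape (1, n_units)\n",
"out = onp.dot(x, w)\n",
"print(w.shape, out.shape)\n",
"```\n",
"\n",
"Because the third input feature is 0, the third row of `w` has no effect on `out` in this example."
],
"id": "sketchDenseNumpy"
},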
{
"cell_type": "markdown",
"metadata": {
"id": "eZEY8vBCgrgy"
},
"source": [
"\n",
"### 3.3 - Model\n",
"\n",
"Now you will implement a classifier using neural networks. Here is the model architecture you will be implementing. \n",
"\n",
"\n",
"\n",
"For the model implementation, you will use the Trax `layers` module, imported as `tl`.\n",
"Note that the second character of `tl` is the lowercase of letter `L`, not the number 1. Trax layers are very similar to the ones you implemented above,\n",
"but in addition to trainable weights also have a non-trainable state.\n",
"State is used in layers like batch normalization and for inference, you will learn more about it in course 4.\n",
"\n",
"First, look at the code of the Trax Dense layer and compare to your implementation above.\n",
"- [tl.Dense](https://github.com/google/trax/blob/master/trax/layers/core.py#L29): Trax Dense layer implementation\n",
"\n",
"One other important layer that you will use a lot is one that allows to execute one layer after another in sequence.\n",
"- [tl.Serial](https://github.com/google/trax/blob/master/trax/layers/combinators.py#L26): Combinator that applies layers serially. \n",
" - You can pass in the layers as arguments to `Serial`, separated by commas. \n",
" - For example: `tl.Serial(tl.Embeddings(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))`\n",
"\n",
"Please use the `help` function to view documentation for each layer."
],
"id": "eZEY8vBCgrgy"
},
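{
"cell_type": "markdown",
"metadata": {
"id": "sketchSerialCompose"
},
"source": [
"To build intuition for what applying layers serially means, here is a conceptual sketch in plain Python. It is not how `tl.Serial` is implemented (the real combinator also manages weights and state); the helper `serial` and the toy functions `double` and `add_one` are hypothetical names used only for illustration.\n",
"\n",
"```python\n",
"def serial(*layers):\n",
"    # Return a function that feeds the output of each layer into the next one\n",
"    def run(x):\n",
"        for layer in layers:\n",
"            x = layer(x)\n",
"        return x\n",
"    return run\n",
"\n",
"\n",
"def double(v):\n",
"    return 2 * v\n",
"\n",
"\n",
"def add_one(v):\n",
"    return v + 1\n",
"\n",
"\n",
"model_fn = serial(double, add_one)\n",
"print(model_fn(3))  # double first, then add one: 2 * 3 + 1 = 7\n",
"```\n",
"\n",
"The order of the arguments matters: `serial(add_one, double)` would compute `2 * (3 + 1)` instead."
],
"id": "sketchSerialCompose"
},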
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RpbiDzN9jCW2"
},
"outputs": [],
"source": [
"# View documentation on tl.Dense\n",
"#help(tl.Dense)"
],
"id": "RpbiDzN9jCW2"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Hrblw_uJ4zmF"
},
"outputs": [],
"source": [
"# View documentation on tl.Serial\n",
"#help(tl.Serial)"
],
"id": "Hrblw_uJ4zmF"
},
{
"cell_type": "markdown",
"metadata": {
"id": "n6PptsvwjCW3"
},
"source": [
"- [tl.Embedding](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L113): Layer constructor function for an embedding layer. \n",
" - `tl.Embedding(vocab_size, d_feature)`.\n",
" - `vocab_size` is the number of unique words in the given vocabulary.\n",
" - `d_feature` is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example). "
],
"id": "n6PptsvwjCW3"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Y5FAphBWjCW4"
},
"outputs": [],
"source": [
"# View documentation for tl.Embedding\n",
"#help(tl.Embedding)"
],
"id": "Y5FAphBWjCW4"
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"id": "Bi4OhkZbjCW6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "4ef0e36b-efc6-4c17-d44e-773fdaaec967"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"Embedding_3_2"
]
},
"metadata": {}
}
],
"source": [
"# An example of and embedding layer\n",
"rnd.seed(31)\n",
"tmp_embed = tl.Embedding(d_feature=2, vocab_size=3)\n",
"display(tmp_embed)"
],
"id": "Bi4OhkZbjCW6"
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"id": "l8-wXwWvsjfV",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 156
},
"outputId": "1dc1507c-3df4-45ce-b214-198d4e6982b3"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Shape of returned array is (2, 3, 2)\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([[[-0.09254155, 1.1765094 ],\n",
" [ 1.0511576 , 0.7154667 ],\n",
" [ 0.7439485 , -0.81590366]],\n",
"\n",
" [[ 0.7439485 , -0.81590366],\n",
" [ 0.7439485 , -0.81590366],\n",
" [-0.09254155, 1.1765094 ]]], dtype=float32)"
]
},
"metadata": {}
}
],
"source": [
"# Let's assume as an example, a batch of two lists\n",
"# each list represents a set of tokenized words.\n",
"tmp_in_arr = np.array([[0,1,2],\n",
" [3,2,0]\n",
" ])\n",
"\n",
"# In order to use the layer, we need to initialize its signature\n",
"tmp_embed.init(trax.shapes.signature(tmp_in_arr))\n",
"\n",
"# Embedding layer will return an array of shape (batch size, vocab size, d_feature)\n",
"tmp_embedded_arr = tmp_embed(tmp_in_arr)\n",
"\n",
"print(f\"Shape of returned array is {tmp_embedded_arr.shape}\")\n",
"display(tmp_embedded_arr)"
],
"id": "l8-wXwWvsjfV"
},
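{
"cell_type": "markdown",
"metadata": {
"id": "sketchEmbeddingLookup"
},
"source": [
"Conceptually, an embedding layer is a table lookup: each token index selects one row of a `(vocab_size, d_feature)` matrix. The sketch below illustrates this with plain NumPy (imported as `onp` so it does not shadow the notebook's `np`). The table values and the names `table`, `tokens` and `embedded` are made up for illustration; this is not the Trax implementation.\n",
"\n",
"```python\n",
"import numpy as onp\n",
"\n",
"vocab_size, d_feature = 3, 2\n",
"\n",
"# A toy embedding table: row i is the vector for token index i\n",
"table = onp.arange(vocab_size * d_feature, dtype=onp.float32).reshape(vocab_size, d_feature)\n",
"\n",
"# A batch of two token sequences of length 3 (all indices within the vocabulary)\n",
"tokens = onp.array([[0, 1, 2],\n",
"                    [2, 2, 0]])\n",
"\n",
"# Integer-array indexing looks up one row per token\n",
"embedded = table[tokens]\n",
"print(embedded.shape)  # (batch size, sequence length, d_feature) = (2, 3, 2)\n",
"```\n",
"\n",
"The middle dimension of the result is the sequence length, which only coincidentally equals the vocabulary size in this toy example."
],
"id": "sketchEmbeddingLookup"
},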
{
"cell_type": "markdown",
"metadata": {
"id": "OD0XVH5jjCW8"
},
"source": [
"- [tl.Mean](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L276): Calculates means across an axis. In this case, please choose axis = 1 to get an average embedding vector (an embedding vector that is an average of all words in the vocabulary). \n",
"- For example, if the embedding matrix is 300 elements and vocab size is 10,000 words, taking the mean of the embedding matrix along axis=1 will yield a vector of 300 elements."
],
"id": "OD0XVH5jjCW8"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "CO0uMOOmjCW8"
},
"outputs": [],
"source": [
"# view the documentation for tl.mean\n",
"#help(tl.Mean)"
],
"id": "CO0uMOOmjCW8"
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"id": "eSS-_d38jCW-",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 86
},
"outputId": "234c1e18-bc07-4ede-8613-9810764f7b37"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The mean along axis 0 creates a vector whose length equals the vocabulary size\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([2.5, 3.5, 4.5], dtype=float32)"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"The mean along axis 1 creates a vector whose length equals the number of elements in a word embedding\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([2., 5.], dtype=float32)"
]
},
"metadata": {}
}
],
"source": [
"# Pretend the embedding matrix uses \n",
"# 2 elements for embedding the meaning of a word\n",
"# and has a vocabulary size of 3\n",
"# So it has shape (2,3)\n",
"tmp_embed = np.array([[1,2,3,],\n",
" [4,5,6]\n",
" ])\n",
"\n",
"# take the mean along axis 0\n",
"print(\"The mean along axis 0 creates a vector whose length equals the vocabulary size\")\n",
"display(np.mean(tmp_embed,axis=0))\n",
"\n",
"print(\"The mean along axis 1 creates a vector whose length equals the number of elements in a word embedding\")\n",
"display(np.mean(tmp_embed,axis=1))"
],
"id": "eSS-_d38jCW-"
},
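{
"cell_type": "markdown",
"metadata": {
"id": "sketchMeanAxis1"
},
"source": [
"Inside the classifier, the mean layer receives the 3-D output of the embedding layer rather than a 2-D matrix. The sketch below, written with plain NumPy (imported as `onp`) and made-up values, shows that averaging an array of shape `(batch size, sequence length, d_feature)` along axis 1 leaves one averaged embedding vector per example.\n",
"\n",
"```python\n",
"import numpy as onp\n",
"\n",
"# Toy embedded batch: 2 examples, 3 tokens each, 2 embedding elements per token\n",
"embedded = onp.arange(12, dtype=onp.float32).reshape(2, 3, 2)\n",
"\n",
"# Average over the token axis (axis 1)\n",
"avg = embedded.mean(axis=1)\n",
"\n",
"print(avg.shape)  # (2, 2): one d_feature-sized vector per example\n",
"print(avg)        # [[2. 3.] [8. 9.]]\n",
"```"
],
"id": "sketchMeanAxis1"
},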
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0UsQjFrAjCXF"
},
"outputs": [],
"source": [
"#help(tl.LogSoftmax)"
],
"id": "0UsQjFrAjCXF"
},
{
"cell_type": "markdown",
"metadata": {
"id": "IGgTfJ-csjfX"
},
"source": [
"**Online documentation**\n",
"\n",
"- [tl.Dense](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Dense)\n",
"\n",
"- [tl.Serial](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#module-trax.layers.combinators)\n",
"\n",
"- [tl.Embedding](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Embedding)\n",
"\n",
"- [tl.Mean](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Mean)\n",
"\n",
"- [tl.LogSoftmax](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.LogSoftmax)"
],
"id": "IGgTfJ-csjfX"
},
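{
"cell_type": "markdown",
"metadata": {
"id": "sketchLogSoftmax"
},
"source": [
"The last layer of the model is a log softmax, so the model outputs log probabilities rather than probabilities. As a minimal sketch of the underlying formula (not the Trax code), the function below computes `log_softmax(x) = x - log(sum(exp(x)))` with plain NumPy, imported as `onp`. The name `log_softmax` and the example logits are assumptions for illustration; subtracting the row maximum first is a common trick for numerical stability.\n",
"\n",
"```python\n",
"import numpy as onp\n",
"\n",
"\n",
"def log_softmax(x):\n",
"    # Shift by the row maximum so that exp() cannot overflow\n",
"    shifted = x - x.max(axis=-1, keepdims=True)\n",
"    return shifted - onp.log(onp.exp(shifted).sum(axis=-1, keepdims=True))\n",
"\n",
"\n",
"logits = onp.array([[2.0, 0.0],\n",
"                    [0.0, 0.0]])\n",
"log_probs = log_softmax(logits)\n",
"\n",
"# Exponentiating recovers probabilities, and each row sums to 1\n",
"probs = onp.exp(log_probs)\n",
"print(probs.sum(axis=-1))\n",
"```\n",
"\n",
"Equal logits, as in the second row, give equal probabilities of 0.5 for each class."
],
"id": "sketchLogSoftmax"
},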
{
"cell_type": "markdown",
"metadata": {
"id": "W8ONXnJsjCXH"
},
"source": [
"\n",
"### Exercise 5 - classifier\n",
"Implement the classifier function. "
],
"id": "W8ONXnJsjCXH"
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"id": "Wh33Hk8lgrgz"
},
"outputs": [],
"source": [
"# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: classifier\n",
"def classifier(vocab_size=9088, embedding_dim=256, output_dim=2, mode='train'):\n",
" \n",
" ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
" \n",
" # create embedding layer\n",
" embed_layer = tl.Embedding( \n",
" vocab_size=vocab_size, # Size of the vocabulary\n",
" d_feature=embedding_dim # Embedding dimension\n",
" ) \n",
" \n",
" # Create a mean layer, to create an \"average\" word embedding\n",
" mean_layer = tl.Mean(axis=1)\n",
" \n",
" # Create a dense layer, one unit for each output\n",
" dense_output_layer = tl.Dense(n_units = output_dim)\n",
" \n",
" # Create the log softmax layer (no parameters needed)\n",
" log_softmax_layer = tl.LogSoftmax()\n",
" \n",
" # Use tl.Serial to combine all layers\n",
" # and create the classifier\n",
" # of type trax.layers.combinators.Serial\n",
" model = tl.Serial( \n",
" embed_layer, # embedding layer\n",
" mean_layer, # mean layer\n",
" dense_output_layer, # dense output layer\n",
" log_softmax_layer # log softmax layer\n",
" ) \n",
" ### END CODE HERE ###\n",
" \n",
" # return the model of type\n",
" return model"
],
"id": "Wh33Hk8lgrgz"
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"id": "OwJCu3e9jCXK"
},
"outputs": [],
"source": [
"tmp_model = classifier(vocab_size=len(Vocab))"
],
"id": "OwJCu3e9jCXK"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZsMzvK8YjCXM"
},
"outputs": [],
"source": [
"print(type(tmp_model))\n",
"display(tmp_model)"
],
"id": "ZsMzvK8YjCXM"
},
{
"cell_type": "markdown",
"metadata": {
"id": "1FaugA_7grg6"
},
"source": [
"\n",
"## 4 - Training\n",
"\n",
"To train a model on a task, Trax defines an abstraction [`trax.supervised.training.TrainTask`](https://trax-ml.readthedocs.io/en/latest/trax.supervised.html#trax.supervised.training.TrainTask) which packages the train data, loss and optimizer (among other things) together into an object.\n",
"\n",
"Similarly to evaluate a model, Trax defines an abstraction [`trax.supervised.training.EvalTask`](https://trax-ml.readthedocs.io/en/latest/trax.supervised.html#trax.supervised.training.EvalTask) which packages the eval data and metrics (among other things) into another object.\n",
"\n",
"The final piece tying things together is the [`trax.supervised.training.Loop`](https://trax-ml.readthedocs.io/en/latest/trax.supervised.html#trax.supervised.training.Loop) abstraction that is a very simple and flexible way to put everything together and train the model, all the while evaluating it and saving checkpoints.\n",
"Using `Loop` will save you a lot of code compared to always writing the training loop by hand, like you did in courses 1 and 2. More importantly, you are less likely to have a bug in that code that would ruin your training."
],
"id": "1FaugA_7grg6"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UGgKw03jjCXP"
},
"outputs": [],
"source": [
"# View documentation for trax.supervised.training.TrainTask\n",
"#help(trax.supervised.training.TrainTask)"
],
"id": "UGgKw03jjCXP"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Tr2MmdWDn6hV"
},
"outputs": [],
"source": [
"# View documentation for trax.supervised.training.EvalTask\n",
"#help(trax.supervised.training.EvalTask)"
],
"id": "Tr2MmdWDn6hV"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XkUVMzVXn_8f"
},
"outputs": [],
"source": [
"# View documentation for trax.supervised.training.Loop\n",
"#help(trax.supervised.training.Loop)"
],
"id": "XkUVMzVXn_8f"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ooekq1F305bt"
},
"outputs": [],
"source": [
"# View optimizers that you could choose from\n",
"#help(trax.optimizers)"
],
"id": "Ooekq1F305bt"
},
{
"cell_type": "markdown",
"metadata": {
"id": "OmR3BhV41Cxs"
},
"source": [
"Notice some available optimizers include:\n",
"```CPP\n",
" adafactor\n",
" adam\n",
" momentum\n",
" rms_prop\n",
" sm3\n",
"```"
],
"id": "OmR3BhV41Cxs"
},
{
"cell_type": "markdown",
"metadata": {
"id": "HA01H6K7grg_"
},
"source": [
"\n",
"### 4.1 Training the Model\n",
"\n",
"Now you are going to train your model. \n",
"\n",
"Let's define the `TrainTask`, `EvalTask` and `Loop` in preparation to train the model."
],
"id": "HA01H6K7grg_"
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"deletable": false,
"editable": false,
"id": "ogMtJgHSoiZj"
},
"outputs": [],
"source": [
"# PLEASE, DO NOT MODIFY OR DELETE THIS CELL\n",
"from trax.supervised import training\n",
"\n",
"def get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, vocab_dict, loop, batch_size = 16):\n",
" \n",
" rnd.seed(271)\n",
"\n",
" train_task = training.TrainTask(\n",
" labeled_data=train_generator(batch_size, train_pos\n",
" , train_neg, vocab_dict, loop\n",
" , shuffle = True),\n",
" loss_layer=tl.WeightedCategoryCrossEntropy(),\n",
" optimizer=trax.optimizers.Adam(0.01),\n",
" n_steps_per_checkpoint=10,\n",
" )\n",
"\n",
" eval_task = training.EvalTask(\n",
" labeled_data=val_generator(batch_size, val_pos\n",
" , val_neg, vocab_dict, loop\n",
" , shuffle = True), \n",
" metrics=[tl.WeightedCategoryCrossEntropy(), tl.WeightedCategoryAccuracy()],\n",
" )\n",
" \n",
" return train_task, eval_task\n",
" \n",
"\n",
"train_task, eval_task = get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, Vocab, True, batch_size = 16)\n",
"model = classifier()"
],
"id": "ogMtJgHSoiZj"
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"id": "y-SS2AwZsjfb",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "be9a5a8c-6c06-4f76-fc1f-c4a09214f2ab"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Serial[\n",
" Embedding_9088_256\n",
" Mean\n",
" Dense_2\n",
" LogSoftmax\n",
"]"
]
},
"metadata": {},
"execution_count": 54
}
],
"source": [
"model"
],
"id": "y-SS2AwZsjfb"
},
{
"cell_type": "markdown",
"metadata": {
"id": "R_sw8EGd0Sjk"
},
"source": [
"This defines a model trained using [`tl.WeightedCategoryCrossEntropy`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.metrics.WeightedCategoryCrossEntropy) optimized with the [`trax.optimizers.Adam`](https://trax-ml.readthedocs.io/en/latest/trax.optimizers.html#trax.optimizers.adam.Adam) optimizer, all the while tracking the accuracy using [`tl.WeightedCategoryAccuracy`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.metrics.WeightedCategoryAccuracy) metric. We also track `tl.WeightedCategoryCrossEntropy` on the validation set."
],
"id": "R_sw8EGd0Sjk"
},
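{
"cell_type": "markdown",
"metadata": {
"id": "sketchWeightedXent"
},
"source": [
"To make the loss above more concrete, here is a hedged sketch of what a weighted category cross entropy computes from log probabilities, written with plain NumPy (imported as `onp`). The function name `weighted_cross_entropy`, the example values and the normalization by the sum of the weights are assumptions for illustration, not a copy of the Trax implementation.\n",
"\n",
"```python\n",
"import numpy as onp\n",
"\n",
"\n",
"def weighted_cross_entropy(log_probs, targets, weights):\n",
"    # Pick the log probability that the model assigned to the true class of each example\n",
"    picked = log_probs[onp.arange(len(targets)), targets]\n",
"    # Weighted average of the negative log probabilities\n",
"    return -(picked * weights).sum() / weights.sum()\n",
"\n",
"\n",
"log_probs = onp.log(onp.array([[0.9, 0.1],\n",
"                               [0.2, 0.8],\n",
"                               [0.5, 0.5]]))\n",
"targets = onp.array([0, 1, 1])\n",
"weights = onp.array([1.0, 1.0, 0.0])  # a weight of 0 removes the third example from the loss\n",
"loss = weighted_cross_entropy(log_probs, targets, weights)\n",
"print(loss)\n",
"\n",
"# A two-class model that always predicts 0.5 / 0.5 has a loss of log(2), about 0.693\n",
"uniform = onp.log(onp.full((4, 2), 0.5))\n",
"uniform_loss = weighted_cross_entropy(uniform, onp.array([0, 1, 0, 1]), onp.ones(4))\n",
"print(uniform_loss)\n",
"```\n",
"\n",
"The value log(2) is a useful reference point: a freshly initialized two-class model is expected to start near it, which is close to the step 1 loss values in the training log later in this section."
],
"id": "sketchWeightedXent"
},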
{
"cell_type": "markdown",
"metadata": {
"id": "yB78IIUerIVG"
},
"source": [
"Now let's make an output directory and train the model."
],
"id": "yB78IIUerIVG"
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"id": "CNx4LnP9rMsO",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "662b003e-e6e2-46aa-eedd-2446b262e46b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"./model/\n"
]
}
],
"source": [
"dir_path = './model/'\n",
"\n",
"try:\n",
" shutil.rmtree(dir_path)\n",
"except OSError as e:\n",
" pass\n",
"\n",
"\n",
"output_dir = './model/'\n",
"output_dir_expand = os.path.expanduser(output_dir)\n",
"print(output_dir_expand)"
],
"id": "CNx4LnP9rMsO"
},
{
"cell_type": "markdown",
"metadata": {
"id": "e4R4EHUcrwqe"
},
"source": [
"\n",
"### Exercise 6 - train_model\n",
"**Instructions:** Implement `train_model` to train the model (`classifier` that you wrote earlier) for the given number of training steps (`n_steps`) using `TrainTask`, `EvalTask` and `Loop`. For the `EvalTask`, take a look to the cell next to the function definition: the `eval_task` is passed as a list explicitly, so take that into account in the implementation of your `train_model` function."
],
"id": "e4R4EHUcrwqe"
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"id": "tolygrj7rpFX"
},
"outputs": [],
"source": [
"# UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: train_model\n",
"def train_model(classifier, train_task, eval_task, n_steps, output_dir):\n",
" '''\n",
" Input: \n",
" classifier - the model you are building\n",
" train_task - Training task\n",
" eval_task - Evaluation task. Received as a list.\n",
" n_steps - the evaluation steps\n",
" output_dir - folder to save your files\n",
" Output:\n",
" trainer - trax trainer\n",
" '''\n",
" rnd.seed(31) # Do NOT modify this random seed. This makes the notebook easier to replicate\n",
" \n",
" ### START CODE HERE (Replace instances of 'None' with your code) ### \n",
" training_loop = training.Loop( \n",
" classifier, # The learning model\n",
" train_task, # The training task\n",
" eval_tasks=eval_task, # The evaluation task\n",
" output_dir=output_dir, # The output directory\n",
" random_seed=31 # Do not modify this random seed in order to ensure reproducibility and for grading purposes.\n",
" ) \n",
"\n",
" training_loop.run(n_steps = n_steps)\n",
" ### END CODE HERE ###\n",
" \n",
" # Return the training_loop, since it has the model.\n",
" return training_loop"
],
"id": "tolygrj7rpFX"
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"deletable": false,
"editable": false,
"id": "d-AtiqAYs_rH",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4f34f727-abd5-4280-a51a-d383be3122dd"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.8/dist-packages/jax/_src/lib/xla_bridge.py:553: UserWarning: jax.host_count has been renamed to jax.process_count. This alias will eventually be removed; please update your code.\n",
" warnings.warn(\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"Step 1: Total number of trainable weights: 2327042\n",
"Step 1: Ran 1 train steps in 2.31 secs\n",
"Step 1: train WeightedCategoryCrossEntropy | 0.68989831\n",
"Step 1: eval WeightedCategoryCrossEntropy | 0.69806957\n",
"Step 1: eval WeightedCategoryAccuracy | 0.43750000\n",
"\n",
"Step 10: Ran 9 train steps in 13.59 secs\n",
"Step 10: train WeightedCategoryCrossEntropy | 0.64247584\n",
"Step 10: eval WeightedCategoryCrossEntropy | 0.53418100\n",
"Step 10: eval WeightedCategoryAccuracy | 0.93750000\n",
"\n",
"Step 20: Ran 10 train steps in 7.59 secs\n",
"Step 20: train WeightedCategoryCrossEntropy | 0.45600957\n",
"Step 20: eval WeightedCategoryCrossEntropy | 0.33223987\n",
"Step 20: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 30: Ran 10 train steps in 2.68 secs\n",
"Step 30: train WeightedCategoryCrossEntropy | 0.24014239\n",
"Step 30: eval WeightedCategoryCrossEntropy | 0.15884452\n",
"Step 30: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 40: Ran 10 train steps in 1.97 secs\n",
"Step 40: train WeightedCategoryCrossEntropy | 0.13276727\n",
"Step 40: eval WeightedCategoryCrossEntropy | 0.06164266\n",
"Step 40: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 50: Ran 10 train steps in 3.56 secs\n",
"Step 50: train WeightedCategoryCrossEntropy | 0.08444289\n",
"Step 50: eval WeightedCategoryCrossEntropy | 0.06003656\n",
"Step 50: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 60: Ran 10 train steps in 3.19 secs\n",
"Step 60: train WeightedCategoryCrossEntropy | 0.04531727\n",
"Step 60: eval WeightedCategoryCrossEntropy | 0.02509754\n",
"Step 60: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 70: Ran 10 train steps in 1.13 secs\n",
"Step 70: train WeightedCategoryCrossEntropy | 0.03989114\n",
"Step 70: eval WeightedCategoryCrossEntropy | 0.00249659\n",
"Step 70: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 80: Ran 10 train steps in 1.20 secs\n",
"Step 80: train WeightedCategoryCrossEntropy | 0.01885000\n",
"Step 80: eval WeightedCategoryCrossEntropy | 0.00504305\n",
"Step 80: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 90: Ran 10 train steps in 1.19 secs\n",
"Step 90: train WeightedCategoryCrossEntropy | 0.04065781\n",
"Step 90: eval WeightedCategoryCrossEntropy | 0.00822989\n",
"Step 90: eval WeightedCategoryAccuracy | 1.00000000\n",
"\n",
"Step 100: Ran 10 train steps in 1.71 secs\n",
"Step 100: train WeightedCategoryCrossEntropy | 0.01506269\n",
"Step 100: eval WeightedCategoryCrossEntropy | 0.09649467\n",
"Step 100: eval WeightedCategoryAccuracy | 0.93750000\n"
]
}
],
"source": [
"# Do not modify this cell.\n",
"# Take a look on how the eval_task is inside square brackets and \n",
"# take that into account for you train_model implementation\n",
"training_loop = train_model(model, train_task, [eval_task], 100, output_dir_expand)"
],
"id": "d-AtiqAYs_rH"
},
{
"cell_type": "markdown",
"metadata": {
"id": "KVMcsw2kjCX9"
},
"source": [
"\n",
"### 4.2 - Practice Making a Prediction\n",
"\n",
"Now that you have trained a model, you can access it as `training_loop.model` object. We will actually use `training_loop.eval_model` and in the next weeks you will learn why we sometimes use a different model for evaluation, e.g., one without dropout. For now, make predictions with your model.\n",
"\n",
"Use the training data just to see how the prediction process works. \n",
"- Later, you will use validation data to evaluate your model's performance.\n"
],
"id": "KVMcsw2kjCX9"
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"id": "WAMgXWY4jCX-",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a148696d-bc27-4403-d6f3-3d497bea8b49"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The batch is a tuple of length 3 because position 0 contains the tweets, and position 1 contains the targets.\n",
"The shape of the tweet tensors is (16, 15) (num of examples, length of tweet tensors)\n",
"The shape of the labels is (16,), which is the batch size.\n",
"The shape of the example_weights is (16,), which is the same as inputs/targets size.\n"
]
}
],
"source": [
"# Create a generator object\n",
"tmp_train_generator = train_generator(16, train_pos\n",
" , train_neg, Vocab, loop=True\n",
" , shuffle = False)\n",
"\n",
"\n",
"\n",
"# get one batch\n",
"tmp_batch = next(tmp_train_generator)\n",
"\n",
"# Position 0 has the model inputs (tweets as tensors)\n",
"# position 1 has the targets (the actual labels)\n",
"tmp_inputs, tmp_targets, tmp_example_weights = tmp_batch\n",
"\n",
"print(f\"The batch is a tuple of length {len(tmp_batch)} because position 0 contains the tweets, and position 1 contains the targets.\") \n",
"print(f\"The shape of the tweet tensors is {tmp_inputs.shape} (num of examples, length of tweet tensors)\")\n",
"print(f\"The shape of the labels is {tmp_targets.shape}, which is the batch size.\")\n",
"print(f\"The shape of the example_weights is {tmp_example_weights.shape}, which is the same as inputs/targets size.\")"
],
"id": "WAMgXWY4jCX-"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5XoxD6u5jCX_"
},
"outputs": [],
"source": [
"# feed the tweet tensors into the model to get a prediction\n",
"tmp_pred = training_loop.eval_model(tmp_inputs)\n",
"print(f\"The prediction shape is {tmp_pred.shape}, num of tensor_tweets as rows\")\n",
"print(\"Column 0 is the probability of a negative sentiment (class 0)\")\n",
"print(\"Column 1 is the probability of a positive sentiment (class 1)\")\n",
"print()\n",
"print(\"View the prediction array\")\n",
"tmp_pred"
],
"id": "5XoxD6u5jCX_"
},
{
"cell_type": "markdown",
"metadata": {
"id": "0aJpFcyljCYB"
},
"source": [
"To turn these probabilities into categories (negative or positive sentiment prediction), for each row:\n",
"- Compare the probabilities in each column.\n",
"- If column 1 has a value greater than column 0, classify that as a positive tweet.\n",
"- Otherwise if column 1 is less than or equal to column 0, classify that example as a negative tweet."
],
"id": "0aJpFcyljCYB"
},
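{
"cell_type": "markdown",
"metadata": {
"id": "sketchLogProbsToLabels"
},
"source": [
"The comparison rule above can be sketched with plain NumPy (imported as `onp` so it does not shadow the notebook's `np`). The prediction values below are made up for illustration and are not outputs of the trained model; the third row shows that a tie is classified as negative.\n",
"\n",
"```python\n",
"import numpy as onp\n",
"\n",
"# Hypothetical log probabilities for three tweets: columns are (negative, positive)\n",
"log_probs = onp.log(onp.array([[0.3, 0.7],\n",
"                               [0.6, 0.4],\n",
"                               [0.5, 0.5]]))\n",
"\n",
"# Positive only if column 1 is strictly greater than column 0\n",
"is_positive = log_probs[:, 1] > log_probs[:, 0]\n",
"print(is_positive)  # [ True False False]\n",
"\n",
"# Exponentiating recovers ordinary probabilities if you want to inspect them\n",
"probs = onp.exp(log_probs)\n",
"print(probs)\n",
"```"
],
"id": "sketchLogProbsToLabels"
},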
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6wJHv0TNjCYC"
},
"outputs": [],
"source": [
"# turn probabilites into category predictions\n",
"tmp_is_positive = tmp_pred[:,1] > tmp_pred[:,0]\n",
"for i, p in enumerate(tmp_is_positive):\n",
" print(f\"Neg log prob {tmp_pred[i,0]:.4f}\\tPos log prob {tmp_pred[i,1]:.4f}\\t is positive? {p}\\t actual {tmp_targets[i]}\")"
],
"id": "6wJHv0TNjCYC"
},
{
"cell_type": "markdown",
"metadata": {
"id": "TywSi02cjCYF"
},
"source": [
"Notice that since you are making a prediction using a training batch, it's more likely that the model's predictions match the actual targets (labels). \n",
"- Every prediction that the tweet is positive is also matching the actual target of 1 (positive sentiment).\n",
"- Similarly, all predictions that the sentiment is not positive matches the actual target of 0 (negative sentiment)"
],
"id": "TywSi02cjCYF"
},
{
"cell_type": "markdown",
"metadata": {
"id": "N6X_0K_EjCYF"
},
"source": [
"One more useful thing to know is how to compare if the prediction is matching the actual target (label). \n",
"- The result of calculation `is_positive` is a boolean.\n",
"- The target is a type trax.fastmath.numpy.int32\n",
"- If you expect to be doing division, you may prefer to work with decimal numbers with the data type type trax.fastmath.numpy.int32"
],
"id": "N6X_0K_EjCYF"
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"id": "CQgx_ar9jCYG",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 156
},
"outputId": "aa4c85bb-1e7e-4646-c8c2-68313231529b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Array of booleans\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([ True, True, True, True, True, True, True, True,\n",
" False, False, False, False, False, False, False, False], dtype=bool)"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Array of integers\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Array of floats\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"DeviceArray([1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0.], dtype=float32)"
]
},
"metadata": {}
}
],
"source": [
"# View the array of booleans\n",
"print(\"Array of booleans\")\n",
"display(tmp_is_positive)\n",
"\n",
"# convert boolean to type int32\n",
"# True is converted to 1\n",
"# False is converted to 0\n",
"tmp_is_positive_int = tmp_is_positive.astype(np.int32)\n",
"\n",
"\n",
"# View the array of integers\n",
"print(\"Array of integers\")\n",
"display(tmp_is_positive_int)\n",
"\n",
"# convert boolean to type float32\n",
"tmp_is_positive_float = tmp_is_positive.astype(np.float32)\n",
"\n",
"# View the array of floats\n",
"print(\"Array of floats\")\n",
"display(tmp_is_positive_float)"
],
"id": "CQgx_ar9jCYG"
},
{
"cell_type": "markdown",
"metadata": {
"id": "8gJ3n4UljCYJ"
},
"source": [
"Note that Python usually does type conversion for you when you compare a boolean to an integer\n",
"- True compared to 1 is True, otherwise any other integer is False.\n",
"- False compared to 0 is True, otherwise any ohter integer is False."
],
"id": "8gJ3n4UljCYJ"
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "GbKFCf0njCYJ",
"outputId": "d48e9e75-5116-4a06-ec0b-c430cd16bf51"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"True == 1: True\n",
"True == 2: False\n",
"False == 0: True\n",
"False == 2: False\n"
]
}
],
"source": [
"print(f\"True == 1: {True == 1}\")\n",
"print(f\"True == 2: {True == 2}\")\n",
"print(f\"False == 0: {False == 0}\")\n",
"print(f\"False == 2: {False == 2}\")"
],
"id": "GbKFCf0njCYJ"
},
{
"cell_type": "markdown",
"source": [
"#Ignore"
],
"metadata": {
"id": "je8824VTty4f"
},
"id": "je8824VTty4f"
},
{
"cell_type": "markdown",
"metadata": {
"id": "fRRrgOHJgrhI"
},
"source": [
"\n",
"## 5 - Evaluation \n",
"\n",
"\n",
"### 5.1 - Computing the Accuracy on a Batch\n",
"\n",
"You will now write a function that evaluates your model on the validation set and returns the accuracy. \n",
"- `preds` contains the predictions.\n",
" - Its dimensions are `(batch_size, output_dim)`. `output_dim` is two in this case. Column 0 contains the probability that the tweet belongs to class 0 (negative sentiment). Column 1 contains probability that it belongs to class 1 (positive sentiment).\n",
" - If the probability in column 1 is greater than the probability in column 0, then interpret this as the model's prediction that the example has label 1 (positive sentiment). \n",
" - Otherwise, if the probabilities are equal or the probability in column 0 is higher, the model's prediction is 0 (negative sentiment).\n",
"- `y` contains the actual labels.\n",
"- `y_weights` contains the weights to give to predictions."
],
"id": "fRRrgOHJgrhI"
},
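{
"cell_type": "markdown",
"metadata": {
"id": "sketchWeightedAccuracy"
},
"source": [
"Before writing the graded function, it may help to see the arithmetic on a tiny example. The sketch below is one possible way to compute a weighted accuracy with plain NumPy (imported as `onp`); the helper name `weighted_accuracy` and the toy inputs are assumptions for illustration, and the graded function should follow the template in the exercise below.\n",
"\n",
"```python\n",
"import numpy as onp\n",
"\n",
"\n",
"def weighted_accuracy(preds, y, y_weights):\n",
"    # Predict class 1 only when column 1 is strictly greater than column 0\n",
"    is_pos = preds[:, 1] > preds[:, 0]\n",
"    # 1.0 where the predicted class equals the label, 0.0 elsewhere\n",
"    correct = (is_pos.astype(onp.int32) == y).astype(onp.float32)\n",
"    # Numerator: weighted count of correct predictions\n",
"    weighted_num_correct = (correct * y_weights).sum()\n",
"    # Denominator: total weight\n",
"    sum_weights = y_weights.sum()\n",
"    return weighted_num_correct / sum_weights, weighted_num_correct, sum_weights\n",
"\n",
"\n",
"toy_preds = onp.log(onp.array([[0.2, 0.8],\n",
"                               [0.7, 0.3],\n",
"                               [0.4, 0.6],\n",
"                               [0.9, 0.1]]))\n",
"toy_y = onp.array([1, 0, 0, 0])\n",
"toy_weights = onp.array([1.0, 1.0, 1.0, 0.0])  # the last example is ignored\n",
"\n",
"acc, num_correct, total = weighted_accuracy(toy_preds, toy_y, toy_weights)\n",
"print(acc, num_correct, total)  # 2 of the 3 weighted examples are correct\n",
"```\n",
"\n",
"The fourth example is predicted correctly, but its weight of 0 keeps it out of both the numerator and the denominator."
],
"id": "sketchWeightedAccuracy"
},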
{
"cell_type": "markdown",
"metadata": {
"id": "2hdfk3LEjCYL"
},
"source": [
"\n",
"### Exercise 7 - compute_accuracy\n",
"Implement `compute_accuracy`."
],
"id": "2hdfk3LEjCYL"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WBqaN5f9grhJ"
},
"outputs": [],
"source": [
"# UNQ_C7 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: compute_accuracy\n",
"def compute_accuracy(preds, y, y_weights):\n",
" \"\"\"\n",
" Input: \n",
" preds: a tensor of shape (dim_batch, output_dim) \n",
" y: a tensor of shape (dim_batch,) with the true labels\n",
" y_weights: a n.ndarray with the a weight for each example\n",
" Output: \n",
" accuracy: a float between 0-1 \n",
" weighted_num_correct (np.float32): Sum of the weighted correct predictions\n",
" sum_weights (np.float32): Sum of the weights\n",
" \"\"\"\n",
" ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
" # Create an array of booleans, \n",
" # True if the probability of positive sentiment is greater than\n",
" # the probability of negative sentiment\n",
" # else False\n",
" is_pos = None\n",
"\n",
" # convert the array of booleans into an array of np.int32\n",
" is_pos_int = None\n",
" \n",
" # compare the array of predictions (as int32) with the target (labels) of type int32\n",
" correct = None\n",
"\n",
" # Count the sum of the weights.\n",
" sum_weights = None\n",
" \n",
" # convert the array of correct predictions (boolean) into an array of np.float32\n",
" correct_float = None\n",
" \n",
" # Multiply each prediction with its corresponding weight.\n",
" weighted_correct_float = None\n",
"\n",
" # Sum up the weighted correct predictions (of type np.float32), to go in the\n",
" # numerator.\n",
" weighted_num_correct = None\n",
"\n",
" # Divide the number of weighted correct predictions by the sum of the\n",
" # weights.\n",
" accuracy = None\n",
"\n",
" ### END CODE HERE ###\n",
" return accuracy, weighted_num_correct, sum_weights"
],
"id": "WBqaN5f9grhJ"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1c7ZOeO0jCYN"
},
"outputs": [],
"source": [
"# test your function\n",
"tmp_val_generator = val_generator(64, val_pos\n",
" , val_neg, Vocab, loop=True\n",
" , shuffle = False)\n",
"\n",
"# get one batch\n",
"tmp_batch = next(tmp_val_generator)\n",
"\n",
"# Position 0 has the model inputs (tweets as tensors)\n",
"# Position 1 has the targets (the actual labels)\n",
"# Position 2 has the example weights\n",
"tmp_inputs, tmp_targets, tmp_example_weights = tmp_batch\n",
"\n",
"# feed the tweet tensors into the model to get a prediction\n",
"tmp_pred = training_loop.eval_model(tmp_inputs)\n",
"tmp_acc, tmp_num_correct, tmp_num_predictions = compute_accuracy(preds=tmp_pred, y=tmp_targets, y_weights=tmp_example_weights)\n",
"\n",
"print(f\"Model's prediction accuracy on a single validation batch is: {100 * tmp_acc}%\")\n",
"print(f\"Weighted number of correct predictions {tmp_num_correct}; weighted number of total observations predicted {tmp_num_predictions}\")"
],
"id": "1c7ZOeO0jCYN"
},
{
"cell_type": "markdown",
"metadata": {
"id": "dqle69F1grhM"
},
"source": [
"\n",
"### 5.2 - Testing your Model on Validation Data\n",
"\n",
"Now you will test your model's prediction accuracy on the validation data.\n",
"\n",
"The function you write will take in a data generator and your model.\n",
"- The generator allows you to get batches of data. You can use it with a `for` loop:\n",
"\n",
"```\n",
"for batch in iterator: \n",
" # do something with that batch\n",
"```\n",
"\n",
"`batch` has `3` elements:\n",
"- the first element contains the inputs\n",
"- the second element contains the targets\n",
"- the third element contains the weights"
],
"id": "dqle69F1grhM"
},
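{
"cell_type": "markdown",
"metadata": {
"id": "toyBatchAggregation"
},
"source": [
"When batches differ in size, averaging the per-batch accuracies is not the same as the accuracy over all examples. The sketch below illustrates this with made-up data: `toy_generator` and `toy_model` are hypothetical stand-ins, not the generators or the model used in this notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def toy_generator():\n",
"    # Hypothetical batches of (inputs, targets, weights); the last batch is smaller\n",
"    yield np.array([[1.0], [2.0], [3.0]]), np.array([1, 1, 1]), np.ones(3)\n",
"    yield np.array([[-1.0]]), np.array([1]), np.ones(1)\n",
"\n",
"def toy_model(inputs):\n",
"    # Hypothetical stand-in for a model: votes positive when the input is > 0\n",
"    pos = (inputs[:, 0] > 0).astype(np.float32)\n",
"    return np.stack([1.0 - pos, pos], axis=1)\n",
"\n",
"total_correct = 0.0\n",
"total_weight = 0.0\n",
"batch_accuracies = []\n",
"\n",
"for inputs, targets, weights in toy_generator():\n",
"    preds = toy_model(inputs)\n",
"    predicted = (preds[:, 1] > preds[:, 0]).astype(np.int32)\n",
"    num_correct = np.sum((predicted == targets) * weights)\n",
"    # Accumulate weighted counts, not per-batch accuracies\n",
"    total_correct += num_correct\n",
"    total_weight += np.sum(weights)\n",
"    batch_accuracies.append(num_correct / np.sum(weights))\n",
"\n",
"overall = total_correct / total_weight\n",
"mean_of_batches = np.mean(batch_accuracies)\n",
"print(overall, mean_of_batches)\n",
"```\n",
"\n",
"Here three of the four examples are correct, so the overall accuracy is 0.75, while the mean of the two batch accuracies (1.0 and 0.0) is 0.5. Summing the weighted counts across batches avoids this bias."
],
"id": "toyBatchAggregation"
},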
{
"cell_type": "markdown",
"metadata": {
"id": "1zwYl_f9jCYP"
},
"source": [
"\n",
"### Exercise 8 - test_model\n",
"\n",
"**Instructions:** \n",
"- Compute the accuracy over all the batches in the validation iterator. \n",
"- Make use of `compute_accuracy`, which you recently implemented, and return the overall accuracy."
],
"id": "1zwYl_f9jCYP"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HKoTad4ggrhN"
},
"outputs": [],
"source": [
"# UNQ_C8 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)\n",
"# GRADED FUNCTION: test_model\n",
"def test_model(generator, model, compute_accuracy=compute_accuracy):\n",
" '''\n",
" Input: \n",
" generator: an iterator instance that provides batches of inputs, targets and example weights\n",
" model: a model instance \n",
" Output: \n",
" accuracy: float corresponding to the accuracy\n",
" '''\n",
" \n",
" accuracy = 0.\n",
" total_num_correct = 0\n",
" total_num_pred = 0\n",
" \n",
" ### START CODE HERE (Replace instances of 'None' with your code) ###\n",
" for batch in generator: \n",
" \n",
" # Retrieve the inputs from the batch\n",
" inputs = None\n",
" \n",
" # Retrieve the targets (actual labels) from the batch\n",
" targets = None\n",
" \n",
" # Retrieve the example weight.\n",
" example_weight = None\n",
"\n",
" # Make predictions using the inputs \n",
" pred = None\n",
" \n",
" # Calculate accuracy for the batch by comparing its predictions and targets\n",
" batch_accuracy, batch_num_correct, batch_num_pred = None\n",
" \n",
" # Update the total number of correct predictions\n",
" # by adding the number of correct predictions from this batch\n",
" total_num_correct += None\n",
" \n",
" # Update the total number of predictions \n",
" # by adding the number of predictions made for the batch\n",
" total_num_pred += None\n",
"\n",
" # Calculate accuracy over all examples\n",
" accuracy = None\n",
" \n",
" ### END CODE HERE ###\n",
" return accuracy"
],
"id": "HKoTad4ggrhN"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false,
"id": "1Rm_k21XgrhQ"
},
"outputs": [],
"source": [
"# DO NOT EDIT THIS CELL\n",
"# testing the accuracy of your model: this takes around 20 seconds\n",
"model = training_loop.eval_model\n",
"accuracy = test_model(test_generator(16, val_pos\n",
" , val_neg, Vocab, loop=False\n",
" , shuffle = False), model)\n",
"\n",
"print(f'The accuracy of your model on the validation set is {accuracy:.4f}', )"
],
"id": "1Rm_k21XgrhQ"
},
{
"cell_type": "markdown",
"source": [
"#Good Ahead"
],
"metadata": {
"id": "iaTwG4vYt89x"
},
"id": "iaTwG4vYt89x"
},
{
"cell_type": "markdown",
"metadata": {
"id": "Mct4P9QZgrhT"
},
"source": [
"\n",
"## 6 - Testing with your Own Input\n",
"\n",
"Finally, you will test the model with your own input. You will see that deep networks are more powerful than the older methods you have used before. Although you got close to 100% accuracy on the first two assignments, the task there was much easier."
],
"id": "Mct4P9QZgrhT"
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"id": "SUq5cw-xgrhU"
},
"outputs": [],
"source": [
"# this is used to predict on your own sentence\n",
"def predict(sentence):\n",
" inputs = np.array(tweet_to_tensor(sentence, vocab_dict=Vocab))\n",
" \n",
" # Batch size 1, add dimension for batch, to work with the model\n",
" inputs = inputs[None, :] \n",
" \n",
" # predict with the model\n",
" preds_probs = model(inputs)\n",
" \n",
" # Turn probabilities into categories\n",
" preds = int(preds_probs[0, 1] > preds_probs[0, 0])\n",
" \n",
" sentiment = \"negative\"\n",
" if preds == 1:\n",
" sentiment = 'positive'\n",
"\n",
" return preds, sentiment\n"
],
"id": "SUq5cw-xgrhU"
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3RJntC57grhX",
"outputId": "29d71b93-ee76-4779-8afa-4d88f631f619"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The sentiment of the sentence \n",
"***\n",
"\"It's such a nice day, I think I'll be taking Sid to Ramsgate for lunch and then to the beach maybe.\"\n",
"***\n",
"is positive.\n",
"\n",
"The sentiment of the sentence \n",
"***\n",
"\"This movie was almost good.\"\n",
"***\n",
"is negative.\n"
]
}
],
"source": [
"# try a positive sentence\n",
"sentence = \"It's such a nice day, I think I'll be taking Sid to Ramsgate for lunch and then to the beach maybe.\"\n",
"tmp_pred, tmp_sentiment = predict(sentence)\n",
"print(f\"The sentiment of the sentence \\n***\\n\\\"{sentence}\\\"\\n***\\nis {tmp_sentiment}.\")\n",
"\n",
"print()\n",
"# try a negative sentence\n",
"sentence = \"This movie was almost good.\"\n",
"tmp_pred, tmp_sentiment = predict(sentence)\n",
"print(f\"The sentiment of the sentence \\n***\\n\\\"{sentence}\\\"\\n***\\nis {tmp_sentiment}.\")"
],
"id": "3RJntC57grhX"
},
{
"cell_type": "markdown",
"metadata": {
"id": "nZmGCheXjCYX"
},
"source": [
"Notice that the model works well even for complex sentences."
],
"id": "nZmGCheXjCYX"
},
{
"cell_type": "markdown",
"metadata": {
"id": "a1q9H_6nsjfq"
},
"source": [
"\n",
"## 7 - Word Embeddings"
],
"id": "a1q9H_6nsjfq"
},
{
"cell_type": "markdown",
"metadata": {
"id": "oNSCA-Hesjfr"
},
"source": [
"In this section, you will visualize the word embeddings that were constructed for this sentiment analysis task. You can retrieve them by looking at the `model.weights` tuple (recall that the first layer of the model is the embedding layer)."
],
"id": "oNSCA-Hesjfr"
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"id": "MVRKzUdlsjfr"
},
"outputs": [],
"source": [
"embeddings = model.weights[0]"
],
"id": "MVRKzUdlsjfr"
},
{
"cell_type": "markdown",
"metadata": {
"id": "DgTV9gPJsjfr"
},
"source": [
"Let's take a look at the size of the embeddings. "
],
"id": "DgTV9gPJsjfr"
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"id": "TSiauBfXsjfr",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "10c18bab-07c3-4355-c030-189e06d2a1c1"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(9088, 256)"
]
},
"metadata": {},
"execution_count": 70
}
],
"source": [
"embeddings.shape"
],
"id": "TSiauBfXsjfr"
},
{
"cell_type": "markdown",
"metadata": {
"id": "lxzJVdaasjfs"
},
"source": [
"To visualize the word embeddings, it is necessary to choose 2 directions to use as axes for the plot. You could use random directions or the first two principal components found by PCA. Here, you'll use scikit-learn to perform dimensionality reduction of the word embeddings using PCA."
],
"id": "lxzJVdaasjfs"
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"id": "X0jusVc0sjfs"
},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA #Import PCA from scikit-learn\n",
"pca = PCA(n_components=2) #PCA with two dimensions\n",
"\n",
"emb_2dim = pca.fit_transform(embeddings) #Dimensionality reduction of the word embeddings"
],
"id": "X0jusVc0sjfs"
},
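{
"cell_type": "markdown",
"metadata": {
"id": "toyPcaSketch"
},
"source": [
"To see what the projection onto two principal directions amounts to, here is a rough sketch in plain NumPy. It uses a made-up random matrix `toy_embeddings` rather than the trained embeddings, and it is an illustration of the idea, not a replacement for scikit-learn's `PCA` (the signs of the axes may differ between the two).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"toy_embeddings = rng.normal(size=(50, 8))  # made-up (vocab_size, embedding_dim)\n",
"\n",
"# PCA works on centered data: subtract the mean of each column\n",
"centered = toy_embeddings - toy_embeddings.mean(axis=0)\n",
"\n",
"# Rows of vt are the principal directions, ordered by decreasing variance\n",
"_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)\n",
"\n",
"# Project every embedding onto the first two directions\n",
"toy_emb_2dim = centered @ vt[:2].T\n",
"print(toy_emb_2dim.shape)\n",
"```\n",
"\n",
"Each row of `toy_emb_2dim` holds the two coordinates of one word, which is the form needed for a scatter plot."
],
"id": "toyPcaSketch"
},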
{
"cell_type": "markdown",
"metadata": {
"id": "3K8MVqz_sjfs"
},
"source": [
"Now everything is ready to plot a selection of words in 2D."
],
"id": "3K8MVqz_sjfs"
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"id": "8Vz7DxRKsjfs",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 281
},
"outputId": "c9259a2e-4dd9-419b-cddc-40a8bb2dfbb8"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"