Commit

Merge pull request activeloopai#894 from activeloopai/Fix_notebooks
Fix notebook & readme errors
imshashank authored Jun 1, 2021
2 parents 9471e3b + fec14d9 commit bd6d192
Showing 2 changed files with 41 additions and 247 deletions.
11 changes: 11 additions & 0 deletions README.md
@@ -74,6 +74,17 @@ To load a public dataset, one needs to write dozens of lines of code and spend h
pip3 install hub
```

To download datasets stored on the Hub platform, you need to log in.
Register a free account at [Activeloop](https://app.activeloop.ai/register/?utm_source=github&utm_medium=repo&utm_campaign=readme) and authenticate locally:

```sh
hub register
hub login

# alternatively, add username and password as arguments (use on platforms like Kaggle)
hub login -u username -p password
```
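
The non-interactive login above can also be driven from Python, which keeps credentials out of a shared notebook cell. The following is a minimal sketch, not part of Hub itself: the environment variable names `HUB_USERNAME` and `HUB_PASSWORD` and both helper functions are hypothetical, and it assumes the `hub` CLI is on the `PATH` and exits with a non-zero status when login fails.

```python
import os
import subprocess


def build_login_command(env=None):
    """Build the `hub login` argument list from environment variables."""
    env = os.environ if env is None else env
    # HUB_USERNAME and HUB_PASSWORD are hypothetical names chosen for this sketch.
    username = env.get("HUB_USERNAME")
    password = env.get("HUB_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set HUB_USERNAME and HUB_PASSWORD before logging in.")
    # Same flags as the shell example above: -u for username, -p for password.
    return ["hub", "login", "-u", username, "-p", password]


def login(env=None):
    """Run `hub login` non-interactively (assumes the `hub` CLI is installed)."""
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(build_login_command(env), check=True)
```

Calling `login()` once at the top of a notebook would then stand in for typing the password into a `!hub login -u ... -p ...` cell.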

Access public datasets in Hub with a straightforward convention that requires only a few lines of code. Run this excerpt to get the first thousand images of the [MNIST database](https://app.activeloop.ai/dataset/activeloop/mnist/?utm_source=github&utm_medium=repo&utm_campaign=readme) as a NumPy array:

```python
277 changes: 30 additions & 247 deletions examples/notebooks/Getting_Started_with_Text_on_Hub.ipynb
@@ -31,40 +31,18 @@
"id": "xKDwaemyJF-R",
"outputId": "3b8db4cf-40a3-4190-d97b-c181a6d858d5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[K |████████████████████████████████| 122kB 18.4MB/s \n",
"\u001b[K |████████████████████████████████| 1.8MB 37.4MB/s \n",
"\u001b[K |████████████████████████████████| 296kB 48.8MB/s \n",
"\u001b[K |████████████████████████████████| 337kB 55.1MB/s \n",
"\u001b[K |████████████████████████████████| 2.2MB 46.8MB/s \n",
"\u001b[K |████████████████████████████████| 133kB 59.6MB/s \n",
"\u001b[K |████████████████████████████████| 71kB 10.3MB/s \n",
"\u001b[K |████████████████████████████████| 102kB 15.3MB/s \n",
"\u001b[K |████████████████████████████████| 81kB 11.0MB/s \n",
"\u001b[K |████████████████████████████████| 133kB 56.7MB/s \n",
"\u001b[K |████████████████████████████████| 7.3MB 51.1MB/s \n",
"\u001b[K |████████████████████████████████| 92kB 13.0MB/s \n",
"\u001b[K |████████████████████████████████| 133kB 60.5MB/s \n",
"\u001b[K |████████████████████████████████| 3.2MB 45.3MB/s \n",
"\u001b[K |████████████████████████████████| 5.8MB 52.2MB/s \n",
"\u001b[K |████████████████████████████████| 71kB 10.6MB/s \n",
"\u001b[K |████████████████████████████████| 71kB 10.3MB/s \n",
"\u001b[K |████████████████████████████████| 51kB 8.8MB/s \n",
"\u001b[?25h Building wheel for outdated (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Building wheel for asciitree (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Building wheel for littleutils (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"\u001b[31mERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.\u001b[0m\n",
"\u001b[31mERROR: botocore 1.20.17 has requirement urllib3<1.27,>=1.25.4, but you'll have urllib3 1.24.3 which is incompatible.\u001b[0m\n",
"\u001b[31mERROR: boto3 1.16.39 has requirement botocore<1.20.0,>=1.19.39, but you'll have botocore 1.20.17 which is incompatible.\u001b[0m\n"
]
}
],
"outputs": [],
"source": [
"!pip3 install hub pandas numpy tqdm sklearn"
]
},
{
"cell_type": "markdown",
"id": "nuclear-diving",
"metadata": {},
"source": [
"!pip install hub -q"
"#### To be able to create datasets or download from hub, please create an account by visiting this link: \n",
"#### https://app.activeloop.ai/"
]
},
{
@@ -76,7 +54,8 @@
},
"outputs": [],
"source": [
"!hub login"
"# Use the username & password used to register on hub here to login\n",
"!hub login -u <username> -p <password>"
]
},
{
@@ -148,23 +127,7 @@
"id": "offensive-diameter",
"outputId": "bdf04d56-c0b7-4a99-f6c0-dc502a4a046e"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High\\'s satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\\'t!'"
]
},
"execution_count": 12,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"line"
]
@@ -236,117 +199,7 @@
"id": "statistical-negative",
"outputId": "22bdc236-3142-4428-c88a-1a16d541ee68"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Review</th>\n",
" <th>Label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>I have watched this episode more often than an...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>I really enjoyed \"Doctor Mordrid\". This is a l...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Hickory Dickory Dock was a good Poirot mystery...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Fragile Carne, just before his great period. A...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>So I don't ruin it for you, I'll be very brief...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12495</th>\n",
" <td>Film can be a looking glass to see the world i...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12496</th>\n",
" <td>A message movie, but a rather good one. Outsta...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12497</th>\n",
" <td>Kurosawa weaves a tale that has a cast of char...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12498</th>\n",
" <td>When you compare what Brian De Palma was doing...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12499</th>\n",
" <td>This series is set a year after the mission to...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>12500 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" Review Label\n",
"0 I have watched this episode more often than an... 1\n",
"1 I really enjoyed \"Doctor Mordrid\". This is a l... 1\n",
"2 Hickory Dickory Dock was a good Poirot mystery... 1\n",
"3 Fragile Carne, just before his great period. A... 1\n",
"4 So I don't ruin it for you, I'll be very brief... 1\n",
"... ... ...\n",
"12495 Film can be a looking glass to see the world i... 1\n",
"12496 A message movie, but a rather good one. Outsta... 1\n",
"12497 Kurosawa weaves a tale that has a cast of char... 1\n",
"12498 When you compare what Brian De Palma was doing... 1\n",
"12499 This series is set a year after the mission to... 1\n",
"\n",
"[12500 rows x 2 columns]"
]
},
"execution_count": 16,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"reviews_df"
]
@@ -437,16 +290,16 @@
"# url = Akash/FlipkartReviews\n",
"# Before you can upload datasets, please login into Hub. Run the first cell.\n",
"\n",
"url = \"dhiganthrao/IMDB-MovieReviews\"\n",
"url = \"<your username>/IMDB-MovieReviews\"\n",
"\n",
"# Uncomment the following lines if you're uploading *this* dataset for the first time.\n",
"# my_schema = {\"Review\": Text(shape=(None, ), max_shape=(max_length, )),\n",
"# \"Label\": ClassLabel(num_classes=2)}\n",
"my_schema = {\"Review\": Text(shape=(None, ), max_shape=(max_length, )),\n",
" \"Label\": ClassLabel(num_classes=2)}\n",
"\n",
"# ds = hub.Dataset(url, shape=(25000,), schema=my_schema)\n",
"# for i in tqdm(range(len(ds))):\n",
"# ds[\"Review\", i] = reviews_df[\"Review\"][i]\n",
"# ds[\"Label\", i] = reviews_df[\"Labels\"][i]"
"ds = hub.Dataset(url, shape=(25000,), schema=my_schema)\n",
"for i in tqdm(range(len(ds))):\n",
" ds[\"Review\", i] = reviews_df[\"Review\"][i]\n",
" ds[\"Label\", i] = reviews_df[\"Label\"][i]"
]
},
{
@@ -485,7 +338,7 @@
"# This command saves all changes to the cloud. You can also view this dataset at\n",
"# https://app.activeloop.ai\n",
"\n",
"# ds.flush()"
"ds.flush()"
]
},
{
@@ -509,18 +362,7 @@
"id": "floral-sociology",
"outputId": "91845f74-60d4-4791-f980-f88701a05be5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'hub.api.dataset.Dataset'>\n",
"SchemaDict({'Review': Text(shape=(None,), dtype='int64', max_shape=(13704,)), 'Label': ClassLabel(shape=(), dtype='int64', num_classes=2)})\n",
"A simple and effective film about what life is all about, responding to challenges. It took a lot of gall for Homer and his friends to be able to grow into manhood without falling in the trap of a prefabricated future that runs from father to son, to be a miner in the local mine and never get out of that fate. It took also three different challenges for Homer and his friends to conquer a personal and free future. The challenge of the first ever man-made artificial satellite, Sputnik 1, a Soviet satellite, a milestone in human history, a turning point that Homer and his friends could not miss, did not want to miss. Then the challenge of science and applied mechanics to calculate and to devise a rocket from scratch or rather from what they could gather in books and order in their minds. Finally the challenge of a world that resists and refuses and tries to force you back into the pack, even with an untimely accident that forces you to get back into the pack for plain survival necessity, and even then Homer proved he had the guts to accept the challenge that was blocking for a while his own plans and dreams. But there is another side of the story that the film does not emphasize enough. Homer is the carrier of the project but he is also the carrier of the inspiration he and his friends need. If he is the one who is going to get the university scholarship, because his friends gave him precedence, his friends will also be able to get on their own roads and tracks and step out of the mining fate, thanks to the energy his inspiring example sets in front of their eyes. It is hard at times not to follow the example of the one who is like a beacon on a difficult road. But the film is also effective to show how the father resisted this dream because for him science was not the fabric of a true man, like mining or football. 
The working class fate that was so present in those 1950s and 1960s and still is present in some areas is too often enforced by the traditional thinking of the father. If the mother does not have the courage to speak up one day, the working class fate I am speaking of becomes a tremendous trap. Here too the film is effective and it should make some parents think. This might have been the fourth challenge Homer had to face: the challenge of taking a road that was not the one pointed at and programmed by his own father.<br /><br />Dr Jacques COULARDEAU, University Paris Dauphine & University Paris 1 Pantheon Sorbonne\n",
"1\n"
]
}
],
"outputs": [],
"source": [
"print(type(ds))\n",
"print(ds.schema)\n",
@@ -551,23 +393,7 @@
"id": "CeopDtlKf4x0",
"outputId": "942475d1-7385-4128-93a9-bd8edcdc4385"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'this is a test :) :('"
]
},
"execution_count": 10,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"import re\n",
"\n",
@@ -593,20 +419,7 @@
"id": "u0I0Y0Aegjk3",
"outputId": "ff00b66c-22e1-43d0-fa3a-5d0c5ceeeb29"
},
"outputs": [
{
"data": {
"text/plain": [
"['I', 'find', 'it', 'fun', 'to', 'use', 'Hub']"
]
},
"execution_count": 11,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from nltk.stem.porter import PorterStemmer\n",
"\n",
@@ -631,20 +444,7 @@
"id": "ZcVnkyqLgqfz",
"outputId": "ef664771-9239-4730-beb8-959cc91d3eee"
},
"outputs": [
{
"data": {
"text/plain": [
"['hub', 'is', 'extrem', 'easi', 'and', 'effici', 'to', 'use']"
]
},
"execution_count": 12,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"def tokenizer_stemmer(text):\n",
" return [porter.stem(word) for word in text.split()]\n",
@@ -690,16 +490,7 @@
"id": "Qdme7UlWiKng",
"outputId": "83d64fa8-2533-4ead-a46c-2e308ead672f"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.\n",
"[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 51.1s finished\n"
]
}
],
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"from sklearn.linear_model import LogisticRegressionCV\n",
@@ -723,15 +514,7 @@
"id": "z_7C3lCNnQmc",
"outputId": "05b6d62c-b61f-43b7-b30e-bba1adf4df54"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.88648\n"
]
}
],
"outputs": [],
"source": [
"print(f\"Accuracy: {clf.score(X_test, y_test)}\")"
]
@@ -761,7 +544,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.8.6"
}
},
"nbformat": 4,
