Adding two more languages to the growing list: kotlin and scala
neomatrix369 committed Jul 25, 2019
1 parent 57c03c6 commit 8c91e79
Showing 18 changed files with 238 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README-details.md
@@ -68,7 +68,7 @@
- [Deploying Bespoke AI using fnproj - KADlytics by Miminal](https://blogs.oracle.com/startup/deploying-bespoke-ai-using-fn-project-kadlytics-by-miminal) ([Tweet]( https://twitter.com/java/status/1034474482751221761))

#### Natural Language Processing (NLP)
-- See [Natural Language Processing (NLP)](natural-language-processing/README.md#Java)
+- See [Natural Language Processing (NLP)](natural-language-processing/README.md#Java-jvm)

#### Neural Networks

4 changes: 3 additions & 1 deletion README.md
@@ -1,6 +1,8 @@
# Awesome AI-ML-DL [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

-Better NLP: [![Better NLP](https://img.shields.io/docker/pulls/neomatrix369/better-nlp.svg)](https://hub.docker.com/r/neomatrix369/better-nlp) | NLP Java: [![NLP Java](https://img.shields.io/docker/pulls/neomatrix369/nlp-java.svg)](https://hub.docker.com/r/neomatrix369/nlp-java) | NLP Clojure: [![NLP Clojure](https://img.shields.io/docker/pulls/neomatrix369/nlp-clojure.svg)](https://hub.docker.com/r/neomatrix369/nlp-clojure)
+Better NLP: [![Better NLP](https://img.shields.io/docker/pulls/neomatrix369/better-nlp.svg)](https://hub.docker.com/r/neomatrix369/better-nlp)
+
+NLP Java: [![NLP Java](https://img.shields.io/docker/pulls/neomatrix369/nlp-java.svg)](https://hub.docker.com/r/neomatrix369/nlp-java) | NLP Clojure: [![NLP Clojure](https://img.shields.io/docker/pulls/neomatrix369/nlp-clojure.svg)](https://hub.docker.com/r/neomatrix369/nlp-clojure) | NLP Kotlin: [![NLP Kotlin](https://img.shields.io/docker/pulls/neomatrix369/nlp-kotlin.svg)](https://hub.docker.com/r/neomatrix369/nlp-kotlin) | NLP Scala: [![NLP Scala](https://img.shields.io/docker/pulls/neomatrix369/nlp-scala.svg)](https://hub.docker.com/r/neomatrix369/nlp-scala)

Dataiku DSS: [![Dataiku DSS](https://img.shields.io/docker/pulls/neomatrix369/dataiku-dss.svg)](https://hub.docker.com/r/neomatrix369/dataiku-dss) | Grakn: [![Grakn](https://img.shields.io/docker/pulls/neomatrix369/grakn.svg)](https://hub.docker.com/r/neomatrix369/grakn) | Jupyter-Java: [![Jupyter-Java](https://img.shields.io/docker/pulls/neomatrix369/jupyter-java.svg)](https://hub.docker.com/r/neomatrix369/jupyter-java) | Zeppelin: [![Zeppelin](https://img.shields.io/docker/pulls/neomatrix369/zeppelin.svg)](https://hub.docker.com/r/neomatrix369/zeppelin)

22 changes: 17 additions & 5 deletions examples/nlp-java-jvm/README.md
@@ -2,7 +2,7 @@

Run a docker container with NLP libraries/frameworks written in Java/JVM languages, running under the traditional Java 8 (from OpenJDK or another source) or GraalVM.

-Find out more about [Natural Language Processing](https://en.wikipedia.org/wiki/Natural_language_processing) from the [NLP section](../../natural-language-processing/README.md) section.
+Find out more about [Natural Language Processing](https://en.wikipedia.org/wiki/Natural_language_processing) from the [NLP section](../../natural-language-processing/README.md#natural-language-processing-nlp) section.

Startup in traditional JDK or GraalVM mode.

@@ -33,14 +33,25 @@ Startup in traditional JDK or GraalVM mode.
- [Inflections-clj](https://github.com/r0man/inflections-clj) - Rails-like inflection library for Clojure and ClojureScript
- [postagga](https://github.com/fekr/postagga) - A library to parse natural language in Clojure and ClojureScript

+### Kotlin
+- [Lingua](https://github.com/pemistahl/lingua/) - A language detection library for Kotlin and Java, suitable for long and short text alike
+- [Kotidgy](https://github.com/meiblorn/kotidgy) - An index-based text data generator written in Kotlin
+
+### Scala
+- [Saul](https://github.com/CogComp/saul) - Library for developing NLP systems, including built in modules like SRL, POS, etc.
+- [ATR4S](https://github.com/ispras/atr4s) - Toolkit with state-of-the-art automatic term recognition methods.
+- [tm](https://github.com/ispras/tm) - Implementation of topic modeling based on regularized multilingual PLSA.
+- [word2vec-scala](https://github.com/Refefer/word2vec-scala) - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
+- [Epic](https://github.com/dlwh/epic) - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
+
## Scripts provided

**Go to [the previous folder](../nlp-java-jvm) to find the below scripts.**

- [runInDocker.sh](./runInDocker.sh) - runs the container and brings you to the command prompt inside the container
-- [Base Dockerfile](./images/base/Dockerfile) | [Java Dockerfile](./images/java/Dockerfile): Dockerfile scripts to help build the base and language (i.e. java, clojure) specific docker image of NLP Java/JVM in an isolated environment with the necessary dependencies.
-- [images folder](./images) - provided with scripts to build and the scripts included into the container for the base image and language (i.e. java, clojure) specific docker image
-- [buildDockerImage.sh](./buildDockerImage.sh): build the docker base and language (i.e. java, clojure) specific image takes under 5 minutes to finish on a decent connection
+- [Base Dockerfile](./images/base/Dockerfile) | [Java Dockerfile](./images/java/Dockerfile): Dockerfile scripts to help build the base and language (i.e. java, clojure, kotlin, scala) specific docker image of NLP Java/JVM in an isolated environment with the necessary dependencies.
+- [images folder](./images) - provided with scripts to build and the scripts included into the container for the base image and language (i.e. java, clojure, kotlin, scala) specific docker image
+- [buildDockerImage.sh](./buildDockerImage.sh): build the docker base and language (i.e. java, clojure, kotlin, scala) specific image takes under 5 minutes to finish on a decent connection
- [push-nlp-java-docker-image-to-hub.sh](./push-nlp-java-docker-image-to-hub.sh) - push pre-built docker images to docker hub (please pass in your own Docker username and later on enter Docker login details, see usage below)
- [removeUnusedContainersAndImages.sh](./removeUnusedContainersAndImages.sh) - a housekeeping script to remove dangling images and terminated containers (helps save some diskspace)

@@ -80,7 +91,7 @@ $ DOCKER_USER_NAME="your_docker_username" ./buildDockerImage.sh
or
$ IMAGE_VERSION="x.y.z" ./buildDockerImage.sh [language_id]
```
-`[language_id]` - defaults to `java` when not provided. Accepts: java, clojure
+`[language_id]` - defaults to `java` when not provided. Accepts: `java`, `clojure`, `kotlin`, `scala`

**Push built NLP Java/JVM docker image to Docker hub:**

@@ -106,4 +117,5 @@ Please have a look at the [CONTRIBUTING](../../CONTRIBUTING.md) guidelines, also

---

+Back to [NLP page](../../natural-language-processing/README.md#natural-language-processing-nlp) <br/>
Back to [main page (table of contents)](../../README.md)
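
The `[language_id]` argument documented in the README diff above defaults to `java` and accepts four values. A minimal sketch of how a script could enforce that contract follows. `resolveLanguageId` is a hypothetical helper written for illustration; it is not part of this commit, and `buildDockerImage.sh` may handle the argument differently.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not in the commit): resolve the language id,
# defaulting to java and rejecting values outside the documented list.
resolveLanguageId() {
  local language_id="${1:-java}"
  case "${language_id}" in
    java|clojure|kotlin|scala)
      echo "${language_id}"
      ;;
    *)
      echo "Unsupported language_id: ${language_id}" >&2
      return 1
      ;;
  esac
}

resolveLanguageId          # prints: java
resolveLanguageId kotlin   # prints: kotlin
```

Failing early on an unknown id avoids a later, less obvious error when `images/${language_id}/version.txt` cannot be found.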
17 changes: 17 additions & 0 deletions examples/nlp-java-jvm/images/kotlin/Dockerfile
@@ -0,0 +1,17 @@
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

### Common functions
COPY common.sh common.sh

### Lingua installation
### https://github.com/pemistahl/lingua/
COPY lingua.sh lingua.sh

### Kotidgy installation
### https://github.com/meiblorn/kotidgy
COPY kotidgy.sh kotidgy.sh

RUN chown -R nlp-java:nlp-java .

USER nlp-java
44 changes: 44 additions & 0 deletions examples/nlp-java-jvm/images/kotlin/common.sh
@@ -0,0 +1,44 @@
#!/bin/bash

set -e
set -u
set -o pipefail

gitClone() {
  REPO_URL=$1
  # The repository name is the 5th "/"-separated field of https://github.com/<owner>/<repo>
  REPO_FOLDER=$(echo "${REPO_URL}" | awk '{split($0,a,"/"); print a[5]}')
  if [ -e "${REPO_FOLDER}" ]; then
    echo "${REPO_FOLDER} already exists, aborting process, remove folder manually to perform a fresh download/update"
  else
    git clone --depth=1 "${REPO_URL}"
  fi
}

downloadArtifact() {
  URL=$1
  ARTIFACT=${2}
  ARTIFACT_FOLDER=${3}

  if [ -e "${ARTIFACT_FOLDER}" ]; then
    echo "${ARTIFACT_FOLDER} already exists, aborting process, remove folder manually to perform a fresh download/update"
  else
    if [[ -e "${ARTIFACT}" ]]; then
      echo "${ARTIFACT} already exists, skipping to next step..."
    else
      curl -O -L -J "${URL}"
    fi

    # Unpack according to the archive type; anything else is rejected
    if [[ "${ARTIFACT}" == *.zip ]]; then
      unzip -u "${ARTIFACT}"
    elif [[ "${ARTIFACT}" == *.tar.gz || "${ARTIFACT}" == *.tgz ]]; then
      tar -xvzf "${ARTIFACT}"
    else
      echo 'File format unrecognised, aborting...'
      exit 1
    fi

    rm -f "${ARTIFACT}"
  fi
}
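
`gitClone` derives the target folder from the repository URL by splitting on `/` and taking the fifth field. The sketch below isolates that expression so it can be checked on its own; `repoFolderFromUrl` is a hypothetical name used only for this illustration.

```shell
#!/bin/bash
set -euo pipefail

# Same awk expression as in gitClone: for https://github.com/<owner>/<repo>
# the fields are "https:", "", "github.com", "<owner>", "<repo>".
repoFolderFromUrl() {
  echo "$1" | awk '{split($0,a,"/"); print a[5]}'
}

repoFolderFromUrl https://github.com/pemistahl/lingua    # prints: lingua
repoFolderFromUrl https://github.com/meiblorn/kotidgy    # prints: kotidgy
```

The approach assumes URLs of exactly this shape. A URL ending in `.git` would yield a field such as `lingua.git`, which does not match the folder that `git clone` creates, so the existence check would never trigger.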
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/kotlin/kotidgy.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/meiblorn/kotidgy
cd ..
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/kotlin/lingua.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/pemistahl/lingua
cd ..
1 change: 1 addition & 0 deletions examples/nlp-java-jvm/images/kotlin/version.txt
@@ -0,0 +1 @@
0.1
29 changes: 29 additions & 0 deletions examples/nlp-java-jvm/images/scala/Dockerfile
@@ -0,0 +1,29 @@
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

### Common functions
COPY common.sh common.sh

### Saul installation
### https://github.com/CogComp/saul
COPY saul.sh saul.sh

### ATR4S installation
### https://github.com/ispras/atr4s
COPY atr4s.sh atr4s.sh

### tm installation
### https://github.com/ispras/tm
COPY tm.sh tm.sh

### word2vec-scala installation
### https://github.com/Refefer/word2vec-scala
COPY word2vec-scala.sh word2vec-scala.sh

### epic installation
### https://github.com/dlwh/epic
COPY epic.sh epic.sh

RUN chown -R nlp-java:nlp-java .

USER nlp-java
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/scala/atr4s.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/ispras/atr4s
cd ..
44 changes: 44 additions & 0 deletions examples/nlp-java-jvm/images/scala/common.sh
@@ -0,0 +1,44 @@
#!/bin/bash

set -e
set -u
set -o pipefail

gitClone() {
  REPO_URL=$1
  # The repository name is the 5th "/"-separated field of https://github.com/<owner>/<repo>
  REPO_FOLDER=$(echo "${REPO_URL}" | awk '{split($0,a,"/"); print a[5]}')
  if [ -e "${REPO_FOLDER}" ]; then
    echo "${REPO_FOLDER} already exists, aborting process, remove folder manually to perform a fresh download/update"
  else
    git clone --depth=1 "${REPO_URL}"
  fi
}

downloadArtifact() {
  URL=$1
  ARTIFACT=${2}
  ARTIFACT_FOLDER=${3}

  if [ -e "${ARTIFACT_FOLDER}" ]; then
    echo "${ARTIFACT_FOLDER} already exists, aborting process, remove folder manually to perform a fresh download/update"
  else
    if [[ -e "${ARTIFACT}" ]]; then
      echo "${ARTIFACT} already exists, skipping to next step..."
    else
      curl -O -L -J "${URL}"
    fi

    # Unpack according to the archive type; anything else is rejected
    if [[ "${ARTIFACT}" == *.zip ]]; then
      unzip -u "${ARTIFACT}"
    elif [[ "${ARTIFACT}" == *.tar.gz || "${ARTIFACT}" == *.tgz ]]; then
      tar -xvzf "${ARTIFACT}"
    else
      echo 'File format unrecognised, aborting...'
      exit 1
    fi

    rm -f "${ARTIFACT}"
  fi
}
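
`downloadArtifact` has to decide between `unzip` and `tar` from the artifact's file name. One way to express that decision is a `case` on the file suffix, sketched below. `archiveType` is a hypothetical helper for illustration only and does not appear in the commit.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: classify an artifact by its file name suffix.
archiveType() {
  case "$1" in
    *.zip)          echo "zip" ;;
    *.tar.gz|*.tgz) echo "tar" ;;
    *)              echo "unknown" ;;
  esac
}

archiveType models.zip       # prints: zip
archiveType models.tar.gz    # prints: tar
archiveType models.tgz       # prints: tar
archiveType models.jar       # prints: unknown
```

Matching on the suffix rather than on a substring avoids misclassifying names that merely contain `zip` somewhere, such as `zipkin-client.tar.gz`.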
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/scala/epic.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/dlwh/epic
cd ..
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/scala/saul.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/CogComp/saul
cd ..
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/scala/tm.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/ispras/tm
cd ..
1 change: 1 addition & 0 deletions examples/nlp-java-jvm/images/scala/version.txt
@@ -0,0 +1 @@
0.1
11 changes: 11 additions & 0 deletions examples/nlp-java-jvm/images/scala/word2vec-scala.sh
@@ -0,0 +1,11 @@
#!/bin/bash

set -e
set -u
set -o pipefail

source common.sh

cd shared
gitClone https://github.com/Refefer/word2vec-scala
cd ..
2 changes: 1 addition & 1 deletion examples/nlp-java-jvm/runInDocker.sh
@@ -10,7 +10,7 @@ GRAALVM_VERSION=${GRAALVM_VERSION:-19.1.1}

DOCKER_USER_NAME=${DOCKER_USER_NAME:-"neomatrix369"}

-IMAGE_NAME=${IMAGE_NAME:-nlp-java}
+IMAGE_NAME=${IMAGE_NAME:-nlp-${language_id}}
IMAGE_VERSION=${IMAGE_VERSION:-$(cat images/${language_id}/version.txt)}
DOCKER_FULL_TAG_NAME="${DOCKER_USER_NAME}/${IMAGE_NAME}"
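
The change to `IMAGE_NAME` relies on nested `${VAR:-default}` expansion: the default itself is built from `language_id`. The sketch below shows how the resulting tag name resolves. `fullTagName` is a hypothetical wrapper for illustration, and the printed values assume `DOCKER_USER_NAME` and `IMAGE_NAME` are not already set in the environment.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical wrapper around the defaults shown in the diff.
fullTagName() {
  local language_id="${1:-java}"
  local docker_user_name="${DOCKER_USER_NAME:-neomatrix369}"
  # An explicit IMAGE_NAME wins; otherwise the name follows the language id.
  local image_name="${IMAGE_NAME:-nlp-${language_id}}"
  echo "${docker_user_name}/${image_name}"
}

fullTagName scala                      # prints: neomatrix369/nlp-scala
IMAGE_NAME=custom fullTagName scala    # prints: neomatrix369/custom
```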

6 changes: 3 additions & 3 deletions natural-language-processing/README.md
@@ -11,7 +11,7 @@
- [LaMachine](https://proycon.github.io/LaMachine/) - LaMachine is a unified software distribution for Natural Language Processing. Integration of numerous open-source NLP tools, programming libraries, web-services and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines (bare-metal, Virtual Machines and Docker containers).
- [Awesome NLP](https://github.com/keon/awesome-nlp)

-## Java
+## Java/JVM
- [An introduction to natural language processing and a demo using opensource libraries](https://www.ibm.com/developerworks/library/cc-cognitive-natural-language-processing/index.html?social_post=963789367&fst=Discover) ([Tweet](https://twitter.com/java/status/883174486459248646))
- [Implementing NLP Attention Mechanisms with DeepLearning4J](https://www.meetup.com/AI-for-Enterprise-Virtual-User-Group/events/255622367/) ([Tweet](https://twitter.com/java/status/1058405126988161024))
- [How Stanford CoreNLP, a popular Java natural language tool can help you perform Natural Language Processing tasks](https://stanfordnlp.github.io/CoreNLP/) ([Tweet](https://twitter.com/java/status/945689918289924096))
@@ -22,7 +22,7 @@
- [Clojure-based libraries](https://github.com/keon/awesome-nlp#user-content-clojure)
- [Kotlin-based libraries](https://github.com/keon/awesome-nlp#user-content-kotlin)
- [Scala-based libraries](https://github.com/keon/awesome-nlp#user-content-scala)
-- [NLP Java/JVM](../examples/nlp-java-jvm/README.md) - docker container with Java/JVM based NLP libraries/frameworks (inspired by LaMachine, Awesome NLP and others out there)
+- [NLP Java/JVM](../examples/nlp-java-jvm/README.md#nlp-javajvm) - docker container with Java/JVM based NLP libraries/frameworks (inspired by LaMachine, Awesome NLP and others out there)

## Courses, Tutorial, Learning resource
- [Introductory: NLP for hackers](https://nlpforhackers.io/deep-learning-introduction/)
@@ -39,7 +39,7 @@
## Library, Framework, Models, Tools, Services

- [BloomsburyAI's Open Source NLP tool: Cape Webservices - backend server](https://github.com/bloomsburyai/cape-webservices) | [Rest of BloomsburyAI's Open Source NLP tool - Cape](https://www.github.com/bloomsburyai) [Bought out by FB around March/April 2019]
-- [NLP Java/JVM](../examples/nlp-java-jvm/README.md) - docker container with Java/JVM based NLP libraries/frameworks (inspired by LaMachine, Awesome NLP and others out there)
+- [NLP Java/JVM](../examples/nlp-java-jvm/README.md#nlp-javajvm) - docker container with Java/JVM based NLP libraries/frameworks (inspired by LaMachine, Awesome NLP and others out there)
- [Better NLP library (experimental)](../examples/better-nlp) | Slides: [1](./better-nlp/presentations/09-Mar-2019/Better-NLP-Presentation-Slides.pdf) [2](./better-nlp/presentations/29-Jun-2019/Better-NLP-2.0-one-library-rules-them-all-Presentation-Slides.pdf)
- [Facebook's PyText](https://github.com/facebookresearch/PyText)
- [Facebook's FastText](https://github.com/facebookresearch/FastText) | [homepage | docs](https://fasttext.cc/)
