update docs
rodrigopivi committed May 10, 2019
1 parent cfc906b commit 93d93e8
Showing 2 changed files with 19 additions and 18 deletions.
20 changes: 10 additions & 10 deletions readme.md
@@ -20,9 +20,9 @@ If you are building chatbots using commercial models, open source frameworks or

This project contains the:
- [Online chatito IDE](https://rodrigopivi.github.io/Chatito/)
- [Chatito DSL specification](https://github.com/rodrigopivi/Chatito/blob/master/spec.md)
- [Chatito DSL specification](https://github.com/rodrigopivi/Chatito/blob/master/spec.md)
- [DSL AST parser in pegjs format](https://github.com/rodrigopivi/Chatito/blob/master/parser/chatito.pegjs)
- [Generator implemented in typescript + npm package](https://github.com/rodrigopivi/Chatito/tree/master/src)
- [Generator implemented in typescript + npm package](https://github.com/rodrigopivi/Chatito/tree/master/src)

### Chatito language
For the full language specification and documentation, please refer to the [DSL spec document](https://github.com/rodrigopivi/Chatito/blob/master/spec.md).
@@ -31,7 +31,7 @@ For the full language specification and documentation, please refer to the [DSL
The language is independent from the generated output format and because each model can receive different parameters and settings, there are 3 data format adapters provided. This section describes the adapters, their specific behaviors and use cases:

#### Default format
Use the default format if you plan to train a custom model or if you are writting a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the dsl. E.g.:
Use the default format if you plan to train a custom model or if you are writing a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the DSL. E.g.:

```
%[some intent]('context': 'some annotation')
@@ -46,7 +46,7 @@ Custom entities like 'context', 'required' and 'type' will be available at the o

#### [Rasa NLU](https://rasa.com/docs/nlu/)
[Rasa NLU](https://rasa.com/docs/nlu/) is a great open source framework for training NLU models.
One particular behavior of the Rasa adapter is that when a slot definition sentence only contains one alias, the generated rasa dataset will map the alias as a synonym. e.g.:
One particular behavior of the Rasa adapter is that when a slot definition sentence only contains one alias, the generated Rasa dataset will map the alias as a synonym. e.g.:

```
%[some intent]('training': '1')
@@ -60,14 +60,14 @@ One particular behavior of the Rasa adapter is that when a slot definition sente
synonym 2
```

In this example, the generated rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.
In this example, the generated Rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.
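
As a rough illustration of that mapping (a sketch, not the adapter's actual code), the snippet below builds an `entity_synonyms` list in the shape used by the legacy Rasa NLU JSON training format. The helper name `build_entity_synonyms` and its input structure are hypothetical, and the exact output of the Chatito adapter may differ.

```python
def build_entity_synonyms(slot_aliases):
    """Build Rasa-style synonym entries.

    `slot_aliases` is a hypothetical structure (not a Chatito type) that maps
    an alias name to the list of variations defined under that alias.
    """
    return [
        {"value": alias, "synonyms": list(variations)}
        for alias, variations in slot_aliases.items()
    ]


# The alias name becomes the canonical value; its variations become synonyms.
synonyms = build_entity_synonyms(
    {"some slot synonyms": ["synonym 1", "synonym 2"]}
)
```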

#### [LUIS](https://www.luis.ai/)
[LUIS](https://www.luis.ai/) is part of Microsoft's Cognitive services. Chatito supports training a LUIS NLU model through its [batch add labeled utterances endpoint](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c09), and its [batch testing api](https://docs.microsoft.com/en-us/azure/cognitive-services/LUIS/luis-how-to-batch-test).

To train a LUIS model, you will need to post the utterance in batches to the relevant api for training or testing.
To train a LUIS model, you will need to post the utterance in batches to the relevant API for training or testing.
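
A minimal sketch of the batching step, assuming the generated utterances are already loaded into a list. The helper `batch_utterances` is hypothetical and makes no HTTP calls; the default `batch_size` of 100 is an assumption that should be checked against the LUIS documentation for the endpoint in use.

```python
def batch_utterances(utterances, batch_size=100):
    """Yield consecutive slices of at most `batch_size` utterances.

    `batch_size` defaults to 100 as an assumption; confirm the real limit
    in the LUIS API reference before relying on it.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(utterances), batch_size):
        yield utterances[start:start + batch_size]


# Example: 250 placeholder utterances split into batches ready to be posted.
batches = list(batch_utterances([f"utterance {i}" for i in range(250)]))
```

Each batch can then be sent to the training or testing API with whichever HTTP client the project already uses.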

Reference issue: [#61](https://github.com/rodrigopivi/Chatito/issues/)
Reference issue: [#61](https://github.com/rodrigopivi/Chatito/issues/61)

#### [Snips NLU](https://snips-nlu.readthedocs.io/en/latest/)
[Snips NLU](https://snips-nlu.readthedocs.io/en/latest/) is another great open source framework for NLU. One particular behavior of the Snips adapter is that you can define entity types for the slots. e.g.:
@@ -81,11 +81,11 @@ Reference issue: [#61](https://github.com/rodrigopivi/Chatito/issues/)
~[tomorrow]
```

In the previous example, all `@[date]` values will be taged with the `snips/datetime` entity tag.
In the previous example, all `@[date]` values will be tagged with the `snips/datetime` entity tag.

### NPM package

Chatito is supports nodejs `v8.11.2 LTS` or higher.
Chatito supports Node.js `v8.11.2 LTS` or higher.

Install it globally:
```
@@ -120,7 +120,7 @@ npx chatito <pathToFileOrDirectory> --format=<format> --formatOptions=<formatOpt
### Notes to prevent overfitting
Overfitting (https://en.wikipedia.org/wiki/Overfitting) is a problem that can be prevented if we use Chatito correctly. The idea behind this tool, is to have an intersection between data augmentation and having probabilistic description of possible sentences. It is not intended to generate deterministic datasets, you should avoid generating all possible combinations.
[Overfitting](https://en.wikipedia.org/wiki/Overfitting) is a problem that can be prevented if we use Chatito correctly. The idea behind this tool, is to have an intersection between data augmentation and a probabilistic description of possible sentences combinations. It is not intended to generate deterministic datasets, you should avoid generating all possible combinations.
### Author and maintainer
Rodrigo Pimentel
17 changes: 9 additions & 8 deletions spec.md
@@ -60,6 +60,7 @@ non printable characters, this are the requirements of document source text and
- Comments: Lines of text starting with '//' or '#' (no spaces before)
- Imports: Lines of text starting with 'import' keyword followed by a relative filepath
- Entity arguments: Optional key-values that can be declared at intents and slot definitions
- Probability operator: an optional keyword declared at the start of sentences to control the probabilities.

### 2.1 - Entities
Entities are the way to define keywords that wrap sentence variations and attach some properties to them.
@@ -83,7 +84,7 @@ added to the sentences defined inside. e.g.:
hi
```

The previous example will generate all possible unique examples for greet (in this case 2 utterances). But there are cases where there is no need to generate all utterances, or when we want to attach some extra properties to the genreated utterance, that is where entity arguments can help.
The previous example will generate all possible unique examples for greet (in this case 2 utterances). But there are cases where there is no need to generate all utterances, or when we want to attach some extra properties to the generated utterance, that is where entity arguments can help.

Entity arguments are comma separated key-values declared with the entity definition inside parenthesis. Each entity argument is composed of a key, followed by the `:` symbol and the value. The argument key or value are just strings wrapped with single or double quotes, optional spaces between the parenthesis and comma are allowed, the format is similar to ndjson but only for string values.
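
To make the argument format concrete, here is a small illustrative parser. It is only a sketch: the real parser is generated from the pegjs grammar, and the function name `parse_entity_arguments` is hypothetical. It also assumes that keys and values contain no commas or quote characters, which the actual grammar may handle differently.

```python
import re

# One `'key': 'value'` pair; key and value may each use single or double quotes.
ARG_PATTERN = re.compile(r"""\s*(['"])(.*?)\1\s*:\s*(['"])(.*?)\3\s*""")


def parse_entity_arguments(text):
    """Parse a string such as "('training': '2')" into a dict of strings."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("entity arguments must be wrapped in parentheses")
    arguments = {}
    # Simplification: split on commas, so commas inside values are unsupported.
    for part in text[1:-1].split(","):
        match = ARG_PATTERN.fullmatch(part)
        if match is None:
            raise ValueError(f"invalid entity argument: {part!r}")
        arguments[match.group(2)] = match.group(4)
    return arguments


parsed = parse_entity_arguments("('training': '2', 'testing': '2')")
```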

@@ -154,7 +155,7 @@ Nesting entities: Sentences defined inside a slot can only reference alias entit

#### 2.1.3 - Alias
The alias entity is defined by the `~[` symbols at the start of a line, following by the name of the alias and `]`.
Alias are just variations of a word and does not generate any tag. By default if an alias is referenced but not defined (like in the next example for `how are you`, it just uses the alias key name, this is usefull for making a word optional but not having to add the extra lines of code defining a new alias. e.g.:
Alias are just variations of a word and does not generate any tag. By default if an alias is referenced but not defined (like in the next example for `how are you`, it just uses the alias key name, this is useful for making a word optional but not having to add the extra lines of code defining a new alias. e.g.:

```
%[greet]
@@ -172,14 +173,14 @@ When an alias is referenced inside a slot definition, and it is the only token o

Alias definitions are not allowed to declare entity arguments.

Nesting entities: Sentences defined inside aliases can reference slots and other aliases but preventing recursive loops
Nesting entities: Sentences defined inside aliases can reference slots and other aliases but preventing recursive loops.


### 2.2 - Sentence probability operator

The way Chatito works, is like pulling samples from a cloud of possible combinations, but once the sentences definitions start getting more complex, the max possible combination possibilities increments exponentially, causing a problem where the generator will most likely pick sentences that have more possible combinations, and omit some sentences that may be more important at the dataset. To have some control of the generator principle, you can use the this operator.
The way Chatito works, is like pulling samples from a cloud of possible combinations, but once the sentences definitions start getting more complex, the max possible combination possibilities increments exponentially, causing a problem where the generator will most likely pick sentences that have more possible combinations, and omit some sentences that may be more important at the dataset. To have some control of the generator principle, you can use the probability operator.

The sentence probability operator is defined by the `*[` symbols at the start of a sentence, following by the probability of generating the sentence (max 100) and `]`. The value inside the probability operator must by an integer betwen 1 and 100.
The sentence probability operator is defined by the `*[` symbols at the start of a sentence, following by a number, the probability of generating the sentence and `]`. The value inside the probability operator must be an integer between 1 and 100, and the sum of all probability operators inside an entity definition should never exceed 100.

```
%[greet]('training': '2', 'testing': '2')
@@ -190,11 +191,11 @@ The sentence probability operator is defined by the `*[` symbols at the start of

This way, it is possible to declare that from the first sentence we want 5 testing and 5 training examples (50%). The second sentence will generate 30% of the utterances. And the 20% remaining will come from the remaining possibilities of all sentences.
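
The constraints on the operator values can be checked with a few lines of code. The sketch below is not taken from the Chatito generator; `remaining_probability` is a hypothetical helper that validates the values of one entity definition and returns the share left for sentences without an operator.

```python
def remaining_probability(operators):
    """Validate the `*[n]` values of one entity and return the unassigned share.

    `operators` is a list with the integer inside each probability operator.
    """
    for value in operators:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_integer = isinstance(value, int) and not isinstance(value, bool)
        if not is_integer or not 1 <= value <= 100:
            raise ValueError(
                f"probability must be an integer between 1 and 100, got {value!r}"
            )
    total = sum(operators)
    if total > 100:
        raise ValueError(f"probabilities add up to {total}, which exceeds 100")
    return 100 - total


# Sentences marked with 50 and 30 leave 20% for the remaining sentences.
leftover = remaining_probability([50, 30])
```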

NOTE: Be carefull when using probability operator, because if the sentence reaches its max number of unique generated values, it will start producing duplicates and possibly slowing down the generator that may filter duplicates.
NOTE: Be careful when using probability operator, because if the sentence reaches its max number of unique generated values, it will start producing duplicates and possibly slowing down the generator that may filter duplicates.

### 2.3 - Importing chatito files

To allow reusing entity declarations. It is possible to import another chatito file using the import keyword. Importing another chatito file, only allows using the slots and aliases defined there, if the imported file defines intents, they will be ignored since intents are generation entry points.
To allow reusing entity declarations. It is possible to import another chatito file using the import keyword. Importing another chatito file only allows using the slots and aliases defined there, if the imported file defines intents, they will be ignored since intents are generation entry points.

As an example, given two chatito files:

@@ -216,7 +217,7 @@ import ./slot1.chatito
```

The file `main.chatito` will import all alias and slot definitions from `./slot1.chatito`.
The text next to the import statement should be a relative path from the main file to the imported file.
The text next to the import statement should be a relative path from the main file to the imported file. Imports can be nested, and the path is always relative to the file that declares the reference.
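
The resolution rule can be sketched as follows. This is an illustration only: the generator itself is written in TypeScript, the helper name `resolve_import` is hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def resolve_import(importing_file, import_path):
    """Resolve `import_path` relative to the file that declares the import."""
    base_dir = posixpath.dirname(importing_file)
    return posixpath.normpath(posixpath.join(base_dir, import_path))


# An import declared in the main file resolves next to the main file.
direct = resolve_import("project/main.chatito", "./slot1.chatito")

# A nested import resolves relative to the imported file, not the main file.
# The directory and file names here are made up for the example.
nested = resolve_import("project/shared/slots.chatito", "../common/alias.chatito")
```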

Note: Chatito will throw an exception if two imports define the same entity.
