Commit

fix recursive examples creation with deep merge
rodrigopivi committed Aug 28, 2018
1 parent 6df9a1e commit be7e0f2
Showing 9 changed files with 139 additions and 16 deletions.
11 changes: 4 additions & 7 deletions examples/dateBooking_large.chatito
@@ -1,4 +1,4 @@
%[bookRestaurantsAtDatetime]('training': '1000')
%[bookRestaurantsAtDatetime]('training': '1000', 'testing': '100')
~[find?] ~[some?] ~[restaurants] ~[available?] from @[bookTime] to @[bookTime]
~[find?] ~[some?] ~[restaurants] ~[available?] from @[bookTime] to @[bookTime] on @[bookDate]

@@ -32,10 +32,12 @@
~[24hour]~[:]~[minute]

@[bookDate]
~[contextDate]
~[monthNames] ~[monthDays]
~[monthDays] of ~[monthNames]
~[monthDayNumbers]/~[monthNumbers]
today
tomorrow
next ~[weekDays]

~[:]
:
@@ -44,11 +46,6 @@
am
pm

~[contextDate]
today
tomorrow
next ~[weekDays]


~[monthNames]
January
2 changes: 1 addition & 1 deletion package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "chatito",
"version": "2.1.3",
"version": "2.1.4",
"description": "Generate training datasets for NLU chatbots using a simple DSL",
"bin": {
"chatito": "./dist/bin.js"
4 changes: 2 additions & 2 deletions readme.md
@@ -21,7 +21,7 @@ For the full language specification and documentation, please refer to the [DSL
The language is independent from the generated output format and because each model can receive different parameters and settings, there are 3 data format adapters provided. This section describes the adapters, their specific behaviors and use cases:

#### Default format
Use the default format if you plan to train a custom model or if you are writting a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intentts` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the dsl. E.g.:
Use the default format if you plan to train a custom model or if you are writting a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the dsl. E.g.:

```
%[some intent]('context': 'some annotation')
@@ -50,7 +50,7 @@ One particular behavior of the Rasa adapter is that when a slot definition sente
synonym 2
```

In this example, the generated rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 1` mapping to `some slot synonyms`.
In this example, the generated rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.

#### [Snips NLU](https://snips-nlu.readthedocs.io/en/latest/)
[Snips NLU](https://snips-nlu.readthedocs.io/en/latest/) is another great open source framework for NLU. One particular behavior of the Snips adapter is that you can define entity types for the slots. e.g.:
2 changes: 1 addition & 1 deletion spec.md
@@ -86,7 +86,7 @@ The previous example will generate all possible unique examples for greet (in th

Entity arguments are comma separated key-values declared with the entity definition inside parenthesis. Each entity argument is composed of a key, followed by the `:` symbol and the value. The argument key or value are just strings wrapped with single or double quotes, optional spaces between the parenthesis and comma are allowed, the format is similar to ndjson but only for string values.

By default, intent definitions can expect the `training` and `testing` argument keys, when defined, are used to declare the maximum number of unique examples to generate for the given intent, and splitting them in two datasets. The generator will first populate the training dataset, then testing dataset until reaching the sum of both values, each value must be `>= 1`. e.g.:
By default, intent definitions can expect the `training` and `testing` argument keys, when defined, are used to declare the maximum number of unique examples to generate for the given intent, and splitting them in two datasets, the training dataset is to be used to train the NLU model, and the testing dataset should be used to evaluate the accuracy of the model with examples it never trained with. Creating a testing dataset is not required, but it is important to be aware of the accuracy of your model to detect overfitting and compare against previous accuracies. The generator will first populate the training dataset, then testing dataset until reaching the sum of both values, each value must be `>= 1`. e.g.:

```
%[greet]('training: '2', 'testing': '1')
2 changes: 1 addition & 1 deletion src/adapters/rasa.ts
@@ -21,7 +21,7 @@ export interface IRasaDataset {
};
}

interface IRasaTestingDataset {
export interface IRasaTestingDataset {
[intent: string]: ISentenceTokens[][];
}

2 changes: 1 addition & 1 deletion src/adapters/snips.ts
@@ -25,7 +25,7 @@ export interface ISnipsDataset {
language: string;
}

interface ISnipsTestingDataset {
export interface ISnipsTestingDataset {
[intent: string]: ISentenceTokens[][];
}

116 changes: 115 additions & 1 deletion src/tests/bin.spec.ts
@@ -21,14 +21,20 @@ test('test npm command line generator for large example', () => {
const child = cp.execSync(`node -r ts-node/register ${npmBin} ${grammarFile} --outputPath=${generatedDir}`);
expect(fs.existsSync(generatedDir)).toBeTruthy();
expect(fs.existsSync(generatedTrainingFile)).toBeTruthy();
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
expect(fs.existsSync(generatedTestingFile)).toBeTruthy();
const trainingDataset = JSON.parse(fs.readFileSync(generatedTrainingFile, 'utf8'));
expect(trainingDataset).not.toBeNull();
expect(trainingDataset.bookRestaurantsAtDatetime).not.toBeNull();
expect(trainingDataset.bookRestaurantsAtDatetime.length).toEqual(1000);
fs.unlinkSync(generatedTrainingFile);
const testingDataset = JSON.parse(fs.readFileSync(generatedTestingFile, 'utf8'));
expect(testingDataset).not.toBeNull();
expect(testingDataset.bookRestaurantsAtDatetime).not.toBeNull();
expect(testingDataset.bookRestaurantsAtDatetime.length).toEqual(100);
fs.unlinkSync(generatedTestingFile);
fs.rmdirSync(generatedDir);
expect(fs.existsSync(generatedTrainingFile)).toBeFalsy();
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
expect(fs.existsSync(generatedDir)).toBeFalsy();
});

@@ -68,6 +74,46 @@ test('test npm command line generator for medium example', () => {
expect(fs.existsSync(generatedDir)).toBeFalsy();
});

test('test npm command line generator for process all directory examples', () => {
const d = __dirname;
const generatedDir = path.resolve(`${d}/../../examples/citySearch_medium`);
const generatedTrainingFile = path.resolve(generatedDir, 'default_dataset_training.json');
const generatedTestingFile = path.resolve(generatedDir, 'default_dataset_testing.json');
const npmBin = path.resolve(`${d}/../bin.ts`);
const grammarFiles = path.resolve(`${d}/../../examples/`);
if (fs.existsSync(generatedTrainingFile)) {
fs.unlinkSync(generatedTrainingFile);
}
if (fs.existsSync(generatedTestingFile)) {
fs.unlinkSync(generatedTestingFile);
}
if (fs.existsSync(generatedDir)) {
fs.rmdirSync(generatedDir);
}
const child = cp.execSync(`node -r ts-node/register ${npmBin} ${grammarFiles} --outputPath=${generatedDir}`);
expect(fs.existsSync(generatedDir)).toBeTruthy();
expect(fs.existsSync(generatedTrainingFile)).toBeTruthy();
expect(fs.existsSync(generatedTestingFile)).toBeTruthy();
const trainingDataset = JSON.parse(fs.readFileSync(generatedTrainingFile, 'utf8'));
expect(trainingDataset).not.toBeNull();
expect(trainingDataset.findByCityAndCategory).not.toBeNull();
expect(trainingDataset.findByCityAndCategory.length).toEqual(1000);
expect(trainingDataset.bookRestaurantsAtDatetime).not.toBeNull();
expect(trainingDataset.bookRestaurantsAtDatetime.length).toEqual(1000);
const testingDataset = JSON.parse(fs.readFileSync(generatedTestingFile, 'utf8'));
expect(testingDataset).not.toBeNull();
expect(testingDataset.findByCityAndCategory).not.toBeNull();
expect(testingDataset.findByCityAndCategory.length).toEqual(100);
expect(testingDataset.bookRestaurantsAtDatetime).not.toBeNull();
expect(testingDataset.bookRestaurantsAtDatetime.length).toEqual(100);
fs.unlinkSync(generatedTrainingFile);
fs.unlinkSync(generatedTestingFile);
fs.rmdirSync(generatedDir);
expect(fs.existsSync(generatedTrainingFile)).toBeFalsy();
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
expect(fs.existsSync(generatedDir)).toBeFalsy();
});

test('test npm command line generator for rasa medium example', () => {
const d = __dirname;
const generatedTrainingFile = path.resolve(`${d}/../../examples/rasa_dataset_training.json`);
@@ -98,6 +144,38 @@ test('test npm command line generator for rasa medium example', () => {
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
});

test('test npm command line generator for rasa directory examples', () => {
const d = __dirname;
const generatedTrainingFile = path.resolve(`${d}/../../examples/rasa_dataset_training.json`);
const generatedTestingFile = path.resolve(`${d}/../../examples/rasa_dataset_testing.json`);
const npmBin = path.resolve(`${d}/../bin.ts`);
const grammarFile = path.resolve(`${d}/../../examples`);
if (fs.existsSync(generatedTrainingFile)) {
fs.unlinkSync(generatedTrainingFile);
}
if (fs.existsSync(generatedTestingFile)) {
fs.unlinkSync(generatedTestingFile);
}
const child = cp.execSync(`node -r ts-node/register ${npmBin} ${grammarFile} --format=rasa --outputPath=${d}/../../examples`);
expect(fs.existsSync(generatedTrainingFile)).toBeTruthy();
const dataset = JSON.parse(fs.readFileSync(generatedTrainingFile, 'utf8'));
expect(dataset).not.toBeNull();
expect(dataset.rasa_nlu_data).not.toBeNull();
expect(dataset.rasa_nlu_data.common_examples).not.toBeNull();
expect(dataset.rasa_nlu_data.common_examples.length).toEqual(2000);
fs.unlinkSync(generatedTrainingFile);
expect(fs.existsSync(generatedTrainingFile)).toBeFalsy();
expect(fs.existsSync(generatedTestingFile)).toBeTruthy();
const testingDataset = JSON.parse(fs.readFileSync(generatedTestingFile, 'utf8'));
expect(testingDataset).not.toBeNull();
expect(testingDataset.findByCityAndCategory).not.toBeNull();
expect(testingDataset.findByCityAndCategory.length).toEqual(100);
expect(testingDataset.bookRestaurantsAtDatetime).not.toBeNull();
expect(testingDataset.bookRestaurantsAtDatetime.length).toEqual(100);
fs.unlinkSync(generatedTestingFile);
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
});

test('test npm command line generator for snips medium example', () => {
const d = __dirname;
const generatedTrainingFile = path.resolve(`${d}/../../examples/snips_dataset_training.json`);
@@ -128,3 +206,39 @@ test('test npm command line generator for snips medium example', () => {
fs.unlinkSync(generatedTestingFile);
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
});

test('test npm command line generator for snips all examples', () => {
const d = __dirname;
const generatedTrainingFile = path.resolve(`${d}/../../examples/snips_dataset_training.json`);
const generatedTestingFile = path.resolve(`${d}/../../examples/snips_dataset_testing.json`);
const npmBin = path.resolve(`${d}/../bin.ts`);
const grammarFile = path.resolve(`${d}/../../examples`);
if (fs.existsSync(generatedTrainingFile)) {
fs.unlinkSync(generatedTrainingFile);
}
if (fs.existsSync(generatedTestingFile)) {
fs.unlinkSync(generatedTestingFile);
}
const child = cp.execSync(`node -r ts-node/register ${npmBin} ${grammarFile} --format=snips --outputPath=${d}/../../examples`);
expect(fs.existsSync(generatedTrainingFile)).toBeTruthy();
const dataset = JSON.parse(fs.readFileSync(generatedTrainingFile, 'utf8'));
expect(dataset).not.toBeNull();
expect(dataset.intents).not.toBeNull();
expect(dataset.intents.findByCityAndCategory).not.toBeNull();
expect(dataset.intents.findByCityAndCategory.utterances).not.toBeNull();
expect(dataset.intents.findByCityAndCategory.utterances.length).toEqual(1000);
expect(dataset.intents.bookRestaurantsAtDatetime).not.toBeNull();
expect(dataset.intents.bookRestaurantsAtDatetime.utterances).not.toBeNull();
expect(dataset.intents.bookRestaurantsAtDatetime.utterances.length).toEqual(1000);
fs.unlinkSync(generatedTrainingFile);
expect(fs.existsSync(generatedTrainingFile)).toBeFalsy();
expect(fs.existsSync(generatedTestingFile)).toBeTruthy();
const testingDataset = JSON.parse(fs.readFileSync(generatedTestingFile, 'utf8'));
expect(testingDataset).not.toBeNull();
expect(testingDataset.findByCityAndCategory).not.toBeNull();
expect(testingDataset.findByCityAndCategory.length).toEqual(100);
expect(testingDataset.bookRestaurantsAtDatetime).not.toBeNull();
expect(testingDataset.bookRestaurantsAtDatetime.length).toEqual(100);
fs.unlinkSync(generatedTestingFile);
expect(fs.existsSync(generatedTestingFile)).toBeFalsy();
});
14 changes: 13 additions & 1 deletion src/utils.ts
@@ -43,10 +43,22 @@ export const maxSentencesForEntity = (ed: IChatitoEntityAST, entities: IEntities
// Deep merge objects
// https://gist.github.com/Salakar/1d7137de9cb8b704e48a
const isObject = (item: any) => item && typeof item === 'object' && !Array.isArray(item) && item !== null;
const isArray = (item: any) => {
if (typeof Array.isArray === 'undefined') {
return Object.prototype.toString.call(item) === '[object Array]';
} else {
return Array.isArray(item);
}
};
export const mergeDeep = <T>(target: any, source: any): T => {
if (isObject(target) && isObject(source)) {
Object.keys(source).forEach(key => {
if (isObject(source[key])) {
if (isArray(source[key])) {
if (target[key] === undefined) {
target[key] = [];
}
target[key] = target[key].concat(source[key]);
} else if (isObject(source[key])) {
if (!target[key]) {
Object.assign(target, { [key]: {} });
}
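
The hunk above is cut off, so the full `mergeDeep` body is not visible here. As a rough illustration of the idea the change introduces (array values are concatenated rather than overwritten, while nested objects are merged recursively), here is a minimal stand-alone sketch. The names `mergeDeepSketch`, `isPlainObject` and `PlainObject` are hypothetical and are not part of the project; the handling of scalar values and of a non-array target value is an assumption, since that part of the original function is not shown.

```typescript
// Hypothetical sketch, not the project's actual implementation.
type PlainObject = { [key: string]: unknown };

// True for non-null, non-array objects.
const isPlainObject = (item: unknown): item is PlainObject =>
    typeof item === 'object' && item !== null && !Array.isArray(item);

const mergeDeepSketch = (target: PlainObject, source: PlainObject): PlainObject => {
    Object.keys(source).forEach(key => {
        const value = source[key];
        if (Array.isArray(value)) {
            // Arrays are appended to whatever array the target already holds.
            // Assumption: a missing or non-array target value starts as [].
            const existing: unknown[] = Array.isArray(target[key]) ? (target[key] as unknown[]) : [];
            target[key] = existing.concat(value);
        } else if (isPlainObject(value)) {
            // Nested objects are merged recursively.
            const current = target[key];
            const nested: PlainObject = isPlainObject(current) ? current : {};
            target[key] = mergeDeepSketch(nested, value);
        } else {
            // Assumption: scalar values from the source simply overwrite the target.
            target[key] = value;
        }
    });
    return target;
};

// Two per-file results that share one key and each contribute their own.
const merged = mergeDeepSketch(
    { greet: [1, 2], meta: { a: 1 } },
    { greet: [3], book: [4], meta: { b: 2 } }
);
console.log(JSON.stringify(merged));
```

Concatenating arrays is what would let examples for the same intent, produced from several grammar files in one directory, accumulate in a single dataset instead of the last file replacing the earlier ones.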