Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow non-printable characters inside quoted string literals. #219

Merged
merged 1 commit into from Nov 21, 2015
Merged

Allow non-printable characters inside quoted string literals. #219

merged 1 commit into from Nov 21, 2015

Conversation

ghost
Copy link

@ghost ghost commented Nov 21, 2015

Fixed issue #192

@puzrin
Copy link
Member

puzrin commented Nov 21, 2015

NB. Speed decrease is ~10% (25% on edge cases). Looks acceptable.

@dervus
Copy link
Collaborator

dervus commented Nov 21, 2015

Looks good, but

  1. Why "JSON-style" input checking is disabled for single quoted scalars? I thought that according to the spec all quoted scalars should follow this exception.
  2. Performance probably could be improved by rewriting this regexp to something similar to JSON char check in the same function:
    if (checkJson) {
    for (_position = 0, _length = _result.length;
    _position < _length;
    _position += 1) {
    _character = _result.charCodeAt(_position);
    if (!(0x09 === _character ||
    0x20 <= _character && _character <= 0x10FFFF)) {
    throwError(state, 'expected valid JSON character');
    }
    }
    }

    It was written this way exactly becasue of performance. Manual codepoint-by-codepoint iteration is faster on small inputs than regexp test.

@ghost
Copy link
Author

ghost commented Nov 21, 2015

Disabling of the JSON-style checking for single quoted scalars is a consequence of the following two statements from YAML 1.2 specification:
a) 5.7. Note that escape sequences are only interpreted in double-quoted scalars.
b) 5.1. To ensure readability, non-printable characters should be escaped on output, even inside such scalars.
If there is an non-printable character inside a single-quoted scalar, then after parsing and exporting it should be replaced by escape sequence because of b). But it is impossible because of a). So the only way to resolve this problem is to assume that non-printable characters are allowed only inside double-quoted scalars, but not single-quoted ones.

@dervus
Copy link
Collaborator

dervus commented Nov 21, 2015

You're mixing two independent areas. 5.1 is dumper's concern. Dumper takes JS object and then decides by itself which scalar style to use. 5.7 is out of place at all. It's about \ escape sequences, no?

I'm just looking for usage of nb-json rule, and see that it's base for both nb-double-char and nb-single-char. See grammer rules in "7.3.1. Double-Quoted Style" and "7.3.2. Single-Quoted Style".

@ghost
Copy link
Author

ghost commented Nov 21, 2015

OK, I've missed the section 7.3.2, so I'll revert the check for single-quoted scalars back.

@dervus
Copy link
Collaborator

dervus commented Nov 21, 2015

@puzrin, if you think we don't need to try improve the performance, then It's good for merge now.

puzrin pushed a commit that referenced this pull request Nov 21, 2015
Allow non-printable characters inside quoted string literals.
@puzrin puzrin merged commit ea9e65a into nodeca:master Nov 21, 2015
@puzrin
Copy link
Member

puzrin commented Nov 21, 2015

I doubt that regexp unroll for this case will be faster - too many conditions. Let's leave it for someone else, who has time for experiments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants