Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate away from yajl #6257

Open
1 of 2 tasks
dcermak opened this issue Oct 4, 2024 · 14 comments
Open
1 of 2 tasks

Migrate away from yajl #6257

dcermak opened this issue Oct 4, 2024 · 14 comments
Labels
accepted Has been approved and is ok to start working on enhancement help wanted

Comments

@dcermak
Copy link

dcermak commented Oct 4, 2024

Welcome

  • Yes, I've searched similar issues and discussions on GitHub and didn't find any.

Impact

  • This feature requires new configuration and/or commands

Current Behavior

i3 currently uses yajl for json serialization. Unfortunately, yajl is pretty much dead upstream now and according to the current maintainer in Fedora, it has collected quite a number of CVEs and they'd like to eventually stop maintaining it or remove it from the distribution entirely (see: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/YPFHPOKAND3RZR7ZKWTDHUQEESG6IUJ3/).

Desired Behavior

Migrate to a maintained json library, e.g. json-c (this library is also used by sway).

I'd be willing to help out in such an effort, but only if there's any chance that this will land upstream in i3.

@i3bot
Copy link

i3bot commented Oct 4, 2024

Please note that new features which require additional configuration will usually not be considered. We are happy with the feature set of i3 and want to focus in fixing bugs instead. We do accept feature requests, however, and will evaluate whether the added benefit (clearly) outweighs the complexity it adds to i3.

Keep in mind that i3 provides a powerful way to interact with it through its IPC interface: https://i3wm.org/docs/ipc.html.

@orestisfl
Copy link
Member

I think that's a good idea but there is a lot of code that is tightly coupled with json parsing unfortunately. The effort here would be quite significant. Thoughts @stapelberg?

@stapelberg
Copy link
Member

Thanks for the hint, I didn’t realize yajl had become stale.

While I’m not too worried about the CVEs (i3 doesn’t parse untrusted JSON), it is concerning that distribution maintainers want to drop yajl, so yes, we should probably migrate i3 to some other library.

I made a note to cut a new release soon (maybe end of next week) and then we can accept PRs which convert i3 piece by piece to a new JSON library. It’s probably easiest to start with the standalone tools like i3-msg.

@orestisfl
Copy link
Member

Good idea about the piece-by-piece migration. WDYT about having a separate experimental branch for the json migration at least in the beginning?

@dcermak
Copy link
Author

dcermak commented Oct 7, 2024

With which json library should we go? json-c?

@orestisfl
Copy link
Member

Ideally, before starting on an implementation, the first work item would be to have a quick comparison between the alternatives.
Some options from a 2min search:

Options that seem less suiting:

Ultimately, from the top of my head, we want a library that:

  1. Supports comments. This is non-standard but has always been supported in i3
  2. Is well-maintained, relatively active
  3. Is relatively popular
  4. Is somewhat fast. We don't need to use the fastest option but i3 does a lot of json parsing since all IPC calls use json.
  5. Ideally, the migration effort from yajl is as small as possible

@stapelberg
Copy link
Member

json-c seems to be the much more popular option compared to runner-up cJSON. I can’t find parson in Debian at all.

If json-c seems usable, it seems like a good choice from what I can tell.

@orestisfl
Copy link
Member

json-c seems to be the much more popular option compared to runner-up cJSON. I can’t find parson in Debian at all.

Haven't looked into either yet but what is this based on? cJSON has 3x the stars.

@stapelberg
Copy link
Member

json-c seems to be the much more popular option compared to runner-up cJSON. I can’t find parson in Debian at all.

Haven't looked into either yet but what is this based on? cJSON has 3x the stars.

This was based on looking at how many packages declare a build-dependency by searching Debian Code Search:

(You can also do these reverse-dependency queries by querying apt, but I found it easiest to use Code Search here.)

@orestisfl orestisfl added help wanted accepted Has been approved and is ok to start working on labels Oct 27, 2024
@sigprof
Copy link

sigprof commented Nov 24, 2024

  1. Supports comments. This is non-standard but has always been supported in i3

Looks like the json-c parser supports comments directly, unless you use the JSON_TOKENER_STRICT flag. However, without the JSON_TOKENER_STRICT flag json-c accepts much more non-standard things than just comments:

  • single quoted strings ('foo') are accepted (however, for object member names such single quoted strings seem to be accepted even with JSON_TOKENER_STRICT, which looks like a bug);
  • values like true, false, null, Infinity, NaN are accepted even with mismatched case (although Infinity and NaN seem to be outside of the basic JSON spec, then JSON_TOKENER_STRICT flag does not disable them completely);
  • when parsing floating point numbers, trailing E/e or +/- characters without the actual exponent after them are ignored;
  • when parsing integer numbers (with no . or e/E) in the value), if the number does not actually fit into uint64_t (or int64_t, if it was prefixed with the - sign), the value is silently truncated to 64 bits (with JSON_TOKENER_STRICT a parse error is detected in that case instead — the number does not get converted into a double value with the corresponding loss of precision);
  • control characters (<= 0x1f) are silently ignored;
  • trailing commas in arrays and objects are allowed.

Probably the most problematic thing here is the overflow handling for integer numbers (but i3 should not be receiving untrusted JSON anyway, so this might be acceptable). Also, once trailing commas would start to be supported, getting rid of them without breaking compatibility would be impossible.

cJSON seems to be much more stricter — its parser does not supports comments, trailing commas, or the non-standard single quotes for strings. Comments in JSON are “supported” only by the void cJSON_Minify(char *json) function, which strips any superfluous whitespace from the passed JSON, and in the process also strips the /* ... */ and // ... comments; however, this function modifies the passed buffer in place, which might not be convenient in some cases.

Also cJSON parses all numbers as double, and additionally supplies the value converted to int for convenience (with saturation to the INT_MIN ... INT_MAX range); and it does not save any information about whether the original JSON representation was purely integer (e.g., for 2 and 2.0 the parse result would be exactly the same).

In addition, both of those implementation do not support null characters (\u0000) in JSON strings, unlike yajl; i3 might not really use that feature though.

I also looked at jansson, and that library explicitly does not support any JSON syntax extensions, including comments.

@stapelberg
Copy link
Member

Thanks for looking into this and sharing your findings.

I think we can probably live with not setting JSON_TOKENER_STRICT, i.e. with accepting all the non-standard behavior. The only item that I can imagine people accidentally running into is the single quotes for strings, and that’s easy enough to fix in, say, a saved layout file.

Given that we released v4.24 recently, it seems like the best course of action is now clear: convert piece-by-piece to json-c.

@xzfc
Copy link
Contributor

xzfc commented Jan 12, 2025

There is a conceptual difference between yajl and json-c parsers:

  • yajl parser is a event-driven/SAX-like parser. The user sets up a yajl_callbacks structure, and the parser calls them in a sequence during parsing.
  • json-c parser is DOM-like. The parser returns a newly allocated json_object * which the user should use to get fields, iterate, etc.

Similarly, yajl serializer is string-builder-like, while json-c serializer is DOM-like (the user prepares json_object *, then calls json_object_to_json_string on it).

The DOM-like API would simplify the code, but it would make performance worse as it performs a lot of allocations.

I migrated IPC serializer code for get_tree from yajl to json-c and benchmarked it using this script. The benchmark spawns a bunch of nested containers, then measures time of get_tree command. Results are bellow (times in ms). Do we consider acceptable to have 2x-3x performance slowdown?


Current next (yajl):

i3 version 4.24-7-g853b0d91 © 2009 Michael Stapelberg and contributors
nodes= 10 min= 0.17 median= 0.18 max= 0.43 avg= 0.21
nodes= 12 min= 0.19 median= 0.20 max= 0.46 avg= 0.21
nodes= 16 min= 0.23 median= 0.24 max= 0.43 avg= 0.25
nodes= 24 min= 0.33 median= 0.34 max= 0.53 avg= 0.34
nodes= 40 min= 0.51 median= 0.53 max= 1.14 avg= 0.56
nodes= 72 min= 0.88 median= 0.91 max= 1.51 avg= 0.94
nodes=136 min= 1.66 median= 1.75 max= 2.00 avg= 1.76
nodes=264 min= 3.05 median= 3.18 max= 4.35 avg= 3.22

My 6257-json-c branch:

i3 version 4.24-21-gc7936a15 © 2009 Michael Stapelberg and contributors
nodes= 10 min= 0.30 median= 0.35 max= 0.67 avg= 0.35
nodes= 12 min= 0.35 median= 0.39 max= 0.84 avg= 0.40
nodes= 16 min= 0.55 median= 0.60 max= 1.06 avg= 0.60
nodes= 24 min= 0.68 median= 0.76 max= 1.00 avg= 0.76
nodes= 40 min= 1.46 median= 1.59 max= 2.15 avg= 1.60
nodes= 72 min= 2.74 median= 2.96 max= 3.67 avg= 2.97
nodes=136 min= 5.23 median= 5.86 max= 6.14 avg= 5.86
nodes=264 min= 9.75 median=11.73 max=14.67 avg=11.75

@orestisfl
Copy link
Member

Thank you @xzfc, that's really illuminating. I am afraid that a performance degradation like this is a significant issue, i3 depends on a lot of JSON parsing/serialising so we should consider this aspect before moving with json-c prematurely.

Are there any other event-driven json libraries out there?

@xzfc
Copy link
Contributor

xzfc commented Jan 12, 2025

Non-DOM libraries:

Also, it worth considering vendoring small JSON dependencies (or adding them as submodules or rolling our own), so we don't rely on having them in distro repos. E.g. jems is only 278 lines of code (comments and blanks excluded), and jsmn is 353 lines.

Also, there is https://github.com/ibireme/yyjson. It's DOM-like, but it claims to be extremely fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Has been approved and is ok to start working on enhancement help wanted
Projects
None yet
Development

No branches or pull requests

6 participants