Add `to_columns_string()` C++ JSON API #2315

texodus · 2023-07-25T02:42:05Z

This PR adds a View.to_columns_string() method, which works like to_columns() except that it returns a JSON string instead of a JavaScript object (in JS) or a Python dict (in Python). to_columns_string() is implemented in C++ using the RapidJSON library and is reused verbatim in both JavaScript and Python, reducing duplicate code between the two projects.

In JavaScript:

to_columns() and to_json() have been rewritten in terms of to_columns_string(), using JSON.parse().
As JavaScript is typically instantiated in a WebWorker, using to_columns_string() saves one Object->String->Object conversion (as the data can stay in string form until it is copied into the main renderer process).
Linear performance is improved ~2x for these methods.
@finos/perspective-viewer-datagrid and @finos/perspective-viewer-d3fc have been rewritten to use this method internally, leading to improved performance for both.

In Python:

to_columns() and to_records() have been rewritten in terms of to_columns_string(), using json.loads().
All 3 of these methods are now GIL-less, so concurrency for virtual workloads is much improved.
(Breaking) Inf, -Inf and nan support has been removed from these methods. These data structures errored when serializing anyway, so were entirely useless for visualization purposes. to_numpy() and to_arrow() preserve these values still.
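
A minimal sketch of the Python-side relationship described above, assuming the string returned by to_columns_string() is a JSON object mapping column names to equal-length arrays. The payload is hand-written, and columns_from_string(), records_from_string() and strict_dumps_rejects() are hypothetical helper names for illustration, not Perspective's API. The last helper only shows that strict JSON has no literal for non-finite floats.

```python
import json

# Hand-written stand-in for the string View.to_columns_string() would return.
# The exact payload shape is an assumption made for illustration.
payload = '{"x": [1, 2, 3], "y": ["a", "b", "c"]}'


def columns_from_string(s):
    """Column-oriented dict, in the style of a to_columns() result."""
    return json.loads(s)


def records_from_string(s):
    """Row-oriented list of dicts, in the style of a to_records() result."""
    columns = json.loads(s)
    names = list(columns)
    # zip(*values) walks the equal-length columns row by row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


def strict_dumps_rejects(value):
    """True if the standard library's strict JSON encoding refuses the value."""
    try:
        json.dumps(value, allow_nan=False)
    except ValueError:
        return True
    return False


columns = columns_from_string(payload)
records = records_from_string(payload)
```

Both shapes come from a single parse of the same string, so the only per-format work left in Python is the row/column reshaping.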

In both:

date and datetime columns now use YYYY-MM-DD and YYYY-MM-DD HH:MM:SS.SSSS formatting respectively, for CSV export and split_by headers. The former can only be fixed at the expense of performance and/or significant asset size (bundling a locale collection); the latter can be fixed with a refactoring of the JSON representation.
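
For reference, the two layouts can be reproduced in Python roughly as follows. This is a sketch, not the C++ implementation: format_date() and format_datetime() are hypothetical names, and the width of the fractional-seconds field is left as a parameter because it is an assumption here.

```python
from datetime import date, datetime


def format_date(d):
    """Render a date as YYYY-MM-DD."""
    return d.strftime("%Y-%m-%d")


def format_datetime(dt, frac_digits=3):
    """Render a datetime as YYYY-MM-DD HH:MM:SS plus fractional seconds.

    frac_digits (1 to 6) is an assumption for illustration; the fraction is
    truncated, not rounded, to that many digits.
    """
    base = dt.strftime("%Y-%m-%d %H:%M:%S")
    frac = f"{dt.microsecond:06d}"[:frac_digits]
    return f"{base}.{frac}"


date_text = format_date(date(2016, 6, 13))
datetime_text = format_datetime(datetime(2016, 6, 13, 1, 2, 3, 450000))
```

Neither function consults a locale, which is the trade-off described above: the output is the same on every system, but it is not localized.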

Serial (linear) performance by version:

10 clients concurrent performance time series:

timkpaine · 2023-07-25T13:20:25Z

just noting that you've auto formatted the python tests here. I'm a fan of this, but it does make for a big changeset

texodus · 2023-07-26T15:25:30Z

@timkpaine The auto formatting is exclusively in a single commit authored by no author.

This code shouldn't have been excluded from formatting. The issue (currently, though the decision begets issues) is that VSCode suddenly does not respect the pyproject.toml configuration for black, resulting in these files being obliterated whenever they are edited. This PR modifies python tests, but the modifications are not in the auto-format commit. We could move this commit to a separate PR, but as it impacts nothing, IMO I don't think it's worth the CI time.

brochington · 2023-07-26T15:12:46Z

tools/perspective-bench/src/js/worker.js

    await table_suite();
+    await view_suite();


Just curious, why were these switched?

brochington · 2023-07-26T16:30:07Z

packages/perspective/test/js/leaks.spec.js

@@ -111,6 +122,19 @@ test.describe("leaks", function () {
            view.delete();
            table.delete();
        });
+
+        test.skip("csv loading does not leak", async () => {


Just double checking, should this be skipped?

It takes forever to run is the only reason to skip it

packages/perspective/test/js/pivots.spec.js

brochington · 2023-07-26T16:40:12Z

packages/perspective/test/js/to_format.spec.js

-                { datetime: "6/13/16" },
-                { datetime: "6/14/16" },
+                { datetime: "2016-06-13" },
+                { datetime: "2016-06-14" },


Curious, how important is it to maintain datetime string parsing across different systems? Since it's possible to use a parsed date string as part of a column name, should we at some point add a C++ based function that both JS and Python can call to get a standardized formatted date string?

There are only two conditions where date/datetime columns need to be formatted:

(1) When they are used in a split_by and we need to stringify them to create compound column names. This is really a deficiency of the JSON encoding we use which doesn't permit true column paths. Fixing this encoding would remove the need for formatting for this case.

(2) When the formatted flag is passed. This is only used by the CSV exporter (as users typically expect CSVs to be "human readable"). Formerly, this used sprintf to try to replicate the en/us locale, as the browser native locale is not cheaply accessible from C++; however, this was never right, especially in non-en/us locales. The format in this patch is a compromise - it is not locale-formatted, but it is consistently emitted and parsed (though assumptions are made about the encoding timezone in this case, which are symmetric wrt Perspective but maybe not always desired).
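
To make case (1) concrete, here is a rough sketch of building and re-parsing a compound column name from a date split value. The "|" separator and the helper names compound_name() and parse_compound_name() are assumptions for illustration, not taken from this patch.

```python
from datetime import date

SEPARATOR = "|"  # assumed separator between the split_by value and the column


def compound_name(split_value, column):
    """Join a stringified date and a column name into one flat key."""
    return f"{split_value.isoformat()}{SEPARATOR}{column}"


def parse_compound_name(name):
    """Invert compound_name(); relies on the date format being fixed."""
    head, column = name.split(SEPARATOR, 1)
    return date.fromisoformat(head), column


name = compound_name(date(2016, 6, 13), "sales")
parsed = parse_compound_name(name)
```

The round trip only works because the date is always emitted in the same layout; a locale-dependent string such as "6/13/16" could not be parsed back reliably. A flat string key of this kind is still not a true column path, since the structure has to be recovered by splitting text.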

brochington · 2023-07-26T16:42:02Z

python/perspective/bench/runtime/perspective_benchmark.py

-            func = Benchmark(lambda: getattr(self._view, "to_{0}".format(name))(), meta=test_meta)
+            method = "to_{0}".format(name)
+            test_meta = make_meta("to_format", method)
+            func = Benchmark(getattr(self._view, method), meta=test_meta)


If benchmarking results in static assets, maybe we can create a blocks example that pulls in the data.

texodus and others added 2 commits July 21, 2023 19:48

Move to_columns() JSON creation logic to C++.

f21b69b

Apply lint to perspective tests

abfaa27

texodus added enhancement Feature requests or improvements breaking labels Jul 25, 2023

texodus mentioned this pull request Jul 25, 2023

Move to_columns() JSON creation logic to C++. #2314

Closed

texodus force-pushed the rapidjson-api branch from fdf1d99 to 6583c9e Compare July 26, 2023 15:21

texodus force-pushed the rapidjson-api branch from 6583c9e to 20abe09 Compare July 26, 2023 16:38

brochington reviewed Jul 26, 2023


Add Python support

3e2bc4d

texodus force-pushed the rapidjson-api branch from 20abe09 to 3e2bc4d Compare July 27, 2023 14:29

texodus merged commit 37e3df3 into master Jul 29, 2023

texodus deleted the rapidjson-api branch July 29, 2023 23:34

timkpaine mentioned this pull request Feb 7, 2024

Add benchmarking / track in CI Point72/csp#38

Open
