-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add to_columns_string()
C++ JSON API
#2315
Conversation
just noting that you've auto formatted the python tests here. I'm a fan of this, but it does make for a big changeset |
@timkpaine The auto formatting is exclusively in a single commit authored by This code shouldn't have been excluded from formatting. The issue (currently, though the decision begets issues) is that VSCode does not respect the |
await table_suite(); | ||
await view_suite(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why were these switched?
@@ -111,6 +122,19 @@ test.describe("leaks", function () { | |||
view.delete(); | |||
table.delete(); | |||
}); | |||
|
|||
test.skip("csv loading does not leak", async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking, should this be skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes forever to run is the only reason to skip it
{ datetime: "6/13/16" }, | ||
{ datetime: "6/14/16" }, | ||
{ datetime: "2016-06-13" }, | ||
{ datetime: "2016-06-14" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, how important is it to maintain datatime strring parsing across different systems? Since it's possible to use a parsed date string as part of a column name, should we at one point add a C++ based function that both JS and Python can call to get a standardized formatted date string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only two conditions where date
/datetime
columns need to be formatted:
(1) When they are used in a split_by
and we need to stringify them to create compound column names. This is really a deficiency of the JSON encoding we use which doesn't permit true column paths. Fixing this encoding would remove the need for formatting for this case.
(2) When the formatted
flag is passed. This is only used by the CSV exporter (as users typically expect CSVs to be "human readable"). Formerly, this used sprintf
to try to replicate the en/us
locale, as the browser native locale is not cheaply accessible from C++; however, this was never right, especially in non-en/us
locales. The format in this patch is a compromise - it is not locale-formatted, but it is consistently emitted and parsed (though assumption are made about the encoding timezone in this case which are symmetric wrt Perspective but maybe not always desired).
func = Benchmark(lambda: getattr(self._view, "to_{0}".format(name))(), meta=test_meta) | ||
method = "to_{0}".format(name) | ||
test_meta = make_meta("to_format", method) | ||
func = Benchmark(getattr(self._view, method), meta=test_meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If benchmarking results in static assets, maybe we can create an blocks example that pulls in the data.
This PR adds
View.to_columns_string()
method, which works similarly toto_columns()
, except it returns a JSON string instead of a JavaScript object (in JS) or Python Dict (in Python).to_columns_string()
is implemented in C++ useing RapidJSON library, and re-used verbatim in both JavaScript and Python, reducing duplicate code between these two projects.In JavaScript:
to_columns()
,to_json()
have been rewritten in terms ofto_columns_string()
, usingJSON.parse()
to_columns_string()
saves one Object->String->Object conversion (as the data can stay in string form until it is copied into the main renderer process).@finos/perspective-viewer-datagrid
and@finos/perspective-viewer-d3fc
have been rewritten to use this method internally, leading to improved performance for both.In Python
to_columns()
,to_records()
have been rewritten in terms ofto_columns_string()
, usingjson.loads()
Inf
,-Inf
andnan
support has been removed from these methods. These data structures errored when serializing anyway, so were entirely useless for visualization purposes.to_numpy()
andto_arrow()
preserve these values still.In both:
date
anddatetime
columns now useYYYY-MM-DD
andYYYY-MM-DD HH:MM:SS.SSSS
formatting respectively, for CSV export andsplit_by
headers. The former can only be fixed at the expense of performance and/or significant asset size (bundling a locale collection); the latter can be fixed with a refactoring of the JSON representation.Serial (linear) performance by version:
10 clients concurrent performance time series: