Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New generate_query_graph tool. #9212

Merged
merged 25 commits into from
Oct 19, 2023
Merged

Conversation

Tmonster
Copy link
Contributor

@Tmonster Tmonster commented Oct 4, 2023

The generate query graph code was taken out in #6972. Here I'm attempting to add it back in the generate query graph code. Using some pastel colors but it can honestly be anything.

Some goals for this version of the query graph

  • No javascript. Javascript is hard to maintain. This tree library is basically all CSS and doesn't need maintenance
  • Readability. The previous query graphs were not very readable (some colors made the text hard to read), The new colors prevent this from happening.
  • Same timing information as the previous version.

Things to improve

  • The ability to zoom out and see the whole plan. If the plan is bushy, the graph can get very wide, and it becomes difficult to see all of the operators in one screen. Currently however, it is much easier for a user to just navigate across the whole plan. This can also be fixed by decreasing the font for the html.
  • When detailed profiling is provided, add another UI element to show the timing information for the individual optimizers
  • better colors?

open to some initial feedback right now about colors/design.
You can check out the branch and generate the query graph with the following commands while in the duckdb directory.

duckdb -c "load tpch; call dbgen(sf=0.1); pragma enable_profiling=json;pragma profile_output="q09.json";pragma tpch(9);"
python3 scripts/generate_querygraph.py q09.json

Old query graph generator output
image

New query graph generator output
image

text = f.read()

if open_output:
os.system('open "' + output.replace('"', '\\"') + '"')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the webbrowser module here instead

_exported_symbols.extend([
"typing",
"functional"
"functional",
"duckdb_query_graph"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the module name needs to include duckdb

total_timing_sum += self.get_summary_phase_timings(phase).time
return total_timing_sum

QUERY_TIMINGS = AllTimings()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being reset anywhere, can the graph generator only be used once per interpreter session?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the idea yea, but it's probably good to just reset this every time a new profile output is passed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to not use a global to begin with probably?

@@ -0,0 +1,45 @@
# generate_querygraph.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's include this code and ship it as an executable module instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that means it runs as python3 -m generate_querygraph explain_output.json ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like

python -m duckdb.query_graph ...

But basically yeah. Could even publish it an standalone executable on the user machines that way too


sys.path.insert(0, 'benchmark')

arguments = sys.argv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the argparse library makes this much less cumbersome

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@github-actions github-actions bot marked this pull request as draft October 6, 2023 09:18
@Tmonster Tmonster requested review from Mause and Tishj October 6, 2023 09:20
@Tmonster
Copy link
Contributor Author

Tmonster commented Oct 6, 2023

Pressed the re-review button too early. Wait a couple of minutes

text = f.read()

if open_output:
webbrowser.open('file:///Users/tomebergen/scripts/run_imdb/tpch_new_html/q08.html', new=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to replace this?

translate_json_to_html(input, output)

with open(output, 'r') as f:
text = f.read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

'chart_script': ''
}

# ????
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

????

# if detailed profiling is enabled, then information on the timing of each stage of the query is available
# that is "top level timing information"
def get_top_level_timings(json):
return []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to contain something?


def open_utf8(fpath, flags):
import sys
if sys.version_info[0] < 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't support python 2 anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just inline this function, no warning required

@@ -0,0 +1,45 @@
# generate_querygraph.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file can be deleted now right?

@Tmonster Tmonster marked this pull request as ready for review October 6, 2023 10:58

def open_utf8(fpath, flags):
import sys
if sys.version_info[0] < 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just inline this function, no warning required

# add up all of the times
# measure each time as a percentage of the total time.
# then you can return a list of [phase, time, percentage]
child_timings = get_child_timings(json['children'][0], query_timings)
Copy link
Member

@Mause Mause Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this variable is never used, just call the function

f.write(html)

def main():
sys.path.insert(0, 'benchmark')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have you added this line?

@github-actions github-actions bot marked this pull request as draft October 9, 2023 08:24
@Tmonster Tmonster marked this pull request as ready for review October 9, 2023 12:28
@Tmonster
Copy link
Contributor Author

Tmonster commented Oct 9, 2023

@Mause Some of those lines are leftover from the old script which I dropped in. Sorry for not thoroughly checking it. Thank you for the reviews 🙏 !

@Tmonster Tmonster requested a review from Mause October 9, 2023 12:50
Copy link
Member

@Mause Mause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just need to run the make format-fix script and update for the stubs test

@github-actions github-actions bot marked this pull request as draft October 10, 2023 09:47
@Tmonster Tmonster marked this pull request as ready for review October 11, 2023 08:05
@Tmonster Tmonster requested a review from Mause October 11, 2023 12:49
@Mytherin
Copy link
Collaborator

@Tmonster @Mause is this ready to merge?

@Tmonster
Copy link
Contributor Author

Ready from my side 👍

@Mytherin Mytherin merged commit 6ff89e6 into duckdb:main Oct 19, 2023
14 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants