This is WIP exploring the idea of storing, for multiple source files, the result of running drracket/check-syntax, plus some more analysis.
The main motivation is to support multi-file flavors of things like "find references" and "rename".
The intent is this could enhance Racket Mode, as well as Dr Racket and other tools.
For each analyzed source file:
- Fully expand, accumulating some information even if expansion fails (as used by e.g. Typed Racket):
  - direct calls to error-display-handler
  - online-check-syntax logger messages
- Run check-syntax, recording the values from various syncheck-annotations<%> methods.
After accumulating information in various fields of a struct, finally the struct is serialized, compressed, and stored in a sqlite table.
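As a rough illustration of the serialize-then-compress step, here is a minimal sketch. The struct name `analysis`, its fields, and the helper names are hypothetical, not pdb's actual definitions; a prefab struct is assumed only because it serializes without extra setup. Storing the resulting bytes in a sqlite BLOB column is left out.

```racket
#lang racket/base
(require racket/serialize   ; serialize, deserialize
         racket/port        ; with-output-to-bytes
         file/gzip          ; deflate
         file/gunzip)       ; inflate

;; Hypothetical record of per-file analysis results (not pdb's real struct).
(struct analysis (path digest arrows definitions) #:prefab)

;; analysis -> bytes: serialize to an s-expression, write it, then deflate.
(define (analysis->blob a)
  (define plain (with-output-to-bytes (lambda () (write (serialize a)))))
  (define out (open-output-bytes))
  (deflate (open-input-bytes plain) out)
  (get-output-bytes out))

;; bytes -> analysis: inflate, read the s-expression, then deserialize.
(define (blob->analysis bs)
  (define out (open-output-bytes))
  (inflate (open-input-bytes bs) out)
  (deserialize (read (open-input-bytes (get-output-bytes out)))))
```

The bytes returned by `analysis->blob` are what would be written to the table, keyed by the source file's path.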
We extend the check-syntax analysis in various ways:
- In addition to syncheck:add-definition-target, which identifies definitions, we identify and record exports from fully-expanded #%provide forms.
- In addition to syncheck:add-arrow/name-dup/pxpy, which identifies lexical and import arrows, we identify and record some other flavors of arrows:
  - import-rename-arrows, as from rename-in, etc.
  - export-rename-arrows, as from rename-out, etc.

  Also we enhance the check-syntax import-arrows to store the "from" and "nominal-from" information from identifier-binding. Following the nominal-from values to the exports in other files, and vice versa, is how we can identify rename-sites across multiple files.
- We assemble a list of imported symbols, suitable for use as completion candidates, akin to namespace-mapped-symbols. [Currently this is one dumb flat list, as is done by Racket Mode's current back end. A to-do is to create a tree structure reflecting which candidates are valid where.]
In Racket a definition can be exported and imported an arbitrary number of times before it is used -- and can be renamed at each such step.
In general, the definition graph elides that and expresses "big, direct jumps" among files. Which is wonderful when you want to e.g. "visit/find/jump to definition" in another file.
By contrast the "name introduction and use" graph cares about the
chain of exports and imports, and considers steps where a rename
occurs. A motivation is to support multi-file rename commands. For
that to work, every occurrence of the "same" name must be known,
including uses in provide and require forms, and considering clauses
like rename-out, prefix-in, rename-in, prefix-out, and so on.
For example, if the user wants foo to be renamed bar, then sites like
(provide foo) must be changed. Furthermore, sites like
(provide (rename-out [foo xxx])) are inflection points where the graph
ends. If some other file does (require (rename-in mod [xxx foo])), that
"foo" is not the same and should not be in the same set of sites to be
renamed as the "foo" in the exporting file.
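The foo/xxx example can be made concrete in one file by using submodules as stand-ins for the two source files. The module names `exporter` and `importer` and the function `call-foo` are made up for illustration. The comments mark which occurrences would fall in which set of rename sites.

```racket
#lang racket/base

;; Stand-in for the exporting file. Renaming foo -> bar here must change
;; the definition and the "foo" inside rename-out, but not "xxx".
(module exporter racket/base
  (provide (rename-out [foo xxx]))   ; foo: rename site; xxx: a new name starts
  (define (foo) 'original))

;; Stand-in for some other file. This "foo" is introduced by rename-in from
;; xxx, so it is a different name and belongs to a different set of sites.
(module importer racket/base
  (require (rename-in (submod ".." exporter) [xxx foo]))
  (provide call-foo)
  (define (call-foo) (foo)))         ; a use of the importer's own "foo"

(require (submod "." importer))
(call-foo)
```

Renaming the exporter's foo to bar leaves the importer untouched, because the chain of "same" names ends at xxx.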
For either type of graph, it is simple to proceed from a use to its
source. When the source is in some other file, we know which other
file: The identifier-binding "from" or "nominal-from" information
always says in which other file to look. If that file isn't yet in the
database (or is outdated), we analyze it, and so on transitively.
Furthermore it is a 1:1 relation; even when there are multiple steps
(such as hopping through a contract wrapper to the wrapped
definition), each step is 1:1.
On the other hand, proceeding from a definition to its uses is a 1:many relation, transitively (each of the many uses may in turn have many uses). Furthermore we can't discover absolutely all uses -- unless absolutely all using files have already been analyzed. There exists only a set of known uses, which is limited by the set of already-analyzed files.
This is another motivation to save analysis results for multiple files in a database. One or more directory trees, each for some project the user cares about, can be analyzed proactively. (Thereafter a digest mismatch can trigger an automatic re-analysis of a changed file.) This enables discovering all uses, at least within the scope of those projects.
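The digest check might look roughly like the following sketch. It assumes a SHA-1 digest of the file contents and uses an in-memory hash table as a stand-in for the database; the names `known-digests`, `file-digest`, and `analyze-if-stale!` are hypothetical, and the digest algorithm pdb actually uses is not stated here.

```racket
#lang racket/base
(require file/sha1)   ; sha1: input-port -> hex string

;; Hypothetical stand-in for the digest stored per file in the database.
(define known-digests (make-hash))

;; Digest of the file's current contents.
(define (file-digest path)
  (call-with-input-file path sha1))

;; Calls (analyze path) only when the stored digest is missing or differs
;; from the current one. Returns #t if an analysis ran, #f otherwise.
(define (analyze-if-stale! path analyze)
  (define new-digest (file-digest path))
  (cond [(equal? new-digest (hash-ref known-digests path #f)) #f]
        [else (analyze path)
              (hash-set! known-digests path new-digest)
              #t]))
```

Deleting a file's entry from `known-digests` forces a fresh analysis on the next call, which mirrors the "digest was deleted" case.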
Status quo, Racket Mode's back end runs check-syntax and returns to
the front end racket-xp-mode the full results for each file. The
entire Emacs buffer is re-propertized. For example mouse-overs become
help-echo text properties.
How exactly would Racket Mode's back end use this pdb project?
Initially, Racket Mode's back end could use this pdb project the same way: Get the full analysis results, and re-propertize the entire buffer.
That alone is no improvement. But we could add new Racket Mode commands that query the db, such as multi-file xref-find-references or renaming.
Furthermore, I think we could eliminate the back end's cache of fully expanded syntax. For example find-definition no longer needs to walk fully-expanded syntax looking for a site. We already did that, for all definitions, and saved the results; now it's just a db query.
(I'm not sure about find-signature: Maybe we could add a pass to walk pre-expanded surface syntax, finding all signatures, as the status quo back end does one by one.)
Status: Done as an initial sanity check, then discarded. I
modified racket-xp-mode and the Racket Mode back end to use pdb when
available, and use the same propertize-all-buffer approach. It
performed about the same as before; having multi-file rename was nice.
Although that's still in the commit history, I wanted to move on past
that to the next step.
A bigger change: The front end would query just for various spans of the buffer, as-needed.
This would improve how we handle larger files like class-internal.rkt, not to mention enormous files like the example provided by samth.
Status quo, Emacs doesn't block while the analysis is underway, but after it completes, for a sufficiently large buffer and analysis results, it takes a very long time to marshal the results and to re-propertize the entire buffer; Emacs can noticeably freeze.
Admittedly doing limited, JIT queries doesn't magically transform drracket/check-syntax itself to a "streaming" or incremental approach. The entire analysis would still need to complete (still taking about 10 seconds for class-internal.rkt, and 60 for the example provided by samth!) before any new results were available. However the results could be retrieved in vastly smaller batches. IOW there would still be a large delay until any new results were available, but no update freezes.
Status: Done. Still dog-fooding. I quickly realized that modifying
racket-xp-mode to work in both the "classic" and new ways was going
to be messy. Instead I made a fresh racket-pdb-mode. This works by
doing a query to the db whenever point (Emacs jargon, a.k.a. the
caret) moves. The back end and pdb return values only pertaining to
point and the currently visible span (the window-start through
window-end positions, in Emacs jargon). I'm still dog-fooding this,
looking for problems or mis-features.
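To illustrate the idea of answering only for the visible span, here is a small sketch. The `ann` struct and `annotations-in-span` are hypothetical, and a linear scan over a list stands in for whatever index or query the db really uses. Spans are assumed to be half-open, [beg, end), in 1-based positions.

```racket
#lang racket/base

;; Hypothetical annotation: a half-open span [beg, end) plus some payload,
;; such as mouse-over text.
(struct ann (beg end info) #:transparent)

;; Returns the annotations overlapping the half-open span [beg, end),
;; e.g. the window-start through window-end positions.
(define (annotations-in-span anns beg end)
  (for/list ([a (in-list anns)]
             #:when (and (< (ann-beg a) end)
                         (< beg (ann-end a))))
    a))
```

With this shape, the amount of data marshaled to the front end depends on the window size, not on the size of the file.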
Of course this could become a package to be used in various other ways.
We could offer any of:
- A CLI (e.g. a new raco tool).
- A stable API for Racket programs.
- An equivalent API via HTTP.
One issue here is that some tools might prefer or need line:column
coordinates instead of positions. [Effectively drracket/check-syntax
and our own analysis use syntax-position and syntax-span, ignoring
syntax-line and syntax-column.] Either we could try to store
line:col-denominated spans, also, in the db when we analyze (at some
cost in space). Or we could just synthesize these as/when needed by
such an API, by running through the file using port-count-lines! (at
some cost in time).
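The "synthesize as needed" option could be sketched as follows. The function name `position->line+column` is made up; it relies only on port-count-lines! and port-next-location, and reads the port character by character until it reaches the requested position.

```racket
#lang racket/base

;; Given an input port positioned at the start of a file and a 1-based
;; position, returns two values: the 1-based line and 0-based column of
;; that position. If pos is past the end, returns the location of EOF.
(define (position->line+column in pos)
  (port-count-lines! in)
  (let loop ()
    ;; p is the 1-based position of the next character to be read.
    (define-values (line col p) (port-next-location in))
    (cond [(>= p pos) (values line col)]
          [(eof-object? (read-char in)) (values line col)]
          [else (loop)])))
```

For a real file this would be wrapped in call-with-input-file. Converting many positions for one file would be cheaper in a single pass over a sorted list of positions, which this sketch does not attempt.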
- In example.rkt, prefix-tests has a couple tests I can't yet make pass, for the reasons explained in the comment: 1. prefix-out doesn't support sub-range-binders, and, 2. all-defined-out gives every definition the same srcloc, which is the (all-defined-out) form itself.
- The #%provide clauses all-defined, all-defined-except, prefix-all-defined, and prefix-all-defined-except are not yet supported by our analysis that finds exports. (Note that provide clauses like all-defined-out do not actually expand into these, and are supported. So this limitation isn't as big as it seems. But if some handwritten code or other macro expansion uses these specific #%provide clauses, the exports won't be identified.)
- The rename-sites command currently returns a hash-table value with all results. For renames involving a huge number of files and sites, a for-each flavor might be preferable.
If you want to kick the tires on this in its current state, I
recommend looking at the tests in example.rkt, as called from the
tests submodule.
As the functions work in terms of 1-based positions, just like Racket
syntax-position and Emacs buffer positions, it's annoying to keep
typing C-x = to see the position at point while in the
example files. You might find it handy to add something like the
following to your Emacs mode-line-position definition:
(:propertize (:eval (format "%s" (point)))
             face (:slant italic))
Also remember that M-g c will let you jump to a position.
You probably want to avoid, however, the very-many-files-example
submodule -- unless you want to wait hours for 8,000 files to be
analyzed for the first time:
;; On my system -- with the non-minimal Racket distribution
;; installed, and about a dozen other packages -- this results in
;; about 8,000 files, which takes nearly 3 hours to analyze,
;; and yields a 92 MiB pdb-main.sqlite file.
(for ([d (in-list (list* (get-pkgs-dir 'installation)
                         (get-pkgs-dir 'user)
                         (current-library-collection-paths)))])
  (when (directory-exists? d)
    (queue-directory-to-analyze d)))
;; Do this to analyze all files discovered. With #:always? #f each
;; file will be fully re-analyzed only if its digest is invalid (if
;; the file has changed, or, the digest was deleted to force a
;; fresh analysis).
(time (analyze-all-known-paths #:always? #f))
Also, if you use Emacs, you could try the new pdb branch from the
racket-mode repo. In this case you probably want to change your
racket-mode-hook to use racket-pdb-mode instead of racket-xp-mode.