Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use RawSyntax address as the 'rootId' of SyntaxIdentifier #635

Merged
merged 1 commit into from
Aug 30, 2022

Conversation

rintaro
Copy link
Member

@rintaro rintaro commented Aug 24, 2022

We don't need to guarantee the uniqueness throughout the lifetime of the process, but only need to guarantee the uniqueness while the object is alive.

This removes _CSwiftSyntax dependency from SwiftSyntax.

We don't need to guarantee the uniqueness throughout the process
lifetime. but only need to guarantee the uniqueness while the object is
alive.

This removes  `_CSwiftSyntax` dependency  from `SwiftSyntax` module.
@rintaro rintaro requested a review from CodaFi August 24, 2022 22:57
@rintaro rintaro requested a review from ahoppen as a code owner August 24, 2022 22:57
@rintaro
Copy link
Member Author

rintaro commented Aug 24, 2022

@swift-ci Please test

@akyrtzi
Copy link
Contributor

akyrtzi commented Aug 24, 2022

Would this prevent sharing of RawSyntax objects? As I mentioned I'd suggest that we treat RawSyntax objects as "slabs of data" and not try to infer any kind of identity out of them.

@rintaro
Copy link
Member Author

rintaro commented Aug 24, 2022

@akyrtzi

Would this prevent sharing of RawSyntax objects?

No, RawSyntax can still be reused by different tree. The difference from the previous implementation is that, for example

let raw: RawSyntax = ...
let tree1 = Syntax(raw: raw)
let tree2 = Syntax(raw: raw)

Previously tree1 and tree2 had different ids. But now they have the same id. And IMO, that actually makes sense because they are exactly the same. The operation above is equivalent to

let tree1 = Syntax(raw: raw)
let tree2 = tree1

@rintaro
Copy link
Member Author

rintaro commented Aug 25, 2022

When sharing a RawSyntax from "different" trees:

let raw: RawSyntax = ...
let tree1 = Syntax(raw: raw)
let tree2 = Syntax(raw: raw.children[0])

let id1 = tree1.child(at: 0).id
let id2 = tree2.id

here, id1 and id2 are different because both rootId and indexInTree are different.
Another pattern:

let child: RawSyntax...
let tree1 = Syntax(raw: .makeLayout(elements: [child], arena: arena))
let tree2 = Syntax(raw: .makeLayout(elements: [child], arena: arena))

let id1 = tree1.child(at: 0).id
let id2 = tree2.child(at: 0).id

id1 and id2 are still different because the rootId is different.

@akyrtzi
Copy link
Contributor

akyrtzi commented Aug 25, 2022

To make sure I understand the proposed semantic change, the notion of "syntax identity" changes from "identity is guaranteed to be different between created syntax trees" to "identity may be the same if you created 2 trees and they ended up with the same RawSyntax tree structure".

So for example, if I parse the same file twice, previously I would be guaranteed to get different identities but after this change their identity could be the same. And from your example:

```swift
let child: RawSyntax...
let tree1: Syntax(raw: .makeLayout(elements: [child], arena: arena))
let tree2: Syntax(raw: .makeLayout(elements: [child], arena: arena))

let id1 = tree1.child(at: 0).id
let id2 = tree2.child(at: 0).id

id1 and id2 are still different because the rootId is different.

id1 and id2 are different now but they could be the same, right? A future implementation could re-use the same RawSyntax tree that wraps [child] elements and provide the same rootId.

To clarify, I think such a change would be fine, I mainly want to make sure I'm following along with what is being proposed.

@akyrtzi
Copy link
Contributor

akyrtzi commented Aug 25, 2022

if I parse the same file twice, previously I would be guaranteed to get different identities but after this change their identity could be the same.

To clarify, this is not just theoretical, a future implementation could be providing caching by associating source file contents with its associated RawSyntax tree, in which case parsing the same file twice would give you the same rootId.

@rintaro
Copy link
Member Author

rintaro commented Aug 25, 2022

if I parse the same file twice, previously I would be guaranteed to get different identities but after this change their identity could be the same.

Right. if 1) we implement incremental parsing, 2) we detect the file is exactly the same as the previous parsing, and 3) we decide to reuse the existing root RawSyntax instance as the result green tree.

let child: RawSyntax...
let tree1: Syntax(raw: .makeLayout(elements: [child], arena: arena))
let tree2: Syntax(raw: .makeLayout(elements: [child], arena: arena))

let id1 = tree1.child(at: 0).id
let id2 = tree2.child(at: 0).id

id1 and id2 are still different because the rootId is different.

id1 and id2 are different now but they could be the same, right? A future implementation could re-use the same RawSyntax tree that wraps [child] elements and provide the same rootId.

Since .makeLayout() always creates a new instance, in this specific case, they can't be the same. But I think I understand what you are saying, and yes, a future incremental parsing could re-use the same "root" raw syntax node for multiple parsing, and they will have the same id.

@rintaro
Copy link
Member Author

rintaro commented Aug 25, 2022

let child: RawSyntax...
let tree1: Syntax(raw: .makeLayout(elements: [child], arena: arena))
let tree2: Syntax(raw: .makeLayout(elements: [child], arena: arena))

let id1 = tree1.child(at: 0).id
let id2 = tree2.child(at: 0).id

id1 and id2 are still different because the rootId is different.

id1 and id2 are different now but they could be the same, right?

Oh, maybe you are thinking .makeLayout() could return the same instance if the structure is exactly the same? I didn't think that scenario, but yes, it's possible.

@akyrtzi
Copy link
Contributor

akyrtzi commented Aug 25, 2022

Going back to my earlier comment:

Could you clarify why you need RawSyntax.id and what it means for RawSyntax to be Identifiable? Am I getting the same RawSyntax.id for a func token anywhere it shows up, in the same tree or any tree?

Within this context I'm fine if the answer is that a RawSyntax.id may or may not be the same for all func tokens in the tree, it depends on the implementation. I think if it was exposed to clients it would be confusing but it's fine for internal use.

@rintaro
Copy link
Member Author

rintaro commented Aug 25, 2022

@akyrtzi

Going back to my earlier comment:

Could you clarify why you need RawSyntax.id and what it means for RawSyntax to be Identifiable? Am I getting the same RawSyntax.id for a func token anywhere it shows up, in the same tree or any tree?

I didn't think about it deeply. I was just adding RawSyntax.id because I thought Syntax.id can be root-RawSyntax.id+indexInTree. Now, I don't think RawSyntax having id makes much sense. RawSyntax's uniqueness is only valuable when it's used as a "root" node of the tree. So I'm not adding RawSyntax.id.

BTW, I don't think I fully understand your first argument that "Would this prevent sharing of RawSyntax objects?". I haven't come up with a scenario where identifying RawSyntax instances would prevent sharing them. Could you elaborate?

@akyrtzi
Copy link
Contributor

akyrtzi commented Aug 25, 2022

BTW, I don't think I fully understand your first argument that "Would this prevent sharing of RawSyntax objects?". I haven't come up with a scenario where identifying RawSyntax instances would prevent sharing them. Could you elaborate?

This depends on the implementation. Using the raw pointer should be fine (given they are immutable), but, for example, imagine we got rid of "rooId+indexInTree" and replaced it with a "RawSyntax.uniqueId" construct where every RawSyntax gets its own uniqueId, in this scenario it would prevent sharing them.

@ahoppen
Copy link
Member

ahoppen commented Aug 28, 2022

When you merge this, you can also remove _CSwiftSyntax from the CMake-based build system that Doug just added.

@rintaro
Copy link
Member Author

rintaro commented Aug 30, 2022

@akyrtzi Thank you for the explanation, and thank you for all your inputs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants