Unique IDs for headers (avoid duplicates) #1932

Open
10 tasks
mpacer opened this issue Mar 15, 2017 · 7 comments · May be fixed by #16845

@mpacer
Member

mpacer commented Mar 15, 2017

This is a follow-up to the discussion begun in #1736 around using UUIDs as part of the automatic id generation scheme for headers.

The basic problem: right now, if you have

### my header 

### my header

in the same notebook, both headers get the same id (specifically #my-header). Besides violating the HTML requirement that ids be unique, this makes it impossible to link to the second header, even though it has been assigned an anchor link.
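To make the collision concrete, here is a minimal sketch of a text-derived header id. The `slugify` helper is hypothetical: it approximates a lowercase-and-hyphenate scheme and is not JupyterLab's actual implementation.

```typescript
// Hypothetical header-to-id function: lowercase, drop punctuation,
// and join words with hyphens.
function slugify(headerText: string): string {
  return headerText
    .trim()
    .toLowerCase()
    .replace(/[^\w\s-]/g, "") // drop punctuation
    .replace(/\s+/g, "-");    // whitespace runs -> single hyphen
}

// Two markdown cells whose headers have the same text
// (the second one with a trailing space, as typed above).
const headers: string[] = ["my header", "my header "];
const ids: string[] = headers.map(slugify);

console.log(ids); // [ 'my-header', 'my-header' ]
console.log(new Set(ids).size === ids.length); // false: the ids collide
```

Because the id is a pure function of the header text, any two headers with the same text are guaranteed to collide.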

This problem is dramatically worse in JupyterLab, because multiple notebooks can be open in the same window at the same time. Not only is header text more likely to be shared between notebooks in general, but each additional notebook also injects its ids into the same namespace, so the opportunities for conflict grow combinatorially.

One proposed solution is to instead build ids that contain UUIDs identifying (at least) the notebook from which the id originates.

The main problem I see with this approach is that it sacrifices the ease of writing links to different parts of the document. Not only does no one want to type [link to header](#link-to-header-<header_uuid>) in order to link; worse, there's no way to write the link without rendering the cell and looking at the anchor that was actually generated.
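A minimal sketch of the UUID-scoped scheme, assuming the id is the header slug followed by a per-notebook UUID (the `scopedHeaderId` name and the exact layout are hypothetical). It shows both sides of the trade-off: the ids no longer collide across notebooks, but the author cannot predict them.

```typescript
// Hypothetical slug helper (lowercase, drop punctuation, hyphenate).
function slugify(headerText: string): string {
  return headerText
    .trim()
    .toLowerCase()
    .replace(/[^\w\s-]/g, "")
    .replace(/\s+/g, "-");
}

// Hypothetical scheme: append the notebook's UUID to the header slug.
function scopedHeaderId(notebookUuid: string, headerText: string): string {
  return `${slugify(headerText)}-${notebookUuid}`;
}

// UUIDs are fixed here for illustration; in practice they would be generated.
const a = scopedHeaderId("3f2b9c1e-8d4a-4b7e-9a61-2c5d0e7f4a10", "my header");
const b = scopedHeaderId("a71c0d52-6e3f-4c29-b8d4-91e0f2a3c5b7", "my header");

console.log(a); // my-header-3f2b9c1e-8d4a-4b7e-9a61-2c5d0e7f4a10
console.log(a === b); // false: no cross-notebook collision
// But a hand-written link now has to spell out the full UUID, which the
// author only learns after rendering the cell and inspecting the anchor.
```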

But there are conceptual issues as well: for example, if you copy and paste a cell that already has a UUID, do you create a new one, and thereby break any links that were copied and pasted along with it?

There are a lot of issues embedded in these topics, but I've tried to distill them into specific use cases for ids and the places where challenges might arise:

  • Linking to headers in the same notebook
    • Multiple headers have the same text, so their ids conflict
    • Manually specified ids that conflict with autogenerated ids
    • A new cell whose header has the same text "stealing" the targets of links that originally pointed to a later header
    • Cross-notebook id conflicts: however the DOM orders the elements, the first element with a given id is the one a link resolves to.
  • Cross-notebook linking
  • Copying and pasting cells
    • Copying and pasting cells between notebooks
  • Multiple views on the same document (e.g., a live nbconvert html preview)
  • Deterministic, robust link resolution that doesn't rely on content persistence
    E.g., a header has id 1. Change the header text and render (generating a new id 2), which breaks links pointing to id 1. Change the text back to the original and render again (generating a new id 3). Does 1 == 3?
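The "stealing" case in the list above follows from first-match resolution: `document.getElementById` returns the first element in tree order with the given id. A minimal model, with the rendered document reduced to an ordered list of headers (the `RenderedHeader` shape and cell labels are hypothetical):

```typescript
// A rendered header: the DOM id it received and the cell it lives in.
interface RenderedHeader {
  id: string;
  cellId: string; // hypothetical label for the containing cell
}

// Resolve "#fragment" like getElementById: the first matching id wins.
function resolveFragment(headers: RenderedHeader[], href: string): string | undefined {
  const target = href.replace(/^#/, "");
  return headers.find((h) => h.id === target)?.cellId;
}

const before: RenderedHeader[] = [
  { id: "intro", cellId: "cell-a" },
  { id: "setup", cellId: "cell-b" },
];
console.log(resolveFragment(before, "#setup")); // cell-b

// A new cell with the same header text is inserted above cell-b.
const after: RenderedHeader[] = [
  { id: "intro", cellId: "cell-a" },
  { id: "setup", cellId: "cell-new" },
  { id: "setup", cellId: "cell-b" },
];
console.log(resolveFragment(after, "#setup")); // cell-new: the link target was "stolen"
```

Nothing about the link itself changed; only the document order did, which is why this failure is silent.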
@mpacer
Member Author

mpacer commented Mar 15, 2017

Forgot to tag the people I've talked about this with: @blink1073 @Carreau @ian-r-rose @takluyver.

@Carreau
Contributor

Carreau commented Mar 15, 2017

IIUC, you've left out external links to a particular section, like the ones a number of repos have (for example in their README): [blah](example/notebook.ipynb#foo). If you add an autogenerated UUID, then even if you can resolve it within the document, those external links may be an issue to update later.

My take on this is:

  • if there's a conflict in the same notebook: warn the user.
  • iterate later on how to select an id manually.
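A minimal sketch of the "warn the user" option, assuming the header ids of a notebook are available in document order (the `duplicateWarnings` name and message format are hypothetical):

```typescript
// Report every header id that occurs more than once, with its positions.
function duplicateWarnings(ids: string[]): string[] {
  const positions = new Map<string, number[]>();
  ids.forEach((id, index) => {
    const seen = positions.get(id);
    if (seen) {
      seen.push(index);
    } else {
      positions.set(id, [index]);
    }
  });

  const warnings: string[] = [];
  // Map.forEach visits entries in insertion order, i.e. first occurrence order.
  positions.forEach((where, id) => {
    if (where.length > 1) {
      warnings.push(
        `Header id "#${id}" is used ${where.length} times (headers ${where.join(", ")}); links resolve to the first.`,
      );
    }
  });
  return warnings;
}

const warnings = duplicateWarnings(["intro", "my-header", "results", "my-header"]);
warnings.forEach((w) => console.warn(w));
```

This leaves ids untouched and only surfaces the problem, which keeps existing links stable while the manual-id question is worked out.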

The second one might be tackled later, and will also help when sections get renamed or merged.
We could manually tag cells with a list of fallback ids and inject dummy <span>s carrying them, as long as the link lands close enough to the target.
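A minimal sketch of the fallback-id idea, assuming the fallback ids come from some per-cell list (where that list is stored is left open; the function names are hypothetical). Each fallback id becomes an empty `<span>` placed immediately before the header, so old links land next to the right section.

```typescript
// Escape text for HTML element content and double-quoted attributes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a header preceded by one empty <span> per fallback id.
function renderHeaderWithFallbacks(
  level: number,
  text: string,
  id: string,
  fallbackIds: string[],
): string {
  const spans = fallbackIds
    .filter((fallback) => fallback !== id) // the header already carries its own id
    .map((fallback) => `<span id="${escapeHtml(fallback)}"></span>`)
    .join("");
  return `${spans}<h${level} id="${escapeHtml(id)}">${escapeHtml(text)}</h${level}>`;
}

// e.g. a section renamed from "Loading data" to "Data loading"
const html = renderHeaderWithFallbacks(2, "Data loading", "data-loading", ["loading-data"]);
console.log(html);
// <span id="loading-data"></span><h2 id="data-loading">Data loading</h2>
```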

@blink1073
Contributor

blink1073 commented Mar 16, 2017

This illustrates what I mean about UUIDs: the user could write [blah](#foo) for local links and [blah](example/notebook.ipynb#foo) for external links. The header itself gets a DOM id based on example/notebook.ipynb. The local link and the external link would both be handled so that they point to that known id on the page.
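A minimal sketch of that mapping, assuming the DOM id is built from the notebook path plus the fragment the author wrote, and that paths are already relative to a common root (relative-path resolution such as `../` is not handled). The `domIdFor` and `resolveLink` names and the `path:fragment` layout are hypothetical.

```typescript
// Hypothetical DOM id: the encoded notebook path plus the author's fragment.
// encodeURIComponent turns "/" into "%2F", so the path cannot inject "#" or "/".
function domIdFor(notebookPath: string, fragment: string): string {
  return `${encodeURIComponent(notebookPath)}:${fragment}`;
}

// Map an href, as written by the author, to the DOM id it should target.
function resolveLink(href: string, currentPath: string): string | null {
  const hashIndex = href.indexOf("#");
  if (hashIndex === -1) return null; // not a fragment link
  const path = href.slice(0, hashIndex) || currentPath; // "#foo" means this notebook
  const fragment = href.slice(hashIndex + 1);
  return domIdFor(path, fragment);
}

const current = "example/notebook.ipynb";
console.log(resolveLink("#foo", current));                            // example%2Fnotebook.ipynb:foo
console.log(resolveLink("example/notebook.ipynb#foo", "other.ipynb")); // example%2Fnotebook.ipynb:foo
```

The author keeps writing the short `#foo` form; only the renderer needs to know the path-qualified id.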

@blink1073
Contributor

(updated the comment above).

@mpacer
Member Author

mpacer commented Mar 18, 2017

@Carreau That use case was what I meant by cross-notebook linking. If you weren't linking to a particular section (or, more generally, a particular id), then the anchors themselves wouldn't come into play.

@blink1073 Is the idea to have people write their links using #foo and then, at render time, intercept those links so they point not at the id foo but at a custom UUID-based id? If so, does the href of the rendered link actually change, or will it appear to link to the "wrong" thing? My guess (based on the bug that initially got @ian-r-rose and me to propose #1945) is that the href will appear to be different.

Looking at the href of the anchor link that appears when you hover over a header has historically been the best way to unambiguously identify the link to a header without digging into the DOM to find the actual id. That is, it's what I've been telling people for ages (i.e., long before my postdoc) when they see internal links and ask how to start using them in their notebooks.

If we're going to go this route for UUIDs, we should probably surface a better way to handle within-document links (let alone between-document links). How hard would it be to extend CodeMirror to inject tab completion on seeing […]( followed by a #, completing to the preferred textual representation of all the header ids in the document?
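The completion logic itself is small; the hard part is the editor integration. A minimal, editor-agnostic sketch, assuming the completer receives the text before the cursor and the document's header ids (the `completeHeaderLink` name is hypothetical, and wiring it into CodeMirror is not shown):

```typescript
// If the cursor sits inside a markdown link target that starts with "#",
// offer the header ids matching what has been typed so far.
function completeHeaderLink(textBeforeCursor: string, headerIds: string[]): string[] {
  // "](#" followed by a partial id (no spaces or ")") right at the cursor.
  const match = /\]\(#([^)\s]*)$/.exec(textBeforeCursor);
  if (!match) return [];
  const typed = match[1];
  return headerIds.filter((id) => id.startsWith(typed)).map((id) => `#${id}`);
}

const ids = ["introduction", "installation", "usage"];
console.log(completeHeaderLink("see [the install steps](#in", ids)); // [ '#introduction', '#installation' ]
console.log(completeHeaderLink("plain text #in", ids));              // []
```

Because the completer offers whatever ids the renderer produced, it would work the same whether those ids are plain slugs or UUID-scoped.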

If we could get that to work, we could probably also tab-complete […]( followed by ~ or ./ (or other OS-specific path prefixes) to documents that are currently open somewhere in the JupyterLab interface. (My guess is that tab-completing from the actual file system would be much harder than tab-completing from what is currently available in the JupyterLab DOM, since there is a widget view in which each document is loaded as its own DOM-like object.)


I'm not sure if this should be a new issue, but @ian-r-rose and I were working on this (e.g., #1945), which led us to realise that this is a problem not just for the notebook but for markdown in general. This is because we're using the same rendermime to handle all instances of markdown regardless of the file format they're in.

I kinda like that idea, but it almost certainly means we'll run into issues with people's expectations around ids, linking, etc., and with how other services render markdown. For example, # anything {#my-id} doesn't manually assign an id in GitHub-rendered markdown, while it would assign the id #my-id in pandoc-rendered markdown. However, GitHub does auto-generate header ids when you look at a README.md on its site, and if you only have the notebook header ids, your links will fail on GitHub (and vice versa).

Is there a downside to including phantom anchors that hold ids on behalf of each format's way of auto-generating header ids? If there really isn't, then I have a thought about how to handle this particular problem.
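A minimal sketch of the phantom-anchor idea, assuming each target format is represented by an id-generating function. The two schemes below are purely illustrative and are not claimed to match any real renderer. The first distinct id goes on the header; the rest become empty anchors beside it.

```typescript
type IdScheme = (headerText: string) => string;

// Illustrative schemes only: case-preserving vs. lowercased hyphenation.
const hyphenated: IdScheme = (t) => t.trim().replace(/\s+/g, "-");
const lowerHyphenated: IdScheme = (t) => t.trim().toLowerCase().replace(/\s+/g, "-");

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumes at least one scheme. Duplicate ids across schemes are emitted once.
function renderHeaderWithPhantoms(level: number, text: string, schemes: IdScheme[]): string {
  const ids = Array.from(new Set(schemes.map((scheme) => scheme(text))));
  const [primary, ...phantoms] = ids;
  const anchors = phantoms.map((id) => `<a id="${escapeHtml(id)}"></a>`).join("");
  return `${anchors}<h${level} id="${escapeHtml(primary)}">${escapeHtml(text)}</h${level}>`;
}

console.log(renderHeaderWithPhantoms(3, "My Header", [hyphenated, lowerHyphenated]));
// <a id="my-header"></a><h3 id="My-Header">My Header</h3>
```

One downside is visible here: every phantom id is another id in the page's namespace, so the duplicate problem grows with the number of formats supported.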


UUIDs don't solve the issue of duplicate ids within the same notebook. I think the best chance there might be to prioritise making manual ids possible (exclusively via the # header text {#id-text} syntax) and to produce warnings as @Carreau suggests.
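A minimal sketch combining the two parts of that suggestion, assuming pandoc-style `{#id}` attributes at the end of ATX header lines and a simple lowercase-hyphen slug as the fallback (both function names are hypothetical). Duplicates, including a manual id that clashes with an auto-generated one, produce warnings rather than being silently renamed.

```typescript
interface ParsedHeader {
  level: number;
  text: string;
  id: string;
  manual: boolean; // true when the id came from an explicit {#id}
}

// Parse "## Some text {#some-id}" or "## Some text"; non-headers give null.
function parseHeader(line: string): ParsedHeader | null {
  const m = /^(#{1,6})\s+(.*?)\s*(?:\{#([^\s}]+)\})?\s*$/.exec(line);
  if (!m) return null;
  const text = m[2];
  const autoId = text.toLowerCase().replace(/[^\w\s-]/g, "").replace(/\s+/g, "-");
  return { level: m[1].length, text, id: m[3] ?? autoId, manual: m[3] !== undefined };
}

// Collect ids in order and warn about any that repeat.
function collectIds(lines: string[]): { ids: string[]; warnings: string[] } {
  const seen = new Set<string>();
  const ids: string[] = [];
  const warnings: string[] = [];
  for (const line of lines) {
    const header = parseHeader(line);
    if (!header) continue; // not a header line
    if (seen.has(header.id)) {
      warnings.push(`Duplicate header id "#${header.id}" (from "${line}")`);
    }
    seen.add(header.id);
    ids.push(header.id);
  }
  return { ids, warnings };
}

const { ids, warnings } = collectIds([
  "## Results {#results-2017}",
  "## Results",
  "Some text",
  "## Results",
]);
console.log(ids);      // [ 'results-2017', 'results', 'results' ]
console.log(warnings); // one warning, for the second plain "## Results"
```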

@blink1073
Contributor

@mpacer, the way we handle this for links to documents in JLab is with the command linker, which adds a data attribute to the anchor so that clicking it runs a JLab command to open the path instead of performing the click's default action. This could be extended to include the hash, so that the anchor's href would not contain a UUID, but the action triggered by clicking it would have the path encoded.
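A minimal model of that extension, kept independent of JupyterLab's actual command linker API. The anchor is reduced to a plain object, and the `dataset` keys and the `example:open-at-header` command id are hypothetical. The visible href stays exactly as typed, while the data attribute carries the path and fragment for the click handler.

```typescript
// Illustrative stand-in for a DOM anchor element.
interface AnchorModel {
  href: string;
  dataset: Record<string, string>;
}

// Keep the author's href; record the command and its arguments separately.
function linkAnchor(href: string, currentPath: string): AnchorModel {
  const hashIndex = href.indexOf("#");
  const path = hashIndex === -1 ? href : href.slice(0, hashIndex) || currentPath;
  const fragment = hashIndex === -1 ? "" : href.slice(hashIndex + 1);
  return {
    href, // unchanged, so hovering still shows "#foo" rather than a UUID
    dataset: {
      command: "example:open-at-header", // hypothetical command id
      args: JSON.stringify({ path, fragment }),
    },
  };
}

// What a global click handler might do: run the command if there is one.
function onAnchorClick(
  anchor: AnchorModel,
  run: (command: string, args: unknown) => void,
): boolean {
  const command = anchor.dataset["command"];
  if (!command) return false; // no command: let the browser follow the href
  run(command, JSON.parse(anchor.dataset["args"] || "{}"));
  return true; // a real handler would call event.preventDefault() here
}

const anchor = linkAnchor("#foo", "example/notebook.ipynb");
onAnchorClick(anchor, (command, args) => console.log(command, args));
```

This keeps the hover-to-read-the-href workflow intact, since any UUID-based resolution happens inside the command rather than in the href.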

cf

linkHandler.handleLink(anchor, path);

* The global click handler that deploys commands/argument pairs that are

@jasongrout jasongrout modified the milestones: 1.0, Future Sep 5, 2018
@JasonWeill JasonWeill changed the title UUIDs for header ids to avoid the duplicate id problem Unique IDs for headers (avoid duplicates) May 26, 2022
@JasonWeill
Contributor

From #12402: Headers should contain ASCII characters only for maximum compatibility.
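If that guidance is applied to generated ids, one possible approach is sketched below, as an assumption rather than anything #12402 specifies. Accented letters are decomposed (NFKD), the combining marks are dropped, and anything still outside ASCII is removed. Characters with no ASCII decomposition, such as CJK text, are lost entirely.

```typescript
// Hypothetical ASCII-only slug for header ids.
function asciiSlug(headerText: string): string {
  return headerText
    .normalize("NFKD")               // "é" -> "e" + combining acute accent
    .replace(/[\u0300-\u036f]/g, "") // strip combining diacritical marks
    .replace(/[^\x00-\x7F]/g, "")    // drop remaining non-ASCII characters
    .trim()
    .toLowerCase()
    .replace(/[^\w\s-]/g, "")
    .replace(/\s+/g, "-");
}

console.log(asciiSlug("Résumé des données")); // resume-des-donnees
console.log(asciiSlug("数据 analysis"));        // analysis
```

The second example shows the trade-off: non-Latin headers can collapse to short or even empty ids, which makes collisions more likely.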
