Add option to ignore HTML elements and content to MD044/proper-names #435

Closed
okalachev opened this issue Oct 5, 2021 · 15 comments

Comments

@okalachev

okalachev commented Oct 5, 2021

Capital letters are rarely used in file names, for a number of reasons. However, this configuration and input:

"MD044": {"names": ["JavaScript"]}
<img src="javascript.png">

Gives:

MD044/proper-names Proper names should have the correct capitalization [Expected: JavaScript; Actual: javascript]

This problem first appeared in v0.24.0.

okalachev added a commit to CopterExpress/clover that referenced this issue Oct 5, 2021
@DavidAnson
Owner

There are a lot of places a file name might appear. The new implementation of this rule probably finds more instances, which is why you've filed this. But I'm not sure I want to add a bunch of exceptions.

What about adding "javascript.png" to your allowed word list, similar to how "GitHub"/"github.com" is handled here: https://github.com/DavidAnson/markdownlint/blob/main/test/proper-names-projects.json

@okalachev
Author

But the previous version handled this better: it did not flag file names, yet it still found violations everywhere else.

We don't use capitalization in file names, and I believe that is a common convention. With the latest version I now have 159 errors in my documentation, and it doesn't make sense to add all of them as exceptions.

Isn't there a simple algorithm that can detect file names, or at least the values of HTML src and href attributes plus Markdown image addresses, or something like that?
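
As an illustration of the kind of simple detection asked about here, the following is a minimal sketch that blanks out `src`/`href` attribute values and Markdown link or image destinations before any name checking runs. The helper `maskUrlLikeSpans` is hypothetical and is not part of markdownlint; the regular expressions are rough approximations and do not amount to real HTML or Markdown parsing.

```javascript
// Hypothetical helper (not a markdownlint API): replaces URL-like spans with
// spaces of equal length, so column positions of later matches stay the same.
function maskUrlLikeSpans(line) {
  const blank = (match) => " ".repeat(match.length);
  return line
    // src="..." and href='...' attributes inside inline HTML
    .replace(/\b(?:src|href)\s*=\s*(?:"[^"]*"|'[^']*')/gi, blank)
    // Destination part of Markdown links and images: [text](dest) / ![alt](dest)
    .replace(/\]\([^)\s]*/g, blank);
}

const html = '<img src="javascript.png" alt="javascript logo">';
console.log(maskUrlLikeSpans(html));
console.log(maskUrlLikeSpans("![javascript](img/javascript/1.png)"));
```

Text in `alt` attributes and in the Markdown image description is left untouched, so it would still be checked for capitalization.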

@DavidAnson
Owner

I think the previous implementation of this rule missed things like text in HTML alt and title attributes, so this is an improvement.

Are all 159 new issues in your case unique? If there are duplicates like I'm expecting, a few new entries in the word list will solve the problem. Can you give some other examples, please?

@okalachev
Author

@DavidAnson, I examined the errors, and most of them are unique. They fall into these groups (assuming JavaScript is set as the proper spelling):

  • Unique image file names (in HTML). It doesn't make sense to check HTML attributes like src and href (although I don't know whether HTML gets parsed here).

  • Directory names in paths. There is no nice way to fix detections in:

    <img src="img/javascript/1.png">

    Except by adding something weird like "javascript/" to the list.

  • HTML identifiers. Like:

    <script type="text/javascript">
    <a id="javascript-1">

    And even:

    <javascript/>

  • Identifiers in <code> tags:

    This is not an error:

    `javascript`

    But this is:

    <code>javascript</code>

    Although it seems obvious that code should not be checked.

@DavidAnson
Owner

It looks like the request to ignore file names would only avoid some of these scenarios, so would be a partial solution for you at best.

This project is a linting tool for Markdown and it seems all of your examples use HTML. It's true that HTML can be used in Markdown, but many consider that to be an anti-pattern. Rule MD033/no-inline-html warns against doing so.

I'm not expecting to add a bunch of HTML handling to support the variety of examples you show, but I will leave the issue open for comment.

@okalachev
Author

@DavidAnson, what about adding a parameter that disables this rule for embedded HTML entirely? Then the rule would work as it did before. HTML is not checked correctly anyway, and as you say, HTML is not recommended in Markdown.

That would work for me, I believe.

@DavidAnson DavidAnson changed the title MD044/proper-names - don't check file names Add option to ignore HTML elements and content to MD044/proper-names Oct 7, 2021
@groenroos

We've just run into this issue as well with mixed Markdown and HTML, with instances in img tag srcs being flagged under this rule.

If there was an option to ignore HTML with this rule, I'd turn it on as preferable to the current situation; but as it's been pointed out, it would miss out on alt and title attributes that could contain user-facing text content, which wouldn't be ideal.

I think being able to ignore situations where the triggering word is "filename-y" (kebab-case, not surrounded by whitespace, etc.) would be very useful, probably realistic to implement, and getting closer (even if not perfect) to eliminating false positives.

As it stands now, we unfortunately have to turn off this rule completely.
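
The "filename-y" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `looksLikeFileName` is a hypothetical helper, not something markdownlint provides, and the three signals it uses (a path separator, a trailing file extension, kebab/snake case) are one possible reading of the suggestion.

```javascript
// Hypothetical heuristic (not a markdownlint API): decides whether the match
// at [start, start + length) sits inside a token that looks like a file name,
// path, or identifier rather than ordinary prose.
function looksLikeFileName(line, start, length) {
  const isBoundary = (ch) => /[\s"'<>()]/.test(ch);
  let left = start;
  let right = start + length;
  // Grow the match to the whole token delimited by whitespace, quotes, or brackets.
  while (left > 0 && !isBoundary(line[left - 1])) left--;
  while (right < line.length && !isBoundary(line[right])) right++;
  const token = line.slice(left, right);

  const hasPathSeparator = token.includes("/");
  const hasExtension = /\.[a-z0-9]{1,5}$/i.test(token);
  const isKebabOrSnake = /^[a-z0-9]+(?:[-_][a-z0-9]+)+$/.test(token);
  return hasPathSeparator || hasExtension || isKebabOrSnake;
}

const line = '<img src="img/javascript/1.png">';
console.log(looksLikeFileName(line, line.indexOf("javascript"), 10));
```

A heuristic like this will misjudge some cases. For example, a product name containing a dot would be treated as a file name, so it trades some missed violations for fewer false positives.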

@chriswong

"MD044": {"names": ["JavaScript", "javascript"]}

I think both JavaScript and javascript should be accepted under this configuration, but JavaScript is not.

@DavidAnson
Owner

@chriswong I would have expected that as well. The code handles matching substrings by sorting in order of length, but that does not help here where two names differ only in case. I included an example of the problem below and have added a note to myself to look into this. Thank you!

https://dlaa.me/markdownlint/#%25m%23%20Issue%20%3F%3F%3F%0A%0AOkay%3A%20javascript.png%0A%0AOkay%3A%20JavaScript.png%0A%0ABad%3A%20JAVASCRIPT%0A%0A%3C!--%20markdownlint-configure-file%20%7B%0A%20%20%22MD044%22%3A%20%7B%22names%22%3A%20%5B%22JavaScript%22%2C%20%22javascript%22%5D%7D%0A%7D%20--%3E%0A
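
To make the matching problem concrete, here is a minimal sketch of length-sorted, case-insensitive matching in which a hit is accepted when it exactly equals any configured name, not only the name whose pattern found it. This is an assumption-laden illustration, not the rule's actual code: `findImproperNames` and `escapeRegExp` are hypothetical helpers, and word-boundary handling is omitted.

```javascript
// Escapes regular expression metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch (not markdownlint's implementation).
function findImproperNames(line, names) {
  // Longer names first, so "github.com" claims its text before "GitHub" can.
  const sorted = [...names].sort((a, b) => b.length - a.length);
  const allowed = new Set(names);
  const claimed = []; // [start, end) ranges already matched by an earlier name
  const violations = [];
  for (const name of sorted) {
    const pattern = new RegExp(escapeRegExp(name), "gi");
    for (const match of line.matchAll(pattern)) {
      const start = match.index;
      const end = start + match[0].length;
      if (claimed.some(([s, e]) => start < e && end > s)) continue;
      claimed.push([start, end]);
      // Accept any exact spelling from the list, which covers names that
      // differ only in case.
      if (!allowed.has(match[0])) {
        violations.push({ actual: match[0], expected: name, column: start + 1 });
      }
    }
  }
  return violations;
}

console.log(findImproperNames(
  "javascript and JavaScript and JAVASCRIPT",
  ["JavaScript", "javascript"]
));
```

With both spellings configured, only the all-caps form is reported, which is the behavior shown as expected in the linked demo.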

DavidAnson added a commit that referenced this issue Apr 23, 2022
@DavidAnson
Owner

The first commit addresses the javascript/JavaScript issue. The second commit addresses the first three of the four examples in the detailed comment. It does not detect <code>...</code> and ignore the inner content because a) that starts to require parsing complex HTML and b) that is better written with Markdown backticks (unlike the other examples, which do not have exactly equivalent Markdown representations).
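
A rough sketch of why ignoring HTML elements alone leaves the content of `<code>` exposed: masking tags (element names and attributes) hides the first three kinds of example, but the text between an opening and closing tag is ordinary content and stays visible. The helper `maskHtmlTags` is hypothetical and uses a simple regular expression in place of a real HTML parser; it is not the code from the commits mentioned above.

```javascript
// Hypothetical helper (not a markdownlint API): blanks out HTML tags,
// including their attributes, while leaving the text between tags untouched.
function maskHtmlTags(line) {
  return line.replace(/<\/?[A-Za-z][^>]*>/g, (tag) => " ".repeat(tag.length));
}

for (const sample of [
  '<script type="text/javascript">',
  '<a id="javascript-1">',
  "<javascript/>",
  "<code>javascript</code>"
]) {
  // Only the last sample still contains "javascript" after masking.
  console.log(JSON.stringify(maskHtmlTags(sample).trim()));
}
```

Note that this approach also hides `alt` and `title` attribute text, which is the trade-off raised earlier in the thread.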

@okalachev
Author

@DavidAnson, thanks, I will try your fixes!

@okalachev
Author

okalachev commented Jun 16, 2022

@DavidAnson, it's actually not so easy to try this out: markdownlint-cli and markdownlint are separate packages, so I can't easily make markdownlint-cli use my cloned copy of markdownlint instead of the one downloaded from npm (which lacks this feature).

Is there an easy way to check?

@DavidAnson
Owner

The easiest thing is probably to clone this repository into the markdownlint folder under node_modules of the CLI. Or wait a little longer because I am getting close to doing another round of releases.

@okalachev
Author

@DavidAnson, I was able to test it with a symlink, and html_elements: false fixed most of the false positives. That's great!

However, things like <code>javascript</code> are still incorrectly detected. Luckily, there are very few of them in my case.
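
For reference, a configuration using the option named above might look like the sketch below, written as a JavaScript object. It assumes the installed markdownlint release already includes `html_elements` for MD044; the `names` list is only an example.

```javascript
// Example rule configuration (assumes a release that supports html_elements).
const config = {
  "MD044": {
    "names": ["JavaScript"],
    // Skip proper-name checks inside HTML elements such as <img src="...">.
    "html_elements": false
  }
};

console.log(JSON.stringify(config, null, 2));
```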

@okalachev
Author

I read that this is a known issue. In my case, <code> is used because the text is inside an HTML table, where Markdown cannot be used, and sometimes an HTML table is necessary.


4 participants