Add option to ignore HTML elements and content to MD044/proper-names #435

okalachev · 2021-10-05T13:25:08Z

Capital letters are rarely used in file names, for several reasons. But this configuration and input:

"MD044": {"names": ["JavaScript"]}

<img src="javascript.png">

Gives:

MD044/proper-names Proper names should have the correct capitalization [Expected: JavaScript; Actual: javascript]

This problem first appeared in v0.24.0.


DavidAnson · 2021-10-05T15:08:44Z

There are a lot of places a file name might appear. The new implementation of this rule probably finds more instances, which is why you've filed this. But I'm not sure I want to add a bunch of exceptions.

What about adding "javascript.png" to your allowed word list, similar to how "GitHub"/"github.com" is handled here: https://github.com/DavidAnson/markdownlint/blob/main/test/proper-names-projects.json
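A minimal sketch of that workaround, assuming the JSON configuration from the original report: the file name is listed as its own entry next to the proper name, so that exact spelling is accepted while other lowercase uses are still reported.

```json
{
  "MD044": {
    "names": ["JavaScript", "javascript.png"]
  }
}
```

Each distinct file name needs its own entry, so this only scales when the flagged names repeat.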

okalachev · 2021-10-05T15:15:33Z

But the previous version worked better: it didn't report violations in file names, but it did report them everywhere else.

We don't use capitalization in file names, and I believe that's a common convention. With the latest version I now have 159 errors in my documentation, and it doesn't make sense to add exceptions for all of them.

Isn't there a simple algorithm that could detect file names, or at least the values of HTML src and href attributes, plus Markdown image addresses, or something like that?

DavidAnson · 2021-10-05T15:29:15Z

I think the previous implementation of this rule missed things like text in HTML alt and title attributes, so this is an improvement.

Are all 159 new issues in your case unique? If there are duplicates like I'm expecting, a few new entries in the word list will solve the problem. Can you give some other examples, please?

okalachev · 2021-10-06T22:58:32Z

@DavidAnson, I examined the errors, and most of them are unique. They fall into these groups (suppose JavaScript is set as the proper spelling):

Unique image file names (in HTML). It doesn’t make sense to check the values of HTML attributes like src and href (although I don't know whether HTML gets parsed here).
Directory names in paths. There is no nice way to fix detections in:
```
<img src="img/javascript/1.png">
```
Except by adding something weird like "javascript/" to the list.

HTML identifiers. Like:

<script type="text/javascript">

<a id="javascript-1">

And even:

<javascript/>

Identifiers in <code> tags:

This is not an error:
```
`javascript`
```
But this is:
```
<code>javascript</code>
```
Although it's obvious that code blocks should not be checked.

DavidAnson · 2021-10-07T01:00:09Z

It looks like the request to ignore file names would only avoid some of these scenarios, so would be a partial solution for you at best.

This project is a linting tool for Markdown and it seems all of your examples use HTML. It's true that HTML can be used in Markdown, but many consider that to be an anti-pattern. Rule MD033/no-inline-html warns against doing so.

I'm not expecting to add a bunch of HTML handling to support the variety of examples you show, but I will leave the issue open for comment.

okalachev · 2021-10-07T11:59:14Z

@DavidAnson, what about adding a parameter that disables this rule for embedded HTML entirely? Then the rule would work as it did before. HTML doesn't get checked correctly anyway, and as you say, HTML is not recommended in Markdown.

That would work for me, I believe.

groenroos · 2022-02-19T02:26:57Z

We've just run into this issue as well with mixed Markdown and HTML, with instances in img tag srcs being flagged under this rule.

If there was an option to ignore HTML with this rule, I'd turn it on as preferable to the current situation; but as it's been pointed out, it would miss out on alt and title attributes that could contain user-facing text content, which wouldn't be ideal.

I think being able to ignore situations where the triggering word is "filename-y" (kebab-case, not surrounded by whitespace, etc.) would be very useful, probably realistic to implement, and getting closer (even if not perfect) to eliminating false positives.

As it stands now, we unfortunately have to turn off this rule completely.
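The "filename-y" idea above could be sketched roughly as follows. This is a hypothetical heuristic, not part of markdownlint: the function name `looksLikeFileName`, the boundary characters, and the extension pattern are all assumptions made for illustration.

```javascript
// Hypothetical heuristic (not a markdownlint API): treat a match as
// "filename-y" when the surrounding token looks like a path or identifier.
const FILE_EXTENSION = /\.[A-Za-z0-9]{1,5}$/;

function looksLikeFileName(line, start, length) {
  // Characters assumed to end a token: whitespace, quotes, brackets.
  const isBoundary = (ch) => /[\s"'<>()\[\]]/.test(ch);

  // Grow the match outward to the full run of non-boundary characters.
  let left = start;
  while (left > 0 && !isBoundary(line[left - 1])) left--;
  let right = start + length;
  while (right < line.length && !isBoundary(line[right])) right++;
  const token = line.slice(left, right);

  // Path separators, kebab/snake case, or a trailing extension suggest a file name.
  return (
    token.includes("/") ||
    token.includes("-") ||
    token.includes("_") ||
    FILE_EXTENSION.test(token)
  );
}
```

With this sketch, the match in `<img src="img/javascript/1.png">` is treated as a file name, while the same word in plain prose is not. It is deliberately crude: hyphenated prose such as "javascript-based" would also be skipped, which is the kind of imperfection the comment anticipates.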

chriswong · 2022-03-30T09:47:14Z

"MD044": {"names": ["JavaScript", "javascript"]}

I think both JavaScript and javascript should be OK under this configuration, but JavaScript is not accepted.

DavidAnson · 2022-03-30T16:05:05Z

@chriswong I would have expected that as well. The code handles matching substrings by sorting in order of length, but that does not help here where two names differ only in case. I included an example of the problem below and have added a note to myself to look into this. Thank you!

https://dlaa.me/markdownlint/#%25m%23%20Issue%20%3F%3F%3F%0A%0AOkay%3A%20javascript.png%0A%0AOkay%3A%20JavaScript.png%0A%0ABad%3A%20JAVASCRIPT%0A%0A%3C!--%20markdownlint-configure-file%20%7B%0A%20%20%22MD044%22%3A%20%7B%22names%22%3A%20%5B%22JavaScript%22%2C%20%22javascript%22%5D%7D%0A%7D%20--%3E%0A
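To illustrate the expected behavior, here is a minimal sketch of a matcher that accepts any configured casing. It is not markdownlint's implementation: `findImproperNames` and `escapeRegExp` are hypothetical helpers, and word boundaries, code spans, and HTML are ignored for brevity.

```javascript
// Hypothetical sketch, not markdownlint's code.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findImproperNames(text, names) {
  // Longest names first, so "github.com" is tried before "GitHub".
  const sorted = [...names].sort((a, b) => b.length - a.length);
  const allowed = new Set(names);
  // Case-insensitive search for any configured name.
  const pattern = new RegExp(sorted.map(escapeRegExp).join("|"), "gi");

  const problems = [];
  for (const match of text.matchAll(pattern)) {
    // Accept the match if it is spelled exactly like ANY configured name,
    // so two names that differ only in case can coexist.
    if (!allowed.has(match[0])) {
      problems.push({ actual: match[0], index: match.index });
    }
  }
  return problems;
}

const sample = "Okay: javascript.png, JavaScript.png. Bad: JAVASCRIPT";
// Only the all-caps spelling is reported.
console.log(findImproperNames(sample, ["JavaScript", "javascript"]));
```

Sorting by length decides which name is tried first when one contains another; checking the matched text against the whole set is what lets both casings pass.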


DavidAnson · 2022-04-26T04:55:17Z

The first commit addresses the javascript/JavaScript issue. The second commit addresses the first three of the four examples in the detailed comment. It does not detect <code>...</code> and ignore the inner content because a) that starts to require parsing complex HTML and b) that is better written with Markdown backticks (unlike the other examples, which do not have exactly equivalent Markdown representations).

okalachev · 2022-05-05T10:02:41Z

@DavidAnson, thanks, I will try your fixes!

okalachev · 2022-06-16T11:37:53Z

@DavidAnson, it’s actually not so easy to try this out: markdownlint-cli and markdownlint are separate packages, so I can’t easily make markdownlint-cli use the cloned version of markdownlint instead of the one downloaded from npm (which lacks this feature).

Is there an easy way to check?

DavidAnson · 2022-06-16T16:15:24Z

The easiest thing is probably to clone this repository into the markdownlint folder under node_modules of the CLI. Or wait a little longer because I am getting close to doing another round of releases.

okalachev · 2022-06-16T16:51:52Z

@DavidAnson, I was able to test it with a symlink, and html_elements: false fixed most of the false positives. That's great!

However, things like <code>javascript</code> are still incorrectly detected. Luckily, there are very few of them in my case.
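For reference, the configuration being tested presumably looks like the sketch below. The names list is taken from the original report, and html_elements is the option name quoted in the comment above; everything else is an assumption.

```json
{
  "MD044": {
    "names": ["JavaScript"],
    "html_elements": false
  }
}
```

With this setting, names inside HTML elements such as `<img src="javascript.png">` are no longer checked, while `<code>javascript</code>` content is still reported, as noted.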

okalachev · 2022-06-16T16:56:07Z

I've read that this is a known issue. In my case <code> is used because the text is inside an HTML table, and it's impossible to use Markdown inside an HTML table. And sometimes an HTML table is necessary.

okalachev added a commit to CopterExpress/clover that referenced this issue Oct 5, 2021

Use markdownlint-cli@0.28.1 as of DavidAnson/markdownlint#435

b249524

DavidAnson added the question label Oct 5, 2021

DavidAnson added enhancement and removed question labels Oct 7, 2021

DavidAnson changed the title ~~MD044/proper-names - don't check file names~~ Add option to ignore HTML elements and content to MD044/proper-names Oct 7, 2021

okalachev mentioned this issue Nov 16, 2021

MD044 flags words in markdown links #443

Closed

DavidAnson added a commit that referenced this issue Apr 23, 2022

Update MD044/proper-names to support specifying multiple casings of the same name (ex: "Abc" and "ABC") (refs #435).

8afec14

DavidAnson added the fixed in next label Apr 29, 2022

DavidAnson closed this as completed in 0f845e9 Jun 22, 2022
