module,win: make module cache case-insensitive #54478

Open
wants to merge 3 commits into
base: main

Conversation

huseyinacacak-janea
Contributor

This PR makes the module cache object case-insensitive by utilizing Proxy in JS.

Fixes: #54132

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. labels Aug 21, 2024
@avivkeller avivkeller added the windows Issues and PRs related to the Windows platform. label Aug 21, 2024

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.06%. Comparing base (2c14615) to head (f038f19).
Report is 614 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #54478      +/-   ##
==========================================
+ Coverage   87.08%   88.06%   +0.98%     
==========================================
  Files         648      652       +4     
  Lines      182341   183578    +1237     
  Branches    34982    35866     +884     
==========================================
+ Hits       158783   161671    +2888     
+ Misses      16831    15155    -1676     
- Partials     6727     6752      +25     
Files with missing lines Coverage Δ
lib/internal/modules/cjs/loader.js 96.14% <100.00%> (+3.21%) ⬆️

... and 213 files with indirect coverage changes

@@ -317,6 +318,44 @@ Module.globalPaths = [];

let patched = false;

/* Make Module._cache case-insensitive on Windows */
if (isWindows) {
/* Create a proxy handler to intercept some operations */
Member

@joyeecheung joyeecheung Aug 21, 2024

I don't think a proxy is the right way to fix it - it slows down all access to the cache which is on a hot path. If we want to do this I suspect this is beyond just the cache - the keys are computed by Module._resolveFilename and are attached to the modules as module.filename so that needs to be modified for consistency. Also, the cache is shared by the ESM loader, so something needs to be done for ESM as well for it to be consistent.

Contributor Author

Thank you for the review.
I couldn't modify the function Module._resolveFilename because the require.cache variable can be used by the user directly. Here you can see.

Since the variable can be used without any get/set functions, I thought that overloading the operations could fix this issue. Then, I found Proxy and used it. You can see the benchmark results below:

                                          confidence improvement accuracy (*)   (**)  (***)
module\\module-loader-circular.js n=10000                -0.02 %       ±2.86% ±3.81% ±4.97%

I'm open to suggestions.

Member

@joyeecheung joyeecheung Sep 3, 2024

require.cache is mostly meant to be used with a filename previously resolved by Node.js (which is usually from Module._resolveFilename()). Only modifying Module._cache leads to inconsistency if users then try to use the mod.filename or __filename somewhere, or try to use it together with ESM (e.g. importing a URL constructed from that filename).

module-loader-circular.js isn't a very suitable benchmark because it deletes from the module cache, which is already not a fast path. You'd need to benchmark the common fast path, where a module gets re-required and actually hits the cache.

Contributor Author

@huseyinacacak-janea huseyinacacak-janea Sep 10, 2024

I've modified the benchmark test by applying the following patch:

diff --git a/benchmark/module/module-loader-circular.js b/benchmark/module/module-loader-circular.js
index db382142c2e..e712f4bf0fa 100644
--- a/benchmark/module/module-loader-circular.js
+++ b/benchmark/module/module-loader-circular.js
@@ -26,6 +26,11 @@ function main({ n }) {
     require(bDotJS);
     delete require.cache[aDotJS];
     delete require.cache[bDotJS];
+
+    require(aDotJS);
+    require(bDotJS);
+    require(aDotJS);
+    require(bDotJS);
   }
   bench.end(n);
 

The result of this benchmark is below.

                                          confidence improvement accuracy (*)   (**)  (***)
module\\module-loader-circular.js n=10000                -1.55 %       ±4.69% ±6.25% ±8.15%

Do you think this test is sufficient for benchmarking?

You wrote about inconsistency but I couldn't quite understand your concern. Could you please give me an example or point me to an existing test?
I would appreciate your help if you have something in mind to fix this inconsistency.

Member

@joyeecheung joyeecheung Sep 18, 2024

The difference comes mostly from the delete operations, so they should be removed to check the impact on cache hits.
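
To isolate the cache-hit path without any delete operations, a standalone micro-benchmark along these lines could be used. This is only a rough sketch, not the Node.js benchmark harness: `plainCache`, `proxiedCache`, `timeHits`, and the sample filename are hypothetical names, and a lowercasing Proxy stands in for whatever handler the PR actually installs. It relies on the global `performance` object available in recent Node.js releases.

```javascript
// Hypothetical micro-benchmark: repeated cache hits on a plain object
// versus an object wrapped in a key-lowercasing Proxy. No deletes involved.
const lower = (key) => (typeof key === 'string' ? key.toLowerCase() : key);

const plainCache = { __proto__: null };
const proxiedCache = new Proxy({ __proto__: null }, {
  get: (target, key) => Reflect.get(target, lower(key)),
  set: (target, key, value) => Reflect.set(target, lower(key), value),
});

// Illustrative key only; any previously resolved filename would do.
const filename = 'C:\\Project\\Lib\\Index.js';
plainCache[filename] = { exports: {} };
proxiedCache[filename] = { exports: {} };

// Look the same key up n times and report the hit count and elapsed time.
function timeHits(cache, n) {
  let hits = 0;
  const start = performance.now();
  for (let i = 0; i < n; i++) {
    if (cache[filename] !== undefined) hits++;
  }
  return { hits, ms: performance.now() - start };
}

const n = 1e5;
const plain = timeHits(plainCache, n);
const proxied = timeHits(proxiedCache, n);
console.log(`plain: ${plain.ms.toFixed(2)} ms, proxy: ${proxied.ms.toFixed(2)} ms`);
```

Timings from a loop like this are noisy and say nothing about full `require()` calls, so they are a first indication at most.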

You wrote about inconsistency but I couldn't quite understand your concern. Could you please give me an example or point me to an existing test?

For example, Object.keys(require.cache).includes(require.resolve('TEST')) will not be the same as !!require.cache[require.resolve('TEST')], if the resolution is not updated (the key can also be just __filename from some module, or any other place to get a resolved file name). Also, import is still case-sensitive, which doesn't seem right if only require becomes case-insensitive.
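
The mismatch described above can be reproduced outside Node.js internals. The following is a minimal sketch that assumes the proxy stores keys lowercased (the PR's actual handler may differ); `cache` and `resolved` are hypothetical stand-ins for `require.cache` and the result of `require.resolve('TEST')`.

```javascript
// Sketch of the inconsistency: lookups go through the lowercasing traps,
// but key enumeration exposes the lowercased keys stored on the target.
const lower = (key) => (typeof key === 'string' ? key.toLowerCase() : key);

const cache = new Proxy({ __proto__: null }, {
  get: (target, key) => Reflect.get(target, lower(key)),
  set: (target, key, value) => Reflect.set(target, lower(key), value),
});

// Hypothetical resolved filename that keeps its original casing.
const resolved = 'C:\\Project\\Test.js';
cache[resolved] = { filename: resolved };

const byLookup = !!cache[resolved];                   // true
const byKeys = Object.keys(cache).includes(resolved); // false

console.log({ byLookup, byKeys, keys: Object.keys(cache) });
```

The cached module's own `filename` still carries the original casing, so any code that compares it against the cache keys sees two different strings.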

Contributor Author

Thank you for your detailed response. I completely understand your concern. I'll do my best to explore potential solutions to this problem and ensure consistency. After that, we can discuss and decide on the best course of action.

Contributor Author

After further investigation, I found the place where imported modules are cached and made import case-insensitive using the attached patch. However, this test failed. As shown, String objects are not accepted in Node.js, which suggests that making imports case-insensitive may not be very practical.

Considering all these factors, I believe that it may not be worthwhile to pursue making both require and import case-insensitive.

Cache patch
diff --git a/lib/internal/modules/esm/module_map.js b/lib/internal/modules/esm/module_map.js
index 0040ff5f5d1..b524bec1488 100644
--- a/lib/internal/modules/esm/module_map.js
+++ b/lib/internal/modules/esm/module_map.js
@@ -91,7 +91,7 @@ class LoadCache extends SafeMap {
   get(url, type = kImplicitTypeAttribute) {
     validateString(url, 'url');
     validateString(type, 'type');
-    return super.get(url)?.[type];
+    return super.get(url.toLowerCase())?.[type];
   }
   set(url, type = kImplicitTypeAttribute, job) {
     validateString(url, 'url');
@@ -107,15 +107,15 @@ class LoadCache extends SafeMap {
     }) in ModuleLoadMap`);
     const cachedJobsForUrl = super.get(url) ?? { __proto__: null };
     cachedJobsForUrl[type] = job;
-    return super.set(url, cachedJobsForUrl);
+    return super.set(url.toLowerCase(), cachedJobsForUrl);
   }
   has(url, type = kImplicitTypeAttribute) {
     validateString(url, 'url');
     validateString(type, 'type');
-    return super.get(url)?.[type] !== undefined;
+    return super.get(url.toLowerCase())?.[type] !== undefined;
   }
   delete(url, type = kImplicitTypeAttribute) {
-    const cached = super.get(url);
+    const cached = super.get(url.toLowerCase());
     if (cached) {
       cached[type] = undefined;
     }
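
For comparison, here is a minimal sketch of a load cache that normalizes the URL once in every method. It uses a plain `Map` in place of the internal `SafeMap`, omits the validation calls, and treats lowercasing the whole URL as an assumption carried over from the patch. `CaseInsensitiveLoadCache`, `normalize`, and `kDefaultType` are hypothetical names. In the patch above, the context line in `set` still reads `super.get(url)` with the original casing, so two differently cased URLs would not share one record there.

```javascript
// Hypothetical stand-in for the default type attribute; the value is illustrative.
const kDefaultType = 'default';

// Assumption from the patch: lowercasing the whole URL is the normalization.
const normalize = (url) => url.toLowerCase();

class CaseInsensitiveLoadCache extends Map {
  get(url, type = kDefaultType) {
    return super.get(normalize(url))?.[type];
  }
  set(url, type = kDefaultType, job) {
    const key = normalize(url);
    // Read and write with the same normalized key so records are merged.
    const jobs = super.get(key) ?? { __proto__: null };
    jobs[type] = job;
    return super.set(key, jobs);
  }
  has(url, type = kDefaultType) {
    return super.get(normalize(url))?.[type] !== undefined;
  }
  delete(url, type = kDefaultType) {
    const jobs = super.get(normalize(url));
    if (jobs) jobs[type] = undefined;
  }
}

const loadCache = new CaseInsensitiveLoadCache();
loadCache.set('file:///C:/App/Main.mjs', 'typeA', 'jobA');
loadCache.set('file:///C:/app/MAIN.mjs', 'typeB', 'jobB');
console.log(loadCache.get('file:///c:/APP/main.mjs', 'typeA')); // 'jobA'
```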

Member

As shown, String objects are not accepted in Node.js, which suggests that making imports case-insensitive may not be very practical.

I am not sure what the correlation between the two is. The test you are pointing to is the loader test, which covers an experimental feature for customizing the loader, and it's the customization that is returning String objects. The internal loader can do whatever coercion is necessary to make it work.

@huseyinacacak-janea
Contributor Author

Is there anything else I can do to help this PR move forward?

Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 3, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 3, 2024
@nodejs-github-bot
Collaborator

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Contributor

@aduh95 aduh95 left a comment

I think we should call the actual Reflect methods instead of blindly returning true, so the behavior is guaranteed to be the same as without the proxy.
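
The difference shows up when the underlying operation fails. The sketch below is an illustration rather than the PR's code: `blindHandler` and `faithfulHandler` are hypothetical names, and a frozen object is used only to force a failing write.

```javascript
// Compare a trap that always reports success with one that forwards
// the result of the matching Reflect method.
const lower = (key) => (typeof key === 'string' ? key.toLowerCase() : key);

const blindHandler = {
  set(target, key, value) {
    Reflect.set(target, lower(key), value); // result ignored
    return true;                            // always claims success
  },
};

const faithfulHandler = {
  set(target, key, value) {
    // Report exactly what the underlying object reports.
    return Reflect.set(target, lower(key), value);
  },
};

// A frozen, prototype-less object rejects every write.
const frozen = () => Object.freeze({ __proto__: null });

const plainResult = Reflect.set(frozen(), 'A.js', 1);
const blindResult = Reflect.set(new Proxy(frozen(), blindHandler), 'A.js', 1);
const faithfulResult = Reflect.set(new Proxy(frozen(), faithfulHandler), 'A.js', 1);

console.log({ plainResult, blindResult, faithfulResult });
// { plainResult: false, blindResult: true, faithfulResult: false }
```

Only the handler that returns the Reflect result matches the unproxied object, which is also what lets strict-mode assignments throw as they would without the proxy.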

@@ -321,35 +324,24 @@ let patched = false;
/* Make Module._cache case-insensitive on Windows */
if (isWindows) {
/* Create a proxy handler to intercept some operations */
const toLowerCaseIfString = (prop) => (typeof prop === 'string' ? StringPrototypeToLowerCase(prop) : prop);
Member

@joyeecheung joyeecheung Sep 18, 2024

Is StringPrototypeToLowerCase the right way to convert the paths to be case-insensitive on Windows? String.prototype.toLowerCase follows the Unicode case-folding rules (https://tc39.es/ecma262/multipage/text-processing.html#sec-string.prototype.tolowercase). I can't find proper documentation about the case-folding rules of Windows so far (they seem to be FS-dependent), but my experience dealing with Windows tells me they probably predate Unicode (surely they had this before 1991, and Windows tends to be very backwards-compatible), so they are unlikely to be consistent with the output of String.prototype.toLowerCase().

Member

From the Internet, at least https://archives.miloush.net/michkap/archive/2005/01/16/353873.html suggests that the path case-folding rules on Windows differed from Unicode in some early versions and didn't change much later. I doubt they have since been aligned with Unicode.
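
A few well-known Unicode mappings illustrate why `String.prototype.toLowerCase()` is a questionable proxy for file-system case rules. The snippet only shows what JavaScript does; whether a given Windows file system treats these names as equal is not established here, and the paths are hypothetical.

```javascript
// Characters written as escapes to keep the source ASCII-only.
const kelvinSign = '\u212A'; // KELVIN SIGN, looks like "K"
const dottedI = '\u0130';    // LATIN CAPITAL LETTER I WITH DOT ABOVE
const sharpS = '\u00DF';     // LATIN SMALL LETTER SHARP S

// Two distinct strings that become identical after lowercasing.
const asciiPath = 'C:\\data\\K.txt';
const kelvinPath = `C:\\data\\${kelvinSign}.txt`;
const collide = asciiPath !== kelvinPath &&
  asciiPath.toLowerCase() === kelvinPath.toLowerCase();

// Lowercasing can change the string length.
const dottedLowerLength = dottedI.toLowerCase().length;

// Lowercasing and uppercasing do not define the same equivalence classes.
const sharpLowerUnchanged = sharpS.toLowerCase() === sharpS;
const sharpUpper = sharpS.toUpperCase();

console.log({ collide, dottedLowerLength, sharpLowerUnchanged, sharpUpper });
// { collide: true, dottedLowerLength: 2, sharpLowerUnchanged: true, sharpUpper: 'SS' }
```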

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 12, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 12, 2024
@nodejs-github-bot
Collaborator

@StefanStojanovic StefanStojanovic added the blocked PRs that are blocked by other issues or PRs. label Oct 14, 2024
@StefanStojanovic
Contributor

I've added the blocked label, so this is not merged accidentally if the CI passes while it is still being discussed.

@mcollina mcollina added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Oct 15, 2024
@mcollina
Member

Adding it to the @nodejs/tsc agenda for visibility

Member

@BridgeAR BridgeAR left a comment

I see a couple of issues with this.
Not only are there open questions about what should be case-insensitive (path, cjs, esm, only changed values, etc.), but it is also possible to change case sensitivity on Windows itself (although Windows does not recommend it).

This is something that would probably require a lot of careful thought.

@mhdawson
Member

Discussed in today's TSC meeting. The consensus among those present (a smaller group of 7) was that, before this lands, more work is needed to make the treatment of case consistent across the different aspects mentioned by @joyeecheung (ESM, invoking resolution functions, or looking at the resulting filename) and with how Windows treats case insensitivity.

@mhdawson mhdawson removed the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Oct 23, 2024
@ljharb
Member

ljharb commented Oct 23, 2024

I’m a bit confused - there exist non-lowercase npm package names, for example https://www.npmjs.com/package/jsonstream and https://www.npmjs.com/package/JSONStream. Both of these need to be independently requireable and importable. Does this PR only apply to relative paths?

@huseyinacacak-janea
Contributor Author

I’m a bit confused - there exist non-lowercase npm package names, for example https://www.npmjs.com/package/jsonstream and https://www.npmjs.com/package/JSONStream. Both of these need to be independently requireable and importable. Does this PR only apply to relative paths?

When you install a module in a project, a folder with the same name as the module is created in node_modules. I guess it should not be possible to install these two modules at the same time in a project.

Labels
blocked PRs that are blocked by other issues or PRs. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. windows Issues and PRs related to the Windows platform.
10 participants