Better handling of duplicate headers when parsing chunks #1017

stuart-marshall · 2023-08-22T05:20:23Z

The previous fix for cursor management with duplicate headers worked for the fastMode parse path, but not for the slow-mode parse path (the one that handles quotes).

This change is more general and works for both parse paths.

I also spotted a bug in handling quotes at the end of the file when the headers change.

Added new tests for both of these issues.


pokoli

Thanks for your PR. I've added some comments.
It is not clear to me what the new test is doing. It would be great if you could explain it.

pokoli · 2023-08-22T06:43:41Z

tests/node-tests.js

+				}
+			},
+			complete: function() {
+				assert(startsWithEtiamOrLorem);


I'm not sure what the objective of this test is.
Could you add more assertions? Can we just test the stepped variable to ensure it has stepped correctly?

pokoli · 2023-08-22T06:45:04Z

tests/node-tests.js

@@ -164,6 +164,46 @@ describe('PapaParse', function() {
 		});
 	});

+	it('Checks cursor when file is large and has duplicate headers', function(done) {
+		this.timeout(30000);


Why is the timeout required? I think it should be removed.

pokoli · 2023-08-22T06:47:06Z

papaparse.js

@@ -1508,7 +1508,17 @@ License: MIT
 				if (duplicateHeaders) {
 					var editedInput = input.split(newline);
 					editedInput[0] = Array.from(headerMap).join(delim);
+					// If we change the size of the input due to duplicate headers


Could we make the comment shorter?

What about: store the number of removed chars to correctly report meta.cursor?
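For readers following the thread, here is a minimal sketch of the idea behind this hunk: renaming duplicate headers changes the length of the first line, so a cursor measured against the edited input has to be shifted by that difference to point into the original input. The helpers `renameDuplicateHeaders`, `rewriteHeaderLine`, and `toOriginalCursor` are hypothetical names used only for illustration and are not taken from papaparse.js. The `_1`, `_2` suffix scheme and the naive `split` on the delimiter (which ignores quoted header fields) are simplifying assumptions.

```javascript
// Hypothetical helper: give repeated header names a numeric suffix
// (a, a_1, a_2, ...). The suffix scheme is an assumption for illustration.
function renameDuplicateHeaders(headers) {
  var seen = new Map(); // header name -> number of occurrences so far
  return headers.map(function (name) {
    var count = seen.get(name) || 0;
    seen.set(name, count + 1);
    return count === 0 ? name : name + '_' + count;
  });
}

// Hypothetical helper: rewrite the first line and record how many
// characters the input grew by (negative if it shrank).
// Assumes the header line contains no quoted delimiters.
function rewriteHeaderLine(input, newline, delim) {
  var lines = input.split(newline);
  var originalHeader = lines[0];
  lines[0] = renameDuplicateHeaders(originalHeader.split(delim)).join(delim);
  return {
    input: lines.join(newline),
    delta: lines[0].length - originalHeader.length
  };
}

// A cursor that lies past the header line of the rewritten input maps back
// to the original input by subtracting the header line's length change.
function toOriginalCursor(cursor, delta) {
  return cursor - delta;
}

var original = 'a,a,b\n1,2,3\n';
var rewritten = rewriteHeaderLine(original, '\n', ',');
var cursorInRewritten = rewritten.input.length; // parser consumed everything
var cursorInOriginal = toOriginalCursor(cursorInRewritten, rewritten.delta);
```

In this example the header `a,a,b` becomes `a,a_1,b`, so `rewritten.delta` is 2 and `cursorInOriginal` equals `original.length`. Storing a single length difference, rather than special-casing the first row, is what lets the same adjustment apply regardless of which parse path produced the cursor.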

pokoli · 2023-08-22T06:48:40Z

papaparse.js

@@ -1517,12 +1527,7 @@ License: MIT
 				for (var i = 0; i < rows.length; i++)
 				{
 					row = rows[i];
-					// use firstline as row length may be changed due to duplicated headers
-					if (i === 0 && firstLine !== undefined) {


As firstLine is no longer used, it can probably be removed as a variable.

stuart-marshall and others added 4 commits August 20, 2023 16:30

Handle header line expansion for all parser modes

9a122a8

Add test for chunked parsing with duplicate header

0136a53

Add test for trailing quote with renamed headers. Adjust code comment.

ad773e8

Merge pull request #1 from stuart-marshall/more-fix-cursor-for-duplic…

b24698c

…ate-header More fix cursor for duplicate header

pokoli requested changes Aug 22, 2023
