Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling of duplicate headers when parsing chunks #1017

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

stuart-marshall
Copy link

The previous fix for cursor management when there are duplicate headers worked for the faseMode parse path, but not for the slow mode parse path (that handles quotes).

This change is more general and works for both parse paths.

I also spotted a bug in handling quotes at the end of the file when the headers change.

Added new tests for both of these issues.

Copy link
Collaborator

@pokoli pokoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR. I've added some comments.
It is not clear for me what the new test is doing. IT will be great if you can explain it.

}
},
complete: function() {
assert(startsWithEtiamOrLorem);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which is the objective of such test.
Could add more assertions? Can we just test the stepped variable to ensure it has steeped correctly?

@@ -164,6 +164,46 @@ describe('PapaParse', function() {
});
});

it('Checks cursor when file is large and has duplicate headers', function(done) {
this.timeout(30000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the timeout required? I think it should be removed.

@@ -1508,7 +1508,17 @@ License: MIT
if (duplicateHeaders) {
var editedInput = input.split(newline);
editedInput[0] = Array.from(headerMap).join(delim);
// If we change the size of the input due to duplicate headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make the comment smaller?

What about: store the number of removed chars to correctly report meta.cursor ?

@@ -1517,12 +1527,7 @@ License: MIT
for (var i = 0; i < rows.length; i++)
{
row = rows[i];
// use firstline as row length may be changed due to duplicated headers
if (i === 0 && firstLine !== undefined) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As firstline is no longer used, probably it can be removed as variable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants