Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid dubble newlines in csv output #366

Merged
merged 4 commits into from
May 4, 2020

Conversation

joelostblom
Copy link
Contributor

I noticed that an extra newline character is added to each line of the --csv outputs
(displayed as ^M in vim):

id,start,stop,project,tags^M
76575e4,2020-04-22 19:48:43,2020-04-22 21:15:51,tweaks,optimization^M
128fb31,2020-04-22 21:18:24,2020-04-22 23:19:26,test-proj,"tag1, tag2"^M

This is not needed and can cause some programs to accidentally read in a blank line between each line in the csv output:

id,start,stop,project,tags

76575e4,2020-04-22 19:48:43,2020-04-22 21:15:51,tweaks,optimization

128fb31,2020-04-22 21:18:24,2020-04-22 23:19:26,test-proj,"tag1, tag2"

This PR is a suggestion for how to fix this and change the output of --csv to

id,start,stop,project,tags
76575e4,2020-04-22 19:48:43,2020-04-22 21:15:51,tweaks,optimization
128fb31,2020-04-22 21:18:24,2020-04-22 23:19:26,test-proj,"tag1, tag2"

I have only tested this in vim and gedit on Linux.

@k4nar
Copy link
Collaborator

k4nar commented Apr 27, 2020

We use the default configuration of DictWriter, which is tuned to be close to what Excel expects. Excel being a product from the Windows world, \r\n is used instead of just \n.

Now changing that would probably break compatibility with quite a few integrations. I think we should keep the default as it looks sane (it's the default of most applications generating CSV as far as I can tell), and trimming the CRLF can be done with Unix tools.

@joelostblom
Copy link
Contributor Author

Thanks for the comment @k4nar. I believe Excel is compatible with \n (only have LibreOffice installed to test and it generates \n by default), but if there are already integrations relying on the \r\n options, then maybe I could add a config option for this instead?

@joelostblom
Copy link
Contributor Author

I made a suggestion for a configurable \n line terminator in my latest commits.

@jmaupetit
Copy link
Contributor

I have mixed feelings about this... I think most UNIX users are able to use dos2unix or sed magic to fix their CSV (if required). Maybe I'm a little biased about this.

@joelostblom
Copy link
Contributor Author

Yes it is true that it is possible to use those tools. I personally think it would be more convenient to have it as an option in the configuration. Do you think there is a drawback with adding it, like not cluttering the config file with too many possible options?

@jmaupetit
Copy link
Contributor

Do you think there is a drawback with adding it, like not cluttering the config file with too many possible options?

Yup.

@joelostblom
Copy link
Contributor Author

Makes sense, what do you think about setting the lineterminator to os.linesep instead of adding a config option? That way, users of either platform would get what they expect. This is how the Python data analysis library pandas handles writing of csv files:

line_terminator str, optional
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (‘n’ for linux, ‘rn’ for Windows, i.e.).

@jmaupetit
Copy link
Contributor

Good idea! I wonder why this is not the default for the csv module.

@k4nar
Copy link
Collaborator

k4nar commented Apr 30, 2020

Ok for me 👍 .

@joelostblom
Copy link
Contributor Author

Great! I added this change and removed the previous ones.

Copy link
Contributor

@davidag davidag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @joelostblom, I believe you should also change a couple tests in test_utils.py.

@joelostblom
Copy link
Contributor Author

Thanks @davidag, I updated the two tests that used a lineterminator variable explicitly and added a note in the changelog.

@jmaupetit
Copy link
Contributor

Hey @joelostblom can you rebase your branch so that I can merge this. Thanks!

@joelostblom
Copy link
Contributor Author

@jmaupetit Rebased!

@jmaupetit
Copy link
Contributor

🤔 I guess a new arrow release is breaking the build... again. 😞

@joelostblom
Copy link
Contributor Author

Yes, i submitted another PR to pin the arrow version #372

@jmaupetit
Copy link
Contributor

Yup saw it, merged it! Ready for another rebase?

@joelostblom
Copy link
Contributor Author

Done!

@jmaupetit jmaupetit merged commit 417f01e into jazzband:master May 4, 2020
@jmaupetit jmaupetit mentioned this pull request May 27, 2020
davidag added a commit to davidag/xtimetracker that referenced this pull request Aug 6, 2020
Inspired by @joelostblom in jazzband/Watson#366 which himself got
inspired in Pandas CSV output default behavior.

pandas-dev/pandas@31f86d6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants