Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sclang: encode all filesystem paths as UTF-8 internally #2861

Merged
merged 229 commits into from
Aug 11, 2017
Merged

sclang: encode all filesystem paths as UTF-8 internally #2861

merged 229 commits into from
Aug 11, 2017

Conversation

mossheim
Copy link
Contributor

@mossheim mossheim commented May 5, 2017

Fixes #2441 and fixes #2444.

Issue

The Windows NT filesystem encodes paths using UTF-16. POSIX systems use UTF-8. SCIDE/SCDoc encodes all strings as UTF-8 internally. This causes a problem where scide miscodes strings passed by sclang and fails to find files when a path contains a codepoint on which these encodings differ (specifically, any character that is not in the 7-bit ASCII set).

Proposed fix

Use the UTF-8 encoding for paths whenever they are passed to sclang, and use the native encoding whenever passing to the OS filesystem. This eliminates the encoding-agnostic behavior of sclang and also eliminates the encoding barrier between scide and sclang.

This fix also performs a couple other functions:

  • Eliminate asserts thrown when debugging on Windows while parsing help files with Japanese characters. (https://github.com/brianlheim/supercollider/commit/c1f372621f2f9044a607e5a7ed7cf7c4396d6845)
  • Use boost::filesystem namespace functions where possible. Using recursive_directory_iterator has led to an observed speedup in class library compilation of about 1.5x-2x.
  • Replace the SC_DirUtils translation unit with a new one, SC_Filesystem, that performs similar functions using boost::filesystem::path instead of char*. I changed the name because it's now a singleton class with state rather than a collection of utilities.
  • Rewrite the interface of SC_LanguageConfig to use boost::filesystem::path as its path representation type (instead of std::string).

Further reading:

http://utf8everywhere.org/

Testing tools:

Full test script to exercise all changed routines and components:
https://gist.github.com/brianlheim/ddfbf90c2bd8e178677d93f680c0f062

@mossheim mossheim added this to the 3.9 milestone May 5, 2017
@mossheim mossheim added comp: help schelp documentation comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" os: Windows Work In Progress (WIP) - don't merge yet labels May 5, 2017
@bagong
Copy link
Contributor

bagong commented May 13, 2017

So how should we go about with this? I had a look, built it on Windows and Mac. It looks very promising (the speed is amazing ;) ), but it's obviously not yet in a "testable" state. So should we just be patient, or can we do something to help?

@mossheim
Copy link
Contributor Author

@bagong Thanks for looking into it!

  • I left about 20 or so "TODO"'s in the code, some of which are questions/uncertainties. You may be able to answer them!

  • The globbing functions need to be rewritten in SC_Filesystem_win.cpp. Specifically, globNext should return the result of ::FindNextFile prepended with the folder from which globbing began. So in other words if the input is C:\path\to\hell\* it should return C:\path\to\hell\file1.txt, C:\path\to\hell\dir2\ (note terminal \), etc.

  • I would love it if someone could quickly write a header to convert between std::string (UTF-8) and std::string (UTF-16). Pretty good solution is given here. And put the pathAsUTF8String functions from SC_Filesystem in there too. All the functions could be inline (header-only). That was what I was going to do next. ((Using the standard conversion operators is fine for now but it might be better to switch it to boost.locale (which isn't currently included in this repo) in the future, since that is a more robust implementation.))

@mossheim
Copy link
Contributor Author

I am sorry I haven't been able to work on it this week, by the way. Development requires me to boot Windows from a hard drive and it takes about an hour or two of use for VS to speed up to a good clip. I hope to set aside a few hours for it tomorrow!

@mossheim
Copy link
Contributor Author

mossheim commented May 15, 2017

Getting closer... fixed almost all the issues with globbing, and wrote a header for codepage conversions, I think that removed a couple TODOs. So I think most of the bigger architectural issues are solved.

summary TODO:

  • make sure all sclang primitives touched still behave as expected (within reason)
  • probably remove "Standalone" info from SC_Filesystem, and all the associated cruft, unless someone comes up with a good reason to keep it done
  • move "My documents" directory acquisition into SC_Filesystem for completeness/single-responsibility principle
  • also fix a few primitives: soundfile primitives (haven't touched yet), ~ File.open, File.getcwd~
  • make sure class library compilation still behaves the same (or better than it used to), especially on macOS w/r/t alias resolution. If anyone wants to do some cross-testing and make sure that behavior (alias missing target, alias pointing to directory, alias pointing to source file, etc.) is staying consistent, I would really appreciate it.

@bagong
Copy link
Contributor

bagong commented May 15, 2017

Unfortunately a build error on Windows:

 "C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\Install.vcxproj" (default target) (1) ->
       "C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\ALL_BUILD.vcxproj" (default target) (3) ->
       "C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\server\plugins\DelayUGens.vcxproj" (default target) (5) ->
       "C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\lang\sclang.vcxproj" (default target) (7) ->
       "C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\lang\libsclang.vcxproj" (default target) (18) ->
       (ClCompile target) ->
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(665): error C2338: The C++ Standard forbids containers of const elements because allocator<const T> is ill-formed. [C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\lang\libsclang.vcxproj]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(695): error C2535: 'const boost::filesystem::path *std::allocator<_Ty>::address(const boost::filesystem::path &) noexcept const': member function already defined or declared [C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\lang\libsclang.vcxproj]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(930): error C2535: 'const boost::filesystem::path *std::_Wrap_alloc<std::allocator<_Ty>>::address(const boost::filesystem::path &) const': member func

With more detail:

    18>C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(665): error C2338: The C++ Standard forb
       ids containers of const elements because allocator<const T> is ill-formed. [C:\Users\Rainer\Projects\sc\supercol
       lider\build_VS2015_64\lang\libsclang.vcxproj]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(883): note: see reference to class tem
         plate instantiation 'std::allocator<_Ty>' being compiled
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\vector(450): note: see reference to class templ
         ate instantiation 'std::_Wrap_alloc<std::allocator<_Ty>>' being compiled
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\vector(502): note: see reference to class templ
         ate instantiation 'std::_Vec_base_types<_Ty,_Alloc>' being compiled
                 with
                 [
                     _Ty=const SC_Filesystem::Path,
                     _Alloc=std::allocator<const SC_Filesystem::Path>
                 ]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\vector(680): note: see reference to class templ
         ate instantiation 'std::_Vector_alloc<std::_Vec_base_types<_Ty,_Alloc>>' being compiled
                 with
                 [
                     _Ty=const SC_Filesystem::Path,
                     _Alloc=std::allocator<const SC_Filesystem::Path>
                 ]
         C:\Users\Rainer\Projects\sc\supercollider\lang\LangPrimSource\PyrStringPrim.cpp(564): note: see reference to c
         lass template instantiation 'std::vector<const SC_Filesystem::Path,std::allocator<_Ty>>' being compiled
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
    18>C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(695): error C2535: 'const boost::filesys
       tem::path *std::allocator<_Ty>::address(const boost::filesystem::path &) noexcept const': member function alread
       y defined or declared [C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\lang\libsclang.vcxproj]
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(689): note: see declaration of 'std::a
         llocator<_Ty>::address'
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
    18>C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(930): error C2535: 'const boost::filesys
       tem::path *std::_Wrap_alloc<std::allocator<_Ty>>::address(const boost::filesystem::path &) const': member functi
       on already defined or declared [C:\Users\Rainer\Projects\sc\supercollider\build_VS2015_64\lang\libsclang.vcxproj
       ]
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
         C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\xmemory0(924): note: see declaration of 'std::_
         Wrap_alloc<std::allocator<_Ty>>::address'
                 with
                 [
                     _Ty=const SC_Filesystem::Path
                 ]
         PyrSymbolPrim.cpp
    18>C:\Users\Rainer\Projects\sc\supercollider\common\SC_AllocPool.h(210): warning C4334: '<<': result of 32-bit shif
       t implicitly converted to 64 bits (was 64-bit shift intended?) [C:\Users\Rainer\Projects\sc\supercollider\build_
       VS2015_64\lang\libsclang.vcxproj]

@mossheim
Copy link
Contributor Author

Oops! I know what I did. Funny that VS 2013's static analysis (or compiler) didn't catch that.

@bagong
Copy link
Contributor

bagong commented May 15, 2017

Yea, I'm building with VS2015 and hope to switch to 2017 when Qt 5.9 is out, that will allow to use vcpkg, and has inbuilt cmake support.

@mossheim
Copy link
Contributor Author

@bagong It should build now, but I haven't confirmed on Windows. Also removed all the old standalone code, that felt good :)

@bagong
Copy link
Contributor

bagong commented May 15, 2017

Builds and works, and Help-generation is back again. This is starting to feel exciting! I don't have time to inspect further during the next hours, hopefully tonight!!
The subjective speed-gain feels more 10x than 1.5-2 times! ;)
This is getting ready for others to try!
If you want to add to your well-being, you may also consider platform/mac/Standalone Resources for removal ;)
Best and thanks!

But wait: the help system doesn't work yet, I was mislead because the "front-page" now comes up. But the links (except to undocumented classes) are still all broken... Will try to give more exact observations tonight.

@mossheim
Copy link
Contributor Author

But the links (except to undocumented classes) are still all broken...

I haven't gone down that route yet, but I was vaguely aware that would need fixing too. Thanks for the heads up :)

@mossheim
Copy link
Contributor Author

Thanks to TravisCI, I've been fixing a few build errors. The issue @bagong mentioned above with the help system will take some analysis. From what I've seen, the help browser isn't thinking about opening files; it looks to me like the datapath abstracts everything as URIs, which is a problem because I don't want to accidentally convert all URIs to a different encoding; http:// URLs can remain unchanged. So I think at some point along the datapath I'll have to insert a check to convert URIs starting with file:///... this code is new to me so it's taking some time to understand.

@mossheim
Copy link
Contributor Author

Hm, one of the problems is that QWebView functions are being called almost directly. I think the simplest solution may be to add some SuperCollider primitives to convert strings between UTF-8 and UTF-16, and call only when url scheme is "file://" AND OS is windows but I'm not 100% sure about this. I don't know where else the conversion would take place. I'll take another look at it when I'm less tired.

@mossheim
Copy link
Contributor Author

OK. I thought about it a bit and I can see two ways of doing this:

  1. Add a utf8-to-16 primitive to the SuperCollider language and use it along with platform switching to make sure the right strings are sent to the help browser. This would probably imply adding a utf16-to-8 primitive as well, for completeness. They wouldn't be much help outside this one situation for most users, but I can imagine that there may be future needs for it.

  2. Add new IDE/GUI code for handling HelpBrowser load requests that are made using file:// URLs, which would transcode from (assumed) UTF-8 to the proper format and then call QWebView.load. I think I prefer this approach because it keeps details at a lower level, as close to the call site as possible.

I'll give 2) a try and see how it feels. If I run into complications or it starts getting messy, I'll try the other approach.

@mossheim
Copy link
Contributor Author

Actually, this is (fingers crossed) turning out to be easier than I expected, and I can do things even closer to the filesystem call than I thought I could. The last commit (6fa61fb) shows where I think those conversions should go.

@mossheim
Copy link
Contributor Author

Ok, now globbing and file opening are fixed, and it seems like links are working in the help browser again. This is almost ready for testing @bagong, I just need to add some conversions for sound file primitives and those two other spots I noted above.

@mossheim
Copy link
Contributor Author

Oh, beautiful! I actually didn't need to fix those spots in the IDE code after all, they work already. @bagong (and @llloret ?) we can start testing now! As things settle I'll remove the extra debugging code and also tackle the sound-file primitives, but the main issues are now out of the way AFAICT!

@mossheim
Copy link
Contributor Author

mossheim commented May 27, 2017

A full list of primitives that need to be tested:

File.delete
File.mtime
File.exists
File.realpath
File.mkdir
File.copy
File.type
File.fileSize
File.getcwd
File.open
Pipe.open
SoundFile.openRead
SoundFile.openWrite
// the remainder of File/Pipe/SoundFile primitives operate on existing *FILE objects and so don't need to be tested

Platform.userHomeDir
Platform.systemAppSupportDir
Platform.userAppSupportDir
Platform.systemExtensionDir
Platform.userExtensionDir
Platform.userConfigDir
Platform.resourceDir
Platform.myDocumentsDir
Platform.ideName

LanguageConfig.getIncludePaths
LanguageConfig.getExcludePaths
LanguageConfig.currentPath
LanguageConfig.store
LanguageConfig.postInlineWarnings (setter and getter)

~string.pathMatch
~string.parseYAMLFile
~string.standardizePath

Sorry, I know that's a lot, but each should really only take one or two tests with a string like "我.txt".

@mossheim
Copy link
Contributor Author

wow, libsndfile has a Windows open method, great: http://www.mega-nerd.com/libsndfile/api.html#open

@bagong
Copy link
Contributor

bagong commented May 27, 2017

Great @brianlheim , I hope I can start tomorrow afternoon!

@mossheim
Copy link
Contributor Author

@bagong Awesome! I just finished cleaning up a bit and adding encoding conversions to the SoundFile primitives. I'll try to put together a comprehensive script we can use to test this; I've already made a start on it for my own purposes. The majority of it can be done without creating a test user account, but some of it will require that as well.

For some functions like pathMatch and standardizePath, we should compare behavior between old and new as those were completely rewritten. I'll note that as well in the script.

@snappizz asked what to test on Linux vs macOS. I think the only thing necessary to test is the default directories given by Platform primitives. Everything else is divided along POSIX/Windows lines AFAIK.

On macOS specifically, we will need to make sure that behavior has been preserved re: aliases, specifically with class library compilation.

@mossheim
Copy link
Contributor Author

Almost complete test script:

https://gist.github.com/brianlheim/7735a484e54b59dce5dd524651586d2a

@mossheim
Copy link
Contributor Author

I've run my test script on Windows and there appear to be no problems. All file primitives work as expected. I'm going to create a non-ASCII user next and test with that; hopefully I can at least get it off the ground.

@bagong
Copy link
Contributor

bagong commented May 29, 2017

Initial feedback, hopefully more tomorrow. Looks good! Help etc, are back. So I thought I'd give it a challenge and created an account for my friend ابو عزيز. Results are mixed. On first start help is generated but not displayed. On second startup it displays, and the links work. Yey! So writing and reading rendered helpfiles from userAppSupportDir seems to work.

Help-sources residing in the extensions folder are a different story, don't work yet (undocumented class is displayed rather than the respective help-file).

Maybe a few screen shots explain better:

screen shot 2017-05-29 at 17 18 14

screen shot 2017-05-29 at 18 26 51

screen shot 2017-05-29 at 18 35 08

Plugins in Extensions load fine. Not sure if they did before.

PS: unrelated but worth a mention: Quarks don't work in this account (I included that other PR in the build):
ERROR: Failed to read quarks directory listing: https://github.com/supercollider-quarks/quarks.git a PrimitiveFailedError

mossheim added 13 commits July 24, 2017 17:11
Change to a single if statement, make comment into a doxygen comment
For a reference to cerr
Remove todos based on obsolete standalone logic
Previous bad commit made files appear to be directories when they weren't.
Don't print from within SC_Filesystem::resolveIfAlias

Fix printing in pyrlexer to use pre-resolution path
Only print an error message. This allows the help system to run even if there's a bad symlink
@mossheim
Copy link
Contributor Author

mossheim commented Jul 24, 2017

just rebased on master. ( @patrickdupuis )

@patrickdupuis
Copy link
Contributor

When through the tests quickly. Everything seemed fine except for the symlink part. I'll run them again when I am a bit more rested and able to concentrate.

@mossheim
Copy link
Contributor Author

@patrickdupuis thanks—i have updated the test script with a note about the known change in the behavior of compilation with symlinks.

@patrickdupuis
Copy link
Contributor

Thanks for the note addition. That error message was the only issue I noticed the first time. Everything looks fine on a second go around 👍

@mossheim
Copy link
Contributor Author

@patrickdupuis Great! Thanks for looking at it again.

So, does anyone oppose merging this now? I would love to get started on a 3.9 release. I think any deeper/subtler bugs that I've introduced will be more efficiently found in beta testing or everyday use, and I can promise "long-term support" for the changes here.

The caveat is that the only Windows toolchain with guaranteed support after this will be VS 2013 / msvc 12. I know that is a bit of a let-down, but unfortunately nobody else is developing using those toolchains, so we have no dev power going to maintenance there. And of course, that locks out supernova and readline support as well for a Windows package.

(If necessary, we can break this topic out into a separate issue. This PR already has way too comments even though they're all on-topic.)

I've been very happy developing this on Windows in VS, and I would love to tackle some of these other support issues after this. It seems like they are pretty manageable once we have moved past Qt 5.5 and can bring in VS 2017. Once this merges, I intend to put time into @scztt's webengine branch and get us on Qt 5.9 ASAP. I tried it out last weekend and got it building in VS 2017 with one minor fix.

I got a slow start on Windows dev because I felt obliged to learn 3 toolchains and an OS at the same time; the documentation is quite messy and confusing, somewhat necessarily, because no toolchain has been deemed "official" and each has significant versioning and capability issues. Windows is not a special case; we could try to add support for the Intel and IBM compilers or gcc on macOS, but Xcode/clang is what we choose to promote because it works and it's generally more comfortable (in my limited experience) to develop with that toolchain on macOS. If we could commit to one toolchain and advertise that, I think that would help with future contributions, and it would make me much more confident that I'm spending my time fruitfully.

@telephon
Copy link
Member

this sounds very reasonable. Thanks a lot!

@nhthn
Copy link
Contributor

nhthn commented Jul 30, 2017

i will be looking over this today.

@mossheim
Copy link
Contributor Author

mossheim commented Aug 4, 2017

ping @snappizz

@nhthn nhthn merged commit b1fd73c into supercollider:master Aug 11, 2017
@mossheim mossheim deleted the bfs/sc-filesystem-working-on-windows branch August 11, 2017 17:57
@mossheim
Copy link
Contributor Author

🎉 thanks again to everyone for their help with this

@telephon
Copy link
Member

thanks!!

@telephon
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: help schelp documentation comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants