Version 422
hydrusnetwork committed Dec 16, 2020
1 parent c3100f7 commit ff51cf4
Showing 45 changed files with 1,840 additions and 617 deletions.
26 changes: 0 additions & 26 deletions bin/upnpc license.txt

This file was deleted.

Binary file removed bin/upnpc_linux
Binary file not shown.
Binary file removed bin/upnpc_osx
Binary file not shown.
5 changes: 5 additions & 0 deletions bin/upnpc_readme.txt
@@ -0,0 +1,5 @@
UPnPc is a program that can talk to your internet router to perform UPnP operations. Hydrus uses it to fetch and manage UPnP NAT port forwards when you open _network->data->manage upnp_, and to keep ports forwarded when you set a server service or the Client API to stay open. It also fetches your external IP for some related 'figure out the external URL for this service' operations. Unless you do some UPnP stuff, hydrus does not touch it.

I used to bundle UPnPc here for all builds, but it threw anti-virus false positives every few months, so it is no longer included. If you are on Linux, you may already have it installed to your system.

If you need it, you can fetch it at http://miniupnp.tuxfamily.org/ (if you are on Linux, you can probably also get it with your package manager). Place the 'upnpc-static' executable in this directory, or install it to your system PATH as 'miniupnpc', and hydrus will be able to do UPnP things.
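
As a rough sketch of the lookup described above (not hydrus's actual code; the filenames are just the two named in this readme), a client can check this directory first and then fall back to the system PATH:

    import os
    import shutil
    
    def find_upnpc( bin_dir ):
        
        # prefer an 'upnpc-static' executable placed in the bin directory
        local_path = os.path.join( bin_dir, 'upnpc-static' )
        
        if os.path.isfile( local_path ) and os.access( local_path, os.X_OK ):
            
            return local_path
            
        
        # otherwise look for 'miniupnpc' on the system PATH
        return shutil.which( 'miniupnpc' )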
Binary file removed bin/upnpc_win32.exe
Binary file not shown.
5 changes: 4 additions & 1 deletion client.py
@@ -31,8 +31,9 @@
 argparser.add_argument( '-d', '--db_dir', help = 'set an external db location' )
 argparser.add_argument( '--temp_dir', help = 'override the program\'s temporary directory' )
 argparser.add_argument( '--db_journal_mode', default = 'WAL', choices = [ 'WAL', 'TRUNCATE', 'PERSIST', 'MEMORY' ], help = 'change db journal mode (default=WAL)' )
-argparser.add_argument( '--db_synchronous_override', choices = range(4), help = 'override SQLite Synchronous PRAGMA (default=2)' )
+argparser.add_argument( '--db_synchronous_override', type = int, choices = range(4), help = 'override SQLite Synchronous PRAGMA (default=2)' )
 argparser.add_argument( '--no_db_temp_files', action='store_true', help = 'run db temp operations entirely in memory' )
+argparser.add_argument( '--boot_debug', action='store_true', help = 'print additional bootup information to the log' )
 argparser.add_argument( '--no_daemons', action='store_true', help = 'run without background daemons' )
 argparser.add_argument( '--no_wal', action='store_true', help = 'OBSOLETE: run using TRUNCATE db journaling' )
 argparser.add_argument( '--db_memory_journaling', action='store_true', help = 'OBSOLETE: run using MEMORY db journaling (DANGEROUS)' )
@@ -105,6 +106,8 @@
 
 HG.no_db_temp_files = result.no_db_temp_files
 
+HG.boot_debug = result.boot_debug
+
 if result.temp_dir is not None:
 
     HydrusPaths.SetEnvTempDir( result.temp_dir )
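
The one-word fix in the first hunk above is worth spelling out: argparse passes arguments through as strings unless a type is given, so the raw string '2' could never match the integers in choices = range(4) and every value was rejected as an invalid choice. A minimal standalone sketch of the corrected behaviour:

    import argparse
    
    argparser = argparse.ArgumentParser()
    
    # without type = int, argparse compares the raw string '2' against the
    # integers 0-3 and rejects it as an invalid choice; type = int converts
    # the value before the choices check runs
    argparser.add_argument( '--db_synchronous_override', type = int, choices = range(4) )
    
    result = argparser.parse_args( [ '--db_synchronous_override', '2' ] )
    
    print( result.db_synchronous_override ) # 2

client.pyw receives the identical fix below.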
5 changes: 4 additions & 1 deletion client.pyw
@@ -31,8 +31,9 @@ try:
 argparser.add_argument( '-d', '--db_dir', help = 'set an external db location' )
 argparser.add_argument( '--temp_dir', help = 'override the program\'s temporary directory' )
 argparser.add_argument( '--db_journal_mode', default = 'WAL', choices = [ 'WAL', 'TRUNCATE', 'PERSIST', 'MEMORY' ], help = 'change db journal mode (default=WAL)' )
-argparser.add_argument( '--db_synchronous_override', choices = range(4), help = 'override SQLite Synchronous PRAGMA (default=2)' )
+argparser.add_argument( '--db_synchronous_override', type = int, choices = range(4), help = 'override SQLite Synchronous PRAGMA (default=2)' )
 argparser.add_argument( '--no_db_temp_files', action='store_true', help = 'run db temp operations entirely in memory' )
+argparser.add_argument( '--boot_debug', action='store_true', help = 'print additional bootup information to the log' )
 argparser.add_argument( '--no_daemons', action='store_true', help = 'run without background daemons' )
 argparser.add_argument( '--no_wal', action='store_true', help = 'OBSOLETE: run using TRUNCATE db journaling' )
 argparser.add_argument( '--db_memory_journaling', action='store_true', help = 'OBSOLETE: run using MEMORY db journaling (DANGEROUS)' )
@@ -105,6 +106,8 @@ try:
 
 HG.no_db_temp_files = result.no_db_temp_files
 
+HG.boot_debug = result.boot_debug
+
 if result.temp_dir is not None:
 
     HydrusPaths.SetEnvTempDir( result.temp_dir )
41 changes: 40 additions & 1 deletion help/changelog.html
@@ -8,6 +8,45 @@
<div class="content">
<h3>changelog</h3>
<ul>
<li><h3>version 422</h3></li>
<ul>
<li>advanced tags:</li>
<li>fixed the search code for various 'total' autocomplete searches like '*' and 'namespace:*', which were broken around v419 when regular tag lookups were optimised. these search types also get a round of their own search optimisations and improved cancel latency. I am sorry for the trouble here</li>
<li>expanded the database autocomplete fetch unit tests to handle these total lookups so I do not accidentally kill them due to typo/ignorance again</li>
<li>updated the autocomplete result cache object to consult a search's advanced search options (as under _tags->manage tag display and search_) to test whether a search cache for 'char' or 'character:' is able to serve results for a later 'character:samus' input</li>
<li>optimised file and tag search code for cases where someone might somehow sneak an unoptimised raw '*:subtag' or 'namespace:*' search text in</li>
<li>updated and expanded the autocomplete result cache unit tests to handle the new tested options and the various 'total' tests, so they aren't disabled by accident again</li>
<li>cancelling an autocomplete query with a gigantic number of results should now take effect much quicker when you have a lot of siblings</li>
<li>the single-tag right-click menu now shows siblings and parents info for every service, and will work on taglists in the 'all known tags' domain. clicking on any item will copy it to clipboard. this might result in megatall submenus, but we'll see. tall seems easier to use than nested per-service for now</li>
<li>the more primitive 'siblings' submenu on the taglist 'copy' right-click menu is now removed</li>
<li>right-click should no longer raise an error on esoteric taglists (such as tag filters and namespace colours). you might get some funky copy strings, which is sort of fun too</li>
<li>the copy string for the special namespace predicate ('namespace:*anything*') is now 'namespace:*', making it easier to copy/paste this across pages</li>
<li>.</li>
<li>misc:</li>
<li>the thumbnail right-click 'copy/open known urls by url class' commands now exclude those urls that match a more specific url class (e.g. /post/123456 vs /post/123456/image.jpg)</li>
<li>miniupnpc is no longer bundled in the official builds. this executable is only used by a few advanced users and was a regular cause of anti-virus false positives, so I have decided new users will have to install it manually going forward.</li>
<li>the client now looks for miniupnpc in more places, including the system path. when missing, its error popups have better explanation, pointing users to a new readme in the bin directory</li>
<li>UPnP errors now have more explanation for 'No IGD UPnP Device' errortext</li>
<li>the database's boot-repair function now ensures indices are created for: non-sha256 hashes, sibling and parent lookups, storage tag cache, and display tag cache. some users may be missing indices here for unknown update logic or hard drive damage reasons, and this should speed them right back up. the boot-repair function now broadcasts 'checking database for faults' to the splash, which you will see if it needs some time to work</li>
<li>the duplicates page once again correctly updates the potential pairs count in the 'filter' tab when potential search finishes or filtering finishes</li>
<li>added the --boot_debug launch switch, which for now prints additional splash screen texts to the log</li>
<li>the global pixmaps object is no longer initialised in client model boot, but now on first request</li>
<li>fixed type of --db_synchronous_override launch parameter, which was throwing type errors</li>
<li>updated the client file readwrite lock logic and brushed up its unit tests</li>
<li>improved the error when the client database is asked for the id of an invalid tag that collapses to zero characters</li>
<li>the qss stylesheet directory is now mapped to the static dir in a way that will follow static directory redirects</li>
<li>.</li>
<li>downloaders and parsing (advanced):</li>
<li>started on better network redirection tech. if a post or gallery URL is 3XX redirected, hydrus now recognises this, and if the redirected url is the same type and parseable, the new url and parser are swapped in. if a gallery url is redirected to a non-gallery url, it will create a new file import object for that URL and say so in its gallery log note. this tentatively solves the 'booru redirects one-file gallery pages to post url' problem, but the whole thing is held together by prayer. I now have a plan to rejigger my pipelines to deal with this situation better; ultimately I will likely expose and log all redirects so we can always see better what is going on behind the scenes (a sketch of the basic redirect check follows this list)</li>
<li>added 'unicode escape characters' and 'html entities' string converter encode/decode types. the former does '\u0394'-to-'Δ', and the latter does '&amp;'-to-'&' (a decode/encode sketch follows this list)</li>
<li>improved my string converter unit tests and added the above to them</li>
<li>in the parsing system, decoding from 'hex' or 'base64' is no longer needed for a 'file hash' content type. these string conversions are now no-ops and can be deleted. they converted to a non-string type, an artifact of the old way python 2 used to handle unicode, and were a sore thumb for a long time in the python 3 parsing system. 'file hash' content types now have a 'hex'/'base64' dropdown, and do decoding to raw bytes at a layer above string parsing. on update, existing file hash content parsers will default to hex and attempt to figure out if they were base64 (however if the hex fails, base64 will be attempted as well anyway, so it is not critically important here if this update detection is imperfect). the 'hex' and 'base64' _encode_ types remain as they are still used in file lookup script hash initialisation, but they will likely be replaced similarly in future. hex or base64 conversion will return in a purely string-based form as technically needed in future (a decoding sketch follows this list)</li>
<li>updated the make-a-downloader help and some screenshots regarding the new hash decoding</li>
<li>when the json parsing formula is told to get the 'json' of a parsed node, this no longer encodes unicode with escape characters (\u0394 etc...)</li>
<li>duplicating or importing nested gallery url generators now refreshes all internal reference ids, which should reduce the likelihood of accidentally linking with related but differently named existing GUGs</li>
<li>importing GUGs or NGUGs through Lain easy import does the same, ensuring the new objects 'seem' fresh to a client and should not incorrectly link up with renamed versions of related NGUGs or GUGs</li>
<li>added unit tests for hex and base64 string converter encoding</li>
</ul>
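
On the redirect tech item above: the core 'catch a 3XX without following it and inspect the Location header' step looks something like the following sketch. It uses the requests library and stubs out all the hydrus-specific url-class logic, so treat it as an illustration of the idea rather than the client's actual pipeline:

    import requests
    from urllib.parse import urljoin
    
    REDIRECT_CODES = ( 301, 302, 303, 307, 308 )
    
    def get_redirected_url( url ):
        
        # fetch without following redirects so the response can be inspected first
        response = requests.get( url, allow_redirects = False )
        
        if response.status_code in REDIRECT_CODES and 'Location' in response.headers:
            
            # the Location header may be relative, so resolve it against the original url
            return urljoin( url, response.headers[ 'Location' ] )
            
        
        # not a redirect
        return None

In the client proper, the swapped-in url and parser only replace the originals when the redirected url matches a url class of the same type, as the changelog item says.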
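The two new string converter types map directly onto standard library behaviour; a minimal sketch of both directions (the names here are just the menu labels quoted in the changelog, not hydrus's internal identifiers):

    import codecs
    import html
    
    # 'unicode escape characters': '\u0394' <-> 'Δ'
    print( codecs.decode( '\\u0394', 'unicode_escape' ) ) # Δ
    print( 'Δ'.encode( 'unicode_escape' ).decode( 'ascii' ) ) # \u0394
    
    # 'html entities': '&amp;' <-> '&'
    print( html.unescape( '&amp;' ) ) # &
    print( html.escape( '&' ) ) # &amp;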
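And for the file hash decoding item, the hex-first, base64-fallback behaviour it describes is simple to express. A sketch with a function name of my own invention, using the example hash strings from this commit's updated help page:

    import base64
    
    def parse_file_hash( hash_text, encoding = 'hex' ):
        
        # 'file hash' content parsers now decode to raw bytes above the string layer
        if encoding == 'hex':
            
            try:
                
                return bytes.fromhex( hash_text )
                
            except ValueError:
                
                # per the update logic, if hex fails, base64 is attempted anyway
                pass
                
            
        
        return base64.b64decode( hash_text )
    
    print( parse_file_hash( 'e5af57a687f089894f5ecede50049458' ).hex() )
    print( parse_file_hash( '5a9XpofwiYlPXs7eUASUWA==', encoding = 'base64' ).hex() )

Both calls print the same hex digest, which is the point: whichever encoding the source provides, the parser ends up with the same raw bytes.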
<li><h3>version 421</h3></li>
<ul>
<li>misc:</li>
@@ -32,7 +71,7 @@ <h3>changelog</h3>
<li>misc cleanup for duplicates page</li>
<li>.</li>
<li>database modes:</li>
-<li>a new 'program launch arguments' help page now talks about all the available command line switches, here: https://github.com/hydrusnetwork/hydrus/blob/master/help/launch_arguments.html</li>
+<li>a new 'program launch arguments' help page now talks about all the available command line switches, here: https://hydrusnetwork.github.io/hydrus/help/launch_arguments.html</li>
<li>added the '--db_journal_mode' launch switch to set the SQLite journal mode. default is WAL, permitted values are also TRUNCATE, PERSIST, and MEMORY</li>
<li>ensured --db_synchronous_override was hooked up correctly</li>
<li>the old disk cache options under _speed and memory_ are removed, along with various deprecated disk cache load calls and code</li>
6 changes: 3 additions & 3 deletions help/downloader_parsers_content_parsers.html
@@ -41,10 +41,10 @@ <h3>tags</h3>
</li>
<li>
<h3>file hash</h3>
-<p>This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. Furthermore, if it does find the file for this URL but has never seen the URL before, it will still associate it with that file's 'known urls' as if it <i>had</i> downloaded it!</p>
+<p>This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. When this happens, it will still add tags and associate the file with the URL for its 'known urls' just as if it <i>had</i> downloaded it!</p>
<p>If you understand this concept, it is great to include. It saves time and bandwidth for everyone. Many site APIs include a hash for this exact reason--they want you to be able to skip a needless download just as much as you do.</p>
<p><img src="edit_content_parser_panel_hash.png" /></p>
-<p>The usual suite of hash types are supported: MD5, SHA1, SHA256, and SHA512. <b>This expects the hash as raw bytes</b>, so if your source provides it as hex or base64 (as above), make sure to decode it! In the area for test results, it will present the hash in hex for your convenience.</p>
+<p>The usual suite of hash types is supported: MD5, SHA1, SHA256, and SHA512. An old version of this required some weird string decoding, but this is no longer true. Select 'hex' or 'base64' from the encoding type dropdown, and then just parse the 'e5af57a687f089894f5ecede50049458' or '5a9XpofwiYlPXs7eUASUWA==' text, and hydrus should handle the rest. It will present the parsed hash in hex.</p>
</li>
<li>
<h3>timestamp</h3>
@@ -64,4 +64,4 @@ <h3>veto</h3>
</ul>
</div>
</body>
-</html>
+</html>
6 changes: 3 additions & 3 deletions help/downloader_parsers_full_example_file_page.html
@@ -43,9 +43,9 @@ <h3>tags</h3>
<p>Skipping ?/-/+ characters can be a pain if you are lacking a nice tag-text class, in which case you can add a regex String Match to the HTML formula (as I do here, since Gelb offers '?' links for tag definitions) like [^\?\-+\s], which means "the text includes something other than just '?' or '-' or '+' or whitespace".</p>
<h3>md5 hash</h3>
<p>If you look at the Gelbooru File URL, <a href="https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg"><b>https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg</b></a>, you may notice the filename is all hexadecimal. It looks like they store their files under a two-deep folder structure, using the first four characters--386e here--as the key. It sure looks like '386e12e33726425dbd637e134c4c09b5' is not random ephemeral garbage!</p>
-<p>In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a &lt;meta&gt; tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links:</p>
+<p>In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a &lt;meta&gt; tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links. This screenshot is from the more recent version of hydrus, which has the more powerful 'string processing' system for string transformations. It has an intimidating number of nested dialogs, but we can stay simple for now, with only the one regex substitution step inside a string 'converter':</p>
<p><img src="downloader_post_example_md5.png" /></p>
-<p>Here we are using the same property="og:image" rule to fetch the File URL, and then we are regexing the hex hash with .*([0-9a-f]{32}).* (MD5s are 32 hex characters) and decoding from hex to present the Content Parser with raw bytes (Hydrus handles hashes as bytes, not hex--although you'll see in the Content Parser test page it presents the hash neatly in English: "md5 hash: 386e12e33726425dbd637e134c4c09b5").</p>
+<p>Here we are using the same property="og:image" rule to fetch the File URL, and then we are regexing the hex hash with .*([0-9a-f]{32}).* (MD5s are 32 hex characters). We select 'hex' as the encoding type. Hashes require a tiny bit more data handling behind the scenes, but in the Content Parser test page it presents the hash again neatly in English ("md5 hash: 386e12e33726425dbd637e134c4c09b5"), meaning everything parsed correctly. It presents the hash in hex even if you select the encoding type as base64. (A quick regex sketch follows this file's diff.)</p>
<p>If you think you have found a hash string, you should obviously test your theory! The site might not be using the actual MD5 of file bytes, as hydrus does, but instead some proprietary scheme. Download the file and run it through a program like HxD (or hydrus!) to figure out its hashes, and then search the View Source for those hex strings--you might be surprised!</p>
<p>Finding the hash is hugely beneficial for a parser--it lets hydrus skip downloading files without ever having seen them before!</p>
<h3>source time</h3>
@@ -67,4 +67,4 @@ <h3>summary</h3>
<p>This is overall a decent parser. Some parts of it may fail when Gelbooru update to their next version, but that can be true of even very good parsers with multiple redundancy. For now, hydrus can use this to quickly and efficiently pull content from anything running Gelbooru 0.2.5., and the effort spent now can save millions of combined <i>right-click->save as</i> and manual tag copies in future. If you make something like this and share it about, you'll be doing a good service for those who could never figure it out.</p>
</div>
</body>
-</html>
+</html>
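
The regex step this help page describes is easy to sanity-check outside the client; a quick sketch using the URL and pattern quoted above:

    import re
    
    url = 'https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg'
    
    # MD5s are 32 hex characters; the group captures just the hash
    match = re.match( r'.*([0-9a-f]{32}).*', url )
    
    print( match.group( 1 ) ) # 386e12e33726425dbd637e134c4c09b5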
Binary file modified help/downloader_post_example_md5.png
Binary file modified help/edit_content_parser_panel_hash.png
