Reduce pattern complexity #115

Merged
merged 2 commits into from
Dec 23, 2020
Allow Java based browsers
omrilotan committed Dec 21, 2020
commit 421eb2a941c13baa02e7f14b35e635b4b907817c
3 changes: 3 additions & 0 deletions index.js
@@ -74,6 +74,9 @@ try {
// Addresses: libhttp browser
list.splice(list.lastIndexOf('http'), 1)
list.push('(?<!(lib))http')
// Addresses: java based browsers
list.splice(list.lastIndexOf('java'), 1)
list.push('java(?!;)')
} catch (error) {
// ignore errors
}
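
The two replacements in this hunk rely on regular-expression lookarounds: `(?<!(lib))http` skips `http` when it is preceded by `lib`, and `java(?!;)` skips `java` when it is followed by `;`. Below is a minimal sketch of the effect. It assumes the patterns are later joined with `|` into one case-insensitive expression, which this hunk does not show. The starting list, the `replaceEntry` helper, the `pattern` variable, and the short sample strings are all hypothetical; only the UC Browser string is taken from the fixtures in this PR.

```javascript
// Hypothetical starting list; the real entries presumably come from list.json.
const list = ['http', 'java', 'wappalyzer']

// Hypothetical helper: swap a plain entry for a stricter pattern.
// The index check matters because lastIndexOf returns -1 for a missing
// entry, and splice(-1, 1) would then remove the last element instead.
function replaceEntry (entries, from, to) {
  const index = entries.lastIndexOf(from)
  if (index !== -1) entries.splice(index, 1)
  entries.push(to)
}

replaceEntry(list, 'http', '(?<!(lib))http')
replaceEntry(list, 'java', 'java(?!;)')

// Assumption: entries are combined with "|" and matched case-insensitively.
const pattern = new RegExp(list.join('|'), 'i')

// Fixture from this PR: "Java;" is followed by ";", so it is not flagged.
const ucBrowser = 'UCWEB/2.0 (Java; U; MIDP-2.0; en-US; NokiaX2-05) U2/1.0.0 UCBrowser/9.5.0.449 U2/1.0.0 Mobile'
// Illustrative strings, not taken from the fixtures.
const javaClient = 'Java/1.8.0_151' // "Java" followed by "/": flagged
const libhttp = 'libhttp/1.0' // "http" preceded by "lib": not flagged
const httpClient = 'Apache-HttpClient/4.5' // "Http" not preceded by "lib": flagged

for (const ua of [ucBrowser, javaClient, libhttp, httpClient]) {
  console.log(pattern.test(ua), ua)
}
```

Lookbehind assertions are not available in every JavaScript engine, so constructing such an expression can throw a `SyntaxError` on older runtimes.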
1 change: 1 addition & 0 deletions list.json
@@ -233,6 +233,7 @@
"twingly recon",
"url",
"valid",
"wapchoi/",
"wappalyzer",
"webglance",
"webkit2png",
2 changes: 2 additions & 0 deletions tests/fixtures/manual-legit-browsers.yml
@@ -391,6 +391,8 @@ UC:
- Mozilla/5.0 (X11; U; Linux i686; en-US) U2/1.0.0 UCBrowser/9.3.1.344
- Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_2 like Mac OS X; zh-CN) AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/14F89 UCBrowser/11.5.9.992 Mobile AliApp(TUnionSDK/0.1.20)
- Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X; zh-CN) AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/16B92 UCBrowser/12.1.7.1109 Mobile AliApp(TUnionSDK/0.1.20.3)
- Nokia200/2.0 (11.81) Profile/MIDP-2.1 Configuration/CLDC-1.1 UCWEB/2.0(Java; U; MIDP-2.0; en-us; nokia200) U2/1.0.0 UCBrowser/8.7.1.234 U2/1.0.0 Mobile
- NokiaC2-00/2.0 (03.45) Profile/MIDP-2.1 Configuration/CLDC-1.1 Mozilla/5.0 (Java; U; kau; nokiac2-00) UCBrowser8.3.0.154/70/352/UCWEB Mobile
- UCWEB/2.0 (Symbian; U; S60 V5; en-US; Nokia5250) U2/1.0.0 UCBrowser/8.9.0.277 U2/1.0.0 Mobile
Viber:
- Mozilla/5.0 (Linux; Android 7.1.2; G011A Build/N2G48H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/66.0.3359.158 Safari/537.36 Viber/11.9.5.8
1 change: 1 addition & 0 deletions tests/fixtures/user-agents.net-bots-ignore-list.txt
@@ -7,6 +7,7 @@ Mozilla/5.0 (Windows; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Windows; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Windows; rv:65.0) Gecko/20100101 Firefox/65.0
Mozilla/5.0 (Windows; rv:81.0) Gecko/20100101 Firefox/81.0
NokiaC3-00/5.0 (08.65) Profile/MIDP-2.1 Configuration/CLDC-1.1 Mozilla/5.0 (Java; U; en-us; nokiac3-00) UCBrowser8.3.0.154/69/444/UCWEB Mobile UNTRUSTED/1.0
NokiaX2-05/2.0 (08.30) Profile/MIDP-2.1 Configuration/CLDC-1.1 UCWEB/2.0 (Java; U; MIDP-2.0; en-US; NokiaX2-05) U2/1.0.0 UCBrowser/9.5.0.449 U2/1.0.0 Mobile UNTRUSTED/1.0
SonyEricssonJ20i/R7CA Profile/MIDP-2.1 Configuration/CLDC-1.1 UNTRUSTED/1.0 UCWEB/2.0 (Java; U; MIDP-2.0; ru; SonyEricssonJ20i) U2/1.0.0 UCBrowser/9.5.0.449 U2/1.0.0 Mobile
windows 7 pro 64 bit, opera stable software browser, active x controls, java updater , java script
13 changes: 12 additions & 1 deletion tests/helpers/index.js
@@ -24,14 +24,25 @@ const ignoreList = read(botsIgnoreList)
line => !line.startsWith('#')
)

/**
* For some reason, user-agents.net lists all UCWEB user agents as bots
* @type {RegExp}
*/
const USERAGENT_NET_CRAWLER_EXCLUDE_PATTERN = new RegExp([
'ucmini',
'NokiaC3-00\\/5\\.0 \\(\\d+\\.\\d+\\) Profile\\/MIDP-2\\.1 Configuration\\/CLDC-1\\.1 UCWEB\\/2\\.0 \\(Java; U; MIDP-2\\.0;'
].join('|'), 'i')

/**
* List of known crawlers
* @type {string[]}
*/
module.exports.crawlers = [

// Read from text file
- ...read(crawlerUserAgentsText).trim().split('\n'),
+ ...read(crawlerUserAgentsText).trim().split('\n').filter(
+ line => !USERAGENT_NET_CRAWLER_EXCLUDE_PATTERN.test(line)
+ ),

// Read from a different text file
...read(
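
The change to `tests/helpers/index.js` drops any line of the crawler list that matches `USERAGENT_NET_CRAWLER_EXCLUDE_PATTERN` before the list is exported. A rough sketch of that filtering step follows. The `text` variable is a hypothetical stand-in for the output of `read(crawlerUserAgentsText)`, and its three sample lines are illustrative rather than copied from the real file.

```javascript
// Same exclusion pattern as in tests/helpers/index.js.
const USERAGENT_NET_CRAWLER_EXCLUDE_PATTERN = new RegExp([
  'ucmini',
  'NokiaC3-00\\/5\\.0 \\(\\d+\\.\\d+\\) Profile\\/MIDP-2\\.1 Configuration\\/CLDC-1\\.1 UCWEB\\/2\\.0 \\(Java; U; MIDP-2\\.0;'
].join('|'), 'i')

// Hypothetical file content: one crawler and two UC Browser variants.
const text = [
  'Googlebot/2.1 (+http://www.google.com/bot.html)',
  'NokiaC3-00/5.0 (08.65) Profile/MIDP-2.1 Configuration/CLDC-1.1 UCWEB/2.0 (Java; U; MIDP-2.0; en-US; NokiaC3-00) U2/1.0.0 UCBrowser/9.5.0.449 U2/1.0.0 Mobile',
  'Mozilla/5.0 (Linux; U; Android 4.4.2; en-US) UCMini/10.7.9 Mobile'
].join('\n') + '\n'

// Split into lines and keep only those the exclusion pattern does not match.
const crawlers = text.trim().split('\n').filter(
  line => !USERAGENT_NET_CRAWLER_EXCLUDE_PATTERN.test(line)
)

console.log(crawlers)
```

Because the expression is built without the `g` flag, `test` keeps no state between calls, so reusing one `RegExp` instance inside `filter` is safe.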