⌨️ Wordlists, Dictionaries and Other Data Sets for Writing Software Security Test Cases

SpycioKon/werdlists

Twitter: @decalresponds | Ask Me Anything! | werdlists | Apache License 2.0 | repo-size | made-with-bash


werdlists: Word Lists, Attack Strings, Miscellaneous Datasets and a PoC Wiki for Penetration Testing


"Word Lists" for Software Security Test Cases

This project consists of word lists, dictionary files, attack strings, miscellaneous datasets and a PoC wiki for penetration testers.

Brief Introduction to werdlists ✂️

This project is a collection of word lists, most of them whitespace-delimited or line-based. Although the passes-dicts folder contains inputs for password cracking, the files amassed here are mainly intended to help drive a program into an insecure state (with the help of a black-box fuzzer or scanning utility, for example). The vast majority of files are plain ASCII with UNIX-style newlines.
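As an illustration of the two layouts described above, here is a minimal sketch of a loader for line-based and whitespace-delimited lists. The function name `load_wordlist` and its `line_based` flag are hypothetical and not part of this repository; the sketch also assumes ASCII content and replaces any stray non-ASCII bytes instead of failing.

```python
from pathlib import Path
from typing import List


def load_wordlist(path: str, line_based: bool = True) -> List[str]:
    """Return the entries of a word list file (hypothetical helper)."""
    # Assumption: the file is ASCII; unexpected bytes are replaced, not fatal.
    text = Path(path).read_text(encoding="ascii", errors="replace")
    if line_based:
        # One entry per line; blank lines are skipped.
        return [line for line in text.split("\n") if line]
    # Whitespace-delimited: any run of spaces, tabs or newlines separates entries.
    return text.split()
```

Line-based loading keeps entries that contain spaces intact, which matters for attack strings; the whitespace-delimited mode is only appropriate for lists of single tokens.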

Details on Selected Folder Samples 📚

Folder Name    Description of Contents
dns-hostnames 📃 The host name part of an FQDN (Fully Qualified Domain Name)
http-security HTTP (Hyper Text Transfer Protocol) security info, e.g. CSP (Content Security Policy)
unix-data 💻 Data associated with various flavors of the UNIX operating system and its clones
telco-data ☎️ PSTN (Public Switched Telephone Network) a.k.a. POTS (Plain Old Telephone Service) dialing codes and related information
webapp-paths Path names related to web-based applications

Inspiration Taken from Similar Projects 💭

If you're already familiar with established repositories such as fuzzdb and SecLists, werdlists is quite similar in mission: it is a centralized resource for attack strings and input data, with its own unique style, organization, original hand-crafted contents, verification/management scripts, expanded concepts, etc. SecLists is maintained by my colleague at IOActive, Daniel Miessler.

Description of the Repository Directory Hierarchy 🔩

The scripts folder consists of shell scripts used for repository maintenance. All scripts use bash syntax, and some data files were generated with a script. Folder names are outlined in the INDEX.md file in the repository's root directory. The files in each folder are detailed in that folder's local README.md file; unlike INDEX.md, these per-folder index files describe the contents of each data file rather than directory contents. Each folder name consists of a subject name and a storage type, separated by a dash.
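The subject/storage-type naming rule can be sketched as a small parser. The names `FolderName` and `parse_folder_name` are hypothetical, and the sketch assumes the first dash is the separator, which holds for the folder names listed in this README.

```python
from typing import NamedTuple


class FolderName(NamedTuple):
    subject: str       # e.g. "dns"
    storage_type: str  # e.g. "hostnames"


def parse_folder_name(name: str) -> FolderName:
    """Split a data folder name at its first dash (assumed separator)."""
    subject, dash, storage_type = name.partition("-")
    if not dash or not subject or not storage_type:
        # Folders such as "scripts" do not follow the subject-type scheme.
        raise ValueError(f"not a subject-type folder name: {name!r}")
    return FolderName(subject, storage_type)
```

For example, `parse_folder_name("dns-hostnames")` yields the subject `dns` and the storage type `hostnames`, while a name without a dash raises `ValueError`.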

Naming Scheme, Syntax and Meaning Associated With File Extensions 💬

Most files have the *.txt extension, signifying the text/plain MIME type. Other file extensions in use are *.asc, *.csv, *.xml, *.html, and *.yml. These are for Comma-Separated Values (text/csv), Extensible Markup Language (application/xml), Hyper Text Markup Language (text/html), etc. Any file larger than 1MB uncompressed should be compressed with xz according to the commands in the scripts/compress-large-files bash script. Even though this is a word lists project, I'm striving to restrict the size of each file to a healthy maximum for manageability. The index file in the root folder (INDEX.md), as well as the indices in each data directory (README.md), are formatted with GitHub Flavored Markdown.
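The 1MB rule can be illustrated with Python's standard `lzma` module. This is only a sketch and does not reproduce the commands in scripts/compress-large-files; the helper names are hypothetical, 1MB is assumed to mean 1024 * 1024 bytes, and the original file is left in place.

```python
import lzma
import shutil
from pathlib import Path
from typing import List

SIZE_LIMIT = 1024 * 1024  # assumption: "1MB" taken as 1 MiB


def find_large_files(root: str, limit: int = SIZE_LIMIT) -> List[Path]:
    """List uncompressed files under root that exceed the size limit."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix != ".xz" and p.stat().st_size > limit
    )


def compress_xz(path: Path) -> Path:
    """Write an .xz copy next to path and return the new file's path."""
    target = path.with_name(path.name + ".xz")
    # lzma.open writes the .xz container format by default.
    with open(path, "rb") as src, lzma.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

A compressed list can be read back with `lzma.open(path, "rt", encoding="ascii")`, so consumers do not need to unpack files to disk first.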


Index Describing Each Folder in the Project 📋

arpa-headers: 📧 Header fields transmitted over RFC2822 style protocols like SMTP
ascii-art: 🎨 "Low bit" a.k.a. 7-bit ASCII art items without control characters
biology-info: 🔬 Reference information useful in the study of biological issues
browser-data: 🚪 Data related to GUI browser software like Chrome, Firefox, etc.
cert-data: 📜 Information commonly utilized by cryptographic certificate materials
char-encodes: Various character encodings provided by different locales/charsets
chat-data: 😮 Additional data on IRC, XMPP and other such messaging protocols
cipher-data: Data denoting or used by cryptographic algorithm implementations
cmd-usage: 🔨 Help text shown in a terminal when attempting to execute CLI programs
cms-errors: ❗ Error codes and/or messages rendered by a CMS
code-keywords: ☕ Computer language identifiers declared in defining standards, such as reserved words
cpu-arch: Low-level computer architecture and hardware subjects
crypt-output: ✨ Cipher text string outputs created by cryptographic hash functions
database-strs: 💾 Strings often encountered when working with database software
dialup-modems: 📠 Info about analog modems on POTS
dns-commands: ♠️ Commands, packages, utilities, etc. used by the Domain Name System
dns-domains: A list of domains that may or may not be found in the live DNS tree
dns-hostnames: 🔦 The host name part of an FQDN
dns-records: 🎫 Data specific to RRs (resource records) in the DNS
dns-servers: 🔋 Data provided to, produced by or related to DNS name servers
dns-toplevel: TLDs (Top Level Domains), the uppermost part of the DNS hierarchy
environ-names: ⛺ Environment variable names, settings, etc.
exploit-info: 🎱 Technical information on exploitation of security vulnerabilities
file-extens: ⚓ Anything concerning filename extensions, i.e. the part after the period in a file name
file-specs: File format specifications as distributed by vendor(s)/author(s)
ftp-data: 📤 Various FTP data from RFCs and elsewhere
glibc-data: ⚙️ Data taken from the source code of the GNU C Library
html-words: ⌨️ Words commonly encountered when parsing HTML dialects
http-agents: Software version banners for HTTP User Agents, also known as browsers
http-headers: Header fields sent in requests and responses by browsers/servers
http-methods: ▶️ Names of HTTP request methods, which are sent at the start of a browser's request line
http-params: 🔡 Parameters browsers sometimes send when requesting server URI paths
http-paths: Path names that browsers include in queries to servers
http-queries: ❔ The object syntax that appears after the question mark in URIs
http-security: 👮 Hyper Text Transfer Protocol security info, e.g. CSP
http-servers: Information related to the usage of web server software
http-status: 🎰 Numeric HTTP status codes that denote the status of a web server during reply as specified in RFC7231, Section 6
inet-addrs: 🔌 Numeric Internet addresses a.k.a. IP addresses, mostly version 4
inet-routes: Data useful in the maintenance and use of an Internet routing table
inet-services: ⛲ Lists of Internet protocols/daemons, similar to /etc/services
infosec-people: :neckbeard: Noteworthy individuals within the information security community
iso-codes: ✔️ ISO code numbers and such
java-data: ☀️ Data found in or related to source code of programs written with Java
libc-data: Data for or about programming with the C standard library
linux-data: 🔟 Data identifiers and such from the Linux operating system
linux-paths: :linked_paperclips: Pathnames found on file systems created by Linux installations
malware-iocs: 💀 IOCs (indicators of compromise) for identification of malware infections
mobile-dev: 📱 Mobile device development for "handheld" form factors
net-attacks: ♨️ Info about attacks on telecommunications and Internetworks
net-ifaces: :three-networked-computers: Detailed information which can be extracted from network interfaces
ntfs-paths: 📂 File paths expected to be seen in NTFS folders
nvd-data: 🏛️ Data utilized by NIST's NVD
owasp-data: Data from or for OWASP
passes-dicts: 🔑 Dictionary files used in brute-force attacks against account passwords
passes-sites: 🔓 Password lists that were publicized after major site compromises
perl-data: Data often seen in PERL (Practical Extraction and Report Language)
php-data: :page-facing-up: Files containing information about the PHP programming language
postal-data: 📬 United States Postal Service information
python-data: Data used by the Python scripting language interpreter at runtime
radio-data: 📻 Things commonly used in radio frequency transmissions
regex-data: 💬 Regular expression patterns to mount attacks and match strings
ruby-data: 💎 Data typically seen within the syntax of the Ruby scripting language
search-dorks: 🔎 General purpose search-engine queries likely to find insecure sites
smtp-messages: ✉️ Messages (i.e. signatures, auto-replies, etc.) sent by SMTP servers
soap-messages: 📨 SOAP (Simple Object Access Protocol) messages
social-data: 👀 Sociological or social media related data sets
software-strs: 💽 Strings that describe software engineering, programming languages, etc.
string-enums: 🎡 Enumerations of values that aren't too terribly unusual
system-admin: 👔 System administration and BOFH related materials
system-notices: ⚠️ Disclaimer/warning messages shown by networked computer systems
telco-data: 📞 Data on voice-based telecommunications technologies: POTS, PCS, VoIP, SMS etc.
text-files: 📌 A special kind of "text file" as in those archived at textfiles.com, i.e. old school zine articles
text-words: Lists of words likely to be found in an actual hard copy dictionary
top-secret: 👽 Files and/or data related to documents that were/are classified
unicode-art: 🎭 Unicode art pieces (i.e. requires wide character symbols)
unicode-data: 🔣 Unicode character usage and representation
unix-data: Data associated with various flavors of the UNIX OS and its clones
unix-paths: 🗄️ File path names found in various UNIX file systems
uri-attacks: 💥 Malicious URI materials specially crafted for attack targets
uri-schemes: 📎 Lists containing references for URI schemes (part before colon)
uri-data: 🔗 Uniform Resource Identifier related data
vuln-data: 📊 Information about security vulnerabilities found in server software
webapp-attacks: 💉 Security proof-of-concept samples demonstrating various styles of web application attacks
webapp-data: 💼 Data associated with applications hosted on web servers
webapp-dirs: Directories related to applications running on a web server
webapp-files: 📇 Files related to applications running on a web server
webapp-paths: 📑 Path names related to applications running on a web server
webapp-words: 💭 Words related to applications running on a web server
web-sites: 🌎 Addresses to and/or information on well known/organized WWW sites
wifi-networks: 📡 IEEE 802.11 Wi-Fi network information
windows-data: 💼 Data only found within the Microsoft Windows series of OSes


Languages

  • HTML 74.7%
  • JavaScript 9.1%
  • Shell 7.1%
  • AGS Script 6.6%
  • C 1.6%
  • Python 0.4%
  • Other 0.5%