Word lists, Dictionary Files, Attack Strings, Miscellaneous Datasets and Proof-of-Concept Test Cases With a Collection of Tools for Penetration Testers
- Brief Introduction to werdlists
- Inspiration Taken from Similar Projects
- Repository Directory Hierarchy and Structure
- Folder Names and Description of Contents
This project is a collection of word lists, mostly whitespace-delimited or line-based. Although the `passes-dicts` folder contains inputs for password cracking, the files amassed here are, overall, intended to help drive a program into an insecure state (with the help of a black-box fuzzer or scanning tool). The vast majority of files are plain ASCII with UNIX-style (LF) newlines.
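As a rough illustration of the "plain ASCII with UNIX newlines" convention, here is a minimal Python sketch that checks a file's raw bytes. It is not one of the repository's own validation scripts; the function name `is_ascii_lf_text` and the script name in the usage comment are hypothetical, and it assumes a conforming file has no bytes above 0x7F, no carriage returns, and a trailing newline.

```python
import sys


def is_ascii_lf_text(data: bytes) -> bool:
    """Return True if data looks like plain ASCII with UNIX (LF) newlines.

    Hypothetical check: rejects any byte above 0x7F, any carriage return
    (so CRLF and bare CR endings fail), and a missing final newline.
    An empty file is treated as conforming.
    """
    if not data:
        return True
    if not data.isascii():
        return False  # found a byte outside 7-bit ASCII
    if b"\r" in data:
        return False  # found a CR, i.e. DOS or classic Mac line endings
    return data.endswith(b"\n")  # last line must be newline-terminated


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_wordlist.py FILE [FILE ...]
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            verdict = "ok" if is_ascii_lf_text(handle.read()) else "not ASCII/LF"
        print(f"{name}: {verdict}")
```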
werdlists is very similar to fuzzdb and SecLists. (SecLists is maintained by my colleague at IOActive, Daniel Miessler.) Admittedly, werdlists is quite similar in mission, since it too is a centralized resource for attack strings and input data. Nevertheless, werdlists expands on a number of concepts: it has its own style and organization, original hand-crafted contents, scripts for dataset creation, management and validation, scanner springboards, etc.
werdlists cross-references the code repositories of third-party scanners with its own dataset folders, indicating which datasets each tool will benefit from. Moreover, there are specialized parsing scripts exclusive to werdlists that extract results produced by pairing test tools with its data. Output strings are gathered from those results and fed back into the test tools; in other words, a number of interactive and/or tunable feedback loops are implemented. Quite a few of the werdlists data files were created this way.
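At its simplest, one turn of such a feedback loop is a merge step: strings harvested from a tool's results are appended to an existing wordlist, skipping anything already present, so the list grows without duplicates. The sketch below is hypothetical (`merge_feedback` is not part of werdlists) and assumes both inputs have already been parsed into lists of strings; entries are not stripped, since whitespace can be significant in attack strings.

```python
from typing import Iterable, List


def merge_feedback(wordlist: Iterable[str], harvested: Iterable[str]) -> List[str]:
    """Append newly harvested strings to a wordlist, preserving order.

    Hypothetical helper: existing entries keep their relative order,
    empty strings are dropped, and every string appears only once.
    Entries are compared exactly, without trimming whitespace.
    """
    merged: List[str] = []
    seen = set()
    for entry in list(wordlist) + list(harvested):
        if entry and entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return merged
```

For example, merging `["admin", "images"]` with `["images", "backup", "admin", "cgi-bin"]` yields `["admin", "images", "backup", "cgi-bin"]`, which could then be fed back into the tool for the next pass.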
The `scripts` folder consists of shell scripts used for repository maintenance. A sub-directory of `scripts` called `init` holds the scripts that generate data files. If a script filename stored in `init` contains two dashes, then its output should reflect the contents of the associated data file. For example, compare `manpages-environ` and `clib-package-names`. All scripts were written using bash syntax.
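Under that convention, a generator script's output can be compared against its data file to detect drift. This is a hedged sketch rather than the repository's own tooling: the function name `generator_matches` is hypothetical, the script is assumed to take no arguments and print its dataset to standard output, and the comparison is an exact line-by-line match. The `interpreter` parameter defaults to `bash` to match the scripts described above.

```python
import subprocess
from pathlib import Path


def generator_matches(script: Path, data_file: Path, interpreter: str = "bash") -> bool:
    """Run a generator script and compare its stdout with a data file.

    Hypothetical check: assumes the script takes no arguments and writes
    the dataset to stdout. A nonzero exit status raises
    subprocess.CalledProcessError instead of returning False.
    """
    result = subprocess.run(
        [interpreter, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )
    # Compare line by line so a missing final newline is not treated as drift.
    return result.stdout.splitlines() == data_file.read_text().splitlines()
```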
The `contrib` folder is for storing scripts contributed via pull request, and the `utils` folder contains utilities that aren't necessarily specific to the werdlists project, such as scripts for managing any wordlist file.
Other data files were composed by hand, and a small handful were created by recycling output strings back into input parameter lists, e.g. `dirbdirs-feedback`.
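As an illustration of how directory names could be harvested from a scanner run for a list like `dirbdirs-feedback`, the sketch below pulls them out of dirb's text output. It assumes dirb reports discovered directories on lines of the form `==> DIRECTORY: http://host/path/`; that format and the helper name `dirb_directories` are assumptions for illustration, not a description of how the actual file was built.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

# Assumed dirb output format for a discovered directory, e.g.
#   ==> DIRECTORY: http://example.test/images/
DIRECTORY_LINE = re.compile(r"^==> DIRECTORY: (\S+)")


def dirb_directories(lines: Iterable[str]) -> List[str]:
    """Return the last path component of each directory dirb reported.

    Hypothetical parser: ignores every line that does not match
    DIRECTORY_LINE and keeps only the first occurrence of each name.
    """
    names: List[str] = []
    for line in lines:
        match = DIRECTORY_LINE.match(line.strip())
        if not match:
            continue
        # "/admin/backup/" -> "/admin/backup" -> "backup"
        path = urlsplit(match.group(1)).path.rstrip("/")
        name = path.rsplit("/", 1)[-1]
        if name and name not in names:
            names.append(name)
    return names
```

The resulting names can then be merged back into a directory wordlist and supplied to the scanner on its next run.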
The `tools` folder lists security tools for which the datasets contained in this repository can be provided as input.
Individual folders are detailed in the Folder Names and Description of Contents section below. All files in each dataset directory are detailed in that folder's local `README.md` file (as opposed to the global `README.md` in the root directory being read now).
Most files have the `*.txt` extension, signifying the `text/plain` MIME type. Other frequently used formats include Comma-Separated Values (`text/csv`), Extensible Markup Language (`application/xml`), HyperText Markup Language (`text/html`), etc.
Any file that is larger than 1MB uncompressed will be compressed with `xz` according to the commands in the `scripts/xzlarge-files` bash script. The file extensions in use are: `*.ans`, `*.asc`, `*.bin`, `*.c`, `*.conf`, `*.cpp`, `*.csv`, `*.html`, `*.inf`, `*.ini`, `*.json`, `*.md`, `*.rpz`, `*.rst`, `*.sh`, `*.txt`, `*.xml`, `*.yaml`, `*.yml`, `*.zip`, and `*.zone`.
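The size rule for compression can be approximated with Python's standard `lzma` module, which writes the `.xz` container format by default. This is only a sketch of the stated policy, not a port of `scripts/xzlarge-files`: the helper name `xz_if_large` is hypothetical, and it assumes "1MB" means 1,048,576 bytes and that the original file is removed after compression, as the `xz` command does by default.

```python
import lzma
import shutil
from pathlib import Path
from typing import Optional

# Assumption: "1MB" is interpreted as 1 MiB (1,048,576 bytes).
SIZE_LIMIT = 1024 * 1024


def xz_if_large(path: Path, limit: int = SIZE_LIMIT) -> Optional[Path]:
    """Compress path to path.xz if it is larger than limit bytes.

    Hypothetical helper mirroring xz's default behaviour of replacing the
    original with the compressed file. Returns the new path, or None if
    the file was small enough to leave alone.
    """
    if path.stat().st_size <= limit:
        return None
    target = path.with_name(path.name + ".xz")
    # lzma.open defaults to the .xz format when writing.
    with path.open("rb") as src, lzma.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # like `xz FILE`, drop the uncompressed original
    return target
```

Streaming through `shutil.copyfileobj` keeps memory use flat even for very large wordlists.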
Folder Name | Description of Contents |
---|---|
arpa-headers | Header fields transmitted over RFC 2822-style protocols like SMTP |
ascii-art | "Low bit" a.k.a. 7-bit ASCII art items without control characters |
biology-info | Reference information useful in the study of biological issues |
browser-data | Data related to GUI browser software like Chrome, Firefox, etc. |
cert-data | Information commonly utilized by cryptographic certificate materials |
char-encodes | Various character encodings provided by different locales/charsets |
char-sequence | Various character sequences modeled after ctype.h |
chat-data | Additional data on IRC, XMPP and other such messaging protocols |
cipher-data | Data denoting or used by cryptographic algorithm implementations |
cmd-usage | Help text shown in a terminal when attempting to execute CLI programs |
cms-errors | Error codes and/or messages rendered by a CMS |
code-keywords | Computer language identifiers, reserved words and other syntax from defining standards |
cpu-arch | Low-level computer architecture and hardware subjects |
crypt-output | String outputs created by cryptographic hash functions |
database-strs | Strings often encountered when working with database software |
dns-domains | A list of domains that may or may not be found in the live DNS tree |
dns-hostnames | The host name part of an FQDN |
dns-records | Data specific to RRs (resource records) in the DNS |
dns-servers | Data provided to, produced by or related to DNS name servers |
dns-toplevel | TLDs, or Top-Level Domains, the uppermost part of the DNS hierarchy |
environ-vars | Environment variable names, settings, etc. |
exploit-info | Technical information on exploitation of security vulnerabilities |
file-extens | Information on filename extensions, i.e. the part after the dot |
file-specs | File format specifications as distributed by vendor(s)/author(s) |
ftp-data | Various FTP data from RFCs and elsewhere |
glibc-data | Data taken from the source code of the GNU C Library |
html-words | Words not uncommon to come across when parsing HTML dialects |
http-agents | Software version banners for HTTP User Agents, e.g. browsers |
http-headers | Header fields sent in requests/responses by browser/server software |
http-methods | |
http-params | Parameters browsers sometimes send when requesting server URI paths |
http-security | HTTP security info such as Content Security Policy |
http-servers | Information related to the usage of web server software |
http-status | Numeric HTTP status codes in server replies as RFC 7231 specifies |
inet-addrs | Numeric Internet addresses a.k.a. IP addresses, mostly version 4 |
inet-routes | Data useful in the maintenance and use of an Internet routing table |
inet-services | Lists of Internet protocols/daemons, similar to /etc/services |
infosec-people | Noteworthy individuals known from information security communities |
iso-codes | Codes, numbers and such as standardized by ISO |
java-data | Data found in or related to source code of programs written in Java |
linux-data | Data identifiers and such from the Linux operating system |
linux-paths | Pathnames found on file systems created by Linux installations |
malware-iocs | IOCs (Indicators of Compromise) for identifying malware infections |
mobile-devs | Mobile device development for "handheld" form factors |
net-attacks | Info about attacks on telecommunications and internetworks |
net-ifaces | Detailed information which can be extracted from network interfaces |
ntfs-paths | File paths expected to be seen in NTFS folders |
owasp-data | Data from or for OWASP |
passes-dicts | Dictionary files for brute-force attacks against account passwords |
passes-sites | Password lists that were publicized after major site compromises |
perl-data | Data often seen in Perl (Practical Extraction and Report Language) |
php-data | Files containing information about the PHP programming language |
postal-data | United States Postal Service information |
python-data | Data used by the Python scripting language interpreter at runtime |
radio-data | Things commonly used in radio frequency transmissions |
regex-data | Regular expression patterns to mount attacks and match strings |
ruby-data | Data typically seen within the syntax of the Ruby scripting language |
search-dorks | General-purpose search engine queries likely to find insecure sites |
smtp-messages | Messages (e.g. signatures, auto-replies, etc.) sent by SMTP servers |
soap-messages | SOAP (Simple Object Access Protocol) messages |
social-data | Sociological or social media related data sets |
software-strs | Strings describing software engineering, programming languages, etc. |
string-enums | Enumerations of values that aren't too terribly unusual |
system-admin | System administration and BOFH related materials |
system-notices | |
telco-data | Voice telecommunications technologies: POTS, PCS, VoIP, SMS, etc. |
text-files | Zine articles and such, like those archived at textfiles.com |
text-words | Lists of words likely to be found in an actual hard copy dictionary |
top-secret | Files and/or data related to documents that were/are classified |
unicode-data | Unicode character usage and representation |
unix-data | Data associated with various flavors of the UNIX OS and its clones |
unix-paths | File path names found in various UNIX file systems |
uri-attacks | Malicious URI materials specially crafted for attack targets |
uri-schemes | Lists containing references for URI schemes (the part before the colon) |
uri-data | Uniform Resource Identifier related data |
vuln-data | Information about security vulnerabilities found in server software |
webapp-attacks | Proof-of-concept samples demonstrating attacks against web applications |
webapp-data | Data associated with applications hosted on web servers |
webapp-dirs | Directories related to applications running on a web server |
webapp-files | Files related to applications running on a web server |
webapp-paths | Path names related to applications running on a web server |
webapp-words | Words related to applications running on a web server |
web-sites | Addresses to and/or information on well known/organized WWW sites |
wifi-networks | IEEE 802.11 Wi-Fi network information |
windows-data | Data only found within the Microsoft Windows series of OSes |