A simple project to scrape 10-K forms from the US SEC using spreadsheets and Python

peter201943/sec-scraper

A simple project to scrape 10-K forms from the US SEC (Securities and Exchange Commission) using spreadsheets and Python.

About

A simple scraper for gathering basic statistics from US SEC 10-K forms. The code is rough and the script needs cleanup.

Usage

  1. Download a decent text editor, such as VS Code
  2. Download Python
  3. Download the project
  4. Open a Command Prompt (Windows) or Terminal (Mac) in the project folder
  5. Install the Requirements
    pip install -r requirements.txt
  6. Copy your input file (Excel Workbook) into the same directory as the script
  7. Edit sec_scraper.py to set:
    • the numbers of spreadsheet columns
    • the names of files
    • the text-search regexes
    • any additional parameters
  8. Create a secrets.json file with the following contents, replacing the User-Agent placeholder with your own institution and email:
    {
        "sec_request_headers":
        {
            "User-Agent":       "YOUR INSTITUTION, YOUR EMAIL",
            "Accept-Encoding":  "gzip, deflate",
            "Host":             "www.sec.gov"
        }
    }
  9. Run the script
    python sec_scraper.py
  10. Find your results in the original input file (the Excel workbook)
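
As a rough illustration of how the secrets.json file from step 8 might be consumed, here is a minimal sketch using only the Python standard library. The function names load_sec_headers and build_request are hypothetical and are not taken from sec_scraper.py, and the placeholder check is an added assumption; the actual script may read and use the headers differently.

```python
import json
import urllib.request
from pathlib import Path


def load_sec_headers(path="secrets.json"):
    """Return the "sec_request_headers" mapping from the secrets file.

    Hypothetical helper: sec_scraper.py may load this file differently.
    """
    with Path(path).open(encoding="utf-8") as handle:
        secrets = json.load(handle)
    headers = secrets["sec_request_headers"]
    # Assumption: refuse to continue while the template placeholder
    # ("YOUR INSTITUTION, YOUR EMAIL") is still in place.
    if "YOUR" in headers.get("User-Agent", ""):
        raise ValueError("Replace the placeholder User-Agent in secrets.json")
    return headers


def build_request(url, headers):
    """Attach the headers to a request object without sending it."""
    return urllib.request.Request(url, headers=headers)


# Example use (nothing is sent over the network here):
#   headers = load_sec_headers()
#   request = build_request("https://www.sec.gov/", headers)
```

Keeping the headers in a separate file means the contact details in the User-Agent stay out of the script itself.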

Roadmap

  • See the Notes folder for current status.
    This is not intended to be a long-running project.
  • The code needs significantly better documentation
  • The code needs to be broken down into smaller functions
  • Still very buggy, with many edge cases not addressed

Contributing

Prerequisites

Installation

  1. Clone the Repository
    git clone git@github.com:peter201943/sec-scraper.git
  2. Open the Folder
    cd sec-scraper
  3. Create a Virtual Environment (for example, with Python's built-in venv module) and activate it
    python -m venv .venv
  4. Install the Requirements
    pip install -r requirements.txt
  5. Open the Project (with VS Code, as example)
    code .

Major Files

  • sec_scraper.py: Configuration, definitions, and so on. The meat of the project.
  • tests.py: Small incremental steps to learn how each part works.

Accepting Changes

This is a low-priority project for peter201943, so pull requests are unlikely to be accepted. You will be better served by forking it and continuing development on your own.

License

Code distributed under the MIT License. See LICENSE for more information.

Documentation distributed under the Creative Commons Attribution 4.0 License.

This document released under Creative Commons Attribution 4.0 License by Peter J. Mangelsdorf.

Contact

Peter James Mangelsdorf
Outlook
Discord
GitHub

Acknowledgements

See Notes for links to articles, repositories, and programs.
