get-checklist script updated and now uses click.
1. Moved the script from bin to ebird.pages.scripts to reduce the
   risk of conflict with other ebird libraries.

2. Changed the library used for command line arguments from pyCLI
   to the rather excellent click.

3. Removed support for passing multiple identifiers to the
   get-checklists script.

4. Added simple tests for the command line script that can be
   called manually to make sure everything works with the ebird
   site.

5. Removed TODOs that were completed or no longer needed.
StuartMacKay committed Aug 21, 2017
1 parent e425e80 commit 2c23088
Showing 12 changed files with 78 additions and 63 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
```diff
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/).
 This project adheres to [PEP440](https://www.python.org/dev/peps/pep-0440/)
 and by implication, [Semantic Versioning](http://semver.org/).
 
-## [unreleased]
+## [0.1] - 2017-08-21
 ### Added
-- Function for scraping the data from the view checklist page.
+- Added get_checklist for scraping the data from the view checklist page.
+- Added script so get_checklist can be called from the command line.
```
3 changes: 1 addition & 2 deletions MANIFEST.in
```diff
@@ -1,4 +1,3 @@
 include *.md *.txt *.py *.cfg
-recursive-include bin *.py
 recursive-include ebird *.py
-recursive-include tests *.py
+recursive-include tests *.py *.sh
```
4 changes: 2 additions & 2 deletions README.md
````diff
@@ -36,7 +36,7 @@ Each of the functions has a corresponding script that can be used on the
 command-line:
 
 ```sh
-$ ebird-get-checklists S38429565
+$ get-checklist --id S38429565 --out checklist.json
 ```
 The script allows data for one or more checklists to be downloaded and written
 to a file in JSON format.
@@ -47,7 +47,7 @@ ebird-pages works with Python 3.3+.
 
 ## Dependencies
 
-eBird Pages makes use of the following packages: Requests, BeautifulSoup4, lxml and pyCLI.
+eBird Pages makes use of the following packages: Requests, BeautifulSoup4, lxml and Click.
 See requirements.txt for the version numbers of each of the libraries.
 
 ## License
````
50 changes: 0 additions & 50 deletions bin/ebird-get-checklists

This file was deleted.

3 changes: 0 additions & 3 deletions ebird/pages/__init__.py
```diff
@@ -2,9 +2,6 @@
 
 """A set of functions for scraping data from eBird web pages."""
 
-# TODO Configure logging and add NullHandler
-# See https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library
-
 from .version import __version__
 
 # Import all the functions that make up the public API.
```
Empty file added ebird/pages/scripts/__init__.py
Empty file.
28 changes: 28 additions & 0 deletions ebird/pages/scripts/base.py
```python
import datetime
import json


class JSONDateEncoder(json.JSONEncoder):

    def default(self, o):
        if isinstance(o, datetime.date):
            return o.isoformat()
        if isinstance(o, datetime.time):
            return o.isoformat()
        if isinstance(o, datetime.timedelta):
            return str(o)
        else:
            super().default(o)


def save(fp, values, indent):
    """Save the JSON data to a file or stdout.
    :param fp: the writer.
    :param values: the python data to be saved.
    :param indent: the level of indentation when prettyprinting the output.
    """
    fp.write(json.dumps(values, indent=indent, cls=JSONDateEncoder).encode('utf-8'))
    if fp.name == '<stdout>':
        fp.write(b'\n')
```
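A standalone check of the conversions `JSONDateEncoder` performs. The class is repeated here so the snippet runs without ebird-pages installed; in this sketch the date and time branches are merged and the fallback uses `return super().default(o)`. The sample values are invented for illustration:

```python
import datetime
import json


class JSONDateEncoder(json.JSONEncoder):
    """Serialise dates and times as ISO 8601 strings and timedeltas via str()."""

    def default(self, o):
        # datetime.datetime is a subclass of datetime.date, so it lands here too.
        if isinstance(o, (datetime.date, datetime.time)):
            return o.isoformat()
        if isinstance(o, datetime.timedelta):
            return str(o)
        # Anything else: the base class raises TypeError.
        return super().default(o)


values = {
    'date': datetime.date(2017, 8, 21),
    'time': datetime.time(6, 30),
    'duration': datetime.timedelta(hours=1, minutes=30),
}
print(json.dumps(values, cls=JSONDateEncoder))
# {"date": "2017-08-21", "time": "06:30:00", "duration": "1:30:00"}
```

Because `json.JSONEncoder.default` always raises `TypeError`, the committed version (which calls `super().default(o)` without `return`) behaves the same for unsupported types. The explicit `return` simply makes that intent clearer.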
28 changes: 28 additions & 0 deletions ebird/pages/scripts/get_checklist.py
```python
"""
Command-line script to scrape the web page(s) for a checklist and
save the data in JSON format to a file or to stdout.
"""

import click

from ebird.pages import get_checklist

from .base import save


# noinspection PyShadowingBuiltins
@click.command()
@click.option('--id', prompt=True,
              help="The unique identifier for the checklist.")
@click.option('--out', prompt=True, type=click.File('wb'),
              help='The name of a file to write the results to. To print'
                   ' the results to the screen use -.')
@click.option('--indent', type=int, default=None,
              help='Pretty-print the results with this level of indentation.')
def cli(id, out, indent):
    """Get the data for a checklist from its eBird web page."""
    save(out, get_checklist(id), indent)


if __name__ == '__main__':
    cli()
```
2 changes: 1 addition & 1 deletion ebird/pages/version.py
```diff
@@ -1 +1 @@
-__version__ = '0.1.dev1'
+__version__ = '0.1'
```
2 changes: 1 addition & 1 deletion requirements.txt
```diff
@@ -5,7 +5,7 @@ idna==2.5
 lxml==3.8.0
 pluggy==0.4.0
 py==1.4.34
-pyCLI==2.0.3
+click==6.7
 requests==2.18.2
 tox==2.7.0
 urllib3==1.22
```
7 changes: 5 additions & 2 deletions setup.py
```diff
@@ -32,7 +32,10 @@ def test_suite():
 keywords='eBird web scraper',
 packages=['ebird.pages'],
 test_suite='setup.test_suite',
-scripts=['bin/ebird-get-checklists'],
+entry_points="""
+[console_scripts]
+get-checklist=ebird.pages.scripts.get_checklist:cli
+""",
 classifiers=[
 'Development Status :: 3 - Alpha',
 'Environment :: Console',
@@ -54,6 +57,6 @@ def test_suite():
 'requests',
 'beautifulsoup4',
 'lxml',
-'pyCLI'
+'Click'
 ],
 )
```
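The `console_scripts` entry point replaces the old `scripts=` line. Each entry has the form `name=module:attribute`, and on installation a `get-checklist` executable is generated that imports the module and calls that attribute. The sketch below is a rough illustration of that lookup, not the setuptools implementation. The helper `resolve_entry_point` is hypothetical, and a standard-library target is used so it runs anywhere:

```python
import importlib
import json


def resolve_entry_point(spec):
    """Split a 'name=module:attr' spec and return (name, object)."""
    name, _, target = spec.partition('=')
    module_name, _, attr_path = target.strip().partition(':')
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. 'module:Class.method'.
    for part in attr_path.split('.'):
        obj = getattr(obj, part)
    return name.strip(), obj


name, func = resolve_entry_point('json-dumps = json:dumps')
print(name, func([1, 2]))
# json-dumps [1, 2]
```

Resolving `get-checklist=ebird.pages.scripts.get_checklist:cli` the same way only works if `ebird.pages.scripts` is part of the installed distribution. Setuptools does not add subpackages from a `packages=` list automatically, so the subpackage would typically need to be listed too or found with `find_packages()`.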
9 changes: 9 additions & 0 deletions tests/checklists/scripts/get-checklist.sh
```sh
#!/bin/sh

# Tests for the get-checklists script.

# Get the data for a checklist.
get-checklist --id S38645981 --out -

# Missing values should be prompted for.
get-checklist --out -
```
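The same two manual checks could be automated with click's `CliRunner`. This is a minimal sketch with several assumptions. A stub, `fake_get_checklist` (hypothetical), replaces `ebird.pages.get_checklist` so the test does not depend on the eBird site. The command mirrors the one in `get_checklist.py` but binds `--id` to a parameter named `identifier` to avoid shadowing the builtin. The `input=` argument answers the `--id` prompt:

```python
import json

import click
from click.testing import CliRunner


def fake_get_checklist(identifier):
    # Hypothetical stand-in for ebird.pages.get_checklist; no network access.
    return {'identifier': identifier}


@click.command()
@click.option('--id', 'identifier', prompt=True,
              help='The unique identifier for the checklist.')
@click.option('--out', prompt=True, type=click.File('wb'),
              help='File to write the results to; use - for the screen.')
@click.option('--indent', type=int, default=None,
              help='Pretty-print the results with this level of indentation.')
def cli(identifier, out, indent):
    """Mirror of the get-checklist command, writing JSON bytes to --out."""
    out.write(json.dumps(fake_get_checklist(identifier), indent=indent).encode('utf-8'))


runner = CliRunner()

# Equivalent of: get-checklist --id S38645981 --out -
result = runner.invoke(cli, ['--id', 'S38645981', '--out', '-'])
assert result.exit_code == 0
assert json.loads(result.output) == {'identifier': 'S38645981'}

# Equivalent of: get-checklist --out -  (identifier typed at the prompt)
result = runner.invoke(cli, ['--out', '-'], input='S38645981\n')
assert result.exit_code == 0
# The output starts with the echoed prompt, so only check the JSON tail.
assert result.output.endswith('{"identifier": "S38645981"}')
```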
