Merge pull request guillermo-navas-palencia#6 from guillermo-navas-pa…
…lencia/release_0.3.0

Release 0.3.0
guillermo-navas-palencia authored Mar 13, 2020
2 parents 8785516 + c68531c commit 5e89cb9
Showing 29 changed files with 1,432 additions and 588 deletions.
28 changes: 16 additions & 12 deletions README.rst
Original file line number Diff line number Diff line change
@@ -122,7 +122,7 @@ Now that we have checked the binned data, we can transform our original data int
x_transform_woe = optb.transform(x, metric="woe")
x_transform_event_rate = optb.transform(x, metric="event_rate")
The ``analysis`` method performs a statistical analysis of the binning table, computing the statistics Gini index, Information Value (IV), Jensen-Shannon divergence, and the quality score. Additionally, several statistical significance tests between consecutive bins of the contingency table are performed
The ``analysis`` method performs a statistical analysis of the binning table, computing the statistics Gini index, Information Value (IV), Jensen-Shannon divergence, and the quality score. Additionally, several statistical significance tests between consecutive bins of the contingency table are performed.

.. code-block:: python
@@ -139,6 +139,9 @@ The ``analysis`` method performs a statistical analysis of the binning table, co
Gini index 0.87541620
IV (Jeffrey) 5.04392547
JS (Jensen-Shannon) 0.39378376
HHI 0.15727342
HHI (normalized) 0.05193260
Cramer's V 0.80066760
Quality score 0.00000000
Significance tests
@@ -159,8 +162,8 @@ Print overview information about the options settings, problem statistics, and t
.. code-block:: text
optbinning (Version 0.1.0)
Copyright (c) 2019 Guillermo Navas-Palencia, Apache License 2.0
optbinning (Version 0.3.0)
Copyright (c) 2019-2020 Guillermo Navas-Palencia, Apache License 2.0
Begin options
name mean radius * U
@@ -181,6 +184,7 @@ Print overview information about the options settings, problem statistics, and t
min_event_rate_diff 0 * d
max_pvalue no * d
max_pvalue_policy consecutive * d
gamma 0 * d
class_weight no * d
cat_cutoff no * d
user_splits no * d
@@ -196,7 +200,7 @@ Print overview information about the options settings, problem statistics, and t
Pre-binning statistics
Number of pre-bins 9
Number of refinements 2
Number of refinements 1
Solver statistics
Type cp
@@ -207,19 +211,19 @@ Print overview information about the options settings, problem statistics, and t
Best objective bound 5043922
Timing
Total time 0.05 sec
Pre-processing 0.00 sec ( 0.82%)
Pre-binning 0.00 sec ( 7.06%)
Solver 0.04 sec ( 89.95%)
model generation 0.04 sec ( 85.75%)
optimizer 0.01 sec ( 14.25%)
Post-processing 0.00 sec ( 0.16%)
Total time 0.06 sec
Pre-processing 0.00 sec ( 0.80%)
Pre-binning 0.00 sec ( 6.30%)
Solver 0.06 sec ( 91.45%)
model generation 0.05 sec ( 89.12%)
optimizer 0.01 sec ( 10.88%)
Post-processing 0.00 sec ( 0.12%)
Benchmarks
==========

The following table shows how OptBinning 0.2.0 compares to `scorecardpy <https://github.com/ShichenXie/scorecardpy>`_ 0.1.9.1.1 on a selection of variables from the public dataset, Home Credit Default Risk - Kaggle’s competition `Link <https://www.kaggle.com/c/home-credit-default-risk/data>`_. This dataset contains 307511 samples.The experiments were run on Intel(R) Core(TM) i5-3317 CPU at 1.70GHz, using a single core, running Linux. For scorecardpy, we use default settings only increasing the maximum number of bins ``bin_num_limit=20``. For OptBinning, we use default settings (``max_n_prebins=20``) only changing the maximum allowed p-value between consecutive bins, ``max_pvalue=0.05``.
The following table shows how OptBinning compares to `scorecardpy <https://github.com/ShichenXie/scorecardpy>`_ 0.1.9.1.1 on a selection of variables from the public dataset, Home Credit Default Risk - Kaggle’s competition `Link <https://www.kaggle.com/c/home-credit-default-risk/data>`_. This dataset contains 307511 samples. The experiments were run on an Intel(R) Core(TM) i5-3317 CPU at 1.70GHz, using a single core, running Linux. For scorecardpy, we use default settings, only increasing the maximum number of bins ``bin_num_limit=20``. For OptBinning, we use default settings (``max_n_prebins=20``), only changing the maximum allowed p-value between consecutive bins, ``max_pvalue=0.05``.

To compare the two packages, we use the shifted geometric mean, which is typically used in mathematical optimization benchmarks: http://plato.asu.edu/bench.html. Using the shifted (by 1 second) geometric mean, we found that **OptBinning** is **17x** faster than scorecardpy, with an average IV increase of **12%**. Besides the speed and IV gains, OptBinning includes many more constraints and monotonicity options.
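
The shifted geometric mean mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual definition (the geometric mean of ``t_i + shift``, minus ``shift``); the function name and the timing values are hypothetical and are not taken from the benchmark itself.

```python
import math


def shifted_geometric_mean(times, shift=1.0):
    """Return the geometric mean of (t + shift), minus shift.

    Hypothetical helper for illustration; `shift` is in the same unit
    as `times` (seconds in the benchmark described above).
    """
    if not times:
        raise ValueError("times must not be empty")
    # Work in log space to avoid overflow when multiplying many values.
    log_sum = sum(math.log(t + shift) for t in times)
    return math.exp(log_sum / len(times)) - shift


# Made-up per-variable timings in seconds, NOT the benchmark results.
times_a = [0.05, 0.10, 0.08]
times_b = [1.20, 0.90, 2.50]

speedup = shifted_geometric_mean(times_b) / shifted_geometric_mean(times_a)
print(f"speedup: {speedup:.1f}x")
```

The shift keeps very small timings from dominating the ratio, which is why a 1-second shift is a common choice when some runs finish in a few milliseconds.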

2 changes: 1 addition & 1 deletion doc/source/binning_binary.rst
@@ -1,7 +1,7 @@
Optimal binning with binary target
==================================

.. autoclass:: optbinning.binning.OptimalBinning
.. autoclass:: optbinning.OptimalBinning
:members:
:inherited-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion doc/source/binning_continuous.rst
@@ -1,7 +1,7 @@
Optimal binning with continuous target
======================================

.. autoclass:: optbinning.continuous_binning.ContinuousOptimalBinning
.. autoclass:: optbinning.ContinuousOptimalBinning
:members:
:inherited-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion doc/source/binning_multiclass.rst
@@ -2,7 +2,7 @@ Optimal binning with multiclass target
======================================


.. autoclass:: optbinning.multiclass_binning.MulticlassOptimalBinning
.. autoclass:: optbinning.MulticlassOptimalBinning
:members:
:inherited-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion doc/source/binning_process.rst
@@ -1,7 +1,7 @@
Binning process
===============

.. autoclass:: optbinning.binning_process.BinningProcess
.. autoclass:: optbinning.BinningProcess
:members:
:inherited-members:
:show-inheritance:
6 changes: 3 additions & 3 deletions doc/source/binning_tables.rst
@@ -4,23 +4,23 @@ Binning tables
Binning table: binary target
----------------------------

.. autoclass:: optbinning.binning_statistics.BinningTable
.. autoclass:: optbinning.binning.binning_statistics.BinningTable
:members:
:inherited-members:
:show-inheritance:

Binning table: continuous target
--------------------------------

.. autoclass:: optbinning.binning_statistics.ContinuousBinningTable
.. autoclass:: optbinning.binning.binning_statistics.ContinuousBinningTable
:members:
:inherited-members:
:show-inheritance:

Binning table: multiclass target
--------------------------------

.. autoclass:: optbinning.binning_statistics.MulticlassBinningTable
.. autoclass:: optbinning.binning.binning_statistics.MulticlassBinningTable
:members:
:inherited-members:
:show-inheritance:
16 changes: 8 additions & 8 deletions doc/source/binning_utilities.rst
@@ -18,9 +18,9 @@ where :math:`D_i` can be characterized as a logistic function of :math:`\text{Wo
The constant term :math:`\log(N_T^{E} / N_T^{NE})` is the log ratio of the total
number of events :math:`N_T^{E}` and the total number of non-events :math:`N_T^{NE}`. This shows that WoE is inversely related to the event rate.

.. autofunction:: optbinning.transformations.transform_event_rate_to_woe
.. autofunction:: optbinning.binning.transformations.transform_event_rate_to_woe

.. autofunction:: optbinning.transformations.transform_woe_to_event_rate
.. autofunction:: optbinning.binning.transformations.transform_woe_to_event_rate
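
The relationship between WoE and event rate described above can be checked with a small standalone sketch. It assumes the definition :math:`\text{WoE}_i = \log\left(\frac{N_i^{NE} / N_T^{NE}}{N_i^{E} / N_T^{E}}\right)`, which is consistent with the constant term :math:`\log(N_T^{E} / N_T^{NE})` mentioned in the text. The function names below are hypothetical and this is not the library's implementation.

```python
import math


def event_rate_to_woe(event_rate, n_nonevent, n_event):
    """Map a bin's event rate to WoE, given the total counts.

    Assumes 0 < event_rate < 1 and strictly positive totals.
    """
    # (1 / D - 1) is the bin's non-event/event ratio.
    return math.log((1.0 / event_rate - 1.0) * n_event / n_nonevent)


def woe_to_event_rate(woe, n_nonevent, n_event):
    """Inverse mapping: a logistic function of WoE."""
    return 1.0 / (1.0 + n_nonevent / n_event * math.exp(woe))


# Hypothetical totals: 100 events and 900 non-events (overall rate 0.1).
low = event_rate_to_woe(0.05, n_nonevent=900, n_event=100)
high = event_rate_to_woe(0.30, n_nonevent=900, n_event=100)
print(low, high)
```

A bin whose event rate equals the overall event rate gets a WoE of zero, and bins with a higher event rate get a lower WoE, which is the inverse relationship noted in the text.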


Metrics
@@ -40,7 +40,7 @@ where :math:`N_i^{E}` and :math:`N_i^{NE}` are the number of events and non-even
bin, respectively, and :math:`N_T^{E}` and :math:`N_T^{NE}` are the total number of
events and non-events, respectively.

.. autofunction:: optbinning.metrics.gini
.. autofunction:: optbinning.binning.metrics.gini

Divergence measures
"""""""""""""""""""
@@ -78,12 +78,12 @@ terms of the Kullback-Leibler divergence
and bounded by :math:`JSD(P||Q) \in [0, \log(2)]`. We note that these measures cannot be directly used whenever :math:`p_i = 0` and/or :math:`q_i = 0`.

.. autofunction:: optbinning.metrics.entropy
.. autofunction:: optbinning.binning.metrics.entropy

.. autofunction:: optbinning.metrics.kullback_leibler
.. autofunction:: optbinning.binning.metrics.kullback_leibler

.. autofunction:: optbinning.metrics.jeffrey
.. autofunction:: optbinning.binning.metrics.jeffrey

.. autofunction:: optbinning.metrics.jensen_shannon
.. autofunction:: optbinning.binning.metrics.jensen_shannon

.. autofunction:: optbinning.metrics.jensen_shannon_multivariate
.. autofunction:: optbinning.binning.metrics.jensen_shannon_multivariate
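
The divergence measures listed above can be illustrated with a minimal pure-Python sketch. It assumes the standard definitions (Jeffrey divergence as the symmetrized Kullback-Leibler divergence, and Jensen-Shannon divergence computed against the mixture :math:`M = (P + Q) / 2`) and rejects zero probabilities, in line with the remark that these measures cannot be used directly when :math:`p_i = 0` or :math:`q_i = 0`. The function names are hypothetical and this is not the library's implementation.

```python
import math


def _check_positive(p, q):
    # The measures below are undefined here when any probability is zero.
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(v <= 0 for v in p) or any(v <= 0 for v in q):
        raise ValueError("all probabilities must be strictly positive")


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(P||Q) with natural logarithm."""
    _check_positive(p, q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def jeffrey_divergence(p, q):
    """Symmetrized KL divergence: KL(P||Q) + KL(Q||P)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def js_divergence(p, q):
    """Jensen-Shannon divergence, bounded by [0, log(2)]."""
    _check_positive(p, q)
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))


# Hypothetical discrete distributions over three bins.
p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
print(jeffrey_divergence(p, q), js_divergence(p, q))
```

Unlike the Jeffrey divergence, the Jensen-Shannon value stays within a fixed range, which makes it easier to compare across variables.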
4 changes: 2 additions & 2 deletions doc/source/conf.py
@@ -22,9 +22,9 @@
author = 'Guillermo Navas-Palencia'

# The short X.Y version
version = '0.2.0'
version = '0.3.0'
# The full version, including alpha/beta/rc tags
release = '0.2.0'
release = '0.3.0'


# -- General configuration ---------------------------------------------------
18 changes: 17 additions & 1 deletion doc/source/release_notes.rst
@@ -1,13 +1,29 @@
Release Notes
=============

Version 0.3.0 (2020-03-13)
--------------------------

New additions:

- Class ``OptimalBinning`` introduces a new constraint to reduce dominating bins, using parameter ``gamma``.
- Metrics HHI, HHI normalized and Cramer's V added to ``binning_table.analysis`` method. Updated quality score.
- Added column min/max target and zeros count to ``ContinuousOptimalBinning`` binning table.
- Binning algorithms support univariate outlier detection methods.

Tutorials:

- Tutorial: optimal binning with binary target. New section: Reduction of dominating bins.
- Enhance binning process tutorials.


Version 0.2.0 (2020-02-02)
--------------------------

New additions:

- Binning process to support optimal binning of all variables in dataset.
- Add ``print_output`` option to ``binning_table.analysis`` method.
- Added ``print_output`` option to ``binning_table.analysis`` method.
- New unit tests added.

Tutorials: