scikit-learn · ogrisel · Aug 11, 2017 · Jul 13, 2017 · Jul 13, 2017 · Jul 13, 2017
diff --git a/doc/about.rst b/doc/about.rst
@@ -67,7 +67,7 @@ Funding
 
 `INRIA <https://www.inria.fr>`_ actively supports this project. It has
 provided funding for Fabian Pedregosa (2010-2012), Jaques Grobler
-(2012-2013) and Olivier Grisel (2013-2015) to work on this project
+(2012-2013) and Olivier Grisel (2013-2017) to work on this project
 full-time. It also hosts coding sprints and other events.
 
 .. image:: images/inria-logo.jpg
@@ -77,7 +77,7 @@ full-time. It also hosts coding sprints and other events.
 
 `Paris-Saclay Center for Data Science <http://www.datascience-paris-saclay.fr>`_
 funded one year for a developer to work on the project full-time
-(2014-2015).
+(2014-2015) and 50% of the time of Guillaume Lemaitre (2016-2017).
 
 .. image:: images/cds-logo.png
    :width: 200pt
@@ -94,23 +94,37 @@ Environment also funds several students to work on the project part-time.
    :target: http://cds.nyu.edu/mooresloan/
 
 
-`Télécom Paristech <http://www.telecom-paristech.com>`_ funds Manoj Kumar (2014),
-Tom Dupré la Tour (2015), Raghav RV (2015-2016) and Thierry Guillemot (2016) to
-work on scikit-learn.
+`Télécom Paristech <http://www.telecom-paristech.com>`_ funded Manoj Kumar (2014),
+Tom Dupré la Tour (2015), Raghav RV (2015-2017), Thierry Guillemot (2016-2017)
+and Albert Thomas (2017) to work on scikit-learn.
 
 .. image:: themes/scikit-learn/static/img/telecom.png
    :width: 100pt
    :align: center
    :target: http://www.telecom-paristech.fr/
 
 
-`Columbia University <http://columbia.edu>`_ funds Andreas Mueller since 2016.
+`Columbia University <http://columbia.edu>`_ has funded Andreas Müller since 2016.
 
 .. image:: themes/scikit-learn/static/img/columbia.png
    :width: 100pt
    :align: center
    :target: http://www.columbia.edu/
 
+Andreas Müller also received a grant to improve scikit-learn from the `Alfred P. Sloan Foundation <https://sloan.org>`_ in 2017.
+
+.. image:: images/sloan_banner.png
+   :width: 200pt
+   :align: center
+   :target: https://sloan.org/
+
+`The University of Sydney <http://sydney.edu.au>`_ has funded Joel Nothman since July 2017.
+
+.. image:: themes/scikit-learn/static/img/sydney-primary.jpeg
+   :width: 200pt
+   :align: center
+   :target: http://www.sydney.edu.au/
+
 The following students were sponsored by `Google <https://developers.google.com/open-source/>`_
 to work on scikit-learn through the
 `Google Summer of Code <https://en.wikipedia.org/wiki/Google_Summer_of_Code>`_

diff --git a/doc/datasets/index.rst b/doc/datasets/index.rst
@@ -252,7 +252,7 @@ features::
 
 .. topic:: Related links:
 
- _`Public datasets in svmlight / libsvm format`: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
+ _`Public datasets in svmlight / libsvm format`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets
 
  _`Faster API-compatible implementation`: https://github.com/mblondel/svmlight-loader
 
@@ -268,15 +268,15 @@ DataFrame are also acceptable.
 Here are some recommended ways to load standard columnar data into a 
 format usable by scikit-learn: 
 
-* `pandas.io <http://pandas.pydata.org/pandas-docs/stable/io.html>`_ 
+* `pandas.io <https://pandas.pydata.org/pandas-docs/stable/io.html>`_ 
   provides tools to read data from common formats including CSV, Excel, JSON
   and SQL. DataFrames may also be constructed from lists of tuples or dicts.
   Pandas handles heterogeneous data smoothly and provides tools for
   manipulation and conversion into a numeric array suitable for scikit-learn.
-* `scipy.io <http://docs.scipy.org/doc/scipy/reference/io.html>`_ 
+* `scipy.io <https://docs.scipy.org/doc/scipy/reference/io.html>`_ 
   specializes in binary formats often used in scientific computing 
   context such as .mat and .arff
-* `numpy/routines.io <http://docs.scipy.org/doc/numpy/reference/routines.io.html>`_
+* `numpy/routines.io <https://docs.scipy.org/doc/numpy/reference/routines.io.html>`_
   for standard loading of columnar data into numpy arrays
 * scikit-learn's :func:`datasets.load_svmlight_file` for the svmlight or libSVM
   sparse format
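
Editor's illustration (not part of the patch): the svmlight / libSVM text format mentioned in the last bullet stores one sample per line as ``<label> <index>:<value> ...``, listing only the non-zero features. The sketch below is a minimal pure-Python reader written under that assumption. The helper name ``parse_svmlight_line`` is hypothetical and this is not how :func:`datasets.load_svmlight_file` is implemented; in practice the scikit-learn loader should be preferred.

```python
def parse_svmlight_line(line):
    """Parse one '<label> <index>:<value> ...' line.

    Returns (label, {feature_index: value}), or None for an empty line.
    Hypothetical helper for illustration only.
    """
    # Anything after '#' is treated as a trailing comment and dropped.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        key, value = token.split(":", 1)
        if key == "qid":
            # Query ids (used for ranking data) are ignored in this sketch.
            continue
        features[int(key)] = float(value)
    return label, features


sample = "1 1:0.5 3:2.0 # first sample"
label, features = parse_svmlight_line(sample)
print(label, features)
```

Only the features present on a line are stored, which is why the format suits sparse data.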
@@ -288,14 +288,14 @@ For some miscellaneous data such as images, videos, and audio, you may wish to
 refer to:
 
 * `skimage.io <http://scikit-image.org/docs/dev/api/skimage.io.html>`_ or
-  `Imageio <http://imageio.readthedocs.io/en/latest/userapi.html>`_ 
+  `Imageio <https://imageio.readthedocs.io/en/latest/userapi.html>`_ 
   for loading images and videos to numpy arrays
-* `scipy.misc.imread <http://docs.scipy.org/doc/scipy/reference/generated/scipy.
+* `scipy.misc.imread <https://docs.scipy.org/doc/scipy/reference/generated/scipy.
   misc.imread.html#scipy.misc.imread>`_ (requires the `Pillow
   <https://pypi.python.org/pypi/Pillow>`_ package) to load pixel intensities
   data from various image file formats
 * `scipy.io.wavfile.read 
-  <http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html>`_ 
+  <https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html>`_ 
   for reading WAV files into a numpy array
 
 Categorical (or nominal) features stored as strings (common in pandas DataFrames) 

diff --git a/doc/developers/debugging.rst b/doc/developers/debugging.rst
diff --git a/doc/developers/index.rst b/doc/developers/index.rst
@@ -10,7 +10,7 @@ Developer's Guide
 .. toctree::
 
    contributing
-   debugging
+   tips
    utilities
    performance
    advanced_installation

diff --git a/doc/developers/tips.rst b/doc/developers/tips.rst
@@ -0,0 +1,119 @@
+.. _developers-tips:
+
+===========================
+Developers' Tips and Tricks
+===========================
+
+Productivity and sanity-preserving tips
+=======================================
+
+In this section we gather some useful advice and tools that may increase your
+quality-of-life when reviewing pull requests, running unit tests, and so forth.
+Some of these tricks consist of userscripts that require a browser extension
+such as `TamperMonkey`_ or `GreaseMonkey`_; to set up userscripts you must have
+one of these extensions installed, enabled and running.  We provide userscripts
+as GitHub gists; to install them, click on the "Raw" button on the gist page.
+
+.. _TamperMonkey: https://tampermonkey.net
+.. _GreaseMonkey: http://www.greasespot.net
+
+Viewing the rendered HTML documentation for a pull request
+----------------------------------------------------------
+
+We use CircleCI to build the HTML documentation for every pull request. To
+access that documentation, we provide a redirect as described in the
+:ref:`documentation section of the contributor guide
+<contribute_documentation>`. To avoid typing the address by hand, you can install a
+`userscript <https://gist.github.com/lesteve/470170f288884ec052bcf4bc4ffe958a>`_
+that adds a button to every PR. After installing the userscript, navigate to any
+GitHub PR; a new button labeled "See CircleCI doc for this PR" should appear in
+the top-right area.
+
+Folding and unfolding outdated diffs on pull requests
+-----------------------------------------------------
+
+GitHub hides discussions on PRs when the corresponding lines of code have been
+changed in the meantime. This `userscript
+<https://gist.github.com/lesteve/b4ef29bccd42b354a834>`_ provides a button to
+unfold all such hidden discussions at once, so you can catch up.
+
+Checking out pull requests as remote-tracking branches
+------------------------------------------------------
+
+In your local fork, add to your ``.git/config``, under the ``[remote
+"upstream"]`` heading, the line::
+
+  fetch = +refs/pull/*/head:refs/remotes/upstream/pr/*
+
+You may then use ``git checkout pr/PR_NUMBER`` to navigate to the code of the
+pull-request with the given number. (`Read more in this gist.
+<https://gist.github.com/piscisaureus/3342247>`_)
+
+Display code coverage in pull requests
+--------------------------------------
+
+To overlay the code coverage reports generated by the CodeCov continuous
+integration, consider `this browser extension
+<https://github.com/codecov/browser-extension>`_. The coverage of each line
+will be displayed as a color background behind the line number.
+
+Useful pytest aliases and flags
+-------------------------------
+
+We recommend using pytest to run unit tests. When a unit test fails, the
+following tricks can make debugging easier:
+
+  1. The command line argument ``pytest -l`` instructs pytest to print the local
+     variables when a failure occurs.
+
+  2. The argument ``pytest --pdb`` drops into the Python debugger on failure. To
+     instead drop into the rich IPython debugger ``ipdb``, you may set up a
+     shell alias to::
+
+         pytest --pdbcls=IPython.terminal.debugger:TerminalPdb --capture no
+
+Debugging memory errors in Cython with valgrind
+===============================================
+
+While python/numpy's built-in memory management is relatively robust, it can
+lead to performance penalties for some routines. For this reason, much of
+the high-performance code in scikit-learn is written in Cython. This
+performance gain comes with a tradeoff, however: it is very easy for memory
+bugs to crop up in cython code, especially in situations where that code
+relies heavily on pointer arithmetic.
+
+Memory errors can manifest themselves in a number of ways. The easiest ones to
+debug are often segmentation faults and related glibc errors. Uninitialized
+variables can lead to unexpected behavior that is difficult to track down.
+A very useful tool when debugging these sorts of errors is
+valgrind_.
+
+
+Valgrind is a command-line tool that can trace memory errors in a variety of
+code. Follow these steps:
+
+  1. Install `valgrind`_ on your system.
+
+  2. Download the python valgrind suppression file: `valgrind-python.supp`_.
+
+  3. Follow the directions in the `README.valgrind`_ file to customize your
+     python suppressions. If you don't, you will get spurious output
+     related to the python interpreter instead of your own code.
+
+  4. Run valgrind as follows::
+
+       $> valgrind -v --suppressions=valgrind-python.supp python my_test_script.py
+
+.. _valgrind: http://valgrind.org
+.. _`README.valgrind`: http://svn.python.org/projects/python/trunk/Misc/README.valgrind
+.. _`valgrind-python.supp`: http://svn.python.org/projects/python/trunk/Misc/valgrind-python.supp
+
+
+The result will be a list of all the memory-related errors, which reference
+lines in the C-code generated by cython from your .pyx file. If you examine
+the referenced lines in the .c file, you will see comments which indicate the
+corresponding location in your .pyx source file. Hopefully the output will
+give you clues as to the source of your memory error.
+
+For more information on valgrind and the array of options it has, see the
+tutorials and documentation on the `valgrind web site <http://valgrind.org>`_.
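
Editor's illustration (not part of the patch): the "Checking out pull requests as remote-tracking branches" tip in ``doc/developers/tips.rst`` above relies on the fetch refspec ``+refs/pull/*/head:refs/remotes/upstream/pr/*``. The sketch below shows, in a simplified way, how such a refspec maps a ref on the remote to a local remote-tracking ref. The function ``map_ref`` is a hypothetical helper that handles only a single ``*`` wildcard; it approximates the behavior and is not git's actual implementation.

```python
REFSPEC = "+refs/pull/*/head:refs/remotes/upstream/pr/*"


def map_ref(refspec, remote_ref):
    """Return the local ref the refspec assigns to remote_ref, or None.

    Simplified sketch: assumes exactly one '*' on each side of the ':'.
    """
    # A leading '+' only tells git to update the local ref even when the
    # update is not a fast-forward; it does not affect the name mapping.
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":", 1)
    prefix, suffix = src.split("*", 1)
    long_enough = len(remote_ref) >= len(prefix) + len(suffix)
    if not (long_enough
            and remote_ref.startswith(prefix)
            and remote_ref.endswith(suffix)):
        return None
    # The text matched by '*' on the source side is substituted for the
    # '*' on the destination side.
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return dst.replace("*", matched, 1)


print(map_ref(REFSPEC, "refs/pull/9999/head"))
print(map_ref(REFSPEC, "refs/pull/9999/merge"))
```

With this mapping, the head of each pull request lands under ``refs/remotes/upstream/pr/``, which is why ``git checkout pr/PR_NUMBER`` can find it. Refs that do not match the source pattern, such as ``refs/pull/9999/merge``, are not covered by this refspec.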
diff --git a/doc/faq.rst b/doc/faq.rst
@@ -24,9 +24,9 @@ Apart from scikit-learn, another popular one is `scikit-image <http://scikit-ima
 How can I contribute to scikit-learn?
 -----------------------------------------
 See :ref:`contributing`. Before wanting to add a new algorithm, which is
-usually a major and lengthy undertaking, it is recommended to start with :ref:`known
-issues <easy_issues>`_. Please do not contact the contributors of scikit-learn directly
-regarding contributing to scikit-learn.
+usually a major and lengthy undertaking, it is recommended to start with
+:ref:`known issues <new_contributors>`. Please do not contact the contributors
+of scikit-learn directly regarding contributing to scikit-learn.
 
 What's the best way to get help on scikit-learn usage?
 --------------------------------------------------------------

diff --git a/doc/images/sloan_banner.png b/doc/images/sloan_banner.png
diff --git a/doc/index.rst b/doc/index.rst
@@ -207,14 +207,16 @@
                     <li><em>On-going development:</em>
                     <a href="/dev/whats_new.html"><em>What's new</em> (Changelog)</a>
                     </li>
+                    <li><em>July 2017.</em> scikit-learn 0.19.0 is available for download (<a href="whats_new.html#version-0-19">Changelog</a>).
+                    </li>
+                    <li><em>June 2017.</em> scikit-learn 0.18.2 is available for download (<a href="whats_new.html#version-0-18-2">Changelog</a>).
+                    </li>
                     <li><em>September 2016.</em> scikit-learn 0.18.0 is available for download (<a href="whats_new.html#version-0-18">Changelog</a>).
                     </li>
                     <li><em>November 2015.</em> scikit-learn 0.17.0 is available for download (<a href="whats_new.html#version-0-17">Changelog</a>).
                     </li>
                     <li><em>March 2015.</em> scikit-learn 0.16.0 is available for download (<a href="whats_new.html#version-0-16">Changelog</a>).
                     </li>
-                    <li><em>July 2014.</em> scikit-learn 0.15.0 is available for download (<a href="whats_new.html#version-0-15">Changelog</a>).
-                    </li>
                     <li><em>July 14-20th, 2014: international sprint.</em>
                     During this week-long sprint, we gathered 18 of the core
                     contributors in Paris.
@@ -323,14 +325,15 @@
                 Funding provided by INRIA and others.
               </div>
               <div class="span6">
-                 <a class="reference internal" href="about.html#funding" style="text-decoration: none" >
+                 <a class="reference internal" href="about.html#funding" style="text-decoration: none; white-space: nowrap" >
                        <img id="index-funding-logo-big" src="_static/img/inria-small.png" title="INRIA">
                    <img id="index-funding-logo-small" src="_static/img/google.png" title="Google">
                    <!--Due to Télécom ParisTech's logo text being smaller, a style has been added to improve readability-->
                    <img id="index-funding-logo-small" src="_static/img/telecom.png" title="Télécom ParisTech" style="max-height: 36px">
                    <img id="index-funding-logo-small" src="_static/img/FNRS-logo.png" title="FNRS">
-                   <img id="index-funding-logo-small" src="_static/img/nyu_short_color.png" title="NYU CDS">
+                   <img id="index-funding-logo-small" src="_static/img/sloan_logo.jpg" title="Alfred P. Sloan Foundation" style="max-height: 36px">
                    <img id="index-funding-logo-small" src="_static/img/columbia.png" title="Columbia University" style="max-height: 36px;">
+                   <img id="index-funding-logo-small" src="_static/img/sydney-stacked.jpeg" title="The University of Sydney" style="max-height: 36px;">
                  </a>
              </div>
              <div class="span3">

diff --git a/doc/modules/calibration.rst b/doc/modules/calibration.rst
@@ -44,7 +44,7 @@ with different biases per method:
 *  :class:`RandomForestClassifier` shows the opposite behavior: the histograms
    show peaks at approximately 0.2 and 0.9 probability, while probabilities close to
    0 or 1 are very rare. An explanation for this is given by Niculescu-Mizil
-   and Caruana [4]: "Methods such as bagging and random forests that average
+   and Caruana [4]_: "Methods such as bagging and random forests that average
    predictions from a base set of models can have difficulty making predictions
    near 0 and 1 because variance in the underlying base models will bias
    predictions that should be near zero or one away from these values. Because
@@ -57,15 +57,15 @@ with different biases per method:
    ensemble away from 0. We observe this effect most strongly with random
    forests because the base-level trees trained with random forests have
    relatively high variance due to feature subseting." As a result, the
-   calibration curve also referred to as the reliability diagram (Wilks 1995[5]) shows a
+   calibration curve also referred to as the reliability diagram (Wilks 1995 [5]_) shows a
    characteristic sigmoid shape, indicating that the classifier could trust its
    "intuition" more and return probabilities closer to 0 or 1 typically.
 
 .. currentmodule:: sklearn.svm
 
 *  Linear Support Vector Classification (:class:`LinearSVC`) shows an even more sigmoid curve
    than the RandomForestClassifier, which is typical for maximum-margin methods
-   (compare Niculescu-Mizil and Caruana [4]), which focus on hard samples
+   (compare Niculescu-Mizil and Caruana [4]_), which focus on hard samples
    that are close to the decision boundary (the support vectors).
 
 .. currentmodule:: sklearn.calibration
@@ -190,18 +190,18 @@ a similar decrease in log-loss.
 
 .. topic:: References:
 
-    .. [1] Obtaining calibrated probability estimates from decision trees
-          and naive Bayesian classifiers, B. Zadrozny & C. Elkan, ICML 2001
+    * Obtaining calibrated probability estimates from decision trees
+      and naive Bayesian classifiers, B. Zadrozny & C. Elkan, ICML 2001
 
-    .. [2] Transforming Classifier Scores into Accurate Multiclass
-          Probability Estimates, B. Zadrozny & C. Elkan, (KDD 2002)
+    * Transforming Classifier Scores into Accurate Multiclass
+      Probability Estimates, B. Zadrozny & C. Elkan, (KDD 2002)
 
-    .. [3] Probabilistic Outputs for Support Vector Machines and Comparisons to
-          Regularized Likelihood Methods, J. Platt, (1999)
+    * Probabilistic Outputs for Support Vector Machines and Comparisons to
+      Regularized Likelihood Methods, J. Platt, (1999)
 
     .. [4] Predicting Good Probabilities with Supervised Learning,
-          A. Niculescu-Mizil & R. Caruana, ICML 2005
+           A. Niculescu-Mizil & R. Caruana, ICML 2005
 
     .. [5] On the combination of forecast probabilities for
-         consecutive precipitation periods. Wea. Forecasting, 5, 640–
-         650., Wilks, D. S., 1990a
+           consecutive precipitation periods. Wea. Forecasting, 5, 640–650.,
+           Wilks, D. S., 1990a