statmat: added multi linear and lasso regression #1998

aouyang1 · 2024-10-28T00:09:41Z

Hello! First-time contributor here. I've been working on some multi-linear regression work lately and wanted to see if some of that could also be integrated here. This only partially addresses the features requested in #1865 (MISO), but I can extend it to cover more in this PR if we want. I tried my best to follow the conventions that I saw in the existing modules, but let me know what I can change to make it more consistent.

The 2 new regression structs are OLSRegression and LassoRegression.

OLSRegression uses a QR decomposition to compute ordinary least squares with multiple features. I'm sure there are more efficient algorithms here.
LassoRegression uses coordinate descent to find the optimal weights.

Each implements the following methods:

Fit(x, y mat.Matrix) (float64, []float64)
Predict(x mat.Matrix) []float64
Score(x, y mat.Matrix) float64

Inspired by the Python sklearn interface

kortschak

I've taken a brief look through this. There is more API here than I think is necessary. Also there are error returns that do not conform to the approach that we use in Gonum packages; we use error returns for error conditions that the user could not know before calling a function, but panics for cases where the calling parameters do not conform to the documented invariants for the call. Please match this.

I'll take a deeper look in the next week or so.

aouyang1 · 2024-11-04T05:43:35Z

> I've taken a brief look through this. There is more API here than I think is necessary. Also there are error returns that do not conform to the approach that we use in Gonum packages; we use error returns for error conditions that the user could not know before calling a function, but panics for cases where the calling parameters do not conform to the documented invariants for the call. Please match this.
>
> I'll take a deeper look in the next week or so.

Thanks for explaining the differences! Will get the errors addressed and converted.

kortschak

The API is too complex here. I'd estimate that we could make this somewhere between 1/3 and 2/3 of the code that's here by removing the extraneous code.

kortschak · 2024-11-15T07:19:07Z

stat/statmat.go

+// Validate runs basic validation on OLS options
+func (o *OLSOptions) Validate() *OLSOptions {
+	if o == nil {
+		o = NewDefaultOLSOptions()
+	}
+
+	return o
+}


This method doesn't do what it says on the tin. Validate implies that it checks that the options are correct; this instead ensures that they are correct. I cannot think of another example where we do something like this in Gonum packages.

kortschak · 2024-11-15T07:19:58Z

stat/statmat.go

+// NewDefaultOLSOptions returns a default set of OLS Regression options
+func NewDefaultOLSOptions() *OLSOptions {
+	return &OLSOptions{
+		FitIntercept: true,
+	}
+}


This seems like more API than we need. Please take a look at how stat.LinearRegression does this. The situation here is a little more complex, but not so much that we need all this. I think a pure function that returns the details that we need to perform predictions and to calculate scores from predictions would be enough. The model that we use for the solvers may be appropriate.

aouyang1 · Nov 23, 2024

Got rid of a lot of the boilerplate. Let me know if this is closer to what you were thinking. I took a look at the LinearRegression and PrincipalComponent methods.

kortschak · 2024-11-15T07:27:57Z

stat/statmat.go

+
+	ym, _ := y.Dims()
+	if ym != m {
+		panic(ErrTargetLenMismatch)


We use mat.ErrShape for this.

kortschak · 2024-11-15T07:28:25Z

stat/statmat.go

+	if x == nil {
+		panic(ErrNoTrainingMatrix)
+	}
+	if y == nil {
+		panic(ErrNoTargetMatrix)
+	}


These can just panic with a nil pointer deref.

kortschak · 2024-11-15T07:48:33Z

stat/statmat.go

+// SoftThreshold returns 0.0 if the value is less than or equal to the gamma input
+func SoftThreshold(x, gamma float64) float64 {
+	res := math.Max(0, math.Abs(x)-gamma)
+	if math.Signbit(x) {
+		return -res
+	}
+	return res
+}


I don't think this needs to be exported.

func softThreshold(x, gamma float64) float64 {
	switch {
	case x < -gamma:
		return x + gamma
	case gamma < x:
		return x - gamma
	default:
		return 0
	}
}

aouyang1 · Nov 19, 2024

Good point. Will make it private.

aouyang1 · 2024-11-19T15:31:16Z

> The API is too complex here. I'd estimate that we could make this somewhere between 1/3 and 2/3 of the code that's here by removing the extraneous code.

Sounds good! Will take a crack at simplifying it and model something close to the LinearRegression method.

kortschak

This looks much more manageable.

Initial review only.

kortschak · 2024-11-23T05:59:50Z

stat/statmat_test.go

+				flatten(
+					[][]float64{
+						{0, 0},
+						{3, 5},
+						{9, 20},
+						{12, 6},
+					},
+				),


Suggested change

-				flatten(
-					[][]float64{
-						{0, 0},
-						{3, 5},
-						{9, 20},
-						{12, 6},
-					},
-				),
+				[]float64{
+					0, 0,
+					3, 5,
+					9, 20,
+					12, 6,
+				},

(similar throughout and delete flatten)

kortschak · 2024-11-23T06:00:42Z

stat/statmat_test.go

+		intercept float64
+		coef      []float64
+	}{
+		"invalid lambda": {


Suggested change

-		"invalid lambda": {
+		"invalid_lambda": {

(similar throughout; it simplifies finding cases)

kortschak · 2024-11-23T06:01:55Z

stat/statmat_test.go

+
+func TestLassoRegression(t *testing.T) {
+	// y = 2 + 3*x0 + 4*x1
+	testData := map[string]struct {


Please don't do this with a map. Use a struct with a name field. Also put the test cases in a global var, lassoRegressionTests, above the TestLassoRegression func decl.

kortschak · 2024-11-23T06:04:14Z

stat/statmat_test.go

 				[]float64{
 					0.8, 0.3, 0.1,
 					0.3, 0.7, -0.1,
-					0.1, -0.1, 7}),
+					0.1, -0.1, 7,
+				}),


Let's leave these formatting changes alone. Please revert this and the changes above in this file.

kortschak · 2024-11-23T06:05:04Z

stat/statmat_test.go

+func TestOLSRegression(t *testing.T) {
+	// y = 2 + 3*x0 + 4*x1
+	testData := map[string]struct {
+		x         *mat.Dense
+		y         *mat.Dense
+		model     OLSModel
+		tol       float64
+		intercept float64
+		coef      []float64
+	}{


Similar here.

kortschak · 2024-11-23T06:05:37Z

stat/statmat_test.go

@@ -321,6 +553,7 @@ func benchmarkCovarianceMatrix(b *testing.B, m mat.Matrix) {
 		CovarianceMatrix(&res, m, nil)
 	}
 }
+


Please revert these formatting changes.

kortschak · 2024-11-23T06:08:15Z

stat/statmat_test.go

@@ -478,3 +718,65 @@ func BenchmarkCorrToCov(b *testing.B) {
 		corrToCov(cc, sigma)
 	}
 }
+
+func BenchmarkLassoRegression(b *testing.B) {


If we're adding benchmarks, they will need a timer reset after the set-up. The set-up also doesn't need to do all the work that it currently does; it should work directly into x, and the matrix constructions should happen outside the benchmark loop.

kortschak · 2024-11-23T06:08:21Z

stat/statmat_test.go

+	}
+}
+
+func BenchmarkOLSRegression(b *testing.B) {


aouyang1 added 2 commits October 27, 2024 16:51

statmat: added multi linear and lasso regression

e8c47db

fixed option validation tests

fed396d

kortschak reviewed Nov 2, 2024

conformed to using panics instead of returning errors and documented …

62d66ed

…expected inputs

kortschak reviewed Nov 15, 2024

simplified ols and lasso regressions

08ef293

kortschak reviewed Nov 23, 2024

aouyang1 added 3 commits November 24, 2024 08:40

addressed comments

f4fda58

fixed formatting changes

c4761db

backed out last formatting changes

88814b9
