need a core idea doc #19

Open
timm opened this issue Jun 3, 2024 · 0 comments

timm commented Jun 3, 2024

surfing the long tail

where there is little data

compression is intelligence

  • a 1GB picture of a straight line can be condensed to the m,b of y=mx+b
  • better yet, condense to two end points
    • now we have an anomaly detector (anything off the line between them, anything away from our two poles)
    • now we have runtime certification: summarize the training data, complain when runtime data falls outside the space of things seen during training
    • and now we have a compression algorithm (anything new that ain't an anomaly can be ignored)
    • and now we have on-line learning. if anomalies, recluster that region of the data
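
A minimal sketch of the two-poles idea, assuming 2-D points and Euclidean distance. The names `poles`, `distance_to_segment`, `is_anomaly` and the tolerance `tol` are all hypothetical, made up for illustration; they are not from any existing code in this project.

```python
import math

def poles(points):
    """Return the two points that are farthest apart.
    Brute force over all pairs: O(n^2), fine for a sketch."""
    return max(((a, b) for a in points for b in points),
               key=lambda ab: math.dist(ab[0], ab[1]))

def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the segment between a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:                      # degenerate case: both poles coincide
        return math.dist(p, a)
    # Project p onto the line, then clamp so we stay between the poles.
    t = ((px - ax) * dx + (py - ay) * dy) / length2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def is_anomaly(p, a, b, tol=0.1):
    """True if p is off the line, or beyond either pole, by more than tol
    (tol is an assumed threshold, not a recommended value)."""
    return distance_to_segment(p, a, b) > tol

# "Training data": 100 points on y = 2x + 1, summarized by just two poles.
train = [(x, 2 * x + 1) for x in range(100)]
a, b = poles(train)

for p in [(50, 101), (50, 150), (200, 401)]:
    print(p, "anomaly" if is_anomaly(p, a, b) else "seen before, ignore")
```

Here the whole training set is replaced by `a` and `b`. A point on the line between them is ignored (compression), while a point off the line or past a pole is flagged (anomaly detection, runtime certification). Reclustering on anomalies is not shown.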

of course, in practice, we'll need more than 2 points. care to guess how many? often less than 100 (to map out 50 lines)

less is more

  • not the best thing
  • but things statistically indistinguishable from the best
  • e.g.
    • $N(\mu=0, \sigma=1)$ effectively runs from -3 to 3.
    • Cohen's rule says anything closer than $0.35*\sigma$ is different by a small effect or less
    • $0.35/(3 - -3)\approx 6$%. so there are only about 17 statistically distinguishable solutions
    • according to Hamlet the number of random samples needed to be 95% certain of finding something with
      p=0.06 is
      • $n(C=0.95, p=0.06) = \log(1-C)/\log(1-p) \approx 49$
  • And if we had some smart heuristic to sort things (this one being better than that one), we apply $\log_2$ to the above.
    • so, with some smarts, we can explore the world with $\log_2(49)\approx 6$.
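
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the exact sample count shifts by a few depending on how the $0.35/6$ ratio is rounded before it is used as $p$.

```python
import math

def distinguishable_bins(lo=-3.0, hi=3.0, small=0.35):
    """How many 'small effect' sized steps fit across the range lo..hi."""
    return (hi - lo) / small

def samples_needed(confidence, p):
    """Random samples needed to be `confidence` sure of seeing at least one
    event that has probability p per sample: log(1-C) / log(1-p)."""
    return math.log(1 - confidence) / math.log(1 - p)

bins = distinguishable_bins()      # number of distinguishable solutions
p = 1 / bins                       # chance a random sample lands in one bin
n = samples_needed(0.95, p)        # random search budget
smart = math.log2(n)               # budget if a heuristic lets us halve the space each step

print(f"bins={bins:.1f} p={p:.3f} random={n:.1f} smart={smart:.1f}")
```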