GitHub - lunarforge/cs at fc66547c2ca48ee871294365e01f216d7b1a291d

codespelunker (cs)

Example search that uses all current functionality
cs t NOT something test~1 "ten thousand a year" "/pr[e-i]de/"

BUGS
search for cs --hidden --no-gitignore --no-ignore 英文 cuts in the middle of a rune
score from TF/IDF appears to be negative in some cases (overflow??)
searches in TUI for very large directories clobber each other making UI unresponsive
highlight on windows command line not escaped correctly

http://localhost:8080/?q=test&ss=300 bug where display is all yellow

go install && cs --hidden --no-gitignore --no-ignore ten thousand a year <<<<< out of range BUG

NB for the above need to modify to not care about the searchparams length IE remove the condition

        // Only if the score is 0 AND we have a single search param do we
		// consider looking at the filename
		if res.Score == 0 && len(f.searchParams) == 1 {
			matchFilename(f, res)
		}

TODO
clean up parser so multiple spaces aren't tokens or flag em to be ignored
add proximity search "this is"~5 which means they need to be within 5 bytes of each other
add limit to number of results
investigate string match limit (might be wrong for unicode insensitive)
JSON endpoint for HTTP
JSON output for cli
Save to disk output

MAYBE

HTML parser option
https://stackoverflow.com/questions/44441665/how-to-extract-only-text-from-html-in-golang

template example (from root)

cs -d --template-display ./asset/templates/display.tmpl --template-search ./asset/templates/search.tmpl

example, vendor/github.com/rivo/

searching for tab key usage with shift modifier, searched for keytab using ag/rg/ack and nothing useful try using cs and its right at the top

BurntSushi/ripgrep#95

Snippet generation

https://stackoverflow.com/questions/2829303/given-a-document-select-a-relevant-snippet https://stackoverflow.com/questions/282002/c-sharp-finding-relevant-document-snippets-for-search-result-display

https://levelup.gitconnected.com/create-your-own-expression-parser-d1f622077796

https://blog.golang.org/normalization https://news.ycombinator.com/item?id=6806062 https://groups.google.com/forum/#!topic/golang-nuts/Il2DX4xpW3w https://www.reddit.com/r/javascript/comments/9i455b/why_is_%C3%9Ftouppercase_equal_to_ss/ $ rg -i --debug ß DEBUG|grep_regex::literal|/home/bboyter/.cargo/registry/src/github.com-1ecc6299db9ec823/grep-regex-0.1.5/src/literal.rs:59: literal prefixes detected: Literals { lits: [Complete(ß), Complete(ẞ)], limit_size: 250, limit_class: 10 }

head -c200000000 /dev/urandom > 200mb.txt

https://about.sourcegraph.com/blog/going-beyond-regular-expressions-with-structural-code-search/

https://github.com/sourcegraph/src-cli

https://arxiv.org/pdf/1904.03061.pdf

https://www.researchgate.net/publication/4004411_Topic_extraction_from_news_archive_using_TFPDF_algorithm

A number of term-weighting schemes have derived from tf–idf. One of them is TF–PDF (Term Frequency * Proportional Document Frequency).[14] TF–PDF was introduced in 2001 in the context of identifying emerging topics in the media. The PDF component measures the difference of how often a term occurs in different domains. Another derivate is TF–IDuF. In TF–IDuF,[15] idf is not calculated based on the document corpus that is to be searched or recommended. Instead, idf is calculated on users' personal document collections. The authors report that TF–IDuF was equally effective as tf–idf but could also be applied in situations when, e.g., a user modeling system has no access to a global document corpus.

Mostly about ranking/highlighting snippet extraction links

NB problem with most snippet stuff is that it is designed to work on whole words or full word matches not partial matches such as the ones cs supports However this is designed to work like that https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/

https://www.hathitrust.org/blogs/large-scale-search/practical-relevance-ranking-11-million-books-part-3-document-length-normali https://www.quora.com/How-does-BM25-work https://github.com/apache/lucene-solr/blob/master/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java https://lucene.apache.org/core/7_0_0/highlighter/org/apache/lucene/search/vectorhighlight/package-summary.html https://www.compose.com/articles/how-scoring-works-in-elasticsearch/ https://blog.softwaremill.com/6-not-so-obvious-things-about-elasticsearch-422491494aa4 https://github.com/elastic/elasticsearch/blob/master/docs/reference/search/request/highlighting.asciidoc#unified-highlighter https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-highlighting.html#unified-highlighter http://www.public.asu.edu/~candan/papers/wi07.pdf https://faculty.ist.psu.edu/jessieli/Publications/WWW10-ZLi-KeywordExtract.pdf https://www.researchgate.net/publication/221299008_Fast_generation_of_result_snippets_in_web_search https://arxiv.org/pdf/1904.03061.pdf https://web.archive.org/web/20141230232527/http://rcrezende.blogspot.com/2010/08/smallest-relevant-text-snippet-for.html https://stackoverflow.com/questions/282002/c-sharp-finding-relevant-document-snippets-for-search-result-display https://stackoverflow.com/questions/2829303/given-a-document-select-a-relevant-snippet

$ cs "ten thousand a year" && cs "Ten thousand a year" prideandprejudice.txt (-1.386) … features, noble mien, and the report which was in general circulation within five minutes after his entrance, of his having ten thousand a year. The gentlemen pronounced him to be a fine figure of a man, the ladies declared he was much handsomer than Mr. Bingley, and h…

prideandprejudice.txt (-1.386) …before. I hope he will overlook it. Dear, dear Lizzy. A house in town! Every thing that is charming! Three daughters married! Ten thousand a year! Oh, Lord! What will become of me. I shall go distracted.”

  This was enough to prove that her approbation need not be

…

Name		Name	Last commit message	Last commit date
Latest commit History 364 Commits
asset		asset
file		file
processor		processor
string		string
vendor		vendor
.gitignore		.gitignore
.ignore		.ignore
.travis.yml		.travis.yml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Gopkg.lock		Gopkg.lock
Gopkg.toml		Gopkg.toml
LICENSE		LICENSE
README.md		README.md
UNLICENSE		UNLICENSE
check.sh		check.sh
large-search.gif		large-search.gif
main.go		main.go
sc.gif		sc.gif

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

codespelunker (cs)

About

Releases

Packages

Languages

License

lunarforge/cs

Folders and files

Latest commit

History

Repository files navigation

codespelunker (cs)

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages