LazyIO - network and local file reader/writer compatible with go-streams. #10

Open
BigB84 opened this issue May 16, 2023 · 2 comments

@BigB84

BigB84 commented May 16, 2023

Hi!

With Java 8+ it's possible to read a file using streams instead of a loop.
This is great, because the file isn't read into memory all at once.

Now I'm facing the same problem in Go.
I need to read hundreds of text files with millions of records and process them.

So I need a reader and a writer that are compatible with this library.
I tried to write them myself, but it's hard because they have to implement the generic interface IIterator.

Java's strength here is that streams became part of the standard library.

Do you think it's possible to add a reader based on io.Reader or bufio.Scanner to this library?

@jucardi
Owner

jucardi commented May 23, 2023

Hi @BigB84, we can definitely think about a solution for this case. How are you reading the files, and how do you intend to iterate over them?

  • Is it a bunch of files, where you want to iterate over the files and each element is the contents of one file?
  • Or are you thinking about loading a single file and iterating over its contents line by line or byte by byte?

@BigB84
Author

BigB84 commented May 28, 2023

Thanks for the reply!

I think it's the second case you mentioned.

Actually, it's not a secret, so I can share the actual problem.

I maintain a DNS server with a domain blocklist. The blocklist is built from hostlists.
Hostlists are just text files with domains written line by line, but there are plenty of formats. They may be read locally from disk or fetched over the network via HTTPS.

Consider just a few formats of hostlists written that way:

127.0.0.1 example.com
127.0.0.1 subdomain.of.example.com
127.0.0.1 another.subdomain.of.example.com
127.0.0.1 something.1.example.com
...
0.0.0.0 example.com
0.0.0.0 subdomain.of.example.com
0.0.0.0 another.subdomain.of.example.com
0.0.0.0 something.2.example.com
...
example.com
subdomain.of.example.com
another.subdomain.of.example.com
something.3.example.com
...

I need to process each of them so that the lines are cleaned of unwanted expressions (0.0.0.0, 127.0.0.1, etc.).
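As a rough illustration of that cleaning step, here is a sketch that covers only the three shapes shown above. The function name cleanHostLine and its exact rules are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanHostLine extracts the domain from one hostlist line.
// It handles three shapes: "127.0.0.1 domain", "0.0.0.0 domain",
// and a bare "domain". The second result is false when the line
// holds no domain (for example, a blank line).
// The name and the exact rules are assumptions for illustration.
func cleanHostLine(line string) (string, bool) {
	fields := strings.Fields(line) // splits on any run of whitespace
	if len(fields) == 0 {
		return "", false
	}
	// Drop a leading sink address if present.
	if fields[0] == "127.0.0.1" || fields[0] == "0.0.0.0" {
		fields = fields[1:]
	}
	if len(fields) == 0 {
		return "", false
	}
	return fields[0], true
}

func main() {
	lines := []string{
		"127.0.0.1 example.com",
		"0.0.0.0 subdomain.of.example.com",
		"another.subdomain.of.example.com",
		"",
	}
	for _, line := range lines {
		if domain, ok := cleanHostLine(line); ok {
			fmt.Println(domain)
		}
	}
}
```

In a lazy pipeline this would be the mapping step, with the boolean result acting as the filter.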

The full set of rules is more complicated, which is why I use lazy streams for efficient mapping and filtering.

In this case I need a reader that yields strings.
The problem is: how do we implement such a reader generically?

Should we write two readers, for instance: a generic LazyReader of T that implements IIterator, and a LazyLinewiseReader (with T fixed to string) for this specific case?

In the future someone may need to read integers byte by byte, and they could then implement such a reader as LazyIntegerByteReader.

What do you think?
