Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for scrubbing all chunks on a chunkserver for CRC errors #624

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

guestisp
Copy link

By sending a SIGUSR2 to a chunkserver process, a scrub loop is started (or aborted).
Scrub will check all chunks for any inconsistencies (crc errors) and will mark any failed chunks as damaged, forcing LizardFS to fix them

@psarna
Copy link
Member

psarna commented Nov 12, 2017

The first few red lights:

  1. This code runs in a thread, but it's not thread-safe at all. We have locks for our directory structure (folderlock), for our chunk hash table (hashlock) and others. If you don't use them, you'll end up with several threads operating on the same data structure without synchronization. That leads to memory corruption and no consistency.

1a. Also, remember that when you do add locks, they should be acquired for as little time as possible, because taking a lock for a long time leads to huge performance drop (other threads would do nothing until you release it).

  1. Instead of
+    // SIGUSR2 could be signal number 31, 12 or 17 based on platform
+    if ( signal == 31 || signal == 12 || signal == 17 ) {

it's just

+    if (signal == SIGUSR2) {

if somebody uses a chunkserver on one system and compiles it on a system with totally different signal numbers, then it's not something we should care about. This one would unnecessarily reserve 3 signal numbers for one purpose.

  1. There's a (very big) memory leak. You allocate "cssorttab" structure, which has the size of all chunks in the directory, via malloc, but it's never freed with free.

  2. cssorttab allocation is also redundant here - you could just take chunks straight from the test list.

  3. Finally, this code is very similar to test thread, so another solution would be just to rewrite current test thread code, so it could check more than 1 chunk per second. Then, starting a "scrub" would just be turning up test limits to something like 1000 chunks per second.

To sum up, I think it's better if "scrubbing" is implemented by amending existing test thread, not adding another one.

@guestisp
Copy link
Author

Thank you for your advice, as wrote in other thread, i'm not a C programmer.
First version of scrub was made by using the test thread, i've dismissed that version due to some issue (that honestly, i don't remember anymore)

I'll try again to reuse the tester thread, that would be a better solution for sure.

@guestisp
Copy link
Author

guestisp commented Nov 12, 2017

One of the issue I had, was how to detect the end of the loop, in other words: how can I know that all chunks was scrubbed, inside the tester thread ?

I'm refactoring the whole procedure, but insted of increasing the chunks per second, i'm using a variable to see if scrub is running. If is running, no "rate limit" is applied and all chunks are checked in one after the other

@psarna
Copy link
Member

psarna commented Nov 13, 2017

It's hard to define "all chunks", because number of chunks constantly changes on a working installation.

I already have an old commit that makes test_thread more frequent, I'll dig it out and post it here. From there, it's not hard to create a script that changes chunkserver config, reloads, and thus turns on "scrubbing".

After that, there are several ways to check if "all" chunks were checked. One of them is to add a counter that resets on reload and can be queried to check how many chunks were already checked. Then, since each chunk is checked once, you can guess when the scrub covered all chunks already.

@psarna
Copy link
Member

psarna commented Nov 13, 2017

@4Dolio
Copy link

4Dolio commented Nov 13, 2017

I still think the easiest scrub is to null read all data from a mount at each chunk server... but thats just me leveraging what's already available. Shrugs.

@guestisp
Copy link
Author

It's hard to define "all chunks", because number of chunks constantly changes on a working installation.

That's why, my first version, used a static array populated on scrub start.

I already have an old commit that makes test_thread more frequent, I'll dig it out and post it here. From there, it's not hard to create a script that changes chunkserver config, reloads, and thus turns on "scrubbing".

This is cool, and should be merged, but is not a proper solution for a scrubbing.
Changing configuration on the fly is error prone, you can script that, but i think it's obvious that having to change a configuration file on a production system is much more risky than triggering a SIGUSR2 (or anything else). Additionally, you won't be able to automatically stop the scrubbing when all chunks where scrubbed. You'll end with a permanent scrub with very low interval, that will be a performance hit.

Our current test system has about 9.000.000 chunks.
Even setting 1 chunks per second, i'll need about 104 days to scrub them all and with 1 chunks per second running forever i'll have a huge performance hit.
Something different is absolutely needed and must report a progress somewhere (like my implementation did, with progress and ETA), because if you have a scrub running for many days, you must be notified about it's progress, like with a RAID controller:

11/13/17  7:13:59: ld sync: all LDs sync'd:53:  66=Consistency Check started on VD 00/0
11/13/17  7:13:59: ld sync: all LDs sync'd:59:  65=Consistency Check progress on VD 00/0 is 0.00%(6s)
11/13/17  7:14:16: EVT#22150-11/13/17  7:14:16:  65=Consistency Check progress on VD 00/0 is 0.24%(23s)
11/13/17  7:14:50: EVT#22152-11/13/17  7:14:50:  65=Consistency Check progress on VD 00/0 is 0.78%(57s)
11/13/17  7:15:24: EVT#22154-11/13/17  7:15:24:  65=Consistency Check progress on VD 00/0 is 1.24%(91s)
11/13/17  7:27:14: EVT#22187-11/13/17  7:27:14:  65=Consistency Check progress on VD 00/0 is 10.20%(801s)
11/13/17  7:28:28: EVT#22189-11/13/17  7:28:28:  65=Consistency Check progress on VD 00/0 is 10.69%(875s)
11/13/17  7:29:05: EVT#22191-11/13/17  7:29:05:  65=Consistency Check progress on VD 00/0 is 11.23%(912s)

After that, there are several ways to check if "all" chunks were checked. One of them is to add a counter that resets on reload and can be queried to check how many chunks were already checked. Then, since each chunk is checked once, you can guess when the scrub covered all chunks already.

That what I did, here: https://github.com/guestisp/lizardfs/blob/a7b3f02c7b7564da8b6c481bc805903f7425e391/src/chunkserver/hddspacemgr.cc#L2975

but if chunks always change (as it should be), the current tester implementation will never finish (that's good for the tester thread). The scrub should finish. What I'm trying to do is to fetch a list of existing chunks at a given time (the time of scrubbing start), loop through them and then stop at the end.

I can read the first chunk path, then go on until the same chunk is seen again. When we see the first scrubbed chunk for the second time, all chunks where scrubbed

I still think the easiest scrub is to null read all data from a mount at each chunk server... but thats just me leveraging what's already available. Shrugs.

Yes, but it's the same for ZFS. So the native ZFS scrub is nonsense, as you can easily read all data from disk (ZFS will trigger a scrub on every read).
Data security/reliability must not be delegated to external system.

@guestisp
Copy link
Author

Ok, still trying to implement this feature by resuing hdd_tester_thread here: https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2847

If I understood properly, this is a huge infinite loop
https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2859
that will cycle all chunks here https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2897 and wait up to the defined interval: https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2906

Is this true ?

Now, what I'm unable to understand is what the following code does:
https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2876-L2893

variables are too "short" and not self-explaining

Another question: is that thread self-updating when chunks are added or removed ? How can I "fix" that ? I don't want to create an infinite loop for the scrub. Scrubbing should start, scrub what already exist at the time of start, and then finish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants