Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread safety when resolving at the same time on different threads. #574

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Sali0m
Copy link

@Sali0m Sali0m commented Nov 5, 2024

Hello, as mentioned in this issue: #547
We encountered on our project some crashes when we try to resolve some dependencies on different threads.

Here is what the stack trace of the different threads looks like (second one is the one crashing in this case).
Capture d’écran 2024-11-05 à 14 14 48
image

Using the previously used ThreadSafeDictionary seem to fix the issue.

@Sali0m Sali0m changed the title Fix ThreadSafety when registering and resolving at the same time on different threads. Fix thread safety when registering and resolving at the same time on different threads. Nov 5, 2024
@Sali0m Sali0m changed the title Fix thread safety when registering and resolving at the same time on different threads. Fix thread safety when resolving at the same time on different threads. Nov 5, 2024
@marcellyluise
Copy link

marcellyluise commented Jan 13, 2025

Hi guys! @Sali0m, would you believe that is missing just the lint and spm-check to be able to merge this Pull Request? I'm asking it because my application is needing this fix too. I'd be happy to help merging this PR.

@Sali0m
Copy link
Author

Sali0m commented Jan 14, 2025

Hi guys! @Sali0m, would you believe that is missing just the lint and spm-check to be able to merge this Pull Request? I'm asking it because my application is needing this fix too. I'd be happy to help merging this PR.

I have no idea why the checks haven't run tbh, if you have any idea how I can make them run, please do tell :)

@Sali0m
Copy link
Author

Sali0m commented Jan 14, 2025

Hi guys! @Sali0m, would you believe that is missing just the lint and spm-check to be able to merge this Pull Request? I'm asking it because my application is needing this fix too. I'd be happy to help merging this PR.

I need an approval from a maintainer for the actions to run

@marcellyluise
Copy link

marcellyluise commented Jan 14, 2025

Hi guys! @Sali0m, would you believe that is missing just the lint and spm-check to be able to merge this Pull Request? I'm asking it because my application is needing this fix too. I'd be happy to help merging this PR.

I need an approval from a maintainer for the actions to run

Oh I see. Let's try adding them to this thread: @yoichitgy, @jakubvano, @1ucas and @maxim-chipeev you guys are the top 4 commiters here. Any chance any of you are also a maintainer to take a look in this PR? We need a maintainer approval so the actions run to validate this pull request.

@maxim-chipeev
Copy link
Contributor

maxim-chipeev commented Jan 14, 2025

Hi, thank you for contributing and for pinging me! Could you share a bit more on the nature of the problem? ThreadSafeDictionary is not performant enough for the needs of very-large codebases.

Is there a chance you can reproduce the issue with tests?

@Sali0m
Copy link
Author

Sali0m commented Jan 15, 2025

Hi, thank you for contributing and for pinging me! Could you share a bit more on the nature of the problem? ThreadSafeDictionary is not performant enough for the needs of very-large codebases.

Is there a chance you can reproduce the issue with tests?

I'll try to find some time to make a reproduction project so you can better understand the issue, but since it's been a few months, it may take a while for me to remember how to reproduce it.

What I can remember is that we were resolving a few things on different thread while there may have been a register too, and one of the dictionary then crashed.

@nathan-batista
Copy link

nathan-batista commented Jan 15, 2025

Hello @maxim-chipeev.
I'm facing the same issue and here is a code snippet of a test that is able to replicate the issue.
I don't know if their use case is the same as mine but the cause would be the same, an access to the services dictionary from different threads.

func test_SwinjectCrash() {
        let numberOfQueues = 25
        var expectations = [XCTestExpectation]()
        let resolutionDate = Date() + 1
        
        struct TestModel {
            let name = "Testing"
        }
        
        let container = Container()
        
        // The issue of read-write access happens due to for example
        // when using the synchronized container in one thread and the original container on another
        // The synchronized container will call syncIfEnabled, blocking anyone that tries to get the mutex
        // But since the method syncIfEnabled only verifies if the mutex is available if the container is marked as
        // synchronized, if the other thread calls the original container then the synchronized flag is false and then
        // the syncIfEnabled ignores the mutex proceeding with the execution.
        
        // This happens due to the synchronyzed container calling parent?.getEntry, accessing the dictionary of the  original container if the service isn't found in the new synchronized container(which will happen almost 100% of the time since we usually add the services to the original container)
        
        // This could also happen when resolving if you use a synchronized container in one thread and the original container in another
        
        // In our case we have some registrations that are done at a later moment, only when needed. Since some functionalities in the app aren't used very often, registering them all at the beginning would only be a burden to the app startup.
        
        // We also don't use a synchronyzed container by default when registering since this isn't recommended
        // and this would probably degrade our performance since it would wait the mutex to execute the whole
        // register/resolve function instead of waiting the mutex only to access the dictionary as in the ThreadSafeDictionary implementation
        
        for _ in 0..<numberOfQueues {
            DispatchQueue.global().async {
                // Only an example to force the crash.
                // This doesn't happen with ThreadSafeDictionary since the implementation would always verify
                // the mutex before accessing the dictionary
                Thread.sleep(until: resolutionDate)
                container.register(TestModel.self) { _ in
                    TestModel()
                }
            }
        }
        
        for _ in 0..<numberOfQueues {
            let expectation = expectation(description: "Should have been resolved")
            expectations.append(expectation)
            DispatchQueue.global().async { [expectation] in
                Thread.sleep(until: resolutionDate)
                _ = container.synchronize().resolve(TestModel.self)
                expectation.fulfill()
            }
        }
        
        wait(for: expectations, timeout: 10)
    }

@maxim-chipeev
Copy link
Contributor

Hello @maxim-chipeev. I'm facing the same issue and here is a code snippet of a test that is able to replicate the issue. I don't know if their use case is the same as mine but the cause would be the same, an access to the services dictionary from different threads.

` func test_SwinjectCrash() { let numberOfQueues = 25 var expectations = XCTestExpectation let resolutionDate = Date() + 1

    struct TestModel {
        let name = "Testing"
    }
    
    let container = Container()
    
    // The issue of read-write access happens due to for example
    // when using the synchronized container in one thread and the original container on another
    // The synchronized container will call syncIfEnabled, blocking anyone that tries to get the mutex
    // But since the method syncIfEnabled only verifies if the mutex is available if the container is marked as
    // synchronized, if the other thread calls the original container then the synchronized flag is false and then
    // the syncIfEnabled ignores the mutex proceeding with the execution.
    
    // This happens due to the synchronyzed container calling parent?.getEntry, accessing the dictionary of the  original container if the service isn't found in the new synchronized container(which will happen almost 100% of the time since we usually add the services to the original container)
    
    // This could also happen when resolving if you use a synchronized container in one thread and the original container in another
    
    // In our case we have some registrations that are done at a later moment, only when needed. Since some functionalities in the app aren't used very often, registering them all at the beginning would only be a burden to the app startup.
    
    // We also don't use a synchronyzed container by default when registering since this isn't recommended
    // and this would probably degrade our performance since it would wait the mutex to execute the whole
    // register/resolve function instead of waiting the mutex only to access the dictionary as in the ThreadSafeDictionary implementation
    
    for _ in 0..<numberOfQueues {
        DispatchQueue.global().async {
            // Only an example to force the crash.
            // This doesn't happen with ThreadSafeDictionary since the implementation would always verify
            // the mutex before accessing the dictionary
            Thread.sleep(until: resolutionDate)
            container.register(TestModel.self) { _ in
                TestModel()
            }
        }
    }
    
    for _ in 0..<numberOfQueues {
        let expectation = expectation(description: "Should have been resolved")
        expectations.append(expectation)
        DispatchQueue.global().async { [expectation] in
            Thread.sleep(until: resolutionDate)
            _ = container.synchronize().resolve(TestModel.self)
            expectation.fulfill()
        }
    }
    
    wait(for: expectations, timeout: 10)
}

`

Hi Nathan, thank you for sharing this snippet—from what I can see that does look unsafe. The Container returned by .synchronize() should be the only one referred to by business logic.

What I can do alternatively to this PR, is remove the "synchronization" logic that seems to be confusing—and make Containers inherently threadsafe to access.

Can you try if it crashes with the following being the only synchronize call?

let container = Container().synchronize()

@nathan-batista
Copy link

Hello @maxim-chipeev.
I already tried the following approach:

let container = Container()
let synchronizedContainer = container.synchronize()

for _ in 0..<numberOfQueues {
    DispatchQueue.global().async {
        Thread.sleep(until: resolutionDate)
        container.register(TestModel.self) { _ in
            TestModel()
        }
    }
}

for _ in 0..<numberOfQueues {
    let expectation = expectation(description: "Should have been resolved")
    expectations.append(expectation)
    DispatchQueue.global().async {
        Thread.sleep(until: resolutionDate)
        _ = synchronizedContainer.resolve(TestModel.self)
        expectation.fulfill()
    }
}

But the solution above crashes the app.

The only way to completely solve this issue by using the synchronize() function would be to do as below:

let container = Container().synchronize() as! Container
for _ in 0..<numberOfQueues {
    DispatchQueue.global().async {
        Thread.sleep(until: resolutionDate)
        container.register(TestModel.self) { _ in
            TestModel()
        }
    }
}
for _ in 0..<numberOfQueues {
    let expectation = expectation(description: "Should have been resolved")
    expectations.append(expectation)
    DispatchQueue.global().async {
        Thread.sleep(until: resolutionDate)
        _ = container.resolve(TestModel.self)
        expectation.fulfill()
    }
}

But as i commented in the code snippet i sent previously i believe the performance of this approach would be even worse than using the ThreadSafeDictionary since the whole register and resolve block would be executed by the syncIfEnabled function, then each register/resolve call would have to wait the mutex to be available instead of waiting only the access to the dictionary to be available.

@maxim-chipeev
Copy link
Contributor

maxim-chipeev commented Jan 15, 2025

Why not the following?

let container = Container().synchronize()

for _ in 0..<numberOfQueues {
    DispatchQueue.global().async {
        Thread.sleep(until: resolutionDate)
        container.register(TestModel.self) { _ in
            TestModel()
        }
    }
}

What I'm seeing is a concurrent registration on a non-threadsafe Container. That should be expected to crash.

@maxim-chipeev
Copy link
Contributor

But as i commented in the code snippet i sent previously i believe the performance of this approach would be even worse than using the ThreadSafeDictionary since the whole register and resolve block would be executed by the syncIfEnabled function, then each register/resolve call would have to wait the mutex to be available instead of waiting only the access to the dictionary to be available.

I see, what I'm looking to know then is whether registration is something you commonly see in your codebase long after app-launch? In our cases all of our registrations happen at the beginning of the app lifecycle.

@nathan-batista
Copy link

nathan-batista commented Jan 15, 2025

But as i commented in the code snippet i sent previously i believe the performance of this approach would be even worse than using the ThreadSafeDictionary since the whole register and resolve block would be executed by the syncIfEnabled function, then each register/resolve call would have to wait the mutex to be available instead of waiting only the access to the dictionary to be available.

I see, what I'm looking to know then is whether registration is something you commonly see in your codebase long after app-launch? In our cases all of our registrations happen at the beginning of the app lifecycle.

In my use case since i have different modules we only register the essential and the most used features in the beginning of the app life cycle.
Services from modules that aren't used very often are only registered when needed.
Let's say i have the ModuleA, ModuleB and ModuleC in which ModuleA and ModuleB are essential for my app but ModuleC isn't used very often, in this case i register ModuleA and ModuleB at the beginning of the app life cycle but i only register the ModuleC dependencies when the user starts a journey where this module is needed since there is a possibility those dependencies won't ever be needed if the user doesn't start this journey

This could also be the case for apps which have features enabled by feature flags or paid features by user.

And using the synchronized container only for the "lazy registration" where the registrations are done at a later moment isn't something recommended at least according to the function documentation since this could cause different different graph scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants