gdal_config option doesn't work for threaded readers #186

Open
@vincentsarago

Description

We already tried to solve this using apiroute_factory (see https://developmentseed.org/titiler/concepts/APIRoute_and_environment_variables/):

from fastapi import FastAPI, APIRouter
from rasterio._env import get_gdal_config
from titiler.custom.routing import apiroute_factory
from titiler.endpoints.factory import TilerFactory

app = FastAPI()
route_class = apiroute_factory({"GDAL_DISABLE_READDIR_ON_OPEN": "FALSE"})
router = APIRouter(route_class=route_class)

tiler = TilerFactory(router=router)

While this approach worked for a while, it doesn't seem to work in Python 3.8 and felt like quite a hack.

We then switched and added gdal_config options in the factory (ref #170). I thought this worked ... but I was wrong.

In FastAPI, when you define an endpoint with a plain def myfunction(..., FastAPI uses starlette's run_in_threadpool to run it (so the API isn't blocked). The function therefore runs in a thread that isn't the MainThread, and this is important. In Rasterio, when entering a with rasterio.Env( block, rasterio checks whether it is running in the MainThread (https://github.com/mapbox/rasterio/blob/master/rasterio/_env.pyx#L158-L161) and uses a different function to set the GDAL config depending on the answer.

The problem seems to be that, because we use a plain def, the endpoint function runs in a sub-thread (ThreadPoolExecutor-X_X), and the env then cannot be forwarded to threads started from within that thread... (yes, threads within threads seem bad anyway).

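It's kinda hard for me to explain, so here is an example (it uses top-level await, so it needs to run in an async-aware shell such as IPython or a notebook):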

from concurrent import futures
import rasterio
from rasterio._env import get_gdal_config
import threading

from starlette.concurrency import run_in_threadpool

def f(r=None):
    return get_gdal_config("GDAL_DISABLE_READDIR_ON_OPEN"), str(threading.current_thread())


print()
print("1 - simple - OK")
with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN="FALSE"):
    print(f())

print()
print("2 - async simple - OK")
with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN="FALSE"):
    print(await run_in_threadpool(f))

def g():
    print("Where am I: " + str(threading.current_thread())) # print what thread is used when calling rasterio.Env
    with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN="FALSE"):
        with futures.ThreadPoolExecutor() as executor:
            return list(executor.map(f, range(1)))[0]

print()
print("3 - simple multi threads - OK")
print(g())

print()
print("4 - async multithread - NOK")
print(await run_in_threadpool(g))

Output:

1 - simple - OK
('FALSE', '<_MainThread(MainThread, started 4487112128)>')

2 - async simple - OK
('FALSE', '<Thread(ThreadPoolExecutor-0_13, started daemon 123145455685632)>')

3 - simple multi threads - OK
Where am I: <_MainThread(MainThread, started 4487112128)>
('FALSE', '<Thread(ThreadPoolExecutor-23_0, started daemon 123145492471808)>')

4 - async multithread - NOK
Where am I: <Thread(ThreadPoolExecutor-0_4, started daemon 123145408389120)>
('EMPTY_DIR', '<Thread(ThreadPoolExecutor-24_0, started daemon 123145492471808)>')

☝️ Using run_in_threadpool here simulates the actual setup in titiler (plain def endpoints); FastAPI doesn't use run_in_threadpool for async def endpoints.
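
One way to read these four results is with a toy model. The sketch below is not rasterio's or GDAL's code; it only assumes the behaviour described above: an option set from the MainThread goes into a process-wide store (as CPLSetConfigOption does), while an option set from any other thread goes into a store that only that thread can see. The names set_option, get_option, set_then_read_in_child and run_in_worker are all made up for this illustration.

```python
import threading
from concurrent import futures

# Toy stand-ins for the two places an option can live (assumption, not GDAL code).
_global_options = {}                   # visible from every thread
_thread_options = threading.local()    # visible only from the thread that wrote it


def set_option(key, value):
    """Store globally on the MainThread, thread-locally everywhere else."""
    if threading.current_thread() is threading.main_thread():
        _global_options[key] = value
    else:
        if not hasattr(_thread_options, "values"):
            _thread_options.values = {}
        _thread_options.values[key] = value


def get_option(key, default="<unset>"):
    """Look in the calling thread's own store first, then in the global one."""
    local_values = getattr(_thread_options, "values", {})
    if key in local_values:
        return local_values[key]
    return _global_options.get(key, default)


def set_then_read_in_child(key, value):
    """Set an option in the current thread, then read it from a child thread."""
    set_option(key, value)
    with futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(get_option, key).result()


def run_in_worker(func, *args):
    """Run func in a worker thread, roughly what a plain def endpoint gets."""
    with futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(func, *args).result()


if __name__ == "__main__":
    # Like case 3: set on the MainThread, so the child thread sees the value.
    print(set_then_read_in_child("OPTION_A", "FALSE"))
    # Like case 4: set inside a worker thread, so its child thread sees nothing.
    print(run_in_worker(set_then_read_in_child, "OPTION_B", "FALSE"))
```

In this model the second call loses the option for the same reason case 4 does: the value was written to the worker thread's private store, and the child thread spawned from it has its own, empty one.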

Fix ?

  1. Use async def for function definitions

If you are using a third party library that communicates with something (a database, an API, the file system, etc) and doesn't have support for using await, (this is currently the case for most database libraries), then declare your path operation functions as normally, with just def, like:

The FastAPI docs seem to say we shouldn't (and I think @geospatial-jeff said the same in the past).

  2. See if we can change rasterio (🙅)

I'm not an expert here, but I guess there is a good reason to use CPLSetConfigOption only in the MainThread (cc @sgillies, sorry for the ping, it's only an FYI)

ref: rasterio/rasterio#1012 & rasterio/rasterio#997

  3. Change the rio-tiler reader to forward the GDAL config 🤷‍♂️

Seems the safest choice 😭
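
To make option 3 a bit more concrete, here is a rough sketch of what "forwarding" could look like. It is not rio-tiler code and not a proposed API: forward_env, toy_env, read_option and run_tasks are hypothetical names, and toy_env is a thread-local stand-in so the snippet runs without rasterio. The idea is only that the options get re-applied inside each worker thread instead of being expected to leak in from the caller.

```python
import threading
from concurrent import futures
from contextlib import contextmanager
from functools import wraps


def forward_env(env_factory, func):
    """Wrap func so that it enters a fresh env context in whichever thread runs it."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        with env_factory():
            return func(*args, **kwargs)
    return wrapper


# Toy, thread-local stand-in for an environment (assumption: illustration only).
_local = threading.local()


@contextmanager
def toy_env(**options):
    """Apply options for the current thread only, restoring the old ones on exit."""
    previous = getattr(_local, "options", {})
    _local.options = {**previous, **options}
    try:
        yield
    finally:
        _local.options = previous


def read_option(_=None):
    """Return the option as seen by the calling thread."""
    options = getattr(_local, "options", {})
    return options.get("GDAL_DISABLE_READDIR_ON_OPEN", "<unset>")


def run_tasks(task, n=2):
    """Run task n times in a small thread pool, like a threaded reader would."""
    with futures.ThreadPoolExecutor(max_workers=2) as executor:
        return list(executor.map(task, range(n)))


if __name__ == "__main__":
    def env():
        return toy_env(GDAL_DISABLE_READDIR_ON_OPEN="FALSE")

    with env():
        print(run_tasks(read_option))                    # options not forwarded
        print(run_tasks(forward_env(env, read_option)))  # options re-applied per task
```

With rasterio, env_factory could presumably be something like lambda: rasterio.Env(**gdal_config), so each task would open its own Env in its own thread; whether that is cheap enough per task is something I haven't checked.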
