BUG: Crash (access violation) running GeoDataFrame.dissolve only on Windows Server 2022 + Python 3.8/3.9 #2766

Closed
@l0b0

Description

Disclaimer: I don't have a Windows machine at work to test on (only GitHub runners) and I don't know enough about GeoPandas, GDAL, Jupyter, Windows memory models, etc. to reduce this to a minimal example in a reasonable amount of time. If that means this issue will be closed, then that's OK, and sorry if this is considered wasting your time - I'll simply not support Windows for the time being. Others at Toitū Te Whenua LINZ have expressed an interest in getting GDAL etc. running "properly" on Windows without installing things like QGIS, so the motivation goes beyond just "I want to support as many platforms as possible".

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of geopandas.
  • (optional) I have confirmed this bug exists on the main branch of geopandas.

Code Sample, a copy-pastable example

This branch demonstrates the issue. The code has been extracted with jupyter nbconvert --to=script sentinel2_water_extraction.ipynb. Also relevant is the GitHub workflow running all this.

Problem description

This seems to be the relevant part of the full build log:

Traceback (most recent call last):
  File "C:\Miniconda\envs\__setup_conda\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Miniconda\envs\__setup_conda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Miniconda\envs\__setup_conda\Scripts\jupyter-nbconvert.EXE\__main__.py", line 7, in <module>
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\jupyter_core\application.py", line 276, in launch_instance
    return super().launch_instance(argv=argv, **kwargs)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\traitlets\config\application.py", line 992, in launch_instance
    app.start()
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 426, in start
    self.convert_notebooks()
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 600, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 563, in convert_single_notebook
    output, resources = self.export_single_notebook(
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 491, in export_single_notebook
    output, resources = self.exporter.from_filename(
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 190, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 207, in from_file
    return self.from_notebook_node(
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\notebook.py", line 35, in from_notebook_node
    nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 147, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 342, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\preprocessors\base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\preprocessors\execute.py", line 91, in preprocess
    self.preprocess_cell(cell, resources, index)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\preprocessors\execute.py", line 112, in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\jupyter_core\utils\__init__.py", line 160, in wrapped
    return loop.run_until_complete(inner)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nest_asyncio.py", line 90, in run_until_complete
    return f.result()
  File "C:\Miniconda\envs\__setup_conda\lib\asyncio\futures.py", line 201, in result
    raise self._exception
  File "C:\Miniconda\envs\__setup_conda\lib\asyncio\tasks.py", line 256, in __step
    result = coro.send(None)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbclient\client.py", line 1021, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbclient\client.py", line 915, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
SIEVE_THRESHOLD = 10

for image_with_water in IMAGES:
    # Mask out all values below the mndwi threshold (these are not water)
    water_mask = np.ma.masked_less(  # type: ignore[no-untyped-call]
        image_with_water["mndwi"], MNDWI_THRESHOLD
    )

    # Extract the polygons from the mask
    water_polygons = rasterio.features.shapes(
        source=water_mask.astype("uint8"),
        transform=image_with_water["transform_window"],
    )

    # Add all polygons to the list
    water_polygons = list(water_polygons)

    # Extract the polygon coordinates and values from the list
    polygons = [polygon for polygon, value in water_polygons]
    values = [str(int(value)) for polygon, value in water_polygons]

    # Convert polygons into a shapely.shape
    polygons = [shape(polygon) for polygon in polygons]

    # Create a geopandas dataframe populated with the polygon shapes
    water_geodataframe = gpd.GeoDataFrame(
        {"is_water": values, "geometry": polygons}, crs=image_with_water["src_crs"]
    )

    # Dissolve all records into two records. is water / is not water
    is_water = water_geodataframe.dissolve(by="is_water")

    # Select only the water records
    water = is_water.query("is_water=='0'")

    # Export the polygons
    water.to_file(
        os.path.join(
            OUTPUT_DIRECTORY, f"water_mndwi_{image_with_water['image_id']}.shp"
        )
    )

    # Export the accompanying Sentinel-2 image
    export_raster(image_with_water)

    # Export working steps if desired
    image_with_water["water"] = water
    image_with_water["water_mask"] = water_mask
    if len(water.area) == 1:
        image_with_water["area"] = water.area[0] / 1000000  # in km2
    else:
        image_with_water["area"] = 0
------------------

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[12], line 31
     26 water_geodataframe = gpd.GeoDataFrame(
     27     {"is_water": values, "geometry": polygons}, crs=image_with_water["src_crs"]
     28 )
     30 # Dissolve all records into two records. is water / is not water
---> 31 is_water = water_geodataframe.dissolve(by="is_water")
     33 # Select only the water records
     34 water = is_water.query("is_water=='0'")

File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\geodataframe.py:1684, in GeoDataFrame.dissolve(self, by, aggfunc, as_index, level, sort, observed, dropna)
   1681     merged_geom = block.unary_union
   1682     return merged_geom
-> 1684 g = self.groupby(group_keys=False, **groupby_kwargs)[self.geometry.name].agg(
   1685     merge_geometries
   1686 )
   1688 # Aggregate
   1689 aggregated_geometry = GeoDataFrame(g, geometry=self.geometry.name, crs=self.crs)

File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\generic.py:265, in SeriesGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    262     return self._python_agg_general(func, *args, **kwargs)
    264 try:
--> 265     return self._python_agg_general(func, *args, **kwargs)
    266 except KeyError:
    267     # TODO: KeyError is raised in _python_agg_general,
    268     #  see test_groupby.test_basic
    269     result = self._aggregate_named(func, *args, **kwargs)

File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\groupby.py:1332, in GroupBy._python_agg_general(self, func, *args, **kwargs)
   1328 name = obj.name
   1330 try:
   1331     # if this function is invalid for this dtype, we will ignore it.
-> 1332     result = self.grouper.agg_series(obj, f)
   1333 except TypeError:
   1334     warnings.warn(
   1335         f"Dropping invalid columns in {type(self).__name__}.agg "
   1336         "is deprecated. In a future version, a TypeError will be raised. "
   (...)
   1340         stacklevel=3,
   1341     )

File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\ops.py:1047, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
   1040     result = self._aggregate_series_pure_python(obj, func)
   1042 elif not isinstance(obj._values, np.ndarray):
   1043     # _aggregate_series_fast would raise TypeError when
   1044     #  calling libreduction.Slider
   1045     # In the datetime64tz case it would incorrectly cast to tz-naive
   1046     # TODO: can we get a performant workaround for EAs backed by ndarray?
-> 1047     result = self._aggregate_series_pure_python(obj, func)
   1049     # we can preserve a little bit more aggressively with EA dtype
   1050     #  because maybe_cast_pointwise_result will do a try/except
   1051     #  with _from_sequence.  NB we are assuming here that _from_sequence
   1052     #  is sufficiently strict that it casts appropriately.
   1053     preserve_dtype = True

File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\ops.py:1104, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
   1098 splitter = get_splitter(obj, ids, ngroups, axis=0)
   1100 for i, group in enumerate(splitter):
   1101
   1102     # Each step of this loop corresponds to
   1103     #  libreduction._BaseGrouper._apply_to_group
-> 1104     res = func(group)
   1105     res = libreduction.extract_result(res)
   1107     if not initialized:
   1108         # We only do this validation on the first iteration

File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\groupby.py:1318, in GroupBy._python_agg_general.<locals>.<lambda>(x)
   1315 @final
   1316 def _python_agg_general(self, func, *args, **kwargs):
   1317     func = com.is_builtin_func(func)
-> 1318     f = lambda x: func(x, *args, **kwargs)
   1320     # iterate through "columns" ex exclusions to populate output dict
   1321     output: dict[base.OutputKey, ArrayLike] = {}

File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\geodataframe.py:1681, in GeoDataFrame.dissolve.<locals>.merge_geometries(block)
   1680 def merge_geometries(block):
-> 1681     merged_geom = block.unary_union
   1682     return merged_geom

File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\base.py:800, in GeoPandasBase.unary_union(self)
    781 @property
    782 def unary_union(self):
    783     """Returns a geometry containing the union of all geometries in the
    784     ``GeoSeries``.
    785
   (...)
    798     POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))
    799     """
--> 800     return self.geometry.values.unary_union()

File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\array.py:650, in GeometryArray.unary_union(self)
    649 def unary_union(self):
--> 650     return vectorized.unary_union(self.data)

File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\_vectorized.py:1034, in unary_union(data)
   1032 data = [g for g in data if g is not None]
   1033 if data:
-> 1034     return shapely.ops.unary_union(data)
   1035 else:
   1036     return None

File C:\Miniconda\envs\__setup_conda\lib\site-packages\shapely\ops.py:161, in CollectionOperator.unary_union(self, geoms)
    159     subs[i] = g._geom
    160 collection = lgeos.GEOSGeom_createCollection(6, subs, L)
--> 161 return geom_factory(lgeos.methods['unary_union'](collection))

OSError: exception: access violation writing 0x0000000000001101

Importantly, the exact same code, using the same Python and package versions (including Python 3.10), runs fine on Linux and macOS.
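Since the last frame in the traceback is shapely.ops.unary_union, one way to narrow this down might be to call that function directly, without GeoPandas, pandas, rasterio, or the notebook. The following is only a sketch of such a reduced test, not a confirmed reproducer: the helper names and the synthetic rectangles are made up for illustration, and the real input (polygons extracted from the Sentinel-2 mask) may be needed to trigger the crash.

```python
# Hypothetical reduced test: exercises only shapely.ops.unary_union,
# the last frame in the traceback, without GeoPandas, pandas, or rasterio.
from shapely.geometry import box
from shapely.ops import unary_union


def overlapping_rectangles(count):
    """Build `count` 1.5 x 1 rectangles, each overlapping its right-hand neighbour."""
    return [box(i, 0, i + 1.5, 1) for i in range(count)]


def union_area(count=100):
    """Union the rectangles and return the area of the merged geometry."""
    merged = unary_union(overlapping_rectangles(count))
    return merged.area


if __name__ == "__main__":
    # The rectangles cover x in [0, count + 0.5] and y in [0, 1],
    # so the area should be close to count + 0.5 on a working install.
    print(union_area())
```

If this small script also fails with an access violation on the Windows runner, that would point at the Shapely/GEOS installation rather than at GeoDataFrame.dissolve itself. If it passes, the input geometries are the next thing to look at.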

Expected Output

Output of geopandas.show_versions()

SYSTEM INFO

python : 3.9.13 (main, Oct 13 2022, 21:23:06) [MSC v.1916 64 bit (AMD64)]
executable : C:\Miniconda\envs\__setup_conda\python.exe
machine : Windows-10-10.0.20348-SP0

GEOS, GDAL, PROJ INFO

GEOS : None
GEOS lib : None
GDAL : 3.5.2
GDAL data dir: C:\Miniconda\envs\__setup_conda\lib\site-packages\fiona\gdal_data
PROJ : 9.1.0
PROJ data dir: C:\Miniconda\envs\__setup_conda\lib\site-packages\pyproj\proj_dir\share\proj

PYTHON DEPENDENCIES

geopandas : 0.12.2
numpy : 1.23.5
pandas : 1.3.5
pyproj : 3.4.0
shapely : 1.8.5.post1
fiona : 1.8.22
geoalchemy2: None
geopy : None
matplotlib : 3.6.3
mapclassify: None
pygeos : None
pyogrio : None
psycopg2 : None
pyarrow : None
rtree : None
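Because the claim is that the same package versions work on Linux and macOS, it may help to print a comparable version report on every runner and diff the results. The sketch below uses only the standard library (importlib.metadata, Python 3.8+). The package list and the function names are assumptions chosen for illustration, and it does not replace geopandas.show_versions(), which also reports the GEOS/GDAL/PROJ details.

```python
import platform
from importlib import metadata

# Assumed list of packages relevant to this report; adjust as needed.
PACKAGES = ("geopandas", "shapely", "fiona", "pyproj", "pandas", "numpy", "rasterio")


def collect_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report():
    """Combine interpreter and platform details with the package versions."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    report.update(collect_versions())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:<10}: {value}")
```

Running this as a workflow step on each operating system would give three short listings that can be compared line by line.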

    Labels

    installation (Issues related to get a working installation)
