BUG: Crash (access violation) running GeoDataFrame.dissolve only on Windows Server 2022 + Python 3.8/3.9 #2766
Description
Disclaimer: I don't have a Windows machine at work to test on (only GitHub runners), and I don't know enough about GeoPandas, GDAL, Jupyter, Windows memory models, etc. to reduce this to a minimal example in a reasonable amount of time. If that means this issue will be closed, that's OK, and sorry if this wastes your time; I'll simply not support Windows for the time being. Others at Toitū Te Whenua LINZ have expressed interest in getting GDAL etc. running "properly" on Windows without installing things like QGIS, so the motivation goes beyond just "I want to support as many platforms as possible".
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of geopandas.
- (optional) I have confirmed this bug exists on the main branch of geopandas.
Code Sample, a copy-pastable example
This branch demonstrates the issue. The code was extracted with jupyter nbconvert --to=script sentinel2_water_extraction.ipynb. The GitHub workflow that runs all of this is also relevant.
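The traceback below passes through nbconvert, nbclient and nest_asyncio before it reaches the failing cell. One way to take those layers out of the picture is to run the extracted script in a fresh interpreter with Python's -X faulthandler option, which dumps the Python stack on fatal errors (on Windows this also covers Windows exceptions such as access violations). This is a minimal stdlib sketch under that assumption; run_isolated is a hypothetical helper, not part of the linked workflow.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(*args: str, timeout: Optional[float] = None) -> subprocess.CompletedProcess:
    """Run the current interpreter on `args` in a child process.

    `-X faulthandler` enables the fault handler in the child, so a native crash
    there (for example inside GEOS) prints a Python-level stack to stderr, while
    the calling process survives to report the exit status and captured output.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        capture_output=True,  # collect stdout/stderr for the CI log
        text=True,
        timeout=timeout,
    )
```

For example, run_isolated("sentinel2_water_extraction.py") would run the file that nbconvert --to=script writes by default, and the returned returncode and stderr could then be printed in the CI log.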
Problem description
This seems to be the relevant part of the full build log:
Traceback (most recent call last):
  File "C:\Miniconda\envs\__setup_conda\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Miniconda\envs\__setup_conda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Miniconda\envs\__setup_conda\Scripts\jupyter-nbconvert.EXE\__main__.py", line 7, in <module>
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\jupyter_core\application.py", line 276, in launch_instance
    return super().launch_instance(argv=argv, **kwargs)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\traitlets\config\application.py", line 992, in launch_instance
    app.start()
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 426, in start
    self.convert_notebooks()
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 600, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 563, in convert_single_notebook
    output, resources = self.export_single_notebook(
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\nbconvertapp.py", line 491, in export_single_notebook
    output, resources = self.exporter.from_filename(
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 190, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 207, in from_file
    return self.from_notebook_node(
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\notebook.py", line 35, in from_notebook_node
    nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 147, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\exporters\exporter.py", line 342, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\preprocessors\base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\preprocessors\execute.py", line 91, in preprocess
    self.preprocess_cell(cell, resources, index)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbconvert\preprocessors\execute.py", line 112, in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\jupyter_core\utils\__init__.py", line 160, in wrapped
    return loop.run_until_complete(inner)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nest_asyncio.py", line 90, in run_until_complete
    return f.result()
  File "C:\Miniconda\envs\__setup_conda\lib\asyncio\futures.py", line 201, in result
    raise self._exception
  File "C:\Miniconda\envs\__setup_conda\lib\asyncio\tasks.py", line 256, in __step
    result = coro.send(None)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbclient\client.py", line 1021, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "C:\Miniconda\envs\__setup_conda\lib\site-packages\nbclient\client.py", line 915, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
SIEVE_THRESHOLD = 10

for image_with_water in IMAGES:
    # Mask out all values below the mndwi threshold (these are not water)
    water_mask = np.ma.masked_less( # type: ignore[no-untyped-call]
        image_with_water["mndwi"], MNDWI_THRESHOLD
    )

    # Extract the polygons from the mask
    water_polygons = rasterio.features.shapes(
        source=water_mask.astype("uint8"),
        transform=image_with_water["transform_window"],
    )

    # Add all polygons to the list
    water_polygons = list(water_polygons)

    # Extract the polygon coordinates and values from the list
    polygons = [polygon for polygon, value in water_polygons]
    values = [str(int(value)) for polygon, value in water_polygons]

    # Convert polygons into a shapely.shape
    polygons = [shape(polygon) for polygon in polygons]

    # Create a geopandas dataframe populated with the polygon shapes
    water_geodataframe = gpd.GeoDataFrame(
        {"is_water": values, "geometry": polygons}, crs=image_with_water["src_crs"]
    )

    # Dissolve all records into two records. is water / is not water
    is_water = water_geodataframe.dissolve(by="is_water")

    # Select only the water records
    water = is_water.query("is_water=='0'")

    # Export the polygons
    water.to_file(
        os.path.join(
            OUTPUT_DIRECTORY, f"water_mndwi_{image_with_water['image_id']}.shp"
        )
    )

    # Export the accompanying Sentinel-2 image
    export_raster(image_with_water)

    # Export working steps if desired
    image_with_water["water"] = water
    image_with_water["water_mask"] = water_mask
    if len(water.area) == 1:
        image_with_water["area"] = water.area[0] / 1000000 # in km2
    else:
        image_with_water["area"] = 0
------------------
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[12], line 31
26 water_geodataframe = gpd.GeoDataFrame(
27 {"is_water": values, "geometry": polygons}, crs=image_with_water["src_crs"]
28 )
30 # Dissolve all records into two records. is water / is not water
---> 31 is_water = water_geodataframe.dissolve(by="is_water")
33 # Select only the water records
34 water = is_water.query("is_water=='0'")
File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\geodataframe.py:1684, in GeoDataFrame.dissolve(self, by, aggfunc, as_index, level, sort, observed, dropna)
1681 merged_geom = block.unary_union
1682 return merged_geom
-> 1684 g = self.groupby(group_keys=False, **groupby_kwargs)[self.geometry.name].agg(
1685 merge_geometries
1686 )
1688 # Aggregate
1689 aggregated_geometry = GeoDataFrame(g, geometry=self.geometry.name, crs=self.crs)
File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\generic.py:265, in SeriesGroupBy.aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
262 return self._python_agg_general(func, *args, **kwargs)
264 try:
--> 265 return self._python_agg_general(func, *args, **kwargs)
266 except KeyError:
267 # TODO: KeyError is raised in _python_agg_general,
268 # see test_groupby.test_basic
269 result = self._aggregate_named(func, *args, **kwargs)
File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\groupby.py:1332, in GroupBy._python_agg_general(self, func, *args, **kwargs)
1328 name = obj.name
1330 try:
1331 # if this function is invalid for this dtype, we will ignore it.
-> 1332 result = self.grouper.agg_series(obj, f)
1333 except TypeError:
1334 warnings.warn(
1335 f"Dropping invalid columns in {type(self).__name__}.agg "
1336 "is deprecated. In a future version, a TypeError will be raised. "
(...)
1340 stacklevel=3,
1341 )
File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\ops.py:1047, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
1040 result = self._aggregate_series_pure_python(obj, func)
1042 elif not isinstance(obj._values, np.ndarray):
1043 # _aggregate_series_fast would raise TypeError when
1044 # calling libreduction.Slider
1045 # In the datetime64tz case it would incorrectly cast to tz-naive
1046 # TODO: can we get a performant workaround for EAs backed by ndarray?
-> 1047 result = self._aggregate_series_pure_python(obj, func)
1049 # we can preserve a little bit more aggressively with EA dtype
1050 # because maybe_cast_pointwise_result will do a try/except
1051 # with _from_sequence. NB we are assuming here that _from_sequence
1052 # is sufficiently strict that it casts appropriately.
1053 preserve_dtype = True
File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\ops.py:1104, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
1098 splitter = get_splitter(obj, ids, ngroups, axis=0)
1100 for i, group in enumerate(splitter):
1101
1102 # Each step of this loop corresponds to
1103 # libreduction._BaseGrouper._apply_to_group
-> 1104 res = func(group)
1105 res = libreduction.extract_result(res)
1107 if not initialized:
1108 # We only do this validation on the first iteration
File C:\Miniconda\envs\__setup_conda\lib\site-packages\pandas\core\groupby\groupby.py:1318, in GroupBy._python_agg_general.<locals>.<lambda>(x)
1315 @final
1316 def _python_agg_general(self, func, *args, **kwargs):
1317 func = com.is_builtin_func(func)
-> 1318 f = lambda x: func(x, *args, **kwargs)
1320 # iterate through "columns" ex exclusions to populate output dict
1321 output: dict[base.OutputKey, ArrayLike] = {}
File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\geodataframe.py:1681, in GeoDataFrame.dissolve.<locals>.merge_geometries(block)
1680 def merge_geometries(block):
-> 1681 merged_geom = block.unary_union
1682 return merged_geom
File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\base.py:800, in GeoPandasBase.unary_union(self)
781 @property
782 def unary_union(self):
783 """Returns a geometry containing the union of all geometries in the
784 ``GeoSeries``.
785
(...)
798 POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))
799 """
--> 800 return self.geometry.values.unary_union()
File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\array.py:650, in GeometryArray.unary_union(self)
649 def unary_union(self):
--> 650 return vectorized.unary_union(self.data)
File C:\Miniconda\envs\__setup_conda\lib\site-packages\geopandas\_vectorized.py:1034, in unary_union(data)
1032 data = [g for g in data if g is not None]
1033 if data:
-> 1034 return shapely.ops.unary_union(data)
1035 else:
1036 return None
File C:\Miniconda\envs\__setup_conda\lib\site-packages\shapely\ops.py:161, in CollectionOperator.unary_union(self, geoms)
159 subs[i] = g._geom
160 collection = lgeos.GEOSGeom_createCollection(6, subs, L)
--> 161 return geom_factory(lgeos.methods['unary_union'](collection))
OSError: exception: access violation writing 0x0000000000001101
Importantly, the exact same code, using the same Python and package versions (including Python 3.10), runs fine on Linux and macOS.
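The innermost frames show the failure inside shapely.ops.unary_union while it merges the geometries of one is_water group, so in principle a minimal example could be cut down from that single group's geometry list. A generic way to do that is delta debugging; the following is a simplified, stdlib-only sketch of it. reduce_failing_input and the still_fails predicate are hypothetical: the predicate would have to be supplied by the reader, for instance by running the union over a candidate subset in a child process, since after an access violation the interpreter may not be safe to keep using.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_failing_input(
    items: Sequence[T], still_fails: Callable[[List[T]], bool]
) -> List[T]:
    """Return a 1-minimal sublist of `items` for which `still_fails` is True.

    Simplified delta debugging (ddmin): split the current list into `n`
    contiguous chunks, try each chunk alone, then each complement; on success
    continue from the smaller list, otherwise increase the granularity.
    Element order is preserved.
    """
    current = list(items)
    if not still_fails(current):
        raise ValueError("the full input does not reproduce the failure")
    n = 2
    while len(current) >= 2:
        size = -(-len(current) // n)  # ceiling division
        chunks = [current[i : i + size] for i in range(0, len(current), size)]
        # 1) Does any single chunk still fail on its own?
        smaller = next((c for c in chunks if still_fails(c)), None)
        if smaller is not None:
            current, n = smaller, 2
            continue
        # 2) Does removing any single chunk keep the failure?
        for i in range(len(chunks)):
            complement = [x for j, c in enumerate(chunks) if j != i for x in c]
            if still_fails(complement):
                smaller = complement
                break
        if smaller is not None:
            current, n = smaller, max(n - 1, 2)
            continue
        # 3) Neither worked: refine the granularity, or stop at single items.
        if n >= len(current):
            break
        n = min(2 * n, len(current))
    return current
```

The sketch assumes still_fails is deterministic; if the access violation only happens intermittently, the reduced list may not reproduce it reliably.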
Expected Output
Output of geopandas.show_versions()
GEOS, GDAL, PROJ INFO
GEOS : None
GEOS lib : None
GDAL : 3.5.2
GDAL data dir: C:\Miniconda\envs\__setup_conda\lib\site-packages\fiona\gdal_data
PROJ : 9.1.0
PROJ data dir: C:\Miniconda\envs\__setup_conda\lib\site-packages\pyproj\proj_dir\share\proj
PYTHON DEPENDENCIES
geopandas : 0.12.2
numpy : 1.23.5
pandas : 1.3.5
pyproj : 3.4.0
shapely : 1.8.5.post1
fiona : 1.8.22
geoalchemy2: None
geopy : None
matplotlib : 3.6.3
mapclassify: None
pygeos : None
pyogrio : None
psycopg2 : None
pyarrow : None
rtree : None
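Since the same package versions reportedly behave differently on the Windows runner than on Linux and macOS, it may help to print basic interpreter and OS details alongside geopandas.show_versions() in every CI job. This is a minimal stdlib sketch; platform_summary is a hypothetical helper name.

```python
import platform
import struct
import sys
from typing import Dict


def platform_summary() -> Dict[str, str]:
    """Collect interpreter and OS details to log next to show_versions()."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
        "os": platform.platform(),
        "machine": platform.machine(),
        "pointer_bits": str(struct.calcsize("P") * 8),  # 32- vs 64-bit interpreter
    }


for key, value in platform_summary().items():
    print(f"{key:15}: {value}")
```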