Missing and empty geometries

GeoPandas, like pandas, supports the concept of missing values (NA or null values). For geometry values, there is an additional concept of empty geometries:

  • Empty geometries are actual geometry objects that have no coordinates (and thus, for example, no area). They can originate, for example, from taking the intersection of two polygons that do not overlap. The scalar object (when accessing a single element of a GeoSeries) is still a Shapely geometry object.

  • Missing geometries are unknown values in a GeoSeries. They will typically be propagated in operations (for example in calculations of the area or of the intersection), or ignored in reductions such as unary_union. The scalar object (when accessing a single element of a GeoSeries) is the Python None object.
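
The difference between the two scalar objects can be checked directly. The snippet below is a minimal sketch, not part of the original examples: the names left, right, empty, missing_scalar and empty_scalar are chosen here for illustration only.

```python
import geopandas
from shapely.geometry import Polygon
from shapely.geometry.base import BaseGeometry

# Two squares that do not overlap: their intersection is an empty geometry.
left = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
right = Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])
empty = left.intersection(right)

s = geopandas.GeoSeries([left, None, empty])

missing_scalar = s.iloc[1]  # the Python None object
empty_scalar = s.iloc[2]    # still a Shapely geometry, but without coordinates

print(missing_scalar is None)
print(isinstance(empty_scalar, BaseGeometry), empty_scalar.is_empty)
```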

Warning

Starting from GeoPandas v0.6.0, those two concepts are more consistently separated. See below for more details on what changed compared to earlier versions.

Consider the following example GeoSeries with one polygon, one missing value and one empty polygon (the examples assume that geopandas has already been imported with import geopandas):

In [1]: from shapely.geometry import Polygon

In [2]: s = geopandas.GeoSeries([Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])])

In [3]: s
Out[3]: 
0    POLYGON ((0.00000 0.00000, 1.00000 1.00000, 0....
1                                                 None
2                             GEOMETRYCOLLECTION EMPTY
dtype: geometry

In spatial operations, missing geometries will typically propagate (be missing in the result as well), while empty geometries are treated as a geometry and the result will depend on the operation:

In [4]: s.area
Out[4]: 
0    0.5
1    NaN
2    0.0
dtype: float64

In [5]: s.union(Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
Out[5]: 
0    POLYGON ((1.00000 1.00000, 1.00000 0.00000, 0....
1                                                 None
2    POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1....
dtype: geometry

In [6]: s.intersection(Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
Out[6]: 
0    POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1....
1                                                 None
2                             GEOMETRYCOLLECTION EMPTY
dtype: geometry
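
Because a missing geometry yields NaN in a numeric result such as the area, the usual pandas rules for NaN apply afterwards. The following sketch, which assumes the same example GeoSeries as above, shows that a reduction skips the missing entry by default and only propagates it when asked to; the variable names are illustrative.

```python
import geopandas
from shapely.geometry import Polygon

s = geopandas.GeoSeries([Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])])

areas = s.area

# pandas reductions skip NaN by default, so only the real polygon
# and the empty polygon (area 0.0) contribute to the sum.
total_skipping_missing = areas.sum()

# With skipna=False the missing geometry propagates into the total.
total_strict = areas.sum(skipna=False)

print(total_skipping_missing, total_strict)
```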

The GeoSeries.isna() method will only check for missing values and not for empty geometries:

In [7]: s.isna()
Out[7]: 
0    False
1     True
2    False
dtype: bool

On the other hand, if you want to know which values are empty geometries, you can use the GeoSeries.is_empty attribute:

In [8]: s.is_empty
Out[8]: 
0    False
1    False
2     True
dtype: bool

To get only the actual geometry objects that are neither missing nor empty, you can use a combination of both:

In [9]: s.is_empty | s.isna()
Out[9]: 
0    False
1     True
2     True
dtype: bool

In [10]: s[~(s.is_empty | s.isna())]
Out[10]: 
0    POLYGON ((0.00000 0.00000, 1.00000 1.00000, 0....
dtype: geometry
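
If a single representation of "no usable geometry" is preferred, one option is to convert empty geometries to missing values first, so that dropna() handles both cases. This is only a sketch: empty_to_missing is a hypothetical helper written for this example, not a GeoPandas function.

```python
import geopandas
from shapely.geometry import Polygon


def empty_to_missing(series):
    """Return a new GeoSeries in which empty geometries are replaced by None.

    Hypothetical helper for illustration; it is not part of GeoPandas.
    """
    values = [
        None if (geom is None or geom.is_empty) else geom
        for geom in series
    ]
    return geopandas.GeoSeries(values, index=series.index, crs=series.crs)


s = geopandas.GeoSeries([Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])])

# After the conversion, dropna() removes both the missing and the empty entry.
cleaned = empty_to_missing(s).dropna()
print(cleaned)
```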

Changes since GeoPandas v0.6.0

In GeoPandas v0.6.0, the missing data handling was refactored and made more consistent across the library.

Historically, missing (“NA”) values in a GeoSeries could be represented by empty geometric objects, in addition to standard representations such as None and np.nan. This was the case at least in GeoSeries.isna() and when a GeoSeries was aligned in geospatial operations. Other methods such as dropna() and fillna(), however, did not follow this approach and did not consider empty geometries as missing.
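
The behaviour of dropna() and fillna() can be checked with the example GeoSeries from above. The snippet is a small sketch assuming a recent GeoPandas release; the fill value Point(0, 0) is an arbitrary choice made for this illustration.

```python
import geopandas
from shapely.geometry import Point, Polygon

s = geopandas.GeoSeries([Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])])

# dropna() removes only the missing value; the empty polygon is kept.
dropped = s.dropna()

# fillna() replaces only the missing value; the empty polygon is untouched.
filled = s.fillna(Point(0, 0))

print(dropped.index.tolist())
print(filled.is_empty.tolist())
```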

In GeoPandas v0.6.0, the most important change is that GeoSeries.isna() no longer treats empty geometries as missing:

  • Using the small example from above, the old behaviour treated both the empty and the missing geometry as “missing”:

    >>> s
    0    POLYGON ((0 0, 1 1, 0 1, 0 0))
    1                              None
    2          GEOMETRYCOLLECTION EMPTY
    dtype: object
    
    >>> s.isna()
    0    False
    1     True
    2     True
    dtype: bool
    
  • Starting from GeoPandas v0.6.0, isna() only reports actual missing values as missing:

    In [11]: s.isna()
    Out[11]: 
    0    False
    1     True
    2    False
    dtype: bool
    

    For now, when isna() is called on a GeoSeries with empty geometries, a warning is raised to alert the user to the changed behaviour, with an indication of how to resolve it.

Additionally, the behaviour of GeoSeries.align() changed to use missing values instead of empty geometries to fill non-matching indexes. Consider the following small toy example:

In [12]: from shapely.geometry import Point

In [13]: s1 = geopandas.GeoSeries([Point(0, 0), Point(1, 1)], index=[0, 1])

In [14]: s2 = geopandas.GeoSeries([Point(1, 1), Point(2, 2)], index=[1, 2])

In [15]: s1
Out[15]: 
0    POINT (0.00000 0.00000)
1    POINT (1.00000 1.00000)
dtype: geometry

In [16]: s2
Out[16]: 
1    POINT (1.00000 1.00000)
2    POINT (2.00000 2.00000)
dtype: geometry

  • Previously, the align method would use empty geometries to fill values:

    >>> s1_aligned, s2_aligned = s1.align(s2)
    
    >>> s1_aligned
    0                 POINT (0 0)
    1                 POINT (1 1)
    2    GEOMETRYCOLLECTION EMPTY
    dtype: object
    
    >>> s2_aligned
    0    GEOMETRYCOLLECTION EMPTY
    1                 POINT (1 1)
    2                 POINT (2 2)
    dtype: object
    

    This method is used under the hood when performing spatial operations on mis-aligned GeoSeries objects:

    >>> s1.intersection(s2)
    0    GEOMETRYCOLLECTION EMPTY
    1                 POINT (1 1)
    2    GEOMETRYCOLLECTION EMPTY
    dtype: object
    
  • Starting from GeoPandas v0.6.0, GeoSeries.align() will use missing values to fill in the non-aligned indices, to be consistent with the behaviour in pandas:

    In [17]: s1_aligned, s2_aligned = s1.align(s2)
    
    In [18]: s1_aligned
    Out[18]: 
    0    POINT (0.00000 0.00000)
    1    POINT (1.00000 1.00000)
    2                       None
    dtype: geometry
    
    In [19]: s2_aligned
    Out[19]: 
    0                       None
    1    POINT (1.00000 1.00000)
    2    POINT (2.00000 2.00000)
    dtype: geometry
    

    As a consequence, spatial operations also use missing values instead of empty geometries for non-matching indices, and the result can differ depending on the spatial operation:

    In [20]: s1.intersection(s2)
    Out[20]: 
    0                       None
    1    POINT (1.00000 1.00000)
    2                       None
    dtype: geometry
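
If code relies on the earlier result, where non-matching indices produced empty geometries, the missing values can be filled explicitly after the operation. The following is a sketch under the assumption that GeoSeries.fillna() accepts a Shapely geometry as fill value; the names result and old_style are illustrative.

```python
import geopandas
from shapely.geometry import GeometryCollection, Point

s1 = geopandas.GeoSeries([Point(0, 0), Point(1, 1)], index=[0, 1])
s2 = geopandas.GeoSeries([Point(1, 1), Point(2, 2)], index=[1, 2])

# Align explicitly so that both series share the index [0, 1, 2];
# non-matching positions are filled with missing values.
s1_aligned, s2_aligned = s1.align(s2)
result = s1_aligned.intersection(s2_aligned)

# Replace the missing results with an empty geometry to mimic the
# output that older versions produced.
old_style = result.fillna(GeometryCollection())

print(result.isna().tolist())
print(old_style.is_empty.tolist())
```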