hydrointerp - A Python package for 2D interpolation of hydrologic time series data

The hydrointerp package contains a class and associated interpolation functions specifically designed for 2D hydrologic data with time series. The core functionality is derived from the Scipy interpolation functions griddata and map_coordinates.

The GitHub repository is found here

Introduction

There are many hydrological datasets that have a representation as two dimensional spatial data (e.. lon/lat or X/Y) over many time steps (e.g. gridded precipitation, wind, temperature, groundwater levels, etc.). These datasets may be derived from remote sensing equipment (e.g. NASA satellites) or may be derived from point observations (e.g. meteorological stations). For the purposes of hydrologic analysis, there are obvious needs to convert from point station data to gridded, gridded to points, gridded to different spatial resolution gridded, and so on. This package provides these conversions using different types of interpolators.

At the moment, these interpolators are purely 2D spatial interpolators and do not combine interpolations in time and space. Combining the two different dimension types (length and time) is not trivial nor is it obvious how that should be performed. As a consequence, each time step is interpolated independently.

Installation

hydrointerp can be installed via pip or conda:

pip install hydrointerp

or:

conda install -c mullenkamp hydrointerp

The core dependencies are Pandas, Scipy, xarray, and pyproj.

Be careful not to cause dependency conflicts due to pyproj. hydrointerp uses pyproj >= 2.1 due to a major improvement in functionality. Unfortunately, some other geographic python packages have not updated to the new version yet (e.g. geopandas). Consequently, you may not be able to have both hydrointerp and these other packages installed simultaneously until they update their package dependencies.

How to use hydrointerp

This section will describe how to use the hydrointerp package. Nearly all outputs are either as Pandas DataFrames or Xarray Datasets.

Necessary imports

For the examples, the numpy, pandas, xarray, and hydrointerp packages are needed.

import numpy as np
import pandas as pd
import xarray as xr
from hydrointerp import Interp, datasets

Loading in appropriate input data

The input must be either a grid as an Xarray Dataset or as points as a Pandas DataFrame. Both of these input data must have associated naming parameters.

### Input Parameters
In [1]: nc1 = 'nasa_gpm_2015-06-18'

In [2]: csv1 = 'ecan_data_2015-06-18'

In [3]: grid_time_name = 'time'

In [4]: grid_x_name = 'lon'

In [5]: grid_y_name = 'lat'

In [6]: grid_data_name = 'precipitationCal'

In [7]: grid_crs = 4326

In [8]: point_time_name = 'date'

In [9]: point_x_name = 'NZTMX'

In [10]: point_y_name = 'NZTMY'

In [11]: point_data_name = 'precip'

In [12]: point_crs = 2193

### Read input data
In [13]: ds = xr.open_dataset(datasets.get_path(nc1))

In [14]: df1 = pd.read_csv(datasets.get_path(csv1), parse_dates=['date'], infer_datetime_format=True)

In [15]: print(df1.head())
        date  precip      NZTMX      NZTMY
0 2015-06-18    96.5  1507022.0  5266024.0
1 2015-06-19   166.0  1507022.0  5266024.0
2 2015-06-18    63.0  1506391.0  5253154.0
3 2015-06-19    73.0  1506391.0  5253154.0
4 2015-06-18    88.5  1482760.0  5244669.0

### Assign nan toplaces wherethe quality index is below 0.4
In [16]: ds2 = ds[[grid_data_name]].where(ds.precipitationQualityIndex > 0.4)

In [17]: print(ds2)
<xarray.Dataset>
Dimensions:           (lat: 160, lon: 150, time: 2)
Coordinates:
  * time              (time) datetime64[ns] 2015-06-18 2015-06-19
  * lon               (lon) float32 165.1 165.1 165.2 ... 179.8 179.9 179.9
  * lat               (lat) float32 -48.95 -48.85 -48.75 ... -33.15 -33.05
Data variables:
    precipitationCal  (time, lon, lat) float32 0.006805 0.07318 0.0 ... 0.0 0.0

### Close the file (by removing the object)
In [18]: ds.close()

### Create example points
In [19]: points_df = df1.loc[[6, 15, 132], [point_x_name, point_y_name]].copy()

In [20]: points_df.rename(columns={point_x_name: 'x', point_y_name: 'y'}, inplace=True)

Initialising Interp

The package and general usage is via the main Interp class. It must be initialised with appropriate datasets and name parameters. Bare in mind, it is not required to have both input grids and points. One set is fine, and the appropriate interpolation methods will appear.

In [21]: interpc = Interp(ds2, grid_time_name, grid_x_name, grid_y_name, grid_data_name, grid_crs, point_data=df1, point_time_name=point_time_name, point_x_name=point_x_name, point_y_name=point_y_name, point_data_name=point_data_name, point_crs=point_crs)

Nan filling

If your grid has nans (which the example does), fill those nans with the grid_interp_na method.

In [22]: nan1 = ds2[grid_data_name].isnull().sum()

In [23]: interpc.grid_interp_na()

In [24]: nan2 = interpc.grid_data['precip'].isnull().sum()

In [25]: assert (nan1 > 0) & (nan2 == 0)

Base Interpolators

All the 2D interpolators you’ll need…

## Parameters
In [26]: to_crs = 2193

In [27]: grid_res = 10000

In [28]: bbox=None

In [29]: order=2

In [30]: extrapolation='constant'

In [31]: cval=np.nan

In [32]: digits = 2

In [33]: min_lat = -48

In [34]: max_lat = -41

In [35]: min_lon = 170

In [36]: max_lon = 178

In [37]: min_val=0

In [38]: method='linear'

## grid to grid
In [39]: interp1 = interpc.grid_to_grid(grid_res, to_crs, bbox, order, extrapolation, min_val=min_val)
Preparing input and output
Running interpolations...
Packaging up the output

In [40]: print(interp1)
<xarray.Dataset>
Dimensions:  (time: 2, x: 140, y: 180)
Coordinates:
  * time     (time) datetime64[ns] 2015-06-18 2015-06-19
  * x        (x) float64 8.568e+05 8.668e+05 8.768e+05 ... 2.237e+06 2.247e+06
  * y        (y) float64 4.548e+06 4.558e+06 4.568e+06 ... 6.328e+06 6.338e+06
Data variables:
    precip   (time, x, y) float32 nan nan nan nan nan ... 0.0 -0.0 0.0 nan nan

## points to grid
In [41]: interp2 = interpc.points_to_grid(grid_res, to_crs, bbox, method, extrapolation, min_val=min_val)
Prepare input and output data
Run interpolations...
2015-06-18 00:00:00
2015-06-19 00:00:00
Packaging up the output

In [42]: print(interp2)
<xarray.Dataset>
Dimensions:  (time: 2, x: 35, y: 33)
Coordinates:
  * time     (time) datetime64[ns] 2015-06-18 2015-06-19
  * y        (y) float64 5.018e+06 5.028e+06 5.038e+06 ... 5.328e+06 5.338e+06
  * x        (x) float64 1.329e+06 1.339e+06 1.349e+06 ... 1.659e+06 1.669e+06
Data variables:
    precip   (time, y, x) float64 nan nan nan nan nan ... nan nan nan nan nan

## grid to points
In [43]: interp3 = interpc.grid_to_points(points_df, to_crs, order, min_val=min_val)
Preparing input and output
Running interpolations...
Packaging up the output

In [44]: print(interp3)
                                    precip
time       x         y                    
2015-06-18 1515946.0 5249806.0  617.679993
           1581327.0 5279063.0   86.559998
           1437006.0 5066885.0   78.220001
2015-06-19 1515946.0 5249806.0   59.439999
           1581327.0 5279063.0   45.299999
           1437006.0 5066885.0    2.220000

## points to points
In [45]: interp4 = interpc.points_to_points(points_df, to_crs, method, min_val=min_val)
2015-06-18 00:00:00
2015-06-19 00:00:00

In [46]: print(interp4)
                                precip
time       x         y                
2015-06-18 1515946.0 5249806.0    45.0
           1581327.0 5279063.0    12.0
           1437006.0 5066885.0    36.5
2015-06-19 1515946.0 5249806.0    55.5
           1581327.0 5279063.0    37.5
           1437006.0 5066885.0    17.0

Adjust grid from points

There is also a method to adjust a grid based on the point_data (bias correction). And a method to run tests on it’s accuracy.

In [47]: interp5 = interpc.adjust_grid_from_points(grid_res, to_crs)
Preparing input and output
Running interpolations...
Packaging up the output
Preparing input and output
Running interpolations...
Packaging up the output
Prepare input and output data
Run interpolations...
2015-06-18 00:00:00
2015-06-19 00:00:00
Packaging up the output

In [48]: print(interp5)
<xarray.Dataset>
Dimensions:  (time: 2, x: 35, y: 32)
Coordinates:
  * x        (x) float64 1.327e+06 1.337e+06 1.347e+06 ... 1.657e+06 1.667e+06
  * y        (y) float64 5.018e+06 5.028e+06 5.038e+06 ... 5.318e+06 5.328e+06
  * time     (time) datetime64[ns] 2015-06-18 2015-06-19
Data variables:
    precip   (time, x, y) float64 15.05 17.17 15.35 20.94 ... 11.09 11.26 7.764

In [49]: interp6 = interpc.validate_grid_from_points(0.08, grid_res, to_crs)
Preparing input and output
Running interpolations...
Packaging up the output
Preparing input and output
Running interpolations...
Packaging up the output
Prepare input and output data
Run interpolations...
2015-06-18 00:00:00
2015-06-19 00:00:00
Packaging up the output
2015-06-18 00:00:00
2015-06-19 00:00:00
Preparing input and output
Running interpolations...
Packaging up the output

In [50]: print(interp6)
                                precip  point_precip  grid_precip
time       x         y                                           
2015-06-18 1418364.0 5135227.0     5.8         15.22         9.04
           1499654.0 5232199.0    33.0         46.40        34.14
           1510512.0 5187224.0    31.0         23.38        26.83
           1515640.0 5225591.0    49.5         40.46        24.75
           1569400.0 5166693.0     6.5         15.57        17.80
           1674522.0 5337964.0     1.5           NaN          NaN
2015-06-19 1418364.0 5135227.0    14.2          8.42         6.91
           1499654.0 5232199.0    49.5         70.27       112.43
           1510512.0 5187224.0    31.5         49.47        51.22
           1515640.0 5225591.0    41.5         48.58        95.60
           1569400.0 5166693.0    28.0         16.37        18.26
           1674522.0 5337964.0     8.5           NaN          NaN

Package References

Base class

class hydrointerp.Interp(grid_data=None, grid_time_name=None, grid_x_name=None, grid_y_name=None, grid_data_name=None, grid_crs=None, point_data=None, point_time_name=None, point_x_name=None, point_y_name=None, point_data_name=None, point_crs=None)

Base Interp class to prepare the input data for the interpolation functions.

Parameters
  • data (DataFrame or Dataset) – A pandas DataFrame containing four columns as shown in the below parameters or an Xarray Dataset.

  • time_name (str) – If grid is a DataFrame, then time_name is the time column name. If grid is a Dataset, then time_name is the time coordinate name.

  • x_name (str) – If grid is a DataFrame, then x_name is the x column name. If grid is a Dataset, then x_name is the x coordinate name.

  • y_name (str) – If grid is a DataFrame, then y_name is the y column name. If grid is a Dataset, then y_name is the y coordinate name.

  • data_name (str) – If grid is a DataFrame, then data_name is the data column name. If grid is a Dataset, then data_name is the data variable name.

  • from_crs (int or str or None) –

    The projection info for the input data if the result should be reprojected to the to_crs projection (either a proj4 str or epsg int).

    returns

    rtype

    Interp object

Nan interpolator

Interp._grid_interp_na(method='linear', min_val=None, inplace=True)

Function to fill nan’s in the grid_data to make it complete. Necessary for grid_to_* functions.

Parameters
Returns

Return type

Xarray Dataset

Base interpolators

Interp._grid_to_grid(grid_res, to_crs=None, bbox=None, order=3, extrapolation='constant', fill_val=nan, digits=2, min_val=None)

Function to interpolate regularly or irregularly spaced values over many time stamps. Each time stamp of spatial values are interpolated independently (2D interpolation as opposed to 3D interpolation). Returns an xarray Dataset with the 3 dimensions. Uses the scipy interpolation function called map_coordinates.

Parameters
  • grid_res (int or float) – The resulting grid resolution in the unit of the final projection (usually meters or decimal degrees).

  • to_crs (int or str or None) – The projection for the output data similar to from_crs.

  • bbox (tuple of int or float) – The bounding box for the output interpolation in the to_crs projection. None will return a similar grid extent as the input. The tuple should contain four ints or floats in the following order: (x_min, x_max, y_min, y_max)

  • order (int) – The order of the spline interpolation, default is 3. The order has to be in the range 0-5. An order of 1 is linear interpolation.

  • extrapolation (str) – The equivalent of ‘mode’ in the map_coordinates function. Options are: ‘constant’, ‘nearest’, ‘reflect’, ‘mirror’, and ‘wrap’. Most reseaonable options for this function will be either ‘constant’ or ‘nearest’. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.map_coordinates.html for more details.

  • fill_val (int or float) – If ‘constant’ if passed to the extrapolation parameter, fill_val assigns the value outside of the boundary. Defaults to numpy.nan.

  • digits (int) – The number of digits to round the output.

  • min_val (int, float, or None) – The minimum value for the results. All results below min_val will be assigned min_val.

Returns

Return type

xarray Dataset

Interp._points_to_grid(grid_res, to_crs=None, bbox=None, method='linear', extrapolation='contstant', fill_val=nan, digits=2, min_val=None)

Function to take a dataframe of point value inputs (df) and interpolate to a grid. Uses the scipy griddata function for interpolation.

Parameters
  • grid_res (int or float) – The resulting grid resolution in the unit of the final projection (usually meters or decimal degrees).

  • to_crs (int or str or None) – The projection for the output data similar to from_crs.

  • bbox (tuple of int or float) – The bounding box for the output interpolation in the to_crs projection. None will return a similar grid extent as the input. The tuple should contain four ints or floats in the following order: (x_min, x_max, y_min, y_max)

  • method (str) –

    The scipy griddata interpolation method to be applied. Options are ‘nearest’, ‘linear’, and ‘cubic’. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html for more details.

  • extrapolation (str) – Either ‘constant’ or ‘nearest’.

  • fill_val (int or float) – If ‘constant’ if passed to the extrapolation parameter, fill_val assigns the value outside of the boundary. Defaults to numpy.nan.

  • digits (int) – The number of digits to round the output.

  • min_val (int, float, or None) –

    The minimum value for the results. All results below min_val will be assigned min_val.

    returns

    rtype

    xarray Dataset

Interp._grid_to_points(point_data, to_crs=None, bbox=None, order=3, digits=2, min_val=None)

Function to take a dataframe of point value inputs (df) and interpolate to other points (point_data). Uses the scipy griddata function for interpolation.

Parameters
  • point_data (str or DataFrame) – Path to geometry file of points to be interpolated (e.g. shapefile). Can be any file type that fiona/gdal can open. It can also be a DataFrame with ‘x’ and ‘y’ columns and the crs must be the same as to_crs.

  • to_crs (int or str or None) – The projection for the output data similar to from_crs.

  • bbox (tuple of int or float) – The bounding box for the output interpolation in the to_crs projection. None will return a similar grid extent as the input. The tuple should contain four ints or floats in the following order: (x_min, x_max, y_min, y_max)

  • order (int) – The order of the spline interpolation, default is 3. The order has to be in the range 0-5. An order of 1 is linear interpolation.

  • digits (int) – The number of digits to round the output.

  • min_val (int, float, or None) – The minimum value for the results. All results below min_val will be assigned min_val.

Returns

Return type

DataFrame

Interp._points_to_points(point_data, to_crs=None, method='linear', digits=2, min_val=None)

Function to interpolate regularly or irregularly spaced values over many time stamps. Each time stamp of spatial values are interpolated independently (2D interpolation as opposed to 3D interpolation). Returns an xarray Dataset with the 3 dimensions. Uses the scipy interpolation function called map_coordinates.

Parameters
  • point_data (str or DataFrame) – Path to geometry file of points to be interpolated (e.g. shapefile). Can be any file type that fiona/gdal can open. It can also be a DataFrame with ‘x’ and ‘y’ columns and the crs must be the same as to_crs.

  • to_crs (int or str or None) – The projection for the output data similar to from_crs.

  • method (str) – The scipy griddata interpolation method to be applied. Options are ‘nearest’, ‘linear’, and ‘cubic’.

  • digits (int) – The number of digits to round the output.

  • min_val (int, float, or None) – The minimum value for the results. All results below min_val will be assigned min_val.

Returns

Return type

DataFrame

Adjust grid from points

Interp._adjust_grid_from_points(grid_res=None, to_crs=None, order=2, method='linear', digits=2, min_val=0)

Method to adjust a grid by forcing it through point data.

Parameters
  • grid_res (int, float, or None) – The resulting grid resolution in the unit of the final projection (usually meters or decimal degrees).

  • to_crs (int or str or None) – The projection for the output data similar to from_crs.

  • order (int) – The order of the spline interpolation, default is 3. The order has to be in the range 0-5. An order of 1 is linear interpolation.

  • method (str) –

    The scipy griddata interpolation method to be applied. Options are ‘nearest’, ‘linear’, and ‘cubic’. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html for more details.

  • digits (int) – The number of digits to round the output.

  • min_val (int, float, or None) – The minimum value for the results. All results below min_val will be assigned min_val.

Returns

Return type

Dataset

Interp._validate_grid_from_points(test_perc=0.1, grid_res=None, to_crs=None, order=2, method='linear', digits=2, min_val=0)

Method to validate a grid created by the adjust_grid_from_points method.

Parameters
  • grid_res (int, float, or None) – The resulting grid resolution in the unit of the final projection (usually meters or decimal degrees).

  • to_crs (int or str or None) – The projection for the output data similar to from_crs.

  • order (int) – The order of the spline interpolation, default is 3. The order has to be in the range 0-5. An order of 1 is linear interpolation.

  • method (str) –

    The scipy griddata interpolation method to be applied. Options are ‘nearest’, ‘linear’, and ‘cubic’. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html for more details.

  • digits (int) – The number of digits to round the output.

  • min_val (int, float, or None) – The minimum value for the results. All results below min_val will be assigned min_val.

Returns

Return type

DataFrame

API Pages

License and terms of usage

This package is licensed under the terms of the Apache License Version 2.0 and can be found on the GitHub project page.