Skip to content

Rasterband::write with Buffer<T> forces unnecessary data copy. #453

Open
@metasim

Description

When working with large rasters, minimizing the number of data copies is important. The Rasterband::write method as it currently stands, forces some users into an additional buffer copy.

To illustrate, consider the following use case (pseudo-ish code):

let mut data: Vec<u8> = vec![0u8; <some big number>];
for i in 0..100 {
   modify_data(&mut data);
   write_data_with_gdal(&data);
}

Seems reasonable, no? And let's assume for external reasons we only have control over the implementation of write_data_with_gdal. Problem arises when we try to implement it:

fn write_data_with_gdal(data: &Vec<u8>) {
  let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
  let ds = driver.create(...).unwrap();
  let mut rb = ds.rasterband(1).unwrap();
  rb.write(/*... what to do here? ...*/).unwrap();
}

The Rasterband::write method looks like this:

pub fn write<T: GdalType + Copy>(
        &mut self,
        window: (isize, isize),
        window_size: (usize, usize),
        buffer: &Buffer<T>,
    ) -> Result<()> {

It wants the data as as a reference to a Buffer<T>, which looks like this:

pub struct Buffer<T: GdalType> {
    pub size: (usize, usize),
    pub data: Vec<T>,
}

The contained Vec is owned by Buffer::data. We only have a &Vec<u8>. So to call write, even though write only needs a Buffer<T> reference, the only way to create the referent is to clone our &Vec<u8> parameter:

fn write_data_with_gdal(data: &Vec<u8>) {
  let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
  let ds = driver.create(...).unwrap();
  let mut rb = ds.rasterband(1).unwrap();
  let dims = (data.len(), 1);
  let buf = Buffer { size: dims, data: data.clone() }; // <--- no way to avoid `clone()`
  rb.write((0, 0), dims, &buf).unwrap();
  // `buf` is dropped...
}

I'd argue we need an alternative to this write method for users not wishing to incur the additional memory and time overhead. One might argue that the user should only work with Buffer<T> instances, but given that working interoperability with other Vec<T>-based APIs is very common in the raster space, we should support writing a Vec<T> without the clone() cost.

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions