Rasterband::write
with Buffer<T>
forces unnecessary data copy. #453
Description
When working with large rasters, minimizing the number of data copies is important. The Rasterband::write
method as it currently stands, forces some users into an additional buffer copy.
To illustrate, consider the following use case (pseudo-ish code):
let mut data: Vec<u8> = vec![0u8; <some big number>];
for i in 0..100 {
modify_data(&mut data);
write_data_with_gdal(&data);
}
Seems reasonable, no? And let's assume for external reasons we only have control over the implementation of write_data_with_gdal
. Problem arises when we try to implement it:
fn write_data_with_gdal(data: &Vec<u8>) {
let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
let ds = driver.create(...).unwrap();
let mut rb = ds.rasterband(1).unwrap();
rb.write(/*... what to do here? ...*/).unwrap();
}
The Rasterband::write
method looks like this:
pub fn write<T: GdalType + Copy>(
&mut self,
window: (isize, isize),
window_size: (usize, usize),
buffer: &Buffer<T>,
) -> Result<()> {
It wants the data as as a reference to a Buffer<T>
, which looks like this:
pub struct Buffer<T: GdalType> {
pub size: (usize, usize),
pub data: Vec<T>,
}
The contained Vec
is owned by Buffer::data
. We only have a &Vec<u8>
. So to call write
, even though write
only needs a Buffer<T>
reference, the only way to create the referent is to clone our &Vec<u8>
parameter:
fn write_data_with_gdal(data: &Vec<u8>) {
let driver = DriverManager::get_driver_by_name("GTiff").unwrap();
let ds = driver.create(...).unwrap();
let mut rb = ds.rasterband(1).unwrap();
let dims = (data.len(), 1);
let buf = Buffer { size: dims, data: data.clone() }; // <--- no way to avoid `clone()`
rb.write((0, 0), dims, &buf).unwrap();
// `buf` is dropped...
}
I'd argue we need an alternative to this write
method for users not wishing to incur the additional memory and time overhead. One might argue that the user should only work with Buffer<T>
instances, but given that working interoperability with other Vec<T>
-based APIs is very common in the raster space, we should support writing a Vec<T>
without the clone()
cost.