Thursday, December 25, 2014

Processing GeoTiff files in .NET (without using GDAL)

I've recently created a process that is able to import geoferenced raster data, namely GeoTiff files... with a catch:
As I didn't find a proper lib to load GeoTiff files I was converting the source files to an ASCII Gridded XYZ format prior to importing. Despite the fancy name it's simply an ASCII file where each line contains:
<LONGITUDE>     <LATITUDE>      <VALUE1>   [ <VALUE 2>]   [<VALUE 3>]
Each line on this file corresponds to a pixel on a raster image, thus it's incredibly easy to parse in C#. The code would be something like this:
foreach (var line in File.ReadLines(filePath))
{
    if (line == null)
    {
        continue;
    }

    var components = line.Split(
        new[] { ' ' }, 
        StringSplitOptions.RemoveEmptyEntries);

    var longitude = double.Parse(components[0]);
    var latitude = double.Parse(components[1]);

    IEnumerable<string> values = components.Skip(2);

    //... process each data item

    yield return dataItem;
}

Nice and easy. But this has two disadvantages:

First, it requires an extra step to convert to XYZ from Geotiff, although easily done with the GDAL:
gdal_translate -of XYZ <input.tif> <output.xyz>
Another disadvantage is that the XYZ becomes much larger than its TIFF counterpart. Depending on the source file the difference could be something like 10 GB vs 500 MB (yes, 50x bigger)

So, I would really like to be able to process a GeoTiff file directly in c#. One obvious candidate is using GDAL, particularly its C# bindings. Although these bindings work relatively well they rely on an unmanaged GDAL installation. For me this is a problem as I want to host this on Azure WebSites, where installing stuff is not an option.
So, I embraced the challenge of being able to process a GeoTIFF file in a fully managed way.


The file that I'll be loading is this one. It's an height map for the entire world (21600x10800, 222MB)


First step, loading the TIFF file, still ignoring any geo data.

Fortunately I found a really nice lib that does exactly what I wanted: be able to load a tiff file in c#, fully managed. It's called LibTiff.NET.

Although I'm not really found of its API, it does include tons of functionality and documentation. Most demos load the full image into memory (which isn't really viable for larger tiff files) but it does include the capability to process a file one line at a time.

My starting point was this example here. The important bits are:
using (Tiff tiff = Tiff.Open(@"<file.tif>", "r"))
{
    int width = tiff.GetField(TiffTag.IMAGEWIDTH)[0].ToInt();
    byte[] scanline = new byte[tiff.ScanlineSize()];

    for (int i = 0; i < height; i++)
    {
        tiff.ReadScanline(scanline, i);
    }       
}
This sample opens a tiff and iterates each line. This code is used to open 8bit tiff files. 16 bits files would require an additional operation:
using (Tiff tiff = Tiff.Open(@"<file.tif>", "r"))
{
    int width = tiff.GetField(TiffTag.IMAGEWIDTH)[0].ToInt();
    byte[] scanline = new byte[tiff.ScanlineSize()];
    ushort[] scanline16Bit = new ushort[tiff.ScanlineSize() / 2];

    for (int i = 0; i < height; i++)
    {
        tiff.ReadScanline(scanline, i);
        Buffer.BlockCopy(scanline, 0, scanline16Bit, 0, scanline.Length);
    }       
}
Second step, loading geographical data and corresponding height value

The aforementioned lib doesn't include any support for GeoTiff. So, time to take a look at the official docs.

According to the official FAQ (http://www.remotesensing.org/geotiff/faq.html):
GeoTIFF is a metadata format, which provides geographic information to associate with the image data. But the TIFF file structure allows both the metadata and the image data to be encoded into the same file.
GeoTIFF makes use of a public tag structure which is platform interoperable between any and all GeoTIFF-savvy readers. Any GIS, CAD, Image Processing, Desktop Mapping and any other types of systems using geographic images can read any GeoTIFF files created on any system to the GeoTIFF specification.
 
Before delving into the spec, I need a comparison point to make sure I'm getting the proper values. Thus I'm using gdalinfo to provide some information on the raster file. It's usage is really simple:
gdalinfo altitude.tif
This outputs the following information:
Size is 21600, 10800
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.016666666666667,-0.016666666666667)
Metadata:
  AREA_OR_POINT=Area
  EXIF_ColorSpace=65535
  EXIF_DateTime=2005:10:12 22:04:52
  EXIF_Orientation=1
  EXIF_PixelXDimension=21600
  EXIF_PixelYDimension=10800
  EXIF_ResolutionUnit=2
  EXIF_Software=Adobe Photoshop CS2 Macintosh
  EXIF_XResolution=(72)
  EXIF_YResolution=(72)
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
Band 1 Block=21600x1 Type=Byte, ColorInterp=Gray
  Min=0.000 Max=213.000 
  Minimum=0.000, Maximum=213.000, Mean=22.754, StdDev=25.124
  Metadata:
    STATISTICS_MAXIMUM=213
    STATISTICS_MEAN=22.753594797178
    STATISTICS_MINIMUM=0
    STATISTICS_STDDEV=25.124203131182
I'm mostly interested on these two lines:
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.016666666666667,-0.016666666666667)
This means that the top-left corner of the image corresponds to coordinate -180,90 and that each pixel increments 0.016(7) degrees.

So, time to check the spec, namely section 2.6.1 on http://www.remotesensing.org/geotiff/spec/geotiff2.6.html):

Apparently tags 33922 and 33550 provide exactly the information I need. They're defined as:
ModelTiepointTag:
      Tag = 33922 (8482.H)
      Type = DOUBLE (IEEE Double precision)
      N = 6*K,  K = number of tiepoints
      Alias: GeoreferenceTag
      Owner: Intergraph
This tag stores raster->model tiepoint pairs in the order

        ModelTiepointTag = (...,I,J,K, X,Y,Z...),
where (I,J,K) is the point at location (I,J) in raster space with pixel-value K, and (X,Y,Z) is a vector in model space. In most cases the model space is only two-dimensional, in which case both K and Z should be set to zero; this third dimension is provided in anticipation of future support for 3D digital elevation models and vertical coordinate systems.
and
ModelPixelScaleTag:
      Tag = 33550
      Type = DOUBLE (IEEE Double precision)
      N = 3
      Owner: SoftDesk
This tag may be used to specify the size of raster pixel spacing in the model space units, when the raster space can be embedded in the model space coordinate system without rotation, and consists of the following 3 values:

    ModelPixelScaleTag = (ScaleX, ScaleY, ScaleZ)
where ScaleX and ScaleY give the horizontal and vertical spacing of raster pixels. The ScaleZ is primarily used to map the pixel value of a digital elevation model into the correct Z-scale, and so for most other purposes this value should be zero (since most model spaces are 2-D, with Z=0).
To read theses tags with the LibTiff.Net lib one needs to use the GetField method. Thus, loading the values from these tags will be as simple as:
FieldValue[] modelPixelScaleTag = tiff.GetField((TiffTag)33550);
FieldValue[] modelTiepointTag = tiff.GetField((TiffTag)33922);

byte[] modelPixelScale = modelPixelScaleTag[1].GetBytes();
double pixelSizeX = BitConverter.ToDouble(modelPixelScale, 0);
double pixelSizeY = BitConverter.ToDouble(modelPixelScale, 8)*-1;

byte[] modelTransformation = modelTiepointTag[1].GetBytes();
double originLon = BitConverter.ToDouble(modelTransformation, 24);
double originLat = BitConverter.ToDouble(modelTransformation, 32);
With this information I'm mostly ready to iterate the various raster lines. But, and although conceptually the top-left corner corresponds to coordinate -180, 90 the top-left pixel itself corresponds to coordinate -179.99166, 89.99166. This is obtained through:
double startLat = originLat + (pixelSizeY/2.0);
double startLon = originLon + (pixelSizeX/2.0);

So here's the full source-code.
Disclaimer: This is mostly coded for my particular scenario as GeoTiff supports a wider-range of options.
using (Tiff tiff = Tiff.Open(filePath, "r"))
{
    int height  = tiff.GetField(TiffTag.IMAGELENGTH)[0].ToInt();
    FieldValue[] modelPixelScaleTag = tiff.GetField((TiffTag)33550);
    FieldValue[] modelTiepointTag = tiff.GetField((TiffTag)33922);

    byte[] modelPixelScale = modelPixelScaleTag[1].GetBytes();
    double pixelSizeX = BitConverter.ToDouble(modelPixelScale, 0);
    double pixelSizeY = BitConverter.ToDouble(modelPixelScale, 8)*-1;

    byte[] modelTransformation = modelTiepointTag[1].GetBytes();
    double originLon = BitConverter.ToDouble(modelTransformation, 24);
    double originLat = BitConverter.ToDouble(modelTransformation, 32);

    double startLat = originLat + (pixelSizeY/2.0);
    double startLon = originLon + (pixelSizeX / 2.0);

    var scanline = new byte[tiff.ScanlineSize()]; 

    //TODO: Check if band is stored in 1 byte or 2 bytes. 
    //If 2, the following code would be required
    //var scanline16Bit = new ushort[tiff.ScanlineSize() / 2];
    //Buffer.BlockCopy(scanline, 0, scanline16Bit, 0, scanline.Length);

    double currentLat = startLat;
    double currentLon = startLon;

    for (int i = 0; i < height; i++)
    {
        tiff.ReadScanline(scanline, i); //Loading ith Line            
                    
        var latitude = currentLat + (pixelSizeY * i);
        for (var j = 0; j < scanline.Length; j++)
        {
            var longitude = currentLon + (pixelSizeX*j);
            geodata.Points[0] = new[] { new PointXY(longitude, latitude) };
            object value = scanline[j];

            //... process each data item

            yield return dataItem;
        }
    }
}
This code is actually working quite well. Eventually could be interesting to put this onto a proper lib and add support for additional GeoTiff features. Adding that to my backlog :)

11 comments:

  1. Haven't really tried it, but the lib does support writing tiff files: https://bitmiracle.github.io/libtiff.net/?topic=html/e4f25423-eede-4ef6-a920-9cb539d056c6.htm

    So I'm guessing it's just the matter of setting the correct tags for getodata.

    ReplyDelete
  2. Hello,

    Thanks a lot for the post. It was a great help for something that I needed to do. I do have one question though. Is there any way to extract the following information from the GeoTiff file using libtiff.net?

    GEOGCS["WGS 84",
    DATUM["WGS_1984",
    SPHEROID["WGS 84",6378137,298.257223563,

    Right now I am working with a workaround by calling gdalinfo from inside C# code and parsing the resultant output. It would be really great if libtiff.net offers such a provision. I have looked through the documentation but could not see any information on this. Any ideas??? Thanks a lot in advance.

    Rashid.

    ReplyDelete
    Replies
    1. Interesting question indeed. I'm using python with libtiff (and sometimes Delphi with LibTiffDelphi). I'm used to add a *.tfw file or world file to my tiff file. That *.tfw file seems to also contain the same data as stored in ModelPixelScaleTag and ModelTiepointTag. And yes, using gdalinfo to scan the file I see this:
      GEOGCS["WGS 84",
      DATUM["WGS_1984",
      SPHEROID["WGS 84",6378137,298.257223563,
      AUTHORITY["EPSG","7030"]],
      AUTHORITY["EPSG","6326"]],
      PRIMEM["Greenwich",0],
      UNIT["degree",0.0174532925199433],
      AUTHORITY["EPSG","4326"]]
      I saw more tags mentioned on this webpage and I tried to retrieve them: http://www.awaresystems.be/imaging/tiff/tifftags/private.html
      The tags are called ModelTransformationTag, GeoKeyDirectoryTag, GeoDoubleParamsTag and GeoAsciiParamsTag. When I read them with libtiff from the same TIFF file (written by GDAL) as I scanned with gdalinfo, I get:
      None
      [1, 1, 0, 7, 1024, 0, 1, 2, 1025, 0, 1, 1, 2048, 0, 1, 4326, 2049, 34737, 13, 0, 2054, 0, 1, 9102, 2057, 34736, 1, 1, 2059, 34736, 1, 0]
      1.94734590925e-317
      GCS_WGS_1984|
      The last line seems to contain truncated information. Not sure I understand the rest of the output, but hopefully we can learn how to retrieve this kind of data further and to make sense out of such data? Regards ...

      Delete
    2. The last line may or may not be truncated. According to http://geotiff.maptools.org/spec/geotiff2.4.html

      "Note on ASCII Keys:
      Special handling is required for ASCII-valued keys. While it is true that TIFF 6.0 permits multiple NULL-delimited strings within a single ASCII tag, the secondary strings might not appear in the output of naive "tiffdump" programs. For this reason, the null delimiter of each ASCII Key value shall be converted to a "|" (pipe) character before being installed back into the ASCII holding tag, so that a dump of the tag will look like this.

      AsciiTag="first_value|second_value|etc...last_value|"
      A baseline GeoTIFF-reader must check for and convert the final "|" pipe character of a key back into a NULL before returning it to the client software."

      Since libtiff is particularly barebones I'd say it's not handling the end of line properly. On the bright side, if the '|' is actually the end of data then everything is fine.

      Delete
  3. Hello, this is a wonderful article! Do you have any idea of how to find the min/max elevation data from the geotiff data set? I am trying to normalize the data set in order to use as a heightmap (which requires values between 0 and 1). Currently I am manually finding the min/max while reading in the raw elevation data, but this seems horribly inefficient.

    Thanks!

    ReplyDelete
    Replies
    1. Hi. gdal_info will provide you that information:

      From my example above:
      Minimum=0.000, Maximum=213.000, Mean=22.754, StdDev=25.124

      Delete
  4. Thanks a lot!!! Literally saved my day!

    ReplyDelete
  5. Thanks for the start! In what context does geodata, PointXY and dataItem come from. I don't see them referenced at all in your "full" source code.

    ReplyDelete
  6. Hi Pedro,
    I'm ATM trying to do something like you did.. and to find the tifftags used in gdallinfo.I stumbled upon this link :
    https://www.awaresystems.be/imaging/tiff/tifftags/geokeydirectorytag.html
    which gives interesting infos on tifftags...
    in short, we are working on private use tifftags allocated by adobe ...
    33550 // GEOTIFF_MODELPIXELSCALETAG ( what you were looking for )
    33922 // GEOTIFF_MODELTIEPOINTTAG ( what you were looking for )
    34735 , 34736 , 42113 // ?? yet to see what it is ..
    34737 // some SRS infos , but not implemented in the tifftag enum of bitmiracle.libtiff, the lib i use.

    Thanks for the first digging !!

    ReplyDelete
  7. Sir, thanks for your solution.
    I have a question about those geoData, can we manipulate it just like a CAD Software, or an industrial CT Scanner ?

    ReplyDelete