
Saturday, August 17, 2013

How to Count Unique Values in Raster

Recently I demonstrated how to get histograms for rasters in QGIS. But what if one needs to count the exact number of cells with a given value in a raster (for example, to assess classification results)? In this case scripting is required.

UPD: the script below was transformed into a more usable SEXTANTE script; see this post.

We'll use Python-GDAL for this task (thanks to tutorials such as this one, it is extremely easy to learn). The script I wrote is probably not optimal (still, it works fine and quickly, so you may use it freely): it is not clear to me whether anyone would want to run it on floating-point data types (which seems impractical given the huge number of unique values in that case) or whether this task is only ever performed on integers (in which case the code could be optimised). It works with single- and multi-band rasters.

How to use

The code provided below should be copied into a text file and saved with the '.py' extension. Enable the Python console in QGIS, hit the "Show editor" button in the console and open the script file. Then replace 'raster_path' with the actual path to your raster and hit the 'Run script' button. In the console output you will see a sorted list of the unique values of the raster, per band:

QGIS Python console: script body (to the right) and its output (to the left)

Script itself

#!/usr/bin/python -tt
#-*- coding: UTF-8 -*-

'''
#####################################################################
This script is designed to compute unique values of the given raster
#####################################################################
'''

from osgeo import gdal
import sys
import math

path = "raster_path"

gdalData = gdal.Open(path)
if gdalData is None:
  sys.exit( "ERROR: can't open raster" )

# get width and height of the raster
xsize = gdalData.RasterXSize
ysize = gdalData.RasterYSize

# get number of bands
bands = gdalData.RasterCount

# process the raster
for i in xrange(1, bands + 1):
  band_i = gdalData.GetRasterBand(i)
  raster = band_i.ReadAsArray()

  # create dictionary for unique values count
  count = {}

  # count unique values for the given band
  for col in range( xsize ):
    for row in range( ysize ):
      cell_value = raster[row, col]

      # check if cell_value is NaN
      if math.isnan(cell_value):
        cell_value = 'Null'

      # add cell_value to dictionary
      try:
        count[cell_value] += 1
      except KeyError:
        # first occurrence of this value
        count[cell_value] = 1

  # print results sorted by cell_value
  for key in sorted(count.iterkeys()):
    print "band #%s - %s: %s" %(i, key, count[key])

Saturday, August 3, 2013

Classification of the Hyper-Spectral and LiDAR Imagery using R (mostly). Part 2: Classification Approach and Spectral Profile Creation

Introduction

This is the second part of my post series on hyper-spectral and LiDAR imagery classification using R. See the other parts:

In this part I will describe my general approach to the classification process and then show you how to create cool spectral response plots like this one:
Spectral response graph: the colour of the boxes corresponds to the band wavelength


Classification approach

I decided to use the Random Forest algorithm for per-pixel classification. Random Forest is a reliable machine-learning algorithm and I have already used it successfully in my projects. Notice that the winner of the contest used Random Forest too, but with a per-object classification approach.

Random Forest is implemented in R in the 'randomForest' package, and the cool thing is that the 'raster' package can apply it to rasters. Not only the initial 144 hyper-spectral bands plus LiDAR were fed to Random Forest: several indices and DSM derivatives were used in addition. Here is what I did for classification:
  1. Cloud shadow identification and removal.
  2. Self-casted shadow identification and removal.
  3. Creation of the spectral profiles for the 16 classes. This is needed to choose which bands will be used for indices calculation.
  4. Indices calculation: NDVI, NHFD, NDWI (see the sketch after this list).
  5. DSM processing: elevated objects extraction, DSM to DTM conversion, calculation of slope, aspect, normalised DEM, topographic position and terrain roughness indices.
  6. Random Forest model creation and adjustment. Classification itself.
  7. Final filtering using python-GDAL.
  8. Result evaluation.
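
To make the indices step concrete, here is a minimal sketch of an NDVI calculation with the 'raster' package. The band numbers below are hypothetical placeholders; picking the real red and NIR bands is exactly what the spectral profiles in this post are for:

library(raster)

# 'image' is the 144-band stack loaded later in this post;
# bands 60 (red, ~656 nm) and 85 (NIR, ~774 nm) are hypothetical picks
red <- raster(image, layer = 60)
nir <- raster(image, layer = 85)
ndvi <- (nir - red) / (nir + red)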

Spectral profiles

For the calculation of indices one needs to choose bands as input. Which ones to pick if there are 144 bands in the hyper-spectral image? Spectral profile graphs should help with that. But these won't be the common spectral profiles that we are all used to, like this one:
Spectral profile from one of my research projects
We need to see the dispersion of band values for each class, much more than the mean values, because dispersion significantly affects classification: if some classes have overlapping values in the same band, classification accuracy will decrease. So our spectral profile should show the wavelength, the band number and the dispersion for the given classes, so that we can decide which bands to pick for indices calculation. This means that box plots will be used instead of line plots. Also, the boxes should be coloured according to the corresponding wavelength.

Needed libraries

library(ggplot2)
library(reshape2)
library(raster)
library(rgdal)

Load raster and take samples using .shp-file with classes

# Load raster
image <- stack('~/EEEI_contest/all_layers.tif')

# Load point .shp-file with classes
s_points_dFrame <- readOGR( '~/EEEI_contest',
                    layer="sampling_points",
                    p4s="+proj=utm +zone=15 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0")
s_points <- SpatialPoints(s_points_dFrame)
dFrame <- as.data.frame(s_points_dFrame)

# Extract data at sampling points
probe <- extract(image, s_points, method='bilinear')

# Combine sampling results with original data 
sample <- cbind(dFrame, probe)
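
Before going further it is worth checking that the sampling worked. A quick look at the combined table (just a generic sanity check, not part of the original workflow):

# inspect the first rows and the dimensions of the combined table
head(sample[, 1:8])
dim(sample) # number of points x (shp attributes + bands)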

Spectral profile creation

At this point we have a 'sample' dataframe that looks like this:
'pattern' is the class name and 'pattern_id' is the integer that corresponds to each class. These fields, as well as 'lat' and 'lon', belonged to the shp-file. Other fields were created by the 'extract' function.
# get numbers for bands instead of names in headers of columns
for (i in c(7:ncol(sample))){
  colnames(sample)[i] <- i-6
}

Creating proper band names and the palette

# preparation for palette creation (establish wavelengths)
palette <- c()
violet <- c(380:450)
blue <- c(452:495)
green <- c(496:570)
yellow <- c(571:590)
orange <- c(591:620)
red <- c(621:750)
NIR <- c(750:1050)

# process band names (future captions) and palette
for (i in c(7:150)){
  # calculate wavelength for a given band
  wave <- (i-7)*4.685315 + 380
  wave <- round(wave, digits = 0)
  
  # rename columns in 'sample'
  band_num <- paste('(band', i-6, sep = ' ')
  band_num <- paste(band_num, ')', sep = '')
  colnames(sample)[i] <- paste(wave, band_num, sep = ' ')
  
  # add value to palette
  if (is.element(wave, violet)) {palette <- c(palette, '#8F00FF')   
  } else if (is.element(wave, blue)) {palette <- c(palette, '#2554C7')   
  } else if (is.element(wave, green)) {palette <- c(palette, '#008000')   
  } else if (is.element(wave, yellow)) {palette <- c(palette, '#FFFF00')   
  } else if (is.element(wave, orange)) {palette <- c(palette, '#FF8040')   
  } else if (is.element(wave, red)) {palette <- c(palette, '#FF0000')   
  } else if (is.element(wave, NIR)) {palette <- c(palette, '#800000')   
  }
  
}
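
# Since the palette is passed to ggplot as one colour per band, a quick
# sanity check is useful here (a trivial assertion of mine, not part of
# the original post): exactly one palette entry per hyper-spectral band
stopifnot(length(palette) == 144)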

# Remove unneeded fields
sample <- subset(sample, select = 1:150)
sample[, 5] <- NULL
sample[, 5] <- NULL
sample <- sample[,3:148]
Now our dataframe looks like this:

Reshape dataframe for future plotting:
sample <- melt(sample, id = c('pattern', 'pattern_id'))
Now our dataframe is in 'long' format, one row per combination of sampling point and band:
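
A quick way to inspect the new structure (the 'variable' and 'value' column names are produced by 'reshape2'):

# 'pattern'    - class name
# 'pattern_id' - class number
# 'variable'   - band caption, e.g. "380 (band 1)"
# 'value'      - sampled response for that band
str(sample)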

We want to create a spectral profile for each class, which means we need a plotting function. Keep in mind that 'ggplot2' can be tricky with variables local to a loop, because aes() is evaluated lazily. So we create a generic plotting function that prints the plot immediately and call it in a loop that creates subsets of our dataframe:
plotting <- function(data_f, plot_name){
  p <- ggplot(data_f, aes(data_f[,3], data_f[,4])) +
       geom_boxplot(aes(fill = data_f[,3]),
                    colour = palette,
                    # make outliers have the same colour as lines    
                    outlier.colour = NULL) +
       theme(axis.text.x = element_text(angle=90, hjust = 0),
             axis.title = element_text(face = 'bold', size = 14),
             title = element_text(face = 'bold', size = 16),
             legend.position = 'none') +
       labs(title = paste('Spectral response for class\n',
                          plot_name,
                          sep = ' '),
            x = 'Wavelength, nm',
            y = 'Response')+
       scale_fill_manual(values = palette)
  print(p)
}

Final stage: plotting (EDIT: name of the dataframe was fixed to match previous code parts)
# get unique classes

# unique class numbers:
u <- unique(sample[,'pattern_id'])
# unique class names:
u_names <- unique(unlist(sample[,'pattern']))

for (i in seq_along(u)){
  # u and u_names come from unique() in the same order,
  # so u[i] and u_names[i] describe the same class
  data_f <- subset(sample, pattern_id == u[i])
  plot_name <- u_names[i]
  plotting(data_f, plot_name)
}
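
If the plots should go to files rather than to the plot window, ggsave() can be called right after plotting(); the file name pattern below is my own choice, not from the original workflow:

# saves the most recently printed plot
ggsave(paste(plot_name, 'png', sep = '.'), width = 10, height = 7)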

If you are curious, here are the rest of the plots: