Misanthrope's Thoughts: Classification of the Hyper-Spectral and LiDAR Imagery using R (mostly). Part 2: Classification Approach and Spectre Profile Creation

Introduction

This is the second part of my post series related to hyper-spectral and LiDAR imagery using R. See other parts:

In this part I will describe my general approach to classification process and then I will show you how to create cool spectral response plots like this:

Colour of boxes corresponds to the band wavelength

Classification approach

I decided to use Random Forest algorithm for the per-pixel classification. Random Forest is a reliable machine-learning algorithm and I already successfully used it for my projects. Notice that the winner of the contest used Random Forest too, but with per-object classification approach.

Random Forest is implemented in R in 'randomForest' package and cool thing is that 'raster' package is able to implement it to rasters. But not only initial 144 hyper-spectral bands + LiDAR were 'feeded' to Random Forest, but in addition several indices and DSM derivatives were used. Here is what I did for classification:

Cloud shadow identification and removal.
Self-casted shadow identification and removal.
Creation of the spectral profiles for the 16 classes. This needed to choose which bands will be used for indices calculation.
Indices calculation: NDVI, NHFD, NDWI.
DSM processing: elevated objects extraction, DSM to DTM conversion, calculation of slope, aspect, normalised DEM, topographic position and terrain roughness indices.
Random Forest model creation and adjustment. Classification itself.
Final filtering using python-GDAL.
Result evaluation.

Spectral profiles

For calculation of indices one need to choose the bands as an input. Which one to pick if there are 144 bands in hyper-spectral image? The spectral profile graphs which should help with that. But these won't be common spectral profiles that we all used to, like this one:

Spectral profile from one of my researches

We need to see dispersion of band values for each class much more than mean values, because dispersion will significantly affect classification. If some classes will have overlapping values in the same band it will decrease accuracy of classification. Ok, at our spectral profile we should see wavelength, band number and dispersion of given classes to make a decision which band to pick for indices calculation. This means that box plots will be used instead of linear plots. Also boxes should be coloured accordingly to the corresponding wavelength

Needed libraries

library(ggplot2)
library(reshape2)
library(raster)
library(rgdal)

Load raster and take samples using .shp-file with classes

# Load raster
image <- stack('~/EEEI_contest/all_layers.tif')

# Load point .shp-file with classes
s_points_dFrame <- readOGR( '~/EEEI_contest',
                    layer="sampling_points",
                    p4s="+proj=utm +zone=15 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0")
s_points <- SpatialPoints(s_points_dFrame)
dFrame <- as.data.frame(s_points_dFrame)

# Extract data at sampling points
probe <- extract(image, s_points, method='bilinear')

# Combine sampling results with original data 
sample <- cbind(dFrame, probe)

Spectral profile creation

At this point we have 'sample' dataframe that looks like this:

'pattern' is a class name and 'pattern_id' is the integer that corresponds to each class. These fields, as well as 'lat' and 'lon' belonged to the shp-file. Ohter fields were created with 'extract' function.

# get numbers for bands instead of names in headers of columns
for (i in c(7:ncol(sample))){
  colnames(sample)[i] <- i-6
}

Creation of the proper names for bands and palette creation

# preparation for palette creation (establish wavelengths)
palette <- c()
violet <- c(380:450)
blue <- c(452:495)
green <- c(496:570)
yellow <- c(571:590)
orange <- c(591:620)
red <- c(621:750)
NIR <- c(750:1050)

# process band names (future captions) and palette
for (i in c(7:150)){
  # calculate wavelength for a given band
  wave <- (i-7)*4.685315 + 380
  wave <- round(wave, digits = 0)
  
  # rename colunms in 'sample'
  band_num <- paste('(band', i-6, sep = ' ')
  band_num <- paste(band_num, ')', sep = '')
  colnames(sample)[i] <- paste(wave, band_num, sep = ' ')
  
  # add value to palette
  if (is.element(wave, violet)) {palette <- c(palette, '#8F00FF')   
  } else if (is.element(wave, blue)) {palette <- c(palette, '#2554C7')   
  } else if (is.element(wave, green)) {palette <- c(palette, '#008000')   
  } else if (is.element(wave, yellow)) {palette <- c(palette, '#FFFF00')   
  } else if (is.element(wave, orange)) {palette <- c(palette, '#FF8040')   
  } else if (is.element(wave, red)) {palette <- c(palette, '#FF0000')   
  } else if (is.element(wave, NIR)) {palette <- c(palette, '#800000')   
  }
  
}

# Remove unneeded fields
sample <- subset(sample, select = 1:150)
sample[, 5] <- NULL
sample[, 5] <- NULL
sample <- sample[,3:148]

Now our dataframe looks like this:

Reshape dataframe for future plotting:

#
sample <- melt(sample, id = c('pattern', 'pattern_id'))
#
#

Now our dataframe looks like this:

We want to create spectral profiles for each class. This means we need to create plotting function. Keep in mind that 'ggplot2' doesn't work with local variables. We need to create generic plotting function and run it in a cycle that will create global subsets from our dataframe:

plotting <- function(data_f, plot_name){
  p <- ggplot(data_f, aes(data_f[,3], data_f[,4])) +
       geom_boxplot(aes(fill = data_f[,3]),
                    colour = palette,
                    # make outliers have the same colour as lines    
                    outlier.colour = NULL) +
       theme(axis.text.x = element_text(angle=90, hjust = 0),
             axis.title = element_text(face = 'bold', size = 14),
             title = element_text(face = 'bold', size = 16),
             legend.position = 'none') +
       labs(title = paste('Spectral response for class\n',
                          plot_name,
                          sep = ' '),
            x = 'Wavelength, nm',
            y = 'Response')+
       scale_fill_manual(values = palette)
  print(p)
}

Final stage: plotting (EDIT: name of the dataframe was fixed to match previous code parts)

# get unique classes

# uniue class numbers:
u <- unique(sample[,'pattern_id'])
# unique class names:
u_names <- unique(unlist(sample[,'pattern']))

for (i in u){
  # works only if class numbers (u) are consequtive integers from 1 to u
  data_f <- subset(sample, pattern_id == i)  
  plot_name <- u_names[i]
  plotting(data_f, plot_name)
}

If you are curious here you are the rest of the plots:

5 comments :

AnonymousAugust 5, 2013 at 3:09 AM
The comic sans overwhelms any value this blog post may have....Sorry...
AnonymousAugust 12, 2013 at 1:57 AM
Thanks for the text.

P.s. You should really think about changing font and so...
Jeffrey EvansMay 13, 2014 at 2:53 AM
You really need to be mindful of your object names not conflicting with functions. Use of the object names "image" and "sample" mask very important base functions.