Introduction
This is the first post in my series of notes about hyperspectral imagery classification. See the other parts:
Part 2: Classification Approach and Spectral Profiles Creation
The IEEE GRSS Data Fusion Contest took place this spring. This year participants were asked to work with hyperspectral (142-band) imagery and LiDAR data. The resolution of the dataset was about 5 m. There were two nominations: best classification and best scientific paper.
I work with high-resolution imagery quite often, but classification is a rather rare task for me. I thought this contest was a great opportunity to develop my skills, and not just my classification skills, but my R skills as well. I decided to participate in the best-classification nomination and to use R for the most part.
I learned a lot, and I will share that knowledge with you in a series of posts, starting with this one. And like some great novels, I will start from the very end: the evaluation of my results.
Results of my classification (created in R, designed in QGIS)
Contest results
...are available here. As you may notice, I am not on the list of the 10 best classifications ;-) But there is an almost unnoticeable 0.03% difference between my result (85.93% accuracy) and the result in 10th place. Not a bad result for a relatively inexperienced guy, don't you think? And I know that I could have done better: I had 99% prediction accuracy on the training samples. Funnily enough, my classification map looks better than the map that took 7th place!
How to evaluate classification using R
Since I was not in the top ten, I had to evaluate the result on my own. The organisers finally disclosed the evaluation samples, so I got a chance for a self-assessment. So we have a set of .shp files, each containing ground-truth polygons for one of the 16 classes, plus a classification map. We need to accomplish four goals:
- Create a visual representation of misclassification.
- Assess accuracy.
- Create a confusion matrix.
- Visualise the classification map using the IEEE colour palette.
Let's get a palette!
Official IEEE palette
To extract colour values from the palette above you can use GIMP. But I used a widget that every KDE user (Linux) should have by default: it lets you probe and save colour values from any part of your screen. Quite useful!
'Color picker' at work
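Once you have probed an RGB triplet with the colour picker, base R's rgb() converts it to the hex string that ggplot2's scale_fill_manual() expects. A minimal sketch (the triplet below corresponds to one of the hex values used in the palette later in this post):

```r
# Convert a probed RGB triplet (0-255 per channel) to a hex colour string
rgb(0, 179, 0, maxColorValue = 255)
# "#00B300"
```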
Now let's see the code for our tasks.
Load needed libraries
library(rgdal)
library(raster)
library(reshape2)
library(caret)
library(ggplot2)
Load and process data
# get classification raster
ras <- raster('~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/raster.tif', verbose = T)

# get list of shp-files for evaluation ('pattern' is a regex, so escape the dot)
shapes <- list.files(path = '~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi',
                     pattern = '\\.shp$')
# a list for accuracy assessment
accuracy_list <- list()

# create an empty dataframe to be filled with evaluation results
# field names are not arbitrary!!!
eval_df <- data.frame(variable = character(), value = character())

# create an empty dataframe to be used for plotting
# field names are not arbitrary!!!
plot_data <- data.frame(variable = character(), value = character(), Freq = integer())

for (f_name in shapes) {
  # delete '.shp' from the filename
  layer_name <- sub('.shp', '', f_name)
  class <- readOGR('~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi',
                   layer = layer_name, verbose = F)

  # extract values from raster
  probe <- extract(ras, class)

  # replace class numbers with names
  samples <- list()
  for (lis in probe) {
    for (value in lis) {
      if (value == 0) {c_name <- 'Unclassified'
      } else if (value == 1) {c_name <- 'Healthy grass'
      } else if (value == 2) {c_name <- 'Stressed grass'
      } else if (value == 3) {c_name <- 'Synthetic grass'
      } else if (value == 4) {c_name <- 'Trees'
      } else if (value == 5) {c_name <- 'Soil'
      } else if (value == 6) {c_name <- 'Water'
      } else if (value == 7) {c_name <- 'Residential'
      } else if (value == 8) {c_name <- 'Commercial'
      } else if (value == 9) {c_name <- 'Road'
      } else if (value == 10) {c_name <- 'Highway'
      } else if (value == 11) {c_name <- 'Railway'
      } else if (value == 12) {c_name <- 'Parking Lot 1'
      } else if (value == 13) {c_name <- 'Parking Lot 2'
      } else if (value == 14) {c_name <- 'Tennis Court'
      } else if (value == 15) {c_name <- 'Running Track'}
      samples <- c(samples, c = c_name)
    }
  }

  # make layer_name match sample name
  if (layer_name == 'grass_healthy') {layer_name <- 'Healthy grass'
  } else if (layer_name == 'grass_stressed') {layer_name <- 'Stressed grass'
  } else if (layer_name == 'grass_syntethic') {layer_name <- 'Synthetic grass'
  } else if (layer_name == 'tree') {layer_name <- 'Trees'
  } else if (layer_name == 'soil') {layer_name <- 'Soil'
  } else if (layer_name == 'water') {layer_name <- 'Water'
  } else if (layer_name == 'residental') {layer_name <- 'Residential'
  } else if (layer_name == 'commercial') {layer_name <- 'Commercial'
  } else if (layer_name == 'road') {layer_name <- 'Road'
  } else if (layer_name == 'highway') {layer_name <- 'Highway'
  } else if (layer_name == 'railway') {layer_name <- 'Railway'
  } else if (layer_name == 'parking_lot1') {layer_name <- 'Parking Lot 1'
  } else if (layer_name == 'parking_lot2') {layer_name <- 'Parking Lot 2'
  } else if (layer_name == 'tennis_court') {layer_name <- 'Tennis Court'
  } else if (layer_name == 'running_track') {layer_name <- 'Running Track'}

  # create a dataframe with classification results
  df <- as.data.frame(samples)
  dfm <- melt(df, id = 0)
  dfm['variable'] <- layer_name

  # add data to evaluation dataframe
  eval_df <- rbind(eval_df, dfm)

  # assess accuracy of current class
  mytable <- table(dfm)
  dmt <- as.data.frame(mytable)
  total_samples <- 0
  correct_predictions <- 0
  for (i in 1:nrow(dmt)) {
    predict_class <- toString(dmt[i, 2])
    pc_frequency <- dmt[i, 3]
    if (predict_class == layer_name) {
      correct_predictions <- dmt[i, 3]
    }
    total_samples <- total_samples + pc_frequency
  }
  accuracy <- round(correct_predictions / total_samples, 2)
  accuracy_list <- c(accuracy_list, c = accuracy)

  # append data for plotting
  plot_data <- rbind(plot_data, dmt)
}
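As a side note, the long if/else chains above can be collapsed into a single named lookup vector. A sketch, assuming the same class codes as in the loop (raster values are 0-based while R indexing is 1-based, hence the +1):

```r
# Lookup vector: position (value + 1) holds the class name for a raster value
class_names <- c('Unclassified', 'Healthy grass', 'Stressed grass',
                 'Synthetic grass', 'Trees', 'Soil', 'Water', 'Residential',
                 'Commercial', 'Road', 'Highway', 'Railway', 'Parking Lot 1',
                 'Parking Lot 2', 'Tennis Court', 'Running Track')
values <- c(1, 4, 15)       # e.g. values extracted from the raster
class_names[values + 1]
# "Healthy grass" "Trees" "Running Track"
```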
Create a fancy graph (that is shown on the map)
# create facets plot
EEEI_palette <- c('#D4D4F6', '#5F5F5F', '#710100', '#00B300', '#007900',
                  '#014400', '#008744', '#D0B183', '#BAB363', '#DAB179',
                  '#737373', '#A7790A', '#00EA01', '#CA1236', '#00B9BB')
plot_data <- plot_data[order(plot_data$variable), ]
p <- ggplot(data = plot_data, aes(x = factor(1), y = Freq, fill = factor(value)))
p <- p + facet_grid(facets = . ~ variable)
p <- p + geom_bar(stat = 'identity', width = 1) +
  xlab('Classes') +
  ylab('Classification rates') +
  guides(fill = guide_legend(title = 'Classes')) +
  scale_fill_manual(values = EEEI_palette) +
  ggtitle('Classification Accuracy') +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
p
Let's finally get our accuracy result!
# accuracy assessment: mean of the per-class accuracies collected in the loop
fin_accuracy <- mean(unlist(accuracy_list))
fin_accuracy <- paste(round(fin_accuracy * 100, 2), '%', sep = '')
print(paste('Total accuracy:', fin_accuracy, sep = ' '))
[1] "Total accuracy: 85.93%"
Confusion matrix
# confusion matrix creation
true <- as.factor(eval_df$variable)
predict <- as.factor(eval_df$value)
confusionMatrix(predict, true)
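caret's confusionMatrix() reports overall accuracy and Kappa alongside the matrix itself. The overall accuracy can also be cross-checked in base R by building the table manually and taking the share of the diagonal (the correctly classified samples). A toy sketch; the real call would use eval_df$variable and eval_df$value:

```r
# Toy cross-check: overall accuracy = correctly classified / total samples
true    <- factor(c('Water', 'Water', 'Road', 'Trees'))
predict <- factor(c('Water', 'Road', 'Road', 'Trees'), levels = levels(true))
cm <- table(predict, true)    # rows = predicted, columns = reference
sum(diag(cm)) / sum(cm)
# 0.75
```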