Introduction
This us the first post of my series notes about hyperspectral imagery classification. See other parts:
Part 2: Classification Approach and Spectral Profiles Creation
There was the EEEI Data Fusion Contest this spring. This year they wanted people to elaborate about hyper-spectral (142-bands imagery) and LiDAR data. The resolution of the data-set was about 5 m. There were 2 nominations: best classification and the best scientific paper.
I work with high-resolution imagery quite often, but classification is a very rear task for me though. I thought that this contest was a great opportunity to develop my skills. And not just a classification skills, but R skills as well... I decided to participate in best classification contest, and to use R for the most part.
I learned a lot and I will share my knowledge with you in a series of posts starting with this one. And like in some great novels, I will start from the very end - evaluation of my results.
![]() |
| Results of my classification (created in R, designed in QGIS) |
Contest results
...are available here. As you may notice, I'm not on the list of the 10 best classification ;-) But there is almost unnoticeable 0.03% difference between my result (85.93% accuracy) and the result of the 10-th place. Not a bad result for a relatively inexperienced guy, don't you think? And I know, that I could have done better - I had 99% prediction accuracy for the training samples. It's funny, but my classification map looks better than map that took 7-th place!
How to evaluate classification using R
Due to I was not on the top ten list, I had to evaluate the result on my own. The organisers finally disclosed evaluation samples and I got a chance for a self assessment. So we have a set of .shp-files - each contains ground-truth polygons for one of the 16 classes and a classification map. We need to accomplish 3 goals:
- Create a visual representation of missclassification.
- Assess accuracy.
- Create a confusion matrix.
- Visualise classification map using EEEI colour palette.
Lets get a palette!
![]() |
| Official EEEI palette |
To extract colour values from the palette above you may use GIMP. But I used a widget that every KDE-user (Linux) should have by default. You can probe and save colour values from any part of your screen. Quite useful!
![]() |
| 'Color picker' in work |
Now let's see the code for our tasks.
Load needed libraries
library(rgdal) library(raster) library(reshape2) library(caret) library(ggplot2)
Load and process data
# get classification raster
ras <- raster('~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/raster.tif',
verbose = T)
# get list of shp-files for evaluation
shapes <- list.files(path = '~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi', pattern = '*shp')
# a list for accuracy assessment
accuracy_list <- list()
# create an empty dataframe to be filled vith evaluation results
# field names are not arbitrary!!!
eval_df <- data.frame(variable = character(),
value = character())
# create an empty dataframe to be used for plotting
# field names are not arbitrary!!!
plot_data <- data.frame(variable = character(),
value = character(),
Freq = integer())
for (f_name in shapes) {
# delete '.shp' from the filename
layer_name <- paste(sub('.shp', '', f_name))
class <- readOGR("~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi",
layer = layer_name,
verbose = F)
# extract values from raster
probe <- extract(ras, class)
# replace class numbers with names
samples <- list()
for (lis in probe) {
for (value in lis) {
if (value == 0) {c_name <- Unclassified
} else if (value == 1) {c_name <- 'Healthy grass'
} else if (value == 2) {c_name <- 'Stressed grass'
} else if (value == 3) {c_name <- 'Synthetic grass'
} else if (value == 4) {c_name <- 'Trees'
} else if (value == 5) {c_name <- 'Soil'
} else if (value == 6) {c_name <- 'Water'
} else if (value == 7) {c_name <- 'Residential'
} else if (value == 8) {c_name <- 'Commercial'
} else if (value == 9) {c_name <- 'Road'
} else if (value == 10) {c_name <- 'Highway'
} else if (value == 11) {c_name <- 'Railway'
} else if (value == 12) {c_name <- 'Parking Lot 1'
} else if (value == 13) {c_name <- 'Parking Lot 2'
} else if (value == 14) {c_name <- 'Tennis Court'
} else if (value == 15) {c_name <- 'Running Track'}
samples <- c(samples, c = c_name)
}
}
# make layer_name match sample name
if (layer_name == 'grass_healthy') {layer_name <- 'Healthy grass'
} else if (layer_name == 'grass_stressed') {layer_name <- 'Stressed grass'
} else if (layer_name == 'grass_syntethic') {layer_name <- 'Synthetic grass'
} else if (layer_name == 'tree') {layer_name <- 'Trees'
} else if (layer_name == 'soil') {layer_name <- 'Soil'
} else if (layer_name == 'water') {layer_name <- 'Water'
} else if (layer_name == 'residental') {layer_name <- 'Residential'
} else if (layer_name == 'commercial') {layer_name <- 'Commercial'
} else if (layer_name == 'road') {layer_name <- 'Road'
} else if (layer_name == 'highway') {layer_name <- 'Highway'
} else if (layer_name == 'railway') {layer_name <- 'Railway'
} else if (layer_name == 'parking_lot1') {layer_name <- 'Parking Lot 1'
} else if (layer_name == 'parking_lot2') {layer_name <- 'Parking Lot 2'
} else if (layer_name == 'tennis_court') {layer_name <- 'Tennis Court'
} else if (layer_name == 'running_track') {layer_name <- 'Running Track'}
# create a dataframe with classification results
df <- as.data.frame(samples)
dfm <- melt(df, id = 0)
dfm['variable'] <- layer_name
# add data to evaluation dataframe
eval_df <- rbind(eval_df, dfm)
# assess accuracy of current class
mytable <- table(dfm)
dmt <- as.data.frame(mytable)
total_samples <- 0
correct_predictions <- 0
for (i in 1:nrow(dmt)) {
predict_class <- toString(dmt[i,2])
pc_frequency <- dmt[i,3]
if (predict_class == layer_name) {
correct_predictions <- dmt[i,3]
}
total_samples <- total_samples + pc_frequency
}
accuracy <- round(correct_predictions/total_samples, 2)
accuracy_list <- c(accuracy_list, c = accuracy)
# append data for plotting
plot_data <- rbind(plot_data, dmt)
}
Create a fancy graph (that is shown on the map)
# create facets plot
EEEI_palette <- c('#D4D4F6',
'#5F5F5F',
'#710100',
'#00B300',
'#007900',
'#014400',
'#008744',
'#D0B183',
'#BAB363',
'#DAB179',
'#737373',
'#A7790A',
'#00EA01',
'#CA1236',
'#00B9BB')
plot_data <- plot_data[order(plot_data$variable),]
p = ggplot(data = plot_data,
aes(x = factor(1),
y = Freq,
fill = factor(value)
)
)
p <- p + facet_grid(facets = . ~ variable)
p <- p + geom_bar(width = 1) +
xlab('Classes') +
ylab('Classification rates') +
guides(fill=guide_legend(title="Classes"))+
scale_fill_manual(values = EEEI_palette)+
ggtitle('Classification Accuracy')+
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
p
Let't finally get our accuracy result!
# accuracy assessment
fin_accuracy <- mean(unlist(accuracy_list))
fin_accuracy <- paste(round(fin_accuracy*100, 2), '%', sep = '')
print(paste('Total accuracy:', fin_accuracy), sep = ' ')
[1] "Total accuracy: 85.93%"
Confusion matrix
# confusion matrix creation
true <- as.factor(eval_df$variable)
predict <- as.factor(eval_df$value)
confusionMatrix(predict, true)






