Misanthrope's Thoughts: 2013

Wednesday, September 18, 2013

Count Unique Values In Raster Using SEXTANTE and QGIS

I decided to make my script for counting unique values in raster more usable. Now you can use it via SEXTANTE in QGIS. Download script for SEXTANTE and extract archive to the folder that is is intended for your Python SEXTANTE scripts (for examlple ~./qgis2/processing/scripts). If you don't know where this folder is located go Processing -> Options and configuration -> Scripts -> Scripts folder, see the screenshot:

Now restart QGIS and in SEXTNTE Toolbox go to Scripts. You will notice new group named Raster processing. There the script named Unique values count will be located:

Launch it and you will see its dialogue window:

Main window

Note that Help is available:

Either single- or multi-band rasters are accepted for processing. After the raster is chosen (input field) one need to decide whether to round values for counting or not. If no rounding is needed - leave 'round values to ndigits' field blank. Otherwise enter needed value there. Note that negative values are accepted - this will round values to ndigits before decimal point.

When the set up is finished hit the Run button. You will get something like this:

Result window

Saturday, August 17, 2013

How to Count Unique Values in Raster

Recently I demonstrated how to get histograms for the rasters in QGIS. But what if one need to count exact number of the given value in a raster (for example for assessment of classification results)? In this case the scripting is required.

UPD: the script below was transformed in more usable SEXTANTE script, see this post.

We'll use Python-GDAL for this task (thanks to such tutorials as this one) it is extremely easy to learn how to use it). The script I wrote seems to be not optimal (hence its working just fine and quickly so you may use it freely) due to it is not clear to me whether someone would like to use it for the floating data types (which seems to be not that feasible due to great number of unique values in this case) or such task is performed only for integers (and in this case code might be optimised). It works with single and multi band rasters.

How to use

The code provided below should be copied into a text file and saved with '.py' extension. Enable Python console in QGIS, hit "Show editor" button in console and open the script file. Then replace 'raster_path' with the actual path to the raster and hit 'Run script button'. In the console output you will see sorted list of unique values of raster per band:

Script body (to the right) and its output (to the left)

Script itself

#!/usr/bin/python -tt
#-*- coding: UTF-8 -*-

'''
#####################################################################
This script is designed to compute unique values of the given raster
#####################################################################
'''

from osgeo import gdal
import sys
import math

path = "raster_path"

gdalData = gdal.Open(path)
if gdalData is None:
  sys.exit( "ERROR: can't open raster" )

# get width and heights of the raster
xsize = gdalData.RasterXSize
ysize = gdalData.RasterYSize

# get number of bands
bands = gdalData.RasterCount

# process the raster
for i in xrange(1, bands + 1):
  band_i = gdalData.GetRasterBand(i)
  raster = band_i.ReadAsArray()

  # create dictionary for unique values count
  count = {}

  # count unique values for the given band
  for col in range( xsize ):
    for row in range( ysize ):
      cell_value = raster[row, col]

      # check if cell_value is NaN
      if math.isnan(cell_value):
        cell_value = 'Null'

      # add cell_value to dictionary
      try:
        count[cell_value] += 1
      except:
        count[cell_value] = 1

  # print results sorted by cell_value
  for key in sorted(count.iterkeys()):
    print "band #%s - %s: %s" %(i, key, count[key])

Saturday, August 3, 2013

Classification of the Hyper-Spectral and LiDAR Imagery using R (mostly). Part 2: Classification Approach and Spectre Profile Creation

Introduction

This is the second part of my post series related to hyper-spectral and LiDAR imagery using R. See other parts:

Part 1: Result Evaluation.

In this part I will describe my general approach to classification process and then I will show you how to create cool spectral response plots like this:

Colour of boxes corresponds to the band wavelength

Classification approach

I decided to use Random Forest algorithm for the per-pixel classification. Random Forest is a reliable machine-learning algorithm and I already successfully used it for my projects. Notice that the winner of the contest used Random Forest too, but with per-object classification approach.

Random Forest is implemented in R in 'randomForest' package and cool thing is that 'raster' package is able to implement it to rasters. But not only initial 144 hyper-spectral bands + LiDAR were 'feeded' to Random Forest, but in addition several indices and DSM derivatives were used. Here is what I did for classification:

Cloud shadow identification and removal.
Self-casted shadow identification and removal.
Creation of the spectral profiles for the 16 classes. This needed to choose which bands will be used for indices calculation.
Indices calculation: NDVI, NHFD, NDWI.
DSM processing: elevated objects extraction, DSM to DTM conversion, calculation of slope, aspect, normalised DEM, topographic position and terrain roughness indices.
Random Forest model creation and adjustment. Classification itself.
Final filtering using python-GDAL.
Result evaluation.

Spectral profiles

For calculation of indices one need to choose the bands as an input. Which one to pick if there are 144 bands in hyper-spectral image? The spectral profile graphs which should help with that. But these won't be common spectral profiles that we all used to, like this one:

Spectral profile from one of my researches

We need to see dispersion of band values for each class much more than mean values, because dispersion will significantly affect classification. If some classes will have overlapping values in the same band it will decrease accuracy of classification. Ok, at our spectral profile we should see wavelength, band number and dispersion of given classes to make a decision which band to pick for indices calculation. This means that box plots will be used instead of linear plots. Also boxes should be coloured accordingly to the corresponding wavelength

Needed libraries

library(ggplot2)
library(reshape2)
library(raster)
library(rgdal)

Load raster and take samples using .shp-file with classes

# Load raster
image <- stack('~/EEEI_contest/all_layers.tif')

# Load point .shp-file with classes
s_points_dFrame <- readOGR( '~/EEEI_contest',
                    layer="sampling_points",
                    p4s="+proj=utm +zone=15 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0")
s_points <- SpatialPoints(s_points_dFrame)
dFrame <- as.data.frame(s_points_dFrame)

# Extract data at sampling points
probe <- extract(image, s_points, method='bilinear')

# Combine sampling results with original data 
sample <- cbind(dFrame, probe)

Spectral profile creation

At this point we have 'sample' dataframe that looks like this:

'pattern' is a class name and 'pattern_id' is the integer that corresponds to each class. These fields, as well as 'lat' and 'lon' belonged to the shp-file. Ohter fields were created with 'extract' function.

# get numbers for bands instead of names in headers of columns
for (i in c(7:ncol(sample))){
  colnames(sample)[i] <- i-6
}

Creation of the proper names for bands and palette creation

# preparation for palette creation (establish wavelengths)
palette <- c()
violet <- c(380:450)
blue <- c(452:495)
green <- c(496:570)
yellow <- c(571:590)
orange <- c(591:620)
red <- c(621:750)
NIR <- c(750:1050)

# process band names (future captions) and palette
for (i in c(7:150)){
  # calculate wavelength for a given band
  wave <- (i-7)*4.685315 + 380
  wave <- round(wave, digits = 0)
  
  # rename colunms in 'sample'
  band_num <- paste('(band', i-6, sep = ' ')
  band_num <- paste(band_num, ')', sep = '')
  colnames(sample)[i] <- paste(wave, band_num, sep = ' ')
  
  # add value to palette
  if (is.element(wave, violet)) {palette <- c(palette, '#8F00FF')   
  } else if (is.element(wave, blue)) {palette <- c(palette, '#2554C7')   
  } else if (is.element(wave, green)) {palette <- c(palette, '#008000')   
  } else if (is.element(wave, yellow)) {palette <- c(palette, '#FFFF00')   
  } else if (is.element(wave, orange)) {palette <- c(palette, '#FF8040')   
  } else if (is.element(wave, red)) {palette <- c(palette, '#FF0000')   
  } else if (is.element(wave, NIR)) {palette <- c(palette, '#800000')   
  }
  
}

# Remove unneeded fields
sample <- subset(sample, select = 1:150)
sample[, 5] <- NULL
sample[, 5] <- NULL
sample <- sample[,3:148]

Now our dataframe looks like this:

Reshape dataframe for future plotting:

#
sample <- melt(sample, id = c('pattern', 'pattern_id'))
#
#

Now our dataframe looks like this:

We want to create spectral profiles for each class. This means we need to create plotting function. Keep in mind that 'ggplot2' doesn't work with local variables. We need to create generic plotting function and run it in a cycle that will create global subsets from our dataframe:

plotting <- function(data_f, plot_name){
  p <- ggplot(data_f, aes(data_f[,3], data_f[,4])) +
       geom_boxplot(aes(fill = data_f[,3]),
                    colour = palette,
                    # make outliers have the same colour as lines    
                    outlier.colour = NULL) +
       theme(axis.text.x = element_text(angle=90, hjust = 0),
             axis.title = element_text(face = 'bold', size = 14),
             title = element_text(face = 'bold', size = 16),
             legend.position = 'none') +
       labs(title = paste('Spectral response for class\n',
                          plot_name,
                          sep = ' '),
            x = 'Wavelength, nm',
            y = 'Response')+
       scale_fill_manual(values = palette)
  print(p)
}

Final stage: plotting (EDIT: name of the dataframe was fixed to match previous code parts)

# get unique classes

# uniue class numbers:
u <- unique(sample[,'pattern_id'])
# unique class names:
u_names <- unique(unlist(sample[,'pattern']))

for (i in u){
  # works only if class numbers (u) are consequtive integers from 1 to u
  data_f <- subset(sample, pattern_id == i)  
  plot_name <- u_names[i]
  plotting(data_f, plot_name)
}

If you are curious here you are the rest of the plots:

Sunday, July 28, 2013

Classification of the Hyper-Spectral and LiDAR Imagery using R (mostly). Part 1: Result Evaluation

Introduction

This us the first post of my series notes about hyperspectral imagery classification. See other parts: Part 2: Classification Approach and Spectral Profiles Creation

There was the EEEI Data Fusion Contest this spring. This year they wanted people to elaborate about hyper-spectral (142-bands imagery) and LiDAR data. The resolution of the data-set was about 5 m. There were 2 nominations: best classification and the best scientific paper.

I work with high-resolution imagery quite often, but classification is a very rear task for me though. I thought that this contest was a great opportunity to develop my skills. And not just a classification skills, but R skills as well... I decided to participate in best classification contest, and to use R for the most part.

I learned a lot and I will share my knowledge with you in a series of posts starting with this one. And like in some great novels, I will start from the very end - evaluation of my results.

Results of my classification (created in R, designed in QGIS)

Contest results

...are available here. As you may notice, I'm not on the list of the 10 best classification ;-) But there is almost unnoticeable 0.03% difference between my result (85.93% accuracy) and the result of the 10-th place. Not a bad result for a relatively inexperienced guy, don't you think? And I know, that I could have done better - I had 99% prediction accuracy for the training samples. It's funny, but my classification map looks better than map that took 7-th place!

How to evaluate classification using R

Due to I was not on the top ten list, I had to evaluate the result on my own. The organisers finally disclosed evaluation samples and I got a chance for a self assessment. So we have a set of .shp-files - each contains ground-truth polygons for one of the 16 classes and a classification map. We need to accomplish 3 goals:

Create a visual representation of missclassification.
Assess accuracy.
Create a confusion matrix.
Visualise classification map using EEEI colour palette.

Lets get a palette!

Official EEEI palette

To extract colour values from the palette above you may use GIMP. But I used a widget that every KDE-user (Linux) should have by default. You can probe and save colour values from any part of your screen. Quite useful!

'Color picker' in work

Now let's see the code for our tasks.

Load needed libraries

library(rgdal)
library(raster)
library(reshape2)
library(caret)
library(ggplot2)

Load and process data

# get classification raster
ras <- raster('~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/raster.tif',
              verbose = T)
# get list of shp-files for evaluation
shapes <- list.files(path = '~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi', pattern = '*shp')

# a list for accuracy assessment
accuracy_list <- list()

# create an empty dataframe to be filled vith evaluation results
# field names are not arbitrary!!!
eval_df <- data.frame(variable = character(),
                      value = character())

# create an empty dataframe to be used for plotting
# field names are not arbitrary!!!
plot_data <- data.frame(variable = character(),
                        value = character(),
                        Freq = integer())


for (f_name in shapes) {
  # delete '.shp' from the filename
  layer_name <- paste(sub('.shp', '', f_name))
  class <- readOGR("~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi",
                   layer = layer_name,
                   verbose = F)

  # extract values from raster
  probe <- extract(ras, class)

  # replace class numbers with names 
  samples <- list()
  for (lis in probe) {
    for (value in lis) {
      if (value == 0) {c_name <- Unclassified
      } else if (value == 1) {c_name <- 'Healthy grass'
      } else if (value == 2) {c_name <- 'Stressed grass'
      } else if (value == 3) {c_name <- 'Synthetic grass'
      } else if (value == 4) {c_name <- 'Trees'
      } else if (value == 5) {c_name <- 'Soil'
      } else if (value == 6) {c_name <- 'Water'
      } else if (value == 7) {c_name <- 'Residential'
      } else if (value == 8) {c_name <- 'Commercial'
      } else if (value == 9) {c_name <- 'Road'
      } else if (value == 10) {c_name <- 'Highway'
      } else if (value == 11) {c_name <- 'Railway'
      } else if (value == 12) {c_name <- 'Parking Lot 1'
      } else if (value == 13) {c_name <- 'Parking Lot 2'
      } else if (value == 14) {c_name <- 'Tennis Court'
      } else if (value == 15) {c_name <- 'Running Track'} 
      samples <- c(samples, c = c_name)
    }
  }

  # make layer_name match sample name
  if (layer_name == 'grass_healthy') {layer_name <- 'Healthy grass'
  } else if (layer_name == 'grass_stressed') {layer_name <- 'Stressed grass'
  } else if (layer_name == 'grass_syntethic') {layer_name <- 'Synthetic grass'
  } else if (layer_name == 'tree') {layer_name <- 'Trees'
  } else if (layer_name == 'soil') {layer_name <- 'Soil'
  } else if (layer_name == 'water') {layer_name <- 'Water'
  } else if (layer_name == 'residental') {layer_name <- 'Residential'
  } else if (layer_name == 'commercial') {layer_name <- 'Commercial'
  } else if (layer_name == 'road') {layer_name <- 'Road'
  } else if (layer_name == 'highway') {layer_name <- 'Highway'
  } else if (layer_name == 'railway') {layer_name <- 'Railway'
  } else if (layer_name == 'parking_lot1') {layer_name <- 'Parking Lot 1'
  } else if (layer_name == 'parking_lot2') {layer_name <- 'Parking Lot 2'
  } else if (layer_name == 'tennis_court') {layer_name <- 'Tennis Court'
  } else if (layer_name == 'running_track') {layer_name <- 'Running Track'} 

  # create a dataframe with classification results
  df <- as.data.frame(samples)
  dfm <- melt(df, id = 0)
  dfm['variable'] <- layer_name

  # add data to evaluation dataframe
  eval_df <- rbind(eval_df, dfm)

  # assess accuracy of current class
  mytable <- table(dfm)
  dmt <- as.data.frame(mytable)
  total_samples <- 0
  correct_predictions <- 0
  for (i in 1:nrow(dmt)) {
    predict_class <- toString(dmt[i,2])
    pc_frequency <- dmt[i,3]
    if (predict_class == layer_name) {
      correct_predictions <- dmt[i,3]
    }
    total_samples <- total_samples + pc_frequency
  }
  accuracy <- round(correct_predictions/total_samples, 2)
  accuracy_list <- c(accuracy_list, c = accuracy)


  # append data for plotting
  plot_data <- rbind(plot_data, dmt)
}

Create a fancy graph (that is shown on the map)

# create facets plot
EEEI_palette <- c('#D4D4F6',
                  '#5F5F5F',
                  '#710100',
                  '#00B300',
                  '#007900',
                  '#014400',
                  '#008744',
                  '#D0B183',
                  '#BAB363',
                  '#DAB179',
                  '#737373',
                  '#A7790A',
                  '#00EA01',
                  '#CA1236',
                  '#00B9BB')
plot_data <- plot_data[order(plot_data$variable),]
p = ggplot(data = plot_data,
           aes(x = factor(1),
               y = Freq,
               fill = factor(value)
           )
)
p <- p + facet_grid(facets = . ~ variable)
p <- p + geom_bar(width = 1) +
  xlab('Classes') +
  ylab('Classification rates') +
  guides(fill=guide_legend(title="Classes"))+
  scale_fill_manual(values = EEEI_palette)+
  ggtitle('Classification Accuracy')+
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
p

Let't finally get our accuracy result!

# accuracy assessment
fin_accuracy <- mean(unlist(accuracy_list))
fin_accuracy <- paste(round(fin_accuracy*100, 2), '%', sep = '')
print(paste('Total accuracy:', fin_accuracy), sep = ' ')

[1] "Total accuracy: 85.93%"

Confusion matrix

# confusion matrix creation
true <- as.factor(eval_df$variable)
predict <- as.factor(eval_df$value)
confusionMatrix(predict, true)

Enjoy statistics!

Confusion Matrix and Statistics

                 Reference
Prediction        Commercial Healthy grass Highway Parking Lot 1
  Commercial             850             0      14           151
  Healthy grass            0           868       0             0
  Highway                  0             0     718            14
  Parking Lot 1           41             0      54           641
  Parking Lot 2            0             0      19            73
  Railway                  0             0      11            44
  Residential            155             0     191             0
  Road                     0             0      20           107
  Running Track            0             0       0             0
  Soil                     0             0       1             0
  Stressed grass           0            61       0             0
  Synthetic grass          0             0       0             0
  Tennis Court             0             0       0             0
  Trees                    0           117       0             0
  Water                    0             0       0             0
                 Reference
Prediction        Parking Lot 2 Railway Residential Road Running Track
  Commercial                  0       0          74    2             0
  Healthy grass               0       0           0    0             0
  Highway                     0       9           0    0             0
  Parking Lot 1             104       8           0   11             0
  Parking Lot 2             155       4           0    3             0
  Railway                     2     917          11   61             0
  Residential                 0      40         918    0             0
  Road                       12      14           0  930             0
  Running Track               0       0           1    0           465
  Soil                        6       2           0   27             1
  Stressed grass              0      11           4    0             0
  Synthetic grass             0       0           0    0             3
  Tennis Court                0       3           0    0             0
  Trees                       0       0          10    0             0
  Water                       0       0           0    0             0
                 Reference
Prediction        Soil Stressed grass Synthetic grass Tennis Court Trees
  Commercial         0              0               0            0     0
  Healthy grass      0             14               0            0    34
  Highway            0              0               0            0     0
  Parking Lot 1      0              0               0            0     0
  Parking Lot 2      0              0               0            0     0
  Railway            0              0               0            0     0
  Residential        0              1               0            0     0
  Road              14              0               0            0     0
  Running Track      0              0               0            0     0
  Soil            1040              0               0            0     0
  Stressed grass     0            931               0            0    17
  Synthetic grass    0              0             503            0     0
  Tennis Court       0              0               0          245     0
  Trees              0             85               0            0  1004
  Water              0              0               0            0     0
                 Reference
Prediction        Water
  Commercial          0
  Healthy grass       3
  Highway             0
  Parking Lot 1      22
  Parking Lot 2       0
  Railway             0
  Residential         0
  Road                0
  Running Track       0
  Soil                0
  Stressed grass      0
  Synthetic grass     0
  Tennis Court        0
  Trees               0
  Water             118

Overall Statistics

               Accuracy : 0.859         
                 95% CI : (0.853, 0.866)
    No Information Rate : 0.088         
    P-Value [Acc > NIR] : <2e-16        

                  Kappa : 0.847         
 Mcnemar's Test P-Value : NA            

Statistics by Class:

                     Class: Commercial Class: Healthy grass Class: Highway
Sensitivity                     0.8126               0.8298         0.6984
Specificity                     0.9780               0.9953         0.9979
Pos Pred Value                  0.7791               0.9445         0.9690
Neg Pred Value                  0.9820               0.9839         0.9724
Prevalence                      0.0872               0.0872         0.0857
Detection Rate                  0.0709               0.0724         0.0599
Detection Prevalence            0.0910               0.0767         0.0618
                     Class: Parking Lot 1 Class: Parking Lot 2
Sensitivity                        0.6223               0.5556
Specificity                        0.9781               0.9915
Pos Pred Value                     0.7276               0.6102
Neg Pred Value                     0.9650               0.9894
Prevalence                         0.0859               0.0233
Detection Rate                     0.0535               0.0129
Detection Prevalence               0.0735               0.0212
                     Class: Railway Class: Residential Class: Road
Sensitivity                  0.9097             0.9018      0.8994
Specificity                  0.9883             0.9647      0.9848
Pos Pred Value               0.8767             0.7034      0.8478
Neg Pred Value               0.9917             0.9906      0.9905
Prevalence                   0.0841             0.0849      0.0862
Detection Rate               0.0765             0.0766      0.0776
Detection Prevalence         0.0872             0.1088      0.0915
                     Class: Running Track Class: Soil
Sensitivity                        0.9915      0.9867
Specificity                        0.9999      0.9966
Pos Pred Value                     0.9979      0.9656
Neg Pred Value                     0.9997      0.9987
Prevalence                         0.0391      0.0879
Detection Rate                     0.0388      0.0867
Detection Prevalence               0.0389      0.0898
                     Class: Stressed grass Class: Synthetic grass
Sensitivity                         0.9030                 1.0000
Specificity                         0.9915                 0.9997
Pos Pred Value                      0.9092                 0.9941
Neg Pred Value                      0.9909                 1.0000
Prevalence                          0.0860                 0.0420
Detection Rate                      0.0777                 0.0420
Detection Prevalence                0.0854                 0.0422
                     Class: Tennis Court Class: Trees Class: Water
Sensitivity                       1.0000       0.9517      0.82517
Specificity                       0.9997       0.9806      1.00000
Pos Pred Value                    0.9879       0.8257      1.00000
Neg Pred Value                    1.0000       0.9953      0.99789
Prevalence                        0.0204       0.0880      0.01193
Detection Rate                    0.0204       0.0837      0.00984
Detection Prevalence              0.0207       0.1014      0.00984

As you may see, the main source of misclassification are Parking Lot 1 and Parking Lot 2. The accuracy for other classes is above 90%, and it is great. Frankly, I still don't understand what is the difference between Parking Lots 1 and 2... Official answer was that Parking Lot 2 is cars (isn't detecting them using 5 m resolution imagery is a questionable task???)... But it seems that it is something else. It is hard to classify something that you don't understand what it is...

Thursday, July 25, 2013

A Note About proj4 in R

It's been a long time since I had to transform some coordinates 'manually'. I mean I had a list of coordinates that I needed to [re]project. There are some nice proprietary tools for it, but who needs them when the opensource is everywhere?

Proj4 is an obvious tool for this task. And using it from within R is somewhat a natural approach when you need to process a csv-file. There is a package called... 'proj4' that provides a simple interface to proj4. It has only two functions: project and ptransform. The first one is meant to provide transition between unprojected and projected coordinate reference systems and the second - between different projected coordinate reference systems.

Look closely at description of the project function: "Projection of lat/long coordinates or its inverse". Ok... here is a huge setup which made me write this post. If you would try to use coordinates in lat/long format here you would screw up! Your points would fly thousands of miles away from initial position. Actually you have to use coordinates in long/lat format to get the result you need. Don't trust manuals ;-)

Oh, I really hate the guy who started this practice of placing longitude before latitude in GIS...

Sunday, July 7, 2013

Getting raster histogram in QGIS using SEXTANTE and R

The issue with the broken histogram creation tool in QGIS annoyed me far too long. Sometimes you just need a quick glance on the histogram of a raster just to make a decision on how to process it or just to assess distribution of classes. But as you know, QGIS histogram tool creates broken plots: they are just useless (don't know if something have changed in Master).

Fortunately there is the SEXTANTE - extremely useful plugin that provides access to hundreds of geoprocessing algorithms and allows to create your own very quickly. SEXTANTE grants access to R functions, and one of the out-of-the-box example scripts, surprise-surprise, is about raster histogram creation. "Hooray! The problem is solved!" - one may cry out. Unfortunately it's not. "Raster histogram" tool is somewhat suitable only for single-band rasters (its algorithm is as simple as hist(as.matrix(raster))). Obviously we need to create our own tool. Luckily it is extremely simple if one is already familiar with R.

To create your own R tool for QGIS go to Analisys menu and activate SEXTANTE toolbox. Then go to R-scripts -> Tools and double-click 'Create new R script' and just copy and paste this code (an explanatoin will follow the description):
UPD: now this code works for both multi-band and single-band rasters. Here is the code for QGIS 1.8.

##layer=raster
##Tools=group
##dens_or_hist=string hist
##showplots
library('rpanel')
library('rasterVis')
str <- dens_or_hist
if (str !='dens' & str != 'hist'){
rp.messagebox('you must enter "dens" or "hist"', title = 'oops!')
} else {
    im <- stack(layer)
    test <- try(nbands(im))
    if (class(test) == 'try-error') {im <- raster(layer)}
    if (str == 'dens') {
    densityplot(im)
    } else if (str == 'hist') {
    histogram(im)
  }
}

UPD2: Here is the code for QGIS 2.0:

##layer=raster
##Raster processing=group
##dens_or_hist=string hist
##showplots
library('rpanel')
library('rasterVis')
str <- dens_or_hist
if (str !='dens' & str != 'hist'){
  rp.messagebox('you must enter "dens" or "hist"', title = 'oops!')
  } else {
      if (nbands(layer) == 1) {
      layer <- as.matrix(layer)
      layer <- raster(layer)
        }
      if (str == 'dens') {
        densityplot(layer)
        } else if (str == 'hist') {
            histogram(layer)
            }
      }

Now save the script. It will appear among other R scripts. Launch it and you will see this window:

Script window

In 'layer' drop-down menu choose needed raster. In 'dens or hist' field type dens or hist to identify which representation of the raster cell data you prefer: density plot or a histogram. [ It would be great to know if it is possible to embed these options in drop-down menu as well (help me if you know how). ] In 'R plots' field set the destination of the graph to be saved (optional). If you chose density plot you will get something like this:

Density plot

If you choose histogram, you will get something like this:

Histogram

And if you fail to provide correct value in 'dens or hist' field you will get this:

Fail window

How it works.

##layer=raster provides us with the drop-down menu to select needed raster and it will be addresed as layer in our script. ##Tools=group will assign our script to the Tools group of R scripts in SEXTANTE. ##dens_or_hist=string provides us with the string input field dens or hist. ##showplots will show the plot in Results window after execution. library('rpanel') this library is needed to show pop-up window with the error when the user fails to type 4 letters correctly. library('rasterVis') this library works on top of raster package and it is responsible for plotting our graphs. Then we define condition to pop up error window via rp.messagebox if the plot type is misspelled or missing. The following piece of code is interesting:

im <- stack(layer)

test <- try(nbands(im))

if (class(test) == 'try-error') {im <- raster(layer)}

In order to correctely process the raster we need to have a RasterStack class for multi-band raster or a RasterLayer class if the raster is single-banded. 'raster' package is a bit weird with its classes so there are even different functions for determination of the number of bands for different classes and types of rasters. So the code above is weird too))) We assume that the raster is multi-banded and create a RasterStack object. Then we check whether our guess was correct: if raster is actually single-banded we will have an error if we try to use nbands() function on it. If it is the case - we will create RasterLayer object instead of RasterStack.

Finally, we transform our raster to the RasterStack object to be able to run either densityplot() or histogram() function.

Final notes

In a blink of an eye we were able to create quite handy tool to replace a long broken one. But there is the downside. This script won't be able to handle huge rasters - I wasn't able to process my IKONOS imagery where sides of the rasters exceeds 10000 pixels. Also I would not recommend to use in on a hyperspectral imagery with the dozens of bands because the process will run extremely sloooow (it will take hours and tons of RAM) and the result will be obviously unreadable.

Wednesday, June 26, 2013

Great comics: "How to suck at your religion"

Check out this great stuff. I even would have laugh it all of these were not true...

Sunday, March 31, 2013

RSAGA: Getting Started

RSAGA provides access to geocomputation capabilities of SAGA GIS from within R environment. Having SAGA GIS installed is a (quite obvious) pre-requirement to use RSAGA.

In Linux x64 sometimes additional preparations are needed. In Linux SAGA as well as other software that would like to use SAGA modules usually searches for them in /usr/lib/saga, but if your Linux is x64, they usually will be located in /usr/lib64/saga. Of course you may set up proper environmental variables, but the most lazy and overall-effective way is just to add symbolic link to /usr/lib64/saga (or whatever a correct path is) from /usr/lib/saga:

:~> sudo ln -s /usr/lib64/saga /usr/lib/saga
Now no app should miss these modules.

About Questions and Answers

That is very interesting how people perceive and assess questions and answers. How do we assess quality of the question? The first and the main criteria is whether topic-starter did his homework searching for the answer himself. That's why "Do your homework" is the first advise on "how to ask the question" page on every software forum (and there is one on gis.stackexchange). The second criteria is sufficiency of information provided by the topic-starter. No one likes to play clairvoyant and guess what is the actual situation (but with experience the ability to foresight omitted details is being developed) and waste their time asking for more information.

GRUB2 Recovery

Encountered an unpleasant issue today. Suddenly without any reason GRUB2 at my dual boot laptop halted with the "GRUB" string on the black screen and refused to proceed any further. Seems that is a known issue for the openSUSE 12.2 (GRUB2 became a bootloader for this version).

GRUB recovery usually an easy procedure and I did it quite often having several machines with a windows as a spare OS (it has an annoying habit deleting GRUB during own installation). I just used openSUSE live USB to load, and in Yast in Bootloader settings asked to propose the configuration for GRUB and then just saved it to the /boot partition.

This time it didn't work at all... So I've spent all the day looking for solution. This two sources helped me a lot: Re-install Grub2 from DVD Rescue and openSUSE Help and Troubleshooting. Here you are the steps that helped me with my dualboot needs:

Boot to Rescue System from the openSUSE installation DVD (login is root).
Run fdisk -l to locate root partition (/dev/sda6 for me)
Mount all that is needed (including partitions with other OSs so they will be scanned and included in GRUB configuration):

mount /dev/sda6 /mnt
mount --bind /dev /mnt/dev
chroot /mnt
mount /proc
mount /sys
mount -a

If the configuration file for GRUB is lost (like in my case):

grub2-mkconfig -o /boot/grub2/grub.cfg

Write GRUB2 configuration to disc:

grub2-install /dev/sda

Unmount and reboot:

umount -a
exit
reboot

Thursday, March 7, 2013

DSM to DEM Conversion

Often you may have some Digital Surface Model (DSM) acquired from a LiDAR data, and you may desire to get rid of buildings and trees in order to get a DEM. There can be found some articles on a subject that would provide more accurate result then that described here. But I will show here my quick-n-dirty way when one don't need extreme accuracy. It is meant to be used for a relatively flat urban areas.

There will be 3 steps:

Subtraction of the elevated objects like buildings and trees from DSM.
Filling the gaps of the raster cleaned of the elevated objects.
Polishing the DEM.

FOSS software you will need: SAGA GIS, QGIS.

Initial DSM (LiDAR)

Signs of Scientific Respect

Today I received a copy of proceedings of the conference I participated in. A peculiar moment is that my article about using Random Forest algorithm for the illegal dumping sites forecast is the very first article of my section (as well as of the whole book) and it was placed regardless of the alphabetical order of the family names of the authors (this order is correct for all other authors in all sections).

My presentation and speech were remarkable indeed - the director of my scientific-research centre later called it "the speech of guru" (actually, not a "guru", there is just no suitable equivalent in English for the word used). Also the extended version of this article for one of the journals of the Russian Academy of Sciences received an extremely positive feedback from the reviewers. So I suppose the position of my article is truly some kind of respect for the research and presentation and not a random editorial mistake.

Now I should overcome procrastination and make a post (or most likely two) about this research of mine.