Supervised classification

Introduction

Remotely-sensed images are used to create land cover, land use and many other maps. This is usually done by grouping pixels with similar values into discrete classes. This classification of the image content can be done by the software (unsupervised) or can be guided by the user (supervised).

In unsupervised classification, the outcomes (groupings of pixels with common characteristics) are based on the software's analysis of an image, without the user providing sample classes. The software uses clustering techniques to determine which pixels are related and groups them into classes. The user can specify which algorithm the software will use and the desired number of output classes, but otherwise does not aid in the classification process. However, the user must know the area being classified, because the groupings of pixels produced by the software have to be related to actual features on the ground (such as wetlands, developed areas, coniferous forests, etc.).
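As a rough illustration of what the software does without training data, the sketch below clusters pixel spectra with k-means, one widely used unsupervised algorithm (the text does not name a specific one, so this choice is an assumption). The function name `kmeans_classify` and the two-band reflectance values are hypothetical. The user supplies only the number of classes; relating the resulting clusters to ground features remains the analyst's job.

```python
import numpy as np

def kmeans_classify(pixels, n_classes, n_iter=20, seed=0):
    """Cluster pixel spectra (n_pixels x n_bands) into n_classes groups with k-means."""
    rng = np.random.default_rng(seed)
    # Initialise the cluster centres with randomly chosen pixels
    centres = pixels[rng.choice(len(pixels), n_classes, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each pixel to the nearest centre (Euclidean distance in band space)
        dist = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its pixels (empty clusters keep their centre)
        for k in range(n_classes):
            if np.any(labels == k):
                centres[k] = pixels[labels == k].mean(axis=0)
    return labels, centres

# Two bands (e.g. red and near infrared); made-up reflectance values
pixels = np.array([[0.05, 0.40], [0.06, 0.42], [0.04, 0.38],   # vegetation-like
                   [0.03, 0.02], [0.02, 0.01], [0.03, 0.03]])  # water-like
labels, centres = kmeans_classify(pixels, n_classes=2)
print(labels)
```

With these well-separated values, the first three pixels should end up in one cluster and the last three in the other; the cluster numbers themselves mean nothing until the analyst labels them.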

Supervised classification is based on the idea that a user can select sample pixels in an image that are representative of specific classes and then direct the image processing software to use these training sites as references for the classification of all other pixels in the image. Training sites (also known as training sets or input classes) are selected based on the knowledge of the user. The user also sets the bounds for how similar other pixels must be to group them together. These bounds are often set based on the spectral characteristics of the training area, plus or minus a certain increment (often based on "brightness" or strength of reflection in specific spectral bands). The user also designates the number of classes into which the image is classified. Many analysts use a combination of supervised and unsupervised classification to produce the final analysis and classified maps.
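Accepting pixels that fall within plus or minus an increment of the training statistics is what a parallelepiped classifier does. A minimal sketch, assuming the increment is k standard deviations per band (k = 2 here) and that a pixel falling inside several boxes goes to the first matching class; `train_bounds`, `parallelepiped_classify` and all values are hypothetical.

```python
import numpy as np

def train_bounds(training_pixels, k=2.0):
    """Per-band lower/upper bounds: mean +/- k standard deviations of the training pixels."""
    mean = training_pixels.mean(axis=0)
    std = training_pixels.std(axis=0)
    return mean - k * std, mean + k * std

def parallelepiped_classify(pixels, bounds):
    """Return the 1-based ID of the first class whose box contains each pixel, 0 if none."""
    labels = np.zeros(len(pixels), dtype=int)
    for class_id, (low, high) in enumerate(bounds, start=1):
        inside = np.all((pixels >= low) & (pixels <= high), axis=1)
        # Only label pixels that are still unclassified (first match wins)
        labels[(labels == 0) & inside] = class_id
    return labels

veg_training = np.array([[0.05, 0.40], [0.06, 0.42], [0.04, 0.38]])
water_training = np.array([[0.03, 0.02], [0.02, 0.01], [0.03, 0.03]])
bounds = [train_bounds(veg_training), train_bounds(water_training)]

pixels = np.array([[0.05, 0.41], [0.025, 0.02], [0.20, 0.25]])
result = parallelepiped_classify(pixels, bounds)
print(result)
```

The third pixel lies outside both boxes and stays unclassified (0), which is typical of this classifier when the training sites do not cover the full range of variability.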

One common application of remotely-sensed images to rangeland management is the creation of maps of land cover, vegetation type, or other discrete classes by remote sensing software. In supervised classification, the image processing software is guided by the user to specify the land cover classes of interest. The user defines “training sites” – areas in the map that are known to be representative of a particular land cover type – for each land cover type of interest. The software determines the spectral signature of the pixels within each training area, and uses this information to define the mean and variance of the classes in relation to all of the input bands or layers. Each pixel in the image is then assigned, based on its spectral signature, to the class it most closely matches. It is important to choose training areas that cover the full range of variability within each land cover type to allow the software to accurately classify the rest of the image. Some of the more common classification algorithms used for supervised classification include the Minimum-Distance to the Mean Classifier, Parallelepiped Classifier, and Gaussian Maximum Likelihood Classifier.
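The Minimum-Distance to the Mean Classifier mentioned above can be sketched in a few lines: each class signature is reduced to its mean vector, and each pixel goes to the class whose mean is nearest in band space (Euclidean distance assumed). Function names and reflectance values are illustrative only.

```python
import numpy as np

def class_means(training):
    """training: dict class_id -> array (n_pixels x n_bands). Returns sorted IDs and mean vectors."""
    ids = sorted(training)
    means = np.array([training[c].mean(axis=0) for c in ids])
    return ids, means

def minimum_distance_classify(pixels, ids, means):
    # Euclidean distance from every pixel to every class mean, then pick the nearest
    dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return np.array(ids)[dist.argmin(axis=1)]

training = {
    1: np.array([[0.05, 0.40], [0.06, 0.42], [0.04, 0.38]]),  # vegetation
    2: np.array([[0.03, 0.02], [0.02, 0.01], [0.03, 0.03]]),  # water
}
ids, means = class_means(training)
labels = minimum_distance_classify(np.array([[0.07, 0.35], [0.04, 0.05]]), ids, means)
print(labels)
```

Because it ignores the spread of each class, this rule is fast but can misclassify pixels of classes with very different variances; the maximum likelihood approach takes that spread into account.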

Supervised classification using QGIS SCP

Workflow

Requirements:

Knowledge (know the study area)
Imagery (images of the study area)

Preparation

Download the images
Convert radiance to surface reflectance
Create a band set
Try different band compositions
Compare the compositions with other imagery
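Creating a band set and trying compositions amounts to stacking bands and mapping three of them to the red, green and blue display channels. The sketch below assumes the bands are already loaded as 2-D reflectance arrays (reading the files is omitted) and uses a 2 to 98 percentile linear stretch, a common but arbitrary choice. Band names follow Sentinel-2 (B2 blue, B3 green, B4 red, B8 near infrared); the data and function names are hypothetical.

```python
import numpy as np

def stretch(band, low_pct=2, high_pct=98):
    """Linear contrast stretch of one band to 0-1 using percentile cut-offs."""
    low, high = np.percentile(band, [low_pct, high_pct])
    return np.clip((band - low) / (high - low), 0.0, 1.0)

def composite(band_set, red, green, blue):
    """Build an RGB image (rows x cols x 3) from three bands of a band set."""
    return np.dstack([stretch(band_set[name]) for name in (red, green, blue)])

rng = np.random.default_rng(0)
# Hypothetical band set: band names mapped to small synthetic reflectance rasters
band_set = {name: rng.uniform(0.0, 0.5, size=(4, 4)) for name in ("B2", "B3", "B4", "B8")}

true_colour = composite(band_set, "B4", "B3", "B2")   # natural colours
false_colour = composite(band_set, "B8", "B4", "B3")  # healthy vegetation shows as red
print(true_colour.shape, false_colour.min(), false_colour.max())
```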

What are we looking for?

Define the classes:
- Trees
- Crops
- Water
- Roads

Provide training data for each class

For each class, select representative sample sites of that cover type.

For example, for trees we select 2 or 3 training sites (called Regions of Interest, or ROIs, in SCP terminology), and we do the same for all the other classes.

For each class, the software computes the spectral signature from the pixels of its training sites.
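As a sketch of what computing a spectral signature involves, the function below collects the pixels under each class's ROIs and summarises them. It assumes the ROIs have been rasterised into an array of class IDs (0 = outside any ROI) and that a signature consists of the per-band mean, standard deviation and covariance matrix; the names and values are hypothetical, and SCP's own signature format may differ.

```python
import numpy as np

def spectral_signatures(image, roi_raster):
    """image: rows x cols x bands; roi_raster: rows x cols, 0 = no ROI, >0 = class ID."""
    signatures = {}
    for class_id in np.unique(roi_raster):
        if class_id == 0:
            continue
        samples = image[roi_raster == class_id]   # n_pixels x bands
        signatures[int(class_id)] = {
            "mean": samples.mean(axis=0),
            "std": samples.std(axis=0, ddof=1),
            "cov": np.cov(samples, rowvar=False),
            "n_pixels": len(samples),
        }
    return signatures

image = np.array([[[0.05, 0.40], [0.06, 0.42], [0.30, 0.30]],
                  [[0.04, 0.38], [0.03, 0.02], [0.02, 0.01]]])
roi = np.array([[1, 1, 0],
                [1, 2, 2]])
sig = spectral_signatures(image, roi)
print(sig[1]["mean"], sig[2]["n_pixels"])
```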

Preview the classification

Before processing the whole image (or images), which consists of several bands, we may want to preview the classification on well-known areas.
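A preview classifies only a small window instead of the whole scene, which is much faster and lets you check the result on a familiar area. In the sketch below the window is cut around a chosen pixel and classified with a minimum-distance rule; the window size, class means, synthetic image and function names are illustrative assumptions, not SCP settings.

```python
import numpy as np

def preview_window(image, row, col, size):
    """Cut a size x size window centred on (row, col), truncated at the image edges."""
    half = size // 2
    r0, c0 = max(row - half, 0), max(col - half, 0)
    return image[r0:r0 + size, c0:c0 + size]

def classify_minimum_distance(image, means):
    """Label each pixel of a rows x cols x bands array with the 1-based ID of the nearest mean."""
    dist = np.linalg.norm(image[:, :, None, :] - means[None, None, :, :], axis=3)
    return dist.argmin(axis=2) + 1

rng = np.random.default_rng(1)
image = rng.uniform(0.0, 0.5, size=(500, 500, 2))   # stand-in for a large scene
means = np.array([[0.05, 0.40], [0.03, 0.02]])      # class 1 and class 2 signatures

preview = classify_minimum_distance(preview_window(image, 250, 250, 50), means)
print(preview.shape)
```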

Run the classification

The computer runs the classification. Several algorithms can be used to do this.
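The Gaussian Maximum Likelihood Classifier, one of the algorithms listed in the introduction, can be sketched as follows. Each class is modelled as a multivariate normal distribution (mean vector and covariance matrix from its training pixels), and each pixel is assigned to the class with the highest log-likelihood. Equal prior probabilities are assumed, and the function name and values are hypothetical. Each class needs more training pixels than bands, not all lying on a line, so that its covariance matrix can be inverted.

```python
import numpy as np

def max_likelihood_classify(pixels, signatures):
    """signatures: dict class_id -> (mean vector, covariance matrix).
    Each pixel gets the class with the highest Gaussian log-likelihood (equal priors)."""
    ids = sorted(signatures)
    scores = []
    for c in ids:
        mean, cov = signatures[c]
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        diff = pixels - mean
        # Squared Mahalanobis distance of every pixel to the class mean
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
        # Log-likelihood without the constant term, which is the same for all classes
        scores.append(-0.5 * (logdet + maha))
    return np.array(ids)[np.argmax(scores, axis=0)]

training = {
    1: np.array([[0.05, 0.40], [0.06, 0.44], [0.04, 0.39], [0.05, 0.37]]),  # vegetation
    2: np.array([[0.03, 0.02], [0.02, 0.01], [0.03, 0.03], [0.01, 0.02]]),  # water
}
signatures = {c: (s.mean(axis=0), np.cov(s, rowvar=False)) for c, s in training.items()}
labels = max_likelihood_classify(np.array([[0.05, 0.41], [0.02, 0.02]]), signatures)
print(labels)
```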

Future work

Calculation of classification accuracy

After the supervised classification, we should assess the accuracy of the land cover/use classification, in order to understand its reliability and to identify possible errors.

A proper assessment of classification accuracy requires reference data, such as ancillary data and, probably, a field survey.
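Once reference data are available (for example, points checked in the field), accuracy is commonly summarised with a confusion matrix. The sketch below derives overall accuracy, producer's and user's accuracy per class, and Cohen's kappa from it. The reference and predicted labels are made up, the function names are hypothetical, and class IDs are assumed to run from 1 to n_classes.

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Rows = reference (ground truth) class, columns = classified class; IDs are 1..n_classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for r, p in zip(reference, predicted):
        cm[r - 1, p - 1] += 1
    return cm

def accuracy_report(cm):
    total = cm.sum()
    overall = np.trace(cm) / total
    producers = np.diag(cm) / cm.sum(axis=1)   # 1 - omission error
    users = np.diag(cm) / cm.sum(axis=0)       # 1 - commission error
    # Cohen's kappa: agreement beyond what chance alone would produce
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa

reference = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]   # hypothetical field-survey labels
predicted = [1, 1, 1, 2, 2, 2, 2, 3, 3, 1]   # classification at the same points
cm = confusion_matrix(reference, predicted, n_classes=3)
overall, producers, users, kappa = accuracy_report(cm)
print(cm, overall, kappa)
```

Here 8 of the 10 points agree, so the overall accuracy is 0.8, while kappa (about 0.70) is lower because part of that agreement could occur by chance.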

Calculation of classification statistics
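Classification statistics usually start with the number of pixels, the percentage and the area covered by each class. A minimal sketch, assuming a 10 m pixel (the resolution of Sentinel-2 bands B2, B3, B4 and B8; change it if the band set has a different resolution) and a toy classified raster that uses the class IDs 1 to 5 of the exercise; `class_statistics` is a hypothetical name.

```python
import numpy as np

def class_statistics(classification, pixel_size_m=10.0):
    """Pixel count, percentage and area (hectares) per class ID of a classified raster."""
    ids, counts = np.unique(classification, return_counts=True)
    pixel_area_ha = pixel_size_m ** 2 / 10_000   # 1 ha = 10,000 m2
    total = counts.sum()
    return {int(c): {"pixels": int(n),
                     "percent": 100.0 * n / total,
                     "area_ha": n * pixel_area_ha}
            for c, n in zip(ids, counts)}

classification = np.array([[1, 1, 2, 2],
                           [1, 1, 2, 5],
                           [3, 4, 5, 5]])
stats = class_statistics(classification)
for class_id, s in stats.items():
    print(class_id, s)
```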

Exercise

Setup

Open the QGIS project we created to download, preprocess and display Sentinel-2 images.

Make sure you are able to show different compositions of the Band set (with Sentinel-2 images).

Classes

Macroclass name	Macroclass ID	Class name	Class ID
Vegetation	1	Crop	1
Vegetation	1	Tree	2
Man made	2	Buildings	3
Man made	2	Roads	4
Water	3	Lakes	5
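Since several classes share one macroclass, a map classified with class IDs can be collapsed into a macroclass map with a lookup table. In this sketch Water is given macroclass ID 3 so that each macroclass has its own ID; the function name is hypothetical and 0 is assumed to mean unclassified.

```python
import numpy as np

# Class ID -> macroclass ID, following the class table (Water uses macroclass ID 3)
CLASS_TO_MACROCLASS = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}

def to_macroclasses(class_raster, mapping=CLASS_TO_MACROCLASS):
    """Replace every class ID in a classified raster with its macroclass ID (0 stays 0)."""
    lookup = np.zeros(max(mapping) + 1, dtype=class_raster.dtype)
    for class_id, macro_id in mapping.items():
        lookup[class_id] = macro_id
    # Fancy indexing applies the lookup to every pixel at once
    return lookup[class_raster]

class_raster = np.array([[1, 2, 3],
                         [4, 5, 0]])   # 0 = unclassified
macro = to_macroclasses(class_raster)
print(macro)
```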

QGIS SCP Usage

SCP Dock → Training input tab

Define the active Band set. In the previous project, the preprocessed and clipped images were organized as Band set 3.

Create a new Training Input File (training areas.scp)

Create the first Macroclass and Class

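Set the region-growing distance to 0.05 (the maximum spectral distance between the seed pixel and the surrounding pixels added to the ROI).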

Select one crop area

Create two different crop classes and then merge them.
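In the steps above, the distance value controls how far an ROI grows from the clicked seed pixel. The sketch below is a simplified region-growing routine, assuming the distance is the largest absolute difference allowed between the seed pixel and a neighbouring pixel in every band, and that only 4-connected neighbours are considered. It is not SCP's actual implementation, and the names and values are illustrative.

```python
from collections import deque

import numpy as np

def region_grow(image, seed, max_dist=0.05):
    """Grow an ROI from a seed pixel over 4-connected neighbours whose values differ
    from the seed by at most max_dist in every band. image: rows x cols x bands."""
    rows, cols, _ = image.shape
    seed_value = image[seed]
    roi = np.zeros((rows, cols), dtype=bool)
    roi[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not roi[nr, nc]:
                # Accept the neighbour only if it is spectrally close to the seed
                if np.all(np.abs(image[nr, nc] - seed_value) <= max_dist):
                    roi[nr, nc] = True
                    queue.append((nr, nc))
    return roi

image = np.array([[[0.05, 0.40], [0.07, 0.43], [0.30, 0.20]],
                  [[0.04, 0.38], [0.03, 0.02], [0.06, 0.41]],
                  [[0.05, 0.39], [0.02, 0.01], [0.03, 0.03]]])
roi = region_grow(image, (0, 0), max_dist=0.05)
print(roi)
```

Pixel (1, 2) is spectrally similar to the seed but is not connected to the growing ROI, so it is left out; a larger distance value would let the ROI spread into less similar pixels.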

Preview

Run

Exercise (individually or in groups of two)

Use the following macroclasses and classes:

Vegetation

 Forest (trees)
 Crop (e.g. fields with green vegetation)
 Shrubland (vegetation dominated by shrubs)

Built-up

 Buildings
 Asphalt

Water

 Water (e.g. surface water, lakes, sea)
 Wetlands

Remarks

Training areas

The same training areas can be used for several images, provided that all the images have the same number of bands. However, if the images are acquired in different months, land cover changes (especially in the state of vegetation) will affect the spectral signatures (i.e. the same pixel has a different spectral signature in different periods). Atmospheric effects can also affect each image differently. Both can reduce classification accuracy. Therefore, it is recommended to always collect ROIs and spectral signatures for each image.