Remote Sensing of Earth's surface is a highly relevant topic in the era of climate change, with many applications in natural resource management, urban planning, and natural hazard assessment and mitigation. The availability of worldwide, free-to-use satellite data enables the use of Deep Learning techniques such as Fully Convolutional Networks (FCNs).

This post presents the Deep Learning methodology used in a land cover classification and change detection task. It can be easily applied to other types of image segmentation tasks with sequential image data.

In this article, we provide answers to the following questions:

  • Why Deep Learning and FCN in Remote Sensing?
  • How do LSTMs help to work with satellite image sequences?
  • How can we use incomplete image tiles?

Why Deep Learning and FCN in Remote Sensing?

Deep neural networks, i.e., networks with many hidden layers, greatly improve a model's capacity to learn complex features compared to Artificial Neural Networks (ANNs) with only a few hidden layers. This high capacity comes, however, with a large number of trainable parameters. To properly train deep ANNs, large amounts of training data are therefore mandatory. In Remote Sensing, this requirement is nowadays easy to fulfill thanks to freely available satellite imagery, for example from the Copernicus Program with its Sentinel-2 satellite mission.

Fully Convolutional Networks (FCNs) are an extension of the better-known Convolutional Neural Networks (CNNs). Both have in common that their architecture heavily relies on the convolutional layer and that they are commonly used with images as input. They differ, however, in the structure of their output: whereas a CNN classifies the entire image with one label (e.g., "cat"), the FCN assigns a label to every individual pixel (e.g., "cat" or "background"). In our land cover classification task, the goal was to correctly assign each pixel to one of seven land cover classes such as grassland or forest.

Given the higher-dimensional output compared to a CNN, FCNs tend to be even more complex. In our case, the so-called U-Net FCN was used, with roughly 29 million trainable parameters. Interestingly, this model uses a CNN as the encoder in an encoder-decoder structure. Despite the model's complexity, the good news is: an FCN is just as easy to implement in a machine learning workflow as any other ANN. Ready-to-use implementations in TensorFlow can be found here.
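
To make the encoder-decoder structure concrete, here is a minimal U-Net sketch in TensorFlow. Depth, layer widths, and the input shape (e.g., ten Sentinel-2 bands) are illustrative choices, far smaller than the roughly 29 million parameter model; the eight output classes anticipate the Excluded class introduced in the last section:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet(input_shape=(256, 256, 10), num_classes=8):
    """Minimal U-Net sketch: a convolutional encoder-decoder with skip
    connections. Shapes and layer widths are illustrative only."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder (the "CNN" part): convolutions followed by downsampling
    c1 = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(128, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = layers.Conv2D(256, 3, padding="same", activation="relu")(p2)

    # Decoder: upsampling plus skip connections from the encoder
    u2 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(b)
    c3 = layers.Conv2D(128, 3, padding="same", activation="relu")(
        layers.concatenate([u2, c2]))
    u1 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(c3)
    c4 = layers.Conv2D(64, 3, padding="same", activation="relu")(
        layers.concatenate([u1, c1]))

    # A 1x1 convolution yields one class probability vector per pixel
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c4)
    return tf.keras.Model(inputs, outputs)
```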

How do LSTMs help to work with satellite image sequences?

The Long Short-Term Memory (LSTM) network is a type of Recurrent Neural Network (RNN) that has quickly found great success in disciplines such as speech recognition and machine translation. In contrast to the above-mentioned CNN and FCN, RNNs operate on sequential data.

In our task, we obtained sequential data by acquiring satellite imagery of our area of interest at different dates. Since the appearance of some land cover classes such as grassland or farmland changes strongly with the season due to phenology, this sequential approach allowed us to increase the classification accuracy on these classes.

Given that we still operate on sequences of images, a 2D-convolutional implementation of the LSTM cell (ConvLSTM2D from TensorFlow) is used. The following snippet shows a minimal sketch of the usage; the input dimensions, the merge mode, and the final softmax layer are illustrative choices, and build_unet refers to the U-Net sketch above:
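
```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, H, W, BANDS = 6, 256, 256, 10  # illustrative input dimensions
NUM_CLASSES = 8  # seven land cover classes + the Excluded class (see below)

# The base FCN, here the U-Net sketch from above
fcn = build_unet(input_shape=(H, W, BANDS), num_classes=NUM_CLASSES)

inputs = tf.keras.Input(shape=(SEQ_LEN, H, W, BANDS))

# TimeDistributed applies the same FCN to every image of the sequence
x = layers.TimeDistributed(fcn)(inputs)

# A bidirectional ConvLSTM2D fuses the per-date class maps;
# filters equals the number of output classes
x = layers.Bidirectional(
    layers.ConvLSTM2D(filters=NUM_CLASSES, kernel_size=3,
                      padding="same", return_sequences=False),
    merge_mode="ave",
)(x)

outputs = layers.Softmax()(x)
model = tf.keras.Model(inputs, outputs)
```

With filters=8 and a 3×3 kernel, each direction of the ConvLSTM2D has 4 · 8 · (3 · 3 · (8 + 8) + 1) = 4,640 trainable parameters, so the bidirectional wrapper adds 9,280 in total, matching the number mentioned below.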

To summarize the implementation, make sure to:

  • wrap the base FCN into TimeDistributed, which applies the FCN to every image of the input sequence;
  • set the filters argument of the ConvLSTM2D layer to your number of possible output classes;
  • wrap the ConvLSTM2D layer into Bidirectional. This lets the data pass through the LSTM cell in both forward and backward order, enabling the LSTM to learn from the whole sequence (by design, an LSTM only uses past elements of the sequence when processing the current one).

In this implementation, the use of an LSTM adds only 9,280 trainable parameters to the model. Compared to the roughly 29 million of the FCN alone, the added complexity is virtually negligible. Since the model has to process a sequence of six images in every training and inference step, though, it is recommended to first train a baseline FCN on monotemporal data.
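
One possible reading of this two-stage approach is to also reuse the pretrained weights in the sequential model. The following sketch assumes the fcn and model objects from the snippet above and random placeholder data; whether weights are transferred this way in the original work is our assumption:

```python
import numpy as np
import tensorflow as tf

# Hypothetical monotemporal training data (random placeholders)
mono_x = np.random.rand(4, 256, 256, 10).astype("float32")
mono_y = tf.one_hot(np.random.randint(0, 8, size=(4, 256, 256)), depth=8)

# Stage 1: pretrain the baseline FCN on single images
fcn.compile(optimizer="adam", loss="categorical_crossentropy")
fcn.fit(mono_x, mono_y, epochs=1, batch_size=2)

# Stage 2: train the sequential model; it starts from the pretrained
# weights because `fcn` is the very object wrapped in TimeDistributed
```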

How can we use incomplete image tiles?

Finally, we want to share a small trick for keeping incomplete image tiles in the training set instead of discarding them.

Image tiling is necessary in the first place due to memory restrictions: the full image of our region of interest is simply too large to be processed by a GPU at once. We therefore divide the image into square tiles, which can be processed consecutively.
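
As an illustration, a simple tiling routine might look like the following sketch; the tile size is an arbitrary choice, and edge tiles are zero-padded to a uniform shape (these padded pixels are one source of the missing class information discussed below):

```python
import numpy as np

def tile_image(image, tile_size=256):
    """Cut a (H, W, C) image into square tiles of equal size.
    Edge tiles are zero-padded so that every tile has the same shape."""
    height, width, _ = image.shape
    tiles = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tile = image[y:y + tile_size, x:x + tile_size]
            pad_y = tile_size - tile.shape[0]
            pad_x = tile_size - tile.shape[1]
            if pad_y or pad_x:  # incomplete edge tile
                tile = np.pad(tile, ((0, pad_y), (0, pad_x), (0, 0)))
            tiles.append(tile)
    return tiles
```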

An image tile can be incomplete for various reasons. If the area of interest is not rectangular or cannot be divided evenly, edge tiles remain that have no class information for certain pixels. In our case, another type occurred: out of the 14 land cover classes originally present in the ground truth, the seven smallest classes were discarded because they simply contained too few pixels for training.

The remedy that lets us use tiles containing pixels whose classes don't appear in the classification output is as follows:

  • All problematic pixels (outside the area of interest or belonging to an ignored class) are merged into one new Excluded class.
  • We formally allow the model to predict this Excluded class. This means it is added to the model's seven regular output classes, for a total of eight.
  • We use a weighted loss function and, crucially, set the Excluded class's weight to zero. This renders the predictions on pixels of the Excluded class meaningless and prevents the model from ever predicting that class, which is exactly the desired behavior. More specifically, we use the categorical cross-entropy loss (TensorFlow implementation here) with inverse pixel counts as weights for the regular classes.

The following code snippet shows a minimal sketch of said weighted loss; the pixel counts are placeholders rather than the real class statistics, and the compile call at the end assumes the sequential model from above:
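
```python
import tensorflow as tf

# Per-class weights: inverse pixel counts for the seven regular classes,
# zero for the Excluded class (last entry), so that its pixels never
# contribute to the loss. The counts are placeholder values.
pixel_counts = tf.constant([3e6, 1e6, 5e5, 2e6, 8e5, 4e5, 6e5], tf.float32)
class_weights = tf.concat([1.0 / pixel_counts, [0.0]], axis=0)

def weighted_categorical_crossentropy(y_true, y_pred):
    """Categorical cross-entropy with per-class weights.

    y_true: one-hot ground truth, shape (batch, H, W, num_classes)
    y_pred: softmax output of the model, same shape
    """
    # Each pixel is weighted with the weight of its ground-truth class
    pixel_weights = tf.reduce_sum(y_true * class_weights, axis=-1)
    cce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    return tf.reduce_mean(pixel_weights * cce)

model.compile(optimizer="adam", loss=weighted_categorical_crossentropy)
```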

We hope that our paper and this article with its code examples help you in your research. For further reading, please have a look at Sefrin, Riese, Keller 2021, check out our code repository, and leave a comment with questions and feedback!