6  Discussion and improvements

6.1 Dataset

From the results of this work, the only certainty is that more training data would be necessary to confirm any conclusion. Even with augmentation techniques, the dataset is too small to fully train a model and to properly evaluate the small changes applied to it. Since the training loop quickly starts overfitting, we do not really get to see how the model could perform in the most interesting cases, which are the small, covered or hardly visible trees.

Therefore, the biggest and conceptually simplest improvement to this work would be to improve and extend the dataset: improving it with more diversity by covering a larger part of the Netherlands (or even beyond, although consistency across images and point clouds can only be ensured within this country), and extending it with more images and more trees. These improvements could also include adding species information to the dataset, in order to train models capable of differentiating species.

At the time of writing this report, the newest version of the point cloud has also been released for one third of the Netherlands, including the area of the current dataset. This new data could be better for this project, because it was acquired in 2023, the same year as the images that are used in the dataset.

Another way to obtain a larger dataset could be to create a large, automatically annotated dataset, as was done in another paper (B. G. Weinstein et al. 2020). Their approach was to create a very large dataset of medium-quality annotations to pre-train the model, and then to use hand-annotated data to finish the training. They created this large dataset with classical, non-machine-learning techniques, using only the point cloud. This approach could solve some of the issues, but it is unclear whether it would introduce problems related to covered trees, which are likely to be missing far more often from the automatic dataset. Furthermore, using only the point cloud would leave out all the trees that are only visible in the images.

Finally, it would also be interesting to see whether the NEON tree dataset (B. Weinstein, Marconi, and White 2022) can be used to train a model and run some experiments. This would also not allow experiments on covered trees, but it could still show whether multiple CHM layers can help delineate individual trees.

6.2 Instability

As explained in Section 6.1, the size of the dataset is probably the main reason for the instability of the training pipeline, but other factors might also contribute. As explained in Section 5.2, the random channel dropouts might also destabilize the training by creating inputs in which the information is distributed very differently, since some channels are randomly removed. The value used to replace these channels is also not easy to choose: on the normalized CHM rasters, a value of 0 might be equivalent to a height above ground of 10 m for example, and a flat surface at 10 m is not exactly "no data" and might also be misleading if the CHM usually ranges from 0 to 4 m.
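The difficulty with the replacement value can be illustrated with a minimal sketch. It assumes z-score normalization with hypothetical statistics (mean 10 m, standard deviation 5 m), which is not necessarily what the pipeline uses; the function names `drop_channels` and `normalize` are illustrative and do not come from the actual code.

```python
import numpy as np


def normalize(height_m, mean_m, std_m):
    """Z-score normalization (assumed here, not confirmed by the pipeline)."""
    return (height_m - mean_m) / std_m


def drop_channels(chm, drop_prob, fill_value, rng):
    """Replace whole channels of a (C, H, W) array with a constant.

    Each channel is dropped independently with probability `drop_prob`.
    Returns the augmented copy and the boolean mask of dropped channels.
    """
    dropped = rng.random(chm.shape[0]) < drop_prob
    out = chm.copy()
    out[dropped] = fill_value
    return out, dropped


# Hypothetical normalization statistics, chosen only for illustration.
MEAN_M, STD_M = 10.0, 5.0

# Filling with 0 in normalized space is the same as a flat surface at 10 m.
fill_zero_in_metres = 0.0 * STD_M + MEAN_M       # 10.0
# Filling with "ground level" instead requires a negative normalized value.
fill_ground = normalize(0.0, MEAN_M, STD_M)      # -2.0

rng = np.random.default_rng(seed=0)
layers = np.full((4, 8, 8), normalize(3.0, MEAN_M, STD_M))
augmented, dropped = drop_channels(layers, 0.5, fill_ground, rng)
```

Returning the `dropped` mask also leaves the option of feeding it to the model as an extra input, so that a removed channel can be told apart from a genuinely flat one; whether this helps stability is untested here.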

More generally, it is possible that other factors are responsible for the instability of the training pipeline, and it would be useful to find and correct them.

6.3 Model performance

Even though the results displayed in Chapter 5 are far from perfect, they are still promising for several reasons.

First, the overall results of the model on trees in general are quite impressive, given that only about 1600 trees are used to train each model, spread over only about 150 images covering about 0.4 km². Even though the whole dataset is quite homogeneous, this still shows that the model is capable of generalizing relatively well.

Second, a few correlations have been observed regarding some parameters or combinations of parameters, even though more robust experiments would be necessary to confirm them. This is for example the case for agnostic and non-agnostic models, where the former seem to perform better in general.

However, the models do not perform well enough in the difficult areas to draw conclusions about the modifications of the architecture. The overall performance of the model, which rarely reaches a sortedAP score above 0.3, is also relatively low. But this low score mostly comes from the large number of small trees, which have a relatively high importance in the computation of sortedAP because each tree has the same weight, regardless of its size. In contrast, the models perform quite well on large trees, finding most of them in situations that are not too complex.
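The effect of giving every tree the same weight can be illustrated with a toy comparison. This is not the sortedAP computation: it only contrasts a per-tree recall with a crown-area-weighted recall. The tree counts and crown areas below are invented for illustration and are not results from this work.

```python
def recall_per_tree(trees):
    """Fraction of trees found, every tree counting once."""
    return sum(1 for t in trees if t["found"]) / len(trees)


def recall_per_area(trees):
    """Fraction of the total crown area that belongs to found trees."""
    found_area = sum(t["crown_area_m2"] for t in trees if t["found"])
    total_area = sum(t["crown_area_m2"] for t in trees)
    return found_area / total_area


def make_trees(count, crown_area_m2, found):
    """Build `count` identical hypothetical trees."""
    return [{"crown_area_m2": crown_area_m2, "found": found} for _ in range(count)]


# Hypothetical scene: 10 large trees (9 found) and 30 small trees (6 found).
trees = (
    make_trees(9, 60.0, True)
    + make_trees(1, 60.0, False)
    + make_trees(6, 4.0, True)
    + make_trees(24, 4.0, False)
)

per_tree = recall_per_tree(trees)   # 15 / 40 = 0.375
per_area = recall_per_area(trees)   # 564 / 720, about 0.78
```

In this invented scene, the per-tree figure is dominated by the many missed small trees, while most of the canopy area is in fact recovered. Reporting scores separately per size class would be one way to make this distinction visible.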

Finally, it could be interesting to try another method to extract multiple CHM layers, which consists in removing only the points of the point cloud that correspond to the previous CHM and all those that lie close below them, and repeating the process to scan downwards. This has the advantage of being more adaptive than using fixed height thresholds for the whole dataset.
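The layer-peeling idea described above could be sketched as follows. This is a minimal sketch under several assumptions: the points are given as an (N, 3) array of x, y and height above ground, the CHM is a simple per-cell maximum, and "close below" is modelled by a single hypothetical `margin` parameter in metres. The function names are illustrative only.

```python
import numpy as np


def rasterize_max(points, cell_size, shape):
    """Highest z per grid cell, NaN where a cell holds no point.

    Assumes x lies in [0, shape[1] * cell_size) and y in [0, shape[0] * cell_size).
    """
    rows = (points[:, 1] // cell_size).astype(int)
    cols = (points[:, 0] // cell_size).astype(int)
    chm = np.full(shape, -np.inf)
    # Unbuffered maximum, so several points in one cell are all considered.
    np.maximum.at(chm, (rows, cols), points[:, 2])
    chm[np.isinf(chm)] = np.nan
    return chm, rows, cols


def peel_chm_layers(points, cell_size, shape, n_layers, margin):
    """Extract up to `n_layers` CHM rasters by scanning downwards.

    After each layer, the points forming that surface and those less than
    `margin` below it are removed before the next CHM is computed.
    """
    layers = []
    remaining = points
    for _ in range(n_layers):
        if len(remaining) == 0:
            break
        chm, rows, cols = rasterize_max(remaining, cell_size, shape)
        layers.append(chm)
        keep = remaining[:, 2] < chm[rows, cols] - margin
        remaining = remaining[keep]
    return layers


# Tiny synthetic cloud: a tall crown over a shrub in cell (0, 0), one tree in (0, 1).
cloud = np.array([
    [0.5, 0.5, 20.0], [0.4, 0.6, 19.5],   # top of the tall tree
    [0.5, 0.5, 8.0], [0.6, 0.4, 7.8],     # covered shrub
    [0.5, 0.5, 0.2],                      # near-ground return
    [1.5, 0.5, 5.0],                      # isolated small tree
])
layers = peel_chm_layers(cloud, cell_size=1.0, shape=(1, 2), n_layers=3, margin=1.0)
```

Because the threshold follows the local surface rather than a fixed height, the covered shrub ends up in the second layer here regardless of how tall the tree above it is. A per-cell margin ignores neighbouring cells, so a real implementation would probably need some spatial smoothing of the removal surface.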