AN OPTIMIZED ARTIFICIAL NEURAL NETWORK FOR THE CLASSIFICATION OF URBAN ENVIRONMENT COMFORT USING LANDSAT-8 REMOTE SENSING DATA IN GREATER JAKARTA AREA, INDONESIA

Computer vision, a branch of artificial intelligence, is developing rapidly in many fields. It relies on deep learning methods based on artificial neural networks, which perform well in multi-parameter analysis. One application of computer vision models and algorithms is thematic digital image classification, for example in environmental analysis, where remote sensing based image classification is a reliable tool for assessing environmental quality. This study aims to optimize a neural network for the analysis of urban environment comfort based on satellite data. The inputs are four geobiophysical indices, used as urban environmental comfort parameters and derived from cloud-free annual mosaics of Landsat-8 remote sensing satellite data. The results indicate that a neural network architecture with 1 hidden layer of 16 neurons performs well for classifying 10 classes covering urban environmental comfort and other land cover. The classification with this optimized network shows that the very uncomfortable class dominates the Greater Jakarta area and its surroundings, while other parts of the study area fall in the uncomfortable and rather comfortable classes. Training was fast, taking 18 seconds and 145 iterations to reach an RMS error of 0.01, and the overall classification accuracy was fairly high at 89% with a Kappa coefficient of 0.88. The neural network architecture with 2 hidden layers failed to converge.


Introduction
Computer vision, as part of artificial intelligence (AI), is growing rapidly in various applications. In digital image classification, for example, computer vision imitates the way human vision recognizes objects, so that decisions can be made from the interpretation of those objects (Wijaya & Prayudi, 2010). To obtain the best image classification results, various architectural models have been developed in search of an effective, optimal neural network architecture that produces the most accurate decisions (Arsenov, Ruban, Smelyakov, & Chupryna, 2018). Image processing and machine learning with artificial neural networks have developed rapidly in fields ranging from medicine and economics to earth science (Maulik, Egele, Lusch, & Balaprakash, 2020; Windarto, Lubis, & Solikhun, 2018; Zhang et al., 2019).
When deep learning methods with artificial neural networks are applied in these studies, the optimal neural network architecture is sought and used to improve model performance (Windarto et al., 2018; Zhang et al., 2019). In a multilayer neural network, the elements that make up the architecture are the number of layers, the number of neurons in each layer, the activation function of each neuron and the training algorithm used (Kushardono, 2017). These elements can be optimized to design the best architecture, including the most suitable number of neurons and hidden layers, because the number of neurons and hidden layers needed in each case cannot be known theoretically (Benardos & Vosniakos, 2007). One approach to optimizing the artificial neural network architecture is the neural architecture optimization (NAO) design automation algorithm (Luo, Tian, Qin, Chen, & Liu, 2018). Determining the best neural network architecture also requires optimizing the learning rate and momentum rate of a neural network trained with backpropagation (Astria, Windarto, & Damanik, 2022; Kushardono, 2017).

Literature Review
In the environmental field, many analyses of the urban environment have been carried out with various land parameters, including surface temperature, vegetation area and air pollution derived from remote sensing data. Increases in surface temperature that affect environmental comfort have been reported in various cities of the world, for example Dhaka (Faisal et al., 2021). In another study, surface temperature in Addis Ababa increased as the area of built-up land in the city increased (Worku, Teferi, & Bantider, 2021). Surface temperature data has also been used in an Urban Heat Island (UHI) study in DKI Jakarta, where the UHI was accompanied by an increase in CO2 emissions (Rushayati & Hermawan, 2013).
Various classification algorithms, including deep learning, can be applied to remote sensing data. Deep learning is widely adopted for remote sensing applications across the spectral, spatial and temporal dimensions and has been found to achieve classification accuracy that exceeds that of non-deep learning algorithms (Heydari & Mountrakis, 2019). The development of a city and its environmental ecosystem has been monitored using remote sensing technology and a modified neural system (Gu & Wei, 2021). In remote sensing applications for sustainable agriculture, the Artificial Neural Network algorithm has been effective in calculating the volume of coffee plants (Oliveira, Santos, Kazama, Rolim, & Silva, 2021). These capabilities indicate the potential for wide use of deep learning in remote sensing, especially in environmental studies.
In this study, the land parameter indices were derived from Landsat-8 OLI data: the surface temperature index, based on thermal bands 10 and 11 and the method developed by Rozenstein for LST derivation (Rozenstein, Qin, Derimian, & Karnieli, 2014); the settlement density index, based on brightness; the land and water wetness index, based on wetness; and the vegetation density index, based on greenness (Baig, Zhang, Shuai, & Tong, 2014). Environmental comfort is classified using these land parameter indices as input to a neural network classifier, a method with proven advantages over other methods (Kushardono, 2019). With this artificial neural network algorithm, the optimal neural network architecture is then modeled to analyze the comfort of the urban environment.

Research Methods
Location and Data
The study area for the comparative research on neural network architecture in urban environmental analysis is Jakarta and its buffer zones, which include Tangerang, Bekasi, Depok, Bogor and parts of Sukabumi, Cianjur, Karawang and Purwakarta. Fig. 1 shows the extent of the study area.
In this study, the urban environment comfort was analyzed using environmental temperature parameters based on surface temperature, vegetation cover, as well as residential density and wetness of land or water, all of which were derived from multitemporal remote sensing satellite data. The study area as shown in Fig. 1 is 121.67 km x 121.67 km or 1,481,072.40 hectares.
The satellite remote sensing data used was the Landsat-8 OLI full band 2019 annual cloud-free mosaic (Fig. 2). The mosaic was built from daily Landsat OLI data with radiometrically and geometrically corrected Surface Reflectance, to which Topographic Correction and annual cloud-free compositing were then applied; it is a ready-to-use data product of LAPAN. The compositing method is a development of a previous pixel-based method (Dewanti Dimyati, Danoedoro, Hartono, & Kustiyo, 2018). A mosaic of Landsat thermal data with a quantile of 85% was also used, produced with a method that has been developed for similar data. For this thermal mosaic, the 23 Landsat-8 thermal recordings available over the 1-year period were used to select the maximum temperature: the thermal values were sorted from smallest to largest and the 85% quantile was chosen, on the assumption that the highest 15% of the data contained errors (Kustiyo, 2017). In addition, very high resolution satellite data obtained through Google Earth was used instead of field data as the reference for the 2019 analysis period.
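The 85% quantile selection described above can be illustrated with a short sketch. It assumes the yearly thermal recordings are stacked into a NumPy array of shape (time, rows, cols) with NaN marking masked observations; the function name `quantile_composite` and the synthetic values are hypothetical and do not reproduce the LAPAN processing chain.

```python
import numpy as np

def quantile_composite(stack, q=0.85):
    """Per-pixel quantile over the time axis of a (time, rows, cols) stack.

    NaN marks missing or cloud-masked observations and is ignored.
    """
    stack = np.asarray(stack, dtype=float)
    return np.nanquantile(stack, q, axis=0)

# Synthetic example: 23 recordings of a 2 x 2 scene (values in kelvin).
rng = np.random.default_rng(0)
stack = 295.0 + 10.0 * rng.random((23, 2, 2))
stack[3, 0, 0] = np.nan  # one masked observation
composite = quantile_composite(stack)
```

Taking the 85% quantile rather than the maximum discards the upper tail of each pixel's time series, which matches the stated assumption that the highest 15% of values contain errors.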
Data processing was carried out on a Dell Latitude Rugged computer with an Intel Core i7 2.7 GHz processor and 16 GB of RAM.

Method
The method determines the optimal neural network architecture for the analysis of urban environment comfort, based on supervised classification with a neural network whose inputs are geobiophysical indices from annual full band Landsat mosaic data, namely surface temperature, brightness index, greenness index and wetness index. A geobiophysical index is a parameter that characterizes objects on the earth's surface, such as reflection coefficient, surface temperature, chlorophyll content, water content and surface roughness. In this study, brightness is used to identify settlement density, greenness to identify vegetation density, the wetness index to identify soil wetness, and brightness temperature to identify surface temperature. The data used is the annual cloud-free Landsat mosaic derived from multitemporal Landsat data within one year. From these input parameters, the best architecture is determined using a genetic algorithm (GA) together with criteria that measure neural network performance, including training, generalization and complexity, as in earlier research on neural network architecture optimization (Benardos & Vosniakos, 2007).
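Brightness, greenness and wetness indices of this kind are commonly computed as weighted sums of the reflectance bands. The sketch below shows only that linear form. The function name `tasseled_cap` and the weights in `demo_coefficients` are placeholders for illustration; the actual Landsat-8 coefficients should be taken from Baig et al. (2014) and are not reproduced here.

```python
import numpy as np

def tasseled_cap(bands, coefficients):
    """Apply a tasseled-cap style linear transform.

    bands:        array of shape (n_bands, rows, cols), reflectance values
    coefficients: array of shape (n_components, n_bands); each row holds the
                  weights of one component (e.g. brightness, greenness, wetness)
    returns:      array of shape (n_components, rows, cols)
    """
    bands = np.asarray(bands, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    if coefficients.shape[1] != bands.shape[0]:
        raise ValueError("one coefficient per band is required")
    # Weighted sum over the band axis for every pixel.
    return np.tensordot(coefficients, bands, axes=([1], [0]))

# PLACEHOLDER weights for a 3-band toy image, NOT the published coefficients.
demo_coefficients = np.array([
    [0.5, 0.5, 0.0],
    [-1.0, 0.0, 1.0],
    [0.0, 1.0, -1.0],
])
demo_bands = np.array([
    [[0.1, 0.2]],
    [[0.3, 0.4]],
    [[0.5, 0.6]],
])  # 3 bands, 1 row, 2 columns
components = tasseled_cap(demo_bands, demo_coefficients)
```

Passing the coefficient matrix as an argument keeps the sketch independent of any particular sensor or publication.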
This study adopts a classifier architecture based on multilayer neural networks (Kushardono, 2017), as shown in Fig. 3. Two architectures are compared: a 1 hidden layer architecture with 4-16-10 neurons in the input, hidden and output layers, and a 2 hidden layer architecture with 4-16-16-10 neurons. The weight and bias factors of each neuron in the hidden and output layers are determined from the training data over a number of backpropagation training iterations, with the learning rate and momentum rate set to the optimum values obtained in previous studies, namely 0.1 and 0.9 (Kushardono, 2017).
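The two layer configurations can be written down as a minimal sketch. The sigmoid activation, the uniform weight initialization and the helper names (`init_network`, `forward`, `count_parameters`) are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_network(layer_sizes, seed=0):
    """Create one (weights, bias) pair per connection between layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.uniform(-0.5, 0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(network, x):
    """Forward propagation of a batch of inputs through all layers."""
    activation = np.asarray(x, dtype=float)
    for weights, bias in network:
        activation = sigmoid(activation @ weights + bias)
    return activation

def count_parameters(network):
    return sum(w.size + b.size for w, b in network)

one_hidden = init_network([4, 16, 10])      # 4 inputs, 16 hidden, 10 classes
two_hidden = init_network([4, 16, 16, 10])  # second hidden layer of 16 added

scores = forward(one_hidden, np.zeros((3, 4)))  # 3 pixels, 4 indices each
```

Under these assumptions the 4-16-10 network has 250 trainable weights and biases and the 4-16-16-10 network has 522, which gives a sense of how much the second hidden layer enlarges the search space.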
The training data for the backpropagation neural network and the test site data used to assess the accuracy of the classification results are determined from field conditions, from the interpretation of very high resolution data for land use and land cover, and from the average surface temperature in the Landsat thermal data. There are 3,584 training samples (with almost the same number of samples per class, around 350) and 30,879 validation samples (around 1,800 to 3,000 per class) taken from locations other than those used for training, or approximately 11% for training and 89% for validation. This split is used because, in previous experience, classification with neural networks based on backpropagation learning does not require a lot of training data. Greater care is needed instead in choosing training locations that represent each class, so that the network converges more quickly to a very small RMS error.
In sampling, the land cover classes and the comfort classes are treated the same in terms of the number of samples. The land cover classes are used only to facilitate neural network classification outside the residential areas that are the target of the comfort level classification. After the neural network classification is complete, the land cover classes can therefore be combined into a single non-residential group for which the comfort level is not calculated.
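Combining the land cover classes into one non-residential group after classification can be sketched as a simple relabeling. The class ids in `COMFORT_CLASSES` and the value of `NON_RESIDENTIAL` are hypothetical, as the text does not give a numeric coding for the classes.

```python
import numpy as np

COMFORT_CLASSES = (0, 1, 2, 3)  # assumed ids: very uncomfortable ... comfortable
NON_RESIDENTIAL = 255           # assumed id for the merged land cover group

def keep_comfort_only(class_map):
    """Keep comfort classes and merge every other class into one group."""
    class_map = np.asarray(class_map)
    return np.where(np.isin(class_map, COMFORT_CLASSES), class_map, NON_RESIDENTIAL)

# Toy classified map: ids 4 to 9 stand for the six land cover classes.
demo = np.array([[0, 4, 9], [3, 2, 7]])
masked = keep_comfort_only(demo)
```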
The backpropagation-based training process is carried out until the RMS error is lower than 0.01 or a maximum of 10,000 iterations is reached. Previous experience, including the research by Kushardono, Fukue, Shimoda, and Sakata (1995), indicates that neural network learning with backpropagation is successful when the RMS error between the output of the output layer and the teacher signal is less than 0.1; in this research a threshold of 0.01 is used to further ensure that learning has converged. The limit of 10,000 iterations is also based on research experience: if more learning iterations than that are needed, training is regarded as having failed, because the RMS error is still high and cannot converge. In this process, the weight and bias factors of each neuron are corrected at each iteration based on the RMS error, which is calculated from the difference between the output of the neurons in the output layer and the desired output according to the training data class. After the training process is complete, forward propagation is used to classify all target data with the weight and bias factors of each neuron from the last training iteration.
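The stopping rule and the momentum update can be sketched as follows. This is an illustrative full-batch implementation with sigmoid units and a squared-error loss, all of which are assumptions; the function name `train_backprop` and the toy data are hypothetical and this is not the software used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(x, targets, n_hidden=16, learning_rate=0.1, momentum=0.9,
                   rms_goal=0.01, max_iterations=10_000, seed=0):
    """Batch backpropagation with momentum for a one-hidden-layer network.

    Stops when the RMS error between outputs and targets drops below
    rms_goal or when max_iterations is reached. Returns the parameters
    and the RMS error recorded at every iteration.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = x.shape[1], targets.shape[1]
    params = [rng.uniform(-0.5, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
              rng.uniform(-0.5, 0.5, (n_hidden, n_out)), np.zeros(n_out)]
    velocity = [np.zeros_like(p) for p in params]
    history = []
    for _ in range(max_iterations):
        w1, b1, w2, b2 = params
        hidden = sigmoid(x @ w1 + b1)
        output = sigmoid(hidden @ w2 + b2)
        error = output - targets
        rms = float(np.sqrt(np.mean(error ** 2)))
        history.append(rms)
        if rms < rms_goal:
            break
        # Error signals propagated backwards through the sigmoid layers.
        delta_out = error * output * (1.0 - output)
        delta_hidden = (delta_out @ w2.T) * hidden * (1.0 - hidden)
        grads = [x.T @ delta_hidden, delta_hidden.sum(axis=0),
                 hidden.T @ delta_out, delta_out.sum(axis=0)]
        for i, grad in enumerate(grads):
            # Momentum keeps a fraction of the previous correction.
            velocity[i] = momentum * velocity[i] - learning_rate * grad
            params[i] = params[i] + velocity[i]
    return params, history

# Toy two-class problem with one-hot targets (synthetic, not study data).
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
params, history = train_backprop(x, targets, max_iterations=500)
```

The returned `history` corresponds to the kind of iteration versus RMS error curve used to judge convergence: a run that ends with `len(history)` equal to `max_iterations` did not reach the error goal.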

Results and Discussions
Land parameters for urban environmental comfort analysis
The results of transforming Landsat-8 OLI image data into the four indices are shown in Fig. 4. The land parameters used as input for the classification of urban environmental comfort are surface temperature, vegetation (greenness) index, wetness index and brightness index, all derived from Landsat-8 OLI satellite imagery. Surface temperature is shown with a brown gradation, from dark brown for low temperatures to lighter colors for higher temperatures. The greenness index is shown with a green gradation, where darker green indicates a higher index. The wetness index is shown with a blue gradation, where lighter blue indicates higher wetness and darker blue lower wetness. The brightness index is shown with a gray gradation for low brightness and a darker brown gradation for high brightness. Table 1 shows the value of each geobiophysical index parameter, namely brightness, wetness, greenness and surface temperature.
The surface temperature transformation shows how surface temperature is distributed over the study area. High surface temperatures are seen in downtown and urban areas with dense settlements and no vegetation, whereas moderate to low surface temperatures are seen in vegetated areas.
The vegetation index transformation indicates the vegetation condition of the land in the study area. In areas with dense, green vegetation the greenness index is moderate to high and is represented by dark green; these areas are mainly urban green space and the hills and mountains in the south. In areas with sparse vegetation, or in densely populated areas without vegetation, the greenness index is relatively low and is represented by light green.
The result of the wetness index transformation is represented by a blue color gradation. Areas with low wetness are depicted in dark blue and areas with high wetness in light blue. Light blue covers inundated land, inland water bodies and the sea, while dark blue covers open land, densely populated areas and other non-water areas. The fourth land parameter extracted from the remote sensing data is the brightness index. It is represented by a black-white-brown gradation, where black is the lowest brightness, white is medium and dark brown is the highest. Areas of low to moderate brightness include water, open land and vegetation, while areas of high brightness include built-up areas such as dense settlements. These four indices are the inputs for determining the optimal ANN architectural design for the analysis of urban environmental comfort. Based on the four indices, the classes used in the classification results are determined as shown in Table 2, which lists the categories for the class training data. Architectural optimization is carried out to obtain the model with the shortest processing time and the best accuracy (Astria et al., 2022). Comfort in urban settlements is determined from settlement density and surface temperature, and is also influenced by the vegetated land in the area.
Various studies have shown the effect of these land parameters on thermal comfort in big cities, urban areas and their surroundings (Al-Masaodi & Al-Zubaidi, 2021; Bachir et al., 2021; Faisal et al., 2021; Jamei, Rajagopalan, & Sun, 2019). The classes are colored as follows: very uncomfortable in red, uncomfortable in brown, rather comfortable in yellow, comfortable in greenish yellow, vegetated land in light green, densely vegetated land in dark green, rice field in cyan, inland waters in blue, shallow water in light blue and deep water in dark blue. The residential environment comfort classes are very uncomfortable, uncomfortable, rather comfortable and comfortable. The six other land cover classes are vegetated land, densely vegetated land, rice field, inland waters, shallow water and deep water.

Optimization of the neural network layer
The comfort class is determined from two parameters: the density of land use for settlements, offices and public buildings in urban areas, and the temperature condition based on surface temperature. Land use, with reference to high-resolution satellite imagery and field survey data, is classified into 3 categories: dense buildings without vegetation, medium-density buildings with vegetation, and sparse buildings with vegetation. Surface temperature, based on statistical calculations on the data, is divided equally into 3 categories: high, medium and low. From these 2 parameters, 4 comfort classes are obtained: very uncomfortable (dense settlements with very high temperatures), uncomfortable (dense settlements with moderate temperatures), rather comfortable (rather sparse settlements with moderate temperatures) and comfortable (sparse settlements with low temperatures).
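The combination of the two parameters into four comfort classes can be expressed as a lookup table. In this sketch the category labels, the helper names and the temperature breaks are hypothetical; the equal-width breaks in the example are only one possible reading of "divided equally", and combinations that the text does not list return no class.

```python
import numpy as np

# Keys: (building density category, surface temperature category).
COMFORT_RULES = {
    ("dense", "high"): "very uncomfortable",
    ("dense", "medium"): "uncomfortable",
    ("medium", "medium"): "rather comfortable",
    ("sparse", "low"): "comfortable",
}

def temperature_category(value, breaks):
    """Map a temperature to low/medium/high using two break values."""
    low_break, high_break = breaks
    if value < low_break:
        return "low"
    if value < high_break:
        return "medium"
    return "high"

def comfort_class(density, temperature, breaks):
    """Return the comfort class, or None for combinations not listed."""
    return COMFORT_RULES.get((density, temperature_category(temperature, breaks)))

# Assumed example: equal-width breaks over a 20 to 35 degree range.
edges = np.linspace(20.0, 35.0, 4)           # 20, 25, 30, 35
breaks = (float(edges[1]), float(edges[2]))  # (25.0, 30.0)
example = comfort_class("dense", 33.0, breaks)
```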
To ease neural network learning for the areas outside the urban areas or settlements whose comfort is classified above, the classification process also needs additional land use/land cover classes for the forest areas, gardens, rice fields and waters covered by the imagery, namely vegetated land, densely vegetated land, rice fields, inland waters, shallow water and deep water. These other land covers are not part of the area whose comfort level is detected, and they can later be masked if only spatial information on the comfort of the urban environment is to be displayed. Fig. 5 shows an example of the interpretation of RGB geobiophysical index images and high resolution images for training data. In the temperature-greenness-wetness band composite, the first area appears red, which indicates the very uncomfortable class. Validation with high resolution image data shows that it is a densely populated residential area with little or no vegetation. The other image shows the rather comfortable class, which appears yellow in the same band composite. Validation with the high resolution image shows that it is a residential area where buildings are not too dense and vegetation is fairly abundant.
The next result is the tabulated training data over a number of iterations, used to determine the optimal neural network architecture. The first architecture uses 1 hidden layer and required a training time of 18 seconds. This model was built by adopting the method of constructing criteria to measure neural network performance (Benardos & Vosniakos, 2007; Kushardono, 2017). It needed 145 iterations to reach a training RMS error of 0.01. In the initial iterations the RMS error was still high, reaching 0.8, but it decreased as more iterations were carried out until it reached 0.01 at iteration 145. This shows that the architecture with 1 hidden layer performs very well and is efficient in processing time. Fig. 6 plots the training RMS error against the number of iterations and shows the error decreasing as iterations increase. The classification obtained with the 1 hidden layer architecture is shown in Fig. 7. The result contains 10 classes, the same number as in the training data. The classification took 5 minutes 20 seconds for an input image of 4082 pixels x 4083 lines x 4 bands. Fig. 7 shows that the very uncomfortable class dominates the Greater Jakarta area and its surroundings, while other parts of the study area fall in the uncomfortable and rather comfortable classes. The classification results are very good, with an Overall Accuracy of 89% and a Kappa Coefficient of 0.88 for the 1 hidden layer architecture. The second architecture evaluated has 2 hidden layers.
Unlike the previous architecture, the architecture with 2 hidden layers required a very long training time of 32 minutes 18 seconds. At the maximum of 10,000 iterations set in this study, the RMS error of the 2 hidden layer model was still very high at 0.84. This means that, with 16 neurons in each hidden layer, backpropagation training, which is based on the delta between the expected and the actual output value at each iteration, could not converge to a total RMS error of 0.01. Fig. 8 shows the RMS error against the number of iterations; in the 2 hidden layer model the RMS error actually increases as iterations increase. Fig. 9 shows the classification result of the 2 hidden layer architecture. Because the model did not converge and the target RMS error was not reached, the classification results are very poor, with an accuracy of only 29% and a Kappa coefficient of 0.2. Only 3 of the 10 target classes were assigned, with the result dominated by the rather comfortable class (yellow), the rice field class (cyan) and the deep water class in the marine area. The classification time with this model was also almost 1 minute longer, at 6 minutes 16 seconds for an image of 4082 pixels x 4083 lines x 4 bands. The classification results and RMS errors above show that the 1 hidden layer architecture performs much better than the 2 hidden layer architecture. It has a shorter processing time and a lower error, and on the test site data it achieves a high classification accuracy, with an Overall Accuracy of 89% and a Kappa Coefficient of 0.88.
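Overall Accuracy and the Kappa coefficient are both derived from the confusion matrix of the test site data. The sketch below shows the standard computation on a small made-up matrix; the function name and the numbers are illustrative and are not the confusion matrices of this study.

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are classified classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total
    # Agreement expected by chance from the row and column totals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Made-up 2-class example with 100 test samples.
demo = np.array([[45, 5], [10, 40]])
accuracy, kappa = overall_accuracy_and_kappa(demo)
```

For this toy matrix the overall accuracy is 0.85 and kappa is 0.70. Kappa is lower than the accuracy because it discounts the agreement that would be expected by chance.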
This is in line with the results of previous studies (Kushardono, Fukue, Shimoda, & Sakata, 1995; Li, Zhang, & Huang, 2022; Silva, Xavier, da Silva, & Santos, 2020), which found that fast convergence in backpropagation training requires a number of hidden layers and hidden neurons that is balanced with the number of neurons in the output layer. In this study, the architecture with 1 hidden layer of 16 neurons was the best for 4 neurons in the input layer and 10 neurons in the output layer, which accommodate the 10 class categories.

Conclusion
Based on the research results, an optimal Artificial Neural Network architectural model was obtained for the analysis of environmental comfort in urban areas based on Landsat-8 satellite data. The optimal architecture in this study is a Multilayer Neural Network with 1 hidden layer of 16 neurons, and with 4 and 10 neurons in the input and output layers, respectively. Its training time is fast, at 18 seconds for 145 iterations to reach an RMS Error of 0.01, its classification time is 5 minutes 20 seconds, and it has a fairly good overall classification accuracy of 89% with a Kappa coefficient of 0.88 for 10 classes of urban environmental comfort and other land cover. The classification with this optimized artificial neural network shows that the very uncomfortable class dominates the Greater Jakarta area and its surroundings, while other parts of the study area fall in the uncomfortable and rather comfortable classes.
From these results, further research can compare Artificial Neural Network classification results across the temporal aspects of the Landsat-8 imagery used and examine how the classification performs with this method. This research can also serve as input for the design of optimal Neural Network architectures in similar research with other images or in other study areas.