Visual Place Recognition for Indoors/Outdoors under Varied Intensity Conditions with Applications


This report is about developing a visual place recognition memory for robots, so that they can recognize places in images taken under different weather and illumination conditions. I approached this problem by simulating the workflow in offline mode: I collected test images, implemented a localization algorithm, and recorded data from testing the algorithm on these images. The user is asked a series of questions in order to locate the input image in the correct directory. The image pixels are then modified in two different ways: one uses a grayscale method, the other an illumination invariant method. SIFT (Scale-Invariant Feature Transform) is used to generate descriptors for each image file, and the descriptors are compared to obtain a matching level for the images. The use of descriptors allows the application to ignore rotation and scaling. Finally, the application compares the two methods of modifying the image’s RGB channels and outputs whether the place has been visited or not.


With the vast and increasing number of applications of robots in agriculture and other fields, it is important for robots to memorize locations and identify whether they have visited them before. However, during indoor or outdoor activities, the pictures taken may vary with weather, illumination, scale, and rotation, which makes it challenging for robots to recognize places they have been to accurately and efficiently. Previous work has used machine-learning approaches, but error rates were not significantly reduced due to the lack of available data; for practical applications in particular, there are too many possible variations, which makes feature selection a hard task. One challenge in identifying images under different weather conditions is shadow, which can make the same color look very different under different conditions. Colin McManus et al. [1] proposed normalizing the colors to monochrome and using an illumination invariant, so that these factors do not affect later comparisons. Two basic yet important questions were addressed in their paper: when two images are compared, where should the robot look, and what should it look for? In this project, I used SIFT [2] (Scale-Invariant Feature Transform) to handle these two questions, as it can identify and locate local extrema and refine them.

Much previous work has been done on extracting illumination invariants. Ratnasingam and McGinnity [3] proposed an algorithm to extract two illuminant invariant chromaticity features, which gave very promising results when tested on perceptually similar colors, objects under similar indoor illuminations, and skin detection. They argued for using two features instead of one, as some perceptually similar colors cannot be distinguished using only one feature. Will Maddern et al. [4] argued that although two features are more accurate in detecting perceptually similar colors, a one-dimensional space is sufficient to identify most scenes in nature, and their paper proposes a one-dimensional color space. The illumination invariant intensity equation [4] is: L = log(G) − α log(R) − (1 − α) log(B), where α is the channel coefficient, which is subject to the following constraint: 1/λ1 = α/λ2 + (1 − α)/λ3, where λ1, λ2, λ3 are the peak sensitivity wavelengths of the image sensor's three channels.
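As a worked example of the constraint, the sketch below solves it for α, reading the constraint as 1/λ1 = α/λ2 + (1 − α)/λ3 (the form in which every term is an inverse wavelength). The function name is hypothetical, the mapping of λ1, λ2, λ3 to the G, R and B channels is an assumption made by matching the order of the terms in the intensity equation, and the wavelengths are illustrative values rather than measurements from any camera used in this report.

```python
def channel_coefficient(lam1: float, lam2: float, lam3: float) -> float:
    """Solve 1/lam1 = alpha/lam2 + (1 - alpha)/lam3 for alpha."""
    if lam2 == lam3:
        raise ValueError("lam2 and lam3 must differ, otherwise alpha is undetermined")
    # Rearranged: alpha * (1/lam2 - 1/lam3) = 1/lam1 - 1/lam3
    return (1.0 / lam1 - 1.0 / lam3) / (1.0 / lam2 - 1.0 / lam3)


# Hypothetical peak sensitivity wavelengths in nanometres (assumed, not from the report).
LAMBDA_G, LAMBDA_R, LAMBDA_B = 540.0, 620.0, 470.0

alpha = channel_coefficient(LAMBDA_G, LAMBDA_R, LAMBDA_B)
print(f"alpha = {alpha:.4f}")
```

A coefficient obtained this way depends entirely on the sensor data that is plugged in, so it is only a starting point; a value can also be tuned empirically for a given camera and task.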


This section introduces the general structure of the application, the theoretical foundations and algorithms used, and the overall workflow. In online mode, a robot performing its task would carry a GPS locator that supplies location information for use when comparing images. This matters because brute-force searching over all stored images would greatly reduce efficiency, as many of them would be irrelevant. This project adopts the offline mode, in which the work of the GPS locator is done by a file system and the location information is input by the user: pictures are stored in directories according to their properties, such as whether they show known or unknown places. Once the test images are stored in the corresponding directory, the application only visits the related directories, which largely improves its speed and accuracy. I use an RGB-only approach as my baseline, and improve the program by converting the images to monochrome and adding illumination invariants, since using the RGB values alone is greatly affected by the illumination condition. To keep the application from being affected by different illumination conditions, I extracted the illumination invariant following the formula given by Will Maddern et al. [4]. SIFT is then used to find potential keypoint locations, which are local extrema found by comparing each point with its neighbors, and these are refined based on a threshold. After the descriptors are generated, the application compares them; the threshold is set to 10, so that if at least 10 descriptors match, the test is considered passed.
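The pass rule on descriptor matches can be sketched as follows. This is a minimal illustration, not the application's actual code: it assumes descriptors are plain sequences of numbers, and it assumes a match is accepted with the nearest-neighbour distance ratio test described by Lowe [2], which the text does not confirm. The function names and the ratio of 0.75 are hypothetical; only the threshold of 10 comes from the text.

```python
import math
from typing import Sequence

Descriptor = Sequence[float]

MIN_MATCHES = 10  # pass threshold stated in the text


def count_matches(query: Sequence[Descriptor],
                  reference: Sequence[Descriptor],
                  ratio: float = 0.75) -> int:
    """Count query descriptors whose nearest reference descriptor is
    clearly closer than the second nearest (assumed ratio test)."""
    if len(reference) < 2:
        return 0  # the ratio test needs two candidates
    matches = 0
    for q in query:
        distances = sorted(math.dist(q, r) for r in reference)
        best, second = distances[0], distances[1]
        if best < ratio * second:
            matches += 1
    return matches


def is_same_place(query: Sequence[Descriptor],
                  reference: Sequence[Descriptor],
                  min_matches: int = MIN_MATCHES) -> bool:
    """The test passes when at least `min_matches` descriptors match."""
    return count_matches(query, reference) >= min_matches
```

The brute-force search here is quadratic in the number of descriptors, which is acceptable for a sketch but is one reason a real-time version would need a faster matcher.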

Figure 1 shows the original images with RGB unchanged. The two images were taken under different indoor illumination conditions: the first with the lights off and the second with the lights on. Figure 2 shows the images after they have been normalized to monochrome [5]. Figure 3 shows the images after the illumination invariant formula has been applied. It can be seen that although the illumination conditions vary considerably in Figure 1, their effects are eliminated in Figure 3 once the illumination invariants are extracted. SIFT is then used to identify and refine descriptors, and once they are extracted, the application compares them. Figure 4 shows the monochrome images with their descriptors identified and marked by SIFT and matched accordingly. Figure 5 shows the illumination invariant images with their descriptors marked by SIFT. My approach combines the comparison of monochrome images with the comparison of illumination invariant images: the final result is based on both comparisons, such that if either of them passes, the whole test passes. This is because the comparison sometimes fails for the monochrome approach but passes for the illumination invariant approach, as the grayscale approach may be insufficient for images under certain conditions. Combining the results of the two approaches improves the accuracy of the application. Figure 4 and Figure 5 thus show the final tests the application goes through.
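The either-passes rule can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the boolean lists in the usage example are made-up inputs, not measured results from this project.

```python
from typing import List, Sequence


def combine(mono_passed: Sequence[bool], illum_passed: Sequence[bool]) -> List[bool]:
    """Per-image combined result: pass if either approach passed."""
    if len(mono_passed) != len(illum_passed):
        raise ValueError("both approaches must be run on the same images")
    return [m or i for m, i in zip(mono_passed, illum_passed)]


def pass_rate(results: Sequence[bool]) -> float:
    """Fraction of comparisons that passed."""
    if not results:
        raise ValueError("no results to average")
    return sum(results) / len(results)


# Made-up example: each approach fails on a different image.
mono = [False, True, True]
illum = [True, False, True]
combined = combine(mono, illum)
print(combined, pass_rate(combined))
```

Because the rule is a logical OR, the combined pass rate can never be lower than that of either approach alone; the trade-off is that it also accepts any false match that either approach produces.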

Algorithm 1 describes how the RGB values are converted to illumination invariants. After splitting the R, G, B data and retrieving the image information, the illumination invariant intensity equation [4] is applied. While the channel coefficient α can differ depending on the task and camera, I tested different values and fixed α at 0.37, the one that performed best, for my final application. The algorithm then goes through each pixel of the image and constructs an array to store the illumination invariants computed from the RGB values of each pixel. Figure 6 shows the workflow of the application. First, it is fed the original images, whose RGB values are unchanged. Then the application performs two validations. The first validation is made between the original image and the one whose RGB values are normalized.
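The per-pixel conversion can be sketched as below. This is not a reproduction of Algorithm 1: it assumes the image is a nested list of 8-bit (R, G, B) tuples, and it assumes an offset of 1 before the logarithm so that zero-valued channels do not produce log(0). The function names are hypothetical; the equation and α = 0.37 are taken from the text.

```python
import math
from typing import List, Sequence, Tuple

ALPHA = 0.37  # channel coefficient chosen in the text

Pixel = Tuple[int, int, int]  # (R, G, B), 8-bit values


def invariant_value(r: float, g: float, b: float, alpha: float = ALPHA) -> float:
    """L = log(G) - alpha*log(R) - (1 - alpha)*log(B) for positive inputs."""
    return math.log(g) - alpha * math.log(r) - (1.0 - alpha) * math.log(b)


def to_unit(value: int) -> float:
    """Map 0..255 to (0, 1]; the +1 offset is an assumption to avoid log(0)."""
    return (value + 1) / 256.0


def invariant_image(image: Sequence[Sequence[Pixel]],
                    alpha: float = ALPHA) -> List[List[float]]:
    """Build a 2-D array of illumination invariants, one per pixel."""
    return [
        [invariant_value(to_unit(r), to_unit(g), to_unit(b), alpha)
         for (r, g, b) in row]
        for row in image
    ]


image = [[(120, 200, 80), (10, 20, 30)],
         [(255, 255, 255), (0, 0, 0)]]
result = invariant_image(image)
```

Since the three log coefficients sum to zero (1 − α − (1 − α)), multiplying R, G and B by the same positive factor leaves `invariant_value` unchanged, which is why a uniform change in brightness has no effect on it.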






Figure 7 shows the file system used in this application. To test my application, I first took 3 images of each location from different positions or under different illumination and weather conditions, and put them in one directory. If the place has been visited, the directory is named PlaceN, as shown in Figure 7, and it contains Img1 to ImgN. There are 8 such directories, placed under a home folder called image_database, which simulates the database for online mode. I then took one image for each dataset as the test image, and assumed that the application knows which directory it should be compared with. For each test image, the application performs three comparisons with both the monochrome and the illumination invariant approaches, and outputs the final results. In Table 1, the Method 1 column shows the results of the comparisons made with the monochrome approach on the three images, the Method 2 column shows those made with the illumination invariant approach, and the Method 3 column shows the combined results of the previous two tests. For instance, for dataset 1, all three images failed the monochrome test but all of them passed the illumination invariant test, so the final result is that all three images are considered to show the same location as the test image. Similarly, for dataset 2, the application compares the test image with the 3 images in the corresponding directory, and the table shows that one comparison failed for the monochrome approach and one failed for the illumination invariant approach. Since every image passed at least one of the two tests, the final result for this dataset is 100%, which indicates that the application successfully identifies the location. The averages show that combining these two tests improves the accuracy, and that the illumination invariant approach gives better results than the monochrome approach.
The same procedure is repeated for the remaining 6 datasets, and Table 1 records the results.
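The directory-restricted lookup that stands in for the GPS locator can be sketched as follows. The folder and file names follow the layout described for Figure 7, while the function name, the `.png` extension and the temporary directory used for the demonstration are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def candidate_images(database_root: Path, place: str) -> List[Path]:
    """Return only the stored images of one place, or an empty list
    when the place has no directory (that is, it has not been visited)."""
    place_dir = database_root / place
    if not place_dir.is_dir():
        return []
    return sorted(p for p in place_dir.iterdir() if p.is_file())


# Demonstration on a throwaway copy of the layout: 8 places, 3 images each.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "image_database"
    for n in range(1, 9):
        place_dir = root / f"Place{n}"
        place_dir.mkdir(parents=True)
        for k in range(1, 4):
            (place_dir / f"Img{k}.png").touch()

    names = [p.name for p in candidate_images(root, "Place3")]
    unknown = candidate_images(root, "Place99")

print(names, unknown)
```

With this layout each test image is compared against 3 stored images instead of all 24, which is the saving the text attributes to avoiding a brute-force search.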






Recognizing places they have visited is an important task for robots in agriculture and other applications. This paper presents an application that simulates robots’ visual place recognition in offline mode. By using a file system to store images in directories according to their properties, the application can efficiently search through the file system and compare images. Two approaches are combined to validate the localisation: the monochrome approach and the illumination invariant approach. SIFT is used to identify and refine descriptors, which eliminates the effect of rotation and scaling. The application achieves satisfactory results when a comparison is considered successful if either approach succeeds.

Future Work

First, this project can be extended to online mode, where a robot takes pictures and stores them in a database instead of using a file system to simulate the workflow. This would allow real-time localisation when robots are deployed in agriculture and other fields. Ideally, the robots would carry GPS locators so that location information can be stored along with each image when it is taken. Then, when the application compares images, the GPS information would narrow the search and speed up the comparison.

To improve the generality of the application, more test images can be taken of buildings or objects with perceptually similar colors, as the assumption that colors in nature are easily distinguishable may fail when robots need to explore areas with more perceptually similar colors. Also, instead of extracting only one illumination invariant feature, more features can be extracted to make the comparison more comprehensive.

When tested on outdoor buildings under different weather conditions, as shown in Figure 8, where the left image was taken when it was snowing and cloudy and the right image when it was sunny, the application did not perform as well as it did on indoor images. It can be seen that for the illumination invariant images, the number of descriptors extracted is very small, which makes descriptor matching very hard. The application can therefore be improved by refining its descriptor selection so that it performs well in outdoor situations. One possible solution is to develop a dedicated algorithm for extracting descriptors from outdoor images instead of using SIFT.

The algorithm can also be further refined by testing more images and gathering data on which value of the channel coefficient optimizes its performance. The running time can be improved as well: if the application were put into practice in its current form, it would be considered too slow, since real-time comparison is usually expected from such an application.


[1] Colin McManus, Winston Churchill, Will Maddern, Alexander D. Stewart and Paul Newman, “Shady Dealings: Robust, Long-Term Visual Localisation using Illumination Invariance,” 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014.

[2] David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, January 2004.

[3] S. Ratnasingam and T. McGinnity, “Chromaticity space for illuminant invariant recognition,” IEEE Transactions on Image Processing, vol. 21, 2012.

[4] Will Maddern, Alexander D. Stewart, Colin McManus, Ben Upcroft, Winston Churchill and Paul Newman, “Illumination Invariant Imaging: Applications in Robust Vision-based Localisation, Mapping and Classification for Autonomous Vehicles,” IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 2014.

[5] Tomihisa Welsh, Michael Ashikhmin and Klaus Mueller, “Transferring Color to Greyscale Images,” ACM Transactions on Graphics, 2002.