Now that the first part of our “Missing villages” project is nearing the finish line, it is time to explain the second part, which is called “Find the village”.
In the first phase, we tried to name the villages; now we have to create the village boundaries, if possible. As we said in the previous article, delimiting human habitats is not easy, because the structure of a settlement often depends on the region. In the Ruvuma region (southern Tanzania), the settlements are well separated on the map. In contrast, in agricultural areas of the Shinyanga region, delimitation sometimes seems an impossible task.
According to the data created by Digital Globe and funded by the Gates Foundation, there are approximately 17 million buildings in Tanzania. At the time of this study, the OSM community had mapped circa 11 million buildings, so building data from OpenStreetMap alone was not enough, and we had to use a different source to identify the settlement pattern. To map populations, we used the previously mentioned data from the Gates Foundation and Facebook’s High-Resolution Population Density data for validation.
A building aggregation tool was prepared in ArcGIS to aggregate the building footprints (polygon layer) into settlement layers.
First, a 50 m buffer was created around the building footprints, and the buffers were merged.
Second, the merged multipolygons were split into single polygons.
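The buffer-and-merge step itself was done in ArcGIS. The following is only a minimal sketch of the same idea in plain Python, under a simplifying assumption: each building is reduced to its centroid in a projected coordinate system with metre units, so two 50 m buffers overlap when the centroids are at most 100 m apart. The function name `cluster_buildings` and its parameters are hypothetical, and the result is a list of building clusters rather than real polygons.

```python
from collections import defaultdict
from math import hypot


def cluster_buildings(centroids, buffer_m=50.0):
    """Group building centroids whose circular buffers overlap.

    centroids: list of (x, y) tuples in metres (assumed projected CRS).
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Two buffers of radius r touch when the centres are <= 2r apart.
    limit = 2 * buffer_m

    # Simple grid index so only buildings in neighbouring cells are compared.
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(centroids):
        grid[(int(x // limit), int(y // limit))].append(idx)

    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            xi, yi = centroids[i]
                            xj, yj = centroids[j]
                            if hypot(xi - xj, yi - yj) <= limit:
                                union(i, j)

    # Each connected component corresponds to one merged "single polygon".
    clusters = defaultdict(list)
    for idx in range(len(centroids)):
        clusters[find(idx)].append(idx)
    return sorted(sorted(c) for c in clusters.values())
```

Each connected component plays the role of one single polygon after the merged buffers are split. A real workflow would buffer the footprint polygons themselves, which this sketch does not attempt.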
The area of each settlement pattern was measured, and the buildings it covers were counted for each “building cluster”. These two values help us calculate:
Building density (number of buildings / total area)
Number of buildings, which can help to estimate the size of the population
The building density and the number of buildings can in turn help to classify settlements:
Urban or rural area
Settlement type: hamlet, village, or town
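As a rough illustration of how these two values could feed a classification, here is a small sketch. The thresholds (`hamlet_max`, `village_max`, `urban_density`) are purely hypothetical placeholders chosen for the example, not values used in the project, and the function names are our own.

```python
def building_density(building_count, area_km2):
    """Buildings per square kilometre for one building cluster."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return building_count / area_km2


def classify_settlement(building_count, area_km2,
                        hamlet_max=50, village_max=1000,
                        urban_density=1500.0):
    """Return (settlement_type, zone) for one cluster.

    All thresholds are hypothetical and would need local calibration.
    """
    density = building_density(building_count, area_km2)
    if building_count <= hamlet_max:
        kind = "hamlet"
    elif building_count <= village_max:
        kind = "village"
    else:
        kind = "town"
    zone = "urban" if density >= urban_density else "rural"
    return kind, zone
```

In practice the cut-offs would have to be tuned per region, since the settlement structure differs so much between, for example, Ruvuma and Shinyanga.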
During post-processing, sharp angles in the polygons were smoothed to improve aesthetic or cartographic quality. Finally, the unwanted “holes” were eliminated from the polygons.
Using this data, we can identify the settlement pattern, which helps us create settlement boundaries for bigger villages, especially those with high building density, and add them to OSM.
In the final steps, we first have to remove from the resulting settlement boundaries those that are already in the OSM database. Second, using the validated settlement POIs, we select those settlement boundary features that contain a village’s point feature.
In the end, the selected village boundaries will be uploaded to OSM, where they will wait for phase 3.
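The article does not say which smoothing method the ArcGIS tool applied, so the sketch below swaps in Chaikin corner cutting, a simple and well-known smoothing technique, just to show the idea. The polygon representation (a dict with an `exterior` ring and a list of `holes`) and the function names are assumptions made for this example, and the sketch drops every hole, whereas a real tool might only remove holes below a size limit.

```python
def chaikin_ring(ring, iterations=2):
    """Smooth a closed ring with Chaikin corner cutting.

    ring: list of (x, y) vertices without a repeated closing vertex.
    Each iteration replaces every corner with two points on its edges.
    """
    pts = list(ring)
    for _ in range(iterations):
        smoothed = []
        n = len(pts)
        for i in range(n):
            (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
            # Points at 1/4 and 3/4 along the edge from pts[i] to pts[i+1].
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        pts = smoothed
    return pts


def postprocess(polygon, iterations=2):
    """Smooth the exterior ring and drop all holes (assumption)."""
    return {
        "exterior": chaikin_ring(polygon["exterior"], iterations),
        "holes": [],
    }
```

Every iteration doubles the number of vertices, so one or two passes are usually enough for cartographic purposes.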
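The two selection steps can be sketched as follows. This is a simplified illustration, not the actual workflow: boundaries are plain rings of (x, y) vertices, the point-in-polygon test is a standard ray-casting check, and `existing_ids` is a hypothetical set of boundary ids that were already matched to features in OSM by some earlier comparison.

```python
def point_in_ring(point, ring):
    """Ray-casting point-in-polygon test.

    ring: list of (x, y) vertices without a repeated closing vertex.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_boundaries(boundaries, village_pois, existing_ids=frozenset()):
    """Keep boundaries that are new and contain at least one village POI.

    boundaries: {boundary_id: ring}
    village_pois: {village_name: (x, y)}
    existing_ids: ids already present in OSM (hypothetical input).
    Returns {boundary_id: [names of the villages inside]}.
    """
    selected = {}
    for bid, ring in boundaries.items():
        if bid in existing_ids:
            continue
        names = [name for name, pt in village_pois.items()
                 if point_in_ring(pt, ring)]
        if names:
            selected[bid] = names
    return selected
```

A boundary that contains more than one village POI is returned with all of the names, so it can be reviewed by hand before upload.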
Knowing the vast, rural area of Tanzania is crucial to provide timely and effective help for girls during Female Genital Mutilation (FGM) ‘cutting seasons’. In recent years, we have managed to map millions of buildings which can help us determine the distribution of the population. Although low population density areas in Tanzania are not sufficiently mapped yet, the initial steps have already been taken.
Goals of mapping
Crowd2Map Tanzania is a “crowdsourced mapping project aiming to put rural Tanzania on the map”. A primary goal is to help fight against FGM. Girls are rescued and taken to safe houses by local volunteers and police. However, for this they need maps. But maps can do more than just show these rescue teams the way to remote villages. The existence of spatial information can help with development and to increase commercial efficiency and economic growth opportunities for businesses and entrepreneurs, giving them the opportunity to make better-informed decisions. Growing wealth improves the quality of life, gives a chance for more opportunities and a better quality of education.
Find the village
So, we now know where to find traces of human settlements, but how do we delineate each settlement and, more importantly, how do we know what the name of the settlement is?
Delimiting human habitats is not easy, because the structure of a settlement often depends on the region. What does this mean? In the Ruvuma region (southern Tanzania), the settlements are well separated on the map. In contrast, in agricultural areas of the Shinyanga region, delimitation sometimes seems an impossible task.
And what about the names of the settlements? Local volunteers can help us identify the names of all of the circa 10,000–12,000 settlements in Tanzania, or we can try to find open source data that contains this information. Recruiting hundreds of volunteers from all over the country is beyond our capacity, so in most places we need to focus on the second option. Fortunately, open source data is available from The United Republic of Tanzania – Government Basic Statistics Portal, such as health facilities, schools, and waterpoints located all over Tanzania.
Our project objective is to add the missing village names in Tanzania, using open source government data about water sources in Tanzania.
Method for the estimation of village position
The shared database contains about 87,000 water sources, which can be lakes, rivers, machine-drilled boreholes, or springs. The database also contains the physical condition (quality, quantity) of the water sources as well as their spatial location, indicating, for example, the name of the village where the water source is, or of the village nearest to it. This data helps us determine the name of the village in OSM.
For data validation, the best possible application is JOSM, which can also prepare our data for upload to OSM after validation. During validation, the following datasets and imagery were used:
Thiessen polygons were calculated from the water points layer to get the influence zone of each water point. Then, the polygons were merged by attribute wherever the village name was the same. The resulting polygons help to determine the area where the village has to be.
At the same time, the mean center was calculated for the points inside each polygon → potential position of the village. (Since in a few cases the same village name occurs more than once in the country, a combined “village+district” attribute was used to help us find the real mean center.) This is our village POI data, which needs to be added to OSM.
OpenStreetMap imagery was used to identify traces of human activity where the area was well mapped. It also showed us whether the name of the settlement had already been added to OSM.
Maxar satellite imagery was used for those areas that weren’t mapped yet.
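To make the Thiessen polygon and mean center steps more concrete, here is a small sketch in plain Python. It does not build polygon geometry: it relies on the fact that a location lies in the Thiessen cell of its nearest water point, so the merged zone of a location is simply the village name of that nearest point. The field names (`x`, `y`, `village`, `district`) and function names are assumptions for this example, and the coordinates are assumed to be projected.

```python
from collections import defaultdict
from math import hypot


def village_zone_at(location, water_points):
    """Village whose merged Thiessen zone contains `location`.

    The nearest water point defines the Thiessen cell, and cells are
    merged by village name, so the nearest point's village is the answer.
    """
    if not water_points:
        raise ValueError("water_points is empty")
    x, y = location
    nearest = min(water_points,
                  key=lambda wp: hypot(wp["x"] - x, wp["y"] - y))
    return nearest["village"]


def mean_centers(water_points):
    """Mean center of the water points of each (village, district) pair.

    Grouping on the pair keeps villages that share a name but lie in
    different districts apart.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for wp in water_points:
        acc = sums[(wp["village"], wp["district"])]
        acc[0] += wp["x"]
        acc[1] += wp["y"]
        acc[2] += 1
    return {key: (sx / n, sy / n) for key, (sx, sy, n) in sums.items()}
```

The mean center is only as good as the spread of the water points around the settlement, which is why each resulting POI still needs to be checked against imagery.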
Other useful datasets for validation
Waterpoints: these can be really useful if the position of a village’s POI is unusually far from any populated area. In this case, it is worth looking at how the individual water points are distributed in the area. As another example, when a village consists of two sub-villages, the “SUBVILLAGE” attribute of the water database can help determine where the center of the village is likely to be.
Health facilities data: The government data contains more than 7,000 health facilities, such as hospitals and clinics. The name of a facility is usually, but not always, the same as the name of the municipality where it is located.
Education data: The government data contains almost 7,000 schools. The village names are available in this data.
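The “unusually far from any populated area” check mentioned for the waterpoints could be automated roughly as below. This is a sketch under assumptions: coordinates are longitude/latitude in degrees, populated areas are represented by sample points such as building cluster centers, and the 5 km limit in `max_km` is a hypothetical value, not one taken from the project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_far_from_population(poi, populated_points, max_km=5.0):
    """True if no populated point lies within max_km of the POI.

    poi and populated_points are (lon, lat) tuples; max_km is a
    hypothetical threshold.
    """
    return all(haversine_km(poi[0], poi[1], p[0], p[1]) > max_km
               for p in populated_points)
```

POIs flagged this way are only candidates for a closer look at the surrounding water points and imagery, not errors in themselves.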
The Voronoi polygon marks the area where the village is located (or has to be). The village POI marks the potential location of the settlement, but its accuracy depends on the number of water abstraction points and their location in and around the given settlement.
By the end of September, more than 143 districts had been validated (88% of all districts), and 5,505 village POIs had been added, which is 52% of the total village POIs in Tanzania.
Number of edits per user that were added with the “TNZ_missing_villages” hashtag
Crowd2map volunteers in the lead
The OSM database currently contains 10,483 Tanzanian village points, a significant part of which was added by the volunteers of the Crowd2Map team. The following pie chart shows how these 10,483 POIs are divided between the top 5 volunteers and the rest of the mapper community: