
There are two common methods for the GIS similarity measurement problem: the cross-coefficient, which measures GIS attribute similarity, and spatial autocorrelation, which is based on spatial location. Neither method can handle the subzone similarity problem against a universal background. Rough measurement based on a membership function solves this problem well. In this paper, we use rough sets to measure the similarity of discrete GIS subzone data, and neighborhood rough sets to compute the upper and lower approximations of continuous data. We use neighborhood granules to compute the membership function of continuous attributes, and thereby solve the subzone similarity measurement problem for continuous attributes.

GIS entities exhibit spatial relevance in the real world. Tobler [

GIS subzone measurement is essentially an uncertainty problem. Li [

Mathematics therefore offers many studies of similarity measurement for discrete and continuous values. In GIS there are two similarity measurement methods: the cross-coefficient, and spatial autocorrelation based on spatial location. Neither of these two methods can measure the similarity of GIS subzones. This paper uses a rough sets measurement method to measure the similarity of two subzones, and simultaneously studies subzone similarity over a single universe set.

Global spatial autocorrelation describes the spatial character of an attribute value over the whole region. The global spatial autocorrelation statistics, global Moran's I and global Geary's C, are estimated to analyze the spatial correlation and spatial discrepancy of the total region. Global Moran's I is the more commonly used, and is defined as follows:

I = (n / S_{0}) · Σ_{i}Σ_{j} w_{ij}(x_{i} - x̄)(x_{j} - x̄) / Σ_{i}(x_{i} - x̄)^{2}  (1)

where x_{i} is the observed value of spatial cell i, x̄ is the average of all observed values, and S_{0} is the sum of all elements of the spatial weight matrix W, which can be obtained from the following formula:

S_{0} = Σ_{i}Σ_{j} w_{ij}  (2)

W = (w_{ij}) is the spatial weighting matrix, and the value of w_{ij} can be obtained from the following formula:

w_{ij} = 1 if cell i and cell j are adjacent; w_{ij} = 0 otherwise  (3)

where n is the number of spatial cells. If cell i and cell j are neighbors, then w_{ij} = 1, otherwise w_{ij} = 0; a cell is a neighbor of itself, so w_{ii} = 1. After computing Moran's I, a Z test can be used to test the statistical significance of the result, obtained from formula (4):

Z = (I - E(I)) / √Var(I),  E(I) = -1 / (n - 1)  (4)

Moran's I is frequently taken as a cross-coefficient; its value lies between -1 and 1. At a given significance level, when Moran's I is significantly positive, the observed values are positively correlated: high values cluster with high values and low values cluster with low values, presenting high-high or low-low clusters. When Moran's I is significantly negative, the observed values are negatively correlated: high values cluster with low values, presenting a dispersed pattern. When Moran's I tends to 0, there is no spatial autocorrelation, and the spatial observed values are in a random pattern.
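As a minimal sketch (not the paper's own code), global Moran's I of formula (1) can be computed directly; the function name and the toy chain adjacency below are our own illustration:

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I (formula (1)):
    I = (n / S_0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    z = x - x.mean()              # deviations from the mean x_bar
    s0 = w.sum()                  # S_0: sum of all spatial weights (formula (2))
    return (n / s0) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

# Four cells in a chain: cell i is adjacent to cell i+1 (binary weights, formula (3))
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], w))   # positive: similar values are adjacent
```

With the monotone values above, adjacent cells carry similar values, so I is positive; alternating values such as [1, 4, 1, 4] give a negative I (dispersed pattern).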

Example 1. Consider the example shown in

Global Moran's I is an overall statistic; it only describes the average degree of association between each region and its adjacent regions. Local spatial disparities may even expand while the whole region expresses a single overall trend, so to express a region's spatial disparities we need the local analysis methods of ESDA. Anselin (1994) proposed the local spatial relation index LISA (Local Indicators of Spatial Association), which shows the spatial autocorrelation characteristics of each spatial cell locally. It apportions global Moran's I to each region, and the I_{i} statistic of region i is:

I_{i} = z_{i} Σ_{j} w_{ij} z_{j}  (5)

where z_{i} and z_{j} are the standardized deviations from the mean, and w_{ij} is the spatial weighting matrix.

At a given significance level: if I_{i} is significantly positive and z_{i} > 0, the observed values of position i and its neighborhood are both relatively high (a high-high cluster); if I_{i} is significantly positive and z_{i} < 0, the observed values of position i and its neighborhood are both relatively low (a low-low cluster); if I_{i} is significantly negative and z_{i} > 0, the neighborhood values are far lower than those of position i (a high-low cluster); if I_{i} is significantly negative and z_{i} < 0, the neighborhood values are far higher than those of position i (a low-high cluster).

I_{i} is the product of the observed value at position i and the weighted average of its neighborhood. Global Moran's I and the local Moran's I_{i} satisfy the following relation (with row-standardized weights):

I = (1/n) Σ_{i} I_{i}  (6)

The formal condition on a LISA statistic, satisfied by the local Moran's I_{i}, is that the sum of the local indicators is proportional to the global indicator:

Σ_{i} I_{i} = γ I  (7)

We can use the Moran scatter plot to describe LISA. The observed values are on the horizontal axis, and the spatial lag values (Wx) are on the vertical axis. The spatial lag value of each region's observed value is the weighted average of its neighborhood's observed values, as defined by the standardized spatial weighting matrix. The Moran scatter plot is divided into four quadrants, corresponding to four different spatial region types. In the upper right quadrant (HH), the levels of the region and its neighborhood are both high, and the spatial disparity between them is small. In the upper left quadrant (LH), the region's level is lower than its neighborhood's, and the spatial disparity between them is comparatively large. In the lower left quadrant (LL), the levels of the region and its neighborhood are both low, and the spatial disparity between them is small. In the lower right quadrant (HL), the region's level is higher than its neighborhood's, and the spatial disparity between them is comparatively large.
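A hedged sketch of the local Moran's I_{i} of formula (5) and the Moran scatter plot quadrant classification; the chain adjacency and function names are our own illustration, not the paper's code:

```python
import numpy as np

def local_morans(x, w):
    """Local Moran's I_i = z_i * sum_j w_ij z_j (a sketch of formula (5));
    z is the standardized observation vector, W z its spatial lag."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # standardized values z_i
    lag = np.asarray(w, dtype=float) @ z  # spatial lag value Wx of each cell
    return z * lag, z, lag

def quadrant(z_i, lag_i):
    """Moran scatter plot quadrant: own value vs. neighbourhood lag."""
    if z_i >= 0:
        return "HH" if lag_i >= 0 else "HL"   # high value; high/low neighbours
    return "LH" if lag_i >= 0 else "LL"       # low value; high/low neighbours

# Four cells in a chain, as in the global Moran's I example
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
I_i, z, lag = local_morans([1, 2, 3, 4], w)
print([quadrant(a, b) for a, b in zip(z, lag)])   # ['LL', 'LL', 'HH', 'HH']
```

The two low cells sit in the LL quadrant and the two high cells in the HH quadrant, matching the low-low / high-high cluster reading above.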

Example 2. We can compute the local Moran's I of

We can readily obtain the following properties of spatial autocorrelation:

1) Spatial autocorrelation can only be computed for continuous attribute values; it cannot be computed for discrete categorical data.

2) Spatial autocorrelation can only measure the similarity of the whole region or of each individual unit; it cannot measure the similarity between subzones composed of several units within the whole region.

The cross-coefficient r is frequently used in statistics to measure the degree of linear correlation between two variables. When the x_{i} are not all zero and the y_{i} are not all zero, the cross-coefficient can be obtained from formula (8):

r = Σ_{i}(x_{i} - x̄)(y_{i} - ȳ) / √(Σ_{i}(x_{i} - x̄)^{2} · Σ_{i}(y_{i} - ȳ)^{2})  (8)

where r is the cross-coefficient of the variables x and y, and x̄ and ȳ are respectively the averages of the sequences x_{i} and y_{i}. We can readily obtain the following properties of the cross-coefficient:

1) The cross-coefficient can only be computed for continuous attribute values; it cannot be computed for discrete categorical data.

2) The sequences x_{i} and y_{i} must have the same length; otherwise r cannot be computed.
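The two properties above can be seen directly in a short sketch of formula (8); the function name is our own:

```python
import math

def cross_coefficient(x, y):
    """Cross-coefficient (Pearson r) of formula (8). Per property 2 above,
    the two sequences must have the same length."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

print(cross_coefficient([1, 2, 3], [2, 4, 6]))   # 1.0: perfectly linear
```

A perfectly linear decreasing pair such as [1, 2, 3] and [6, 4, 2] gives r = -1, the other extreme of the [-1, 1] range.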

Rough sets theory is a mathematical tool for dealing with uncertain and vague knowledge. It is a good technique for dealing with the uncertainty and fuzziness of GIS data, and also for spatial entity relations. There are many references studying the uncertainty and fuzziness of spatial entities, such as Zhang [26,27].

Definition 1. Given a knowledge base K = (U, R), for each subset X ⊆ U and an equivalence relation R, we can define two subsets as follows:

R_(X) = {x ∈ U : [x]_{R} ⊆ X}

R̄(X) = {x ∈ U : [x]_{R} ∩ X ≠ ∅}

where R_(X) and R̄(X) are respectively called the lower approximation and the upper approximation of the set X. This is the Pawlak rough sets definition. If R_(X) = R̄(X), then X is R-definable and is a precise set; otherwise X is a rough set. If there is a polygon object X, seen in

Example 3. The classification map of Moran's I in Fig. 2 can be divided into the equivalence classes {{1, 3, 7}, {2, 4, 5, 8, 9}, {6}}. Now suppose a subzone X covers {2, 3, 4, 6}; then the lower approximation of subzone X is {6} and the upper approximation is the universe U. All element values must be discrete when we partition and compute with Pawlak rough sets, but GIS object attribute values are often continuous in practice, such as slope, population density and so on. We should then use neighborhood rough sets to handle continuous attribute values.
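The Pawlak approximations of Definition 1 can be sketched directly from the equivalence classes; the function name is our own, and the data are exactly those of Example 3:

```python
def pawlak_approximations(partition, X):
    """Lower/upper approximation of X (Definition 1): an equivalence class
    joins the lower approximation if it lies wholly inside X, and the upper
    approximation if it intersects X at all."""
    X = set(X)
    lower, upper = set(), set()
    for cls in map(set, partition):
        if cls <= X:
            lower |= cls
        if cls & X:
            upper |= cls
    return lower, upper

# Example 3: equivalence classes of the Moran's I classification map
partition = [{1, 3, 7}, {2, 4, 5, 8, 9}, {6}]
lower, upper = pawlak_approximations(partition, {2, 3, 4, 6})
print(lower)   # {6}
print(upper)   # the whole universe {1, ..., 9}
```

The output reproduces Example 3: the lower approximation is {6} and the upper approximation is the whole universe U.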

Geng (2009) suggested that we should measure the distance between different attributes in spatial clustering. Let d_{ij} be the distance between attribute levels X_{i} and X_{j}. The frequently used distance formulas are the Minkowski distance, Mahalanobis distance and Canberra distance [

The Minkowski distance of order q is defined as

d_{ij} = (Σ_{k} |x_{ik} - x_{jk}|^{q})^{1/q}

When q = 1, this is the absolute distance:

d_{ij} = Σ_{k} |x_{ik} - x_{jk}|

When q = 2, this is the Euclidean distance:

d_{ij} = √(Σ_{k} (x_{ik} - x_{jk})^{2})

When q → ∞, this is the Chebyshev distance:

d_{ij} = max_{k} |x_{ik} - x_{jk}|

In the unit-ball figure we can clearly see that the diamond corresponds to the absolute distance, the circle to the Euclidean distance, and the square to the Chebyshev distance in
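A brief sketch of these distances (function names are our own); the first call reproduces Example 4's absolute distance between x_1 and x_2 in attribute B, using the values 1.6 and 1.8 given in Example 5:

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between attribute vectors x and y;
    q = 1 gives the absolute distance, q = 2 the Euclidean distance."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def chebyshev(x, y):
    """Limit of the Minkowski distance as q -> infinity."""
    return max(abs(a - b) for a, b in zip(x, y))

print(minkowski([1.6], [1.8], 1))     # ~0.2: Example 4, attribute B
print(minkowski([0, 0], [3, 4], 2))   # 5.0: Euclidean
print(chebyshev([0, 0], [3, 4]))      # 4: Chebyshev
```

The (3, 4) example shows how the three metrics rank the same displacement differently: absolute 7, Euclidean 5, Chebyshev 4.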

Example 4. Consider a GIS map layer composed of nine basic units as shown in the figure. Using the absolute distance to measure the distance between x_{1} and x_{2} in attribute B, we compute d(x_{1}, x_{2}) = 0.2. Using the Euclidean distance to measure the distance between x_{1} and x_{2} in attributes B and C, we compute d(x_{1}, x_{2}) = 0.45. In practice the source data should be preprocessed first; for lack of space, the details are not dealt with here.

Li [

There are two methods to define a neighborhood: one is defined by the number of neighbors, as in the classic k-nearest neighbor method; the other is defined by the distance from a measurement's central point to its boundary. We use the second method in this work.

Definition 2. Given an N-dimensional real space Ω, we call d a metric on R^{N} if it satisfies the following properties:

1) d(x_{1}, x_{2}) ≥ 0, and d(x_{1}, x_{2}) = 0 if and only if x_{1} = x_{2}, ∀x_{1}, x_{2} ∈ Ω;

2) d(x_{1}, x_{2}) = d(x_{2}, x_{1}), ∀x_{1}, x_{2} ∈ Ω;

3) d(x_{1}, x_{3}) ≤ d(x_{1}, x_{2}) + d(x_{2}, x_{3}), ∀x_{1}, x_{2}, x_{3} ∈ Ω.

Then (Ω, d) is called a metric space. The Euclidean distance is a common metric for real spaces.

Definition 3. Given a non-empty finite set U = {x_{1}, x_{2}, x_{3}, …, x_{n}} in a metric space, for every object x_{i} in U, the δ-neighborhood is defined as follows:

δ(x_{i}) = {x ∈ U : d(x, x_{i}) ≤ δ}

where δ > 0. δ(x_{i}) is the δ-neighborhood information granule of x_{i}, called for short the neighborhood granule of x_{i}.

From the properties of the metric, we can obtain three properties of neighborhood information granules:

1) δ(x_{i}) ≠ ∅, because x_{i} ∈ δ(x_{i});

2) ⋃_{i} δ(x_{i}) = U;

3) x_{j} ∈ δ(x_{i}) if and only if x_{i} ∈ δ(x_{j}).

Given a metric space (Ω, d) and a non-empty finite set U = {x_{1}, x_{2}, x_{3}, …, x_{n}}, if δ_{1} ≤ δ_{2}, then we obtain these properties:

1) δ_{1}(x_{i}) ⊆ δ_{2}(x_{i}), ∀x_{i} ∈ U;

2) the neighborhood relation induced by δ_{1} is finer than that induced by δ_{2}.

Obviously, neighborhood relations are a kind of similarity relation, satisfying the reflexivity and symmetry properties. Neighborhood relations group objects together by similarity or indistinguishability in terms of distance, and the samples in the same neighborhood granule are close to each other.

Example 5. Nine polygons are shown in the figure, U = {x_{1}, x_{2}, x_{3}, …, x_{9}}, and B and C stand for two attribute level values (such as slope and aspect). When we take values in a one-dimensional attribute, we can use the absolute distance. Let f(x, b) denote the value of attribute B for sample x; then f(x_{1}, b) = 1.6, f(x_{2}, b) = 1.8, …, f(x_{9}, b) = 2.1. If the assigned neighborhood threshold is 0.2, then because |f(x_{1}, b) - f(x_{2}, b)| = 0.2 ≤ 0.2, we have x_{2} ∈ δ(x_{1}). In this way we can obtain the δ-neighborhood of every polygon.

When we take values in a two-dimensional attribute space, we should use the Euclidean distance. Let f(x, b, c) denote the values of attributes B and C for sample x, and let the neighborhood threshold be 0.3. We can then compute each polygon's neighborhood in the two-dimensional space in the same way.

If there are more attributes, we compute the distances between samples in the same manner, and then compute the neighborhoods of the samples.
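The δ-neighborhood computation of Definition 3 can be sketched as follows; only the three attribute-B values actually stated in Example 5 (1.6, 1.8 and 2.1) are used, since the rest of the figure's data is not given:

```python
def delta_neighborhood(values, i, delta, dist):
    """delta-neighborhood of object i (Definition 3): every object whose
    distance to x_i under the metric `dist` is at most delta."""
    return {j for j, v in values.items() if dist(v, values[i]) <= delta}

# attribute-B values given in Example 5 (only x1, x2 and x9 are stated)
f_b = {"x1": 1.6, "x2": 1.8, "x9": 2.1}
absolute = lambda a, b: abs(a - b)      # 1-D absolute distance

print(delta_neighborhood(f_b, "x1", 0.2, absolute))   # {'x1', 'x2'}
```

As in the text, |f(x_1, b) - f(x_2, b)| = 0.2 ≤ 0.2 puts x_2 into δ(x_1), while x_9 (distance 0.5) stays outside.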

Definition 4. Given a set of objects U = {x_{1}, x_{2}, x_{3}, …, x_{n}} and a neighborhood relation R, D = {U, R} is called a neighborhood approximation space [

Definition 5. Given D = {U, R} and X ⊆ U, two subsets of objects, called the lower and upper approximations of X in D = {U, R}, are defined as follows:

N_(X) = {x_{i} ∈ U : δ(x_{i}) ⊆ X}

N̄(X) = {x_{i} ∈ U : δ(x_{i}) ∩ X ≠ ∅}

Obviously, N_(X) ⊆ X ⊆ N̄(X). The positive region, negative region and boundary region of X in the approximation space are defined as follows:

pos(X) = N_(X),  neg(X) = U - N̄(X),  bn(X) = N̄(X) - N_(X)

A sample in the decision system belongs to either the positive region or the boundary region of the decision. The neighborhood model therefore divides the samples into two subsets: the positive region and the boundary region. The positive region is the set of samples that can be classified into one of the decision classes without uncertainty, while the boundary region is the set of samples that cannot be classified determinately. Intuitively, samples in the boundary region are easily misclassified. In data acquisition and preprocessing, one usually tries to find a feature space in which the classification task has the smallest boundary region, as summarized in Zhang [

Example 6. Given the two sets X = {x_{1}, x_{2}, x_{3}, x_{5}, x_{7}} and Y = {x_{2}, x_{4}, x_{6}} in the figure, we get pos(X) = {x_{1}, x_{2}, x_{5}} and pos(Y) = {x_{6}}; accordingly, we can obtain the negative region and boundary region of the two sets.
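A minimal sketch of Definition 5's approximations and the three regions; the 1-D values below are illustrative toy data (not the paper's figure), and the function names are our own:

```python
def neighborhood_approximations(U, neigh, X):
    """Definition 5: lower approximation = objects whose whole neighborhood
    lies in X; upper approximation = objects whose neighborhood meets X."""
    X = set(X)
    lower = {x for x in U if neigh(x) <= X}
    upper = {x for x in U if neigh(x) & X}
    return lower, upper

# toy 1-D attribute values (illustrative, not the paper's figure)
values = {1: 0.0, 2: 0.1, 3: 0.5, 4: 0.9, 5: 1.0}
U = set(values)
delta = 0.15
neigh = lambda x: {y for y in U if abs(values[y] - values[x]) <= delta}

lower, upper = neighborhood_approximations(U, neigh, {1, 2, 4})
print(lower)           # {1, 2}: positive region pos(X)
print(upper - lower)   # {4, 5}: boundary region bn(X)
print(U - upper)       # {3}: negative region neg(X)
```

Object 4 lands in the boundary region because its neighborhood {4, 5} reaches outside X, while object 3's neighborhood misses X entirely.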

We can then obtain a map showing a binary classification in a 2-D numerical space: x_{1} belongs to the lower approximation of the first class; x_{3} belongs to the lower approximation of the second class, because its neighborhood comes entirely from the second class; and x_{2} is a boundary sample, because its neighborhood belongs to both the first class and the second class. This definition accords with our intuitive recognition of classification problems in the real world.

Definition 7. Let U be the universe, R an equivalence relation on U, and A ⊆ U. The rough membership of an element x in the set A [

μ_{A}^{R}(x) = |[x]_{R} ∩ A| / |[x]_{R}|  (17)

The rough membership of x in A is equal to the degree to which the equivalence class of x is contained in A. We can therefore understand rough membership as a coefficient describing the inexactness of x's membership in A.

Formula (17) defines the Pawlak rough set membership for discrete GIS values, but for continuous values we cannot easily obtain equivalence classes; we obtain the membership from Definition 8 instead.

Definition 8. For continuous GIS values, we use the neighborhood rough sets definition of membership for continuous values, defined as follows:

μ_{A}(x) = |δ(x) ∩ A| / |δ(x)|

The rough membership of x in A is equal to the proportion of the neighborhood information granule of x that falls within A.
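Definition 8 reduces to one set-cardinality ratio; a short sketch (function name is our own), using exact fractions to avoid floating-point noise:

```python
from fractions import Fraction

def rough_membership(granule, A):
    """Neighborhood rough membership (Definition 8):
    mu_A(x) = |delta(x) ∩ A| / |delta(x)|."""
    granule, A = set(granule), set(A)
    return Fraction(len(granule & A), len(granule))

print(rough_membership({1, 2, 3}, {2, 3, 5}))   # 2/3
print(rough_membership({1, 2}, {1, 2, 9}))      # 1: granule wholly inside A
```

The membership is 1 exactly when the neighborhood granule lies wholly inside A (x is in the positive region) and 0 when it misses A entirely (the negative region).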

Definition 9. Let U be the universe, R an equivalence relation on U, and A ⊆ U; then a fuzzy set can be obtained from A and R via the membership function μ_{A}^{R}(x), namely A_{R} = {(x, μ_{A}^{R}(x)) : x ∈ U}.

Definition 10. Given the universe U, an equivalence relation R on U, and two rough sets A and B of U, let the rough memberships of A and B under R be μ_{A}(x_{i}) and μ_{B}(x_{i}) (i = 1, 2, …, n). Then the similarity of the sets A and B can be obtained from the following formula: [

S(A, B) = Σ_{i} min(μ_{A}(x_{i}), μ_{B}(x_{i})) / Σ_{i} max(μ_{A}(x_{i}), μ_{B}(x_{i}))

We used the formula from Shi [

Obviously, the higher the similarity of sets A and B, the larger the value of S(A, B), and vice versa. It satisfies the following properties:

1) 0 ≤ S(A, B) ≤ 1;

2) S(A, B) = S(B, A);

3) S(A, B) = 0 if and only if, for every x_{i}, at least one of μ_{A}(x_{i}) and μ_{B}(x_{i}) is 0; the sets A and B may not both be null at the same time.
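Assuming the min/max form of the similarity formula as reconstructed in Definition 10 (our reading of the cited measure), a sketch over two membership vectors:

```python
def rough_similarity(mu_a, mu_b):
    """Similarity of two rough sets from their membership vectors:
    S(A, B) = sum_i min(mu_A(x_i), mu_B(x_i)) / sum_i max(...).
    This is the min/max form assumed above, not a verified citation."""
    num = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    den = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    if den == 0:
        raise ValueError("A and B may not both be null (property 3)")
    return num / den

mu_a = [1.0, 0.5, 0.0]
mu_b = [0.5, 0.5, 0.0]
print(rough_similarity(mu_a, mu_b))   # ~0.667
print(rough_similarity(mu_a, mu_a))   # 1.0: identical memberships
```

The three listed properties follow directly: min/max ratios lie in [0, 1], the formula is symmetric, and S = 0 exactly when every position has a zero membership on at least one side.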

Consider the example shown in

Then the similarity of subzone A and B is:

In a similar way we can compute S(A, C) and S(B, C). The similarity of A and B is less than the similarity of A and C, and the similarity of B and C is less than the similarity of A and C.

Consider the example shown in

For the continuous values in

Then the similarity of subzone A and B is:

In a similar way we can compute S(A, C) and S(B, C). The similarity of A and C is less than the similarity of A and B, and the similarity of B and C is less than the similarity of A and C.

If spatial autocorrelation were used to measure the subzone similarity in the above cases, we would find that it cannot measure

This paper used rough membership to measure the similarity problem for different subzones. Because Moran's I can only measure the spatial autocorrelation of the universe or of each unit, and not of subzones, our method can compute GIS subzone similarity based on the universe. For continuous values, we used a distance function and neighborhood rough sets to obtain the upper and lower approximations and the classification of continuous values, and then put forward a rough membership function based on neighborhood information granules. Finally, we used a rough similarity measurement formula to measure the GIS subzone similarity problem; this method provides a new direction for similarity measurement of GIS point groups and other object groups. Our future work will study object group similarity under different distributions, and the similarity problem based on rough entropy.

The author would like to thank the project sponsored by the Scientific Research Foundation of Guangxi University (Grant No. XTZ110584).