
Extracting frequent patterns from supermarket datasets is a well-known data mining problem. Many approaches to this problem are available today, association rule mining being one of them. Supermarket data are usually temporal in nature, as they record every transaction in the supermarket together with its time of occurrence. An algorithm has been proposed to find frequent itemsets that takes the temporal attributes of supermarket datasets into account. Its main advantage is that each frequent itemset it extracts is associated with a list of time intervals in which the itemset is frequent. Taking the time of transactions as calendar dates, we may obtain various types of periodic patterns, viz. yearly, quarterly, monthly, etc. If the time intervals associated with a periodic itemset are kept in a compact manner, the result is a fuzzy time interval. Clustering of such patterns can be a useful data mining problem. In this paper, we put forward an agglomerative hierarchical clustering algorithm that is able to extract clusters among such periodic itemsets. We use two similarity measures, one on the itemsets of the clusters and the other on the corresponding fuzzy time intervals. The efficiency of the proposed method is demonstrated through experimentation on real datasets.

Clustering is one of the most important data mining problems based on the unsupervised learning approach, and it is very useful for extracting the data distribution and the patterns in datasets. The clustering process is used to discover both the dense and the sparse regions in a dataset. The two broad approaches are the partitioning approach and the hierarchical approach. Hierarchical clustering creates a hierarchy of clusters either from small to big or from big to small, and is accordingly called agglomerative or divisive clustering, respectively. Clustering of numerical data has been studied in the past [

Association rule mining is an important data mining problem which derives associations among data and was formulated by Agrawal et al. [

In this paper, we devise an agglomerative hierarchical clustering method to explore clusters among such periodic patterns. We define the similarity measure on the corresponding fuzzy time intervals [

The rest of the paper is arranged as follows. Section 2 presents a brief literature review related to the existing clustering algorithms. In section 3, we present some basic definitions and results used in this paper. The proposed agglomerative clustering algorithm is discussed in section 4. In section 5, we discuss some analysis of experiments and results. Finally, we wind up the paper with possible future enhancements of the proposed work in section 6.

In this segment, we present a brief assessment of the existing research findings related to our work. In [

Finding associations among data has also attracted a large number of researchers. In [

In this section, we present a summarized view of some definitions and results on which our proposed algorithm is based.

Let X be the universe of discourse. A fuzzy set A of X is characterized by a membership function

$$\mu_A : X \to [0, 1],$$

where $\mu_A(x)$ is the grade of membership of x in A.

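An α-cut of the fuzzy set A of X is the crisp set A_{α} of all elements x of X whose membership in A is greater than or equal to α, i.e.

$$A_\alpha = \{\, x \in X : \mu_A(x) \ge \alpha \,\},$$

where $\mu_A(x)$ denotes the membership grade of x in A.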
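As a small illustration of the α-cut, the following Python sketch represents a discrete fuzzy set as a dictionary from elements to membership grades. The function name `alpha_cut` and the example set `busy_month` are hypothetical and chosen only for this illustration; they do not come from the paper.

```python
def alpha_cut(membership, alpha):
    """Return the crisp set of elements whose membership grade is >= alpha.

    `membership` maps each element of the universe to a grade in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {x for x, grade in membership.items() if grade >= alpha}


# Hypothetical fuzzy set over month numbers (illustrative values only).
busy_month = {10: 0.3, 11: 0.7, 12: 1.0, 1: 0.5}

# Months whose membership is at least 0.5.
half_cut = alpha_cut(busy_month, 0.5)
print(sorted(half_cut))
```

Raising α can only shrink the resulting crisp set, which is why α-cuts are often used to view a fuzzy set as a nested family of ordinary sets.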

A fuzzy set

A fuzzy number is a convex fuzzy set defined on the real line whose membership value is 1 for at least one x ∈ X.

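A trapezoidal fuzzy number is denoted by A = (a, b, c, d), with a ≤ b ≤ c ≤ d, and its membership function is given by

$$\mu_A(x) = \begin{cases} \dfrac{x-a}{b-a}, & a \le x < b \\ 1, & b \le x \le c \\ \dfrac{d-x}{d-c}, & c < x \le d \\ 0, & \text{otherwise} \end{cases}$$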

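In short, for a < b and c < d, we can express the above membership function as

$$\mu_A(x) = \max\left(\min\left(\frac{x-a}{b-a},\; 1,\; \frac{d-x}{d-c}\right),\; 0\right)$$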

Note that the fuzzy time intervals associated with periodic frequent patterns are in fact trapezoidal fuzzy numbers. The fuzzy time intervals are formed using the method [

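A generalized trapezoidal fuzzy number is represented as A = (a, b, c, d, h), where a ≤ b ≤ c ≤ d and 0 < h ≤ 1 is the height, and its membership function is given by

$$\mu_A(x) = \begin{cases} h\,\dfrac{x-a}{b-a}, & a \le x < b \\ h, & b \le x \le c \\ h\,\dfrac{d-x}{d-c}, & c < x \le d \\ 0, & \text{otherwise} \end{cases}$$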

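In short, for a < b and c < d, we can express the above membership function as

$$\mu_A(x) = \max\left(\min\left(h\,\frac{x-a}{b-a},\; h,\; h\,\frac{d-x}{d-c}\right),\; 0\right)$$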
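The membership function of a generalized trapezoidal fuzzy number can be sketched in Python as follows. This is a minimal illustration that assumes the standard piecewise-linear definition with height h, where h = 1 gives an ordinary trapezoidal fuzzy number. The function name `trapezoid_membership` and the day-of-year values are hypothetical.

```python
def trapezoid_membership(x, a, b, c, d, h=1.0):
    """Membership grade of x in the generalized trapezoidal fuzzy number (a, b, c, d, h)."""
    if not (a <= b <= c <= d):
        raise ValueError("require a <= b <= c <= d")
    if not 0.0 < h <= 1.0:
        raise ValueError("require 0 < h <= 1")
    if x < a or x > d:
        return 0.0                       # outside the support
    if b <= x <= c:
        return h                         # on the plateau (the core)
    if x < b:
        return h * (x - a) / (b - a)     # rising edge; here a <= x < b, so b > a
    return h * (d - x) / (d - c)         # falling edge; here c < x <= d, so d > c


# Hypothetical fuzzy time interval expressed in days of the year.
interval = (60, 90, 120, 150)
print(trapezoid_membership(75, *interval))         # halfway up the rising edge
print(trapezoid_membership(100, *interval))        # inside the core
print(trapezoid_membership(75, *interval, h=0.8))  # same point, height 0.8
```

Checking the plateau before the edges keeps the degenerate cases a = b and c = d (vertical sides) free of division by zero.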

Let 0 ≤ a ≤ 1, 0 ≤ b ≤ 1, then the similarity measure between a and b is given by

Let A = (a1, a2, a3, a4: h_{A}) and B = (b1, b2, b3, b4: h_{B}) be two generalized trapezoidal fuzzy numbers, then the similarity measure is defined in [

The larger the value of S(A, B), the greater the similarity between A and B. Clearly, 0 ≤ S(A, B) ≤ 1, and A and B are identical if S(A, B) = 1.
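To make these properties concrete, the sketch below uses one simple measure that satisfies them: one minus the mean absolute difference of the four defining points, for trapezoidal fuzzy numbers normalized to [0, 1]. This particular formula is an assumption made for illustration and is not necessarily the measure used in the paper. It also ignores the heights h_{A} and h_{B}. The name `similarity` is hypothetical.

```python
def similarity(A, B):
    """Assumed similarity of two trapezoidal fuzzy numbers given as 4-tuples.

    All defining points must be normalized to [0, 1]. Heights are ignored
    in this simplified sketch.
    """
    if len(A) != 4 or len(B) != 4:
        raise ValueError("each fuzzy number needs four defining points")
    if any(not 0.0 <= v <= 1.0 for v in (*A, *B)):
        raise ValueError("defining points must be normalized to [0, 1]")
    # 1 minus the mean absolute difference of corresponding points.
    return 1.0 - sum(abs(p - q) for p, q in zip(A, B)) / 4.0


# Hypothetical fuzzy time intervals, expressed as fractions of a year.
spring = (0.20, 0.25, 0.40, 0.45)
late_spring = (0.25, 0.30, 0.45, 0.50)

print(similarity(spring, spring))       # identical intervals
print(similarity(spring, late_spring))  # slightly shifted interval
```

Because every defining point lies in [0, 1], the mean absolute difference also lies in [0, 1], so the value returned always stays within the stated bounds.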

Let A and B be two periodic frequent itemsets having periods T and S respectively. Then the merge function is defined

In this segment, we describe our proposed clustering algorithm based on the notions explained in the previous section. The proposed algorithm takes as input all periodic patterns together with the fuzzy time intervals describing their periods. The fuzzy time intervals are constructed using the methods discussed in [

Two fuzzy time intervals T_{1} and T_{2} are said to be similar if and only if the value S(T_{1}, T_{2}) is greater than some pre-defined threshold.

Initially, each pattern is assigned to a separate cluster. Thereafter, the similarity value S( ) is calculated for each pair of clusters, and the merge function is applied to generate a new, bigger cluster if S( ) is greater than the threshold. The corresponding periods (fuzzy time intervals) are aggregated [

Frequent Pattern Clustering Algorithm (n, θ)

Input: The number of frequent patterns n and the threshold θ

Output: A set S of clusters with their fuzzy time intervals

Steps:

Initially, the set of clusters is empty.

1) S ← ∅

2) Read each frequent pattern A[i] with its fuzzy time interval T[i]

3) Construct a cluster C with T; if there is a cluster C_{1} ∈ S with sim(T_{1}, T) ≥ θ

4) then C = merge(C_{1}, C) with T = aggregate(T_{1}, T)

5) Remove C_{1} and T_{1} from S

6) Add C with its fuzzy time interval to S

7) Continue the process until no merger is possible

8) Return S

9) Stop
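The steps above can be sketched in Python roughly as follows. This is an illustrative reading of the pseudocode, not the author's implementation. The similarity and aggregation functions are passed in as parameters because their exact definitions are not reproduced in this text. The helpers `mean_abs_similarity` and `envelope` used in the example are assumptions made only for this illustration, as are the item names and interval values.

```python
def cluster_patterns(patterns, theta, sim, aggregate):
    """Cluster (itemset, fuzzy_interval) pairs by similarity of their intervals.

    Returns a list of (cluster, interval) pairs, where a cluster is a list of itemsets.
    """
    clusters = []                                  # S, initially empty
    for itemset, interval in patterns:             # read each pattern A[i] with T[i]
        cluster = [itemset]                        # new cluster C with interval T
        merged = True
        while merged:                              # repeat until no merger is possible
            merged = False
            for i, (other, other_interval) in enumerate(clusters):
                if sim(other_interval, interval) >= theta:
                    cluster = other + cluster      # C = merge(C1, C)
                    interval = aggregate(other_interval, interval)
                    del clusters[i]                # remove C1 and T1 from S
                    merged = True
                    break
        clusters.append((cluster, interval))       # add C with its interval to S
    return clusters


def mean_abs_similarity(T1, T2):
    """Assumed measure: 1 minus the mean absolute difference of the four points."""
    return 1.0 - sum(abs(p - q) for p, q in zip(T1, T2)) / 4.0


def envelope(T1, T2):
    """Assumed aggregation: the smallest trapezoid covering both intervals."""
    return (min(T1[0], T2[0]), min(T1[1], T2[1]), max(T1[2], T2[2]), max(T1[3], T2[3]))


# Hypothetical periodic itemsets with intervals given as fractions of a year.
patterns = [
    (frozenset({"milk", "bread"}), (0.10, 0.15, 0.20, 0.25)),
    (frozenset({"candles"}), (0.80, 0.85, 0.95, 1.00)),
    (frozenset({"butter"}), (0.12, 0.16, 0.22, 0.26)),
]
result = cluster_patterns(patterns, 0.9, mean_abs_similarity, envelope)
for cluster, interval in result:
    print(sorted(sorted(items) for items in cluster), interval)
```

After a merger the aggregated interval is compared again with the remaining clusters, since aggregation may bring it within the threshold of a cluster that was previously too dissimilar.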

For experimentation, we used the synthetic dataset T10I4D100K, available from the FIMI^{1} website. As the dataset is non-temporal, we incorporate temporal features, namely calendar dates, and execute the algorithm [

Data size (no. of transactions) | Max. no. of itemsets | No. of clusters obtained | No. of itemsets misclassified |
---|---|---|---|
00000 | 0 | 0 | 0 |
10000 | 123 | 10 | 3 |
25000 | 220 | 14 | 2 |
50000 | 350 | 17 | 1 |
75000 | 599 | 21 | 1 |

In this paper, we have presented an agglomerative hierarchical clustering algorithm to find clusters among periodic patterns with fuzzy time intervals. The algorithm starts with as many clusters as there are periodic patterns with fuzzy time intervals. Then, pairs of clusters are merged if their similarity value is greater than a pre-defined threshold. The similarity is defined on the fuzzy time intervals associated with the periodic patterns. After each level, the corresponding fuzzy time intervals are updated by aggregation. Although we have used the agglomerative hierarchical approach in this paper, any other approach can also be considered, provided the similarity measure is properly defined.

Mazarbhuiya, F.A. (2017) A Novel Approach for Clustering Periodic Patterns. International Journal of Intelligence Science, 7, 1-8. http://dx.doi.org/10.4236/ijis.2017.71001