Mining Frequent Item Sets in Asynchronous Transactional Data Streams over Time Sensitive Sliding Windows Model

Extracting FPs (Frequent Patterns) from continuous transactional data streams is a challenging and critical task in applications such as web mining, data analysis, retail market prediction, network monitoring, and the analysis of stock market exchange data. Many algorithms have previously been developed for mining FPs from a data stream. New solutions and approaches are still required to handle data streams precisely, because a stream is an unbounded, ordered, and continuous sequence of data generated at rapid speed. Hence, extracting FPs from fresh or recent data involves high-level analysis of data streams. We propose an efficient technique for the sliding window model; this technique extracts recent FPs from high-speed data streams. In this study, a CPILT (Compact Pattern Item List Tree) is developed to capture the latest contents of the stream and to efficiently remove outdated contents. The main concept introduced in this work is the dynamic restructuring of the tree, which produces a compact tree with a frequency-descending structure at runtime. With the FP growth mining technique, the complete set of recent FPs is obtained from the CPILT for the current window. The memory usage and time complexity of mining the latest FPs in high-speed data streams are evaluated through experimentation and analysis.


INTRODUCTION
Data streams are real-time, continuous, possibly infinite, fast-changing, and ordered sequences of a huge number of items [1,2]. A data stream changes rapidly with time, so acquiring all of its elements is impossible. Each element is examined only once. Given that new data elements are produced continuously, the memory usage for stream mining should be limited, which makes stream mining challenging in applications such as knowledge discovery, fraud detection, and business improvement.
FPs (Frequent Patterns) have been extracted from large data sets by many researchers using different techniques.
Many aspects may be related to discovering FPs from large data sets. However, the major aspects relevant to frequent pattern finding are storage and run time. Hence, researchers focus on finding FPs with less time and storage. In the present study, various algorithms for finding FPs in large data sets are discussed [4]. Sequential patterns are found from continuous data streams and from transactional and ordinary data sets. The execution times of different algorithms for FP mining (FPM) are compared in this work, and their performance is compared on the basis of speed. FPs are also found in real time to show the difference from an ordinary system [4].
The data collected from various sources, such as sensor data and weather or satellite data, are huge and inexact. Database sizes are growing rapidly, and such databases may be used for knowledge discovery that requires less storage and time. Different objects may have various relationships with one another, which may lead to association rules in diverse databases. Different patterns of objects are discovered through such relationships among the objects. This type of pattern matching can be utilized in different applications, such as decision support and weather forecasting.
Association rule mining [5] contains FPM as a sub-problem, which is used to search for frequent items in large databases. Association rules originate from market basket analysis, which is mostly used to identify customer behavior with respect to purchasing different products. For example, if a child purchases a book, then the child is likely to purchase a pen as well. A major problem arises in FPM when the number of FPs becomes exponentially large. Because of the high-speed and unbounded nature of data streams, data stream mining has become difficult. The FPM algorithm of [1], which extracts FPs from a data stream, is the closest to the proposed system. FPM uses an SW (Sliding Window) mechanism in which the window is divided into a fixed number of equal-sized, non-overlapping batches of transactions. A prefix tree structure [6] in canonical order is used to store the information in the current window. Every node of the tree maintains a list that stores the frequency count for every batch. Tree traversal can thus be avoided when information on an old batch is extracted from the tree.
Therefore, FPM keeps track of the last visited batch, and an extra pointer at every node records the number of the batch in which the node was last updated.
The contents of the frequency lists at the nodes are changed to reflect the sliding of the window. Once the information in the SW is captured, the FP-growth technique [7] is used to mine the FPs from the complete information set.
However, FPM has several limitations. First, item sets are stored in canonical order, so FPM provides no guarantee of a highly compact tree structure, which is important in managing data streams to avoid the overhead of massive storage, reduce the search space, and ultimately speed up FP growth-based mining operations. Second, the lists of frequency counts in FPM [1] are maintained at every node. Hence, for mining, FPM consumes more time than trees whose structure is organized according to the frequency order of the items.
Moreover, the construction of FPM is based on an assumption that ignores the limitations of main memory, which is unrealistic when processing huge amounts of data such as a data stream.
By contrast, CPILT, the tree proposed in this study, provides exactly the same information on data streams as FPM but stores it in a highly compact FP tree-like structure, thereby offering an efficient data structure for storage. A highly compact tree structure provides the most proficient platform for FP growth-based mining. Moreover, efficiency is achieved by maintaining the frequency count list only at the last node of the path that represents a transaction, rather than maintaining this information at every node.
Furthermore, CPILT does not require an extra pointer at every node to track the last update.
CPILT is updated regularly by extracting the transactions that expire after every window slide. This mechanism guarantees that no garbage nodes exist in the tree and that the tree is in a clean state for mining.

RELATED WORK
Data streams are real-time, continuous, possibly infinite, fast-changing, and ordered sequences of a huge number of items [8]. A data stream changes rapidly with time, so acquiring all the elements it contains is impossible [9]. As such, each element in a data stream is examined individually [10]. Given that new data elements are produced continuously via streaming, the memory usage for stream mining should be limited [3]. In stream mining, the result for a new data stream should be available immediately whenever a request is made. This requirement makes data streaming more challenging in applications such as knowledge discovery, fraud detection, and business improvement [11]. Many authors have suggested different techniques to mine frequent item sets. Ran et al. [12] suggested the "lossy weight algorithm", which is mostly used in finding frequent item sets based on weight. The "SW" approach has also been proposed [4,13] and is mostly used in data stream mining. This approach is subdivided into two types: the "transaction-sensitive" SW and the "time-sensitive" SW.
Jiayin et al. [14] proposed a new approach to extracting frequent item sets using the SW technique.
In the past decades, FP mining in static [6,12] and incremental databases [15] has been addressed well. A parallel Apriori algorithm [16] is used to find association rules [5], and a pattern tree with an FP-growth-based algorithm has been introduced [7]. With the FP-growth tree used in [6], the number of database scans is reduced to two, and the candidate generation requirement is eliminated. This highly compact FP tree structure has in turn opened a new line of research on mining FPs with a prefix tree structure [6]. General and research issues related to mining frequent patterns in data streams are reviewed in [15,17]. The present work focuses on data stream mining using the SW mechanism [18]; therefore, the literature review here mainly covers window-based methods.
Most studies focus on the landmark window [17] and SW [15] models, which are mainly used to find FPs in a data stream. The work in [19] represents the first attempt to mine FPs from the entire history of a data stream. Lossy counting [8] and sticky sampling are single-pass algorithms developed on the basis of the anti-monotone property. These two algorithms deliver approximate results with a bounded error. The approach in [20] uses a lattice structure, which refers to a frequently enumerated tree divided into several equal stored pattern classes with the same transaction ID within a single class. Another algorithm [21] was also implemented. When information is captured for a full window, an FP growth-based mining approach is used to mine the complete set of frequent patterns from the tree.

Compact Pattern Item List Tree
The definitions of the basic terminologies used to clarify the FP idea in streams are presented in this section.
Suppose I = {i1, i2, ..., in} is a set of items, also known as literals; in some application fields these items are used as parts of the information. Set C = {il, ... The problem is to mine the exact set of recent frequent patterns in the FW (Frequent Window) of the data stream; thus, the SW mechanism is utilized.

Designing and Constructing CPILT
The mechanism for efficiently extracting plates is explained using CPILT. CPILT achieves a frequency-descending tree structure through dynamic tree rearrangement; this section shows how CPILT periodically reorganizes itself. The gain in mining performance that CPILT obtains from dynamic tree rearrangement is also examined. Only one scan of the database is required to mine FPs using CPILT. A brief description of the CPILT structure is required before elaborating the construction process of CPILT. CPILT consists of one root node, which is referred to as the null node. The children of the root node are the roots of item-prefix sub-trees. The F-list holds each distinct item together with its frequency and a pointer to the first node in the CPILT that carries that item. Similar to an FP tree, each CPILT node represents the item set on the path from the root to that node, together with its counts (i.e., support and plate counts) for the present window. A novel idea is used to form the F-list for a transaction: the plate-based support count information is kept only for tail items.
The tail item is explained below.

Definition-2: Tail Node
Suppose t = {i1, i2, ..., in} represents a sorted transaction; then the last item, in, is called the tail item. If the lexicographically sorted transaction t = {a, b, c} is inserted into the CPILT, then node c in the CPILT becomes the tail node.
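The tail node definition can be illustrated with a minimal Python sketch. It assumes, as described for CPILT, that every node on the path keeps a total count and that only the tail node keeps a plate counter with one slot per plate in the window. The names `Node`, `insert`, `order`, `plate`, and `num_plates` are hypothetical and are not taken from the original algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    item: Optional[str]                       # None marks the null root
    count: int = 0                            # total support in the window
    children: Dict[str, "Node"] = field(default_factory=dict)
    plate_counts: Optional[List[int]] = None  # kept only by tail nodes


def insert(root, transaction, order, plate, num_plates):
    """Insert one transaction that arrives in plate index `plate`.

    `order` is the current F-list sorting order (assumed to be a list).
    Returns the tail node of the inserted path.
    """
    items = sorted(set(transaction), key=order.index)
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += 1
    if items:
        # Only the last node of the path records per-plate information.
        if node.plate_counts is None:
            node.plate_counts = [0] * num_plates
        node.plate_counts[plate] += 1
    return node


root = Node(None)
tail = insert(root, ["a", "b", "c"], order=["a", "b", "c"], plate=0, num_plates=2)
print(tail.item, tail.count, tail.plate_counts)  # c 1 [1, 0]
```

In this sketch the nodes for "a" and "b" remain ordinary nodes, because their `plate_counts` field stays unset.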

CPILT Organization
To maintain the items in frequency-descending order, CPILT restructures itself dynamically. The CPILT construction process has two phases: insertion and rearrangement. In the insertion phase, the stream contents are captured in the tree according to the current sorting order of the F-list. In the rearrangement phase, the tree is restructured in frequency-descending order with the help of the information already obtained. During tree construction, both phases are executed repeatedly. Given that the data stream has a dynamic nature, the transition from the insertion phase to the rearrangement phase can be executed when the plate information is captured dynamically. CPILT formation from a data stream can be explained through an example.
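The rearrangement phase can be sketched in Python as follows. This is a deliberate simplification and not the restructuring procedure of the paper: instead of sorting branches in place, it collects the path of every tail node together with its plate counter, recomputes the item frequencies, and rebuilds the tree in frequency-descending order. Nodes are plain dictionaries, and the names `new_node`, `insert_path`, `tail_paths`, and `rearrange` are hypothetical.

```python
from collections import Counter


def new_node(item):
    return {"item": item, "count": 0, "children": {}, "plates": None}


def insert_path(root, items, plates):
    """Add a path whose tail node carries the per-plate counts `plates`."""
    node, total = root, sum(plates)
    for item in items:
        node = node["children"].setdefault(item, new_node(item))
        node["count"] += total
    if node["plates"] is None:
        node["plates"] = [0] * len(plates)
    node["plates"] = [x + y for x, y in zip(node["plates"], plates)]


def tail_paths(node, prefix=()):
    """Yield (items, plates) for every tail node below `node`."""
    for child in node["children"].values():
        path = prefix + (child["item"],)
        if child["plates"] is not None:
            yield path, child["plates"]
        yield from tail_paths(child, path)


def rearrange(root):
    """Rebuild the tree so that items follow frequency-descending order."""
    paths = list(tail_paths(root))
    freq = Counter()
    for items, plates in paths:
        for item in items:
            freq[item] += sum(plates)
    # Ties are broken alphabetically so that the order is deterministic.
    order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}
    rebuilt = new_node(None)
    for items, plates in paths:
        insert_path(rebuilt, sorted(items, key=rank.get), plates)
    return rebuilt, order


root = new_node(None)
insert_path(root, ["a", "b"], [1, 0])
insert_path(root, ["b", "c"], [1, 1])
insert_path(root, ["a", "b", "c"], [0, 1])
root, order = rearrange(root)
print(order)  # ['b', 'c', 'a']
```

After the rebuild, all three paths share the prefix "b", so the tree has a single branch below the root instead of two.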
In Fig. 1, the step-by-step formation methodology of CPILT is shown with the same data stream and related transaction IDs.
In this example, we assume that the tree is rearranged after each plate is inserted. We discuss several types of rearrangement criteria that may be useful for applications on data streams with a dynamic nature. During the construction of CPILT, two properties are observed.

Property-1:
The total frequency count of any node in CPILT ≥ the sum of the total frequency counts of its children.

Property-2:
The total frequency count of any tail node in CPILT > the sum of the total frequency counts of its children.
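The two properties can be checked mechanically. The following Python sketch assumes a dictionary-based node layout in which `plates` is set only for tail nodes; the helper names `make` and `holds` and the example tree are hypothetical.

```python
def make(item, count, plates=None, children=()):
    """Build a node; `plates` is the plate counter, set only for tail nodes."""
    return {"item": item, "count": count, "plates": plates,
            "children": {c["item"]: c for c in children}}


def holds(node):
    """Return True if Property-1 and Property-2 hold in the whole subtree."""
    kids = list(node["children"].values())
    child_sum = sum(k["count"] for k in kids)
    if node["item"] is not None:            # the null root carries no count
        if node["count"] < child_sum:       # Property-1
            return False
        if node["plates"] is not None and node["count"] <= child_sum:
            return False                    # Property-2 (tail nodes)
    return all(holds(k) for k in kids)


tree = make(None, 0, children=[
    make("b", 4, children=[
        make("a", 1, plates=[1, 0]),
        make("c", 3, plates=[1, 1], children=[make("a", 1, plates=[0, 1])]),
    ]),
])
print(holds(tree))  # True
```

The strict inequality for a tail node reflects that at least one transaction in the window ends at that node, so its count exceeds what its children account for.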
When the window slides, CPILT is updated by removing the old plate information and inserting the new plate information.

Extraction of Old Information
To provide a platform that is ready to be mined, the CPILT is kept up to date with the exact contents of the current window. When the window slides, CPILT is updated by traversing the entire tree. The plate counter of each tail node is updated to delete the old, expired information that CPILT contains.
Because the tail nodes maintain the plate information, changes are propagated where necessary to the remaining nodes on the corresponding path. Fig. 2 shows the basic mechanism of extracting old plate information from CPILT and the refreshing algorithm for this purpose. The refreshing operation for CPILT begins with the update of the plate counter of each tail node, starting from the lowest item in the F-list.
The 1st value in the plate counter of each tail node of the item is removed. The expiration of the oldest plate is reflected by shifting the remaining values one slot to the left in the list. When, after the update, the plate counter of a tail node contains zeroes in all entries, the tail node becomes an ordinary node. When the total count of a node becomes zero, the node is deleted. In the same manner, any ordinary node is deleted from CPILT when its support becomes zero.
In the tree update process, no operation is performed in CPILT for an ordinary node. The mechanism for handling an expired plate in CPILT is illustrated as follows. For Window-1, the stream data and the CPILT construction are shown in Fig. 2 (1). Removing the 1st plate results in zeroes in all the records of the "e" node. As such, the deletion operation is completed for the "e" node of the tree.

Mehran University Research Journal of Engineering & Technology, Volume 35, No. 4, October, 2016 [p-ISSN: 0254-7821, e-ISSN: 2413-7219] 633
The same value is decremented from the support count of every node on the path from this node to the root node.
The next node on the path toward the root node, that is, "c:1," is processed next. This node is deleted when its total support count becomes zero. For the remaining nodes (i.e., items "a" and "d"), the update procedure is executed in the same way. The F-list is updated after every update process, and the changes are saved. The update operation for item "e" terminates with the adjustment of the F-list once no more nodes belong to "e" in the tree. The resulting CPILT is presented in Fig. 5. The next item in the tree is "c," which has a single node "c: 1; 1, 0" that is also a tail node. Therefore, the method applied to "e" is also applied in its update process. The next item is "b," whose first node "b: 1; 0, 1" is also a tail node. Fig. 2 shows that the first value of its plate counter is zero; that is, this tail node represents no transaction in the 1st plate.
Hence, updating the total count of this node and of the remaining nodes on the path toward the root node is not needed.
The plate counters are updated by shifting the values to the left and placing a 0 at the end to store the information of the upcoming plate. After the process is completed, the total count and the plate counter of the node are 1 and 1, 0, respectively, as shown in Fig. 2. The next node containing item "b" is also a tail node with the same information as the previous one; hence, the same procedure is adopted. The next item in the tree, "a," consists of a single node "a: 1," which is an ordinary node. No operation needs to be performed for such a node. After skipping "a," we reach a single ordinary node "d: 2" in the tree. Similar to the previous item, this node is skipped, and the update process for CPILT terminates because the F-list contains no remaining items. The final CPILT after the extraction of plate 1 is shown in Fig. 2; this updated tree is ready to capture the information of the upcoming plate according to the CPILT formation mechanism.
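The removal of an expired plate can be sketched in Python under stated assumptions. The sketch uses a recursive post-order traversal rather than the item-by-item traversal from the lowest item of the F-list, and it does not maintain the F-list; nodes are plain dictionaries, and the names `make` and `slide` as well as the example tree are hypothetical and do not reproduce Fig. 2.

```python
def make(item, count, plates=None, children=()):
    """Build a node; `plates` is the plate counter, set only for tail nodes."""
    return {"item": item, "count": count, "plates": plates,
            "children": {c["item"]: c for c in children}}


def slide(node):
    """Expire the oldest plate below `node` and return the support removed."""
    removed = 0
    for key, child in list(node["children"].items()):
        removed += slide(child)
        if child["count"] == 0:          # no support left: delete the node
            del node["children"][key]
    if node["plates"] is not None:
        removed += node["plates"][0]               # support of the expired plate
        node["plates"] = node["plates"][1:] + [0]  # shift left, append a 0
        if not any(node["plates"]):
            node["plates"] = None                  # tail becomes an ordinary node
    if node["item"] is not None:                   # the null root keeps no count
        node["count"] -= removed
    return removed


tree = make(None, 0, children=[
    make("b", 4, children=[
        make("a", 1, plates=[1, 0]),
        make("c", 3, plates=[1, 1], children=[make("a", 1, plates=[0, 1])]),
    ]),
])
expired = slide(tree)
b = tree["children"]["b"]
print(expired, b["count"], sorted(b["children"]))  # 2 2 ['c']
```

In this example the node "a" directly below "b" held support only in the expired plate, so it is deleted, whereas the tail node "a" below "c" had a zero in its first slot and keeps its total count.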
The formation of CPILT when the new plate information is included is presented in Fig. 2: the new transactions (tid 5 and tid 6) are inserted into the CPILT. After the insertion of the new plate information, the tree is reorganized in frequency-descending order, as shown in Fig. 2.
Considering the plate extraction method and the CPILT formation process in relation to FPM, the storage efficiency that CPILT obtains is due to (1) tree formation with high compactness, (2)

Experimental Analysis and Results
In this section, the results of our complete examination of the performance of CPILT on data streams with synthetic and real data sets are presented. Table 2 presents the statistical information on the data sets used in the examination.
The first three main data sets were obtained from [28].

[Table 1. Transactional database (column header: Transaction ID)]

Previous studies [13,16] indicate that Apriori works well for such data sets when the number of items is small, but it does not work well when the patterns are long. The number of FPs may be large, and/or the support thresholds may be low. This performance issue can be addressed by a growth-based method on a tree with a frequency-descending arrangement [29]. Consequently, both MFI-TransSW and CPILT can mine the complete set of patterns directly from a window of fixed size, and they outperform FPM in many circumstances. In the case of the plate-based SW, we mostly ignore the associated runtime of MFI-TransSW and CPILT. Because the memory requirement of the window itself does not depend on the SW method, the memory usage of the methods can be compared. FPM is quite similar to our proposed CPILT, as mentioned in [6]. In this study, we therefore compare CPILT with FPM. The experimental analysis is divided into two parts. First, the compactness of CPILT is illustrated with respect to the number of nodes and the memory. Second, the performance of FPM and CPILT is illustrated from the perspective of runtime.

Runtime Efficiency
We compare the overall runtime efficiency of CPILT, MFI-TransSW, and FPM with the construction,

Memory Efficiency
We evaluate the memory requirement of FPM, MFI-TransSW, and our proposed CPILT for different data sets with varying window sizes. We test the memory requirement of the primary data structure under the SW methodology using a window of fixed size. We also test the memory requirement using windows of varying sizes for each data set. Fig. 24 shows that the quality degrades with the number of distinct item sets in the transactions.
The distribution of the data is very dense when the transaction length is small. Some duplication may exist between the items and the transactions, which results in a low restoration error, as displayed in Fig. 24.

Restoration Error
We also compute the restoration error for the frequent patterns based on the probabilistic model [2]. A restoration procedure for a set of item sets S is a function f: S -> [0, 1] that maps each item set in S to a value between 0 and 1. The restoration quality is measured by the p-norm of the relative errors [2]. We use the above model to minimize the restoration error with respect to the frequent item sets.
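A p-norm of relative errors can be computed as in the following Python sketch. The exact error definition of [2] is not reproduced in this section, so the sketch assumes that the relative error of an item set X is |f(X) - s(X)| / s(X), where s(X) is its true relative support; the function name `restoration_error` and the example values are hypothetical.

```python
def restoration_error(true_support, restored, p=2):
    """p-norm of the relative errors between restored and true supports.

    Both arguments map an item set (a frozenset) to a value in [0, 1].
    The relative-error form used here is an assumption, not taken from [2].
    """
    errors = [abs(restored[s] - true_support[s]) / true_support[s]
              for s in true_support]
    return sum(e ** p for e in errors) ** (1.0 / p)


true_support = {frozenset("a"): 0.5, frozenset("b"): 0.4, frozenset("ab"): 0.2}
restored = {frozenset("a"): 0.5, frozenset("b"): 0.3, frozenset("ab"): 0.25}
err = restoration_error(true_support, restored)
print(round(err, 3))  # 0.354
```

A larger p weights the worst-restored item sets more heavily, whereas p = 1 sums the relative errors of all item sets equally.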

CONCLUSION
In this paper, we presented a new mechanism for the dynamic restructuring of a tree to handle continuous data streams. We proposed and developed the CPILT structure; this tree restructures itself to achieve a compact, frequency-descending tree using a single pass. CPILT decreases the runtime and memory requirement for handling high-speed data streams. An effective restructuring mechanism for CPILT was also proposed and explained. We compared our algorithm with FPM and MFI-TransSW in terms of runtime and memory efficiency. The analysis of the results showed that our proposed CPILT provides better results than FPM and MFI-TransSW. In future work, we plan to summarize frequent item sets.