Cache Memory: An Analysis on Replacement Algorithms and Optimization Techniques

Caching strategies can improve the overall performance of a system by allowing the fast processor and the slow memory to operate at the same pace. One important factor in caching is the replacement policy. Advances in technology have produced a large number of techniques and algorithms for improving cache performance. This paper analyzes different cache optimization techniques as well as replacement algorithms. Furthermore, it presents a comprehensive statistical comparison of cache optimization techniques. To the best of our knowledge, there is no numerical measure that rates a specific cache optimization technique, so we tried to come up with such a figure. Through statistical comparison we determine which technique is the most consistent. For this purpose we calculated the mean and the CV (Coefficient of Variation); the CV indicates how consistent a technique is. Comparative analysis of the different techniques shows that victim cache is the most consistent technique of all.


INTRODUCTION
Cache is a high-speed memory that is less costly than registers but faster than main memory. Cache memory is used to store data and information that are currently in use. Accessing data from main memory takes more time; to reduce this time, a special memory inside the CPU (Central Processing Unit) is reserved to keep a small amount of data for some time.
A CPU with cache memory spends less time waiting for an instruction to be fetched from memory for processing.
The absence of cache memory decreases the execution rate, which affects the performance of the CPU.
The main purpose of cache memory is to reduce the speed gap between slow memory and fast processor at a reduced cost [1]. It mostly holds the most recently accessed pieces of main memory. All information is stored in some storage medium such as main memory. Whenever the processor uses some data or piece of information, it is copied into a faster storage medium such as the cache. When the processor tries to access a particular piece of information again, the system checks the cache first: if the information is in the cache, the processor uses it from there; if not, it must be brought from main memory and copied into the cache, on the assumption that it will be needed again.
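The lookup sequence described above can be illustrated with a minimal sketch. The class name `SimpleCache` and the use of plain dictionaries for both main memory and the cache are assumptions made for illustration; the sketch has no capacity limit, so no replacement takes place.

```python
class SimpleCache:
    """Toy cache in front of a slow backing store (both are plain dicts)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # slow storage: address -> value
        self.lines = {}                 # fast storage: address -> value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:       # found in cache: use it from there
            self.hits += 1
            return self.lines[address]
        self.misses += 1                # not found: bring it from main memory
        value = self.main_memory[address]
        self.lines[address] = value     # keep a copy, assuming it is needed again
        return value


memory = {addr: addr * 10 for addr in range(8)}  # hypothetical memory contents
cache = SimpleCache(memory)
values = [cache.read(a) for a in (3, 5, 3, 3, 5)]
print(values, cache.hits, cache.misses)
```

Only the first access to each address reaches main memory; every repeated access is served from the cache.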
Generally, cache memory is organized in three levels (L), i.e. L1, L2 and L3 cache [2]. L1 cache is the fastest and smallest of the three. L1 cache usually resides on the processor chip and is therefore also called on-chip memory. L2 cache is faster than L3. The last level, L3, is the largest and slowest of the three. In multi-core processors, every processor has its own L1 and L2 cache, while the L3 cache is shared among all processors [3]. This hierarchy of cache levels is shown in Fig. 1.
Whenever the desired chunk of information, whether data or instruction, is present in the cache, the situation is called a cache hit, and the time taken to find out whether it is present in the cache is called hit latency [4]. If the required data is not found in the cache, it is brought into the cache from main memory; this situation is called a cache miss [5]. Three main types of cache miss exist: (i) compulsory misses, which take place when a memory location is accessed for the first time; (ii) conflict misses, which occur when two blocks are mapped to the same location and compete for it; and (iii) capacity misses, which take place because the cache is too small to hold all the data in use [6].
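The three miss types can be made concrete with a small simulation. The sketch below assumes a direct-mapped cache and the commonly used operational definitions: a miss is compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The function name `classify_misses` and the trace are hypothetical.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_lines):
    """Count hits and the three miss types for a direct-mapped cache."""
    direct = [None] * num_lines     # direct-mapped cache: one block per line
    shadow = OrderedDict()          # fully associative LRU cache, same capacity
    seen = set()                    # blocks referenced at least once
    counts = {"hit": 0, "compulsory": 0, "conflict": 0, "capacity": 0}

    for block in block_trace:
        in_shadow = block in shadow
        # Update the fully associative reference cache.
        if in_shadow:
            shadow.move_to_end(block)
        else:
            if len(shadow) >= num_lines:
                shadow.popitem(last=False)   # evict the least recently used
            shadow[block] = True

        index = block % num_lines            # line the block maps to
        if direct[index] == block:
            counts["hit"] += 1
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif in_shadow:
                counts["conflict"] += 1      # only the mapping caused the miss
            else:
                counts["capacity"] += 1      # the cache is simply too small
            direct[index] = block
        seen.add(block)
    return counts


result = classify_misses([0, 4, 0, 0, 1, 2, 3, 5, 4], num_lines=4)
print(result)
```

In this trace, blocks 0 and 4 map to the same line, so the second reference to block 0 is a conflict miss, while the final reference to block 4 is a capacity miss because four other blocks were touched in between.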
Because cache memory is small, its content has to be replaced according to usage and over specific time periods. An important component of a cache is its replacement policy, which decides which page or data item is removed from the cache to make space for new data. The problem is deciding which piece of data to replace. Different history-based prediction algorithms have been developed and implemented; these algorithms rank pages according to their suitability for eviction. Although they have some drawbacks, they are used in order to achieve better performance.
There is a considerable gap between processor speed and memory access latency, and a great deal of effort has been put into reducing it. Much of this work is hardware based, compiler based or operating system based. This paper discusses several basic and advanced cache optimization techniques and replacement policies proposed to date. The rest of the paper is organized as follows: Section 2 covers cache replacement policies, Section 3 discusses cache optimization techniques, Section 4 presents the performance evaluation, Section 5 is the discussion, and the paper is concluded in Section 6.

REPLACEMENT ALGORITHMS
Replacement algorithms/policies are used in order to attain optimized usage of the cache. When the cache is full, the replacement policy decides which piece of data is replaced in order to make space for new data that is currently being used. An efficient algorithm is one that takes little time, keeps the number of cache misses low and balances cost. Some of these algorithms are described below.

LRU (Least Recently Used) Algorithm:
This algorithm discards the least recently used item from the cache in order to make space for the new data item. To achieve this, a history of when each data item was used is kept. A variable known as the aging bit is used to store this information. Although this algorithm provides better performance, its cost of implementation is higher [6]. Variants of LRU are the most popular of all the algorithms. A key advantage of this policy is its simple implementation, with constant time and space overhead. "Recency" is the main factor in this algorithm: while LRU takes into account the "recency" characteristics of the workload, it does not capture or exploit the "frequency" characteristics of a workload [7]. If all objects have the same frequency, this algorithm randomly discards any data item [8].

FIG. 1. SHOWING CACHE HIERARCHY
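The LRU policy can be sketched in a few lines. This is a minimal illustration, assuming a software cache keyed by arbitrary names and a capacity of at least one entry; the class name `LRUCache` is hypothetical, and hardware implementations track recency with age bits rather than an ordered dictionary.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        if key not in self.items:
            return None                     # miss
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # discard the least recently used
        self.items[key] = value


lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" becomes the most recently used item
lru.put("c", 3)     # cache is full, so "b" is discarded
print(list(lru.items))
```

Both operations run in constant time, which matches the constant overhead noted above.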

GDS (Greedy Dual Size):
In this algorithm an index is calculated according to the size of a file: the larger the file, the smaller the index. The file with the smallest index is replaced. An inflation value is used to keep track of frequently accessed files in the cache [9].
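A minimal sketch of this policy follows. It assumes a uniform retrieval cost of 1 for every file, so that the index is the current inflation value plus 1/size, and it assumes positive file sizes; the class name `GreedyDualSizeCache` and the example files are hypothetical.

```python
class GreedyDualSizeCache:
    def __init__(self, capacity):
        self.capacity = capacity   # total size the cache may hold
        self.used = 0
        self.inflation = 0.0       # raised each time a file is evicted
        self.files = {}            # name -> [index, size]

    def access(self, name, size):
        """Request a file; return True on a hit and False on a miss."""
        if name in self.files:
            entry = self.files[name]
            # A hit refreshes the index relative to the current inflation.
            entry[0] = self.inflation + 1.0 / entry[1]
            return True
        if size > self.capacity:
            return False           # too large to be cached at all
        while self.used + size > self.capacity:
            # Replace the file with the smallest index.
            victim = min(self.files, key=lambda n: self.files[n][0])
            index, victim_size = self.files.pop(victim)
            self.inflation = index
            self.used -= victim_size
        self.files[name] = [self.inflation + 1.0 / size, size]
        self.used += size
        return False


gds = GreedyDualSizeCache(10)
gds.access("big", 8)      # index 1/8
gds.access("small", 2)    # index 1/2
gds.access("new", 4)      # evicts "big", the file with the smallest index
print(sorted(gds.files), gds.inflation)
```

Because the inflation value only grows, files that are accessed again receive a higher index than files that have been idle since before the last evictions.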

CAR (Clock with Adaptive Replacement) Algorithm:
CAR is simple to implement and has very low overhead on cache hits. It shows high performance and is self-tuning. It is scan resistant and has a low space overhead of less than 1% [10].
CAR does not cater for certain workloads [11].

ARC (Adaptive Replacement Cache):
This algorithm is easy to implement, and its running time does not depend on the cache size. ARC has a low space overhead of approximately 0.75% of the size of the cache. ARC is scan resistant and self-tuning. The algorithm continuously balances recency and frequency by responding to changing access patterns [12]. The cache is divided into two queues, each managed using CLOCK or LRU: one contains pages accessed only once, while the other contains pages accessed more than once [13]. Like other algorithms, ARC has constant complexity per request. "Ghost cache" is a term used in this algorithm for the history of recently evicted data elements, which helps to handle data elements that will be used in the near future [14][15].

RR (Random Replacement) Algorithm:
This algorithm randomly selects a data item from the cache and replaces it with the desired one [4]. It does not need to keep track of the history of the data contents, and it does not need any data structure for that purpose. As a result it consumes fewer resources, and its cost is lower than that of other algorithms [16].
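Random replacement can be sketched as follows. The class name `RandomReplacementCache` and the optional `seed` parameter are assumptions for illustration; the list of keys exists only so that a random victim can be chosen in constant time, and no access history is stored.

```python
import random


class RandomReplacementCache:
    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.keys = []                  # resident keys, in no meaningful order
        self.values = {}                # key -> cached value
        self.rng = random.Random(seed)

    def get(self, key):
        return self.values.get(key)     # None on a miss; nothing is recorded

    def put(self, key, value):
        if key not in self.values:
            if len(self.keys) >= self.capacity:
                slot = self.rng.randrange(len(self.keys))
                del self.values[self.keys[slot]]   # evict a random victim
                self.keys[slot] = key              # reuse its slot
            else:
                self.keys.append(key)
        self.values[key] = value


rr = RandomReplacementCache(3, seed=0)
for number in range(5):
    rr.put(number, number * number)
print(sorted(rr.keys))
```

Which items survive depends on the random choices, but the cache never exceeds its capacity and the most recently inserted item is always present.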

SLRU (Segmented LRU) Algorithm:
This algorithm partitions the cache into two portions, one unprotected and the other protected. The protected portion is reserved for frequently used objects. When an object is requested for the first time, it is inserted into the unprotected portion. On a cache hit the object is moved into the protected portion [17]. Both portions are managed by the LRU technique. Content is evicted only from the unprotected part; content displaced from the protected part is moved back to the unprotected part as recently used content. This method requires a variable that specifies what percentage of the cache space is reserved for the protected part [18].
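The two-segment scheme can be sketched as below. The class name `SLRUCache`, the parameter `protected_fraction` and the default split of 50% are assumptions for illustration, and the sketch assumes a total capacity of at least two entries.

```python
from collections import OrderedDict


class SLRUCache:
    def __init__(self, capacity, protected_fraction=0.5):
        self.protected_cap = max(1, int(capacity * protected_fraction))
        self.unprotected_cap = max(1, capacity - self.protected_cap)
        self.unprotected = OrderedDict()   # LRU order, oldest first
        self.protected = OrderedDict()     # LRU order, oldest first

    def _insert_unprotected(self, key, value):
        if len(self.unprotected) >= self.unprotected_cap:
            self.unprotected.popitem(last=False)   # the only real eviction
        self.unprotected[key] = value

    def access(self, key, value=None):
        """Return True on a hit and False on a miss."""
        if key in self.protected:
            self.protected.move_to_end(key)
            return True
        if key in self.unprotected:
            # A hit in the unprotected part promotes the object.
            stored = self.unprotected.pop(key)
            if len(self.protected) >= self.protected_cap:
                old_key, old_value = self.protected.popitem(last=False)
                # Displaced protected content re-enters as recently used.
                self._insert_unprotected(old_key, old_value)
            self.protected[key] = stored
            return True
        self._insert_unprotected(key, value)       # first request
        return False


slru = SLRUCache(4)
outcomes = [slru.access(k) for k in ["a", "b", "a", "c", "d", "c", "d"]]
print(outcomes, list(slru.protected), list(slru.unprotected))
```

In this trace "b" is evicted from the unprotected part, while "a" is pushed out of the protected part by "d" and gets a second chance in the unprotected part.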

LR+5LF Algorithm:
The LR+5LF replacement policy is a combination of two popular replacement policies, i.e. LRU and LFU. The problems that arise in the LRU and LFU policies are solved by this new policy [6]. In particular, the weighing problem of LRU and LFU is solved by this algorithm.
The LR+5LF policy reduces cache misses by a greater amount than LRU, FIFO and LFU at the L1 and L2 caches [19].

FIFO Algorithm:
The first-in first-out algorithm removes the page that has been in the cache the longest, regardless of how recently it was used. It treats the pages as a circular buffer, and pages are removed in round-robin fashion. It causes early page faults [20]. This classification is described in Fig. 2, and comparisons between the different algorithms are shown in Table 1.
[21] has been implemented through which the user can control space allocation in the cache. This technique is hard to implement but it produces less hit rate and also reduces cache pollution [22].
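The circular-buffer behaviour of the FIFO algorithm in this subsection can be sketched as follows. The class name `FIFOCache` is hypothetical, and page identifiers are assumed never to be `None`.

```python
class FIFOCache:
    def __init__(self, capacity):
        self.slots = [None] * capacity  # circular buffer of page ids
        self.next_slot = 0              # slot holding the oldest page
        self.resident = set()           # pages currently in the buffer

    def access(self, page):
        """Return True on a hit and False on a page fault."""
        if page in self.resident:
            return True                 # a hit does not change the order
        victim = self.slots[self.next_slot]
        if victim is not None:
            self.resident.discard(victim)   # the oldest page is removed
        self.slots[self.next_slot] = page
        self.resident.add(page)
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return False


fifo = FIFOCache(3)
fifo_outcomes = [fifo.access(p) for p in [1, 2, 3, 1, 4, 1]]
print(fifo_outcomes, fifo.slots)
```

Page 1 is removed when page 4 arrives even though it was used just before, so the next reference to page 1 faults; this is the early page fault mentioned above.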

OPTIMIZATION TECHNIQUES
Another way of optimizing the cache is compiler-based optimization, in which loops are optimized by the compiler. To fit the accessed data in the cache, loops must be reduced to a smaller size. In this way, tasks that use the same data are executed consecutively while that data is still in the cache [23].
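One common form of this loop transformation is tiling (blocking), sketched below for a matrix transpose. The function name `transpose_tiled` and the tile size are assumptions for illustration; in Python the sketch shows only the changed iteration order, whereas the cache benefit appears in compiled code operating on contiguous arrays.

```python
def transpose_tiled(matrix, tile=2):
    """Transpose a rectangular matrix one small tile at a time."""
    rows = len(matrix)
    cols = len(matrix[0])
    result = [[None] * rows for _ in range(cols)]
    # Outer loops walk over tiles; inner loops stay inside one tile, so
    # the same small region of both matrices is reused before moving on.
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    result[j][i] = matrix[i][j]
    return result


source = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
transposed = transpose_tiled(source, tile=2)
print(transposed)
```

The result is identical to an untiled transpose; only the order in which elements are visited changes.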
Different methods are used to improve cache performance. One of them is Jigsaw [24], which is used to solve scalability and interference issues. It defines how data is mapped to shares, and every share has a unique id. Jigsaw produces better performance than the NUCA design [25]. NUCA is managed in such a way that most of the data is served by the fastest bank. A switched network is used to move data to faster banks in steps. The core feature of the NUCA design is low-latency access.
A set of compiler algorithms has been written to predict which data will be reused in the near future.
These predictions are used to improve hit rates.
The algorithm used for this purpose is evict-me, which adds a one-bit tag to each cache line. Whenever the evict-me tag is set for a cache line, that cache line will be replaced [26]; however, this technique is complex and consumes a lot of energy.
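The effect of the one-bit tag on victim selection can be sketched for a single cache set. This is an illustration under assumptions: the class name `EvictMeSet` is hypothetical, the `evict_me` argument stands in for the compiler-generated hint, and falling back to LRU when no line is tagged is an assumed choice, not something stated in [26].

```python
from collections import OrderedDict


class EvictMeSet:
    """One cache set whose lines carry a one-bit evict-me tag."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # block -> evict-me bit, LRU block first

    def access(self, block, evict_me=False):
        """Touch a block and record its tag; return True on a hit."""
        hit = block in self.lines
        if hit:
            self.lines.move_to_end(block)
        elif len(self.lines) >= self.ways:
            self.lines.pop(self._choose_victim())
        self.lines[block] = evict_me
        return hit

    def _choose_victim(self):
        for block, tagged in self.lines.items():
            if tagged:
                return block            # a tagged line is replaced first
        return next(iter(self.lines))   # assumed fallback: least recently used


cache_set = EvictMeSet(ways=2)
cache_set.access(1)
cache_set.access(2, evict_me=True)   # hint: block 2 will not be reused
cache_set.access(3)                  # replaces block 2, not the older block 1
print(list(cache_set.lines))
```

Plain LRU would have evicted block 1 here; the tag redirects the replacement to the line predicted to be dead.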
Different methods are used to implement a two-way set-associative cache. One of them is the predictive sequential associative cache. With this method, the access time of a set-associative cache becomes approximately equal to that of a direct-mapped cache [27]. The method uses different prediction sources, which help to reduce access time and miss rate.
One way to improve cache performance is to bring in the data that will be used next: data prefetching loads data into the cache in advance of its use [28].
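The idea can be illustrated with a next-block prefetcher, one of the simplest schemes. The sketch assumes an unbounded cache and a policy that fetches block b+1 whenever block b misses; the function name `run_trace` is hypothetical, and [28] may describe a different prefetching scheme.

```python
def run_trace(trace, prefetch):
    """Return the number of hits for a block trace, with or without prefetch."""
    cache = set()                     # unbounded cache of block numbers
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
        else:
            cache.add(block)          # demand fetch on a miss
            if prefetch:
                cache.add(block + 1)  # also bring in the next block early
    return hits


sequential = list(range(8))
hits_without = run_trace(sequential, prefetch=False)
hits_with = run_trace(sequential, prefetch=True)
print(hits_without, hits_with)
```

For a purely sequential trace every second access becomes a hit, whereas without prefetching every access is a compulsory miss.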
Enlarging the cache reduces the chance of capacity misses. Simple caches and way prediction are used to reduce cache hit time. A comparison of the different techniques is shown in Table 3.

PERFORMANCE EVALUATION
To date, researchers have come up with many cache optimization techniques, and these have been judged by actually using them.
To the best of our knowledge, there is no numerical measure that rates specific cache optimization techniques. We tried to come up with such a numerical figure.
Table 2 shows the performance parameters collected for the cache optimization techniques. Their values are given as H (High), M (Medium), L (Low) and N (Not Applicable).
In Table 3, numerical values are assigned to H, M and L. Table 4 is the numerical counterpart of Table 2, obtained using the mapping in Table 3.
In Table 4, if a particular parameter decreases performance, the values 1, 2 and 3 are assigned for H, M and L, respectively; if it increases performance, the values 3, 2 and 1 are assigned for H, M and L, respectively.
The last two columns of Table 4 show the mean and the CV, which indicate the more consistent techniques for cache optimization.
The CV provides information about the data, i.e. how widely the data is scattered around its mean. The lower the value of the coefficient, the more consistent the technique. It is calculated in the last column of Table 4. These tables give us a better view for judging the techniques under study. Fig. 3 shows the graphical representation of Table 4.
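The scoring procedure can be reproduced with a short sketch. The ratings below are hypothetical and are not taken from Table 2; the function name `mean_and_cv` is illustrative, and the use of the population standard deviation and of a percentage for the CV are assumptions, since the text does not state which variant was used.

```python
from statistics import mean, pstdev

# H/M/L scores for parameters that increase or decrease performance.
INCREASES = {"H": 3, "M": 2, "L": 1}
DECREASES = {"H": 1, "M": 2, "L": 3}


def mean_and_cv(ratings):
    """ratings: list of (grade, effect) pairs; grade "N" is skipped."""
    scores = []
    for grade, effect in ratings:
        if grade == "N":
            continue                      # not applicable: no score assigned
        table = INCREASES if effect == "increases" else DECREASES
        scores.append(table[grade])
    average = mean(scores)
    cv = pstdev(scores) / average * 100   # coefficient of variation in percent
    return average, cv


# Hypothetical ratings for two unnamed techniques.
technique_a = [("H", "increases"), ("L", "decreases"), ("H", "increases")]
technique_b = [("H", "increases"), ("H", "decreases"),
               ("M", "increases"), ("N", "increases")]

mean_a, cv_a = mean_and_cv(technique_a)
mean_b, cv_b = mean_and_cv(technique_b)
print(mean_a, round(cv_a, 2), mean_b, round(cv_b, 2))
```

Technique A scores 3 on every parameter, so its CV is zero and it would be ranked as more consistent than technique B, whose scores are spread between 1 and 3.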

DISCUSSIONS
In Section 1, different cache replacement policies that have been implemented in the past were studied and compared with each other. The policies are compared on the basis of the following parameters: cache miss, cache hit, resource utilization, queuing, size of physical memory, page movement overhead, recency/frequency and an extra parameter. We also studied different optimization techniques and compared them in Table 2.

CONCLUSION
The cache is critical to performance, and many cache replacement policies and optimization techniques exist. We tried to provide a comprehensive review of those proposed to date.
Finally, we presented a comprehensive tabular representation of cache optimization techniques, and a numerical representation of it, which enables us to see which technique is more consistent. We compared a total of 15 techniques. The computed values and the graph show that victim cache has the lowest coefficient of variation, which means that it is the most consistent of all the techniques. Resizing and Remapping has the highest coefficient of variation, which means that it is the least consistent of the compared techniques.
In future we would like to perform experiments on techniques such as LR+5LF, CAR and ARC and compare their results by uniformly running them on a large set of instructions.

TABLE 4. STATISTICAL COMPARISON OF CACHE OPTIMIZATION METHODS GIVEN IN TABLE 3
FIG. 3. STATISTICAL COMPARISON OF CACHE OPTIMIZATION TECHNIQUES GIVEN IN TABLE 4 USING COEFFICIENT OF VARIATION

First):
This algorithm keeps the average latency to a minimum by first expelling the object with the lowest download latency [7]. It gives the best results in cases where the data is retrieved by executing a query against a relational database. The algorithms discussed above are grouped into several classes. These classes are defined in terms of the different parameters discussed in [9-2]. Of all the parameters, recency and frequency are the most important factors. This classification is also used by other authors. Classification of classes is: