# SEAD Counter: Self-Adaptive Counters With Different Counting Ranges

Xilai Liu<sup>®</sup>, Yan Xu<sup>®</sup>, Peng Liu<sup>®</sup>, Tong Yang<sup>®</sup>, *Member, IEEE*, Jiaqi Xu<sup>®</sup>, *Graduate Student Member, IEEE*, Lun Wang<sup>®</sup>, Gaogang Xie, *Member, ACM*, Xiaoming Li, *Senior Member, IEEE*, and Steve Uhlig<sup>®</sup>

Abstract—The Sketch is a compact data structure useful for network measurements. However, to cope with the high speeds of the current data plane, it needs to be held in the small onchip memory (SRAM). Therefore, the product of the counter size and the number of counters must be below a certain limit. With small counters, some will overflow. With large counters, the total number of counters will be small, but each counter will be shared by more flows, leading to poor accuracy. To address this issue, we propose a generic technique: self-adaptive counters (SEAD Counter). When the value of the counter is small, it works as a standard counter. When the value of the counter is large however, we increment it using a predefined probability, so as to represent this large value. Moreover, in the SEAD Counter, the probability decreases when the value increases. We show that this technique can significantly improve the accuracy of counters. This technique can be adapted to different circumstances. We theoretically analyze the improvements achieved by the SEAD Counter. We further show that our SEAD Counter can be extended to three typical sketches and Bloom filters. We conduct extensive experiments on three real datasets and one synthetic dataset. The experimental results show that, compared with the state-of-theart, sketches using the SEAD Counter improve the accuracy by up to 13.6 times, while the Bloom filters using SEAD Counter can reduce the false positive rate by more than one order of magnitude.

Manuscript received January 25, 2020; revised December 6, 2020 and May 24, 2021; accepted July 23, 2021; approved by IEEE/ACM TRANS-ACTIONS ON NETWORKING Editor D. Malone. This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB1800201 and in part by the National Science Foundation of China (NSFC) under Grant U20A20179. The preliminary version of this paper was published on IEEE INFOCOMM 2019 [DOI: 10.1109/INFO-COM.2019.8737531]. (Xilai Liu and Yan Xu contributed equally to this work.) (Corresponding authors: Tong Yang; Gaogang Xie.)

Xilai Liu is with the Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China, and also with the School of Computer Science and Technology, University of Chinese Academy of Sciences (UCAS), Beijing 100190, China (e-mail: liuxilai20121013@gmail.com).

Yan Xu is with the Department of Computer Science, University of Maryland, College Park, MD 20742 USA (e-mail: yxu98@umd.edu).

Peng Liu, Tong Yang, and Xiaoming Li are with the Department of Computer and Science, Peking University, Beijing 100871, China (e-mail: yangtongemail@gmail.com).

Jiaqi Xu and Steve Uhlig are with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, U.K. (e-mail: jiaqi.xu@qmul.ac.uk; steve.uhlig@quml.ac.uk).

Lun Wang is with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: wanglun@berkeley.edu).

Gaogang Xie is with the Computer Network Information Center, Chinese Academy of Sciences (CAS), Beijing 100190, China, and also with the School of Computer Science and Technology, University of Chinese Academy of Science (UCAS), Beijing 100190, China (e-mail: xie@cnic.cn).

Digital Object Identifier 10.1109/TNET.2021.3107418

*Index Terms*—Estimator, sketches, generic counter, network measurement, compression, self-adaptive.

# I. INTRODUCTION

A. Background and Motivation

**N** ETWORK measurements provide indispensable information for congestion control [2], [3], DDoS attack detection [4], [5], heavy hitter identification [6]–[8] and heavy change detection [9]. Measuring the size of different flows in network traffic, known as per-flow size measurements or *perflow measurements for short*, has attracted attentions in recent years [10]–[13]. Flow identifiers (flow IDs) are selected from the header fields of packets, such as the five-tuple: source IP address, source port, destination IP address, destination port, protocol. Flow size is defined as the number of packets in a flow. Flow volume is defined as the number of bytes in the flow.

As the line rate can be high, e.g., 10Gbps, 40Gbps, it is challenging to perform per-flow measurements at line rate. To achieve high processing speed, the data structure should be small enough to be stored in the on-chip memory, such as a Block RAM in FPGA or ASIC chips, or the caches of CPU or GPU chips. However, the size of on-chip memory is very limited (usually less than 20MB [14]–[17]). This means that it is almost impossible to keep one counter for each flow to record the flow size if there comes large number of flows. For example, for the datacenter trace [58], we found there were about 8M flows when counting 30M packets. While for the CAIDA trace, there are about 1.5M flows when counting 30M packets. To achieve memory efficiency, various sketches (e.g., sketch of CountMin (CM) [18], Conservative Update (CU) [19] and Count (C) [20]) allow counters to be randomly shared by multiple flows, inevitably incurring errors.

In real network traffic, it is well known that the flow size/volume distribution is highly skewed [13], [19], [21]–[26]. Specifically, most flows are very small in size, often known as mouse flows; while a few are very large, often known as elephant flows. As elephant flows are typically more important than the small ones, the size of each counter needs to be large enough to store the largest flow. Further, in many practical scenarios, one does not have any idea of the approximate flow size of elephant flows beforehand. Given the limited size of on-chip memory, if we use large counters, the number of counters will be small, and each counter will be shared by more flows, leading to poor accuracy. In this case, most counters are mapped by mouse flows, keeping a small value, and thus their significant bits are wasted. Therefore, if we use large counters, it will be a waste of memory, and the accuracy

1558-2566 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information. will be poor. In summary, it is challenging to achieve memory efficiency in skewed network traffic.

We desire to use many small counters to record the sizes of both mouse flows and elephant flows, and keep a balance between memory efficiency and accuracy. The challenge then becomes how can we set up a compact data structure to keep track of a large number of flows using limited amount of space. In this paper, we managed to find a technique to overcome this challenge by introducing the self-adaptive counters technique (SEAD Counter).

#### **B.** Prior Art and Their Limitations

There are primarily two kinds of algorithms for per-flow measurements. The first kind is based on sampling [27]. However, recent works [11], [12], [28] pointed out that the sampling method might lose important information. The other kind is based on a compact data structure called sketch. There are three classic sketches: sketches of Count [20], CM [18], and CU [19]. The CM sketch is the most widely used one in networking [29]. These sketches share the same data structure and similar operations. Therefore, we only present the details of the CM sketch here. A CM sketch consists of d arrays, each of which is associated with a hash function, denoted by  $h_1(.) \dots h_d(.)$ . The i<sup>th</sup> array is represented by  $A_i$ . Each array consists of w counters. When inserting a packet with flow ID e, the CM sketch adds the packet size of e to the d counters:  $A_1[h_1(e)] \dots A_d[h_d(e)]$ , called the *d* mapped counters in this paper. When querying item e, the CM sketch reports the minimum value among d mapped counters:  $min\{A_1[h_1(e)] \dots A_d[h_d(e)]\}$ . Obviously, the accuracy of the CM sketch will decrease as the number of counters decreases. However, as each counter needs to be large enough to accommodate the largest flows, the CM sketch cannot achieve memory efficiency, as do sketches of C and CU. To achieve memory efficiency, small counters must be used.

The state-of-the-art named Counter tree [30] uses multiple layers of small counters, with the counters at the higher layers used to store the significant bits of the large flows. In this way, it can improve the memory efficiency. Unfortunately, for each packet belonging to a large flow (also called elephant flow), Counter-Tree needs to access all layers, requiring many memory accesses for each insertion. Achieving memory efficiency while ensuring fast and sufficient processing speed to keep up with line rate is therefore very challenging.

The Bloom filter [31] is a space-efficient probabilistic data structure for representing a set of elements. It has been widely used to support membership query with an acceptable false positive rate. Bloom filters have been applied to various fields, like networking and databases, and achieved great success. It has three important applications: single set membership queries, many sets membership queries and estimating the multiplicities of elements. We refer the reader to several surveys [32]–[34] for detailed applications.

A Bloom filter is used to tell whether an item belongs to a set or not. A Bloom filter is an array consisting of w bits and is associated with k independent hash functions  $h_1(\cdot), h_2(\cdot), \ldots, h_k(\cdot)$ , whose outputs are uniformly distributed in the range [1, w]. We denote the *i*-th bit of the array with A[i]. To insert an item e, the Bloom filter computes k hash functions and sets  $A[h_1(e)], A[h_2(e)], \ldots, A[h_k(e)]$ to 1. To query an item e', it checks whether  $A[h_1(e')],$  $A[h_2(e')], \ldots, A[h_k(e')]$  are all 1. If the k positions are all 1's, then it reports that e is in the set; otherwise, it reports that *e* is not in the set. Obviously, a BF never reports  $e \notin S$  if *e* actually belongs to *S*, i.e., it has no false negatives. However, a BF may report  $e \in S$  when *e* actually does not belong to *S* sometimes, i.e., it has false positives. The false positive rate of a BF is so small that it can be negligible in practical scenarios. For example, fewer than 10 bits per element are required for a 1% false positive probability, independent of the size or number of elements in the set [35].

However, the conventional method may lead to a waste of memory or a huge loss in terms of accuracy. Since we cannot know the size of the set beforehand, it is impossible to find an appropriate size for the Bloom filter.

A Bloom filter may be "too large" or "too small" for a given set, which means that after inserting all the items from that set, the number of 1's in the Bloom filter may be too small, indicating a waste of memory we may achieve the same false positive rate with much less space. In other words, we may use too much space for the Bloom filter), or too large, indicating too many collisions and a drastic drop in accuracy.

Therefore, When we say a Bloom filter is "too large", we mean that we only need to represent a small number of elements, so we don't have to allocate so much space for the Bloom filter. When we say a Bloom filter is "too small", we mean that we need to represent a large number of elements, so we have to allocate much more space for the Bloom filter. The situation of the Bloom filter is the same as that of the sketches, where we can't configure a sketch or a Bloom filter accurately because we don't know how many flows (elements) there will be beforehand. Therefore, we propose SEAD Counter to improve the space efficiency, which means that we can allocate the same memory size (the limit amount of memory) as we do w.o. SEAD Counter while achieving a better accuracy with this space-efficient probabilistic data structure.

#### C. Our Solution

Most of the sketches have two shortcomings for skewed datasets:

- 1) The different sizes of the counters have to be set large enough so that they can hold the largest flows in the stream without overflowing. As memory is limited, the size of the counters could not be set large enough and could be like 8-bit or 16-bit. If we set them too large, there will be too many collisions and a drastic drop in accuracy. As a consequence, if memory is limited and the counters could not be set large enough, a lot of bits are wasted while some counters in the hash table may overflow.
- 2) They lack a mechanism to deal with changes of network workloads. For example, on black Friday, a sketch on a router near a server of an electronic commerce website not designed to handle this unusual workload is likely to fail to handle the load.

In this paper, we propose a generic technique aimed at making every bit count. Our technique is applicable to all sketches using counters, and it can be extended to Bloom filters, too. Recall that the design goal is to achieve both memory efficiency and fast, sufficient speed. To achieve memory efficiency, we have to use small counters. To achieve sufficient and fast processing speed, to make the data structure as easy to use by real applications, we do not introduce multiple layers, a common design point in many papers like Counter-Tree [30] or Pyramid Sketch [36], or change the basic data structure of sketches. The challenge is that each small counter has to be able to represent the size of both mouse and elephant flows.

Our key idea is simple: when a counter is going to overflow, for each insertion, instead of always increasing it, we increase it with a predefined probability. When the value of the counter is small (e.g., is mapped by one or several mouse flows), we consider it as a normal counter, so as to achieve high accuracy. By carefully designing the probability setting, we can use a small counter to represent a very large value as well. Introducing this probability could incur inaccurate recording of large values. Fortunately, we prove theoretically and empirically that the incurred inaccuracy is negligible compared to the large value of the size of elephant flows.

Based on the above idea, we propose two versions of our technique: the Static Sign Bits version and the Dynamic Sign Bits version. Our technique splits each counter into two parts: (1) sign bits, and (2) counting bits. When the sign bits are all 0, we increment the counting bits normally; when the sign bits are non-zero, we increment the counting bits with a probability calculated by the value of the sign bits. As the number of bits of each counter is fixed, if we assign more bits as sign bits, the number of counting bits will decrease. In this case, the counter cannot accurately record the size of mouse flows.

For the static version, we fix the length of sign bits in advance, and do not change it during insertions. The shortcoming of the static version is that it is hard to determine how many bits should be assigned for the sign bits. To address this, we propose the Dynamic Sign Bits version, which uses an adaptive method that dynamically adjusts the length of the sign bits according to the value that the counter needs to represent. This adaptive method works as follows: we consider all sequences of 1 bits from the left as sign bits. The leftmost 0 bit is considered a marker, the split bit, while others are counting bits. For example, given a counter with value 11101011, it means the following. The first three bits are sign bits, representing a value of 3. The fourth bit, 0, the split bit, splits the sign and counting bits. The last four bits 1011 are the counting bits. More details about the adaptive methods are provided in Section III-B.

# D. Key Contributions

- We propose two versions of a generic sketch technique, static and dynamic. With our technique, sketches can use small counters to accurately record the sizes of both elephant and mouse flows, achieving memory efficiency as well as sufficient and fast speed. Besides, Bloom filters using SEAD Counters can achieve a higher accuracy. We apply our technique to sketches of CM, CU and C. We use our counters in Counting Bloom Filters (CBF) and Variable-Increment Counting Bloom Filters (VI-CBF). We also use our counter as the estimator.
- 2) We conduct a detailed analysis of the adaptive counters to show their theoretical properties.
- We carry out extensive experiments, and provide results on three real datasets and one synthetic dataset, demonstrating the improved performance of our adaptive counters.

The rest of the paper is organized as follows. We describe related work in Section II. The main ideas of the SEAD Counter are described in Section III. We provide a case study to show how to apply the SEAD Counter to the CM sketch, the CU sketch and the C sketch in Section IV. In Section V, we provide a theoretical analysis for two versions of the SEAD Counter. In Section VI, we provide experimental evaluations on SEAD Counter, compared with two other estimators and SEAD Counter sketches, compared to original sketches. Finally, we conclude in Section VII.

# II. RELATED WORK

The idea of estimators was first introduced by Morris in Approximate Counting [37]. At the price of small errors, estimators can represent large values with small counters. In network area, Small Active Counters (SAC) [38] first adopted this idea and later developed by DISCO [39] to provide better accuracy. Recently, CEDAR [40] and ICE-Buckets [41] extended the analysis of the estimation function in DISCO and include some other estimation techniques to further improve precision. An alternative to estimators is to use a variable-length counter encoding, like Pyramid Sketch [36] and BRICK [42].

For per-flow measurements, two methods exist: sampling and sketches. The authors in [43] prove that sampling has poor accuracy, and propose an algorithm to improve it. To improve the accuracy, various sketches have been proposed, including the CM sketch [18], the CU sketch [19], the C sketch [20] and many other advanced techniques such as the A sketch [13], the CSM sketch [44], the Tug-of-war sketch [45] and its enhanced version [46], the AMS sketch [47]. Sampling and sketches methods can be combined with estimators to further reduce the size of every counter at the expense of precision.

Due to space limitations and the too many various flavours of related sketches, we only briefly introduce the most typical estimators and sketches.

We also cover widely used Bloom filters, such as the standard Bloom Filter (BF) [31], the Counting Bloom Filter (CBF) [48], and the Variable-Increment Counting Bloom Filter (VI-CBF).

#### A. Small Active Counter

For one counter with *n* bits, the Small Active Counter [38] divide it into two parts: 1-bit exponent part called *mode* and *k*-bit estimation part called *A*. Using *r* as the global scale parameter, the real value *V* can be estimated by  $\hat{V} = A \cdot 2^{r \cdot mode}$ . In its HAC version, DRAM resources can be used to further improve its accuracy.

# B. ICEBuckets

For one counter with n bits to count up to M, ICE-Buckets [41] relies on an estimation function  $A : \{0, \ldots, 2^n - 1\} \longrightarrow [0, M]$ , which accepts a symbol l between 0 and  $2^n - 1$  as input and returns an estimation value for it. Using  $\epsilon$  as one parameter, the expression of ICEBuckets' estimation function is:

$$A_{\epsilon}(l) = \frac{(1+2\epsilon^2)^l - 1}{2\epsilon^2} (1+\epsilon^2)$$

ICEBuckets proved its expression is optimal. Besides, ICEBuckets introduces independent counter estimation buckets, where each bucket has its own  $\epsilon$  value to achieve multiple scale in overall data structure. After separating the flows to different buckets, their optimal estimation functions are configured according to each bucket's counter scale. However, due to complicated expression, its processing speed is limited.

# C. CU Sketch

The CU sketch [19] has the same structure as the CM sketch, but its insertion strategy is sacrificing speed for accuracy using "conservative update". When inserting a packet e, it only adds the metric of interest of e to the smallest counter(s) among d mapped counters. When querying packet e, the CU sketch reports the minimum value among d mapped counters.

# D. C Sketch

The C sketch [20] differs from the CM and CU sketch in the sense that each array is associated with two hash functions. Besides d hash functions  $h_1(.) \cdots h_d(.)$ , there are additional d hash functions  $s_1(.) \cdots s_d(.)$  mapping each incoming packet to  $\{+1, -1\}$ . If the result of the second hash function is +1, then the insertion proceeds as usual. Otherwise, the metric of interest of the packet will be subtracted from the d mapped counters. When querying, it will report the absolute value of the median of  $\{A[h_i(key)] \times s_i(key)\}_i$ , where i is in  $0, 1, \cdots, d-1$ .

# E. Sophisticated Sketches

Many advanced sketch techniques have been proposed recently [49]–[51]. The augmented sketch (A sketch) [13] is targeted at improving accuracy by using one additional filter to dynamically capture packets from heavy-hitters. The A sketch adds an additional filter (which is actually a queue with k packets) to an existing sketch T. The A sketch is very accurate only for those packets in the filter. For the sketch which offload heavy hitters by one additional structure, like A sketch, we can combine it with our SEAD counter to both use smaller SEAD counters for the mouse flows and reduce the error of elephant flows by the additional structure.

The pyramid sketch [36] automatically enlarges the size of the *d* mapped counters according to the current frequency of the incoming packet, while achieving fast speed.

The Counter-Tree [30] provides a scalable architecture for per-flow measurements. The key idea of the Counter-Tree is to rearrange the counter sharing scheme based on the CM sketch. The authors claim that their 2-D tree structure is able to work with very limited space. However, in terms of accuracy, the performance of the Counter-Tree is not as good as the original CM sketch under the same memory usage. Moreover, the CM sketch and other typical sketches support the (ID, f)insertion where  $f \ge 1$ . The Counter-Tree on the other hand only supports (ID, 1)-insertions, which makes it unsuitable for applications such as flow volume measurements.

# F. Counting Bloom Filters (CBF)

The Counting Bloom Filter (CBF) suggested by Fan *et al.* [48] is a generalization of the BF, in which each hash entry contains a counter with a fixed size of b bits, instead of a single bit in the BF.

To insert an element, all the corresponding hashed counters are incremented by one. Likewise, to delete it, all of its hashed counters are decremented. To determine if an element  $e \in S$ , we check if all of its hashed entries are positive.

Unfortunately, while supporting deletions, CBF also needs large amounts of memory space, which is often valuable in networking devices.

# G. Variable-Increment Counting Bloom Filters (VI-CBF)

The Variable-Increment Counting Bloom Filter [52], denoted as VI-CBF, is a generalization of CBF that uses variable increments to update each entry. When inserting an element, the counters of VI-CBF are incremented by a hashed variable increment instead of a unit increment. Then, to query an element, the exact value of a counter is considered, not just its positiveness.

IEEE/ACM TRANSACTIONS ON NETWORKING

They first define a set of possible variable increments D. Then, for each counter update, they hash the element into a value of D and use it to increment the counter. Likewise, to delete an element, they decrement it by its hashed value in D. Last, to determine if an element  $e \in S$ , they check in each of its counters if its hashed value in D could be part of the sum. If this is not the case in at least one counter, then necessarily  $e \notin S$ . Otherwise, as for BF and CBF, they state that  $e \in S$ , which might be a false positive.

Specifically, k hash functions  $G = \{g_1, \dots, g_k\}$  are employed to select k increments from the set D. Note that  $D = \{v_1, v_2, \dots, v_u\}$  is a set of integers such that all the sums  $v_{i_1} + v_{i_2} + \dots + v_{i_l}$  with  $1 \le i_1 \le \dots \le i_l \le u$  are distinct. Thereafter, k hash functions  $\{h_1, \dots, h_k\}$  map the element into k cells of VI-CBF.

To query an element e, VI-CBF first checks the k counters  $C_1[h_1(e)], \dots, C_1[h_k(e)]$ . If any counter is 0, obviously  $e \notin S$ . If  $C_1(i)$  is small, VI-CBF considers the exact values in both counter vectors. In this case, no more than u elements are hashed into these cells. Thereafter, VI-CBF can deduce the employed increments in the value of  $C_2(i)$ . If  $v_{g_i(e)}$  is contained in  $C_2[g_i(e)]$  ( $i \in [1, k]$ ), then  $e \in S$  with high probability; otherwise,  $e \notin S$ . Finally, if  $C_1(i)$  is large, VI-CBF considers that this cell is not useful and examines other cells for possibly eliminating the membership of e. By updating the counters with diverse increments, VI-CBF effectively distinguishes the elements mapped into a cell. It was shown that the VI-CBF has a better false positive rate than the CBF for a fixed number of bits per element (bpe), although it requires more bits per counter for its smaller number of counters.

The time complexity for query of CBF is O(k), where k is the number of hash functions or the number of counters an element corresponds to. In the case of VI-CBF, for each counter  $c_i$ , it checks if  $(c_i - v_{g_i(y)}) \in (-\infty, 1] \cup [1, L - 1]$ , where D = [L, 2L - 1]. Therefore, the time complexity is also O(k).

#### **III. THE SEAD COUNTER TECHNIQUE**

In this section, we describe our SEAD Counter (Self-Adaptive Counters) approach. We propose two versions of SEAD Counter, a Static Sign Bits version and a Dynamic Sign Bits version. Inspired by floating-point number representation, the Static Sign Bits version uses some *sign bits* to adjust the magnitude of the counters, enabling the counters to represent larger values. However, it is hard to determine an appropriate length for the sign bits so that mouse flows can be accurately recorded. To overcome the weakness of the static version, we further design a dynamic version.

Some notations used in this section are shown in Table I.

# A. Static Sign Bits Version

**Rationale:** Let n be the counter size, *i.e.*, the number of bits in a counter. To achieve memory efficiency, we need to use fine-grained counters, *i.e.*, small counters (*e.g.*, n = 8). The capacity of the counters, *i.e.*, the maximum value that counters can represent, is fixed to  $2^{n}$ -1. However, when a counter is LIU et al.: SEAD COUNTERS WITH DIFFERENT COUNTING RANGES

TABLE I Symbols Used in the Paper

| Symbol                    | Description                                              |
|---------------------------|----------------------------------------------------------|
| l                         | the value resided in an SEAD counter                     |
| d                         | number of counter arrays in a sketch                     |
| W                         | number of counters in a counter array                    |
| n                         | number of bits in a counter                              |
| S                         | length of sign bits in a counter                         |
| $\gamma[0],, \gamma[k-1]$ | the expansion array for SEAD Counter.                    |
| stage[i]                  | the starting value of stage i, depending on the          |
|                           | version, can be derived from $\gamma[0],, \gamma[k-1]$ . |



Fig. 1. Example of SEAD Counter with Static Sign Bits, where n = 8 and s = 3. To insert a pair (e, 1), the counting part is increased by 1 with probability  $\frac{1}{4}$ . If the counting part overflows, we increase the sign part by 1, and get a new expansion parameter  $\gamma[3] = 8$ .

mapped by too many flows, especially large ones, the value of the counter will easily exceed  $2^{n}$ -1.

Our technique is inspired by, but different from the encoding style of *floating-point numbers*. In a typical floating-point representation, the value can be calculated using the following three parts: 1) a *sign digit* indicating the value to be positive or negative. 2) *exponent digits* which represent an integer that controls the magnitude of this representation. 3) *significant digits* that carry the value related to its measurement resolution. Similarly, in our technique, we also split a counter into two parts: a *Sign Bits* part (sign part for convenience) and a *Counting Bits* part (counting part for convenience). The counting part functions as the significant digits, while the sign part functions as the exponent digits. Specifically, for each possible value of the sign part, we pre-define a corresponding integer indicating how many times the counting part should be expanded. We call this pre-defined array the *expansion array*.

Self Adaptive Counter (SEAD Counter): Our technique is called Self-Adaptive Counters (SEAD Counter). Next we show the data structure and operations of the SEAD Counter.

**Data Structure:** For the SEAD Counter, it has a s-bit sign part and a (n-s)-bit counting part as shown in Figure 1. We denote the expansion array as  $\gamma[0], \gamma[1], \dots, \gamma[k-1]$ , where  $k = 2^s$ . For example,  $\gamma[i] = 2^i$ . After setting up all the above parameters, the capacity of the Static Sign Bits version of the SEAD Counter is  $C_{static}(n,s) = \sum_{i=0}^{2^s-1} (\gamma[i] \times 2^{n-s})$ . Here, we define the capacity of the SEAD counter as the *expected* number of increments it can take before exceeding its maximal value. Due to over-sampling problem, we should set the expansion array to make the capacity slightly higher than the number of increments we want to support. Besides, in the extreme case where over-sampling happened, we just return the capacity value, which is the maximum possible value that can be represented by our counter.

**Insertion:** We show the steps of how to add 1 to an SEAD Counter. The procedure to update SEAD Counters with larger values can be seen in **Algorithm 1**.

Step 1: In an SEAD counter, we get the sign part  $s_0$ , the counting part  $c_0$ , and the value  $\gamma[s_0]$  from the expansion array.

# Algorithm 1 *read* and *update* Functions of SEAD Counters

read: (the value resided in the counter: l, an expansion array: γ)
1: c<sub>l</sub> = l

2: if l < 0 then

3:  $\tilde{l} + 1 \mapsto c_l$ , thus  $c_l$  is the two's complement of l

- 4:  $s_0$ : value of the sign part in  $c_l$
- 5:  $c_0$ : value of the counting part in  $c_l$

6:  $c = stage[s_0 - 1] + (\gamma[s_0] \times c_0)$ 

7: if l < 0 then

```
8: -c \mapsto c
```

9: **return** *c* 

**update:** (the value resided in the counter: l, a value v, an expansion array  $\gamma$ )

1: if  $read(l, \gamma) + v > C_{dynamic}(n)$  then

2: 
$$2^{n-1} \mapsto l$$
 (set l to the maximum in the counter)

4:  $c_l = l$ 

- 5: **if** l < 0 **then**
- 6:  $l+1 \mapsto c_l$ , thus  $c_l$  is the two's complement of l
- 7:  $s_0$ : value of the sign part in  $c_l$
- 8:  $c_0$ : value of the counting part in  $c_l$

9: 
$$q = \frac{v}{\gamma[s_0]}, r = v\%\gamma[s_0]$$

- 10: if  $r \neq 0$  then
- 11: p: a random number in [0, 1]

12: **if**  $p < \frac{r}{\gamma[s_0]}$  **then** 

13:  $l+1 \mapsto l$ 

14: **if**  $0 < q < stage[s_0] - c_0$  **then** 

15:  $l + q \mapsto l$ 

va

16: **else** 

17: 
$$v' = v - (stage[s_0] - c_0) \times \gamma[s_0]$$

18:  $stage[s_0] \mapsto l$ 19:  $update(l, v', \gamma)$ 

Step 2: Since  $\gamma[s_0]$  indicates how many times the counting part should be expanded, we first calculate  $\frac{1}{\gamma[s_0]}$ , and add 1 to the counting part with probability  $\frac{1}{\gamma[s_0]}$ .

Step 3: If the counting part reaches  $2^{n-s}$ , we increase the sign part by 1, and set the counting part to zero.

**Query:** For an SEAD counter C, we calculate the value represented by C as follows:

1) First, we get the value of the sign part  $s_0$  and the value of the counting part  $c_0$ . Then, we find  $\gamma[s_0]$  from the expansion array and another value  $stage[s_0]$  from the *stage array*. The stage array is pre-computed using the following formula (We assume  $\gamma[j] = m^j$  in the stage expression to contrast static and dynamic version better):

$$\begin{cases} stage[0] = 0, \\ stage[i] = 2^{n-s} \times \sum_{j=0}^{i-1} \gamma[j] = \frac{2^{n-s}(m^i - 1)}{m-1}, \ i > 0 \end{cases}$$
(1)

2) The value represented by C can be calculated with the following formula:

$$lue(\mathcal{C}) = c_0 \times \gamma[s_0] + stage[s_0].$$
  
=  $c_0 m^{s_0} + \frac{2^{n-s}(m^{s_0} - 1)}{m-1}$  (2)



Fig. 2. Structure of Dynamic Sign Bits SEAD Counter.

The problem of the Static Sign Bits version is that, since we do not know the size of the mouse flows, we cannot determine an appropriate length for the counting and sign parts. Specifically, when the sign part is zero, the flow size is accurately recorded. When we use a large sign part, the counting part will be shortened, and thus the counter may not accurately record the mouse flows.

# B. Dynamic Sign Bits Version

**Rationale:** Given a fixed counter size, to address the issue of the space taken by a fixed-length sign part, we can use an adaptive method to dynamically adjust it. The length of the sign part is initialized to 0. Except for the split digit, all other bits are used for counting. As the value represented by the counter becomes larger, we increase the length of the sign part dynamically. In this way, we can accurately record mouse flows, while being able to deal with elephant flows.

**Data Structure:** An *n*-bit Dynamic Sign Bits version SEAD Counter has three parts: 1) a sign part whose length  $l_s$  is made up by ones, 2) a *split digit* which is the leftmost zero digit, 3) a  $(n-l_s-1)$ -bit counting part. We create an expansion array  $\gamma[0], \gamma[1], \dots, \gamma[n-2]$ . After setting up the above parameters, the capacity of the Dynamic Sign Bits version SEAD Counter is  $C_{dynamic}(n) \sum_{i=0}^{n-2} (\gamma[i] \times 2^{n-i-1})$ .

**Insertion:** The insertion process of the dynamic version of the SEAD Counter is similar to the static version, except for two differences: (1) In the dynamic version, the value of the sign part, *i.e.*,  $s_0$ , is equal to the number of ones in the sign part; For example, the value of the sign part in"111011" is 3. (2) In the dynamic version, when the counting part overflows, we turn the split digit to 1, and set the bits of the counting part to all zeroes. By doing this, the length of the sign part is increased by 1, the split digit is moved right by 1 bit, and the counting part is shortened by 1 bit.

**Query:** The query process of the dynamic version of the SEAD Counter is similar to the static version. To calculate the value of an SEAD Counter with Dynamic Sign Bits, the first step is also to get  $s_0$  and  $c_0$  from the counter, and read the value  $\gamma[s_0]$  and  $stage[s_0]$  from the expansion and stage arrays, respectively. The two differences are that  $s_0$  is the length of the sign part, and the stage array is computed through the following formula:

$$\begin{cases} stage[0] = 0\\ stage[i] = \sum_{j=0}^{i-1} \left(\gamma[j] \times 2^{n-j-1}\right)\\ = \begin{cases} i \cdot 2^{n-1}, & m = 2, \ i > 0\\ \frac{2^{n-1}\left(\left(\frac{m}{2}\right)^{i}-1\right)}{\frac{m}{2}-1}, & m \neq 2, \ i > 0 \end{cases}$$
(3)

The second step is the same as the static version. The value represented by a dynamic version SEAD Counter can be calculated using the first equality of Formula 2.

Even if there can be over-sampling, the *n*-bit SEAD Counter with Dynamic Sign Bits can increment  $2^n$  times, which is the same as the normal counter.



 Sign part: +1
 Counting part: 2

  $\gamma$ [3]=8
 0
 1
 1
 0
 0
 1
 0
  $\{\frac{148}{4} - [(2^5 - 1) - 10]\}$  /8 = 2 

Fig. 3. An example of SEAD Counter with Static Bits Version, where n = 8 and s = 3. To increase the counter by 148, the counting part will overflow. We increase the sign part by 1, and get a new expansion parameter  $\gamma[3] = 8$ .

# C. Insertion to the Counters With Larger Values

We show the steps of how to add v to an SEAD Counter.

Step 1: In an SEAD counter, we get the sign part  $s_0$ , the counting part  $c_0$ , and we check the value  $\gamma[s_0]$  in the expansion array. Since  $\gamma[s_0]$  indicates how many times the counting part should be expanded, we first calculate  $\frac{v}{\gamma[s_0]}$ , getting a quotient q and a remainder r.

Step 2: We compare q with  $g = (2^{n-s} - c_0)$ , and there are two cases:

- 1) q < g: This means the SEAD Counter can hold the value without changing the sign part. We only increase the counting part by q, *i.e.*,  $c_0 + q \mapsto c_0$ ,  $r \mapsto r$  and  $s_0 \mapsto s_0$ .
- 2) q ≥ g: We first increase the sign part by 1, *i.e.*, s<sub>0</sub>+1 → s<sub>0</sub>, and set the counting part to zero, *i.e.*, 0 → c<sub>0</sub>. Then, we calculate (q-g)×γ[s<sub>0</sub>]+r/γ[s<sub>0</sub>]+r, and get a new quotient q' and a new remainder r'. We let r' → r, q' → q and go back to Step 1.

*Step 3:* We further increase the counting part by 1 with probability  $\frac{r}{\gamma[s_0]}$ . If the counting part is increased to  $2^{n-s}$ , we increase the sign part by 1 and set the counting part to zero.

# IV. CASE STUDIES

To further illustrate the generality of the SEAD Counter technique, in this section, we show how to apply the SEAD Counter to the sketches of CM [18], CU [19] and C [20]. We further extend our counters to CBF and VI-CBF. Two functions of the SEAD Counters, *read* and *update*, are shown in Algorithm 1.

# A. Application to the CM Sketch

In the CM sketch, all normal counters are replaced by the SEAD Counters. After locating d counters using d hash functions, the insertion procedure is done by calling the "*update*" function of the corresponding SEAD Counter. The query function is done by reporting the minimum value of "*read*" by all corresponding SEAD counters.

# B. Application to the CU Sketch

The CU sketch is similar to the CM sketch but with a different update technique called "conservative update of counters". In a CU sketch, the insertion procedure is done by calling the "*update*" function on the smallest counters in the SEAD Counter. The query process is the same as for the CM sketch.

#### C. Application to the C Sketch

The C sketch consists of an array with  $t \times k$  counters. One important feature of the C sketch is that it requires two sets of hash functions  $h[1] \cdots h[t]$  and  $g[1] \cdots g[t]$ , where  $h[i] : [0, n] \rightarrow [0, k]$ ,  $g[i] : [0, n] \rightarrow \{-1, 1\}$ . Negative values can occur in the counters, so the first bit of each counter should be the sign bit. When inserting a packet, the sketch calls the "update" procedure of the SEAD Counter to add the value to the corresponding counter. The query function is done by reporting the median value of "read" by all corresponding counters.

Since we may add a negative number to the counters in the C sketch, in the two functions, *i.e.*, "*read*" and "*update*", we use the leftmost bit of a SEAD Counter to indicate whether the counter is positive or negative. As shown in Algorithm 1, the "*read*" function takes a counter c and the expansion array  $\gamma$  as input, and outputs the value represented by the counter. The "*update*" function takes a counter c, an increment value c, and the expansion array  $\gamma$  as input.

#### D. Application to CBF and VI-CBF

CBF has essentially the same structure as the CM sketch, so the "*update*" and the "*read*" functions behave the same. However, they are used differently and therefore sized differently.

VI-CBF [52] works in the same model as CBF, which means that the same element may be inserted more than once. VI-CBF differs from CBF in that the increments and decrements are variable. While all the corresponding hashed counters are incremented by one in CBF when inserting an element, in the case of VI-CBF, all the corresponding hashed counters are incremented by a certain hashed value, which is previously defined as a set of possible variable increments D. For the same element, the corresponding hashed counters are incremented with the same corresponding hashed value. For CBF, we check in each of its counters if its hashed value is positive. For VI-CBF, we check in each of its counters if its hashed value in D could be part of the sum. We adopt the scheme proposed in [52], which uses variable increments but only relies on a single variable-increment counter per entry, without the additional counter that indicates the number of hashed elements. Specifically, in each array entry, the counter is updated using variable increments selected from a set D = $\{v_1, v_2, \ldots, v_\ell\}$ . We use again two sets of k hash functions,  $H = \{h_1, \ldots, h_k\}$  and  $G = \{g_1, \ldots, g_k\}$ . Upon insertion, at each corresponding array position  $h_i(x)$ , the counter is incremented by the element  $v_{g_i(x)}$  of the set D. Likewise, upon deletion, the counter  $h_i(x)$  is decremented by  $v_{g_i(x)} \in D$ .

When an element is inserted, we use the "update" function in Algorithm 1 and increment the counter with the corresponding value. When we want to query whether the element ybelongs to the set, we see whether the values of the k hashed entry of y can be the sum of the increment. Specifically, let y be an element whose *i*-th hash function  $h_i(y)$  hashes into an entry of value  $c_i$ . If there exists *i* such that  $(c_i - v_{g_i(y)}) \in$  $(-\infty, -1] \cup [1, L-1]$ , then  $y \notin S$ . If there is no such *i*, we say that  $y \in S$ . Here the value  $c_i$  is *c* in Algorithm 1, which is derived by the "read" function.

# V. THEORETICAL ANALYSIS

In this section, we analyze the improvements that using SEAD Counter for a given amount of memory. We also compare the performance of the Static Sign Bits and the Dynamic Sign Bits version. The result shows that the Dynamic Sign Bits version of SEAD Counter solves the problem of errors introduced at an early stage with the Static Sign Bits version SEAD Counter.

#### A. More Capacities

Suppose a counter has n bits. Let's calculate the capacities of a regular and a self-adaptive counter. The capacity of a regular counter is  $2^n$ . In the Static Sign Bits version of the SEAD Counter, the capacity is raised to  $\sum_{i=0}^{2^s-1} (\gamma[i] \times 2^{n-s})$ , where s is the length of the sign part. In the Dynamic Sign Bits version, the capacity is  $\sum_{i=0}^{n-2} (\gamma[i] \times 2^{n-i-1})$ . For simplicity, we choose  $\gamma[i] = m^i$  as an example: with k bits, for the Static Sign Bits version to reach a capacity of  $2^n$ , we have:

$$2^{n} = 2^{k-s} \times \left(\frac{m^{2^{s}} - 1}{m - 1}\right)$$

Since k must be an integer, we can solve:

$$k = n + s - \lfloor \log_2(\frac{m^{2^*} - 1}{m - 1}) \rfloor \tag{4}$$

For example, if we set the length of the sign part to s = 2, and the expansion array to  $\gamma[i] = 4^i$ , the resulting space will be reduced to k = n - 4. That means that using the static version of SEAD Counter technique, four bits can be saved to reach the same capacity as a regular counter.

If we set the expansion array to  $\gamma[i] = 2^i$  in the Dynamic Sign Bits Version, we have:

$$2^n = 2^{k-1} \times (k-1)$$

For example, if we choose n = 16, then we have k = 13 and 3 bits can be saved.

For the Dynamic Sign Bits Version to reach a capacity of  $2^n$  with  $m \neq 2$ , we have:

$$2^{n} = 2^{k-1} \times \frac{\left(\frac{m}{2}\right)^{k-1} - 1}{\frac{m}{2} - 1}, m \neq 2$$

If we set the expansion array to  $\gamma[i] = 4^i$ , we have:

$$k = \lceil \frac{n}{2} \rceil + 1 \tag{5}$$

With the Dynamic Sign Bits Version, almost half of the space can be saved to reach the same capacity as for a regular counter.

#### B. Upper-Bound of Relative Error

To evaluate the accuracy of the SEAD Counter, we prove the unbiasness of SEAD Counter and give the analysis of its variance and relative error.

1) unbiasness: In each stage[s], the probability of incoming packet added to the SEAD Counter remains the same:  $p = \frac{1}{\gamma[s]}$ . Suppose there are t packet arrivals in this stage, we define X as the increments of the actual value stored in the counter, and we define V(X) as the increments of the represented value of X. X follows a Binomial distribution:  $X \sim B(t, \frac{1}{\gamma[s]})$ . Deriving the expectation of V(X) is not difficult:

$$E(V(X)) = E(\gamma[s] \times X)$$
  
=  $\gamma[s] \times E(X)$   
=  $\gamma[s] \times \frac{t}{\gamma[s]} = t$ 

This means that increments are unbiased in each stage, and the counter estimation is thus unbiased.

IEEE/ACM TRANSACTIONS ON NETWORKING

2) variance and relative error: we discuss the quality of estimation mainly in terms of the root mean squared relative error (RMSRE), or relative error in short. For each stage[s], their increments and decrements processes can be considered independent. Then using the same notation in 1), for stage[s],

$$var(V(X)) = E(V(X)^{2}) - E^{2}(V(X))$$
  
=  $E(\gamma^{2}[s] \times X^{2}) - t^{2}$   
=  $\gamma^{2}[s]E(X^{2}) - t^{2}$   
=  $\gamma^{2}[s](Var(X) + E^{2}(X)) - t^{2}$   
=  $\gamma^{2}[s](\frac{t(\gamma[s] - 1)}{\gamma^{2}[s]} + \frac{t^{2}}{\gamma^{2}[s]}) - t^{2}$   
=  $t(\gamma[s] - 1)$ 

We define  $\hat{Y}$  as the random variable representing the estimation value of a flow after Y packets have arrived. The root mean square relative error (RMSRE) is

$$RMSRE[Y] = \sqrt{\mathbb{E}\left[\left(\frac{\hat{Y} - Y}{Y}\right)^2\right]}$$

We give a similar proof as in [41]. Let  $Q_l(Y)$  be the probability to have a value *l* resided in the counter given that exactly *Y* packets have arrived at the flow. As mentioned before, the estimator is unbiased, thus

$$\sum_{l} Q_l(Y)V(l) = Y.$$
(6)

In order to calculate the relative error, we should first find the variance of the estimation value. The Static Sign Bits version can accurately record  $[0, 2^{(n-s-1)}]$ , while the Dynamic Sign Bits version can accurately record  $[0, 2^{n-1}]$ , where *n* is the number of bits in a counter and *s* is the length of the sign part in the static version. Therefore, when  $Y \leq 2^{(n-s-1)}$  for Static Sign Bits version and  $Y \leq 2^{(n-1)}$  for Dynamic Sign Bits version, the relative error is also 0. For greater values, we already know its mean, so let us find

$$\mathbb{E}\left[\hat{Y}^2\right] = \sum_l Q_l(Y)V^2(l).$$

We first compute  $\mathbb{E}\left[\left(\hat{Y}+1\right)^2 - \hat{Y}^2\right]$ . If the counter's value is l and it's at stage  $s_l$ , when a packet arrives the counter's value is incremented with probability  $\frac{1}{\gamma[s_l]}$  or remains unchanged with probability  $1 - \frac{1}{\gamma[s_l]}$ . Therefore,

$$\mathbb{E}\left[\left(\hat{Y}+1\right)^{2}-\hat{Y}^{2}\right] = \sum_{l}\left(V(l+1)^{2}-V(l)^{2}\right)\cdot\frac{1}{\gamma[s_{l}]}Q_{l}(Y)$$
$$= \sum_{l}\left(V(l+1)+V(l)\right)Q_{l}(Y)$$
$$= \sum_{l}\left(2V(l)+\gamma[s_{l}]\right)Q_{l}(Y)$$
$$= \sum_{l}2V(l)Q_{l}(Y)+\gamma[s_{l}]Q_{l}(Y)$$
$$= 2Y + \sum_{l}\gamma[s_{l}]Q_{l}(Y)$$
(7)

Therefore,

$$\mathbb{E}\left[\hat{Y}^{2}\right] = \sum_{\substack{i=0\\Y-1}}^{Y-1} \left(\mathbb{E}\left[(\widehat{i+1})^{2}\right] - \mathbb{E}\left[(\widehat{i})^{2}\right]\right) + \mathbb{E}\left[\widehat{0}^{2}\right]$$
$$= \sum_{\substack{i=0\\I=0}}^{Y-1} (2i + \sum_{l} \gamma[s_{l}]Q_{l}(i))$$
$$= Y(Y-1) + \sum_{\substack{i=0\\I=0}}^{Y-1} \sum_{l} \gamma[s_{l}]Q_{l}(i)$$
(8)

We define the expected counter's value of i as  $l_i$ . As the stage value remain constant in a relative long interval, we can assume  $l_i$ 's stage is approximately no larger than  $s_{l_i} + 1$ . Suppose  $a_k = \gamma[s_{l_i} + 2 - k], \ b_k = Q_l(i), \ k = 2, \dots, s_{l_i} + 2, a_1 = \gamma[s_{l_i} + 1], b_1 = \sum_{l=l_i}^{i} Q_l(i)$ , we can know  $a_k$  is nonincreasing and nonnegative,  $\sum_{l=1}^{k} Q_l(i) \leq 1$ . Then using Abel's inequality,

$$\mathbb{E}\left[\hat{Y}^{2}\right] = Y(Y-1) + \sum_{i=0}^{Y-1} \sum_{l} \gamma[s_{l}]Q_{l}(i)$$
  
$$\lesssim Y(Y-1) + \sum_{i=0}^{Y-1} \gamma[s_{l_{i}}+1] \times 1$$
  
$$\leq Y(Y-1) + Y(\gamma[s_{l_{Y}}+1])$$
(9)

Hence, the variance is

$$\mathbb{V}\left[\hat{Y}\right] = \mathbb{E}\left[\hat{Y}^2\right] - \mathbb{E}\left[\hat{Y}\right]^2 \le Y(\gamma[s_{l_Y} + 1] - 1) \quad (10)$$

and the relative error is

$$RMSRE[Y] = \sqrt{\frac{\mathbb{V}\left[\hat{Y}\right]}{Y^2}} \le \sqrt{\frac{(\gamma[s_{l_Y}+1]-1)}{Y}} \quad (11)$$

As  $\gamma[s_{l_Y}]$  is relevant with Y, to show RMSRE[Y] is bounded, we choose  $\gamma[i] = m^i, m = 2$  under Dynamic Sign Bits Version as an example to simplify the analysis. Suppose  $s_{l_Y} = i$ , from (3),

$$RMSRE[Y] \le \sqrt{\frac{(\gamma[s_{l_Y} + 1] - 1)}{Y}} \\ \le \sqrt{\frac{2^{i+1} - 1}{i \times 2^{n-1}}} \le \sqrt{\frac{1}{i \times 2^{n-i-2}}}$$
(12)

As *RHS* of (12) is nondecreasing, we know when Y gets bigger, its *RMSRE* value becomes larger. However, it's still bounded by  $\sqrt{\frac{1}{n-2}}$  (when i = n-2). When  $m \neq 2$ , the upper bound is similar:

$$RMSRE[Y] \le \sqrt{\frac{(\gamma[s_{l_Y} + 1] - 1)}{Y}} \le \sqrt{\frac{m^{i+1} - 1}{2^{n-1} \times \frac{(\frac{m}{2})^{i+1} - 1}{\frac{m}{2} - 1}}} \le \sqrt{\frac{m}{2^{n-i}}}$$
(13)

# C. Advantage of Dynamic Sign Bits Version

In real applications, the first entry of the expansion array  $\gamma[0]$  is always set to 1, which means that the SEAD Counter will accurately record values when the sign part is 0. Since there is enough memory to record accurate values when they are small, we shouldn't do worse than normal counters when counting small values. If we use  $\gamma[0]$  greater than 1,

the counter values are inaccurate even for small values, which means that the result is worse than that of normal counters and this is not acceptable. The Static Sign Bits version can accurately record  $[0, 2^{(n-s-1)}]$ , while the Dynamic Sign Bits version can accurately record  $[0, 2^{n-1}]$ , where *n* is the number of bits in a counter and *s* is the length of the sign part in the static version.

If we want these two versions to have the same number of stages, then  $2^s \approx n - 1$ . Typically, for a counter of 8 bits, n = 8 and  $s = log_2(8 - 1) \approx 3$ . Then, stage[1] = 16 for the Static Sign Bits version, and stage[1] = 64 for the Dynamic Sign Bits version. A larger stage[1] enables the Dynamic Sign Bits version to have a larger exact counting range. As a result, it is more accurate when recording the size of mouse flows.

At early stages of the SEAD Counter, when it is more likely to get accurate results, there is more capacity in the Dynamic Sign Bits Version than in the Static Sign Bits version. At later stages of the SEAD Counter on the other hand, when it is more likely to get inaccurate results, there is less capacity in the Dynamic Sign Bits Version than in the Static Sign Bits version. This property in terms of accuracy is a way to explain the advantage of the Dynamic Sign Bits Version of the SEAD Counter.

#### D. Computational Model of SEAD Counter on Sketches

For a vector a with dimension n, we define its current state at time t as  $[a_1(t), \dots, a_i(t), \dots, a_n(t)]$ . Initially, a is set to 0 and  $a_i(t)$  is 0 for all i. The *t*-th update to the individual entries of the vector is presented as pair  $(i_t, c_t)$ , which means,

$$a_{i_t}(t) = a_{i_{t-1}}(t-1) + c_t$$
  
$$a_{i'_t}(t) = a_{i'_{t-1}}(t-1) \qquad i'_t \neq i_t$$
(14)

In some cases,  $c_t$ s are always strictly positive, which means the entries will only increase. we call this model as *cash register* model. In other cases,  $c_t$  could be positive or negative. We call it as *turnstile* model. Under the *turnstile* model, if all  $a_i(t)$ s are non-negative for all time, we call this case as *non* – *negative* case; if  $a_i(t)$ s may be negative, we call this case as *general* case. Different models correspond to their specific scenarios and therefore figuring out the models sketch can be applied is important. For example, CM sketch works in the non-negative turnstile model while C sketch works in the general turnstile model.

When applying SEAD Counter on sketches, it will introduce its new error to the sketches. Therefore, we should determine when replacing counters with estimators, what kinds of computational model each sketch works on. For estimators, when we delete some elements, we will introduce probabilistic updates. After certain times of deletion, the error may not be proportional to the norm of the size-vector. Therefore, estimators don't satisfy the requirements of turnstile model. As for estimators in the cash register model, their error introduced by insertion is bounded by the norm of the sizevector. Therefore, we conclude estimators work in the cash register model.

For the error bounds of SEAD Counter on CM sketch, we give a similar theorem like Theorem 1 in [18]. To simplify the analysis, we only consider the Dynamic Sign Bits Version here.

Theorem 1: Let  $\|\mathbf{a}\|_1$  denote the sum of all entries of  $\mathbf{a}$ and s denote the stage number such that  $stage[s] < a_i + \frac{\|\mathbf{a}\|_1}{2e} < stage[s+1]$ . Given a small variable  $\epsilon$  no less than  $min\{\sqrt{\frac{m}{2^{n-s-2}}}, \frac{4e(max_{i=1,\dots,n}a_i)}{\|a\|_1}\}$ , the estimate  $\hat{a}_i$  of SEAD Counter on CM sketch( $w \times d$  counters) has the following guarantees: with probability at least  $1 - \delta$ ,

$$|a_i - \hat{a}_i| \le \epsilon \left\| \boldsymbol{a} \right\|_1 \tag{15}$$

*Proof:* Firstly, as the original CM sketch always overestimates  $a_i$ , we only need to proof the direction with larger error:  $\hat{a}_i - a_i \leq ||\boldsymbol{a}||_1$ . Besides, as SEAD Counter has no error before the first  $2^{n-1}$  updates, we can assume  $||\boldsymbol{a}||_1 > 2^{n-1}$ .

We introduce indicator variables  $I_{i,j,k}$ , which equals to 1 if  $(i \neq k) \land (h_j(i) == h_j(k))$  and 0 otherwise. As hash functions are pairwise independent, then by setting  $w = \frac{4e}{\epsilon}$ , we have

$$\mathbb{E}(I_{i,j,k} = Pr[h_j(i) = h_j(k)]) \le \frac{1}{range(h_j)} = \frac{\epsilon}{4e}$$

Define the random variable  $X_{i,j}$  as  $\sum_{k=1}^{n} I_{i,j,k} a_k$  and we can see  $X_{i,j} > 0$  as all  $a_k > 0$ . Define  $Z_j = count[j, h_j(i)]$  as the value which should be counted in the place  $(j, h_j(i))$  of the hash table and we have  $Z_j = a_i + X_{i,j}$ . Clearly, min  $Z_j > a_i$ . Then,

$$\mathbb{E}(X_{i,j}) = \mathbb{E}(\sum_{k=1}^{n} I_{i,j,k} a_k) \le \sum_{k=1}^{n} a_k \mathbb{E}(I_{i,j,k}) \le \frac{\epsilon}{4e} \|\boldsymbol{a}\|_1$$

Define  $\widehat{Z_j} = count[j, h_j(i)]$  as the estimation value after  $count[j, h_j(i)]$  has been counted by the SEAD Counter, we have

$$Pr(\hat{a}_{i} > a_{i} + \epsilon \|\boldsymbol{a}\|_{1})$$

$$= Pr(\forall j, \ \widehat{Z}_{j} > a_{i} + \epsilon \|\boldsymbol{a}\|_{1})$$

$$\leq Pr(\forall j, \ \widehat{Z}_{j} - Z_{j} + a_{i} + X_{i,j} > a_{i} + \epsilon \|\boldsymbol{a}\|_{1})$$

$$\leq Pr(\forall j, \ \widehat{Z}_{j} - Z_{j} > \frac{\epsilon}{2} \|\boldsymbol{a}\|_{1} \lor X_{i,j} > \frac{\epsilon}{2} \|\boldsymbol{a}\|_{1}) \quad (16)$$

For the former probability in the RHS of (16), by the Chebshev inequality,

$$Pr(\widehat{Z_j} - Z_j > \frac{\epsilon}{2} \|\boldsymbol{a}\|_1) \leq \frac{4\mathbb{V}(Z_j)}{\epsilon^2 \|\boldsymbol{a}\|_1^2} \leq \frac{4Z_j(\gamma[s] - 1)}{\epsilon^2 \|\boldsymbol{a}\|_1^2}$$
$$\leq \frac{4}{\epsilon^2} \cdot \frac{(a_i + X_{i,j})^2}{\|\boldsymbol{a}\|_1^2} \cdot \frac{\gamma[s] - 1}{Z_j}$$
$$\lesssim \frac{4}{\epsilon^2} \cdot (\frac{\epsilon}{4e} + \frac{\epsilon}{4e})^2 \cdot \frac{m}{2^{n-s}}$$
$$= \frac{1}{2e} \cdot \frac{m}{e \cdot 2^{n-s-1}} \lesssim \frac{1}{2e}$$
(17)

For the latter probability in the RHS of (16), by the Markov inequality,

$$Pr(X_{i,j} > \frac{\epsilon}{2} \|\boldsymbol{a}\|_1) < Pr(X_{i,j} > e\mathbb{E}(X_{i,j})) < \frac{1}{2e}$$
 (18)

Therefore, if we choose  $\delta = ln(1/d)$ 

RHS of (16) 
$$\leq (\frac{1}{2e} + \frac{1}{2e})^d = e^{-d} = \delta$$
 (19)

For the error bounds of C sketch, as the variance brought by SEAD technique (inequality (10)) is proportionally to the square of L1 norm, the L2 accuracy guarantee in [20] may be not satisfied. When applying other estimators like ICEBuckets on C sketch, as their variances are also  $O(||\boldsymbol{a}||_1^2)$ , they can only provide L1 accuracy guarantee. Therefore, when using C sketch and requiring L2 accuracy guarantee (typically enough memory space), one should use the original counter instead.

From the proof of SEAD on CM sketch above, some inequalities will not be satisfied when extreme cases happens. When applying counter estimation strategies (SEAD, ICEBuckets and etc.) to the sketches, we should be careful to avoid some extreme cases theoretically. For example, a large number of updates always come to the same entry, which means  $a_i = ||a||_1$ . However, in real network datasets, these extreme cases will not happen. We also know from Theorem 1, there is a lower error bound for  $\epsilon$  when counting large flows. Therefore, counter estimation strategies should only be used when the memory is limited and they have no advantage in scenarios where there are sufficient memories.

# VI. EXPERIMENTAL RESULTS

We applied our SEAD Counter to the task of flow measurements and set representation, where we use our SEAD Counter in sketches, as estimators, and in Bloom filters to evaluate the performance on real applications utilizing approximate counters. In this section, we start by evaluating the SEAD Counter technique on a real-world dataset, by comparing the Average Relative Error (ARE) and the Average Absolute Error (AAE) across (1) sketches without the SEAD Counter, (2) with the SEAD Counter and (3) Counter-Tree. Then, we generate a synthetic dataset which follows the Zipf distribution. We study how the ARE and AAE values change when we vary parameters (defined in Section VI-B). We also compare with other state-of-the-art methods on two large datasets. Finally, we extend our SEAD Counter to CBF and VI-CBF, and evaluate these on two real-world datasets by comparing the false positive rate (FPR) among Bloom filters without SEAD Counter and with SEAD Counter. We have opensourced our codes on Github [53].

# A. Experimental Setup

#### 1) Datasets:

a) IP Trace Datasets: We use the anonymous IP traces collected in 2016 from CAIDA [54]. In the experiments, a five-tuple is used as the ID of a flow, which includes source IP address, destination IP address, source port, destination port, and protocol. Each arriving packet consists of a certain amount of bytes. We consider about 250K packets for our experiment. To provide an evaluation of traces that are several orders of magnitude longer, we also use the anonymous IP traces collected in 2018 from CAIDA, for which we consider 30M packets and 1.46M flows for our experiment, and we denote the dataset as CAIDA-Large.

b) Kosarac Dataset: We use this dataset to evaluate our technique in Bloom filters. The Kosarac dataset contains (anonymized) click-stream data of a Hungarian on-line news portal and it is downloaded from [55]. It contains 41270 distinct items and around 8M clicks.

c) Synthetic Datasets: Since our goal is to find out how well SEAD Counter sketch performs on datasets with different characteristics, we also generate synthetic datasets following the **Zipf** [56] distribution  $(p(x) = \frac{x^{-a}}{\zeta(a)})$  with different total flow sizes F (1M to 10M), different numbers of packets  $N_d$ , and different a (from 0 to 3.0 with a step of 0.3). The flow size is the number of drawn samples from the parameterized Zipf distribution. The domain from which the elements are sampled is  $N_d$ . We generate them using a performance testing tool, the Web Polygraph [57].

*d*) Datacenter Datasets: We use the datacenter trace from [58]. In the experiments, we also use the five-tuple as the ID of a flow, which includes source IP address, destination IP address, source port, destination port, and protocol. We consider 30M packets and 8.64M flows for our experiment.

2) Implementation: We have implemented the sketches of CM, CU and C in C++. We apply the SEAD Counter technique to these sketches, and the results are denoted as SEAD CM, SEAD CU, and SEAD C, respectively. For comparison purposes, we also implemented Counter-Tree in C++, denoted as CT in our experiments. The hash functions used in the sketches are implemented using the 32-bit Bob Hash [59] with different initial seeds. When the width is small, we can compute a single hash for different indices. But when the width is large. We can only compute different hashes for different counters. To avoid such issues, we compute different hashes for different counters. For the CM and CU sketches, we set the number of arrays to 3 and use 3 32-bit Bob Hashes. For the C sketch, we set the number of arrays to 10 and use 20 32-bit Bob Hashes. We set the counters to 32 bits in the classical sketches. In the sketches using the SEAD Counter, the size of the counters is reduced to 16 bits. Furthermore, for each experiment on the datasets, we run 10 sub-experiments. We also implement CBF and VI-CBF in C++. For VI-CBF, We adopt the setting in [52] where D is  $D_L = [L, 2L-1]$  and L is set to 4. Therefore, we don't have to save the lookup table and we can rely on a single variable-increment counter per entry, without the additional counter that indicates the number of hashed elements. We apply our method to these Bloom filters, and their results are denoted as SEAD CBF and SEAD VI-CBF. We set the counters to 8 bits in the classical Bloom filters, and 4 bits in the Bloom filters using SEAD Counters.

#### B. Metrics

Average Absolute Error (AAE): AAE is defined as  $\frac{1}{|\phi|} \sum_{i \in [n]} |f_i - \hat{f}_i|$ , where  $f_i$  is the frequency of token *i* that appears in the stream,  $\hat{f}_i$  is the estimated frequency and  $|\Phi|$  is the volume of the set. We use AAE\_E and AAE\_M to denote the AAE of elephant flows and mouse flows respectively, and use AAE to denote the AAE of all flows.

Average Relative Error (ARE): ARE is defined as  $\frac{1}{|\phi|} \sum_{i \in [n]} \frac{|f_i - \hat{f}_i|}{f_i}$ , where  $f_i$  is the frequency of token *i* that appears in the stream,  $\hat{f}_i$  is the estimated frequency and  $|\Phi|$  is the volume of the set. We use ARE\_E and ARE\_M to denote the ARE of elephant flows and mouse flows respectively, and use ARE to denote the ARE of all flows.

**False Positive Rate (FPR):** FPR is defined as the fraction of false positives, *i.e.*, elements which don't appear in the set but are reported to appear by the algorithm.

**Per-Flow Memory Consumption:** This quantity is defined as the overall memory size of the sketch divided by the number of different flows in a data stream.

**Per-Packet Memory Consumption:** Per-Packet Memory consumption is defined as the sketch memory size divided by the number of packets in a data stream. We want to see how AAE and ARE change when we change per-flow memory or per-packet memory, so we explicit plot AAE/ARE as a curve of per-flow memory or per-packet memory.

**Throughput:** Maximum number of insertions that can be processed per second. We use Mega-operations per second (Mops) as the unit of throughput. We measure the processing speed by feeding one packet of a flow at a time.

LIU et al.: SEAD COUNTERS WITH DIFFERENT COUNTING RANGES



Fig. 4. Experiments on CAIDA.

All the experiments about speed are repeated 100 times with t < 0.05 for all the experiments to ensure statistical significance.

**Elephant Flows/Mouse Flows:** We define flows whose size is greater than 99% of the flows as elephant flows, and we define other flows as mouse flows.

# C. Effects of SEAD Counter Technique on Sketches

1) Flow Size Measurements on CAIDA Dataset: Since the Counter-Tree does not support flow volume measurements, we only compare our technique with the original sketch in the ARE and AAE experiments.

Effect of SEAD Counter on CM Sketch's ARE and AAE (Figures 4(a) and 4(d)): The range of the memory size in this experiment varies from 125KB to 1000KB. Here, the original CM sketch is compared to the CM sketch using the SEAD Counter (Dynamic Sign Bits version). We plot the ARE and AAE (in log scale) as a function of memory size. Our results show that when the memory is 1000KB, the original CM sketch has 8.7 times higher ARE than the CM sketch using the SEAD Counter. When the memory is below 600KB, the CM sketch using the SEAD Counter. When the memory is below 600KB, the CM sketch using the SEAD Counter have both lower ARE and AAE.

Effect of SEAD Counter on C Sketch's ARE and AAE (Figures 4(b) and 4(e)): Our experimental results show that the original C sketch has 15.98 times higher ARE and 4.49 times higher AAE than C sketch using SEAD Counter, when the memory is 1000KB. The SEAD Counter technique successfully reduces the ARE and AAE in the case of the C sketch. Comparing these results with those from the CM sketch, we can see that the CM sketch gives a better estimation of flow size than the C sketch for the same memory size.

Effect of SEAD Counter on CU Sketch's ARE and AAE (Figures 4(c) and 4(f)): Our experimental results show that the original CU sketch has 10.9 times higher ARE than the CU sketch using the SEAD Counter when the memory is 1000KB. The SEAD Counter technique successfully reduces the ARE and AAE under small memory consumption in the case of the CU sketch. Compared to the results of the CM and C sketches, we see that the CU sketch using the SEAD Counter has the best performance of all.

Throughput of CM Sketch Using SEAD Counter (Figure 4(g) and 4(h)): We find that when memory consumption is 1000KB, the throughput (insert) of the Counter-Tree is

the highest and is  $2.5 \sim 2.9$  times higher than that of CM sketch using the SEAD Counter on CAIDA dataset: Counter-Tree only needs one hash calculation in each insertion. However, to recover the counts from the Counter-Tree, it needs more hash functions to scatter the position of each insertion and its query is much slower. When memory consumption is 1000KB, the throughput (query) of the original CM sketch is the highest.

**Conclusion:** In this set of experiments, we tested our SEAD Counter technique for flow volume measurements. We found that the CU sketch using the SEAD Counter has the best performance of all sketches, hence we suggest its use for flow volume measurements.

2) Flow Size Measurements on Synthetic Datasets: In this experiment, we generate datasets following the Zipf distribution  $(p(x) = \frac{x^{-a}}{\zeta(a)})$  with different flow sizes and varying a (ranging from 0 to 3.0). Here the task consists in measuring the flow size of each flow. We compare the ARE and AAE of the original sketches, of sketches using the SEAD Counter, and of Counter-Tree.

Effect of SEAD Counter on CM Sketch's ARE and AAE (Figures 5(a) and 5(d)): We find that when the memory is 1000KB, the original CM sketch has 2.5 times higher ARE and AAE than the CM sketch using the SEAD Counter, while the Counter-Tree has an ARE 5.3 times higher than the CM sketch using the SEAD Counter. As the memory consumption decreases, the ARE and AAE of Counter-Tree gradually go down compared to the one of the CM sketch. The ARE and AAE of the CM sketch using the SEAD Counter are always the lowest.

Effect of SEAD Counter on C Sketch's ARE and AAE (Figures 5(b) and 5(e)): We observe that when the memory is 1000KB, the original C sketch has 1.79 times higher ARE and AAE than the C sketch using SEAD Counter, while the Counter-Tree has an ARE 13.6 times higher than the C sketch using SEAD Counter. The ARE and AAE of the C sketch using the SEAD Counter are the lowest for all considered memory sizes. Note that compared to Counter-Tree, the C sketch using the SEAD Counter improves accuracy by one order of magnitude.

Effect of SEAD Counter on CU Sketch's ARE and AAE (Figure 5(c) and 5(f)): We find that when the memory is 1000KB, the original CU sketch has 2.7 times higher ARE and AAE than the CU sketch using SEAD Counter while



Fig. 5. Experiments on Synthetic Datasets. (skewness is 1.2 except (g)).



Fig. 6. Effect of per-packet memory on ARE and AAE using synthetic datasets.

Counter-Tree has an ARE 10.6 times higher than the CU sketch using the SEAD Counter. As the memory consumption decreases, the ARE and AAE of Counter-Tree gradually become close to the one of the CU sketch. However, the ARE and AAE of the CM sketch using the SEAD Counter are always lowest.

CU sketch Using SEAD Counter's ARE on Synthetic Datasets With Different Skewness (Figure 5(g)): We find that the ARE decreases as skewness increases. The ARE for a skewness of 0 is 9.4 times higher than the ARE when skewness is 2.1. This means the CU sketch using the SEAD Counter is also accurate with very skewed datasets.

**Conclusion:** In this set of experiments, we tested the SEAD Counter technique for flow size measurements. In skewed traces, there are large number of mouse flows and the Counter-Tree's variance behaves not well for mouse flows: larger than  $flow \ size \times (r - 1)$  (r is one parameter much larger than 1) [30]. Therefore, for the Counter-Tree, their AAE and ARE is larger than other sketches. We found that the C sketch using the SEAD Counter has the best performance of all considered sketches, hence we suggest its use for flow size measurements.

Effect of Per-Packet Memory Size on C Sketch's ARE and AAE With Fixed a and Per-Flow Memory Consumption (Figures 6(a) and 6(c)): We find that for the C sketch, the change of per-packet memory affects ARE in a limited way compared to per-flow memory consumption. The AAE of both the C sketch and the C sketch using SEAD Counter drops with the increase of per-flow memory consumption. The original C sketch has 2 to 3 times larger AAE than SEAD Counter sketches.

Effect of Memory Size on CM Sketch's ARE and AAE With Fixed a and Per-Flow Memory Consumption (Figures 6(b) and 6(d)): We find that the original CM sketch has larger ARE and AAE values compared to the CM sketch

using SEAD Counter. The AAE of both the CM sketch and the CM sketch using SEAD Counter drop with the increase of per-flow memory consumption. The original C sketch has more than 10 times larger AAE than SEAD Counter sketches.

**Conclusion:** We found that the SEAD Counter technique effectively adapted to different counting ranges. For the C sketch and the CM sketch, the change of per-packet memory consumption affects ARE in a limited way compared to the change of per-flow memory. However, AAE drops with the increase of per-packet memory size. Under the same per-packet memory size, the ARE increases at least 10% when per-flow memory changes from  $48 \times 10^{-4}$ B to  $24 \times 10^{-4}$ B or  $24 \times 10^{-4}$ B to  $16 \times 10^{-4}$ B. Indeed, it shows that in a data stream with many distinct packets, the SEAD Counter technique can improve the accuracy under small per-flow memory size.

Effect of the Different Versions of the CM Sketch Using SEAD Counter on ARE and AAE (Figures 7(a) and 7(c)): In this experiment, we set the memory size of the sketches to be constant and vary the per-flow memory consumption.

We find that the ARE and AAE of the Static Sign Bits version of the CM sketch with the length of sign bits s = 3 is 2.43 times higher than the one of the Dynamic version on average.

Effect of the Different Versions of C Sketch Using SEAD Counter on ARE and AAE (Figures 7(c) and 7(d)): We find that when the per-flow memory consumption is 4B, the ARE and AAE of the Static Sign Bits version C sketch with the length of sign bits s = 3 is 1.25 times higher than the one of the Dynamic version on average. The ARE and AAE of the CM sketch using SEAD Counter in both versions drops when per-flow memory increases. Under any per-flow memory consumption, the ARE of the Dynamic Sign Bits version CM sketch is smaller.

LIU et al.: SEAD COUNTERS WITH DIFFERENT COUNTING RANGES



(a) Two versions of the CM sketch (b) Two versions of the C sketch using (c) Two versions of the CM sketch (d) Two versions of the C sketch using SEAD (ARE). SEAD (ARE). SEAD (ARE).

Fig. 7. Comparison between the two versions of SEAD on synthetic datasets.

|               | AAE   | AAE_E  | AAE_M | ARE   | ARE_E  | ARE_M | Throughput(insert) | Throughput(query) |
|---------------|-------|--------|-------|-------|--------|-------|--------------------|-------------------|
| CU            | 14.35 | 474.13 | 9.71  | 6.64  | 0.032  | 6.70  | 7.20 Mops          | 10.34 Mops        |
| SEAD-CU       | 8.35  | 32.12  | 8.11  | 5.52  | 0.0042 | 5.58  | 6.84 Mops          | 9.88 Mops         |
| SAC-CU        | 13.23 | 521.82 | 8.09  | 5.51  | 0.23   | 5.57  | 6.02 Mops          | 5.88 Mops         |
| ICEBuckets-CU | 10.09 | 194.40 | 8.23  | 5.55  | 0.018  | 5.60  | 2.76 Mops          | 3.33 Mops         |
| PCU           | 10.81 | 51.83  | 10.40 | 6.51  | 0.083  | 6.58  | 9.93 Mops          | 10.60 Mops        |
| СМ            | 22.79 | 489.74 | 18.08 | 11.10 | 0.064  | 11.21 | 8.06 Mops          | 9.94 Mops         |
| SEAD-CM       | 14.08 | 33.60  | 13.88 | 8.52  | 0.028  | 8.60  | 6.56 Mops          | 9.91 Mops         |
| SAC-CM        | 14.34 | 60.14  | 13.88 | 8.52  | 0.036  | 8.60  | 5.62 Mops          | 5.81 Mops         |
| ICEBuckets-CM | 14.24 | 23.19  | 14.15 | 8.68  | 0.028  | 8.77  | 3.47 Mops          | 3.32 Mops         |
| PCM           | 14.64 | 53.74  | 14.26 | 11.89 | 0.106  | 12.01 | 10.48 Mops         | 10.06 Mops        |

TABLE II Experiments of sketches on CAIDA-Large



Fig. 8. Detailed experiment with synthetic datasets.

**Conclusion:** This set of experiments showed that the Dynamic Sign Bits version has better performance than the Static one under different per-flow memory consumption.

Effect of Per-Flow Memory Consumption on ARE and AAE (Figures 8(a) and 8(b)): In this experiment, we vary per-flow memory consumption between 1.3B and 4B. The parameter *a* is set to 0 and the per-packet memory consumption is fixed to 0.0016B. We find that the ARE and AAE of original sketches is more impacted by the decrease of per-flow memory consumption than the one of SEAD Counter sketches. When the per-flow memory consumption drops to 1.3B, the ratio between the ARE of the original sketch and of the SEAD Counter sketch is 10.83 for the CM sketch, 22.93 for the CU sketch and 2.29 for the C sketch. This result is consistent with the conclusion we drew in the last section.

3) Flow Size Measurements on CAIDA-Large and Datacenter: We compare the ARE and AAE of the original sketches, of sketches using the SEAD Counter, SAC [38], ICEBuckets [41], and pyramid sketches [36]. We use 12.5 bits per counter for ICEBuckets and 4 bits per counter for Pyramid Sketches, which are the original settings in their papers. We use 12 bits per counter for other methods. The result is shown in Table II and III.

Effect of the Different Versions of the CU Sketch on CAIDA-Large (Table II): In this experiment, we set the memory size of the sketches to be 1MB. Here, the original CU sketch is compared to the CU sketch using the SEAD Counter (Dynamic Sign Bits version), SAC, ICEBuckets, and the Pyramid CU sketch. Our results show that when the

memory is 1MB, the CU sketch using the SEAD Counter has a lower AAE on all flows, and has a lower AAE and ARE on elephant flows. Though AAE\_M, ARE, and ARE\_E are larger, they are very close to the best result achieved by SAC. Besides, the insert and query throughput is much higher than that of SAC. PCU has the highest throughput among other methods. This is because PCU uses word acceleration and Ostrich Policy to achieve a higher insertion and query speed. Among the methods that simply replace counters, our method has a higher insertion and query speed.

Effect of the Different Versions of the CM Sketch on CAIDA-Large (Table II): In this experiment, we set the memory size of the sketches to be 1MB. Our results show that when the memory is 1MB, the CM sketch using the SEAD Counter has both a lower AAE and ARE on all flows, and the highest throughput among the sketches that simply replacing counters. The CM sketch using ICEBuckets has a lower AAE\_E. Among the sketches that simply replacing counters, our SEAD Counter has a lower AAE and ARE, and higher insertion and query speed. PCM has the highest throughput among other methods.

Effect of the Different Versions of the CU Sketch on Datacenter Trace (Table III): In this experiment, we set the memory size of the sketches to be 1MB. Our results show that when the memory is 1MB, PCU has both a lower AAE and ARE on all flows, and the highest throughput among other methods. This is because of their Ostrich policy where they ignore the second and higher layers when getting the reported values of the d mapped counters. Therefore, they increment the smallest counter(s) with high probability to achieve better accuracy. Among the sketches that simply replacing counters, our SEAD Counter has a lower AAE and ARE, and higher insertion and query speed. Due to high skewness of this dataset, for ICEBuckets, most flows with a few packets (including some elephant flows in this dataset) are affected by the adjustment of its parameter  $\epsilon$  brought by the largest flows in their buckets. Under such a circumstance (highly skewed data streams), our SEAD technique achieves better accuracy.

|               | AAE    | AAE_E  | AAE_M  | ARE    | ARE_E | ARE_M  | Throughput(insert) | Throughput(query) |
|---------------|--------|--------|--------|--------|-------|--------|--------------------|-------------------|
| CU            | 67.50  | 40.79  | 67.77  | 48.05  | 1.05  | 48.53  | 6.36 Mops          | 12.60 Mops        |
| SEAD-CU       | 53.58  | 25.43  | 53.86  | 38.33  | 0.73  | 38.70  | 6.07 Mops          | 9.97 Mops         |
| SAC-CU        | 53.60  | 27.74  | 53.86  | 38.33  | 0.73  | 38.70  | 4.47 Mops          | 5.58 Mops         |
| ICEBuckets-CU | 56.10  | 29.21  | 56.38  | 40.08  | 0.79  | 40.48  | 2.80 Mops          | 3.48 Mops         |
| PCU           | 30.95  | 21.69  | 31.04  | 22.09  | 0.62  | 22.31  | 9.29 Mops          | 10.73 Mops        |
| СМ            | 126.89 | 130.79 | 126.85 | 87.94  | 3.59  | 88.79  | 6.16 Mops          | 10.60 Mops        |
| SEAD-CM       | 93.82  | 94.00  | 93.82  | 65.04  | 2.66  | 65.67  | 6.05 Mops          | 9.94 Mops         |
| SAC-CM        | 93.82  | 94.03  | 93.82  | 65.04  | 2.66  | 65.67  | 4.25 Mops          | 5.86 Mops         |
| ICEBuckets-CM | 98.34  | 98.45  | 98.33  | 68.17  | 2.78  | 68.82  | 3.06 Mops          | 3.48 Mops         |
| PCM           | 174.88 | 175.74 | 174.87 | 121.17 | 4.98  | 122.34 | 10.07 Mops         | 10.09 Mops        |

TABLE III EXPERIMENTS OF SKETCHES ON DATACENTER



(a) Comparison of false positive rate (b) Comparison of false positive rate (c) Comparison of false positive rate (d) Comparison of false positive rate between CBF and SEAD CBF on between VI-CBF and SEAD VI-CBF between CBF and SEAD CBF on between VI-CBF and SEAD VI-CBF CAIDA dataset on CAIDA da

Fig. 9. Comparison between two kinds of bloom filters with and without SEAD Counter on CAIDA and Kosarac dataset.

# D. Effects of SEAD Counter Technique on Bloom Filters

We apply our SEAD Counter technique to both Counting Bloom Filters (CBF) and Variable-Increment Counting Bloom Filters (VI-CBF). We use both the static sign bits version and the dynamic sign bits version. The results are denoted as SEAD CBF and SEAD VI-CBF.

1) Membership Query on CAIDA Dataset: In this experiment, we perform a membership query for each element of the dataset. We considered the same 250K packets in the previous experiments on CAIDA. We compare the false positive rate of the original bloom filters and of the bloom filters using the SEAD Counter.

Effect of the Different Versions of CBF Using SEAD Counter on FPR (Figure 9(a)): The range of the memory size (in bits per element) in this experiment goes from 20 to 55 bits. The original CBF is compared with CBF using the SEAD Counter (Dynamic Sign Bits version). We plot how the FPR changes as a function of memory size. Our results show that the proposed SEAD Counter mechanism can improve upon CBF. CBF with the SEAD Counter yields the better performance, especially for more bits per element. For instance, for 30 bits per element, the original CBF has a false positive rate 10.05 times higher than the CBF with SEAD Counters. Likewise, for 50 bits per element, the SEAD Counters improve the results by 19.12 times.

Effect of the Different Versions of VI-CBF Using SEAD Counter on FPR (Figure 9(b)): We find that, for 55 bits per element, the original VI-CBF has a false positive rate 36.5 times higher than the VI-CBF using SEAD Counters. As the memory consumption decreases, the false positive rate of the original VI-CBF becomes close to the VI-CBF using SEAD Counters. However, the improvement from the SEAD Counter is still significant. We can achieve an order of magnitude improvement even when the number of bits per element is as small as 20 bits per element.

**Conclusion:** This set of experiments showed that the Bloom filters using SEAD Counters has better performance than the original Bloom filters in the task of membership query on the CAIDA dataset.

2) Membership Query on Kosarac Dataset: In this experiment, we perform a membership query for each element in the Kosarac dataset.

Effect of the Different Versions of CBF Using SEAD Counter on FPR (Figure 9(c)): Our results show that the proposed SEAD Counter mechanism can improve upon CBF. For 30 bits per element, the original CBF has a false positive rate 3.17 times higher than the CBF with SEAD Counters. Likewise, for 50 bits per element, the SEAD Counters improve the result by 7.36 times.

Effect of the Different Versions of VI-CBF Using SEAD Counter on FPR (Figure 9(d)): For 55 bits per element, the original VI-CBF has a false positive rate nearly two orders of magnitude higher than the VI-CBF using SEAD Counters. As the memory consumption decreases, the false positive rate of the original VI-CBF becomes close to the VI-CBF using SEAD Counters. However, we can still improve the false positive rate using our technique by 6.24 times.

**Conclusion:** This set of experiments showed that the Bloom filters using SEAD Counters has better performance than the original Bloom filters in the task of membership query on the Kosarac dataset.

# E. Estimators

We use one counter for each flow and estimate the packet number using the counter. We use AAE and ARE as the metrics to evaluate the performance of the counters. We define flows whose size is greater than 99% of the flows as elephant flows, and we define other flows as mouse flows. We compare our SEAD Counters with SAC [38], ICEBuckets [41], and BRICK [42] on CAIDA-Large and the datacenter trace. The result is shown in Table IV and V. It is shown that our SEAD Counter has a greater throughput for both insertion and query. SEAD Counter works better on mouse flows, and thus achieves a smaller ARE than the other two counters. The AAE of SEAD Counter is larger due to larger AAE of elephant flows, which accounts more for AAE. For BRICK, we use the same parameter settings as the evaluations in [42] with 64 counters per bucket, four levels, and a failure probability of

TABLE IV EXPERIMENTS OF ESTIMATORS ON CAIDA-LARGE

|            | AAE      | AAE_E     | AAE_M    | ARE      | ARE_E    | ARE_M    | Throughput(insert) | Throughput(query) |
|------------|----------|-----------|----------|----------|----------|----------|--------------------|-------------------|
| SEAD       | 0.217511 | 21.750474 | 0.000006 | 0.000029 | 0.002237 | 0.000006 | 64.04 Mops         | 16.15 Mops        |
| SAC        | 0.557416 | 55.728089 | 0.000136 | 0.000283 | 0.027915 | 0.000007 | 27.61 Mops         | 16.25 Mops        |
| ICEBuckets | 0.114035 | 9.841142  | 0.015817 | 0.002977 | 0.001786 | 0.002989 | 9.57 Mops          | 13.48 Mops        |

TABLE V

EXPERIMENTS OF ESTIMATORS ON DATACENTER

|            | AAE      | AAE_E    | AAE_M    | ARE      | ARE_E    | ARE_M    | Throughput(insert) | Throughput(query) |
|------------|----------|----------|----------|----------|----------|----------|--------------------|-------------------|
| SEAD       | 0.001457 | 0.145599 | 0.000001 | 0.000001 | 0.000005 | 0.000001 | 15.20 Mops         | 14.48 Mops        |
| SAC        | 0.002761 | 0.275979 | 0.000001 | 0.000001 | 0.000036 | 0.000001 | 12.17 Mops         | 14.27 Mops        |
| ICEBuckets | 0.001124 | 0.104862 | 0.000076 | 0.000012 | 0.000023 | 0.000011 | 6.83 Mops          | 10.23 Mops        |

 $P_f = 10^{-10}$ . The probability is relatively low because BRICK maintains exact active counters. We adopt the same failure probability as in [42]. The total storage space required for the CAIDA-Large is 1.75 MB, and the total required for the datacenter trace is 7.68 MB. Though exact statistics counters could record the value accurately, they are not suitable for data flow measurement where the memory is constrained.

### F. Parameter Settings

The choice of n depends on the circumstances where we use the counters. If we assign n bits for a normal counter, it is also safe to use n bits for SEAD Counter. If the memory is constrained, then we could assign fewer bits to SEAD Counter. As shown in our experimental results, we use half the size as the normal counter and still achieve better performance. Therefore, we empirically suggest that when n bits are required in a normal counter, we could assign  $\frac{n}{2}$  bits in SEAD Counter.

When the counting part is large, SEAD Counter can have a better accuracy when the value is small since it can count more small values accurately. In the other hand, we can count larger values if the sign part is large enough. As for *s*, we would like to recommend to use  $\frac{1}{4}$  of the bits as the sign bits to achieve the balance between the counting range and the counting part. We would highly recommend users to choose dynamic sign bits version since it allows to count more small values accurately than the static sign bits version. What's more, we don't have to tune *s* in this case.

As for the choice of the expansion array, we recommend using  $\gamma[i] = m^i$  though users may choose other expansion arrays. The reasons are twofold. For one thing, it is both easier to implement and faster to process if we choose the expansion array to be a geometric sequence. For the other, it is consistent with the characteristics of common data streams where most flows are small flows. We would like to accurately count the flows when they are small  $(\gamma[0] = 1$  makes SEAD Counter work as a normal counter), and allow for some error when we need to count large flows. We empirically discuss the choice of the expansion array. We choose m to range from 1 to 6 and use SEAD Counter as the estimator on CAIDA-Large. It is shown in Table VI that our SEAD counter achieves the best performance when m = 4. Note that we could achieve better performance by tuning m if possible, and our general expression of the expansion array allows for improvement by more careful design.

#### G. Discussion

The experiments on both sketches and bloom filters show that the SEAD Counter technique is very generic and can be adapted to all kinds of sketches and bloom filters using

TABLE VI

AAE AND ARE ON CAIDA-LARGE FOR DIFFERENT mS

|     | AAE      | ARE      |
|-----|----------|----------|
| m=1 | 4.726804 | 0.000302 |
| m=2 | 1.387580 | 0.000289 |
| m=3 | 0.272841 | 0.000030 |
| 4   | 0.223005 | 0.000028 |
| m=5 | 0.230716 | 0.000031 |
| m=6 | 0.266696 | 0.000033 |

counters. The technique achieves good performance in the case of sketches, as our mechanism can improve the space efficiency and count of both mouse and elephant flows with good accuracy. Our algorithm performs consistently well in almost all settings: when we change the per-packet memory, the per-flow memory, the flow sizes, and the skewness of the flows. Our technique is also very useful when applied to VI-CBF, as a typical VI-CBF's counter is 8 bits, and our technique can save half the space per counter. Therefore SEAD VI-CBF achieves both the efficiency and accuracy compared to VI-CBF. As long as the considered data structure uses counters where the memory usage is limited, our method can improve the space efficiency while achieving good accuracy. The SEAD Counter also allows greater counting ranges and provides more flexibility when counting, and we don't need extra space or time to read the counter. Therefore, this technique achieves a good balance between space efficiency and accuracy with little overhead. The expansion array we choose here is very simple, and there may be better choices depending on the specific setting of the task. We encourage readers to explore the use of our mechanism in different applications and try other possibilities of the expansion array.

Compared with estimators, variable-length solutions like Pyramid and BRICK are also superior in some ways. BRICK can maintain exact active counters with a low failure probability. The strength is that the values are exact but the weakness is that it needs much more space. Pyramid Sketch can dynamically assign appropriate number of bits for different items with different frequencies. It has higher throughput due to the word acceleration technique. Also, the CU sketch with Pyramid framework has a higher accuracy due to the Ostrich Policy, which can be seen from Table III. Empirically, SEAD Counter has a higher accuracy on CAIDA-Large. CM sketch with SEAD Counter and CU sketch with Pyramid framework has a higher accuracy on Datacenter.

#### VII. CONCLUSION

Thanks to their memory efficient and fast and sufficient speed, sketches have attracted much attention for network measurements. If the flow size distribution is uniform, there is little room for improvements compared to previous work. However, when the flow size distribution is highly skewed, existing sketches are very inefficient in memory usage. No previous work could achieve memory efficiency without hurting the sufficient and fast speed, which is really important in highspeed network traffic. To address this, we proposed a generic technique, the self-adaptive counters (SEAD Counter), in two versions, static and dynamic. Our main idea is the following: When a counter is going to overflow, we do not increase it one by one, but increase it by a predefined probability. When the counter is small, it just works like a normal counter. The SEAD Counter makes small counters capable of representing both small and large values. The error incurred by the probabilistic increase is theoretically and experimentally proved to be negligible compared to the size of elephant flows. We applied the SEAD Counter to three typical sketches: sketches of CM, C, and CU. We extended our technique to two typical Bloom filters: CBF and VI-CBF. We also used our technique as the estimator. Our experimental results showed that, compared to the state-of-the-art, sketches using the SEAD Counter improve the accuracy by up to 13.6 times. The CBF and VI-CBF using the SEAD Counters can improve the false positive rate by up to one or two orders of magnitude respectively.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their thoughtful suggestions.

#### REFERENCES

- T. Yang *et al.*, "A generic technique for sketches to adapt to different counting ranges," in *Proc. IEEE Conf. Comput. Commun. (INFOCOM)*, Apr. 2019, pp. 2017–2025.
- [2] N. Duffield, C. Lund, and M. Thorup, "Learn more, sample less: Control of volume and variance in network measurement," *IEEE Trans. Inf. Theory*, vol. 51, no. 5, pp. 1756–1775, May 2005.
- [3] J. Zheng, H. Xu, G. Chen, and H. Dai, "Minimizing transient congestion during network update in data centers," in *Proc. IEEE 23rd Int. Conf. Netw. Protocols (ICNP)*, Nov. 2015, pp. 1–10.
- [4] Y. Zhou, Y. Zhou, S. Chen, and O. P. Kreidl, "Limiting self-propagating malware based on connection failure behavior," in *Proc. 7th Int. Conf. Netw. Commun. Secur. (NCS)*, Dec. 2015, pp. 1–16.
- [5] H. Xu, Z. Yu, C. Qian, X.-Y. Li, and L. Huang, "Minimizing flow statistics collection cost using wildcard-based requests in SDNs," *IEEE/ACM Trans. Netw.*, vol. 25, no. 6, pp. 3587–3601, Dec. 2017.
- [6] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, "Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications," in *Proc. 4th ACM SIGCOMM Conf. Internet Meas.*, 2004, pp. 101–114.
- [7] R. B. Basat, G. Einziger, R. Friedman, M. C. Luizelli, and E. Waisbard, "Constant time updates in hierarchical heavy hitters," in *Proc. Conf.* ACM Special Interest Group Data Commun., Aug. 2017, pp. 127–140.
- [8] X. Dimitropoulos, P. Hurley, and A. Kind, "Probabilistic lossy counting: An efficient algorithm for finding heavy hitters," ACM SIGCOMM Comput. Commun. Rev., vol. 38, no. 1, p. 5, Jan. 2008.
- [9] R. Schweller, A. Gupta, E. Parsons, and Y. Chen, "Reversible sketches for efficient and accurate change detection over network data streams," in *Proc. 4th ACM SIGCOMM Conf. Internet Meas. (IMC)*, 2004, pp. 207–212.
- [10] M. Moshref, M. Yu, R. Govindan, and A. Vahdat, "Trumpet: Timely and precise triggers in data centers," in *Proc. ACM SIGCOMM Conf.*, Aug. 2016, pp. 129–143.
- [11] Y. Li, R. Miao, C. Kim, and M. Yu, "FlowRadar: A better NetFlow for data centers," in *Proc. NSDI*, 2016, pp. 311–324.
- [12] Q. Huang et al., "SketchVisor: Robust network measurement for software packet processing," in Proc. Conf. ACM Special Interest Group Data Commun., Aug. 2017, pp. 113–126.
- [13] P. Roy, A. Khan, and G. Alonso, "Augmented sketch: Faster and more accurate stream processing," in *Proc. Int. Conf. Manage. Data*, Jun. 2016, pp. 1449–1463.
- [14] D. Kim, Y. Zhu, C. Kim, J. Lee, and S. Seshan, "Generic external memory for switch data planes," in *Proc. 17th ACM Workshop Hot Topics Netw.*, Nov. 2018, pp. 1–7.
- [15] Intel FPGA. Accessed: Sep. 16, 2020. [Online]. Available: https://www. intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stx3/ stx3\_siii51001.pdf

- [16] Intel Tofino. Accessed: Sep. 16, 2020. [Online]. Available: https://www.intel.com/content/www/us/en/products/network-io/ programmable-ethernet-switch/tofino-series/tofino.html
- [17] Xlinx FPGA. Accessed: Sep. 16, 2020. [Online]. Available: https://www. xilinx.com/products/technology/memory.html#internalMemory
- [18] G. Cormode and S. Muthukrishnan, "An improved data stream summary: The count-min sketch and its applications," *J. Algorithms*, vol. 55, no. 1, pp. 58–75, 2005.
- [19] C. Estan and G. Varghese, "New directions in traffic measurement and accounting," ACM SIGCOMM Comput. Commun. Rev., vol. 32, no. 4, pp. 323–336, 2002.
- [20] M. Charikar, K. Chen, and M. Farach-Colton, "Finding frequent items in data streams," in *Proc. Int. Colloq. Automata, Lang., Program.* Berlin, Germany: Springer, 2002, pp. 693–703.
- [21] K. Cheng, L. Xiang, M. Iwaihara, H. Xu, and M. M. Mohania, "Timedecaying Bloom filters for data streams with skewed distributions," in *Proc. 15th Int. Workshop Res. Issues Data Eng., Stream Data Mining Appl. (RIDE-SDMA)*, 2005, pp. 63–69.
- [22] I. N. Bozkurt, Y. Zhou, and T. Benson, "Dynamic prioritization of traffic in home networks," in *Proc. CoNEXT Student Workshop*, 2015, pp. 1–3.
- [23] G. Cormode, "Sketch techniques for approximate query processing," in Foundations and Trends in Databases. Boston, MA, USA: NOW, 2011.
- [24] Y. Zhang, M. Roughan, W. Willinger, and L. Qiu, "Spatio-temporal compressive sensing and internet traffic matrices," ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 4, pp. 267–278, 2009.
- [25] T. Benson, A. Akella, and D. A. Maltz, "Unraveling the complexity of network management," in *Proc. NSDI*, 2009, pp. 335–348.
- [26] G. Cormode, B. Krishnamurthy, and W. Willinger, "A manifesto for modeling and measurement in social media," *1st Monday*, vol. 15, no. 9, pp. 1–18, Sep. 2010.
- [27] P. B. Gibbons and Y. Matias, "Synopsis data structures for massive data sets," *External Memory Algorithms*, vol. 50, pp. 39–70, Jan. 1999.
- [28] Z. Liu, A. Manousis, G. Vorsanger, V. Sekar, and V. Braverman, "One sketch to rule them all: Rethinking network flow monitoring with UnivMon," in *Proc. ACM SIGCOMM Conf.*, Aug. 2016, pp. 101–114.
- [29] CMS in Redis Module. Accessed: Nov. 3, 2020. [Online]. Available: https://oss.redislabs.com/redisbloom/CountMinSketch\_Commands/
- [30] C. Min and S. Chen, "Counter tree: A scalable counter architecture for per-flow traffic measurement," *IEEE/ACM Trans. Netw.*, vol. 25, no. 2, pp. 1249–1262, Apr. 2017.
- [31] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," *Commun. ACM*, vol. 13, no. 7, pp. 422–426, Jul. 1970.
- [32] S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz, "Theory and practice of Bloom filters for distributed systems," *IEEE Commun. Surveys Tuts.*, vol. 14, no. 1, pp. 131–155, 1st Quart., 2012.
- [33] A. Broder and M. Mitzenmacher, "Network applications of Bloom filters: A survey," *Internet Math.*, vol. 1, no. 4, pp. 485–509, 2004.
- [34] S. Geravand and M. Ahmadi, "Bloom filter applications in network security: A state-of-the-art survey," *Comput. Netw.*, vol. 57, no. 18, pp. 4047–4064, Dec. 2013.
- [35] F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, "An improved construction for counting Bloom filters," in *Proc. Eur. Symp. Algorithms.* Berlin, Germany: Springer, 2006, pp. 684–695.
- [36] T. Yang, Y. Zhou, H. Jin, S. Chen, and X. Li, "Pyramid sketch: A sketch framework for frequency estimation of data streams," *Proc. VLDB Endowment*, vol. 10, no. 11, pp. 1442–1453, Aug. 2017.
- [37] R. Morris, "Counting large numbers of events in small registers," *Commun. ACM*, vol. 21, no. 10, pp. 840–842, Oct. 1978.
- [38] R. Stanojevic, "Small active counters," in Proc. 26th IEEE Int. Conf. Comput. Commun. (INFOCOM), May 2007, pp. 2153–2161.
- [39] C. Hu et al., "Discount counting for fast flow statistics on flow size and flow volume," *IEEE/ACM Trans. Netw.*, vol. 22, no. 3, pp. 970–981, Jun. 2014.
- [40] E. Tsidon, I. Hanniel, and I. Keslassy, "Estimators also need shared values to grow together," in *Proc. IEEE INFOCOM*, Mar. 2012, pp. 1889–1897.
- [41] G. Einziger, B. Fellman, R. Friedman, and Y. Kassner, "ICE buckets: Improved counter estimation for network measurement," *IEEE/ACM Trans. Netw.*, vol. 26, no. 3, pp. 1165–1178, Jun. 2018.
- [42] N. Hua, J. J. Xu, B. Lin, and H. C. Zhao, "BRICK: A novel exact active statistics counter architecture," *IEEE/ACM Trans. Netw.*, vol. 19, no. 3, pp. 670–682, Jun. 2011.
- [43] C. Hu et al., "Discount counting for fast flow statistics on flow size and flow volume," *IEEE/ACM Trans. Netw.*, vol. 22, no. 3, pp. 970–981, Jun. 2014.
- [44] T. Li, S. Chen, and Y. Ling, "Per-flow traffic measurement through randomized counter sharing," *IEEE/ACM Trans. Netw.*, vol. 20, no. 5, pp. 1622–1634, Oct. 2012.

- [45] N. Alon, Y. Matias, and M. Szegedy, "The space complexity of approximating the frequency moments," in Proc. 28th Annu. ACM Symp. Theory Comput. (STOC), 1996, pp. 20-29.
- [46] N. Hua, A. Lall, B. Li, and J. Xu, "A simpler and better design of error estimating coding," in Proc. IEEE INFOCOM, Mar. 2012, pp. 235-243.
- [47] G. Cormode, M. Garofalakis, P. J. Haas, and C. Jermaine, "Synopses for massive data: Samples, histograms, wavelets, sketches," Found. Trends Databases, vol. 4, nos. 1–3, pp. 1–294, 2012.
- [48] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: A scalable wide-area web cache sharing protocol," IEEE/ACM Trans. Netw., vol. 8, no. 3, pp. 281-293, Jun. 2000.
- [49] Y. Yu, D. Belazzougui, C. Qian, and Q. Zhang, "Memory-efficient and ultra-fast network lookup and forwarding using Othello hashing," IEEE/ACM Trans. Netw., vol. 26, no. 3, pp. 1151-1164, Jun. 2018.
- [50] Z. Yu, Z. Ge, A. Lall, J. Wang, J. Xu, and H. Yan, "Crossroads: A practical data sketching solution for mining intersection of streams," in Proc. Internet Meas. Conf., Nov. 2014, pp. 223-234.
- [51] J. Liu, P. Zhang, H. Wang, and C. Hu, "CounterMap: Towards generic traffic statistics collection and query in software defined network," in Proc. IEEE/ACM 25th Int. Symp. Qual. Service (IWQoS), Jun. 2017, pp. 1–5.
- [52] O. Rottenstreich, Y. Kanizo, and I. Keslassy, "The variable-increment counting Bloom filter," IEEE/ACM Trans. Netw., vol. 22, no. 4, pp. 1092–1105, Aug. 2014. [53] *The Source Codes of SEAD Counter at GitHub*. Accessed: Dec. 3, 2020.
- [Online]. Available: https://github.com/SEADCounter/SEADCounter
- [54] The CAIDA Anonymized 2016 Internet Traces. Accessed: May 16, 2018. [Online]. Available: http://www.caida.org/data/overview/
- [55] Real-Life Transactional Dataset. Accessed: Apr. 6, 2019. [Online]. Available: http://fimi.ua.ac.be/data/
- [56] D. M. Powers, "Applications and explanations of Zipf's law," in Proc. Joint Conf. New Methods Lang. Process. Comput. Natural Lang. Learn. Stroudsburg, PA, USA: Association for Computational Linguistics, 1998, pp. 151-160.
- [57] A. Rousskov and D. Wessels, "High performance benchmarking with web polygraph," Softw.-Pract. Exper., vol. 34, no. 2, pp. 187-211, 2003.
- [58] T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild," in Proc. 10th Annu. Conf. Internet Meas. (IMC), 2010, pp. 267-280.
- [59] Hash Website. Accessed: May 19, 2018. [Online]. Available: http://burtleburtle.net/bob/hash/evahash.html



Xilai Liu graduated from Peking University, advised by T. Yang. He is currently pursuing the Ph.D. degree with the Institute of Computing Technology, Chinese Academy of Sciences (CAS), and the University of Chinese Academy of Sciences (UCAS). His research interests include data sketches, in-network computing, and data stream processing systems.



Yan Xu received the bachelor's degree from Peking University, advised by T. Yang. He is currently pursuing the M.Sc. degree with the University of Maryland. His research interest includes artificial intelligence.



Peng Liu is currently pursuing the master's degree with Peking University, advised by T. Yang. He has published several papers in network area. He is interested in networks and data stream processing.





Tong Yang (Member, IEEE) received the Ph.D. degree in computer science from Tsinghua University in 2013. He visited the Institute of Computing Technology, Chinese Academy of Sciences (CAS). He is currently an Associate Professor with the Department of Computer Science, Peking University. He has published papers in SIGCOMM, SIGKDD, SIGMOD, SIGCOMM CCR, VLDB, ATC, ToN, ICDE, and INFOCOM. His research interests include network measurements, sketches, IP lookups, Bloom filters, sketches, and KV stores.

Jiaqi Xu (Graduate Student Member, IEEE) graduated from Peking University, with double degree in physics and computer science. He is currently with the Queen Mary University of London. His research interests include wireless communication and network measurements.



Lun Wang received the bachelor's degree from Peking University, advised by T. Yang. He is currently pursuing the Ph.D. degree with the EECS Department, UC Berkeley, advised by Prof. D. Song. He is also a young passionate researcher with a broad interest in privacy and security, including differential privacy, applied cryptography, programming language theory, robust machine learning, and blockchain systems.



Gaogang Xie received the Ph.D. degree in computer science from Hunan University in 2002. He is currently a Professor with the Computer Network Information Center (CNIC), Chinese Academy of Sciences (CAS), and the University of Chinese Academy of Sciences (UCAS). His research interests include internet architecture, packet processing and forwarding, and internet measurement.



Xiaoming Li (Senior Member, IEEE) is currently a Professor in computer science and technology and the Director of the Institute of Network Computing and Information Systems (NCIS), Peking University, China. His current research interests include search engine and web mining. He led the effort of developing a Chinese search engine (Tianwang) since 1999, and is the Founder of the Chinese web archive (Web InfoMall).



Steve Uhlig received the Ph.D. degree in applied sciences from the Catholic University of Louvain, Belgium, in 2004. From 2004 to 2006, he was a Post-Doctoral Fellow with the Belgian National Fund for Scientific Research (F.N.R.S.). His thesis won the annual IBM Belgium/F.N.R.S. Computer Science Prize 2005. From 2004 to 2006, he was a Visiting Scientist at Intel Research Cambridge, U.K., and at the Applied Mathematics Department, The University of Adelaide, Australia. From 2006 to 2008, he was with Delft University of Technology,

The Netherlands. Prior to joining Queen Mary, he was a Senior Research Scientist with Technische Universität Berlin/Deutsche Telekom Laboratories, Berlin, Germany. In January 2012, he was a Professor of networks and Head of the Networks Research Group, Queen Mary University of London. From 2012 to 2016, he was a Guest Professor with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.