Important Announcement
PubHTML5 Scheduled Server Maintenance on (GMT) Sunday, June 26th, 2:00 am - 8:00 am.
PubHTML5 site will be inoperative during the times indicated!

Home Explore Algorithms for Data and Computation Privacy

Algorithms for Data and Computation Privacy

Published by Willington Island, 2021-07-23 03:59:41

Description: This book introduces the state-of-the-art algorithms for data and computation privacy. It mainly focuses on searchable symmetric encryption algorithms and privacy preserving multi-party computation algorithms. This book also introduces algorithms for breaking privacy, and gives intuition on how to design algorithm to counter privacy attacks. Some well-designed differential privacy algorithms are also included in this book.

Driven by lower cost, higher reliability, better performance, and faster deployment, data and computing services are increasingly outsourced to clouds. In this computing paradigm, one often has to store privacy sensitive data at parties, that cannot fully trust and perform privacy sensitive computation with parties that again cannot fully trust. For both scenarios, preserving data privacy and computation privacy is extremely important.

ALGORITHM'S THEOREM

Search

Read the Text Version

Part II Privacy Preserving Computation

Chapter 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks 6.1 Introduction 6.1.1 Background and Motivation Virtual Private Network (VPN) is a widely deployed technology that allows roaming users to securely use a remote computer on the public Internet as if that computer were residing on their organization’s network, which henceforth allows roaming users to access some resources that are only accessible from their organization’s network. VPN works in the following manner. Suppose IBM sends a field representative to one of its customers, say Michigan State University (MSU). Assume that MSU’s IP addresses are in the range 1.1.0.0–1.1.255.255 and IBM’s IP addresses are in the range 2.2.0.0–2.2.255.255. To access resources (say a confidential customer database server with IP address 2.2.0.2) that are only accessible within IBM’s network, the IBM representative uses an MSU computer (or his laptop) with an MSU IP address (say 1.1.0.10) to establish a secure VPN tunnel to the VPN server (with IP address 2.2.0.1) in IBM’s network. Upon establishing the VPN tunnel, the IBM representative’s computer is temporarily assigned a virtual IBM IP address (say 2.2.0.25). Using the VPN tunnel, the IBM representative can access any computer on the Internet as if his computer were residing on IBM’s network with IP address 2.2.0.25. The payload of each packet inside the VPN tunnel is another packet (to or from the newly assigned IBM IP address 2.2.0.25), which is typically encrypted. Figure 6.1 illustrates an example packet that traverses from the IBM representative’s computer on MSU’s network to the customer database server in IBM’s network. While the VPN tunnel is very useful for the IBM representative, it imposes security threats on MSU’s network because MSU’s firewall does not know what traffic is flowing inside the VPN tunnel. For example, if MSU’s firewall blocks access to a remote site (say www.malicious.com) or disallows machines to run peer- © The Editor(s) (if applicable) and The Author(s), under exclusive license to 139 Springer Nature Switzerland AG 2021 A. X. Liu, R. Li, Algorithms for Data and Computation Privacy, https://doi.org/10.1007/978-3-030-58896-0_6

140 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks Fig. 6.1 A typical VPN example to-peer applications due to copyright concerns, MSU’s firewall cannot enforce its policies on the IBM representative’s computer although that computer is physically on MSU’s network. Thus, the VPN tunnel opens a hole to MSU’s firewall that may allow unwanted traffic to flow in and out. Having such a hole is very dangerous because viruses or worms could flood in through it to the IBM representative’s computer first and then further spread to other computers on MSU’s network. 6.1.2 Technical Challenges This problem is technically challenging. First, MSU cannot simply block VPN connections because otherwise the IBM representative may fail to perform his duties. Second, MSU cannot share its firewall policy with IBM. Firewall policies are typically kept confidential due to security and privacy concerns. Knowing the firewall policy of a network could allow attackers to easily spot the security holes in the policy and launch corresponding attacks. A firewall policy also reveals the IP addresses of important servers, which are usually kept confidential to reduce the chance of being attacked. Furthermore, from a firewall policy, one may derive the business relationship of the organization with their partners. Third, IBM cannot share the traffic in its VPN tunnel with MSU due to security and privacy concerns. For example, IBM may want to keep the IP address of its customer database server confidential to reduce the likelihood of being attacked. One main purpose of VPN is to achieve such confidentiality. The fundamental problem in the above application is: how can we collaboratively enforce firewall policies in a privacy preserving manner for VPN tunnels in an open distributed environment? A satisfactory solution to this problem should meet

6.1 Introduction 141 the following three requirements: (1) The request owner cannot gain any more knowledge on the policy after any number of runs of the protocol than they would by brute force probing of the policy. We refer to this requirement as policy privacy. (2) It should be computationally infeasible for the policy owner to reveal a request. We refer to this requirement as request privacy. (3) The overhead of the solution should be marginal. Timely processing of every request (or packet) is critical for distributed applications. We refer to this requirement as protocol efficiency. Throughout this chapter, we use “MSU” to represent the policy owner and “IBM” to represent the request owner. 6.1.3 Limitations of Prior Art Although this is a fundamentally important problem, it is largely underinvestigated. The state-of-the-art on this problem is the seminal work in [7], where Cheng et al. proposed a scheme called CDCF. However, CDCF is vulnerable to selective policy updating attacks, by which the policy owner can quickly reveal the request of the other party. Furthermore, CDCF is inefficient because it uses commutative encryption functions (such as the Pohlig-Hellman Exponentiation Cipher [25] and Secure RPC Authentication (SRA) [17]), which are extremely expensive in nature, as the core cryptography primitive. 6.1.4 Our Solution In this chapter, we present VGuard, a secure and efficient framework for col- laborative enforcement of firewall policies. In VGuard, different from CDCF, the policy owner does not know which rule matches which request; thus, it makes the selective policy updating attacks infeasible. Furthermore, unlike CDCF, VGuard obfuscates rule decisions, which prevents MSU from knowing the decision for the given packet. To make VGaurd efficient, we propose a new oblivious comparison scheme, called Xhash, which uses XOR and secure hash functions. Xhash is three orders of magnitude faster than the commutative encryption scheme used in CDCF. Moreover, VGuard uses decision diagrams to process packets, which is much faster than the linear search used in CDCF. By side by side comparison, our experimental results show that VGuard is 552 times faster than CDCF on MSU side and 5035 times faster than CDCF on IBM side.

142 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks 6.1.5 Key Contributions We make the following three key contributions in this chapter. First, we propose Xhash, a very efficient oblivious comparison scheme that simply uses XOR and secure hash functions. Second, we propose VGuard, a privacy preserving framework for collaborative enforcement of firewall policies. Third, we implement both VGuard and CDCF and perform extensive experiments to evaluate their performance. 6.2 Threat Model First, we assume that the two parties of policy owner MSU and request owner IBM are semi-honest; that is, they follow the preestablished VGuard protocol, but the policy owner may attempt to reveal the request and the request owner may attempt to reveal the policy. In particular, the enforcement party IBM does enforce the decision made by MSU. The assumption that the two parties follow the VGuard protocol can be realized by the service level agreement between MSU and IBM. Furthermore, we assume that neither MSU nor IBM has the computational power to break secure hash functions such as HMAC-MD5 or HMAC-SHA1 [11, 19, 26]. Second, we assume that there exists a third party that facilitates the execution of our protocol. This third party shares a secret key with MSU. We assume that this third party follows our protocol and will collude with neither MSU nor IBM. Third, we assume that between any two of the three parties, MSU, IBM, and the third party, there exists a reliable and secure channel. These channels can be established using protocols such as SSL. Our VGuard protocol runs inside these channels. Thus, we do not consider the network level attacks on the communication channels that VGuard is built upon. 6.3 Background We first formally define the concepts of fields, packets, and firewalls. A field Fi is a variable of finite length (i.e., of a finite number of bits). The domain of field Fi of w bits, denoted D(Fi), is [0, 2w − 1]. A packet over the d fields F1, · · · , Fd is a d-tuple (p1, · · · , pd ) where each pi (1 ≤ i ≤ d) is an element of D(Fi). Firewalls usually check the following five fields: source IP address, destination IP address, source port number, destination port number, and protocol type. The lengths of these packet fields are 32, 32, 16, 16, and 8 respectively. We use to denote the set of all packets over fields F1, · · · , Fd . It follows that is a finite set and | | = |D(F1)| × · · · × |D(Fd )|, where | | denotes the number of elements in set and |D(Fi)| denotes the number of elements in set D(Fi).

6.4 Oblivious Comparison 143 Table 6.1 An example firewall Rule Src. IP Dest. IP Src. Port Dest. Port Prot. Action r1 1.2.*.* 192.168.0.1 * 25 TCP Accept r2 * * ** * Discard A rule has the form predicate → decision . A predicate defines a set of packets over the fields F1 through Fd , and is specified as F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd where each Si is a subset of D(Fi) and is specified as either a prefix or a range. A prefix {0, 1}k{∗}w−k with k leading 0s or 1s for a packet field of length w denotes the range [{0, 1}k{0}w−k, {0, 1}k{1}w−k]. For example, prefix 01** denotes the range [0100, 0111]. A rule F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd → decision is a prefix rule if and only if each Si is represented as a prefix. In firewall rules, source IP addresses, destination IP addresses, and protocol types are typically specified as prefixes, and source ports and destination ports are typically specified as ranges. A packet (p1, · · · , pd ) matches a predicate F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd and the corresponding rule if and only if the condition p1 ∈ S1 ∧ · · · ∧ pd ∈ Sd holds. For firewalls, the typical decisions include permit, deny, permit with logging, and deny with logging. A rule F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd → decision is called a singleton rule if and only if |Si| = 1 for every i. A sequence of rules r1, · · · , rn is complete if and only if for any packet p, there is at least one rule in the sequence that p matches. To ensure that a sequence of rules is complete and thus is a firewall, the predicate of the last rule is usually specified as F1 ∈ D(F1) ∧ · · · ∧ Fd ∈ D(Fd ). A firewall is a sequence of rules that is complete. Two rules in a firewall may overlap; that is, there exists at least one packet that matches both rules. Furthermore, two rules in a firewall may conflict; that is, the two rules not only overlap but also have different decisions. Firewalls typically resolve conflicts by employing a first-match resolution strategy where the decision for a packet p is the decision of the first (i.e., highest priority) rule that p matches in the firewall. Table 6.1 shows an example firewall. The format of these rules is based upon the format used in Cisco Access Control Lists. 6.4 Oblivious Comparison In this section, we consider the following oblivious comparison problem. Suppose we have two parties, denoted MSU and IBM, where MSU has a private number N1 and IBM has a private number N2. MSU wants to compare whether N1 = N2; however, neither MSU nor IBM wants to disclose its number to others. If N1 = N2, no party should learn the value of the other party. This is a technically challenging problem because MSU needs to have some information about N2 to enable the comparison; yet, such information about N2 should not allow MSU to reveal the

144 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks value of N2. Next, we introduce the concept of oblivious comparison functions and an oblivious comparison protocol based on such functions. Oblivious Comparison Functions Two functions f1 and f2 are called a pair of oblivious comparison functions if and only if they satisfy the following four properties: 1. Secrecy: Neither f1(x, K) nor f2(x, K) reveals the values of x and K. 2. Nondeducibility: Given x and f2(x, K), it is computationally infeasible to compute K. 3. Commutativity: For any x, K1, K2, we have f2(f1(x, K1), K2) = f2(f1(x, K2), K1). 4. Distinguishability: For any x, y, and K, if x = y, then we have f1(x, K) = f1(y, K) and f2(x, K) = f2(y, K). Here f1 is called the inner oblivious comparison function and f2 is called the outer oblivious comparison function. We discuss the construction of f1 and f2 later. Oblivious Comparison Protocol Assuming that we have a pair of oblivious com- parison functions f1 and f2, MSU and IBM can achieve oblivious comparison in the following three steps. Assume that MSU has a secret key K1 and IBM has a secret key K2. First, MSU computes f1(N1, K1) and sends the result to IBM. Because of the secrecy property of f1, IBM cannot reveal the values of N1 and K1. Second, after receiving f1(N1, K1) from MSU, IBM computes f2(f1(N1, K1), K2) and sends the result to MSU. Because of the nondeducibility property of f2, MSU cannot compute the value of IBM’s secret key K2. Third, IBM computes f1(N2, K2) and sends the result to MSU. Because of the secrecy property of f1, from f1(N2, K2), MSU cannot reveal the values of N2 and K2. After receiving f1(N2, K2) from IBM, MSU computes f2(f1(N2, K2), K1) and compares the result with f2(f1(N1, K1), K2), which was received from IBM in the second step. Because of the commutativity and distinguishability properties of f1 and f2, N1 = N2 if and only if f2(f1(N1, K1), K2) = f2(f1(N2, K2), K1). Figure 6.2 shows the oblivious comparison protocol. The Xhash Protocol We propose a simple and efficient protocol, called Xhash, to achieve oblivious comparison. Xhash works as follows. First, MSU sends N1 ⊕ K1 to IBM. Then, IBM computes HMACk(N1 ⊕ K1 ⊕ K2) and sends the result to MSU. Second, IBM sends N2⊕K2 to MSU. Third, MSU computes HMACk(N2⊕K2⊕K1) and compares it with HMACk(N1 ⊕ K1 ⊕ K2), which was received from IBM. Fig. 6.2 The oblivious MSU f1(N1, K1) IBM comparison protocol f2(f1(N1, K1), K2) Check whether f2(f1(N1, K1), K2) f1(N2, K2) = f2(f1(N2, K2), K1)

6.4 Oblivious Comparison 145 Fig. 6.3 The Xhash protocol MSU N1 K1 IBM HMACk(N1 K1 K2) Check whether N2 K2 HMACk(N1 K1 K2) = HMACk(N2 K2 K1) Finally, the condition N1 = N2 holds if and only if HMACk(N2 ⊕ K2 ⊕ K1) = HMACk(N1 ⊕ K1 ⊕ K2). Figure 6.3 illustrates the Xhash protocol. The above function HMAC is a keyed-Hash Message Authentication Code, such as HMAC-MD5 or HMAC-SHA1, which satisfies the one-wayness property (i.e., given H MACk(x), it is computationally infeasible to compute x and k) and the collision resistance property (i.e., it is computationally infeasible to find two distinct numbers x and y) such that H MACk(x) = H MACk(y). Note that the key k is shared between MSU and IBM. Although hash collisions for HMAC do exist in theory, the probability of collision is negligibly small in practice. Furthermore, by properly choosing the shared key k, we can safely assume that HMAC has no collision. To prevent brute force attacks, we need to choose key K to be sufficiently long. In our implementation, we choose K to be 128 bits. Note that in our framework x is at most 38 bits. To meet the length of K such that x can be XORed with K, we first use a pseudo random generation function R to generate x1 = R(x). Second, we apply R to x1 to generate x2 = R(x1). Repeat this process until we can concatenate x, x1, x2, · · · to form a bit string that meets the length of K. Extra bits in the concatenation beyond the length of K are discarded. The correctness of Xhash follows from the commutative property of XOR operation (i.e., x ⊕ K1 ⊕ K2 = x ⊕ K2 ⊕ K1) and the one-wayness and collision resistance properties of HMAC functions. Nondeducibility Property of f1 Note that if f1 does not satisfy the nondeducibility property, when N1 = N2, MSU is able to compute K2 because MSU knows both f1(N2, K2) and N2. This is fine if MSU and IBM only want to compare two numbers where K2 will be used only once. However, in VGuard, MSU and IBM need to compare MSU’s firewall with all the packets in the VPN tunnel rather than comparing two numbers. IBM will apply f1 to all the packets with its key K2. In this case, as long as MSU reveals K2, it can compute the plaintext of all these packets. To address this issue, we have two options. The first option is that we can introduce a third party to prevent MSU from knowing f1(N2, K2) such that MSU cannot reveal K2. The second option is that instead of introducing the third party, we find a function f1 that satisfies the nondeducibility property. To our best knowledge, the only function that satisfies the nondeducibility property and the four properties of oblivious comparison functions is the commutative encryption function such as the Pohlig-Hellman Exponentiation Cipher [25]. A commutative encryption function satisfies the following four properties, where (x)K denotes the encryption of x using

146 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks key K: (1) Given x and (x)K , it is computationally infeasible to compute the value of K. (2) Given x, K1, and K2, we have ((x)K1 )K2 = ((x)K2 )K1 . (3) Given x, y, and K, if x = y, we have (x)K = (y)K . (4) Given K, (x)K can be decrypted in polynomial time. However, such commutative encryption functions are computationally too expensive. Thus, we choose the first option in our VGuard framework. We defer the discussion of preventing MSU from knowing K2 in Sect. 6.6. 6.5 Bootstrapping Protocol In the bootstrapping protocol, MSU first converts its firewall policy to a set of non- overlapping prefix rules. Second, MSU converts each prefix to a number. Third, MSU applies an XOR operation to every number using its secret key K1. Finally, MSU sends the anonymized policy to IBM. IBM then applies XOR and HMAC operations to every number in the received policy using its secret key K2, obfuscates the decision of each rule, and shuffle the resulting rules. To complete the process, IBM sends the resulting policy back to MSU. Figure 6.4 illustrates the bootstrapping protocol. Converting a firewall policy to a set of non-overlapping prefix rules consists of four steps: FDD construction, range conversion, prefix numericalization, and rule generation. 6.5.1 FDD Construction In this step, MSU converts its firewall policy to an equivalent Firewall Decision Diagram [14]. A Firewall Decision Diagram (FDD) with a decision set DS and over fields F1, · · · , Fd is an acyclic and directed graph that has the following five properties: (1) There is exactly one node that has no incoming edges. This node is called the root. The nodes that have no outgoing edges are called terminal nodes. MSU (K1) IBM (K2) 1. Convert firewall to non-overlapping rules 2. XOR the rules with K1 3. Send the rules to IBM 4. XOR the received rules with IBM’s key K2 5. Apply HMAC to the rules 6. Send the resulting rules back to MSU 7. Build a decision tree from the received rules Fig. 6.4 The bootstrapping protocol

6.5 Bootstrapping Protocol 147 Fig. 6.5 Example of F1∈ [4,11] ∧ F2 ∈[0, 5] → accept (a) bootstrapping at MSU F1∈ [0, 3] ∧F2 ∈[2, 7] → accept F1∈ [12,15]∧ F2∈[2, 7] → accept F1∈ [0,15]∧ F2 ∈[0,15] → discard FDD construction [4, 11] F1 [0, 3] [12,15] F2 F2 (b) [0,5] [6,15] [2,7] [0,1] a [8,15] dad Prefix FDD conversion F1 00** 11** 01** (c) 10** F2 F2 00** 011* 001* 000* 010* 1*** 01** 1*** adad Prefix numericalization F1 00100 01100 11100 10100 (d) F2 F2 00100 01110 00110 00010 01010 11000 01100 11000 adad Applying XOR 01100⊕ K1 F1 0010101⊕1K001⊕ K1 10100⊕ K1 F2 F2 (e) 010001100⊕0⊕KK11 011111000⊕0 ⊕K1K1 0101001010⊕⊕KK11 001011000⊕0 ⊕K1K1 ad ad Non-overlapping rule generation (01100 ⊕ K1, 00100 ⊕ K1) → a (00100 ⊕ K1, 00110 ⊕ K1,) → a (01100 ⊕ K1, 01010 ⊕ K1) → a (00100 ⊕ K1, 01100 ⊕ K1,) → a (10100 ⊕ K1, 00100 ⊕ K1) → a (11100 ⊕ K1, 00110 ⊕ K1,) → a (10100 ⊕ K1, 01010 ⊕ K1) → a (01100 ⊕ K1, 01110 ⊕K1) → d (f)(11100 ⊕ K1, 01100 ⊕ K1,) → a (01100 ⊕ K1, 11000 ⊕K1) → d (10100 ⊕ K1, 01110 ⊕K1) → d (00100 ⊕ K1, 00010⊕ K1,) → d (10100 ⊕ K1, 11000 ⊕K1) → d (00100 ⊕ K1, 11000 ⊕ K1) → d (11100 ⊕ K1, 00010⊕ K1,) → d (11100 ⊕ K1, 11000 ⊕ K1) → d

148 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks (2) Each node v has a label, denoted F (v). If v is a nonterminal node, then F (v) ∈ {F1, · · · , Fd }. If v is a terminal node, then F (v) ∈ DS. (3) Each edge e:u → v is labeled with a nonempty set of integers, denoted I (e), where I (e) is a subset of the domain of u’s label (i.e., I (e) ⊆ D(F (u))). (4) A directed path from the root to a terminal node is called a decision path. No two nodes on a decision path have the same label. (5) The set of all outgoing edges of a node v, denoted E(v), satisfies the following two conditions: (a) Consistency: I (e) ∩ I (e ) = ∅ for any two distinct edges e and e in E(v). (b) Completeness: e∈E(v) I (e) = D(F (v)). Figure 6.5a shows an example firewall policy over two fields F1 and F2, where the domain of each field is [0, 15]. The FDD that is semantically equivalent to this firewall policy is shown in Fig. 6.5b. Note that in labeling terminal nodes, we use “a” as a shorthand for “accept” (i.e., “permit”) and “d” as a shorthand for “discard” (i.e., “deny”). The algorithm for converting a firewall to an FDD is in [21]. 6.5.2 Range Conversion For every edge e in the FDD, MSU converts its label I (e) to the minimum set of prefixes whose union is equal to I (e). As one prefix can be converted to one range, a range may be converted to multiple prefixes. In converting a range to prefixes, we want to find the minimum set of prefixes such that the union of the prefixes is equal to the range. For example, given range [0001, 1110], the corresponding minimum set of prefixes would be 0001, 001∗, 01 ∗ ∗, 10 ∗ ∗, 110∗, 1110. The minimum number of prefixes for representing an integer interval [a, b], where a and b are two numbers of w bits, is at most 2w − 2 [15]. We call such FDDs, where each edge is labeled by a set of prefixes, prefix FDDs. Figure 6.5c shows the prefix FDD converted from the FDD in Fig. 6.5b. 6.5.3 Prefix Numericalization In this step, MSU converts each prefix in the FDD to a concrete number. This process is called prefix numericalization. A prefix numericalization function f needs to satisfy the following two properties: (1) for any prefix P, f (P) is a binary string; (2) for any two prefixes P1 and P2, f (P1) = f (P2) if and only if P1 = P2. There are many ways to do prefix numericalization. We use the prefix numericalization scheme used in [4]. Given a prefix b1b2 · · · bk ∗ · · · ∗ of w bits, we first insert 1 after bk. The bit 1 represents a separator between b1b2 · · · bk and ∗ · · · ∗. Second, we replace every * by 0. Note that if there is no * in a prefix, we add 1 at the end of this prefix. For example, 101* is converted to 10110. After prefix numericalization, the FDD in Fig. 6.5c becomes the one in Fig. 6.5d.

6.5 Bootstrapping Protocol 149 6.5.4 Applying XOR by MSU After prefix numericalization, MSU applies XOR to every number in the numer- icalized FDD using its secret key K1. Figure 6.5e shows the numericalized and XORed FDD. Then, MSU generates non-overlapping rules from the numericalized and XORed FDD. From each decision path in the FDD, MSU generates a set of non-overlapping rules. For example, from the left-most decision path in Fig. 6.5e, MSU generates the following four non-overlapping rules: F1 ∈ 01100 ⊕ K1 ∧ F2 ∈ 00100 ⊕ K1 → a, F1 ∈ 01100 ⊕ K1 ∧ F2 ∈ 01010 ⊕ K1 → a, F1 ∈ 10100 ⊕ K1 ∧ F2 ∈ 00100 ⊕ K1 → a, F1 ∈ 10100 ⊕ K1 ∧ F2 ∈ 01010 ⊕ K1 → a, Figure 6.5f shows the disjoint rules generated from the FDD in Fig. 6.5e. After non-overlapping rules are generated, MSU sends the resulting policy to IBM. If MSU needs to prevent IBM from knowing the number of non-overlapping prefix rules that MSU’s firewall is converted to, MSU can randomly insert some dummy rules formulated by out-of-range dummy numbers and random decisions into the set of non-overlapping numerical rules before applying XOR. An out-of- range dummy number is a number that corresponds to no prefix. Thus, no packet will match a dummy rule that consists of at least one out-of-range dummy number. According to our prefix numericalization scheme, there is only one dummy number in which every bit is 0. To create more dummy numbers, we can simply add extra bits. Note that IBM knowing the number of converted rules is not much a concern. As we will show in the experimental results, the number of non-overlapping prefix rules that a firewall is converted to far exceeds the number of original rules. 6.5.5 Applying XOR and HMAC by IBM Upon receiving a sequence of non-overlapping numerical rules from MSU, IBM further applies XOR and HMAC to every number in the received policy using its secret key K2. To destroy the correspondence between the rules after applying XOR and HMAC and the rules received from MSU, IBM randomly shuffles the resulting rules after applying XOR and HMAC. To prevent MSU from knowing the decision of IBM’s packet, IBM obfuscates the decision of each rule by mapping each decision to another distinct decision. More formally, the decision obfuscation is a one-to-one mapping function f from the set of all decisions to the same set of all decisions. IBM stores the mapping function f in its decision obfuscation table and replaces the decision of each rule in ri, say di, by f (di). To prevent MSU from statistically discovering the obfuscation mapping function f , for any decision di, IBM needs

150 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks (01100 ⊕ K1, 00100 ⊕ K1) → a (00100 ⊕ K1, 00110⊕ K1 ) → a (a) (01100 ⊕ K1, 01010 ⊕ K1) → a (00100 ⊕ K1, 01100 ⊕ K1 ) → a (10100 ⊕ K1, 00100 ⊕ K1) → a (11100 ⊕ K1, 00110⊕ K1 ) → a (10100 ⊕ K1, 01010 ⊕ K1) → a (11100 ⊕ K1, 01100⊕ K1 ) → a (01100 ⊕ K1, 01110 ⊕ K1) → d (00100 ⊕ K1, 00010⊕ K1 ) → d (01100 ⊕ K1, 11000 ⊕ K1) → d (00100 ⊕ K1, 11000 ⊕ K1) → d (10100 ⊕ K1, 01110 ⊕ K1) → d (11100 ⊕ K1, 00010⊕ K1 ) → d (10100 ⊕ K1, 11000 ⊕ K1) → d (11100 ⊕ K1, 11000 ⊕ K1) → d Applying XOR and HMAC (h(01100⊕K1 ⊕K 2 ), h(00100⊕K1 ⊕K 2 ))→a (h(00100⊕K1⊕ K2 ), h( 00110⊕K1⊕ K2 )) →a (h(01100⊕K1 ⊕K 2 ), h(01010⊕K1 ⊕K 2 ))→a (h(00100⊕K1 ⊕ K2 ), h(01100⊕K1⊕ K2 )) →a (h(10100⊕K1 ⊕K 2 ), h(00100⊕K1 ⊕K 2 ))→a (b)(h(11100 ⊕K1⊕ K2), h(00110 ⊕K1⊕ K2))→a (h(10100⊕K1 ⊕K 2 ), h(01010⊕K1 ⊕K 2 ))→a (h(01100⊕K1 ⊕K 2 ), h(01110⊕K1 ⊕K 2 ))→d (h(11100 ⊕K1 ⊕ K2), h(01100 ⊕K1⊕ K2))→a (h(01100⊕K1 ⊕K 2 ), h(11000⊕K1 ⊕K 2 ))→d (h(00100⊕K1 ⊕ K2 ), h(00010⊕K1⊕ K2 )) →d (h(10100⊕K1 ⊕K 2 ), h(01110⊕K1 ⊕K 2 ))→d (h(00100⊕K1 ⊕ K2 ), h(11000⊕ K1⊕ K2 )) →d (h(10100⊕K1 ⊕K 2 ), h(11000 ⊕K1 ⊕K 2 ))→d (h(11100⊕K1 ⊕K2), h (00010⊕ K1 ⊕ K2 )) →d (h(11K00⊕K1⊕ K2), h(11000⊕ K1⊕ K2))→d Rule shuffling and decision obfuscation (h(01100⊕K1 ⊕K 2 ), h(00100⊕K1 ⊕K 2 ))→d (h(00100⊕K1⊕ K2 ), h( 00110⊕K1⊕ K2 )) →d (h(01100⊕K1 ⊕K 2 ), h(01010⊕K1 ⊕K 2 ))→d (h(01100⊕K1 ⊕ K2 ), h(11000⊕K1⊕ K2 )) →a (h(00100⊕K1 ⊕K 2 ), h(11000⊕K1 ⊕K 2 ))→a (h(11100 ⊕K1⊕ K2), h(00110 ⊕K1⊕ K2))→d (h(10100⊕K1 ⊕K 2 ), h(01010⊕K1 ⊕K 2 ))→d (c)(h(11100 ⊕K1 ⊕ K2), h(01100 ⊕K1⊕ K2))→d (h(01100⊕K1 ⊕K 2 ), h(01110⊕K1 ⊕K 2 ))→a (h(00100⊕K1 ⊕K 2 ), h(01100⊕K1 ⊕K 2 ))→d (h(00100⊕K1 ⊕ K2 ), h(00010⊕K1⊕ K2 )) →a (h(10100⊕K1 ⊕K 2 ), h(01110⊕K1 ⊕K 2 ))→a (h(10100⊕K1 ⊕ K2 ), h(00100⊕ K1⊕ K2 )) →d (h(10100⊕K1 ⊕K 2 ), h(11000 ⊕K1 ⊕K 2 ))→d (h(11100⊕K1 ⊕K2), h (00010⊕ K1⊕ K2 )) →a (h(11100⊕K1 ⊕ K2), h(11000⊕ K1⊕ K2))→a Decision obfuscation table a maps to d (d) d maps to a Fig. 6.6 Example of bootstrapping at IBM to ensure that the number of rules that have decision di is the same. This can be easily achieved by adding dummy rules. Due to the rule shuffling and decision obfuscation, MSU cannot correlate the received rules with the original rules, and also cannot identify the decision of each rule. Figure 6.6b shows the rules after IBM applies XOR and HMAC, and Fig. 6.6c shows the rules after IBM shuffles rules and obfuscates decisions. The obfuscation mapping function is shown in Fig. 6.6d. Note that in these figures h denotes the HMAC function. Finally, IBM sends the resulting rules to MSU. 6.6 Filtering Protocol In the filtering protocol, each time IBM receives a packet that originated from or was sent to its representative, IBM first converts the packet to prefixes and then further converts each prefix to a number using the same prefix numericalization scheme. Then, IBM XORs every number in the packet with its secret key K2, then sends the resulting packet to the third party. The third party further applies XOR and HMAC to

6.6 Filtering Protocol Third Party (K1) 151 MSU (K1) IBM (K2) 4. Apply XOR with K1 and HMAC to 1. Convert a packet to 5 sequences of numbers every number of the received packet 2. Apply XOR to every number with K2 3. Send the resulting packet represented by 5 5. Send the resulting packet to MSU sequences of numbers to the third party 6. Make decision using the decision tree 8. Enforce the decision 7. Send the decision to IBM Fig. 6.7 The filtering protocol the received packet with the secret key K1. Note that the third party and MSU share key K1. Then, the third party sends the resulting packet to MSU. MSU then searches the obfuscated decision for the packet using the received firewall policy from IBM in the bootstrapping protocol. Finally, MSU sends the obfuscated decision to IBM and IBM finds the original decision using its decision obfuscation table. Figure 6.7 shows the filtering protocol. 6.6.1 Address Translation When the IBM VPN server sends (or receives) a packet on behalf of its represen- tative in MSU, the source (or destination) IP address of the packet is an IBM IP address that the IBM VPN server assigned to the IBM representative’s computer in MSU. To inquiry the decision for this packet from MSU, IBM needs to replace the source (or destination) IP address in the packet by IBM representative’s MSU IP address. Otherwise, it is likely that MSU firewall policy blocks all incoming packets that are not sent to MSU and all outgoing packets that are not originated from MSU. Take the example in Fig. 6.1, the packet that IBM should ask MSU for a decision has a source IP 1.1.0.10 and a destination IP 2.2.0.2. 6.6.2 Prefix Membership Verification We first define two concepts: k-prefix and prefix family. We call the prefix {0, 1}k{∗}w−k with k leading 0s and 1s followed by w − k ∗s a k-prefix. If a value x matches a k-prefix, the first k bits of x and the k-prefix are the same. For example, if x ∈ 01 ∗ ∗ (i.e., x ∈ [0100, 0111]), then the first two bits of x must be 01. Given a binary number b1b2 · · · bw of w bits, the prefix family of this number is the set of w + 1 prefixes {b1b2 · · · bw, b1b2 · · · bw−1∗, · · · , b1 ∗ · · · ∗, ∗ ∗ . . . ∗}, where the i-th prefix is b1b2 · · · bw−i+1 ∗ · · · ∗. We use P F (x) to represent the prefix family of x. For example, P F (0101) = {0101, 010∗, 01 ∗ ∗, 0 ∗ ∗∗, ∗ ∗ ∗∗}. Based on the above definitions, it is easy to draw the following conclusion: given a number x and a prefix P, x ∈ P if and only if P ∈ P F (x).

152 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks 6.6.3 Packet Preprocessing by IBM For each of the d fields of a packet, IBM first generates its prefix family. Second, IBM converts each prefix to a number using the same prefix numericalization scheme in the bootstrapping protocol. Third, IBM applies XOR to each number using its secret key K2. Last, IBM sends a sequence of d sets of numbers, which corresponds to the d fields of the packet, to the third party. For example, given a packet (0101, 0011) as shown in Fig. 6.8a, the prefix family of each field is shown in Fig. 6.8b. The result of prefix numericalization is shown in Fig. 6.8c. The final two sequences of numbers are shown in Fig. 6.8d. ( 0101 0011) (a) Prefix family construction 0101 0011 (b) 010* 001* 01** 00** 0*** 0*** **** **** Prefix numericalization 01011 00111 (c) 01010 00110 01100 00100 01000 01000 10000 10000 Applying XOR by IBM 01011⊕ K2 00111⊕ K2 (d) 01010⊕ K2 00110⊕ K2 01100⊕ K2 00100⊕ K2 01000⊕ K2 01000⊕ K2 10000⊕ K2 10000⊕ K2 Applying XOR and HMAC by the third party h(01011 ⊕ K2 ⊕ K1 ) h(00111 ⊕ K2 ⊕ K1 ) (e) h(01010 ⊕ K2 ⊕ K1 ) h(00110 ⊕ K2 ⊕ K1 ) h(01100 ⊕ K2 ⊕ K1 ) h(00100 ⊕ K2 ⊕ K1 ) h(01000 ⊕ K2 ⊕ K1 ) h(01000 ⊕ K2 ⊕ K1 ) h(10000 K2 ⊕ K1 ) h(10000 K2 ⊕ K1 ) ⊕ ⊕ Fig. 6.8 Example of packet preprocessing

6.6 Filtering Protocol 153 6.6.4 Packet Preprocessing by The Third Party Upon receiving the packet as d sequences of numbers from IBM, the third party further applies XOR using key K1 and HMAC to each number and then sends the resulting packet to MSU. Here we choose the third party, instead of MSU, to apply XOR and HMAC for the purpose of preventing MSU from knowing the IBM’s XOR results (e.g., Fig. 6.8d) before applying HMAC. Otherwise, MSU may break IBM’s secret key K2 and further reveal packet headers. If MSU knows IBM’s XOR results, to break K2, MSU first stores its rules before anonymization in the bootstrapping protocol (e.g., the rules generated from Fig. 6.5d). Let r1, · · · , rn denote these rules, where each rule rj (1 ≤ j ≤ n) is in the form (mj1, · · · , mjd ) → decj . In the filtering protocol, when MSU finds that a packet (p1, p2, · · · , pd ) matches a rule, according to the property of prefix membership verification, for each 1 ≤ i ≤ d, there must be a number ni in P F (pi) that is equal to one number in the set {m1i , · · · , min}. Third, for each 1 ≤ i ≤ d, MSU XORs ni ⊕ K2 received from IBM with every number in the set {mi1, · · · , min}. Because one of the numbers in {m1i , · · · , min} is equal to ni, then the resulting set, denoted Si, must contains K2. For example, if m1i = ni, then mi1 ⊕ ni ⊕ K2 = K2. Thus, for each packet P , we can compute a set S(P ) = S1 ∩· · ·∩Sd , which contains K2. When MSU receives a large set of packets P1, · · · , Pg, the set S(P1) ∩ · · · ∩ S(Pg) may only contains K2. After finding K2, MSU can reveal packet headers by applying XOR to every number of packets received from IBM using K2. However, in VGuard, using the third party to apply XOR and HMAC to packets eliminates this possibility. 6.6.5 Packet Processing by MSU Upon receiving the packet from the third party, MSU searches the obfuscated deci- sion for the packet using the resulting firewall rules from the bootstrapping protocol. Recall that each rule is represented as d numbers and an obfuscated decision. A packet (p1, · · · , pd ) matches a rule (m1, · · · , md ) → obfuscated decision if and only if the condition m1 ∈ P F (p1) ∧ · · · ∧ md ∈ P F (pd ) holds. Therefore, MSU can use linear search to find the first rule that the packet matches. Then, MSU sends the obfuscated decision to IBM and IBM finds the original decision using its decision obfuscation table. Because all the firewall rules resulted from the bootstrapping protocol are non-overlapping, there exists one and only one rule that the packet matches. For example, given the resulting firewall rules in Fig. 6.6c and the preprocessed packet in Fig. 6.8e, the only rule that matches the packet is (h(01100 ⊕ K2 ⊕ K1), h(00100 ⊕ K2 ⊕ K1)) → d. To improve search efficiency, MSU can use the following two techniques: decision tree and hash table. First, MSU converts the non-overlapping rules resulted from the bootstrapping protocol to an equivalent decision tree. For example, Fig. 6.9 shows the decision tree constructed from the firewall in Fig. 6.6c. Thus, MSU can

154 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks h(01100 ⊕ K1 ⊕ K2) F1 h(00100 ⊕ K1 ⊕ K2) h(10100 ⊕ K1 ⊕ K2) h(11100 ⊕ K1 ⊕ K2) F2 F2 h(00100 ⊕ K1 ⊕ K2) h(01110 ⊕ K1 ⊕ K2) h(00110 ⊕ K1 ⊕ K2) h(00010 ⊕ K1 ⊕ K2) h(01010 ⊕ K1 ⊕ K2) h(11000 ⊕ K1 ⊕ K2) h(01100 ⊕ K1 ⊕ K2) h(11000 ⊕ K1 ⊕ K2) a d ad Fig. 6.9 Decision tree constructed from Fig. 6.6c search the decision for a packet using the decision tree. Second, for the basic operation of testing mi ∈ P F (pi), MSU builds one hash table for each P F (pi) and then tests whether mi is in the hash table that constructed from P F (pi). 6.7 VGuard for Deep Packet Inspection With the growing need to filter malicious packets, advanced firewalls, as well as intrusion detection/prevention systems such as Snort [27], Bro [24], 3Com’s TippingPoint X505 [31], and a variety of Cisco Systems [8], examine not only packet headers but also packet payload by checking whether its payload contains some predefined strings in a signature database. More formally, given a string a1a2 · · · an and a packet payload s1s2 · · · sm where each ai (1 ≤ i ≤ n) and sj (1 ≤ j ≤ m) are characters, we want to check whether the string s1s2 · · · sm contains the sub-string sk+1sk+2 · · · sk+n that is the same as the string a1a2 · · · an. If so, the packet payload s1s2 · · · sm matches the string a1a2 · · · an. We can adapt our VGuard framework to deal with the cases where MSU’s firewall performs deep packet inspection. The basic idea is that MSU and IBM apply Xhash protocol to each character of every string in the signature database and each character of the packet payload, and check whether the resulting packet payload contains the resulting string. 6.7.1 The Bootstrapping Protocol In the bootstrapping protocol, MSU first applies XOR to every character of the strings in its signature database using its secret key K1, and then sends the resulting strings to IBM. To prevent IBM from knowing the number of strings in its signature database, MSU adds some random strings and XORs them with K1. Upon receiving the anonymized strings from MSU, IBM further applies XOR and HMAC operations to each character using its secret key K2. To prevent MSU from identifying the original string that a packet matches by comparing the number of characters in each resulting string with that in each original string, IBM adds some

6.7 VGuard for Deep Packet Inspection 155 dummy strings and XORs them with its secret key K2. Then, IBM obfuscates the decision associated with each string and shuffles the strings. At last, IBM sends the resulting strings back to MSU. Note that all the random strings added by MSU and the dummy strings added by IBM should have the default action, which is “permit”. Figure 6.10 shows the bootstrapping protocol for deep packet inspection. Suppose a intrusion detection system has n rules and the i-th (1 ≤ i ≤ n) rule has ci characters. In the bootstrapping protocol, the computation overhead of MSU and IBM is O ( n ci ), and the communication overhead between MSU and IBM is also O( i=1 n i=1 ci ). Considering three strings “eb, ebf , ecg” in Fig. 6.11a, b shows the anonymized string s after MSU applies XOR to these strings. Figure 6.11c shows the resulting strings after MSU adds the random string r1r2, where r1 and r2 denote two random characters. Figure 6.11d shows the resulting strings after IBM adds the dummy string d1d2, where d1 and d2 denote two random characters. Figure 6.11e shows the strings after IBM applies XOR and HMAC, and Fig. 6.11f shows the strings after IBM shuffles rules and obfuscates decisions. As the dummy strings that IBM generated are unlikely to match any packet, MSU may identify them and then delete them. To prevent MSU from identifying such strings, IBM can generate fake packets that match the dummy strings and periodically send them to MSU. 6.7.2 The Filtering Protocol In the filtering protocol, each time IBM receives a packet originated from or sent to its representative, IBM first applies XOR to every character in the packet payload using K2 and sends the resulting packet to the third party, which further applies XOR and HMAC to the packet payload using key K1 and then sends the resulting packet to MSU. String matching algorithms have been investigated for many years and several famous algorithms have been proposed, such as Aho-Corasick algorithm [2] and Commentz-Walter algorithm [9]. MSU can use these algorithms to search the obfuscated decision for the packet based on the received strings from IBM in the bootstrapping protocol. Finally, MSU sends the obfuscated decision to IBM and IBM finds the original decision using its decision obfuscation table. For example, given a packet that contains strings “ebkf ” as shown in Fig. 6.12a, the packet payload after IBM applies XOR is in Fig. 6.12b. Figure 6.12c shows the result payload after the third party applies XOR and HMAC. For the resulting strings in Fig. 6.11f, the only string in the signature database that matches the packet payload is h(e ⊕ K1 ⊕ K2), h(b ⊕ K1 ⊕ K2) → d. Fig. 6.10 Bootstrapping for MSU String K1 IBM deep packet inspection h(String K1 K2)

156 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks Fig. 6.11 Example of string eb → a (a) processing ebf → d ecg → d Applying XOR by MSU e⊕ K1, b ⊕ K1→ a (b) e⊕ K1, b ⊕ K1, f ⊕ K1→ d e⊕ K1, c ⊕ K1, g⊕ K1→ d Adding random strings by MSU e⊕ K1, b ⊕ K1→ a (c) e⊕ K1, b ⊕ K1, f ⊕ K1→ d e⊕ K1, c ⊕ K1, g⊕ K1→ d r1⊕ K1, r2⊕ K1, → a Adding random strings by MSU e⊕ K1, b ⊕ K1→ a (d) e⊕ K1, b ⊕ K1, f ⊕ K1→ d e⊕ K1, c ⊕ K1, g⊕ K1→ d r1⊕ K1, r2⊕ K1, → a d1 , d2 → a Applying XOR and HMAC by IBM h (e⊕ K1⊕ K2), h (b⊕ K1⊕ K2), → a (e) h (e⊕ K1⊕ K2), h (b⊕ K1⊕ K2), h ( f ⊕ K1⊕ K2), → d h (e⊕ K1⊕ K2), h (c⊕ K1⊕ K2), h ( g⊕ K1⊕ K2), → d h (r1⊕K1⊕ K2), h (r2⊕K1⊕ K2), → a h(d1⊕ K2) , h(d2⊕ K2) → a Applying XOR and HMAC by IBM h (e⊕ K1⊕ K2), h (c⊕ K1⊕ K2), h ( g⊕ K1⊕ K2), → a (f) h (e⊕ K1⊕ K2), h (b⊕ K1⊕ K2) → d h ( d1⊕ K2) , h ( d1⊕ K2) → d h (r1⊕K1⊕ K2), h (r2⊕K1⊕ K2), → d h (e⊕ K1⊕ K2), h(b⊕ K1⊕ K2), h ( f ⊕ K1⊕ K2), → d Decision obfuscation table a maps to d (g) d maps to a

6.8 Discussion 157 Fig. 6.12 Example of packet ... ebkf ... (a) payload processing Applying XOR by IBM ..., e ⊕ K2 , b ⊕ K2, k ⊕ K2, f ⊕ K2, ... (b) Applying XOR and HMAC by the third party ..., h (e⊕ K2⊕ K1), h (b⊕ K2⊕ K1), (c) h (k⊕ K2⊕ K1), h (f ⊕ K2⊕ K1), ... 6.8 Discussion 6.8.1 Firewall Updates When MSU updates its firewall policy, MSU and IBM need to run the bootstrapping protocol again. To prevent IBM from identifying the unchanged rules, in each run of the bootstrapping protocol, MSU and IBM should change their secret keys K1 and K2. Thus, the rules that IBM receives in each run of the bootstrapping protocol are totally different. Note that MSU and IBM do not need to run the bootstrapping protocol as long as MSU’s firewall remains the same. 6.8.2 Decision Caching A packet contains a header (with fields of source and destination IP addresses, source and destination port numbers, and protocol type) and a payload. For different packets with the same header, IBM only needs to ask MSU once and then cache the packet header along with its decision. Whenever the IBM representative builds a connection through the VPN tunnel, IBM first checks whether its cache has an entry that corresponds to the connection. If yes, then IBM executes that decision; if no, IBM asks MSU for the decision using the filtering protocol and then adds an entry into its cache. Because IBM may have multiple collaborators, IBM should record the name of its collaborator (which is “MSU” in this case) to every entry in the cache. Thus, IBM only needs to maintain one cache. When MSU updates its firewall policy and reruns the bootstrapping protocol, IBM needs to delete all the entries that corresponds to MSU. IBM can implement caches efficiently using hash tables or by counting Bloom filters [12].

158 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks 6.8.3 Decision Obfuscation vs. Decision Encryption In the bootstrapping protocol, to prevent MSU from knowing the decision of IBM’s packet, we have two choices, decision obfuscation and decision encryption. A decision obfuscation function is a one-to-one mapping function from the set of all decisions to the same set of all decisions. The idea of decision obfuscation is to apply such a function to the decisions of the rules sent from IBM to MSU in the bootstrapping protocol. The idea of decision encryption is to concatenate each decision dec with a number i and then encrypt the decision dec|i using another secret key K, i.e., (dec|i)K . Although decision encryption obfuscates the decisions well, it can be exploited. Using decision encryption, IBM can add more information to the decision and then encrypt it. In particular, if IBM attaches the identification number of a rule to each decision, IBM can identify the exact rule that matches the packet. This would help IBM compute MSU’s secret key K1 and further reveal the exact context of the rule. More formally, IBM can store the anonymized rules received from MSU in the bootstrapping protocol. These anonymized rules are in the form (m1 ⊕ K1, m2 ⊕ K1, · · · , m5 ⊕ K1) → dec (where each mi is a number). If IBM can identify the one rule that matches the packet (p1, p2, · · · , p5), IBM can generate prefix family for each attribute of the packet P F (pi) (1 ≤ i ≤ 5). Due to the property of prefix membership verification, there should be one number ni in each P F (pi) that is equal to mi. Therefore, IBM can compute MSU’s secret key K1 by applying XOR to the number mi ⊕ K1 using ni. If mi = ni , then mi ⊕ K1 ⊕ ni = K1. After finding K1, IBM can reveal the exact context of non-overlapping rules in MSU firewall policy by applying XOR to every number of anonymized rules using K1. Because of the above security problem, we use the decision obfuscation technique. 6.8.4 Special Treatment of IP Addresses As we discussed previously, for every packet from (or to) the IBM representative, IBM needs to translate the source (or destination) IP from an address in IBM’s domain to the address in MSU’s domain. Thus, for each packet that IBM asks MSU for a decision, either the source or the destination IP of the packet is the IP that MSU assigns to the IBM representative. This IP address is known to MSU, MSU can use it to compute the secret key K2 of IBM, which henceforth violates the packet privacy. Note that x ⊕ K2 ⊕ x = K2. To prevent this type of attacks, we modify VGuard as follows. First, MSU chooses five secret keys K1, K2, · · · , K5 that correspond to the five packet fields (source IP, destination IP, source port, destination port, and protocol type), and similarly IBM chooses five secret keys K1, K2, · · · , K5 as well. Second, in the bootstrapping protocol, for each non-overlapping rule (m1, m2, · · · , m5) → dec (where each mi is a number), MSU applies XOR to each mi using key Ki. Thus, the

6.8 Discussion 159 rule becomes (m1 ⊕ K1, m2 ⊕ K2, · · · , m5 ⊕ K5) → dec . Then, MSU sends these rules to IBM. For each rule (m1 ⊕ K1, m2 ⊕ K2, m3 ⊕ K3, m4 ⊕ K4, m5 ⊕ K5) → dec that IBM receives from MSU, assuming that m1 corresponds to the field of source IP and m2 corresponds to the field of destination IP, IBM creates the two rules: (m1 ⊕ K1, HMACk(m2 ⊕ K2 ⊕ K2), HMACk(m3 ⊕ K3 ⊕ K3), HMACk(m4 ⊕ K4 ⊕ K4), HMACk(m5 ⊕ K5 ⊕ K5)) → dec (HMACk(m1 ⊕ K1 ⊕ K1), m2 ⊕ K2, HMACk(m3 ⊕ K3 ⊕ K3), HMACk(m4 ⊕ K4 ⊕ K4), HMACk(m5 ⊕ K5 ⊕ K5)) → dec Basically, IBM keeps the source IP field unchanged in the first rule and keeps the destination IP field unchanged in the second rule. At last, IBM sends two sets of rules back to MSU, where in one set the source IP is unchanged and in the other set the destination IP is unchanged. Third, in the filtering protocol, when IBM receives a packet, without loss of generality assuming that the packet is originated from its representative, IBM applies XOR to four fields of the packet (destination IP, source port, destination port, and protocol type) using its four corresponding keys K2, K3, K4, and K5. In other words, when the source IP of the packet is the MSU IP address, IBM does not apply XOR to that field. When the resulting packet (p1, p2 ⊕ K2, p3 ⊕K3, p4 ⊕K4, p5 ⊕K5) is sent to the third party, IBM asks the third party to apply only XOR to p1 using key K1 and process p2⊕K2, p3⊕K3, p4⊕K4, p5⊕K5 as usual using keys K2, K3, K4 and K5 respectively. Then, the third party sends the resulting packet to MSU. Finally, MSU searches the decision for the packet in the rule set where the source IP field was not processed by IBM. Thus, leaving the source (or destination) IP field unprocessed by IBM, MSU cannot break any secret key of IBM. 6.8.5 Securing Keys of MSU In our VGuard, IBM only knows mi ⊕ Ki (1 ≤ i ≤ 5) but doesn’t know mi. However, if MSU’s firewall policy has popular values in a field, say the j -th field (1 ≤ j ≤ 5), IBM may reveal Kj by XORing these popular values with mj ⊕ Kj of all non-overlapping rules received from MSU. To prevent such attack, when MSU’s firewall has some popular values, MSU can XOR each popular value using a different key in the bootstrapping protocol. The reason is that for the j -th field, although IBM can obtain a set of values through the above attack, it cannot distinguish the keys of each popular value from that set. Similarly, for deep packet inspection, if MSU has some popular strings in its signature database, IBM may reveal MSU’s secret key. The strings that MSU needs

160 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks to keep confidential are the ones that are specific to MSU needs. However, if some strings are popular among signature databases, it is unlikely that MSU needs to keep them confidential. Thus, to prevent such attack, MSU can simply tell IBM these popular strings and ask IBM to check whether a packet payload matches these strings. 6.8.6 Stateful Firewalls So far we have assumed that firewalls are stateless. A stateful firewall is a firewall that keeps track of the state of network connections across the firewall. When a stateful firewall receives a packet, it first checks its connection table to see whether the packet belongs to an ongoing connection. If yes, the packet is permitted right away. If no, the packet needs to be checked with its stateless rules to determine whether the packet should be permitted; if the stateless rules allows the packet to be permitted, then a new connection is built and inserted into the connection table of the firewall. Such stateful firewalls typically allow inside non-server computers (i.e., the computers that are not servers) to initiate connection with outside computers but disallow outside computers to initiate connection with non-server computers. When an inside non-server computer sends a packet to an outside computer, the stateful firewall uses its stateless rules to decide whether the packet should be permitted; if yes, the firewall adds an entry in its connection table that will allow subsequent packets sent from that outside computer to the inside computer. When an outside computer sends a packet to an insider non-server computer, if there is no corresponding entry in the connection table, the firewall will use its stateless rules to make a decision, which is typically deny. Our framework can be extended to handle stateful firewalls. The basic idea is to let IBM maintain a connection table for its representative. If MSU’s firewall is stateful, in the extended framework, when IBM receives a packet from or to its representative, IBM first consults its connection table to see whether the packet belongs to an ongoing connection. If yes, the packet is accepted right away. If no, IBM asks MSU for the decision for this packet; if the packet is permitted, IBM adds an entry into the connection table. Note that the connection table is different from IBM’s decision cache. If MSU’s firewall is stateless, for every connection, with the help of cache, IBM needs to ask MSU the decision for two packets that go in exactly the opposite direction. If MSU’s firewall is stateful, with the help of the cache and the connection table, IBM only needs to ask MSU the decision for one packet, which is the one that initiates the connection.

6.8 Discussion 161 01100 F1 00100 01100 F1 00100 F2 F2 F2 F2 00100 01110 00110 00100 01110 00110 dummy ad a ad ad (b) (a) Fig. 6.13 An example of statistical analysis attack 6.8.7 Statistical Analysis Attack and Countermeasures MSU could launch a statistical analysis attack to reduce the number of possible rules that a packet matches. This attack works as follows. After non-overlapping rule generation, MSU calculates the frequency for each number in the rules. The frequency of each number is preserved in processing the policy by MSU and IBM in the bootstrapping protocol. MSU could exploit such frequency information to reduce the number of possible rules that an IBM packet can match. For example, considering the numericalized FDD in Fig. 6.13a, the frequency of the number 01100 in the generated non-overlapping rules is 2, and none of the other numbers have the same frequency. Thus, MSU can identify which rules received from IBM correspond to the left branch of the FDD. Interestingly, MSU cannot correlate any rule received from IBM with the right most decision path of this FDD because of the use of dummy rules. The statistical analysis attack is based on the assumption that the frequency of each number remains the same before and after IBM’s processing. Actually, to prevent statistical analysis attacks, IBM can also make a statistical analysis of the hashed rules and add some dummy rules to disturb the statistical properties of the rules. Taking the example numericalized FDD in Fig. 6.13a, IBM can easily destroy the frequency information of the number in the FDD by adding a dummy number to the right F2 node as shown in Fig. 6.13b. Basically, IBM generates dummy rules to make all the numbers for each field have the same frequency. This will prevent MSU from launching statistical analysis attacks. Recall that dummy rules use out-of-range numbers and they cannot be matched by any packet. 6.8.8 Hash Collision The chance of having hash collisions for HMAC is extremely small. However, to be on the safe side, we propose the following solution to the problem. Our solution is based on the observation that the bootstrapping protocol in our framework can detect hash collisions easily. Recall that the bootstrapping protocol converts firewall

162 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks policies to non-overlapping rules. If hash collision happens, then among the rules that MSU receives from IBM, there exist at least two rules that are exactly the same. This fact can be used by MSU to easily detect whether hash collision happens. In the case that hash collision does happen, MSU and IBM can simply rerun the bootstrapping protocol, in which they will choose different secret keys and henceforth the hash collision is most likely removed. 6.9 Related Work 6.9.1 Secure Function Evaluation Secure Function Evaluation (SFE) was first introduced by Yao with the famous “Two-Millionaire Problem” [32]. A secure function evaluation protocol enables two parties, one with input x and the other with y, to collaboratively compute a function f (x, y) without disclosing one party’s input to the other. The classical solutions for SFE are Yao’s “garbled circuits” protocol [33] and Goldreich’s protocol [23]. The method provided by Yao has a computational cost of O(2b), where b is the number of bits needed to encode x and y. Later, the Secure Function Evaluation problem was generalized to the Secure Multiparty Computation (SMC) problem. Chaum et al. proved that any multiparty protocol problem can be solved if there is an authenticated secrecy channel between every pair of participants [5]. Zero- knowledge protocols [3, 10, 22] also aims to provide the privacy between two parties. A zero knowledge protocol is an interactive method for one party (suppose IBM) to prove to another (suppose MSU) that a statement is true without revealing anything other than the veracity of the statement. Although we could use SFE or SMC solutions to solve this problem as well as the problem of checking whether a value is in a range, the O(2b) complexity makes such solutions infeasible. Li et al. proposed Oblivious Attribute Certificates (OACerts) [20]. In OACerts, a credential holder uses his attributes in an oblivious manner to access resources if and only if the attributes in his credential satisfy the server’s policy, and the server does not learn anything about the attribute values of the credential holder, no matter whether the values satisfy the policy or not. Our framework VGuard differs from Li’s scheme in many ways. First, the computational cost of VGuard is much lower than that of Li’s scheme because VGuard is based on efficient XOR and HMAC functions and Li’s scheme is based on expensive PKI operations. Second, the communication cost of VGuard is much lower than that of Li’s scheme because VGuard uses one round of message exchange for processing one packet and Li’s scheme only compares one bit in each round of communication for evaluating greater or equal and less or equal functions.

6.9 Related Work 163 6.9.2 CDCF Framework Prior work that is closest to ours is the Cross-Domain Cooperative Firewall (CDCF) framework [7]. The CDCF framework also consists of a bootstrapping protocol and a filtering protocol. In CDCF bootstrapping protocol, for each firewall rule specified as F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd where each Si is a range, MSU first converts each range Si to a set of prefixes and then converts each prefix to a concrete number using their numericalization scheme. Second, MSU and IBM use a commutative encryption function (such as the Pohlig-Hellman Exponentiation Cipher [25] and Secure RPC Authentication (SRA) [17]) to encrypt each number generated from the prefixes with MSU’s secret key K1 and IBM’s secret key K2, respectively. A commutative encryption function satisfies the following four properties, where K1 and K2 are two secret keys: (a) For any x and K, given x and (x)K , it is computationally infeasible to compute the value of K. (b) For any x, K1, K2, we have ((x)K1 )K2 = ((x)K2 )K1 . (c) For any x, y, and K, if x = y, then we have (x)K = (y)K . (d) For any x and K, given K, (x)K can be decrypted in polynomial time. Finally, MSU stores the resulting firewall rules after commutative encryption. In the filtering protocol of CDCF, for each packet p1, p2, · · · , pd , IBM first computes the prefix family P F (pi) (1 ≤ i ≤ d) and then converts each prefix in P F (pi) to a number. Second, for each number y, IBM computes (y)K2 using commutative encryption. Third, IBM sends d sequences of numericalized and encrypted prefixes to MSU. Upon receiving the d sequences, for each number (y)K2 , MSU computes ((y)K2 )K1 . Finally, based on the properties of prefix membership verification and commutative encryption, MSU checks which rule that matches the received packet and hence finds the decision for the packet. There are five major differences between VGuard framework and CDCF frame- work. (1) The computation and communication costs of CDCF is much more expensive than those of VGuard because of the following two reasons. First, to achieve oblivious comparison, VGuard uses the Xhash protocol and CDCF uses com- mutative encryption functions, which are computationally expensive. Second, VGuard uses firewall decision diagrams to speed up the processing of packets, while CDCF uses the straightforward sequential search. (2) CDCF allows MSU to discover the original firewall rule that a packet matches, which jeopardizes packet privacy. This is particularly dangerous if the matching rule is a singleton rule, which will allow MSU to immediately know the corresponding value in the matching packet. In comparison, VGuard does not allow MSU to discover the original firewall rule that a packet matches. The key operation in VGuard is that it converts the original firewall rules to non- overlapping rules, which enables IBM to shuffle the rules. Thus, MSU cannot reveal the correspondence between the original rules and the received rules from IBM. Because a firewall policy follows first-match semantics, without such a conversion, IBM cannot disturb the order among rules in CDCF.

164 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks (3) CDCF allows MSU to know the decision for each IBM’s packet, while VGuard does not. Knowing both the original rules and the decision of a packet p, MSU could guess what packets that p could be. (4) CDCF does not perform the address translation that we discussed in Sect. 6.6.1, which could render MSU’s firewall policy ineffective for IBM’s packets. However, the address translation procedure could be easily added to CDCF. (5) CDCF is vulnerable to a type of attacks that we call selective policy updating attacks. Such attacks allow MSU to quickly discover the field values in a packet in the following manner. When MSU receives a double encrypted packet p, assuming p matches the prefix rule F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd → decision , MSU splits each prefix Si into two prefixes by instantiating the first ∗ by 0 and 1 respectively, and therefore converts the rule to a maximum of 2d rules. For example, suppose a packet p from IBM matches the prefix rule F1 ∈ 10 ∗ ∗ ∧ F2 ∈ 001∗ → a. Then, MSU splits the rule into the following four rules: F1 ∈ 100 ∗ ∧F2 ∈ 0010 → a, F1 ∈ 101 ∗ ∧F2 ∈ 0010 → a, F1 ∈ 100 ∗ ∧F2 ∈ 0011 → a, F1 ∈ 101 ∗ ∧F2 ∈ 0011 → a. Then, MSU requests to rerun the CDCF protocol due to firewall update. In the new run of CDCF, MSU replaces the above rule by the rules after splitting. (Note that d is typically 4 or 5.) After MSU receives the double encrypted rules from IBM in the new run, MSU compares p with the split rules again. One of the split rules must match p. The above process repeats with a maximum of 32 times before p matches a singleton rule at the end. To counter the selective policy updating attacks, CDCF can be fixed by updating the secret keys on both MSU and IBM sides in each run of the CDCF protocol. (6) CDCF does not support deep packet inspection while VGuard supports it. 6.9.3 Secure Queries Secure queries in outsourced database systems have been studied in prior work (e.g., [1, 16, 18]). These work aims to design a scheme for querying encrypted data in the outsourced database system where sensitive data are outsourced to an untrusted server. Later, researchers extended these work to securely process range queries in two-tiered sensor networks [6, 28, 29, 34], where storage nodes gather data from nearby sensors and answer queries from the sink. They proposed different schemes for preventing compromised storage nodes from gaining information from the sensitive data and forging query results to the sink. These schemes cannot be directly applied to the problem in this chapter for two reasons. First, both outsourced database systems and two-tiered sensor networks only deal with one untrusted party, i.e., untrusted servers or untrusted storage nodes, while in this chapter two parties MSU and IBM don’t trust each other. Second, firewall policies are different from the data stored in database or collected by sensors in terms of both structure and semantics.

6.10 Experimental Results 165 Some prior work (e.g., [13, 30]) has investigated keyword searching on encrypted data, where a keyword is a word of a natural language and data are text. 6.10 Experimental Results In this section, we evaluate the performance of our schemes on both real-life and synthetic firewall policies. In particular, we implemented our schemes without and with adding dummy rules. For ease presentation, we use VGuard and VGuard+ to denote our schemes without and with adding dummy rules, respectively. Then, we compared VGuard, VGuard+, and CDCF, side by side. We implemented VGuard, VGuard+, and CDCF using Java 1.6.3. Our experiments were carried out on a desktop PC running Windows XP SP2 with 3G memory and dual 3.4 GHz Intel Pentium processors. On real-life firewall policies, for processing packets, our experimental results show that VGuard is 552 times faster than CDCF on MSU side and 5035 times faster than CDCF on IBM side; VGuard+ is 544 times faster than CDCF on MSU side and 5021 times faster than CDCF on IBM side. On synthetic firewall policies, for processing packets, our experimental results show that VGuard is 252 times faster than CDCF on MSU side and 5529 times faster than CDCF on IBM side; VGuard+ is 248 times faster than CDCF on MSU side and 5513 times faster than CDCF on IBM side. 6.10.1 Efficiency on Real-Life Firewall Policies We conducted experiments on 16 real-life firewall policies that we collected from a variety of sources. Each firewall examines five packet fields of source IP, destination IP, source port, destination port, and protocol type. The number of rules ranges from dozens to hundreds. We measured the computational cost of the two parties MSU and IBM for both bootstrapping and filtering protocols. We also measured the communication overhead for both bootstrapping and filtering protocols. For fair comparison, in implementing CDCF, we used the same parameters as in [7], i.e., we used the Pohlig-Hellman algorithm [25] with a 1024-bit prime modulus and 160-bit encryption keys. In implementing VGuard and VGuard+, we chose the HMAC- MD5 hash function with 128-bit keys. Figure 6.14 shows the communication overhead in the bootstrapping protocol for VGuard, VGuard+, and CDCF. We observe that the communication overhead in the bootstrapping protocol of VGuard is similar as that of CDCF for all firewalls. We have two explanations. First, CDCF needs to convert the range to prefix numbers for every field in the firewall rules. Recall that the minimum number of prefixes for representing an integer [a, b], where a and b are two numbers of w bits, is at most 2w − 2. For example, the minimum number of prefixes for representing the range of IP address that has 32 bits is at most 62. Thus, the number of prefix rules converted

166 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks Fig. 6.14 Communication Communication overhead (KB) 4 x 104 overhead in the bootstrapping VGuard protocol VGuard+ 3 CDCF 2 1 0 1 2 3 4 5 6 7 8 9 10111213141516 Real−life firewalls from a firewall rule can be at most 62 × 62 × 30 × 30 × 14 = 48,434,400. Second, every prefix number will be encrypted to be 1024 bits in CDCF instead of 128 bits in VGuard. Therefore, even though the ratio between the number of non-overlapping firewall rules and the number of original firewall rules is large (i.e., in the 16 real- life firewall policies, the average ratio between the number of non-overlapping firewall rules and the number of original firewall rules is 887), the communication overhead in the bootstrapping protocol of VGuard is still similar as that of CDCF. We also observe that on average, the communication overhead in the bootstrapping protocol of VGuard+ is 15% higher than that of VGuard because of adding dummy rules. In the filtering protocol, the only difference between CDCF and VGuard is that the encryption method is different. CDCF applies double encryption to every prefix number, and VGuard applies Xhash to every prefix number. Therefore, in the filtering protocol, the overhead of CDCF is 1024/128=8 times higher than that of VGuard. Similarly, in the filtering protocol, the overhead of CDCF is 8 times higher than that of VGuard+. Figure 6.15a and b show the computational cost of MSU and IBM respectively in the bootstrapping protocol for VGuard, VGuard+, and CDCF. We observe that the bootstrapping costs of VGuard and VGuard+ are lower than that of CDCF for most firewalls. Although the Xhash scheme is three orders of magnitude faster than the commutative encryption scheme, the bootstrapping costs of VGuard and VGuard+ are not three orders of magnitude lower than CDCF because VGuard and VGuard+ convert the given firewall policy to non-overlapping prefix rules, which result in a significant expansion. Note that the bootstrapping protocol only needs to run once between MSU and IBM unless MSU updates its firewall policy. The performance of the bootstrapping protocol is less critical than that of the filtering protocol. Figure 6.16a and b show the computational cost of MSU and IBM respectively in the filtering protocol for VGuard, VGuard+, and CDCF. Note that the vertical axis of these two figures are in a logarithmic scale. We observe that the filtering costs of VGuard and VGuard+ are significantly lower than that of CDCF. On average, VGuard is 552 times faster than CDCF on MSU side and 5035 times faster than CDCF on IBM side; VGuard+ is 544 times faster than CDCF on MSU side and

6.10 Experimental Results 167 40 40 VGuard VGuard VGuard+ VGuard+ 30 CDCF 30 CDCF Time(s) Time(s) 20 20 10 10 0 1 2 3 4 5 6 7 8 9 10111213141516 0 1 2 3 4 5 6 7 8 9 10111213141516 Real−life firewalls Real−life firewalls (a) (b) Fig. 6.15 The bootstrapping time on real-life firewalls. (a) MSU. (b) IBM 5021 times faster than CDCF on IBM side. Note that the packet processing time for CDCF on both MSU and IBM side in these two figures seems constant, instead of increasing as the number of rules increases. This is because in processing each packet on both MSU and IBM side, for CDCF, the encryption time, which is roughly constant for each packet, is about 20 times more than the time for performing a linear search in the firewall. 6.10.2 Efficiency on Synthetic Firewall Policies Firewall policies are considered confidential due to security concerns. It is difficult to get a large number of real-life firewall policies to experiment with. To further evaluate the performance of VGuard and VGuard+ in comparison with CDCF, we generated a large number of synthetic firewall policies and conducted experiments on them. Every predicate of a rule in our synthetic firewall has five fields: source IP address, destination IP address, source port number, destination port number, and protocol type. We first randomly generated a list of values for each field. For IP addresses, we generated a random class C address then generated single IP addresses within the class C addresses; for ports we generated a random range; for protocols, we choose either TCP, UDP, or ICMP. Every field also has the “*” value included in the list. We then generated a list of predicates by taking the cross product of these five lists and randomly selected from the cross product until we reached our desired classifier size by including a final default predicate. Finally, we randomly assigned one of two decisions, accept or discard, to each predicate to make a complete rule. We generated firewall policies with the number of rules ranging from 100 to 1000, where for each number we generated ten synthetic firewall policies. Figure 6.17a and b show the computational cost of MSU and IBM in the bootstrapping protocol for VGuard, VGuard+, and CDCF. Figure 6.18a and b show the computational cost of MSU and IBM respectively in the filtering protocol for

168 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks VGuard VGuard 103 VGuard+ 103 VGuard+ CDCF CDCF 102 Time(ms) Time(ms) 102 101 101 100 100 10−1 1 2 3 4 5 6 7 8 9 10111213141516 10−1 Real−life firewalls 10−2 1 2 3 4 5 6 7 8 9 10111213141516 (a) Real−life firewalls (b) Fig. 6.16 The filtering time on real-life firewalls. (a) MSU. (b) IBM 400 400 VGuard VGuard VGuard+ VGuard+ 300 CDCF 300 CDCF Time(s) 200 200 Time(s) 100 100 0 0 100 200 300 400 500 600 700 800 9001000 100 200 300 400 500 600 700 800 9001000 Number of rules Number of rules (a) (b) Fig. 6.17 The bootstrapping time on synthetic firewalls. (a) MSU. (b) IBM Time(ms) VGuard Time(ms) VGuard 103 VGuard+ 103 VGuard+ CDCF CDCF 102 102 101 101 100 100 10−1 10−1 100 200 300 400 500 600 700 800 9001000 10−2 Number of rules 100 200 300 400 500 600 700 800 9001000 (a) Number of rules (b) Fig. 6.18 The filtering time on synthetic firewalls. (a) MSU. (b) IBM

References 169 VGuard, VGuard+, and CDCF. On average, for these synthetic firewall policies, VGuard is 252 times faster than CDCF on the MSU side and 5529 times faster than CDCF on the IBM side; VGuard+ is 248 times faster than CDCF on MSU side and 5513 times faster than CDCF on IBM side. 6.11 Concluding Remarks In this chapter, we propose VGuard, a privacy preserving framework for col- laborative enforcement of firewall policies. In terms of security, compared with the state-of-the-art CDCF scheme, VGuard is more secure because of two major reasons. First, VGuard converts a firewall policy of an ordered list of overlapping rules to an equivalent non-ordered set of non-overlapping rules, which enables rule shuffling and consequently MSU cannot identify which original rule matches the given packet. Second, VGuard obfuscates rule decisions, which prevents MSU from knowing the decision for the given packet. In terms of efficiency, compared with the state-of-the-art CDCF scheme, VGuard is hundreds of times faster than CDCF in processing packets because of two reasons. First, VGuard uses a new oblivious comparison scheme proposed in this chapter, which is three orders of magnitude faster than the commutative encryption scheme used in CDCF. Second, VGuard uses firewall decision diagrams for processing packets, which is much faster than the linear search used in CDCF. We want to emphasize that the VGuard framework can be applied to other types of security policies as well. It is also worth noting that the Xhash scheme can be used for other applications that require oblivious comparison. References 1. R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Order preserving encryption for numeric data, in Proc. ACM Int. Conf. on Management of Data (SIGMOD) (2004), pp. 563–574 2. A.V. Aho, M.J. Corasick, Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975) 3. F. Boudot, Efficient proofs that a committed number lies in an interval, in Proc. Advances in Cryptology (EUROCRYPT). Lecture Notes in Computer Science, vol. 1807, May 2000 4. Y.-K. Chang, Fast binary and multiway prefix searches for packet forwarding. Comput. Netw. 51(3), 588–605 (2007) 5. D. Chaum, C. Crepeau, I. Damgard, Multiparty unconditionally secure protocols, in Proc. ACM Symposium on Theory of Computing (1988), pp. 11–19 6. F. Chen, A.X. Liu, SafeQ: secure and efficient query processing in sensor networks, in Proc. IEEE Int. Conf. on Computer Communications (INFOCOM) (2010) 7. J. Cheng, H. Yang, S.H.Y. Wong, S. Lu, Design and implementation of cross-domain cooperative firewall, in Proc. IEEE Int. Conf. on Network Protocols (ICNP) (2007) 8. Cisco ios ips deployment guide, www.cisco.com 9. B. Commentz-Walter, A string matching algorithm fast on the average, in Proc. Colloquium, on Automata, Languages and Programming (1979), pp. 118–132

170 6 Collaborative Enforcement of Firewall Policies in Virtual Private Networks 10. R. Cramer, M.K. Franklin, B. Schoenmarks, M. Yung, Multi-authority secret-ballot elections with linear work, in Proc. Advances in Cryptology (EUROCRYPT). Lecture Notes in computer Science, vol. 1070 (1996) 11. D. Eastlake, P. Jones, Us secure hash algorithm 1 (SHA1). RFC 3174, 2001 12. L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol, in Proc. ACM SIGCOMM, Sept 1998 13. P. Golle, J. Staddon, B. Waters, Secure conjunctive keyword search over encrypted data, in Proc. Int. Conf. on Applied Cryptography and Network Security (ACNS) (2004), pp. 31–45 14. M.G. Gouda, A.X. Liu, Structured firewall design. Comput. Netw. J. 51(4), 1106–1120 (2007) 15. P. Gupta, N. McKeown, Algorithms for packet classification. IEEE Netw. 15(2), 24–32 (2001) 16. H. Hacigümüs¸, B. Iyer, C. Li, S. Mehrotra, Executing SQL over encrypted data in the database- service-provider model, in Proc. ACM Int. Conf. on Management of Data (SIGMOD) (2002), pp. 216–227 17. D.K. Hess, D.R. Safford, D.L. Schales, Secure RPC authentication (SRA) for TELNET and FTP. Technical report, 1993 18. B. Hore, S. Mehrotra, G. Tsudik, A privacy-preserving index for range queries, in Proc. Int. Conf. on Very Large Data (VLDB) (2004), pp. 720–731 19. H. Krawczyk, M. Bellare, R. Canetti, HMAC: Keyed-hashing for message authentication. RFC 2104, 1997 20. J. Li, N. Li, OACerts: oblivious attribute certificates, in Proc. Conf. on Applied Cryptography and Network Security (ACNS), June 2005, pp. 301–317 21. A.X. Liu, M.G. Gouda, Diverse firewall design, in Proc. Int. Conf. on Dependable Systems and Networks (DSN), June 2004, pp. 595–604 22. W. Mao, Guaranteed correct sharing of integer factorization with off-line shareholders, in Proc. Public Key Cryptography (PKC). Lecture Notes in Computer Science, vol. 1431, Feb 1998 23. S. Micali, O. Goldreich, A. Wigderson, How to play any mental game, in Proc. ACM Conf. on Theory of Computing (1987) 24. V. Paxson, Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23– 24), 2435–2463 (1999) 25. S.C. Pohlig, M.E. Hellman, An improved algorithm for computing logarithms over GF(p) and its cryptographic significance. IEEE Trans. Inf. Syst. Secur. 24, 106–110 (1978) 26. R. Rivest, The MD5 message-digest algorithm. RFC 1321 (1992) 27. M. Roesch, Snort: lightweight intrusion detection for networks, in Proc. Systems Administra- tion Conference (LISA), USENIX Association (1999), pp. 229–238 28. B. Sheng, Q. Li, Verifiable privacy-preserving range query in two-tiered sensor networks, in Proc. IEEE Int. Conf. on Computer Communications (INFOCOM) (2008), pp. 46–50 29. J. Shi, R. Zhang, Y. Zhang, Secure range queries in tiered sensor networks, in Proc. IEEE Int. Conf. on Computer Communications (INFOCOM) (2009) 30. D.X. Song, D. Wagner, A. Perrig, Practical techniques for searches on encrypted data, in Proc. IEEE Symposium on Security and Privacy (2000) 31. Tippingpoint x505, www.tippingpoint.comproducts_ips.html 32. A.C. Yao, Protocols for secure computations, in Proc. IEEE Symposium on the Foundations of Computer Science (FOCS) (1982), pp. 160–164 33. A.C. Yao, How to generate and exchange secrets, in Proc. IEEE Symposium on Foundations of Computer Science (FOCS) (1986), pp. 162–167 34. R. Zhang, J. Shi, Y. Zhang, Secure multidimensional range queries in sensor networks, in Proc. ACM Int. Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc) (2009)

Chapter 7 Privacy Preserving Quantification of Cross-Domain Network Reachability 7.1 Introduction 7.1.1 Background and Motivation Network reachability for a given network path from the source subnet to the destina- tion subnet is defined as the set of packets that are allowed by all network devices on the path. Network reachability quantification is important for understanding end-to- end network behavior and detecting the violation of security policies. Several critical concerns like router misconfiguration, policy violations and service availability can be verified through an accurate quantification. Quantifying network reachability is a difficult and challenging problem for two reasons. First, various complex mechanisms, such as Access Control Lists (ACLs), dynamic routing, and network address translation (NAT), have been deployed on network devices for restricting network reachability. To perform an accurate analysis, administrators need to collect all the reachability restriction information from these network devices. However, collecting such information is very difficult due to the privacy and security concerns. Second, the explosion of the Internet has caused an increase in the complexity and sophistication of these devices, thus, making reachability analysis computationally expensive and error-prone. The current practice of reachability management is still “trial and error” due to the lack of network reachability analysis and quantification tools. This approach leads to significant number of configuration errors and has been shown to be the major cause of failure for Internet services [28]. Industry research also shows that a significant percentage of human effort and monetary resources are employed in maintaining the operational status of the network. Reports place the revenue losses per hour for media, banking and brokerage industry to 1.2, 2.6 and 4.5 million dollars, respectively [19]. Several critical business applications and sen- sitive communications are severely affected due to network outages caused by © The Editor(s) (if applicable) and The Author(s), under exclusive license to 171 Springer Nature Switzerland AG 2021 A. X. Liu, R. Li, Algorithms for Data and Computation Privacy, https://doi.org/10.1007/978-3-030-58896-0_7

172 7 Privacy Preserving Quantification of Cross-Domain Network Reachability misconfiguration errors. These events place a tremendous amount of pressure on network operators to debug the problems quickly. Thus, systematic analysis, and, tools for quantification of network reachability are needed for understanding end- to-end network behavior and detecting configuration errors. 7.1.2 Limitation of Prior Art The current practice of determining network reachability through probing has two major drawbacks. First, probing is expensive because it needs to generate and send a significant amount of probe packets. Second, probing is inaccurate, e.g., it cannot probe the open ports with no server listening on them. Due to these drawbacks, many approaches were proposed to address the reachability problem [2, 18, 26, 27, 33, 35, 36]. The main assumption in all these approaches is that the reachability restriction information of each network device and configuration state are known to a central network analyst, who is quantifying the network reachability. However, in practice, it is common that the network devices along a given path belong to different domains where the reachability restriction information cannot be shared with others including the network analyst. Figure 7.1 shows a typical scenario of network reachability, where User1 wants to know what packets he can send to User2 through the given path. The network devices deployed along this path belong to three different parties, i.e., S1 and FW1 belong to Subnet1, FW2, FW3, and R1 belong to ISP, FW4 and S2 belong to Subnet2. Keeping the reachability restriction information private is important for two reasons. First, such information is often mis-configured and has security holes that can be exploited by attackers if it is disclosed. In reality, most firewall policies have security holes [34]. Disclosing ACLs allows attackers to analyze and utilize the vulnerabilities of subnets along a given path. For example, if ACLs along a path from Subnet1 to Subnet2 do not block some worm traffic, attackers can break into Subnet2 from Subnet1. In practice, neither ISPs nor private networks disclose their ACLs. Second, the reachability restriction information of a network device contains private information, e.g., the ACLs of a network device contain the IP Fig. 7.1 An example of end-to-end network reachability

7.1 Introduction 173 addresses of servers, which can be used by attacker to launch more targeted attacks. In practice, even within an organization, often no employees other than the firewall administrators are allowed to access their firewall policies. Hence, safe-guarding the privacy of network devices is a critical requirement in network reachability quantification. 7.1.3 Cross-Domain Quantification of Reachability To our best knowledge, no prior work has addressed the problem of privacy- preserving network reachability quantification. We propose the first privacy- preserving protocol for quantifying network reachability for a given network path across multiple domains. First, for the network devices belonging to each party, we convert the reachability restriction information of these devices to an access control list (ACL) by leveraging an existing network reachability quantification tool [26]. This tool takes as input the reachability restriction information, including ACLs, all possible network transforms (e.g., NAT and PAT), and protocol states (e.g., connection-oriented and state-less), and outputs an ACL. Note that ACLs are actually the most important security components for network devices, such as network routers, firewalls, and layer 3 switches, to filter the network traffic. The network devices in each party may include multiple ACLs and also other reachability restriction information. Considering the example in Fig. 7.1, Fig. 7.2 shows the three resulting ACLs, A1, A2, and A3, for Subnet1, ISP, and Subnet2, respectively. For ease of presentation, in the rest of this chapter, we use “ACL” to denote the resulting ACL converted from multiple ACLs as well as other reachability restriction information in one party. Second, we calculate the set of packets that are accepted by all the resulting ACLs on a given network path in a privacy-preserving manner. Typically, an ACL consists of multiple rules and each rule has several fields, such as source and destination IP addresses e.g., with an accept or discard decision. Given this, the network reachability quantification requires the comparison of the rules of adjacent ACLs. However, a direct comparison of the ACL rules is complex and error-prone as there may be several overlapping rules which require further analysis. For example, a single rule in one ACL may overlap with multiple rules in another ACL. Determining the exact nature of the overlap is important for accurate quantification of network reachability. Our proposed cross-domain quantification approach of network reachability can be very useful for many applications. We illustrate this using two example scenarios. First, a global view of the network reachability can help Internet service providers (ISPs) to define better QoS policies to improve traffic management. For example, the knowledge of the different paths through which a particular type of traffic is allowed by the ACLs can help the ISPs to maintain a rated list of the best-quality paths in case of path failures. Second, since network reachability is crucial for many companies that provide their services over the Internet, performing a privacy-

174 7 Privacy Preserving Quantification of Cross-Domain Network Reachability Fig. 7.2 Three resulting ACLs converted from Fig. 7.1 preserving computation of the network reachability could become a new business for the ISPs and other parties that involve in this computation. The ISPs can answer the reachability queries of these companies using this global knowledge and even provide some information regarding the quality of various paths. Note that ACLs (also called stateless firewalls), stateful firewalls, and Intrusion Prevention Systems (IPSes) are three different mechanisms for packet filtering. ACLs are stateless by definition as they make the decision for accepting or discarding a packet based on the packet itself and do not consider the packets that they see in the past. They mainly check the well known five tuple of the packet header: source IP address, destination IP address, source port number, destination port number, and protocol type and make a decision regarding the packet. Stateful firewalls make the decision for accepting or discarding a packet based on both the current packet and the packets that they have seen in the past. They consider previous packets by checking whether the connection has been established and allow only those packets which have a matching connection in the past. Intrusion Prevention Systems (IPSes) make decision on accepting or discarding packets based on packet payload. They check packet payload mainly by performing regular expression matching where the regular expressions correspond to malicious packet signatures. The scope of this chapter is on network reachability based on packet headers. Thus, the technical discussion of this chapter is primarily based on ACLs. We will also discuss how to adapt our solutions to deal with stateful firewalls in Sect. 7.6. 7.1.4 Technical Challenges There are four key challenges in the privacy preserving quantification of network reachability. (1) It is computationally expensive. An ACL may consist of many rules, and each rule consists of multiple fields. Therefore, comparing multiple ACLs with a large number of rules can be quite expensive, even if only a few ACLs are involved in the process. Furthermore, the complexity of comparison can be expensive due to overlapping rules resulting in many comparisons. (2) Protecting the privacy of the ACL rules is crucial. Since a rule has to be sent to other parties to enable comparison, it is necessary to propose a protocol that will not reveal the rule but still allows the different ACLs to calculate the intersection. (3) Communication cost is high as

7.1 Introduction 175 even calculating the intersection of a small number of ACLs is a tedious process and requires a number of messages to be exchanged among different parties. (4) Computing the reachability information when ACLs are updated is an important performance related issue. It is necessary to explore optimization approaches in such scenarios without sacrificing the privacy of individual ACLs. 7.1.5 Our Approach We propose the first cross-domain privacy-preserving protocol for quantifying network reachability. We note that, the domains could be connected through multiple ISPs or they could be independently administered domains within the same ISP. We consider n ACLs, A1, · · · , An, n ≥ 2, in a given network path where each ACL belongs to a distinct party. Our protocol consists of three phases: ACL preprocessing, ACL encoding and encryption, and ACL comparison. In the first phase, we transform all the ACLs into an equivalent representation, Firewall Decision Diagram (FDD) [15, 16], and then extract the non-overlapping rules with accept decisions. In the second phase, to perform privacy preserving comparison, we reduce the problem to that of computing privacy-preserving intersection of two numerical ranges. Accordingly, we first transform the rules, which are represented as ranges, into a sequence of prefix numbers and then encrypt these numbers with secret keys of different parties. This phase enables different parties to compute the intersection of non-overlapping rules in their ACLs without revealing these rules. In the third phase, the destination ACL computes the intersection of its non- overlapping rules with the rules from its adjacent ACL, and then the adjacent ACL further repeats this computation with its adjacent ACL until the source ACL is reached. Finally, all the ACLs collaboratively decrypt the encrypted intersection of the non-overlapping rules, but only the first party (with the source ACL) obtains the result. To reduce the computation and communication cost, we use the divide-and- conquer strategy to divide the problem of computing reachability of n ACLs to the problem of computing reachability of three ACLs. The initial intersection is performed among the rules of three adjacent ACLs that are located in a sequence along the network path. Subsequent comparisons are grouped in a similar manner, i.e., the intersection of three ACLs can be treated as a new ACL and the process is repeated among three new ACLs. This optimization technique reduces the number of ACL encryptions and the number of messages in our protocol from O(n2) to O(n). To handle the issue of changes to ACL rules, we analyze various plausible scenarios and use the approach from [8] to localize the reachability computation. We show that our approach is effective in reducing reachability computation across the entire network in the event of such ACL updates.

176 7 Privacy Preserving Quantification of Cross-Domain Network Reachability 7.1.6 Summary of Experimental Results We performed extensive experiments over real and synthetic ACLs. Our experi- mental results show that the core operation of our protocol is efficient and suitable for real applications. The on-line processing time of an ACL with thousands of rules is less than 25 s and the comparison time of two ACLs is less than 5 s. The communication cost between two ACLs with thousands of rules is less than 2100 KB. 7.1.7 Key Contributions The following are our key contributions. (1) We propose the first cross-domain privacy-preserving protocol to quantify network reachability across multiple domains. Our protocol can accurately compute the intersection of the rules among the ACLs along a given network path without the need to share these rules across those domains. This is the first step towards privacy-preserving quantification of network reachability and it can be extended to other network metric measurements that are sensitive in nature. Furthermore, we propose an optimization technique to reduce computation and communication cost of our protocol. It reduces the number of ACL encryptions and the number of messages from O(n2) to O(n). (2) We describe an approach to handle ACL updates in an effective manner. (3) We conducted extensive experiments on both real and synthetic ACLs and the results show that our protocol is efficient and suitable for real applications. 7.2 Related Work 7.2.1 Network Reachability Several existing approaches [2, 18, 26, 27, 33, 35, 36] have studied the problem of quantifying network reachability. The key challenges in network reachability include, misconfiguration of ACLs, changes of routing policies, and link fail- ures, which could prevent accessibility to essential network services. To estimate reachability, existing approaches analyze ACLs while considering other critical parameters like dynamic routing policies, packet transforms, and variations in protocol operations. To estimate the bounds on reachability, Xie et al. [35] defined union and intersection operations over ACLs while taking into account the routing decisions, packet transforms, and link failures. This approach, however, over approximates and does not yield exact bounds. Ingols et al. [18] used Binary Decision Diagrams (BDDs) to reduce the complexity of handling ACLs and to estimate reachability more accurately. Matousek et al. [27] described a formal model

7.2 Related Work 177 using Interval Decision Diagrams (IDDs) to analyze network reachability under all possible network failure conditions. However, the approach is not scalable as it performs an exhaustive evaluation of failure scenarios that may or may not occur. Al-Shaer et al. [2] proposed a more accurate model using BDDs. They applied symbolic model checking techniques on properties specified in computation tree logic (CTL) [11], to verify reachability across the network for any given packet. Sung et al. [33] studied the effect of reachability constraints on class-of-service flows, where the packets are subjected to an additional constraint based on their class-of-service. Liu et al. [26] used Firewall Decision Diagrams (FDDs) to quantify reachability while considering all possible network transforms like NAT, PAT as well as protocol states like connection-oriented, state-less and so on. While most solutions operate on static configurations, some work has been proposed for estimating reachability in an on-line manner. Bandhakavi et al. [3] analyzed the network reachability using a simulated routing environment, i.e., they constructed a routing graph which represents the possible paths that could be taken by routing advertisements under the current router configurations. Analysis of this graph helps to identify violations in security policies and in verifying reachability. Zhang et al. [36] described a real-time monitoring and verification system for network reachability. A monitoring software runs on all the routers and collects up-to-date ACL and forwarding state information, which enables the network administrator to determine instantaneous reachability between any source destination pair. Several other works have been proposed to reduce the complexity of managing networks and to verify network configurations. Casado et al. [5] described a novel architecture for enterprise, Secure Architecture for the Networked Enter- prise (SANE), which comprises of a centralized authentication server that allows authorized users to access services. In SANE, the ACLs can be specified in a natural way so as to capture the semantics clearly. Le et al. [21] used data mining techniques to analyze security policies and to detect possible mis-configurations in the policies. They considered the notion of association rule mining to extract usable safe configurations of routers and detect anomalies in other routers using the extracted patterns. Benson et al. [4] described complexity metrics to evaluate relative complexity among alternate network designs. The metrics allow network operators to compare configurations with standard configurations and identify errors. All these approaches are based on the same assumption, that, there is a central network analyst who has the complete knowledge of the network configuration and other critical information. However, in reality, this assumption is not true for a network where network devices belong to different independent domains and whose network configuration cannot be shared with other domains due to privacy concerns. Therefore, these approaches cannot quantify network reachability across a network spanning different domains.

178 7 Privacy Preserving Quantification of Cross-Domain Network Reachability 7.2.2 Privacy Preserving Set Operation The other work partly related to our work is on privacy preserving set operations. These solutions enable n parties, where each party has a private set Si, to collaboratively compute the intersection of all sets, S1 ∩ · · · ∩ Sn, without disclosing more information of one party’s private set beyond the intersection to other parties [12, 20, 31]. Although it is possible to explore the application of these solutions to solve the problem of privacy preserving network reachability, the communication cost of these solutions is prohibitive due to the messages exchanged during the privacy preserving set operations. In privacy preserving set operations, after the computation, every party needs to know the final result, i.e., s1 ∩ · · · ∩ sn, while in privacy preserving network reachability, only the first party needs to know the result. This requirement significantly alters the communication cost of the resulting protocol. Under the semi-honest model [14], the state-of-the-art solution for privacy preserving set operations [20] can achieve O(ndw2md ) for n parties, each party with an ACL of m rules over d w-bit domains, while our solution can achieve min (O(ndwmd ), O(n2w)), which is far more efficient when w is large. Such complexity analysis will be discussed in Sect. 7.7.2. 7.2.3 Privacy Preserving Collaborative Firewall Enforcement in VPN Previous work on collaborative firewall enforcement in virtual private networks (VPN) [9, 22, 23] differs from our work from three perspectives. First and most importantly, the problems being addressed are different. Such work focuses on enforcing firewall policies over encrypted VPN tunnels without leaking the privacy of the remote network’s firewall policy, whereas our work focuses on privacy- preserving calculation of the intersection of multiple ACLs, which is a new problem. Second, the privacy requirements are different. Such work preserves only the privacy of the remote network’s policy, whereas our work preserves the privacy of all parties that are involved in the quantification of network reachability. Last but not least, previous approaches do not require decryption or decoding because their comparison result is whether a packet matches a rule, whereas our work requires decryption of network reachability that is represented by a set of ACLs.

7.3 Problem Statement and Threat Model 179 7.3 Problem Statement and Threat Model 7.3.1 Access Control Lists (ACLs) We consider quantifying network reachability using the ACLs of different domains which are converted to enable this computation. Each ACL A is an ordered list of rules and each rule is composed of a predicate over d f ields, F1, · · · , Fd and a decision for the packets that match the predicate. Typically, an ACL checks five fields, source IP (32 bits), destination IP (32 bits), source port (16 bits), destination port (16 bits), and protocol type (8 bits). A packet over the d fields F1, · · · , Fd is a d-tuple (p1, · · · , pd ) where each pi is an element of the domain of field Fi, denoted as D(Fi). A predicate specifies a set of packets over the d fields, F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd , where each Si ⊆ D(Fi) and is specified as either a prefix or a range. A packet (p1, · · · , pd ) matches a rule F1 ∈ S1 ∧ · · · ∧ Fd ∈ Sd → decision if and only if the condition p1 ∈ S1 ∧ · · · ∧ pd ∈ Sd holds. Typical decisions include: accept, discard, accept with logging, and discard with logging. Without loss of generality, we only consider accepting rules, i.e., rules with accept decisions, and discarding rules, i.e., rules with discard decisions. Two rules in an ACL may conflict, i.e., they have different decisions and there is at least one packet that matches both rules. To resolve these conflicts, ACLs usually employ a first-match semantics, i.e., the decision of the first-matching rule is enforced on the packet. The matching set of ri, M(ri), is the set of all possible packets that match the rule ri [24]. Table 7.1 shows an example ACL, the format of which is based upon the format of Cisco Access Control Lists. 7.3.2 Problem Statement We focus on quantifying the end-to-end network reachability for a given network path with multiple network devices belonging to different parties. The network devices are connected with physical interfaces for filtering outgoing packets and incoming packets. A network path is a unidirectional path for transferring packets from the source to the destination. Along the given network path, there are multiple ACLs for filtering these packets. There may be multiple ACLs that are enforced by the same domain. To convert them to a single ACL for one party, we first employ the existing network reachability approach, Quarnet [26], and generate reachability Table 7.1 An example ACL Rule Src. IP Dest. IP Src. Port Dest. Port Prot. Action r1 123.24.*.* 242.168.0.1 * 80 TCP Accept r2 219.27.*.* 242.168.0.2 * 25 TCP Discard r3 * * ** * Discard

180 7 Privacy Preserving Quantification of Cross-Domain Network Reachability matrices. Second, we use the query language of Quarnet to obtain a set of packets that can pass through the given path within the party. Finally, we convert the set of packets to a single ACL. Without loss of generality, in the rest of this chapter, we use the term “ACL” to denote the resulting ACL converted from multiple ACLs in one party. Given an ACL A, let M(A) denote the set packets that are accepted by A. Given a network path with n ACLs A1, A2, . . . , An for transferring packets from A1 to An, where Ai belongs to the party Pi (1 ≤ i ≤ n), quantifying the network reachability is computing the intersection among M(A1), · · · , M(An), i.e., M(A1) ∩ M(A2) · · · ∩ M(An). In our context, we aim to design a privacy preserving protocol which enables the first party P1 to compute the intersection of n ACLs , n ≥ 2, M(A1) ∩ M(A2) · · · ∩ M(An) without revealing rules in an ACL Ai (1 ≤ i ≤ n) to any other party Pj (j = i). We make the following assumptions. (1) The destination of the network path cannot be an intermediate network device. In other words, the destination ACL An should filter the packets to end users but not to another network device. (2) All parties follow the privacy-preserving protocol for quantifying the network reachability. Note that if one party does not want to involve or only wants to provide part of its ACL rules, the party P1 can still run the protocol to compute network reachability among the remaining ACLs. This requirement is very important especially for the party who really cares about the security of its private network, e.g., a bank who will not share with other parties the information about which packets are allowed to enter into its private network. 7.3.3 Threat Model We consider the semi-honest model, where each party follows our protocol correctly but it may try to learn the ACL rules of other parties [7, 14]. For example, the party P1 may use the intermediate results to reveal the ACL rules of other parties. The semi-honest model is realistic in our context because a malicious party cannot gain benefits by providing a forged ACL or not following our protocol. For instance, in [7], the benefit of launching active collusion attacks is almost negligible, and even damaging to the individual parties. A similar scenario is present in our protocol, where launching such attacks neutralizes the benefits of computing network reachability, especially if the other ACL might undergo changes in a short period of time.

7.4 Privacy-Preserving Quantification of Network Reachability 181 7.4 Privacy-Preserving Quantification of Network Reachability To compute the network reachability from A1 to An, our privacy-preserving protocol consists of three phases, ACL preprocessing, ACL encoding and encryption, and ACL comparison. In the first phase, ACL preprocessing, each party converts its ACL to a sequence of accepting rules. The union of the matching sets of these accepting rules is equal to the set of packets that are accepted by the ACL. In the second phase, ACL encoding and encryption, each party encodes and encrypts each field of its accepting rules for preserving the privacy of its ACL. In the third phase, ACL comparison, all parties compare their ACLs and finally the party P1 finds out the set of packets that are accepted by all ACLs. Particularly, Pn−1 compares the encoded and encrypted accepting rules from An−1 with those from An, and finds out the multiple accepting rules whose union is equal to the intersection of M(An) and M(An−1). Then, Pn−2 compares the accepting rules from ACL An−2 with the resulting accepting rules in the first step, and finds out the multiple accepting rules whose union is equal to M(An)∩M(An−1)∩M(An−2). This step is repeated until P1 finds out the multiple accepting rules whose union is equal to M(A1)∩· · ·∩M(An). Note that, the resulting accepting rules of each step are in an encrypted format which prevents any party from revealing these rules by itself. To reveal the final accepting rules, P1 requires all other parties to decrypt these rules with their private keys and then, P1 decrypts these rules. We assume that each Si in an ACL rule S1 ∧ · · · Si ∧ · · · ∧ Sn, is represented in the range format. Therefore, the basic problem of privacy-preserving network reachability is to compute the intersection among the multiple ranges of the ACL rules belonging to different parties in a privacy preserving manner. This problem boils down to the problem of computing intersection of two ranges [a, b] and [a , b ], denoted as [a, b] ∩ [a , b ]. Thus, we first describe the privacy-preserving protocol for computing [a, b] ∩ [a , b ], and then describe the three phases in our network reachability protocol. 7.4.1 Privacy-Preserving Range Intersection To compute the intersection of a range [a, b] from Ai and a range [a , b ] from Aj , our basic idea is to check which range among [min, a − 1], [a, b], and [b + 1, max] includes a or b , where min and max are the minimum and maximum numbers, respectively. Thus, the problem of computing [a, b] ∩ [a , b ] reduces to the problem of checking whether a number is in a range, e.g., a ∈ [min, a − 1], which can be solved by leveraging the prefix membership verification scheme in [22]. Our scheme consists of five steps. We label each step with the party that is performing that step.

182 7 Privacy Preserving Quantification of Cross-Domain Network Reachability 7.4.1.1 [P i]: Range Transformation The party Pi transforms each range [a, b] to three ranges [min, a − 1], [a, b], and [b + 1, max], where min and max are the minimum and maximum numbers of the corresponding field’s domain, respectively. For example, [5,7] is converted to [0,4], [5,7], and [8,15], where 0 and 15 are the minimum and maximum numbers. Note that [min, a − 1] and [b + 1, max] may not exist. If a = min, then [min, a − 1] does not exist; if b = max, then [b + 1, max] does not exist. 7.4.1.2 [P i]: Range to Prefix Set Conversion The party Pi converts each range to a minimum set of prefixes, whose union corresponds to the range. Let S([min, a − 1]), S([a, b]), and S([b + 1, max) denote the resulting sets of prefixes for the three ranges, respectively. For example, S([5, 7])={0101, 011*}, where “*” denotes that this bit can be 0 or 1. 7.4.1.3 [P j ]: Prefix Family Generation The party Pj generates the prefix families of a and b, denoted as F(a) and F(b). The prefix family F(a) consists of a and all the prefixes that contains a. Assuming w is the bit length of a, F(a) consists of w + 1 prefixes where the l-th prefix is obtained by replacing the last l − 1 bits of a by ∗. For example, as the binary representation of 6 is 0110, we have F(6)={0110, 011*, 01**, 0***, ****}. It is easy to prove that a ∈ [a, b] if and only if F(a ) ∩ S([a, b]) = ∅. 7.4.1.4 [P i, P j ]: Prefix Numericalization Pi and Pj convert the resulting prefixes to numbers so that they can encrypt them in the next step. We use the prefix numericalization scheme in [6]. This scheme basically inserts 1 before ∗s in a prefix and then replaces every ∗ by 0. For example, 01** is converted to 01100. If the prefix does not contain ∗s, we place 1 at the end of the prefix. For example, 1100 is converted to 11001. Given a set of prefixes S, we use N(S) to denote the resulting set of numericalized prefixes. Thus, a ∈ [a, b] if and only if N(F(a )) ∩ N(S([a, b])) = ∅. 7.4.1.5 [P i, P j ]: Private Set Intersection Checking whether N(F(a )) ∩ N(S([a, b])) = ∅ is basically checking whether an element from N(F(a )) is equal to an element from N(S([a, b])). We use commutative encryption (e.g., [29, 30]) to do this check. Given a number x and two encryption keys Ki and Kj , a commutative encryption satisfies the property

7.4 Privacy-Preserving Quantification of Network Reachability 183 ((x)Ki )Kj = ((x)Kj )Ki , i.e., encryption with Ki and then Kj is equivalent to encryption with Kj and then Ki. For ease of presentation, we use (x)Kij to denote ((x)Ki )Kj . [Pi]: Commutative Encryption of N(S([a, b])) In our scheme, to check whether N(F(a )) ∩ N(S([a, b])) = ∅, Pi encrypts numbers in N(S([a, b])) with its private key Ki, and sends it to Pj . Then, Pj further encrypts them by its private key Kj and sends them back to Pi. Let N(S([a, b]))Kij denote the result. [Pj ]: Commutative Encryption of N(F(a )) Pj encrypts numbers in N(F(a )) with Kj , sends it to Pi. Then, Pi encrypts them by Ki. Let F(a )Kji denote the result. Finally, Pi can check whether there is a common element in two sets N(S([a, b]))Kij and F(a )Kji . Through the previous steps, Pi knows which range among [min, a − 1], [a, b], and [b + 1, max] includes a or b . Then, Pi can compute [a, b] ∩ [a , b ]. For example, if a ∈ [min, a − 1] and b ∈ [a, b], [a, b] ∩ [a , b ] = [a, b ]. Note that a and b are in the form of F(a )Kji and F(b )Kji . Pi cannot reveal a and b without knowing Pj ’s private key Kj . Figure 7.3 illustrates the intersection computation of [5, 7] (in A1) and [6, 15] (in A2). 7.4.2 ACL Preprocessing In the ACL preprocessing phase, each party Pi (1 ≤ i ≤ n) computes the set of packets M(Ai) that are accepted by its ACL Ai. To achieve this purpose, Pi first converts its ACL to an equivalent sequence of non-overlapping rules. Non- Fig. 7.3 Privacy-preserving range intersection

184 7 Privacy Preserving Quantification of Cross-Domain Network Reachability overlapping rules have an important property, that is, for any two non-overlapping rules nr and nr , the intersection of the two corresponding matching sets is empty, i.e., M(nr) ∩ M(nr ) = ∅. Thus, any packet p matches one and only one non- overlapping rule converted from Ai and the decision of this non-overlapping rule is the decision of Ai for the packet p. Therefore, instead of computing the set M(Ai), Pi only needs to retrieve all non-overlapping accepting rules because the union of matching sets of these rules is equal to M(Ai). The preprocessing of each Ai includes three steps: (1) Pi converts its ACL Ai to an equivalent acyclic directed graph, called firewall decision diagram (FDD) [15]. An FDD construction algorithm, which converts an ACL to an equivalent FDD, is presented in [25]. Figure 7.4b shows the FDD constructed from Fig. 7.4a. (2) Pi extracts non-overlapping accepting rules from the FDD. We do not consider non-overlapping discarding rules because the packets discarded by any ACL Ai cannot pass through the path. Figure 7.4c shows the non-overlapping accepting rules extracted from the FDD in Fig. 7.4b. (3) Next, P1 needs to compare its accepting rules from A1 with those from the other n − 1 ACLs. Without loss of generality, in the next two subsections, we use a simplified example in Fig. 7.5 to show that how to compute the network reachability among three ACLs. Each ACL has only one field and the domain for the field is [0,15]. Note that in Fig. 7.5, nr1(i) and nr2(i) denote two non- overlapping accepting rules for ACL Ai (1 ≤ i ≤ 3). Clearly, the network reachability among these three ACLs can be denoted as two accepting rules F1 ∈ [0, 2] → a and F1 ∈ [6, 7] → a. Next, we will show how to compute these two rules in a privacy preserving manner. Fig. 7.4 The conversion of A1

7.4 Privacy-Preserving Quantification of Network Reachability 185 Fig. 7.5 The example three adjacent ACLs 7.4.3 ACL Encoding and Encryption In the ACL encoding and encryption phase, all parties need to convert their non- overlapping accepting rules to another format such that they can collaboratively compute the network reachability while one party cannot reveal the ACL of any other party. Recall the privacy-preserving range intersection scheme in Sect. 8.4.1, two parties employ different encoding methods, one converts a range [a, b] to a set of prefixes S([a, b]), and another converts a number a to its prefix family F(a ). Assume that each party Pi (1 ≤ i ≤ n) has a private key Ki. Let H denote the encoding function used by the party Pj (1 ≤ j ≤ n − 1). Let F1 ∈ [a1, b1] ∧ · · · ∧ Fd ∈ [ad , bd ] denote the predicate of an accepting rule over d fields. The encoding and encryption result of this accepting rule for Aj is HKj···n ([a1, b1]) ∧ · · · ∧ HKj···n ([ad , bd ]). Let L denote the encoding function used by the party Pn. Considering the above accepting rule, the result is LKn ([a1, b1]) ∧ · · ·∧LKn ([ad , bd ]). We discuss the procedure of these two encoding and encryption methods in detail as follows. 7.4.3.1 Encoding and Encryption of ACL Aj (1 ≤ j ≤ n − 1) (a) For each non-overlapping accepting rule F1 ∈ [a1, b1] ∧ · · · ∧ Fd ∈ [ad , bd ], Pj performs Range Transformation as described in Sect. 8.4.1(1). Figure 7.6b shows the ranges generated from Fig. 7.6a. (b) Pj uses the approach in Sect. 8.4.1(2) to convert each range into sets of prefixes. Figure 7.6c shows the prefixes generated from Fig. 7.6b. That is, for the three ranges converted from [al, bl], compute S([minl, al − 1]), S([al, bl]), S([bl + 1, maxl]). (c) Pj unions all these prefix sets and permutes these prefixes. Figure 7.6d shows the resulting prefix set. This step has two benefits. First, it avoids encrypting and sending duplicate prefixes, and hence, significantly reduces the computation and communication costs for the next two steps. Second, it enhances the security, any other parties except Pj cannot reconstruct the non-overlapping accepting rules, because it is difficult to correlate the prefixes to their corresponding rules without the knowledge of the original ACL. (d) Pj uses the Prefix Numericalization approach of Sect. 8.4.1(4) to numericalize and encrypt each number using Kj . Figure 7.6e and f show the numericalized and encrypted prefixes.

186 7 Privacy Preserving Quantification of Cross-Domain Network Reachability Fig. 7.6 Encoding and encryption of ACL A1 (e) Pj sends the resulting encrypted prefixes to Pj+1 which further encrypts them with its private key Kj+1. Then, Pj+1 sends the resulting values to Pj+2 which further encrypts them with Kj+2. This process is repeated until Pn encrypts them. Finally, Pn sends to Pj the resulting prefixes that are encrypted n − j + 1 times. Figure 7.6f shows the result after encrypting by P1 and Fig. 7.6g shows the result after encrypting by P2 and P3. (f) Pj reconstructs the non-overlapping accepting rules from the multiple encrypted prefixes because Pj knows which prefix belongs to which field of which rule. Based on the above steps, the encoding and encryption function used by Pj (1 ≤ j ≤ n − 1) is defined as HKj···n ([al, bl]) = (N(S(minl, al − 1))Kj···n , N(S(al, bl))Kj···n , N(S(bl + 1, maxl))Kj···n ), where [al, bl] is the range in the l-

7.4 Privacy-Preserving Quantification of Network Reachability 187 th field of a rule in Aj . Figure 7.8a illustrates the encoding and encryption result of ACL A2 in Fig. 7.5. The only difference between operations for A1 and A2 is that A1’s numericalized prefixes are encrypted by all the three parties while A2’s numericalized prefixes are only encrypted by two parties P2 and P3. 7.4.3.2 Encoding and Encryption of ACL An (a) For each range [al, bl] of a non-overlapping rule, Pn generates two prefix families F(al) and F(bl) using the Prefix Family Generation approach from Sect. 8.4.1(3). Figure 7.7b shows the result from Fig. 7.7a. (b) Pn uses the Prefix Numericalization approach from Sect. 8.4.1(4) to numerical- ize and then, encrypts the prefixes using its private key Kn. Figure 7.7c shows the resulting prefixes. Based on the above steps, the encoding and encryption function used by An is defined as LKn ([al, bl]) = (N(F(al))Kn , N(F(bl))Kn ) where [al, bl] is the range in the l-th field of a rule. 7.4.4 ACL Comparison In the ACL comparison phase, we need to compare the n sequences of encrypted non-overlapping accepting rules or encrypted numbers from every two adjacent ACLs. Without loss of generality, we only present the comparison between An−1 and An. Figure 7.8a shows the encrypted non-overlapping rules converted from A2 in Figs. 7.5 and 7.8b shows the encrypted numbers converted from A3 in Fig. 7.5. The comparison includes four steps: (1) Pn sends the resulting sequence to Pn−1 which further encrypts them with its private key Kn−1. Let LKn ([a1, b1]) ∧ · · · ∧ LKn ([ad , bd ]) denote the encoded and encrypted result of the accepting rule nr from An. Pn−1 encrypts it with Fig. 7.7 Encoding and encryption of ACL A3


Like this book? You can publish your book online for free in a few minutes!
Create your own flipbook