
Algorithm Theory - SWAT 2010


Contents: Optimal Exploration of Terrains with Obstacles; Reconstructing a Simple Polygon from Its Angles; Semidefinite Programming and Approximation Algorithms: A Survey; Strictly-Regular Number System and Data Structures; An O(log log n)-Competitive Binary Search Tree with Optimal Worst-Case Access Times; The Emergence of Sparse Spanners and Greedy Well-Separated Pair Decomposition; A Bottom-Up Method and Fast Algorithms for Max Independent Set; Capacitated Domination Faster Than O(2^n); Isomorphism for Graphs of Bounded Feedback Vertex Set Number; On Feedback Vertex Set: New Measure and New Structures; Conflict-Free Coloring Made Stronger; Polychromatic Coloring for Half-Planes; A 3/2-Approximation Algorithm for the Multiple Depot Multiple Traveling Salesman Problem; Minimum and Maximum against k Lies; Feasible and Accurate Algorithms for Covering Semidefinite Programs.


["36 A. Elmasry, C. Jensen, and J. Katajainen element comparisons. Thus, the total cost is O(r) and at most 2r + O(1) element comparisons are performed. Using Lemma 4, r \u2264 lg n, and the claim follows. In summary, our data structure improves the original data structure in two ways. First, by Lemma 4, the new system reduces the maximum number of children a node can have from 2 lg n to lg n. Second, the new system breaks the bottleneck resulting from delayed melding, since two subtrees can be merged with O(1) element comparisons. The above discussion implies the following theorem. Theorem 4. Let n denote the number of elements stored in the data structure prior to a deletion. There exists a priority queue that supports \ufb01nd-min, insert, and meld at O(1) worst-case cost, and delete at O(lg n) worst-case cost including at most 2 lg n + O(1) element comparisons. 4 Other Applications Historically, it is interesting to note that in early papers a number system sup- porting increments and decrements of an arbitrary digit was constructed by putting two regular systems back to back, i.e. di \u2208 {0, 1, 2, 3, 4, 5}. It is rel- atively easy to prove the correctness of this system. This approach was used in [14] for constructing catenable deques, in [9] for constructing catenable \ufb01n- ger search trees, and in [8] for constructing meldable priority queues. (In [8], di \u2208 {2, 3, 4, 5, 6, 7} is imposed, since an extra constraint that di \u2265 2 was required to facilitate the violation reductions and to guarantee the exponentiality prop- erty.) Later on, it was realized that the extended-regular system, di \u2208 {0, 1, 2, 3}, could be utilized for the same purpose (see, for example, [6]). The strictly-regular system may be employed in applications where these more extensive number sys- tems have been used earlier. This replacement, when possible, would have two important consequences: 1. The underlying data structures become simpler. 2. The operations supported may become a constant factor faster. While surveying papers that presented potential applications to the new number system, we found that, even though our number system may be applied, there were situations where other approaches would be more favourable. For example, the relaxed heap described in [11] relies on the zeroless extended-regular system to support increments and decrements. Naturally, the strictly-regular system could be used instead, and this would reduce the number of trees that have to be maintained. However, the approach of using a two-tier structure as described in [11] makes the reduction in the number of trees insigni\ufb01cant since the amount of work done is proportional to the logarithm of the number of trees. Also, a fat heap [6] uses the extended-regular binary system for keeping track of the potential violation nodes and the extended-regular ternary system for keeping track of the trees in the structure. However, we discovered that a priority queue with the same functionality and e\ufb03ciency can be implemented with simpler tools without using number systems at all. The reader is warned: number systems are powerful tools but they should not be applied haphazardly.","Strictly-Regular Number System and Data Structures 37 Up till now we have ignored the cost of accessing the extreme digits in the vicinity of a given digit. When dealing with the regular or the extended-regular systems this can be done at O(1) cost by using the guides described in [8]. 
Up till now we have ignored the cost of accessing the extreme digits in the vicinity of a given digit. When dealing with the regular or the extended-regular systems, this can be done at O(1) cost by using the guides described in [8]. In contrast, for our number system, accessing the extreme digits in the vicinity of any digit does not seem to be doable at O(1) cost. However, the special case of accessing the first and last extreme digits is solvable at O(1) cost.

In some applications, like fat heaps [6] and the priority queues described in [8], the underlying number system is ternary. We have not found a satisfactory solution to extend the strictly-regular system to handle ternary numbers efficiently; it is an open question whether such an extension exists.

References

1. Clancy, M., Knuth, D.: A programming and problem-solving seminar. Technical Report STAN-CS-77-606, Dept. of Computer Science, Stanford University (1977)
2. Guibas, L.J., McCreight, E.M., Plass, M.F., Roberts, J.R.: A new representation for linear lists. In: Proceedings of the 9th Annual ACM Symposium on Theory of Computing, pp. 49-60. ACM Press, New York (1977)
3. Vuillemin, J.: A data structure for manipulating priority queues. Communications of the ACM 21(4), 309-315 (1978)
4. Okasaki, C.: Purely Functional Data Structures. Cambridge University Press, Cambridge (1998)
5. Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, & Tools, 2nd edn. Pearson Education, Inc., London (2007)
6. Kaplan, H., Shafrir, N., Tarjan, R.E.: Meldable heaps and Boolean union-find. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 573-582. ACM Press, New York (2002)
7. Brodal, G.S.: Fast meldable priority queues. In: Sack, J.-R., Akl, S.G., Dehne, F., Santoro, N. (eds.) WADS 1995. LNCS, vol. 955, pp. 282-290. Springer, Heidelberg (1995)
8. Brodal, G.S.: Worst-case efficient priority queues. In: Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 52-58. ACM/SIAM (1996)
9. Kaplan, H., Tarjan, R.E.: Purely functional representations of catenable sorted lists. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing, pp. 202-211. ACM, New York (1996)
10. Elmasry, A.: A priority queue with the working-set property. International Journal of Foundations of Computer Science 17(6), 1455-1465 (2006)
11. Elmasry, A., Jensen, C., Katajainen, J.: Two-tier relaxed heaps. Acta Informatica 45(3), 193-210 (2008)
12. Elmasry, A., Jensen, C., Katajainen, J.: Multipartite priority queues. ACM Transactions on Algorithms 5(1), Article 14 (2008)
13. Jensen, C.: A note on meldable heaps relying on data-structural bootstrapping. CPH STL Report 2009-2, Department of Computer Science, University of Copenhagen (2009), http://cphstl.dk
14. Kaplan, H., Tarjan, R.E.: Persistent lists with catenation via recursive slow-down. In: Proceedings of the 27th Annual ACM Symposium on Theory of Computing, pp. 93-102. ACM, New York (1995)

An O(log log n)-Competitive Binary Search Tree with Optimal Worst-Case Access Times

Prosenjit Bose, Karim Douïeb, Vida Dujmović (School of Computer Science, Carleton University, {jit,karim,vida}@cg.scs.carleton.ca) and Rolf Fagerberg (Department of Mathematics and Computer Science, University of Southern Denmark, [email protected])

Abstract. We present the zipper tree, an O(log log n)-competitive online binary search tree that performs each access in O(log n) worst-case time. This shows that for binary search trees, optimal worst-case access time and near-optimal amortized access time can be guaranteed simultaneously.
1 Introduction

A dictionary is a basic data structure for storing and retrieving information. The binary search tree (BST) is a well-known and widely used dictionary implementation which combines efficiency with flexibility and adaptability to a large number of purposes. It constitutes one of the fundamental data structures of computer science.

In the past decades, many BST schemes have been developed which perform element accesses (and indeed many other operations) in O(log n) time, where n is the number of elements in the tree. This is the optimal single-operation worst-case access time in a comparison-based model. Turning to sequences of accesses, it is easy to realize that for specific access sequences, there may be BST algorithms which serve m accesses in less than Θ(m log n) time. A common way to evaluate how well the performance of a given BST algorithm adapts to individual sequences is competitive analysis: for an access sequence X, define OPT(X) to be the minimum time needed by any BST algorithm to serve it. A given BST algorithm A is then said to be f(n)-competitive if it performs X in O(f(n) · OPT(X)) time for all X. To make this precise, a more formal definition of a BST model and of the sequences X considered is needed; standard in the area is to use the binary search tree model (BST model) defined by Wilber [12], in which the only existing non-trivial lower bounds on OPT(X) have been proven [3,12].

In 1985, Sleator and Tarjan [10] developed a BST called splay trees, which they conjectured to be O(1)-competitive. Much of the research on the efficiency of BSTs on individual input sequences has grown out of this conjecture. However, despite decades of research, the conjecture is still open. More generally, it is unknown if there exist asymptotically optimal BST data structures. In fact, for many years the best known competitive ratio for any BST structure was O(log n), which is achieved by a plain static balanced tree. This situation was recently improved by Demaine et al., who in a seminal paper [3] developed an O(log log n)-competitive BST structure, called the tango tree. This was the first improvement in competitive ratio for BSTs over the trivial O(log n) upper bound.

Being O(log log n)-competitive, tango trees are always at most a factor O(log log n) worse than OPT(X). On the other hand, they may actually pay this multiplicative overhead at each access, implying that they have Θ(log log n · log n) worst-case access time, and use Θ(m log log n · log n) time on some access sequences of length m. In comparison, any balanced BST (even a static one) has O(log n) worst-case access time and spends O(m log n) time on every access sequence.

The problem we consider in this paper is whether it is possible to combine the best of these bounds, that is, whether there exists an O(log log n)-competitive BST algorithm that performs each access in optimal O(log n) worst-case time. We answer it affirmatively by presenting a data structure achieving these complexities. It is based on the overall framework of tango trees; however, where tango trees use red-black trees [6] for storing what is called preferred paths, we develop a specialized BST representation of the preferred paths, tuned to the purpose.
This representation is the main technical contribution, and its description takes up the bulk of the paper.

In the journal version of their seminal paper on tango trees, Demaine et al. suggested that such a structure exists. Specifically, in the further work section, the authors gave a short sketch of a possible solution. Their suggested approach, however, relies on the existence of a BST supporting dynamic finger, split and merge in O(log r) worst-case time, where r is the rank difference between the accessed element and the previously accessed element. Such a BST could indeed be used for the auxiliary tree representation of preferred paths. However, the existence of such a structure (in the BST model) is an open problem. Consequently, since the publication of their work, the authors have revised their stance and consider the problem solved in this paper to be an open problem [7]. Recently, Woo [13] made some progress concerning the existence of a BST having the dynamic finger property in the worst case. He developed a BST algorithm satisfying, based on empirical evidence, the dynamic finger property in the worst case. Unfortunately, this BST algorithm does not allow insertion/deletion or split/merge operations, so it cannot be used to maintain the preferred paths in a tango tree.

After the publication of the tango tree paper, two other O(log log n)-competitive BSTs have been introduced, by Derryberry et al. [4,11] and Georgakopoulos [5]. The multi-splay trees [4] are based on tango trees, but instead of using red-black trees as auxiliary trees, they use splay trees [10]. As a consequence, multi-splay trees can be shown [4,11] to satisfy additional properties, including the scanning and working-set bounds of splay trees, while maintaining O(log log n)-competitiveness. Georgakopoulos uses the interleave lower bound of Demaine et al. to develop a variation of splay trees called chain-splay trees that achieves O(log log n)-competitiveness while not maintaining any balance condition explicitly. However, neither of these two structures achieves a worst-case single access time of O(log n). A data structure achieving the same running time as tango trees alongside O(log n) worst-case single access time was developed by Kujala and Elomaa [8], but this data structure does not adhere to the BST model (in which the lower bounds on OPT(X) are proved).

The rest of this paper is organized as follows: In Section 2, we formally define the model of BSTs and the access sequences considered. We state the lower bound on OPT(X) developed in [3,12] for analyzing the competitive ratio of BSTs. We also describe the central ideas of tango trees. In Section 3, we introduce a preliminary data structure called hybrid trees, which does not fit the BST model proper, but which is helpful in giving the main ideas of our new BST structure. Finally, in Section 4, we develop this structure further to fit the BST model. This final structure, called zipper trees, is a BST achieving the optimal worst-case access time while maintaining the O(log log n)-competitiveness property.

2 Preliminaries

2.1 BST Model

In this paper we use the binary search tree model (BST model) defined by Wilber [12], which is standard in the area. Each node stores a key from a totally ordered universe, and the keys obey in-order: at any node, all of the keys in its left subtree are less than the key stored in the node, and all of the keys in its right subtree are greater (we assume no duplicate keys appear). Each node has three pointers, pointing to its left child, right child, and parent. Each node may keep a constant amount of additional information (according to standard conventions, O(log n) bits are considered as constant), but no further pointers may be used. To perform an access, we are given a pointer initialized to the root. An access consists of moving this pointer from a node to one of its adjacent nodes (through the parent pointer or one of the children pointers) until it reaches the desired element. Along the way, we are allowed to update the fields and pointers in any nodes that the pointer touches. The access cost is the number of nodes touched by the pointer. As is standard in the area, we only consider sequences consisting of element accesses on a fixed set S of n elements; in particular, neither unsuccessful searches nor updates take place.
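For concreteness, here is a minimal sketch of a node and of the access-cost accounting in this model. The names and representation are ours; rotations and field updates along the way are permitted by the model but omitted here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    extra: int = 0          # constant-size auxiliary data, e.g. a mark bit

def access(root: Node, key: int) -> int:
    """Move a pointer from the root to `key`; the model's cost is the
    number of nodes touched along the way."""
    node, cost = root, 1
    while node.key != key:
        node = node.left if key < node.key else node.right
        cost += 1
    return cost
```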
2.2 Interleave Lower Bound

The interleave bound is a lower bound on the time taken by any binary search tree in the BST model to perform an access sequence X = {x_1, x_2, ..., x_m}. The interleave bound was developed by Demaine et al. [3] and was derived from a previous bound of Wilber [12].

Let P be a static binary search tree of minimum height, built on the set of keys S. We call P the reference tree. For each node y in P, we consider the accesses of X that are to keys in nodes in the subtree of P rooted at y (including y). Each access of this subsequence is then labelled "left" or "right", depending on whether the accessed node is in the left subtree of y (including y), or in its right subtree, respectively. The amount of interleaving through y is the number of alternations between left and right labels in this subsequence. The interleave bound IB(X) is the sum of these interleaving amounts over all nodes y in P. The exact statement of the lower bound from [3] is as follows:

Theorem 1 (From [3]). For any access sequence X, IB(X)/2 − n is a lower bound on OPT(X).
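As an illustration, here is a direct (unoptimized) computation of IB(X), under the assumption that the keys are 0, ..., n−1 so that the balanced reference tree P can be kept implicit; the function name and representation are ours.

```python
def interleave_bound(X, lo, hi):
    """Total interleaving within the subtree of P spanning keys lo..hi-1,
    whose root y holds the middle key. Call as interleave_bound(X, 0, n)."""
    if hi - lo <= 1:
        return 0
    mid = (lo + hi) // 2                          # key at the root y
    sub = [x for x in X if lo <= x < hi]          # accesses in y's subtree
    labels = ['L' if x <= mid else 'R' for x in sub]   # y itself counts as left
    alternations = sum(a != b for a, b in zip(labels, labels[1:]))
    return (alternations
            + interleave_bound(sub, lo, mid)      # left subtree of y
            + interleave_bound(sub, mid + 1, hi)) # right subtree of y
```

By Theorem 1, interleave_bound(X, 0, n) / 2 − n then lower-bounds OPT(X).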
2.3 Tango Trees

We outline the main ideas of tango trees [3]. As in the previous subsection, denote by the reference tree P a static binary search tree of height O(log n) built on a set of keys S. At a given point in time, the preferred child of an internal node y in P is defined as its left or right child depending on whether the last access to a node in the subtree rooted at y (including y) was in the left subtree of y (including y) or in its right subtree, respectively. We call a maximal chain of preferred children a preferred path. The set of preferred paths partitions P into disjoint parts of size O(log n). Remember that P is a static tree; only the preferred paths may evolve over time (namely, after each access).

The ingenious idea of tango trees is to represent the nodes on a preferred path as a balanced tree of height O(log log n), called an auxiliary tree. The tango tree can be seen as a collection of auxiliary trees linked together. The leaves of an auxiliary tree representing a preferred path p link to the roots of auxiliary trees representing the paths immediately below p in P, with the links uniquely determined by the inorder ordering. The auxiliary tree containing the root of P constitutes the top part of the tango tree. In order to distinguish auxiliary trees within the tango tree, the root of each auxiliary tree is marked (using one bit). Note that the reference tree P is not an explicit part of the structure; it just helps to explain and understand the concept of tango trees.

When an access is performed, the preferred paths of P may change. This change is actually a combination of several cut and concatenation operations involving subpaths. Auxiliary trees in tango trees are implemented as red-black trees [6], and [3] shows how to implement these cut and concatenation operations using standard split and join operations on red-black trees. Here are the two main operations used to maintain tango trees:

CUT-TANGO(A, d) – divide the red-black tree A into two red-black trees, one storing all nodes in the preferred path having depth at most d in P, and another storing all nodes having depth greater than d.

CONCATENATE-TANGO(A, B) – merge two red-black trees that store two disjoint paths for which, in P, the bottom of one path (stored in A) is the parent of the top of the other path (stored in B). I.e., in the tango tree, the root of B is attached to a leaf of A.
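A high-level sketch of CUT-TANGO in terms of standard balanced-BST split and join primitives may help; it relies on the depth augmentation described next, and all helper names (find_leftmost_deeper, split_before, split_after, join, mark_root, attach_below) are ours, not the paper's.

```python
def cut_tango(A, d):
    """Split auxiliary tree A into the part of its path with P-depth <= d
    and the part with P-depth > d. The deeper nodes occupy a contiguous
    key interval [l, r], located via the min/max subtree-depth fields."""
    l = find_leftmost_deeper(A, d)        # smallest key with depth > d
    r = find_rightmost_deeper(A, d)       # largest key with depth > d
    left, rest = split_before(A, l)       # keys < l   |  keys >= l
    middle, right = split_after(rest, r)  # keys <= r  |  keys > r
    top = join(left, right)               # the path with depth <= d
    mark_root(middle)                     # middle becomes its own auxiliary tree
    attach_below(top, middle)             # hangs off top by inorder position
    return top
```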
These operations can be implemented by splits and joins in O(log k) time for trees of size k, by adding extra information to the nodes: besides the key value and the depth in P, each node also stores the minimum and maximum depth over the nodes in its subtree within its auxiliary tree. This additional data is easily maintained in red-black trees. As the trees store paths in P, we have k = O(log n). Hence, if an access passes i different preferred paths in P, the necessary change in the tango tree will be O(i) cut and concatenation operations, which is performed in O(i log log n) time. Over any access sequence X, the total number of cut and concatenation operations performed in P corresponds to the interleave bound IB(X); thus tango trees serve X in O(log log n · IB(X)) time, which by Thm. 1 makes them O(log log n)-competitive.

3 Hybrid Trees

In this section, we introduce a data structure called hybrid trees, which has the right running time, but which does not fit the BST model proper. However, it is a helpful intermediate step which contains the main ideas of our final BST structure.

3.1 Path Representation

For all preferred paths in P, we keep the top Θ(log log n) nodes exactly as they appear on the path. We call this the top path. The remaining nodes (if any) of the path we store as a red-black tree, called the bottom tree, which we attach below the top path. Since a preferred path has size O(log n), this bottom tree has height O(log log n). More precisely, we will maintain the invariant that a top path has length in [log log n, 3 log log n], unless no bottom tree appears, in which case the constraint is [0, 3 log log n]. (This latter case, where no bottom tree appears, will induce simple and obvious variants of the algorithms in the remainder of the paper, variants which we for clarity of exposition will not mention further.) A hybrid tree consists of all the preferred paths of P, represented as above, linked together to form one large tree, analogous to tango trees.

The required worst-case search complexity of hybrid trees is captured by the following lemma.

Lemma 1. A hybrid tree T satisfies d_T(x) = O(d_P(x)) for all x ∈ S, where d_T(x) and d_P(x) are defined as the depth of the node x in the tree T and in the reference tree P, respectively. In particular, T has O(log n) height.

Proof. Consider a preferred path p in P and its representation tree h. The distance in h, in terms of the number of edges to follow, from the root of h to one of its nodes or leaves x is no more than a constant times the distance in p between x and the root of p. Indeed, if x is part of the top path, then the distance to the root of the path is by construction the same in h and p. Otherwise, this distance increases by at most a constant factor, since h has a height of O(log log n) and the distance in p is already Ω(log log n). To reach a node x in the reference tree P, we have to traverse (parts of) several preferred paths. As the above argument holds for all paths, the first statement d_T(x) = O(d_P(x)) of the lemma follows by summing over the paths traversed. Finally, since the depth of any node in P is O(log n), this implies that T has height O(log n).

3.2 Maintaining Hybrid Trees under Accesses

The path p traversed in P to reach a desired node may pass through several preferred paths. During this access the preferred paths in P must change such that p becomes the new preferred path containing the root. This entails cut and concatenate operations on the preferred paths passed by p: when p leaves a preferred path, the path must be cut at the depth in P of the point of leave, and the top part cut out must be concatenated with the next preferred path to be traversed. We note that one may as well perform only the cutting while traversing p, producing a sequence of cut out parts hanging below each other, which can then be concatenated in one go at the end of the access, producing the new preferred path starting at the root. We will use this version below.

In this subsection, we will show how to maintain the hybrid tree representation of the preferred paths after an access. Our main goal is to give methods for performing the operations cut and concatenate on our path representations with the following complexities: when the search passes only the top path of a path representation (Case 1 cut), the cut operation takes O(k) time, where k is the number of nodes traversed in the top path. When the search passes the entire top path and ends up in the bottom tree (Case 2 cut), the cut operation takes O(log log n) time. The concatenation operation, which glues together all the cut out path representation parts at the end of the access, is bounded by the time used by the search and the cut operations performed during the access. Assuming these running times, it follows, by the invariant that all top paths (with bottom trees below them) have length Θ(log log n), that the time of an access involving i cut operations in P is bounded both by the number of nodes on the search path p, and by i log log n. By Lemma 1, this is O(min{log n, i log log n}) time. Hence, by Thm. 1 we will have achieved optimal worst-case access time while maintaining O(log log n)-competitiveness.

Cut: Case 1: We only traverse the top path of a path representation. Let k be the number of nodes traversed in this top path and let x be the last traversed node.
The cut operation marks the node succeeding x on the top path as the new root of the path representation, and unmarks the other child of x (to make the cut out part join the pieces that after the final concatenation will constitute the new preferred path induced by the search). The cut operation has now removed k nodes from the top path of the path representation. This implies that we possibly have to update the representation, since the Θ(log log n) bound on the size of its top path has to be maintained. Specifically, if the size of the top path drops below 2 log log n, we will move some nodes from the bottom tree to the top path. The nodes should be those from the bottom tree having smallest depth (in P), i.e., the next nodes on the preferred path in P.

For small k (smaller than log log n), it is not clear how to extract the next k nodes from the bottom tree in O(k) time. Instead, we use an extraction process, described below, which extracts the next log log n nodes from the bottom tree in O(log log n) steps, and we run this process incrementally: whenever i nodes are cut from the top path, the extraction process is advanced by Θ(i) steps, and then the process is stopped until the next cut at this path occurs. Thus, the work of the extraction process is spread over several Case 1 cuts (if not stopped before by a Case 2 cut, see below). The speed of the process is chosen such that the extraction of log log n nodes is completed before that number of nodes have been cut away from the top path; hence it will raise the size of the top path to at least 2 log log n again. In general, we maintain the additional invariant that the top path has size at least 2 log log n, unless an extraction process is ongoing.

Case 2: We traverse the entire top path of path representation A, and enter the bottom tree. Let x be the last traversed node in A and let y be the marked child of x that is the root of the next path representation on the search path. First, we finish any pending extraction process in A, so that its bottom tree becomes a valid red-black tree. Then we rebuild the top path into a red-black tree in linear time (see details under the description of concatenate below), and we join it with the bottom tree using CONCATENATE-TANGO. Then we perform CUT-TANGO(A′, d), where A′ is the combined red-black tree and d = d_P(y) − 1. After this operation, all nodes of depth greater than d are removed from the path representation A to form a new red-black tree B attached to A (the root of B is marked in the process). To make the tree B a valid path representation, we perform an extraction process twice, which extracts 2 log log n nodes from it to form a top path. Finally we unmark y. This takes O(log log n) time in total.

Concatenate: What is cut out during an access is a sequence of top paths (Case 1 cuts) and red-black trees (Case 2 cuts) hanging below each other. We have to concatenate this sequence into a single path representation. We first rebuild each consecutive sequence of top paths (which can be found as maximal sequences of nodes which have one marked child) into valid red-black trees, in time linear in the number of nodes in each sequence (details below). This leaves a sequence of at most 2j + 1 valid red-black trees hanging below each other, where j is the number of Case 2 cuts performed during the access.
Then we iteratively perform CONCATENATE-TANGO(A, B), where A is the current highest red-black tree and B is the tree hanging below A, until there is one remaining red-black tree (this is done in O(j log log n) time, which we have already spent on the Case 2 cuts). Finally we extract 2 log log n nodes from the obtained red-black tree to construct the top path of the path representation. The time used for concatenate is bounded by the time used already during the search and cut part of the access.

To convert a path of length k into a red-black tree in O(k) time, we can simply traverse the path downwards, inserting the nodes one by one into a growing (and initially empty) red-black tree. During each insertion, the remaining path will be considered a leaf of the tree. Since the next node to be inserted will become either the successor or the predecessor of the node just inserted, and since the amount of rebalancing in red-black trees is amortized O(1) per update, the entire process is linear in the number of nodes inserted.

Extract: We now show how to perform the central process of our structure, namely extracting the next part of a top path from a bottom tree. Specifically, we will extract a subpath of log log n nodes of minimum depth (in P) from the bottom tree A′ of a given path representation A, using O(log log n) time. Let x be the deepest node on the top path of A, such that the unmarked child of x corresponds to the root of the bottom tree A′. The extraction process will separate the nodes of depth (in P) smaller than d = d_P(x) + log log n from the bottom tree A′.

Let a zig segment of a preferred path p be a maximal sequence of nodes such that each node in the sequence is linked to its right child in p. A zag segment is defined similarly, such that each node on the segment is linked to its left child (see Fig. 1). The key observation we exploit is the following: the sequence of all zig segments, ordered by their depth in the path, followed by the sequence of all reversed zag segments, ordered reversely by their depth in the path, is equal to the ordering of the nodes in key space (see Fig. 1). This implies that to extract the nodes of depth smaller than d (in P) from a bottom tree, we can cut the extreme ends (in key space) of the tree, linearize them into two lists, and then combine them by a binary merge procedure using depth in P as the ordering. This forms the core of the extract operation.

[Fig. 1. A path, its decomposition into zig (solid regions) and zag (dashed regions) segments, and its layout in key order.]
[Fig. 2. The path representation in the zipper trees of Sect. 4.]
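The key-space observation above can be checked mechanically. Below is a small self-contained test in our own encoding (a path is a list of (key, direction) pairs, where direction is the child the path follows next); all names are ours.

```python
import random

def random_path(n):
    """A random root-to-bottom path with consistent BST keys: following the
    left child bounds later keys from above, the right child from below."""
    lo, hi = 0.0, 1.0
    path, key = [], random.uniform(lo, hi)
    for _ in range(n):
        direction = random.choice('LR')
        path.append((key, direction))
        if direction == 'L':
            hi = key          # everything deeper is smaller
        else:
            lo = key          # everything deeper is larger
        key = random.uniform(lo, hi)
    return path

def zigzag_layout(path):
    """Zig nodes (linked to their right child) top-down, then zag nodes bottom-up."""
    zigs = [k for k, d in path if d == 'R']
    zags = [k for k, d in path if d == 'L']
    return zigs + zags[::-1]

layout = zigzag_layout(random_path(30))
assert layout == sorted(layout)   # zig-then-reversed-zag equals key order
```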
We have to do this using rotations, while maintaining a tree at all times. We now give the details of how to do this, with Fig. 3 illustrating the process.

Using extra fields in each node storing the minimum and maximum depth value (in P) of the nodes inside its subtree, we can find the node ℓ′ of minimum key value that has depth greater than d in O(log log n) time, by starting at the root of A′ and repeatedly walking to the leftmost child whose subtree has a node of depth greater than d. Then define ℓ as the predecessor of ℓ′. Symmetrically, we can find the node r′ of maximum key value that has depth greater than d, and define r as the successor of r′.

First we split A′ at ℓ to obtain two subtrees B and C linked to the new root ℓ, where B contains a first sequence of nodes at depth smaller than d. Then we split C at r to obtain the subtrees D and E, where E contains a second sequence of nodes at depth smaller than d. In O(log log n) time we convert the subtrees B and E into paths, as shown in Fig. 3. To do so, we perform a left rotation at the root of B until its right subtree is a leaf (i.e., when its right child is a marked node). Then we repeat the following: if the left child of the root has no right child, then we perform a right rotation at the root of B (adding one more node to the right spine, which will constitute the final path); otherwise we perform a left rotation at the left child of the root of B, moving its right subtree into the left spine. The entire process takes time linear in the size of B, since each node is involved in a rotation at most 3 times (once a node enters the left spine, it can only leave it by being added to the right spine). A symmetric process is performed to convert the subtree E into a path.

The last operation, called a zip, merges (based on depths in P) the two paths B and E, in order to form the next part of the top path. We repeatedly select the root of B or E that has the smallest depth in the tree P. The selected root is brought to the bottom of the top path using O(1) rotations. The zip operation stops when the subtrees B and E are both empty. Eventually, we perform a left rotation at the node ℓ if needed, i.e., if r has a smaller depth in P than ℓ. The time taken is linear in the number of extracted nodes, i.e., O(log log n). The process consists of a series of rotations, hence it can be stopped and resumed without problems.

[Fig. 3. (a) Tree A′. (b) Split A′ at ℓ. (c) Split C at r. (d) Convert the subtrees B and E into paths. (e) Zip the paths B and E.]
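The flattening of B and E is, in spirit, the classic linear-time tree-to-vine rotation loop (as in the Day-Stout-Warren algorithm); a minimal sketch of that standard loop, in our own notation rather than the paper's exact variant:

```python
class N:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y                     # y replaces x as the subtree root

def to_vine(root):
    """Flatten a BST into a right-going path using right rotations; the
    total number of rotations is linear in the size of the tree."""
    dummy = N(None, right=root)
    cur = dummy
    while cur.right is not None:
        if cur.right.left is not None:
            cur.right = rotate_right(cur.right)   # pull the left child up
        else:
            cur = cur.right                       # this spine node is done
    return dummy.right
```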
The discussion presented in this section allows us to conclude with the following theorem.

Theorem 2. Our hybrid tree data structure is O(log log n)-competitive and performs each access in O(log n) worst-case time.

3.3 Hybrid Trees and the BST Model

We specified in the description of the cut operation (more precisely, in Case 1) that the extraction process is executed incrementally, i.e., the work is spread over several cut operations. In order to efficiently revive an extraction process which has been stopped at some point in the past, we have to return to the position where its next rotation should take place. This location is always somewhere in the bottom tree, so traversing the top path to reach the bottom tree would be too costly for the analysis of Case 1. Instead, we store in the marked node (the first node of the top path) appropriate information on the state of the process. Additionally, we store an extra pointer pointing to the node where the next rotation in the process should take place. This allows us to revive an extraction process in constant time. Unfortunately, the structure so obtained is not in the BST model (see Section 2.1), due to the extra pointer. In the next section we show how to further develop the idea from this section into a data structure fitting the BST model. Still, we note that the structure of this section can be implemented in the comparison-based model on a pointer machine, with access sequences X being served in O(log log n · OPT(X)) time, and each access taking O(log n) time worst-case.

4 Zipper Trees

The data structure described in the previous section is a BST, except that each marked node has an extra pointer facilitating constant-time access to the point in the path representation where an extraction process should be revived. In this section, we show how to get rid of this extra pointer and obtain a data structure with the same complexity bounds, but now fitting the BST model described in Section 2.1. To do so, we develop a more involved version of the representation of preferred paths and of the operations on them. The goal of this new path representation is to ensure that all rotations of an extraction process are located within distance O(1) of the root of the tree of the representation. The two main ideas involved are: 1) storing the top path as lists, hanging to the sides of the root, from which the top path can be generated incrementally by merging as it is traversed during an access, and 2) using a version of the split operation that only does rotations near the root. The time complexity analysis follows that of hybrid trees, and will not be repeated.

4.1 Path Representation

For each preferred path in P, we decompose its highest part into two sequences, containing its zig and its zag segments, respectively. These are stored as two paths of nodes, of increasing and decreasing key values, respectively. As seen in Section 3.2 (cf. Fig. 1), both will be ordered by their depth in P. Let ℓ and r be the lowest (in terms of depth in P) nodes in the zig and zag sequence, respectively. The node ℓ will be the root of the auxiliary tree (the marked node). The remainder of the zig sequence is the left subtree of ℓ, r is its right child, and the remainder of the zag sequence is the right subtree of r. We call this upper part of the tree a zipper. We repeat this decomposition once again for the next part of the path to obtain another zipper, which is the left subtree of r. Finally, the remaining nodes on the path are stored as a red-black tree of height O(log log n), hanging below the lowest zipper. Fig. 2 illustrates the construction. The two zippers constitute the top path, and the red-black tree the bottom tree. Note that the root of the bottom tree is reachable in O(1) time from the root of the path representation. We will maintain the invariant that, individually, the two zippers contain at most log log n nodes each, while (if the bottom tree is non-empty) they combined contain at least (log log n)/2 nodes. A zipper tree consists of all the preferred paths of P, represented as above, linked together to form one large tree.

4.2 Maintaining Zipper Trees under Accesses

We give the differences, relative to Section 3.2, of the operations during an access.

Cut: When searching a path representation, we incrementally perform a zip operation (i.e., a merge based on depth order) on the top zipper, until it outputs either the node searched for, or a node that leads to the next path representation. If the top zipper gets exhausted, the lower zipper becomes the upper zipper, and an incremental creation of a new lower zipper by an extraction operation on the bottom tree is initiated (during which the lower zipper is defined to have size zero).
Each time one more node from the top zipper is output (during the current access, or during a later access passing through this path representation), the extraction advances Θ(1) steps. The speed of the extraction process is chosen such that it finishes, with log log n nodes extracted, before (log log n)/2 nodes have been output from the top zipper. The new nodes will make up a fresh lower zipper, thereby maintaining the invariant.

If the access through a path representation overlaps (in time) at most one extraction process (either initiated by itself or by a previous access), it is defined as a Case 1 cut. No further actions take place, besides the proper remarkings of roots of path representations, as in Section 3.2. If a second extraction process is about to be initiated during an access, we know that Θ(log log n) nodes have been passed in this path representation, and we define it as a Case 2 cut. As in Section 3.2, this now ends by converting the path representation to a red-black tree, cutting it like in tango trees, and then converting the remaining red-black tree into a valid path representation (as defined in the current section), all in Θ(log log n) time.

Concatenate: There is no change from Section 3.2, except that the final path representation produced is as defined in the current section.

Extract: The change from Section 3.2 is that the final zip operation is not performed (the process stops at step (d) in Fig. 3), and that we use a split operation on red-black trees where all structural changes consist of rotations a
106(1), 37\u201343 (2008) 6. Guibas, L.J., Sedgewick, R.: A dichromatic framework for balanced trees. In: Proc. of the 19th Found. of Comp. Sci. (FOCS), pp. 8\u201321 (1978) 7. Iacono, J.: Personal communication (July 2009) 8. Kujala, J., Elomaa, T.: Poketree: A dynamically competitive data structure with good worst-case performance. In: Asano, T. (ed.) ISAAC 2006. LNCS, vol. 4288, pp. 277\u2013288. Springer, Heidelberg (2006) 9. Munro, J.I.: On the competitiveness of linear search. In: Paterson, M. (ed.) ESA 2000. LNCS, vol. 1879, pp. 338\u2013345. Springer, Heidelberg (2000) 10. Sleator, D.D., Tarjan, R.E.: Self-adjusting binary search trees. J. ACM 32(3), 652\u2013 686 (1985) 11. Wang, C.C.: Multi-Splay Trees. PhD thesis, Computer Science Department, School of Computer Science, Carnegie Mellon University (2006) 12. Wilber, R.: Lower bounds for accessing binary search trees with rotations. SICOMP 18(1), 56\u201367 (1989) 13. Woo, S.L.: Heterogeneous Decomposition of Degree-Balanced Search Trees and Its Applications. PhD thesis, Computer Science Department, School of Computer Science, Carnegie Mellon University (2009)","The Emergence of Sparse Spanners and Greedy Well-Separated Pair Decomposition Jie Gao and Dengpan Zhou Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA {jgao,dpzhou}@cs.sunysb.edu Abstract. A spanner graph on a set of points in Rd provides shortest paths be- tween any pair of points with lengths at most a constant factor of their Euclidean distance. A spanner with a sparse set of edges is thus a good candidate for net- work backbones, as in transportation networks and peer-to-peer network overlays. In this paper we investigate new models and aim to interpret why good spanners \u2018emerge\u2019 in reality, when they are clearly built in pieces by agents with their own interests and the construction is not coordinated. Our main result is to show that the following algorithm generates a (1 + \u03b5)-spanner with a linear number of edges. In our algorithm, the points build edges at an arbitrary order. A point p will only build an edge pq if there is no existing edge p q with p and q at distances no more than 1 \u00b7 |pq| from p, q respectively. Eventually when all points 4(1+1\/\u03b5) \ufb01nish checking edges to all other points, the resulted collection of edges forms a sparse spanner as desired. As a side product, the spanner construction implies a greedy algorithm for constructing linear-size well-separated pair decompositions that may be of interest on its own. Keywords: Spanner, Well-separated pair decomposition, Greedy algorithm. 1 Introduction A geometric graph G de\ufb01ned on a set of points P \u2286 Rd with all edges as straight line segments of weight equal to the length is called a Euclidean spanner, if for any two points p, q \u2208 P the shortest path in G has length at most s \u00b7 |pq| where |pq| is the Euclidean distance. The factor s is called the stretch factor of G and the graph G is called an s-spanner. Spanners with a sparse set of edges provide good approximations for the pairwise Euclidean distances and are good candidates for network backbones. Thus, there has been a lot of work on the construction of sparse Euclidean spanners in both the centralized [19,33] and distributed settings [34]. In this paper we are interested in the emergence of good Euclidean spanners formed by uncoordinated agents. 
Many real-world networks, such as the transportation network and the Internet backbone network, are good spanners: one can typically drive from any city to any other city in the U.S. with a total travel distance at most a small constant times their straight-line distance. The same thing happens with the Internet backbone graph as well. However, these large networks are not owned or built by any single authority. They are often assembled from pieces built by different governments or different ISPs, at different points in time. Nevertheless, altogether they provide a convenient sparse spanner. The work in this paper is motivated by this observation of the lack of coordination in reality, and we would like to interpret why a good Euclidean spanner is able to 'emerge' from these agents incrementally.

Prior work that attempts to remove centralized coordination has been done, as in the network creation game [21,15,26,4,31], first introduced by Fabrikant et al. [21] to understand the evolution of network topologies maintained by selfish agents. A cost function is assigned to each agent, capturing the cost paid to build connections to others minus the benefit received due to the resulting network topology. The agents play a game by minimizing their individual costs, and one is interested in the existence and the price of anarchy of Nash equilibria. Though theoretically intriguing, there are two major open questions along this direction. First, the choice of cost functions is heuristic. Almost all past literature uses a unit cost for each edge, and the proposals deviate in how the benefit of 'being connected to others' is modeled. There is little understanding of which cost function best captures reality, yet small variations in the cost function may result in big changes in the network topologies at Nash equilibria. Second, there is not much understanding of the topologies at Nash equilibria; some of them are simplistic topologies such as trees or complete graphs, which do not show up often in the real world. It remains open whether there is a natural cost model under which the Nash equilibrium is a sparse spanner. The game-theoretic model also has limitations in capturing reality: selfish agents may face deadlines and have to decide on building an edge or not immediately; once an edge is built, it probably does not make sense to remove it (as in the case of road networks); and an agent may not have the strategies of all other agents, making the evaluation of the cost function difficult.

In this paper, we take a different approach and ask whether there is any simple rule with which each agent can decide on its own, and collectively build and maintain a sparse spanner topology without any need for coordination or negotiation. The simple rule serves as a 'certificate' of the sparse spanner property that warrants easy spanner maintenance under edge dynamics and node insertion. We believe such models and good algorithms under them are worth further exploration, and this paper makes a first step along this line.

Our contribution. We consider in this paper the following model, which abstracts the scenarios explained earlier. There are n points in the plane.
Each point represents a separate agent and may consider building edges from itself to other points. These decisions can happen at different points in time. When an agent p plans on an edge pq, p will only build it if there does not exist a 'nearby' edge p′q′ in the network, where |pp′| and |qq′| are both at most |p′q′|/(4(1 + 1/ε)). This strategy is very intuitive: if there is already a cross-country highway from Washington D.C. to San Francisco, it does not make economical sense to build a highway from New York to Los Angeles. We assume that each agent will eventually check on each possible edge from itself to all other points, but the order of who checks which edge can be completely arbitrary. With this strategy, the agents only make decisions with limited information, and no agent has full control over how and what graph will be constructed. It is not obvious that this strategy will lead to a sparse spanner. It is not even clear that the graph is connected.
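A direct simulation of this rule is straightforward; the following sketch (our own code, with a shuffled candidate order standing in for the arbitrary uncoordinated schedule) builds the edge set for points in the plane.

```python
import math
import random

def emergent_spanner(points, eps):
    """Build edges by the rule above: pq is added only if no existing edge
    p'q' has both |pp'| and |qq'| at most |p'q'| / (4 * (1 + 1/eps))."""
    def d(a, b):
        return math.dist(points[a], points[b])

    shrink = 4 * (1 + 1 / eps)
    candidates = [(p, q) for p in range(len(points))
                  for q in range(p + 1, len(points))]
    random.shuffle(candidates)          # an arbitrary, uncoordinated order
    edges = []
    for p, q in candidates:
        blocked = False
        for a, b in edges:
            r = d(a, b) / shrink
            if (d(p, a) <= r and d(q, b) <= r) or \
               (d(p, b) <= r and d(q, a) <= r):
                blocked = True          # a 'nearby' edge already exists
                break
        if not blocked:
            edges.append((p, q))
    return edges
```

Pairing this with the stretch_factor checker above lets one observe the (1 + ε) stretch empirically on random point sets.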
The main result in this paper is to prove that, with the above strategy executed in any arbitrary order, the graph built at the end of the process is a sparse spanner:

- The stretch factor of the spanner is 1 + ε.
- The number of edges is O(n).
- The total edge length of the spanner is O(|MST| · log α), where α is the aspect ratio, i.e., the ratio of the distance between the furthest pair to that of the closest pair, and |MST| is the total edge length of the minimum spanning tree of the point set.
- The degree of each point is O(log α) in the worst case and O(1) on average.

To explain how this result is proved, we first obtain as a side product the following greedy algorithm for computing a well-separated pair decomposition. A pair of two sets of points, (A, B), is called s-well-separated if the smallest distance between any two points in A, B is at least s times greater than the diameters of A and B. An s-well-separated pair decomposition (s-WSPD for short) for P is a collection of s-well-separated pairs W = {(A_i, B_i)} such that for any pair of points p, q ∈ P there is a pair (A, B) ∈ W with p ∈ A and q ∈ B. The size of an s-WSPD is the number of point-set pairs in W. Well-separated pair decomposition (WSPD) was first introduced by Callahan and Kosaraju [12], who developed algorithms for computing an s-WSPD of linear size for points in R^d. Since then WSPD has found many applications in computing k-nearest neighbors, n-body potential fields, geometric spanners and approximate minimum spanning trees [9,10,12,11,6,5,32,29,24,20]. So far there are two algorithms for computing an optimal-size WSPD, one in the original paper [12] and one in a later paper [22]. Both of them use a hierarchical organization of the points (e.g., the fair split tree in [12] and the discrete center hierarchy in [22]) and output the well-separated pairs in a recursive way.

In this paper we show that the following simple greedy algorithm also outputs an s-WSPD of linear size. We take an arbitrary pair of points p, q that is not yet covered by any existing well-separated pair, and consider the pair of subsets (B_r(p), B_r(q)) with r = |pq|/(2s + 2), where B_r(p) (resp. B_r(q)) is the set of points of P within distance r from p (resp. q). Clearly (B_r(p), B_r(q)) is an s-well-separated pair, and all the pairs of points (p′, q′) with p′ ∈ B_r(p) and q′ ∈ B_r(q) are covered. The algorithm continues until all pairs of points are covered. We show that, no matter in which order the pairs are selected, the greedy algorithm will always output a linear number of well-separated pairs. Again, this greedy algorithm can be executed in an environment where coordination is absent, while the previous algorithms (in [12,22]) cannot.

WSPD is deeply connected to geometric spanners. Any WSPD will generate a spanner if one puts an edge between an arbitrary pair of points p, q from each well-separated pair (A, B) ∈ W [6,5,32,29]. The number of edges in the spanner equals the size of W. In the other direction, the deformable spanner proposed in [22] implies a WSPD of linear size. The connection is further witnessed in this paper: our spanner emergence algorithm implies a WSPD generated in a greedy manner. Thus the well-separated pairs and the spanner edges are in one-to-one correspondence.

Last, this paper focuses on the Euclidean case, when the points are distributed in the plane. The basic idea extends naturally to points in higher dimensions as well as to metrics with constant doubling dimension [25] (thus making the results applicable in non-Euclidean settings), as the main technique involves essentially various forms of geometric packing arguments.

Applications. The results can be applied in maintaining nice network overlay topologies for P2P file-sharing applications [30]. Such P2P overlay networks are often constructed in a distributed manner without centralized control, to achieve robustness, reliability and scalability. One important issue is reducing routing delay by making the overlay topology aware of the underlying network topology [14,35,28,39,40]. But all these works are heuristics without any guarantee. A spanner graph would be a good solution for the overlay construction, yet there is no centralized authority in the P2P network that supervises the spanner construction, and the peers may join or leave the network frequently. The work in this paper initiates the study of the emergence of good spanners in the setting where there is little coordination between the peers and the users only need a modest amount of incomplete information about the current overlay topology. We show that the spanner can be constructed under a proper model such that only O(n log α) messages need to be delivered. The spanner topology is implicitly stored on the nodes, with each node's storage cost bounded by O(log α). With such partial information stored at each node, there is a local distributed algorithm that finds a (1 + ε)-stretch path between any two nodes.

We remark that the idea of the greedy spanner resembles, on an abstract level, the 'highway hierarchy' in transportation networks. It has been shown that to find a shortest path to a destination, one only needs to search for a 'highway entrance' within a certain radius, and search only on the highways beyond that. This turns out to substantially reduce the time to find shortest paths on such graphs [8,36]. Our work provides a possible explanation of how the road system evolved to the way it is today. We propose to verify this in our future work.

Related work. In the vast amount of prior literature on geometric spanners, there are three main ideas: Θ-graphs, greedy spanners, and WSPD-induced spanners [33]. Please refer to the book [33] for a nice survey.
We will review the two spanner construction ideas that are most related to our approach. The first idea is the path-greedy spanner construction [13,16,17,18]. All pairwise edges are ordered by non-decreasing length and checked in that order. An edge is included in the spanner if the shortest path in the current graph is longer than s times the Euclidean distance, and is discarded otherwise. Variants of this idea generate spanners with constant degree and total weight O(|MST|). This idea cannot be applied in our setting, as edges constructed in practice may not come in non-decreasing order of their lengths. The second idea is to use the gap property [13]: the sources and sinks of any two edges in an edge set are separated by a distance at least proportional to the length of the shorter of the two edges, and their directions differ by no more than a given angle. The gap-greedy algorithm [7] considers pairs of points, again in order of non-decreasing distance, and includes an edge in the spanner if and only if it does not violate the gap property. The spanner generated this way has constant degree and total weight O(|MST|). Compared with our algorithm, our strategy is a relaxation of the gap property in the sense that the edges in our spanner may have one of their endpoints arbitrarily close (or at the same point), and we impose no restriction on the directions of the edges nor on the order in which the edges are considered. The proof for the gap-greedy algorithm relies heavily on plane geometry tools, while our proof technique
To show that the algorithm eventually outputs a good spanner, we first show the connection of G with the notion of well-separated pair decomposition.

Definition 1 (Well-separated pair). Let t > 0 be a constant. A pair of sets of points A, B is well-separated with respect to t (or t-separated) if d(A, B) ≥ t · max(diam(A), diam(B)), where diam(A) is the diameter of the point set A, and d(A, B) = min_{p∈A, q∈B} |pq|.

Definition 2 (Well-separated pair decomposition). Let t > 0 be a constant, and P be a point set. A well-separated pair decomposition (WSPD) with respect to t of P is a set of pairs W = {{A_1, B_1}, . . . , {A_m, B_m}}, such that
1. A_i, B_i ⊆ P, and the pair of sets A_i and B_i is t-separated for every i.
2. For any pair of points p, q ∈ P, there is at least one pair (A_i, B_i) such that p ∈ A_i and q ∈ B_i.
Here m is called the size of the WSPD.

It is not hard to see that the uncoordinated spanner is equivalent to the following greedy algorithm that computes an s-WSPD.
1. Choose an arbitrary pair (p, q) not yet covered by existing pairs in W.
2. Include the pair of point sets B_r(p) and B_r(q) in the WSPD W, with r = |pq|/(2 + 2s), where B_r(p) is the collection of points within distance r from point p.
3. Label every point pair (p_i, q_i) with p_i ∈ B_r(p) and q_i ∈ B_r(q) as covered.
4. Repeat the above steps until every pair of points is covered.

Clearly the above algorithm produces a WSPD, as each pair (B_r(p), B_r(q)) is well-separated and all pairs of points are covered. The spanner edges (p, q) are in one-to-one correspondence with the well-separated pairs (B_r(p), B_r(q)) in the above algorithm: the rule in Lemma 1 prevents two edges from the same well-separated pair in W from being constructed. Thus the number of edges in the spanner G is the same as the size of the greedy WSPD. It is already known that for any well-separated pair decomposition, if one edge is taken from each well-separated pair, then the edges form a spanner on the original point set [6,5,32,29]. For our specific greedy s-WSPD, we are able to get a slightly better stretch. The proof is omitted due to space constraints.

Theorem 1. From the greedy s-WSPD, one can build a graph G that includes each pair (p, q) when it is selected by the greedy algorithm. Then G is a spanner with stretch factor (s + 1)/(s − 1).

To make the stretch factor 1 + ε, we just take s = 1 + 2/ε in our spanner construction. Next, we show that the greedy WSPD algorithm outputs a linear number of well-separated pairs.

3 A Greedy Algorithm for Well-Separated Pair Decomposition

We show the connection of the greedy WSPD with a specific WSPD constructed by the deformable spanner [22], in the sense that at most a constant number of pairs in W is mapped to each well-separated pair constructed by the deformable spanner. To be consistent, the greedy WSPD is denoted by W and the WSPD constructed by the deformable spanner is denoted by Ŵ.

Deformable spanner. Given a set of points P in the plane, a set of discrete centers with radius r is defined to be a maximal set S ⊆ P that satisfies the covering property and the separation property: any point p ∈ P is within distance r of some point p′ ∈ S, and every two points in S are at distance at least r from each other.
In other words, all the points in P can be covered by balls of radius r whose centers are exactly the points in the discrete center set S, and these balls do not cover other discrete centers. We now define a hierarchy of discrete centers in a recursive way. S_0 is the original point set P. S_i is the discrete center set of S_{i−1} with radius 2^i. Without loss of generality we assume that the closest pair has distance 1 (as we can scale the point set without changing the combinatorial structure of the discrete center hierarchy). Thus the number of levels of the discrete center hierarchy is log α, where α is the aspect ratio of the point set P, defined as the ratio of the maximum pairwise distance to the minimum pairwise distance, that is, α = max_{u,v∈P} |uv| / min_{u,v∈P} |uv|. Since a point p may stay in multiple consecutive levels and correspond to multiple nodes in the discrete center hierarchy, we denote by p^(i) the existence of p at level i. Each point p^(i−1) ∈ S_{i−1} is within distance 2^i from at least one point on level i. Thus we assign to p^(i−1) a parent q^(i) in S_i such that |p^(i−1) q^(i)| ≤ 2^i. When there are multiple points in S_i that cover p^(i−1), we choose one as its parent arbitrarily. We denote by P(p^(i−1)) the parent of p^(i−1) on level i, and by P^(i)(p) = P(P^(i−1)(p)) the ancestor of p at level i. The deformable spanner is based on this hierarchy, with an edge between two points u and v in S_i whenever |uv| ≤ c · 2^i, where c is a constant equal to 4 + 16/ε. We restate some important properties of the deformable spanner below.

Lemma 2 (Packing Lemma [22]). In a point set S ⊆ R^d, if every two points are at least distance r away from each other, then there can be at most (2R/r + 1)^d points of S within any ball of radius R.

Lemma 3 (Deformable spanner properties [22]). For a set of n points in R^d with aspect ratio α,
1. For any point p ∈ S_0, its ancestor P^(i)(p) ∈ S_i is of distance at most 2^{i+1} away from p.
2. Any point p ∈ S_i has at most (1 + 2c)^d − 1 edges with other points of S_i.
3. The deformable spanner Ĝ is a (1 + ε)-spanner with O(n/ε^d) edges.
4. Ĝ has total weight O(|MST| · lg α/ε^{d+1}), where |MST| is the weight of the minimum spanning tree of the point set S.

As shown in [22], the deformable spanner implies a well-separated pair decomposition Ŵ by taking all the 'cousin pairs'. Specifically, for a node p^(i) on level i, we denote by P_i the collection of points that are descendants of p^(i) (including p^(i) itself). Now we take the pairs (P_i, Q_i), the sets of descendants of a cousin pair p^(i) and q^(i), i.e., p^(i) and q^(i) are not neighbors on level i but their parents are neighbors on level i + 1. This collection of pairs constitutes a (4/ε)-well-separated pair decomposition. The size of Ŵ is bounded by the number of cousin pairs and is O(n/ε^d).

Size of greedy WSPD. The basic idea is to map the pairs in the greedy WSPD W to the pairs in Ŵ and show that at most a constant number of pairs in W map to the same pair in Ŵ.

Theorem 2. The greedy s-WSPD W has size O(ns^d).

Proof. Choose c = 4(s + 1) (or, s = c/4 − 1) in the deformable spanner DS. The size of Ŵ is O(ns^d). Now we will construct a map from W to Ŵ.
Each pair {P, Q} in W is created by considering the points inside the balls B_r(p), B_r(q) of radius r = |pq|/(2 + 2s) around p, q. Now we consider the ancestors of p and q in the spanner DS. There is a unique level i such that the ancestors u_i = P^(i)(p) and v_i = P^(i)(q) do not have a spanner edge in between, but the ancestors u_{i+1} = P^(i+1)(p) and v_{i+1} = P^(i+1)(q) have an edge in between. The pair u_i, v_i is a cousin pair by definition, and thus their descendants correspond to an s-well-separated pair in Ŵ. We say that the pair (B_r(p), B_r(q)) ∈ W maps to the descendant pair (P_i, Q_i) ∈ Ŵ. By the discrete center hierarchy (Lemma 3), we have

|pq| ≥ |u_i v_i| − |p u_i| − |q v_i| ≥ |u_i v_i| − 2 · 2^{i+1} ≥ (c − 4) · 2^i.

The last inequality follows from the fact that u_i, v_i do not have an edge in the spanner, so |u_i v_i| > c · 2^i. On the other hand,

|pq| ≤ |p u_{i+1}| + |u_{i+1} v_{i+1}| + |q v_{i+1}| ≤ 2 · 2^{i+2} + c · 2^{i+1} = 2(c + 4) · 2^i.

The last inequality follows from the fact that u_{i+1}, v_{i+1} have an edge in the spanner, so |u_{i+1} v_{i+1}| ≤ c · 2^{i+1}. Similarly, we have

c · 2^i < |u_i v_i| ≤ |u_i u_{i+1}| + |u_{i+1} v_{i+1}| + |v_i v_{i+1}| ≤ 2 · 2^{i+1} + c · 2^{i+1} = 2(c + 2) · 2^i.

Therefore the distance between p and q is c′ · |u_i v_i|, where (c − 4)/(2c + 4) ≤ c′ ≤ (2c + 8)/c.

Now suppose two pairs (B_{r_1}(p_1), B_{r_1}(q_1)), (B_{r_2}(p_2), B_{r_2}(q_2)) in W map to the same pair u_i and v_i by the above process. Without loss of generality suppose that p_1, q_1 are selected before p_2, q_2 in our greedy algorithm. Here are the observations:
1. |p_1 q_1| = c′_1 · |u_i v_i|, |p_2 q_2| = c′_2 · |u_i v_i|, r_1 = |p_1 q_1|/(2 + 2s) = c′_1 · |u_i v_i|/(2 + 2s), r_2 = c′_2 · |u_i v_i|/(2 + 2s), where (c − 4)/(2c + 4) ≤ c′_1, c′_2 ≤ (2c + 8)/c, and r_1, r_2 are the radii of the balls of the two pairs, respectively.
2. The reason that (p_2, q_2) can be selected in our greedy algorithm is that at least one of p_2 or q_2 is outside the balls B(p_1), B(q_1), by Lemma 1. That is, at least one of p_2 or q_2 is at distance more than r_1 from p_1 or q_1, respectively.

Now we look at all the pairs (p′, q′) that are mapped to the same ancestor pair (u_i, v_i). The pairs are ordered in the same order as they are constructed, i.e., p_1, q_1 is the first pair selected in the greedy WSPD algorithm. Suppose r_min is the minimum among all radii r_i; then r_min ≥ c/(2c + 8) · |u_i v_i|/(2 + 2s) = |u_i v_i|/(4s + 8). We group these pairs in the following way. The first group H_1 contains (p_1, q_1) and all the pairs (p′, q′) that have p′ within distance r_min/2 from p_1. We say that (p_1, q_1) is the representative pair of H_1 and the other pairs in H_1 are close to the pair (p_1, q_1). The second group H_2 contains, among all remaining pairs, the pair that was selected earliest by the greedy algorithm, together with all the pairs that are close to it. We repeat this process to group all the pairs into k groups H_1, H_2, · · · , H_k. For each group H_j we have one representative pair, denoted by (p_j, q_j), and the rest of the pairs in the group are close to it. We first bound the number of pairs belonging to each group by a constant with a packing argument. With our grouping criteria and the above observations, all p′ in the group H_j are within distance r_min from each other.
This means that the q′'s must be far away: the q′'s must be at least distance r_min apart from each other, by Lemma 1. On the other hand, all the q′'s are descendants of the node v_i, so |v_i q′| ≤ 2^{i+1} by Lemma 3. That is, all the q′'s are within a ball of radius 2^{i+1} centered at v_i. By the Packing Lemma (Lemma 2), the number of such q′'s is at most (2 · 2^{i+1}/r_min + 1)^d ≤ (2 · 2^{i+1}(4s + 8)/|u_i v_i| + 1)^d ≤ (4(s + 2)/(s + 1) + 1)^d. This is also the bound on the number of pairs inside each group.

Now we bound the number of different groups, i.e., the value k. For the representative pairs of the k groups, (p_1, q_1), (p_2, q_2), · · · , (p_k, q_k), all the p_i's must be at least distance r_min/2 away from each other. Again these p_i's are all descendants of u_i and thus are within distance 2^{i+1} from u_i. By a similar packing argument, the number of such p_i's is bounded by (4 · 2^{i+1}/r_min + 1)^d ≤ (8(s + 2)/(s + 1) + 1)^d. So the total number of pairs mapped to the same ancestor pair in Ŵ is at most (4(s + 2)/(s + 1) + 1)^d · (8(s + 2)/(s + 1) + 1)^d = (O(1 + 1/s))^d. Thus the total number of pairs in W is at most O(ns^d). This finishes the proof.

With the connection of the greedy WSPD with the uncoordinated spanner construction in Section 2, we immediately get the following theorem (with proof omitted).

Theorem 3. The uncoordinated spanner with parameter s is a spanner with stretch factor (s+1)/(s−1) and has O(ns^d) edges, maximum degree O(lg α · s^d), average degree O(s^d), and total weight O(lg α · |MST| · s^{d+1}).

4 Spanner Construction and Applications

The uncoordinated spanner construction can be applied to peer-to-peer system design, to allow users to maintain a spanner in a distributed manner. For that, we first extend our spanner results to a metric with constant doubling dimension. The doubling dimension of a metric space (X, d) is the smallest value γ such that each ball of radius R can be covered by at most 2^γ balls of radius R/2 [25].

Theorem 4. For n points and a metric space defined on them with constant doubling dimension γ, the uncoordinated spanner construction outputs a spanner G with stretch factor (s + 1)/(s − 1), total weight O(γ² · lg α · |MST| · s^{O(γ)}) and O(γ² · n · s^{O(γ)}) edges. Also it has maximum degree O(γ · lg α · s^{O(γ)}) and average degree O(γ · s^{O(γ)}).

Distributed construction. Now we would like to discuss the model of computing for P2P overlay design, as well as the construction cost of the uncoordinated spanner. We assume that there is already a mechanism maintained in the system such that any node x can obtain the distance to any node y in O(1) time. For example, this can be done by a TRACEROUTE command executed by x to the node y. We also assume that there is a service answering near neighbor queries: given a node p and a distance r, return the neighbors within distance r from p. Such an oracle is often maintained in a distributed file sharing system. Various structured P2P systems support this functionality at low cost [30]. Even in unstructured systems such as BitTorrent, the Ono plugin is effective at locating nearby peers, with vanishingly small overheads [1].
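Under this model, each agent's decision is a purely local test. A possible sketch of that step (ours, not from the paper); distance(x, y) stands for the O(1) distance measurement (e.g., a TRACEROUTE-style probe) and known_edges(x) for the constructed edges, with their lengths, that x has been informed of. Both names are assumed primitives, not a real API.

```python
def should_build(p, q, s, known_edges, distance):
    """Agent p decides whether to build edge pq: build unless some known
    edge (a, b) already covers the pair (p, q) in the sense of Lemma 1."""
    for a, b, length in known_edges(p):
        r = length / (2 * s + 2)
        if ((distance(p, a) <= r and distance(q, b) <= r) or
                (distance(p, b) <= r and distance(q, a) <= r)):
            return False   # (p, q) is already covered; do not build
    return True            # build pq, then inform all nodes in the two balls
```

The test touches only edges p already knows about, which is what keeps the construction free of central coordination.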
The spanner edges are recorded in a distributed fashion, so no node has the entire picture of the spanner topology. After each edge pq is constructed, the peers p, q inform their neighboring nodes (those in B_r(p) and B_r(q) with r = |pq|/(2s + 2)) that the edge pq exists, so that these nodes will not try to connect to one another. We assume that these messages are delivered immediately, so that any newly built edge is known right away to the nodes of relevance. The number of messages for this operation is bounded by |B_r(p)| + |B_r(q)|. The amount of storage at each node x is proportional to the number of well-separated pairs that include x. The following theorem bounds the total number of such messages during the execution of the algorithm and the amount of storage at each node.

Theorem 5. For the uncoordinated spanner G and the corresponding greedy WSPD W = {(P_i, Q_i)} of size m, each node x is included in at most O(s^d lg α) well-separated pairs in W. Thus, Σ_{i=1}^{m} (|P_i| + |Q_i|) = O(ns^d · lg α).

Distributed routing. Although the spanner topology is implicitly stored on the nodes, with each node knowing only a piece of it, we are actually able to do a distributed and local routing on the spanner using only information available at the nodes, such that the path discovered has maximum stretch (s + 1)/(s − 1). In particular, for any node p that has a message to send to node q, it is guaranteed that (p, q) is covered by a well-separated pair (B_r(p′), B_r(q′)) with p ∈ B_r(p′) and q ∈ B_r(q′). By the construction algorithm, the edge p′q′, once constructed, is made known to all nodes in B_r(p′) ∪ B_r(q′), including p. Thus p includes in the packet a partial route {p ⇝ p′, p′q′, q′ ⇝ q}. The notation p ⇝ p′ means that p will first need to find the low-stretch path from p to the node p′ (inductively), from where the edge p′q′ can be taken, so that with another low-stretch path found from q′ to q, the message can be delivered to q. This way of routing, with partial routing information stored with the packet, is similar to the idea of source routing [38], except that we do not include the full routing path at the source node. By the same induction as used in the proof of the spanner stretch (Theorem 1), the final path has stretch at most (s + 1)/(s − 1).

Nearest neighbor search. We remark that with the spanner each node can easily find its nearest neighbor. Recall that each point x keeps all the pairs (p, q) that create a 'dumb-bell' pair set covering x. We claim that, among all these p, one of them must be the nearest neighbor of x. Otherwise, suppose y is the nearest neighbor of x and y is not one of the p's. In the WSPD, (x, y) belongs to one of the pair sets (P_i, Q_i), which corresponds to a pair (p′, q′). Then there is a contradiction, as |xp′| < |xy| implies that y is not the nearest neighbor of x. Thus one's nearest neighbor is already stored locally at this node. According to Theorem 5, x belongs to at most O(s^d lg α) different pair sets. So the nearest neighbor search can be finished in O(s^d lg α) time using just local information.

References

1. http://www.aqualab.cs.northwestern.edu/projects/Ono.html
2. Abraham, I., Gavoille, C., Goldberg, A.V., Malkhi, D.: Routing in networks with low doubling dimension. In: Proc. of the 26th International Conference on Distributed Computing Systems (ICDCS) (July 2006)
3. Abraham, I., Malkhi, D.: Compact routing on euclidian metrics. In: PODC '04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, pp. 141–149. ACM, New York (2004)
4. Albers, S., Eilts, S., Even-Dar, E., Mansour, Y., Roditty, L.: On nash equilibria for a network creation game. In: SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pp. 89–98. ACM, New York (2006)
5. Arya, S., Das, G., Mount, D.M., Salowe, J.S., Smid, M.: Euclidean spanners: short, thin, and lanky. In: Proc. 27th ACM Symposium on Theory of Computing, pp. 489–498 (1995)
6. Arya, S., Mount, D.M., Smid, M.: Randomized and deterministic algorithms for geometric spanners of small diameter. In: Proc. 35th IEEE Symposium on Foundations of Computer Science, pp. 703–712 (1994)
7. Arya, S., Smid, M.: Efficient construction of a bounded-degree spanner with low weight. Algorithmica 17, 33–54 (1997)
8. Bast, H., Funke, S., Matijevic, D., Sanders, P., Schultes, D.: In transit to constant time shortest-path queries in road networks. In: Applegate, D., Brodal, G. (eds.) 9th Workshop on Algorithm Engineering and Experiments (ALENEX'07), New Orleans, USA, pp. 46–59. SIAM, Philadelphia (2007)
9. Callahan, P.B., Kosaraju, S.R.: Faster algorithms for some geometric graph problems in higher dimensions. In: Proc. 4th ACM-SIAM Symposium on Discrete Algorithms, pp. 291–300 (1993)
10. Callahan, P.B.: Optimal parallel all-nearest-neighbors using the well-separated pair decomposition. In: Proc. 34th IEEE Symposium on Foundations of Computer Science, pp. 332–340 (1993)
11. Callahan, P.B., Kosaraju, S.R.: Algorithms for dynamic closest-pair and n-body potential fields. In: Proc. 6th ACM-SIAM Symposium on Discrete Algorithms, pp. 263–272 (1995)
12. Callahan, P.B., Kosaraju, S.R.: A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM 42, 67–90 (1995)
13. Chandra, B., Das, G., Narasimhan, G., Soares, J.: New sparseness results on graph spanners. Internat. J. Comput. Geom. Appl. 5, 125–144 (1995)
14. Chu, Y., Rao, S., Seshan, S., Zhang, H.: Enabling conferencing applications on the internet using an overlay multicast architecture. SIGCOMM Comput. Commun. Rev. 31(4), 55–67 (2001)
15. Corbo, J., Parkes, D.: The price of selfish behavior in bilateral network formation. In: PODC '05: Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing, pp. 99–107. ACM, New York (2005)
16. Das, G., Heffernan, P., Narasimhan, G.: Optimally sparse spanners in 3-dimensional Euclidean space. In: Proc. 9th Annu. ACM Sympos. Comput. Geom., pp. 53–62 (1993)
17. Das, G., Narasimhan, G.: A fast algorithm for constructing sparse Euclidean spanners. Internat. J. Comput. Geom. Appl. 7, 297–315 (1997)
18. Das, G., Narasimhan, G., Salowe, J.: A new way to weigh malnourished Euclidean graphs. In: Proc. 6th ACM-SIAM Sympos. Discrete Algorithms, pp. 215–222 (1995)
19. Eppstein, D.: Spanning trees and spanners. In: Sack, J.-R., Urrutia, J. (eds.) Handbook of Computational Geometry, pp. 425–461. Elsevier Science Publishers B.V., North-Holland (2000)
20. Erickson, J.: Dense point sets have sparse Delaunay triangulations. In: Proc. 13th ACM-SIAM Symposium on Discrete Algorithms, pp. 125–134 (2002)
21. Fabrikant, A., Luthra, A., Maneva, E., Papadimitriou, C.H., Shenker, S.: On a network creation game. In: PODC '03: Proceedings of the twenty-second annual symposium on Principles of distributed computing, pp. 347–351 (2003)
22. Gao, J., Guibas, L., Nguyen, A.: Deformable spanners and their applications. Computational Geometry: Theory and Applications 35(1-2), 2–19 (2006)
23. Gottlieb, L.-A., Roditty, L.: Improved algorithms for fully dynamic geometric spanners and geometric routing. In: SODA '08: Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, Philadelphia, PA, USA, pp. 591–600. Society for Industrial and Applied Mathematics (2008)
24. Gudmundsson, J., Levcopoulos, C., Narasimhan, G., Smid, M.: Approximate distance oracles for geometric graphs. In: Proc. 13th ACM-SIAM Symposium on Discrete Algorithms, pp. 828–837 (2002)
25. Gupta, A., Krauthgamer, R., Lee, J.R.: Bounded geometries, fractals, and low-distortion embeddings. In: FOCS '03: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pp. 534–543 (2003)
26. Jansen, T., Theile, M.: Stability in the self-organized evolution of networks. In: GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation, pp. 931–938. ACM, New York (2007)
27. Konjevod, G., Richa, A.W., Xia, D.: Optimal-stretch name-independent compact routing in doubling metrics. In: PODC '06: Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, pp. 198–207 (2006)
28. Kwon, M., Fahmy, S.: Topology-aware overlay networks for group communication. In: NOSSDAV '02: Proceedings of the 12th international workshop on Network and operating systems support for digital audio and video, pp. 127–136. ACM, New York (2002)
29. Levcopoulos, C., Narasimhan, G., Smid, M.H.M.: Improved algorithms for constructing fault-tolerant spanners. Algorithmica 32(1), 144–156 (2002)
30. Lua, K., Crowcroft, J., Pias, M., Sharma, R., Lim, S.: A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys & Tutorials, 72–93 (2005)
31. Moscibroda, T., Schmid, S., Wattenhofer, R.: On the topologies formed by selfish peers. In: PODC '06: Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, pp. 133–142. ACM, New York (2006)
32. Narasimhan, G., Smid, M.: Approximating the stretch factor of Euclidean graphs. SIAM J. Comput. 30, 978–989 (2000)
33. Narasimhan, G., Smid, M.: Geometric Spanner Networks. Cambridge University Press, Cambridge (2007)
34. Peleg, D.: Distributed Computing: A Locality Sensitive Approach. Monographs on Discrete Mathematics and Applications. SIAM, Philadelphia (2000)
35. Ratnasamy, S., Handley, M., Karp, R., Shenker, S.: Topologically-aware overlay construction and server selection. In: Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'02), vol. 3, pp. 1190–1199 (2002)
36. Sanders, P., Schultes, D.: Engineering highway hierarchies. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 804–816. Springer, Heidelberg (2006)
37. Slivkins, A.: Distance estimation and object location via rings of neighbors. In: PODC '05: Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing, pp. 41–50 (2005)
38. Tanenbaum, A.S.: Computer Networks, 3rd edn. Prentice-Hall, Inc., Upper Saddle River (1996)
39. Wang, W., Jin, C., Jamin, S.: Network overlay construction under limited end-to-end reachability. In: Proceedings of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'05), March 2005, vol. 3, pp. 2124–2134 (2005)
40. Zhang, X., Li, Z., Wang, Y.: A distributed topology-aware overlays construction algorithm. In: MG '08: Proceedings of the 15th ACM Mardi Gras conference, pp. 1–6. ACM, New York (2008)

A Bottom-Up Method and Fast Algorithms for max independent set

Nicolas Bourgeois¹, Bruno Escoffier¹, Vangelis Th. Paschos¹, and Johan M.M. van Rooij²

¹ LAMSADE, CNRS FRE 3234 and Université Paris-Dauphine, France
{bourgeois,escoffier,paschos}@lamsade.dauphine.fr
² Department of Information and Computing Sciences, Universiteit Utrecht, The Netherlands
[email protected]

Research supported by the French Agency for Research under the DEFIS program TODO, ANR-09-EMER-010.

Abstract. We first propose a new method, called the "bottom-up method", that, informally, "propagates" improvements of the worst-case complexity from "sparse" instances to "denser" ones, and we show an easy though non-trivial application of it to the min set cover problem. We then tackle max independent set. Following the bottom-up method we propagate improvements of worst-case complexity from graphs of average degree d to graphs of average degree greater than d. Indeed, using algorithms for max independent set in graphs of average degree 3, we tackle max independent set in graphs of average degree 4, 5 and 6. Then, we combine the bottom-up technique with measure and conquer techniques to get improved running times for graphs of maximum degree 4, 5 and 6 but also for general graphs. The best computation bounds obtained for max independent set are O∗(1.1571^n), O∗(1.1918^n) and O∗(1.2071^n), for graphs of maximum (or more generally average) degree 4, 5 and 6 respectively, and O∗(1.2127^n) for general graphs. These results improve upon the best known polynomial space results for these cases.

Keywords: Bottom-Up Method, Max Independent Set, Exact Algorithms.

1 Introduction

Very active research has been recently conducted around the development of exact algorithms for NP-hard problems with non-trivial worst-case complexity (see the seminal paper [10] for a survey of both the methods used and the results obtained). Among the problems studied in this field, max independent set (and particular versions of it) is one of those that have received particular attention and mobilized numerous researchers. Here, we propose in Section 2 a generic method that propagates improvements of worst-case complexity from "sparse" instances to "denser" (less sparse) ones, where the density of an instance is proper to the problem handled and refers to the average value of some parameter of its instance. We call this method the "bottom-up method".
The basic idea here has two ingredients: (i) the choice of the recursive measure of the instance, and (ii) a way to ensure that on "denser" instances a good branching (wrt. the chosen measure) occurs.

We then illustrate our method on min set cover. Given a finite ground set U and a set-system S over U, min set cover consists of determining a minimum-size subsystem S′ covering U. Here, the density of an instance is defined to be the average cardinality of the sets in the set-system S. Application of the method to min set cover is rather direct, but it produces quite interesting results. As we show in Section 2, it outperforms the results of [9] on instances with average set sizes 6, 7, 8, . . . Note that we are not aware of results better than those given here.

We next handle the max independent set problem. Given a graph G = (V, E), max independent set consists of finding a maximum-size subset V′ ⊆ V such that for any (v_i, v_j) ∈ V′ × V′, (v_i, v_j) ∉ E. For this problem, [4] proposes an algorithm with worst-case complexity bound O∗(1.2201^n)¹. All the results we present here are polynomial space algorithms. We also quote the O(1.2108^n) time bound of [7] using exponential space (claimed to be improved down to O(1.1889^n) in the technical report [8], still using exponential space). Dealing with max independent set in graphs of maximum degree 3, faster and faster algorithms have been devised for optimally solving this problem. Let us quote the recent O∗(1.0892^n) time algorithm in [6], and the O∗(1.0854^n) time algorithm by the authors of the article at hand [3].

For max independent set, the density of a graph is measured by its average degree. So the bottom-up method here extends improvements of the worst-case complexity in graphs of average degree d to graphs of average degree greater than d. In order to informally sketch our bottom-up method in the case of max independent set, suppose that one knows how to solve the problem on graphs with average degree d in time O∗(γ_d^n). Solving the problem on graphs with average degree d′ > d is based upon two ideas. We first look for a running time expression of the form α^m β^n, where α and β depend both on the input graph (namely on its average degree) and on the value γ_d (see Section 2); in other words, the form of the running time we seek is parameterized by what we already know for graphs with smaller average degrees. Next, according to this form, we identify particular values d_i (not necessarily integer) of the average degree that ensure that a "good" branching occurs. This allows us to determine a good running time for increasing values of the average degree. Note also that a particular interest of this method lies in the fact that any improvement of the worst-case complexity on graphs of average degree 3 immediately yields improvements for higher average degrees. A direct application of this method leads, for instance, for max independent set in graphs with average degree 4, to an upper complexity bound that already slightly outperforms the best known time bound of O∗(1.1713^n) of [1] (Section 2). This result is further improved down to O∗(1.1571^n) (Section 3) with a more refined case analysis.

¹ In a very recent article [5] an O∗(1.2132^n) time algorithm for max independent set is proposed. Plugging in this new result allows us to further improve our results. We give the corresponding bounds at the end of Section 4.
We also provide bounds for (connected) graphs with average degree at most 5 and 6. In Section 4, we combine measure and conquer with bottom-up to show that max independent set can be optimally solved in time O∗(1.2127^n) in general graphs, thus improving the O∗(1.2201^n) bound of [4]. Furthermore, in graphs of maximum degree at most 5 and 6, we provide time bounds of O∗(1.1918^n) and O∗(1.2071^n), respectively, which improve upon the respective O∗(1.2023^n) and O∗(1.2172^n) time bounds of [4]².

We give the results obtained using the O∗(1.0854^n) time bound of [3] for graphs of degree 3. Note that using previously known bounds for solving max independent set in graphs of degree 3 (worse than O∗(1.0854^n)), the bottom-up method would also lead to results improving on those known in the literature. We illustrate this point in Table 1.

Table 1. max independent set results for graphs of degree 4, 5, 6 and general graphs, taking as starting points several complexity bounds for graphs of degree 3

Degree 3     Degree 4  Degree 5  Degree 6  General graphs
1.08537      1.1571    1.1918    1.2071    1.2127
1.0892 [6]   1.1594    1.1932    1.2082    1.2135
1.0977 [2]   1.1655    1.198     1.213     1.217

Maybe more interesting than the improvements themselves is the fact that they are obtained via an original method that, once some (possibly long) case analysis has been performed on instances of small density, is directly applicable for getting results on higher density instances, even general ones. We think that this method deserves further attention and insight, since it might also be used to solve other problems.

Throughout this paper, we denote by N(u) and N[u] the neighborhood and the closed neighborhood of u, respectively (N(u) = {v ∈ V : (u, v) ∈ E} and N[u] = N(u) ∪ {u}).

2 The Bottom-Up Method

In this section we present the method, which relies on the following two stepping stones: (i) the choice of the recursive complexity measure, applied in a very simple way in Section 2.1 to min set cover to get non-trivial time bounds for instances of bounded average set-cardinality, and (ii) a way to ensure that on instances of density greater than d, a good branching (wrt. the chosen complexity measure) occurs. This second point is illustrated in Section 2.2 for max independent set.

² The bound in graphs of degree at most d is obtained as O∗(1.2201^{n·w_d}), where w_d is the weight associated with vertices of degree d in the measure and conquer analysis of [4]; better bounds could maybe be achieved with this method, but this is not straightforward, since reduction rules may create vertices of arbitrarily large degree.

2.1 The Importance of Recursive Complexity Measure: The Case of min set cover

Let us consider the min set cover problem with ground set U = {x_1, · · · , x_n} and set system S = {S_1, · · · , S_m}. We show that the bottom-up method easily applies to get algorithms with improved time bounds for instances with sets of average (or maximum) size d, for any d ≥ 5. In what follows, p = Σ_{i=1}^{m} |S_i|.

Lemma 1. The algorithm in [9] solves min set cover in time resp. O∗(1.55^m), O∗(1.63^m), O∗(1.71^m), O∗(1.79^m) on instances with sets of average size 5, 6, 7 and 8.

Proof. Denoting by n_i the number of sets of size i and by m_j the number of elements of frequency j, [9] gives an algorithm working in time O∗(1.2302^{k(I)}), where k(I) = Σ_{i≥1} w_i n_i + Σ_{j≥1} v_j m_j.
Here w_i is the weight associated with a set of size i, and v_j is the weight associated with an element of frequency j. It is easy to see that, by convexity³, if the average size of the sets is an integer d, then Σ_{i≥1} w_i n_i ≤ m·w_d. Moreover, note that v_j/j ≤ v_3/3 for all j ≥ 1⁴, hence Σ_{j≥1} v_j m_j ≤ (v_3/3) Σ_{j≥1} j·m_j = d·m·v_3/3. We get k(I) ≤ m(w_d + v_3·d/3) (this bound being tight if all sets have size d and all elements have frequency 3).

Let us consider an instance with p > dm. The bottom-up method assumes that we know how to solve the problem on instances with sets of average size d in time O∗(γ_d^m). It seeks a complexity of the form O∗(γ_d^m · y^{p−dm}); indeed, this is valid by hypothesis for p = dm, i.e., on instances with sets of average size d. Let us consider a set S of size s ≥ d + 1. We branch on S. If we do not take it, we remove one set and s ≥ d + 1 edges; if we take it, suppose that each element in S has frequency at least 3. Then we remove 1 set and (at least) 3s ≥ 3(d+1) edges. Hence, for the complexity to be valid we have to choose y such that

γ_d^m · y^{p−dm} ≥ γ_d^{m−1} · y^{(p−(d+1))−d(m−1)} + γ_d^{m−1} · y^{(p−3(d+1))−d(m−1)},

or equivalently 1 ≥ γ_d^{−1} y^{−1} + γ_d^{−1} y^{−(2d+3)}. Taking for instance γ_5 = 1.55 (Lemma 1), this is true for y = 1.043. Let us check the other cases:
– If there is an element j of frequency 1 in S, we have to take S, and we remove one set and s ≥ d + 1 edges. This does not create any problem as long as γ_d^{m−1} · y^{(p−d−1)−d(m−1)} ≤ γ_d^m · y^{p−dm}, i.e., y ≤ γ_d.
– Otherwise, if there is an element j of frequency 2 which is in S and S′, then either we take S and remove 1 set and at least 2(d + 1) edges, or we remove S and take S′, removing 2 sets and at least d + 2 edges. So we have to check that the value of y computed in the general case verifies 1 ≥ γ_d^{−1} y^{−(d+2)} + γ_d^{−2} y^{−2+d}.

Then, if sets have average size d + 1, since p = (d + 1)m, the complexity is O∗(γ_{d+1}^m) with γ_{d+1} = γ_d × y. Starting from γ_5 = 1.55, the recurrences give γ_6 = 1.61, γ_7 = 1.66 (with y = 1.031) and γ_8 = 1.70 (with y = 1.024).

³ The w_i's are (0, 0.3755, 0.7509, 0.9058, 0.9720, 0.9982) for i = 1, · · · , 6 and 1 for i ≥ 7.
⁴ The v_j's are (0, 0.2195, 0.6714, 0.8766, 0.9569, 0.9882) for j = 1, · · · , 6 and 1 for j ≥ 7.

Theorem 1. min set cover is solvable in times O∗(1.61^m), O∗(1.66^m) and O∗(1.70^m) on instances with sets of average size 6, 7 and 8, respectively.

Interestingly enough, the time bound of O∗(1.2302^{m(w_d + v_3·d/3)}) obtained in Lemma 1 is bigger than 2^m for d ≥ 11, while using the previous method it can be shown that γ_d < 2 for any d (for instance, γ_100 < 1.98). Of course, the analysis conducted in [9] is not oriented towards the bounded size case, so better results might be obtained; we will say a few words in conclusion on the links between this complexity measure and the one of measure and conquer.
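The recurrences above are easy to evaluate numerically. The following small sketch (ours, not the authors' code) bisects for the smallest feasible y in the main branching condition 1 ≥ γ_d^{−1} y^{−1} + γ_d^{−1} y^{−(2d+3)} and chains γ_{d+1} = γ_d · y; it checks only the general case, not the frequency-1 and frequency-2 side conditions, so its outputs only approximate the constants quoted above (it yields roughly γ_6 ≈ 1.61, γ_7 ≈ 1.66, γ_8 ≈ 1.70).

```python
def next_gamma(gamma_d, d, tol=1e-9):
    """Smallest y > 1 with 1 >= gamma_d**-1 * (y**-1 + y**-(2d+3)),
    found by bisection; returns (gamma_{d+1}, y). Assumes 1 < gamma_d < 2,
    so the condition fails at y = 1 and holds at y = gamma_d."""
    def ok(y):
        return (y ** -1 + y ** -(2 * d + 3)) / gamma_d <= 1.0
    lo, hi = 1.0, gamma_d
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return gamma_d * hi, hi

gamma, d = 1.55, 5          # starting point gamma_5 = 1.55 from Lemma 1
while d < 8:
    gamma, y = next_gamma(gamma, d)
    d += 1
    print(f"gamma_{d} ~ {gamma:.3f}  (y ~ {y:.3f})")
```

The same loop run further shows the slow growth of γ_d, consistent with the remark that γ_d stays below 2 for all d.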
2.2 Making a Good Branching: The Case of max independent set

In order to use the previous complexity measure efficiently for max independent set, we prove that, in a graph whose average degree is bounded from below by some constant (possibly not an integer), there is guaranteed to exist a rather dense local configuration we can branch on. More precisely, if the average degree is greater than d, then we can find a vertex v with at least f(d) edges incident to some neighbor of v, for some increasing function f. Let us give an example. For the independent set problem, there are two well-known reduction rules that allow one to remove, without branching, vertices of degree at most 2⁵. In what follows we assume that these reduction rules have been performed, i.e., the graph does not contain any vertex of degree at most 2. Then, trivially, if the graph has average degree greater than d ∈ N, we know there exists a vertex v of degree at least d + 1. If we assume that no vertex is dominated⁶, then there exist at least f(d) = 2(d + 1) + (d + 1)/2 edges incident to some neighbor of v. Indeed, there exist d + 1 edges incident to v, d + 1 edges between a neighbor of v and a vertex not a neighbor of v (one for each neighbor of v, to avoid domination) and, since each vertex has degree at least 3, at least (3(d + 1) − 2(d + 1))/2 = (d + 1)/2 other edges. Note that such relationships may be established even if d is not an integer. For instance, we will see that if d > 24/7, then there exists a vertex of degree 5 or two adjacent vertices having degree 4, leading to f(d) = 11.

This property linking the average degree to the quality of the branching is given in Lemma 2. Then, for a given d, either the average degree is greater than d, and we can make an efficient branching (i.e., a branching that induces a recurrence relation leading to a lower time bound), or it is not, and we can use an algorithm tailored for low-degree graphs. Thus, Lemma 2 fixes a set of critical degrees (d_i), and we define step-by-step (from the smallest to the highest) algorithms STABLE(d_i) that work on graphs of average degree d_i or less. With this lemma, we analyse the running time of these algorithms thanks to a measure allowing us to fruitfully use the existence of the dense local configurations mentioned above. As for min

⁵ A vertex of degree at most 1 should be taken in the solution. If a vertex v has two neighbors u_1 and u_2, take v if u_1 and u_2 are adjacent; otherwise remove v, u_1, u_2 from the graph and add a new vertex u_12 whose neighborhood is (N(u_1) ∪ N(u_2)) \ {v} (this reduction is called vertex folding, see for instance [4]).
⁶ u dominates v if N[u] ⊆ N[v]. In this case, it is never interesting to take v.
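To make footnote 5 concrete, here is a rough sketch (ours, not the authors' implementation) of the two reduction rules on an adjacency-set representation {vertex: set of neighbors}; the fresh vertex name ("fold", v) is our own convention for u_12. The counter tracks how many vertices the reductions contribute to the independent set (each fold contributes exactly one).

```python
def remove_vertex(adj, v):
    """Delete v and all edges incident to it."""
    for w in adj.pop(v, set()):
        adj[w].discard(v)

def reduce_graph(adj):
    """Exhaustively apply the degree-<=2 reduction rules of footnote 5.
    Returns the number of solution vertices gained by reductions alone."""
    gained, changed = 0, True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:
                continue                      # removed earlier in this pass
            if len(adj[v]) <= 1:              # degree <= 1: take v
                for u in list(adj[v]) + [v]:
                    remove_vertex(adj, u)
                gained, changed = gained + 1, True
            elif len(adj[v]) == 2:
                u1, u2 = tuple(adj[v])
                if u2 in adj[u1]:             # u1, u2 adjacent: take v
                    for u in (v, u1, u2):
                        remove_vertex(adj, u)
                else:                         # fold v, u1, u2 into u12
                    merged = (adj[u1] | adj[u2]) - {v, u1, u2}
                    for u in (v, u1, u2):
                        remove_vertex(adj, u)
                    node = ("fold", v)        # fresh vertex u12
                    adj[node] = set(merged)
                    for w in merged:
                        adj[w].add(node)
                gained, changed = gained + 1, True
    return gained
```

After this procedure, every remaining vertex has degree at least 3, which is the precondition used throughout the argument above.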

