Data Structures And Algorithms | 2019-04-10

in #algorithms5 years ago

Data Structures And Algorithms


String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure (1904.04228v1)

Dominik Kempa, Tomasz Kociumaka

2019-04-08

Burrows-Wheeler transform (BWT) is an invertible text transformation that, given a text of length , permutes its symbols according to the lexicographic order of suffixes of . BWT is one of the most heavily studied algorithms in data compression with numerous applications in indexing, sequence analysis, and bioinformatics. Its construction is a bottleneck in many scenarios and settling the complexity of this task is one of the most important unsolved problems in sequence analysis that has remained open for 25 years. Given a binary string of length occupying machine words of space, the BWT construction algorithm due to Hon et al. (FOCS 2003) runs in time and space. Recent advancements (Belazzougui, STOC 2014, and Munro et al., SODA 2017) focus on removing the alphabet-size dependency in the time complexity, but they still require time. In this paper, we propose the first algorithm that breaks the -time barrier for BWT construction. Given a binary string of length , our procedure builds the Burrows-Wheeler transform in time and space. We complement this result with a conditional lower bound proving that any further progress in the time complexity of BWT construction would yield faster algorithms for the very well studied problem of counting inversions: it would improve the state-of-the-art -time solution by Chan and P\v{a}tra\c{s}cu (SODA 2010). Our algorithm is based on a novel concept of string synchronizing sets. As one of the applications, we show a data structure of the optimal size that answers longest common extension queries in time and, furthermore, can be deterministically constructed in the optimal time. This significantly improves upon the previously best data structures and essentially closes the LCE problem.

Junta correlation is testable (1904.04216v1)

Anindya De, Elchanan Mossel, Joe Neeman

2019-04-08

The problem of tolerant junta testing is a natural and challenging problem which asks if the property of a function having some specified correlation with a -Junta is testable. In this paper we give an affirmative answer to this question: We show that given distance parameters , there is a tester which given oracle access to , with query complexity and distinguishes between the following cases: The distance of from any -junta is at least ; There is a -junta which has distance at most from . This is the first non-trivial tester (i.e., query complexity is independent of ) which works for all . The best previously known results by Blais \emph{et~ al.}, required . In fact, with the same query complexity, we accomplish the stronger goal of identifying the most correlated -junta, up to permutations of the coordinates. We can further improve the query complexity to for the (weaker) task of distinguishing between the following cases: The distance of from any -junta is at least . There is a -junta which is at a distance at most from . Here . Our main tools are Fourier analysis based algorithms that simulate oracle access to influential coordinates of functions.

A note on Cunningham's algorithm for matroid intersection (1904.04129v1)

Huy L. Nguyen

2019-04-08

In the matroid intersection problem, we are given two matroids of rank on a common ground set of elements and the goal is to find the maximum set that is independent in both matroids. In this note, we show that Cunningham's algorithm for matroid intersection can be implemented to use independent oracle calls.

Weighted Reservoir Sampling from Distributed Streams (1904.04126v1)

Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

2019-04-08

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, is well studied, and admits tight upper and lower bounds on message complexity. For weighted sampling with replacement, there is a simple reduction to unweighted sampling with replacement. However, in many applications the stream has only a few heavy items which may dominate a random sample when chosen with replacement. Weighted sampling \textit{without replacement} (weighted SWOR) eludes this issue, since such heavy items can be sampled at most once. In this work, we present the first message-optimal algorithm for weighted SWOR from a distributed stream. Our algorithm also has optimal space and time complexity. As an application of our algorithm for weighted SWOR, we derive the first distributed streaming algorithms for tracking \textit{heavy hitters with residual error}. Here the goal is to identify stream items that contribute significantly to the residual stream, once the heaviest items are removed. Residual heavy hitters generalize the notion of heavy hitters and are important in streams that have a skewed distribution of weights. In addition to the upper bound, we also provide a lower bound on the message complexity that is nearly tight up to a factor. Finally, we use our weighted sampling algorithm to improve the message complexity of distributed tracking, also known as count tracking, which is a widely studied problem in distributed streaming. We also derive a tight message lower bound, which closes the message complexity of this fundamental problem.

A unifying method for the design of algorithms canonizing combinatorial objects (1806.07466v2)

Pascal Schweitzer, Daniel Wiebking

2018-06-19

We devise a unified framework for the design of canonization algorithms. Using hereditarily finite sets, we define a general notion of combinatorial objects that includes graphs, hypergraphs, relational structures, codes, permutation groups, tree decompositions, and so on. Our approach allows for a systematic transfer of the techniques that have been developed for isomorphism testing to canonization. We use it to design a canonization algorithm for combinatorial objects in general. This result gives new fastest canonization algorithms with an asymptotic running time matching the best known isomorphism algorithm for the following types of objects: hypergraphs, hypergraphs of bounded color class size, permutation groups (up to permutational isomorphism) and codes that are explicitly given (up to code equivalence).

Compact Error-Resilient Self-Assembly of Recursively Defined Patterns (1904.02763v2)

Brad Shutters, Timothy P. Hartke Jr., Robert J. Sammelson

2019-04-04

A limitation to molecular implementations of tile-based self-assembly systems is the high rate of mismatch errors which has been observed to be between 1% and 10%. Controlling the physical conditions of the system to reduce this intrinsic error rate prohibitively slows the growth rate of the system. This has motivated the development of techniques to redundantly encode information in the tiles of a system in such a way that the rate of mismatch errors in the final assembly is reduced even without a reduction in . Winfree and Bekbolatov, and Chen and Goel, introduced such error-resilient systems that reduce the mismatch error rate to by replacing each tile in an error-prone system with a block of tiles in the error-resilient system, but this increases the number of tile types used by a factor of , and the scale of the pattern produced by a factor of . Reif, Sahu and Yin, and Sahu and Reif, introduced compact error-resilient systems for the self-assembly of Boolean arrays that reduce the mismatch error rate to without increasing the scale of the pattern produced. In this paper, we give a technique to design compact error-resilient systems for the self-assembly of the recursively defined patterns introduced by Kautz and Lathrop. We show that our compact error-resilient systems reduce the mismatch error rate to by using the independent error model introduced by Sahu and Reif. Surprisingly, our error-resilient systems use the same number of tile types as the error-prone system from which they are constructed.

Subsets and Supermajorities: Unifying Hashing-based Set Similarity Search (1904.04045v1)

Thomas Dybdahl Ahle

2019-04-08

We consider the problem of designing Locality Sensitive Filters (LSF) for set overlaps, also known as maximum inner product search on binary data. We give a simple data structure that generalizes and outperforms previous algorithms such as MinHash [J. Discrete Algorithms 1998], SimHash [STOC 2002], Spherical LSF [SODA 2017] and Chosen Path [STOC 2017]; and we show matching lower bounds using hypercontractive inequalities for a wide range of parameters and space/time trade-offs. This answers the main open question in Christiani and Pagh [STOC 2017] on unifying the landscape of Locality Sensitive (non-data-dependent) set similarity search.

Knot Diagrams of Treewidth Two (1904.03117v2)

Hans L. Bodlaender, Benjamin Burton, Fedor V. Fomin, Alexander Grigoriev

2019-04-05

In this paper, we study knot diagrams for which the underlying graph has treewidth two. We give a linear time algorithm for the following problem: given a knot diagram of treewidth two, does it represent the unknot? We also show that for a link diagram of treewidth two we can test in linear time if it represents the unlink. From the algorithm, it follows that a diagram of the unknot of treewidth 2 can always be reduced to the trivial diagram with at most (un)twist and (un)poke Reidemeister moves.

From independent sets and vertex colorings to isotropic spaces and isotropic decompositions (1904.03950v1)

Xiaohui Bei, Shiteng Chen, Ji Guan, Youming Qiao, Xiaoming Sun

2019-04-08

In the 1970's, Lov'asz built a bridge between graphs and alternating matrix spaces, in the context of perfect matchings (FCT 1979). A similar connection between bipartite graphs and matrix spaces plays a key role in the recent resolutions of the non-commutative rank problem (Garg-Gurvits-Oliveira-Wigderson, FOCS 2016; Ivanyos-Qiao-Subrahmanyam, ITCS 2017). In this paper, we lay the foundation for another bridge between graphs and alternating matrix spaces, in the context of independent sets and vertex colorings. The corresponding structures in alternating matrix spaces are isotropic spaces and isotropic decompositions, both useful structures in group theory and manifold theory. We first show that the maximum independent set problem and the vertex c-coloring problem reduce to the maximum isotropic space problem and the isotropic c-decomposition problem, respectively. Next, we show that several topics and results about independent sets and vertex colorings have natural correspondences for isotropic spaces and decompositions. These include algorithmic problems, such as the maximum independent set problem for bipartite graphs, and exact exponential-time algorithms for the chromatic number, as well as mathematical questions, such as the number of maximal independent sets, and the relation between the maximum degree and the chromatic number. These connections lead to new interactions between graph theory and algebra. Some results have concrete applications to group theory and manifold theory, and we initiate a variant of these structures in the context of quantum information theory. Finally, we propose several open questions for further exploration. This paper is dedicated to the memory of Ker-I Ko.

On popularity-based random matching markets (1904.03890v1)

Hugo Gimbert, Claire Mathieu, Simon Mauras

2019-04-08

Stable matching in a community consisting of N men and N women is a classical combinatorial problem that has been the subject of intense theoretical and empirical study since its introduction in 1962 in a seminal paper by Gale and Shapley [GS62]. In this paper, we study the number of stable pairs, that is, the man/woman pairs that appear in some stable matching. We prove that if the preference lists on one side are generated at random using the popularity model of Immorlica and Mahdian [IM15], the expected number of stable edges is bounded by N + N ln N , matching the asymptotic bound for uniform preference lists. If in addition that popularity model is a geometric distribution, then the number of stable edges is O(N) and the incentive to manipulate is limited. If in addition the preference lists on the other side are uniform, then the number of stable edges is asymptotically N up to lower order terms: most participants have a unique stable partner, hence non-manipulability.



Coin Marketplace

STEEM 0.28
TRX 0.11
JST 0.034
BTC 66077.75
ETH 3167.77
USDT 1.00
SBD 4.01