Data Structures And Algorithms | 2019-07-16

binarytree (33)in #algorithms • 5 years ago

Data Structures And Algorithms

On a Generalization of the Marriage Problem (1907.05870v1)

Jonathan Lenchner

2019-07-12

We present a generalization of the marriage problem underlying Hall's famous Marriage Theorem to what we call the Symmetric Marriage Problem, a problem that can be thought of as a special case of Maximal Weighted Bipartite Matching. We show that there is a solution to the Symmetric Marriage Problem if and only if a variation on Hall's Condition holds on each of the bipartitions. We prove both finite and infinite versions of this result and provide applications.

Low-complexity Image and Video Coding Based on an Approximate Discrete Tchebichef Transform (1609.07630v3)

P. A. M. Oliveira, R. J. Cintra, F. M. Bayer, S. Kulasekera, A. Madanayake, V. A. Coutinho

2016-09-24

The usage of linear transformations has great relevance for data decorrelation applications, like image and video compression. In that sense, the discrete Tchebichef transform (DTT) possesses useful coding and decorrelation properties. The DTT transform kernel does not depend on the input data and fast algorithms can be developed to real time applications. However, the DTT fast algorithm presented in literature possess high computational complexity. In this work, we introduce a new low-complexity approximation for the DTT. The fast algorithm of the proposed transform is multiplication-free and requires a reduced number of additions and bit-shifting operations. Image and video compression simulations in popular standards shows good performance of the proposed transform. Regarding hardware resource consumption for FPGA shows 43.1% reduction of configurable logic blocks and ASIC place and route realization shows 57.7% reduction in the area-time figure when compared with the 2-D version of the exact DTT.

Towards Optimal Moment Estimation in Streaming and Distributed Models (1907.05816v1)

Rajesh Jayaram, David P. Woodruff

2019-07-12

One of the oldest problems in the data stream model is to approximate the -th moment of an underlying vector , which is presented as a sequence of poly updates to its coordinates. Of particular interest is when . Although a tight space bound of bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity when all updates are positive. Specifically, the upper bound is bits, while the lower bound is only bits. Recently, an upper bound of bits was obtained assuming that the updates arrive in a random order. We show that for , the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of bits for estimating . Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for , in the natural coordinator and blackboard communication topologies, there is an bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies , obtaining an max-communication upper bound, where is the diameter of . Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an bit lower bound for for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.

Parallel Weighted Random Sampling (1903.00227v2)

Lorenz Hübschle-Schneider, Peter Sanders

2019-03-01

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.

Travelling on Graphs with Small Highway Dimension (1902.07040v3)

Yann Disser, Andreas Emil Feldmann, Max Klimm, Jochen Konemann

2019-02-19

We study the Travelling Salesperson (TSP) and the Steiner Tree problem (STP) in graphs of low highway dimension. This graph parameter was introduced by Abraham et al. [SODA 2010] as a model for transportation networks, on which TSP and STP naturally occur for various applications in logistics. It was previously shown [Feldmann et al. ICALP 2015] that these problems admit a quasi-polynomial time approximation scheme (QPTAS) on graphs of constant highway dimension. We demonstrate that a significant improvement is possible in the special case when the highway dimension is 1, for which we present a fully-polynomial time approximation scheme (FPTAS). We also prove that STP is weakly NP-hard for these restricted graphs. For TSP we show NP-hardness for graphs of highway dimension 6, which answers an open problem posed in [Feldmann et al. ICALP 2015].

Space Efficient Approximation to Maximum Matching Size from Uniform Edge Samples (1907.05725v1)

Michael Kapralov, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos

2019-07-12

Given a source of iid samples of edges of an input graph with vertices and edges, how many samples does one need to compute a constant factor approximation to the maximum matching size in ? Moreover, is it possible to obtain such an estimate in a small amount of space? We show that, on the one hand, this problem cannot be solved using a nontrivially sublinear (in ) number of samples: samples are needed. On the other hand, a surprisingly space efficient algorithm for processing the samples exists: bits of space suffice to compute an estimate. Our main technical tool is a new peeling type algorithm for matching that we simulate using a recursive sampling process that crucially ensures that local neighborhood information from `dense' regions of the graph is provided at appropriately higher sampling rates. We show that a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels of recursion and achieve a constant factor approximation. The previous best result on matching size estimation from random samples was a approximation [Kapralov et al'14]. Our algorithm also yields a constant factor approximate local computation algorithm (LCA) for matching with exploration starting from any vertex. Previous approaches were based on local simulations of randomized greedy, which take time {\em in expectation over the starting vertex or edge} (Yoshida et al'09, Onak et al'12), and could not achieve a better than runtime. Interestingly, we also show that unlike our algorithm, the local simulation of randomized greedy that is the basis of the most efficient prior results does take time for a worst case edge even for .

Testing isomorphism of circular-arc graphs in polynomial time (1903.11062v2)

Roman Nedela, Ilia Ponomarenko, Peter Zeman

2019-03-26

A graph is said to be circular-arc if the vertices can be associated with arcs of a circle so that two vertices are adjacent if and only if the corresponding arcs overlap. It is proved that the isomorphism of circular-arc graphs can be tested by the Weisfeiler-Leman algorithm after individualization of two vertices.

A Quantum-inspired Classical Algorithm for Separable Non-negative Matrix Factorization (1907.05568v1)

Zhihuai Chen, Yinan Li, Xiaoming Sun, Pei Yuan, Jialin Zhang

2019-07-12

Non-negative Matrix Factorization (NMF) asks to decompose a (entry-wise) non-negative matrix into the product of two smaller-sized nonnegative matrices, which has been shown intractable in general. In order to overcome this issue, the separability assumption is introduced which assumes all data points are in a conical hull. This assumption makes NMF tractable and is widely used in text analysis and image processing, but still impractical for huge-scale datasets. In this paper, inspired by recent development on dequantizing techniques, we propose a new classical algorithm for separable NMF problem. Our new algorithm runs in polynomial time in the rank and logarithmic in the size of input matrices, which achieves an exponential speedup in the low-rank setting.

Bidirectional Text Compression in External Memory (1907.03235v2)

Patrick Dinklage, Jonas Ellert, Jonas Ellert, Johannes Fischer, Dominik Köppl, Manuel Penschuck

2019-07-07

Bidirectional compression algorithms work by substituting repeated substrings by references that, unlike in the famous LZ77-scheme, can point to either direction. We present such an algorithm that is particularly suited for an external memory implementation. We evaluate it experimentally on large data sets of size up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors, while producing a roughly similar number of factors. We also introduce an external memory decompressor for texts compressed with any uni- or bidirectional compression scheme.

Geometry of Scheduling on Multiple Machines (1907.05473v1)

Nikhil Bansal, Jatin Batra

2019-07-11

We consider the following general scheduling problem: there are identical machines and jobs all released at time . Each job has a processing time , and an arbitrary non-decreasing function that specifies the cost incurred for , for each possible completion time. The goal is to find a preemptive migratory schedule of minimum cost. This models several natural objectives such as weighted norm of completion time, weighted tardiness and much more. We give the first approximation algorithm for this problem, improving upon the bound due to Moseley (2019). To do this, we first view the job-cover inequalities of Moseley geometrically, to reduce the problem to that of covering demands on a line by rectangular and triangular capacity profiles. Due to the non-uniform capacities of triangles, directly using quasi-uniform sampling loses a factor, so a second idea is to adapt it to our setting to only lose an factor. Our ideas for covering points with non-uniform capacity profiles (which have not been studied before) may be of independent interest.