Uniavisen
Københavns Universitet
Uafhængig af ledelsen

Ph.d.-forsvar

Ph.d.-forsvar af Anders Aamund

Ph.d.-forsvar — On the 23rd of October Anders Aamand will defend his PhD thesis titled "Simple Hashing-Based Algorithms with Strong Theoretical Guarantees".

Info

Date & Time:

Place:
The PhD defence will take place as a hybrid event via Zoom and at H.C. Ørsted Institute, Auditorium 2, Universitetsparken 5, 2100 København. You can join online via Zoom: https://ucph-ku.zoom.us/s/65472457716

Hosted by:
Datalogisk Institut

Cost:
Free

Abstract

In recent years, demand has immensely increased for algorithms for handling and analyzing large-scale data. The data often arrives in a stream, and if it is not processed in time, the information is lost. Indeed, the high volume of the data makes storing a complete copy and performing exact computations an impossible task. These challenges motivate the following general question. “Can we design efficient algorithms and data structures that provide reliable statistical analyses in large-scale data scenarios?”. A powerful tool in designing such algorithms is randomization, and in particular, randomization through the use of hash functions. Conceptually, a hash function can be thought of as distributing a set of balls into a set of bins. Ideally it should do this in a fully random way, but unfortunately such fully random hashing is impossible to implement in practice. This thesis seeks to investigate practical, pseudorandom hash functions that still have enough randomness in them to give the same good theoretical guarantees as fully random hashing. First, we introduce  a new family of hash functions,  tabulation-permutation, which is simple to implement and demonstrated to be extremely fast in practice. We prove that this family of hash function provides reliable statistics on data, essentially showing that certain hash based sums nicely follow a normal distribution. We further show how access to such a hashing scheme leads to speedups of various streaming algorithms. We also study the distribution of non-empty bins when using simple tabulation hashing, another fast and practical hashing scheme. In several ways, this distribution is shown to resemble that from the fully random scenario.

Second, we study algorithms for estimating the frequencies of elements in a large stream of data. We show how such algorithms can profit from having access to a machine learning oracle that can predict whether an item is a heavy hitter or not.

Finally, we present a basic graph algorithm which can be understood as solving the following problem. For a city, can we make all the streets of the city one-way such that it is still possible to get from any place to any other place? The algorithm decides in linear time whether this can be done, also finding such a one-way orientation of the streets if it exists.

Assessment Committee

Christian Wulff-Nilsen, Associate professor, (DIKU) (Head of committee)

Rasmus Pagh,  Professor,  IT-University of Copenhagen
Martin Dietzfelbinger, Professor, Dr., Technische Universität Ilmenau, Germany

Academic  Main Supervisor
Mikkel Thorup, professor, Dept. of Computer Science (DIKU) UCPH

Co-supervisor

Mikkel Abrahamsen, Assistant professor, Dept.  of Computer Science (DIKU) UCPH

Moderator at this defense: Jacob Holm, Assistant professor, (DIKU)

For an electronic copy of the thesis, please contact phdadmin@di.ku.dk

Seneste