The Joy of Algorithms & Data Structures - part 2

In this series of blogs I'd like to tell you something about algorithms and data structures and how they can make your life easier. In this part we'll talk about binary search!

As you might have read in my first blog in this series, I really like algorithms and data structures and I'd like to share with you how they can make a difference in your software. I will do this by giving examples from real applications that I encountered over the years.

Last time I spoke about the hash set data structure. In this blog I will tell you about a very basic but useful search algorithm.

Part 2: Binary search

Binary search is a very simple yet powerful way of searching. Anybody who has ever used a dictionary (I mean the actual book full of words and explanations) knows what it is. Maybe even without realizing it! Suppose you hold a dictionary containing thousands of pages with tens of thousands of words and explanations. You are asked to find the explanation for the word 'obsequious'. What do you do? Of course, you will open the book at some - partially random - page. You look at the words on the page and see that they start with the letter M instead of O. So you open the book at a page with a higher page number. You see that the words start with the letters ON, so you know you have gone too far and open a page with a lower page number. You continue this strategy, closing in on the right page with every try, until you find the page and the word you were looking for. 
The reason that this method of searching works is that all the words in the dictionary appear in alphabetically sorted order, so with just one look at a page, you know you can rule out either all the pages ahead of it or all the pages trailing it. Effectively, every page view can cut the number of pages left to search in half. That's why it's called binary search.

A more formal explanation of the binary search algorithm and its time complexity can be found here and here.

As the dictionary example demonstrates you can use binary search to do lookups in an ordered set very quickly. To be precise, binary search has a running time of O(log N) where N is the number of items in the set. Note that a running time of O(log N) is usually extremely fast: if N is one billion, log N is only 31.

Another insightful example of binary search is git bisect.

Binary search applied to mathematical functions

A useful application of binary search is to minimize (or maximize) functions of one real variable with exactly one minimum (or maximum).

I was part of a team working on a project in which (fixed) beacons were used to determine the location of (mobile) receivers. In the setup we could read the signal strength of every beacon according to the receiver. Theoretically we could determine the distance between receiver and beacon directly from the signal strength and use this to find the position of the receiver.
Unfortunately, the signal strength measurements were highly unreliable and fluctuating a lot in both time and location. We decided to use only the ratio between two beacon strengths as an indicator. Together with the theoretical knowledge of our setup, we came to a (complicated) function giving us the deviation from the measured strengths as a function of the position of the receiver. We converted that to a function f (position of receiver) --> (deviation from measurements). Minimizing this function would give us the most likely position of the receiver.

For any function of one variable with exactly one minimum (so this is the only local minimum and therefor the global minimum), we can use binary search to quickly find the minimum. Let's call our axes x and y. Our algorithm works as follows:

  1. find an initial lower and upper bound for x such that you are sure that the x-value of the minimum lies in between these bounds;
  2. when testing an x-value xTest, determine whether the function is either increasing or decreasing at that value. If it is increasing, we know that the minimum must lie to the left of xTest. Else it must lie to the right of xTest. Change the search bounds accordingly;
  3. just like in binary search, keep on testing with x-values, each time reducing the search space and closing the gap between the lower and upper bound. This way we converge towards the minimum;
  4. stop when the lower and upper bound are sufficiently close to each other, say with difference precision.

C# code example:

I think that it's intuitively clear that this algorithm indeed gives us the x-value of the minimum of the function.

The size of our search space together with precision determines how many times a value has to be tested and therefor the running time of our algorithm. Suppose the initial lower bound is zero, upper bound is 1000 and precision is 0.001. Then our algorithm will do log(1000/0.001) = 20 tests. If we would have done a linear search for the minimum, testing each x-value in the search space with the same precision, we would have tested 1000/0.001 = 1.000.000 values. Obviously the binary search approach is much faster.

Note that determining whether a function is increasing in general can be tricky. Fortunately for a lot of functions we can simply take a small delta d and evaluate f at xTest and xTest+d and use the difference to see if f is increasing or not.

In another project, I needed to find the optimum brightness transformation on a photo to make it resemble a certain color. I defined a function (amount of brightness) --> (distance to target color). Then I minimized this function using binary search to quickly determine the optimal brightness transformation.

Comparison of binary search and linear search

The following numbers give an indication of the running time for linear and binary search when minimizing a function. I used the function f(x) = -1.0 * Math.Pow(x, 1.0 / x), searching for x from zero to 1000. As you can see in the table, the binary search is so fast that in fact it's hard to say how many times faster.

Precision Binary search Linear search
  Running time (mm:ss) Running time (mm:ss)
0.1 00:00.02 00:00.00
0.001 00:00.00 00:00.05
0.00001 00:00.00 00:04.71
0.0000001 00:00.00 08:38.40

Binary search in optimization problems

There is another kind of optimization problem in which binary search plays an important part.

Suppose you are asked to place smoke detectors in a big office to provide more safety in case of a fire. Each smoke detector has a certain range of cover in which it quickly detects smoke. You want to cover the whole office, but you want to buy as few detectors as possible because they are very expensive. Now suppose that - perhaps due to the geometry of the office - it is very hard to find a general optimal strategy of smoke detector placement, but it is relatively easy (in the sense of time complexity) to determine if full coverage can be obtained using a fixed number of smoke detectors. For example, if we know that we are going to place exactly 20 smoke detectors, we will place them in some even distribution, after which we can quickly* calculate whether the whole office is covered.

Now let's note that if it's not possible to cover the whole office with 10 smoke detectors, then it's no use to try for any number less than 10. Knowing this we can use binary search to find the minimum number of smoke detectors necessary:

  • find an upper bound for the number of smoke detectors;
  • write a function returning a boolean value whether or not it is possible to cover the whole office using n detectors. Let's call this the "IsPossible" function;
  • binary search on n from zero to the upper bound;
  • the found number n together with the placement of those n detectors represents an optimal solution for the problem.
Let's look at the running time of this algorithm (without going into details too much). The binary search will ensure that we run the IsPossible function maximum log N times, where N is the upper bound. As mentioned before, even for very large upper bounds this means that we will run the IsPossible function only few times. Furthermore, our assumption was that it is relatively easy to run the IsPossible function, meaning that this part does not have a high running time. Together that tells us we have an acceptably fast algorithm.

This kind of optimization can be applied in many other problems. All we need is a relatively easy "IsPossible" predicate in combination with binary search. The FairWorkLoad problem from TopCoder is another good example. Its solution - using an "IsPossible" function and binary search - can be found at the end of this article.

*) Keeping things simple, we could do this by representing the office space as a grid of 2D points instead of an actual 2D plane.

Conclusion

In this blog we have seen a couple of examples how binary search can be a part of an efficient algorithm. In particular we have seen that using binary search when finding the minimum of a function greatly improves the running time of our application.

In the next blog, we will look at a completely different but also interesting algorithm...

Many thanks go to Wolf Hijlkema and Freek Paans for giving feedback.

About Erik: I'm a software developer at Infi in the Netherlands. We mainly create web and mobile applications for our business clients. In my spare time I like to work on small, fun, side software projects. I mostly use .NET / C#.

Gezocht: ondernemende nerds!

› Wil jij je hersens bij ons laten kraken?

Wil je iets waarmaken met Infi?

Wil jij een eigen webapplicatie of mobiele app waarmee jij het bij anderen maakt?

Waargemaakt door de nerds van Infi.
Nerds met liefde voor softwareontwikkeling en die kunnen communiceren. En heel belangrijk: wat we doen, doen we met veel lol!

Wij willen het fixen. Laat jij van je horen?

Voor wie heb je een vraag?