Thoughts of a programmer

Binary Indexed Tree

Previous week, we saw how we can use Square root decomposition to solve our problem. This week, we will see how we can solve the same problem by using a different technique called Binary Indexed Tree (also called Fenwick Tree).

Binary Indexed tree is a useful technique when we are dealing with a dynamic array of data where following type of queries are required -

Idea

First let’s see a naive approach to solving this problem.

This approach is simple to code and understand, but is very inefficient when dealing with large array having large number of queries of type \(1\) and type \(2\).

For query of type \(2\), it takes \(O(1)\) time, but for the query of type \(1\), it takes \(O(N)\) time as whenever array values are updated, it will recalculate the prefix sums for each position of the array.

The basic idea in this naive approach was to decrease the time for query of type \(2\) by having precalculated all the prefix sums. But in doing so, it sacrificed the time for handling queries of type \(1\). This approach would have been not that bad, if the queries were mostly of type 2 and the data rarely changed. But in most cases that wouldn’t be the case. So let’s work on improving this stategy.

Instead of precomputing the prefix sums for all the positions on any data change, instead we can try precomputing the values for only the specific positions in the array. But which positions?

Peter Fenwick proposed selecting the positions based on the binary representation of the array indices (and not the values themselves at those array indices). He also suggested that instead of directly storing the prefix sums, we can store them in chunks. Don’t worry, all of this would get clear as we go along.

Let’s call our data structure holding this new information bit_tree (short for Binary Indexed tree). This will also be an array like our prefix sums array, but only storing information a bit differently. You may ask why this is called a tree then? Well, this is similar to how heaps can be imagined as a tree like structure, but can be implemented using an array.

Fenwick’s idea was that any array index of the bit_tree, would not hold the prefix sum like we did above, but sum of only those elements of the original array, which are dictated by the last set bit in the binary representation of the array index. Let’s try to understand this by an example.

Suppose we are at array index \(7\) in our bit_tree. In our naive approach we would have stored the sum of all elements from starting till the index \(7\) of our original array. But according to Fenwick, we have to consider the binary representation of \(7\) to find out sum of which elements has to be stored at index \(7\).

Binary of \(7\) is \(111\). The last set bit of this representation is at position \(0\) (counting from right), representing decimal number \(1\). Which means that index \(7\) would hold sum of just \(1\) number which is nothing but the value at index \(7\) of the original array.

Similarly if we consider the position \(6\). Binary of \(6\) is \(110\). The last set bit here is at position \(1\) which represents decimal number \(2\). Thus it holds sum of two numbers from index \(5\) to \(6\) in the original array.

Note that the elements to sum are considered from the current index and moving towards the left.

2
2
3
3
1
1
6
6
3
3
5
5
8
8
5
5
3
3
original
array
[Not supported by viewer]
12
12
8
8
8
8
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
bit_tree
bit_tree
index
index

In the above figure, I have shown the sum stored at position \(4\), \(6\) and \(7\) of bit_tree. As you can see, at position \(7\), only sum of one value is stored. At position \(6\), sum of two values is stored. And at position \(4\), sum of \(4\) values are stored since binary of \(4\) is \(100\) and the last set bit represents decimal number \(4\).

Let’s see how storing the sums this way, helps us in performing the \(2\) queries above.

Type 1

For handling queries of type \(1\), where we have to update the element at index \(i\) in the array, we can update all the indices in the bit_tree which cover the index \(i\) being updated. Let’s understand this with the help of the following figure.

2
2
3
3
1
1
6
6
3
3
5
5
8
8
5
5
3
3
original
array
[Not supported by viewer]
5
5
12
12
33
33
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
bit_tree
bit_tree<br>
index
index
5
[Not supported by viewer]
7
[Not supported by viewer]
14
[Not supported by viewer]
35
[Not supported by viewer]

Here, we have shown sums for \(3\) positions, \(2\), \(4\) and \(8\). We want to change the value of element at position \(2\) from \(3\) to \(5\). In the naive approach from above, after changing the value of the element in the original array, we would have recomputed the prefix sums for whole of the positions in the prefix array. But in our new approach, we would only have to change the value of \(3\) positions. Why these \(3\) only?

We would definitely would have to change the value at position \(2\) since it covers index 2 of our original array. To maintain the integrity of the sums stored in all the other indices in our bit_tree, we would also have to change all those values which were calculated using the value at index \(2\) in our original array. And how can we figure out those indices?

Before answering that question, let’s consider another problem. Given a number \(x\), how can we quickly find the last set bit of that number. Let’s represent the number \(x\) in binary as \(s1t\) where \(1\) represents the last set bit, \(t\) representing sequence of zero or more \(0\)’s at the end and \(s\) representing the start of the number. Let’s take bitwise \(\&\) of \(x\) and \(-x\). \(-x\) would be just \(1\) added to the \(1\)-complement of \(x\).

\[-x = 1 + \overline{s1t} = 1 + \overline{s}0{111...1} = \overline{s}1t\]

And bitwise \(\&\) of \(x\) and \(-x\) would be now,

\[x \, \& \, -x = s1t \, \& \, \overline{s}1t = 1t\]

Which gives us the last set bit of \(x\) because \(t\) is just sequence of \(0\)’s. You can verify that this works by taking an example.

Let’s continue now answering our previous question of determining which indices need to be updated. To find this, first we will find the last set bit of the current index and add it to current index to get the next index to update. So after having updated the value at position \(2\), we will then find the last set bit of \(2\) which would be \(2\) again. We will then add this \(2\)(last set bit) to the \(2\)(current index) to get \(4\). \(4\) is thus the next index which we need to update. And similarly we will update \(8\) which is the next index to update after \(4\).

So, to find the next index to update after updating the current index \(i\), following equation is used.

\[i_{next} = i + (i \, \& -i)\]

which basically tells us to add the last set bit from the current index to get the next index.

Time complexity of the update function would be number of bits in \(n\) which is nothing but \(O(log(n))\).

Below is a sample implementation of this update function in Python. Notice how small, clean and easy to understand the function is.

Type 2

For queries of type \(2\), we have to find the prefix sum for the first \(i\) elements. Type 2 query can be also of type where we are given two indices \(l\) and \(r\) and we have to find the range sum between these two indices. But these can also be solved by our first representation. Range sum between \(l\) and \(r\) is only difference between prefix sum till \(r\) and \(l-1\).

With the way the sums are stored in our data structure, answering queries of type 2 are very simple. We start with the \(i^{th}\) element in bit_tree. Now this index would hold the sum of some elements of our original array but not all. We will then find the next index, which holds the sum of the remaining elements and continue on from there till we have found the sum of all the elements. To find the next index from the current index \(i\), following equation is used.

\[i_{next} = i - (i \, \& -i)\]

which basically tells us to subtract the last set bit from the current index to get the next index.

12
12
8
8
8
8
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
bit_tree
bit_tree
index
index
= 111
[Not supported by viewer]
= 100
[Not supported by viewer]
= 110
[Not supported by viewer]
28
[Not supported by viewer]

In the above figure, we are interested in finding the prefix sum for the first \(7\) elements. We start with the \(7^{th}\) position in the bit_tree, and add its value(\(8\)) to our answer. But this holds the sum for only element. To find the next index to consider, we write \(7\) in the binary representation and subtract the last set bit, which would give us \(6\). We again add this value(\(8\)) to our answer. This holds sum of two values. We again find the next index by subtracting the last set bit from the current index. This would give us \(4\). We add (12) to our answer. This holds the sum of first \(4\) values. Now when we find the next index, we would reach \(0\) which signifies that we have added all the sums chunks to get the prefix sum for the first \(7\) elements i.e. \(28\).

Time complexity of the query function is also \(O(log(n))\).

Here is a sample implementation of the query function in Python.

Here is the complete implementation of the binary indexed tree in Python 3.

Time complexity of binary indexed tree is inclusive of time required to initially build the tree which is \(O(n*log(n))\) and time required to answer queries \(O(q * log(n))\) giving the total time complexity of \(O(n*log(n) + q * log(n))\).

Space complexity of binary indexed tree is \(O(n)\) since it requires a separate array to hold the precomputed information.

Solution

Now let’s use the idea of binary indexed tree to solve the problem from the previous week.

Here we have to find the \(k^{th}\) odd number in our array. To solve this problem using binary indexed tree, we can use a slight variation in our sum storage.

Instead of storing sum of array values directly, we can only store whether the number is odd or not. So if the number is odd, that would be considered as a number \(1\), and otherwise \(0\). So, this way, our bit_tree would hold the count of odd numbers till any position in our array.

To handle query of type 1, we can simply check whether the updated value is odd or not. If the updated value is odd, then we can simply update the corresponding index by 1, else we would have to decrease the index by 1 (or in other words, increase it by -1).

Now queries of type 2 are more interesting. How can we find the \(k^{th}\) odd number? Here we can use binary search over bit_tree for k. We can use binary search here, because bit_tree satisfies the two conditions of applying binary search. The data in bit_tree is sorted in increasing order of count of odd numbers found. And for any k, there exists a index for which all the indices to the left of it, would have odd number count less than it and all the indices to the right, would have odd number count greater than or equal to it.

Here is a my sample implementation in Python 3.

So, to summarize, binary indexed tree is a good data structure to know and have in your toolbox.

Thank you for reading my article. Let me know if you liked my article or any other suggestions for me, in the comments section below. And please, feel free to share :)