Search Algorithms in Python
A search algorithm is an algorithm that searches
or does it? . You have more than likely used a search algorithm before, to find an item in a list, or check if something is in it.
Before we get started, its helpful to learn about the term Complexity. The correct definition of Complexity is a measure of the amount of time and/or space required by an algorithm for an input of a given size (n). Go to the link above to learn more about the term itself. I will be using it as we go to refer to the speed of an algorithm!
Lets say we have a list (we'll call it ARRAY) and we want to search for certain items in the list.
The length of the array is 10000. It is sorted from least to greatest.
The goal is to have the fastest algorithm possible in order to find a random number.
Try brainstorming some different solutions, before reading the algorithm below! Post any different solutions in the comments below!
You have most likely used linear search before. It is the simplest algorithm and the most common one. See if the code below looks familiar:
arr = [1,4,12,58] # Function to search def linearSearch(array, target): # Loop over the list for index in range(0,len(array)-1): # Getting the item item = array[index] if item == target: # If the item is equal to the target item, end the function and return the index return index # If the function has not ended yet, that means the item is not in the list! # So, just return -1 return -1 # Let's try a search for 4 print(linearSearch(arr,4)) # If we've done everything correctly, we should get a value of 1 printed above
Simple, right? But there's a problem. Rememeber how our list's length from the original problem was 10000. Looping over the whole list may be a bad idea.
- Best Case Scenario : O(1)
- Average Scenario : n(n+1)/2n
- Worst Case Scenario : O(n)
Next, we have jump search. It is much less common, but it is faster than jump search. There is much more to it though.
First, the list has to be sorted. Since this is true, we dont have to worry about that.
In the list, the algorithm keeps jumping a certain amount until the item it is at is greater than the target. Then, you have to jump back a step and do linear search to find the target.
Try implementing these ideas on your own. Once you're done, check your code with the solution below!
arr = [1,23,145,246,1238,2342,3456] def jumpSearch(arr,target,n=len(arr)): # The step size is determined by finding the square root of the length of the array. step = __import__('math').sqrt(n) # Initializing prev at 0. This is what we will be returning prev = 0 # While the item is less than the target, jump while arr[int(min(step, n)-1)] < target: prev = step step += __import__('math').sqrt(n) # If the index crosses the length of the list, the item is not in it if prev >= n: return -1 # Begin linear search while arr[int(prev)] < target: prev += 1 # If the index is greater than the length of the list or the step, the item is not in the index if prev == min(step, n): return -1 # If the target is found if arr[int(prev)] == target: return int(prev ) # If nothing is found, return -1 return -1 print(linearSearch(arr,145))
Nice! That seems like a nice way to go about it! So far, this is our best solution. The only thing is, this has an average Performance of its worst case scenario. I think we could look for something better...
- Best Case Scenario : O(1)
- Average Scenario : O(√n)
- Worst Case Scenario : O(√n)
Just like jump search, this method also requires the list to be sorted! But, it is faster and less computationally expensive.
Here's how it works:
- Find the middle of the array, and the left and right.
- If the middle of the array is greater than the target, repeat with a new array with the contents of the array from the left to the middle.
- Else if the middle of the array is less, repeat with a new array with the contents from the middle to the right.
- Else if it is equivalent, return the value.
- Else if the left and the right are the same, return -1.
Like before, try implementing this on your own. Once you're done, compare your code with the code below!
arr = [1,23,145,246,1238,2342,3456] def binarySearch(arr,target,left=0,right=len(arr)): # If the right is not the left if right >= left: # Calculating the middle mid = left + right - 1 # Checking the middle value with the target if arr[mid] == target: return mid # Calling the function again with the corresponding values for left and right elif target > arr[mid]: return binarySearch(arr, target,mid, left) elif target < arr[mid]: return binarySearch(arr, target, left, mid) # If its not in there, return -1 return -1 print(binarySearch(arr,145)) # If everything is correct, we should get the value 2 printed on the display
Much better. And that's also faster!!! There we have it folks. A good solution would be Binary Search.
Best Case Scenario : O(1)
Average Scenario : O(log n)
Worst Case Scenario : O(log n)
Me when trying to find my homework in my locker when it's due in 7 minutes: QUICK, ACTIVATE BINARY SEARCH!