Sunday, 27 May 2012

Boyer-Moore Majority Vote Algorithm

Given an array A[N] of integers, find the major element in the array or return -1 if no major element exists in the array. An element 'e' of an array is called major if it exists more than N/2 times in the array.
PS: Use O(1) space and time complexity should be O(N).

Solution:
->Initial thought is to find the median of the array in O(N) and check if the median exists more than N/2 times in the array. Important thing to understand is that, if the element is the major element of the array then it has to occur at the median position.

Another approach i found after searching is based on majority vote algorithm, called Boyer-Moore algorithm. This algorithm is faster than the above one.
->In the first pass of the array you start with the first element as the major element and its count set to 1. There on, major element is voted for (when same element occurs next time) and against (when any other element occurs at the next position). After the count reaches 'zero', major element is changed to the next element and count set to 1. If there is a majority element in the array, then it would remain set as the major element. If however, there is no major element i.e. no element occurs more than N/2 times in the array, then the value in the major element after first pass would be a false value.
-> So we need a second pass, in which we get the count of the major element found in the firs pass. This count tells if the element is a true majority or a false one.

This is easy to PROVE: If there is a majority element 'e', then e's count would be atleast N/2+1 and other elements can not bring this count to zero by voting against it. hence this would be set as the major element after the first pass.

This algorithm is explained very well using a simulation on the following link : http://www.cs.utexas.edu/~moore/best-ideas/mjrty/index.html, and the original paper here: http://www.cs.rug.nl/~wim/pub/whh348.pdf

1 comment:

  1. def suffixe(word) :
    long_word = len(word)
    g = long_word-1
    i = long_word-2
    suffTabl = [0]*long_word
    suffTabl[long_word-1]=long_word

    while i>=0 :
    if (i>g and suffTabl[i+long_word-1-f]=0 and word[g] == word[g+long_word-1-f]) :
    g -= 1

    suffTabl[i] = f-g

    i-=1

    return suffTabl

    def bonSuffixe(word) :
    long_word = len(word)
    suffTabl = suffixe(word)
    bonSuffTabl = [0]*long_word

    for i in range (0, long_word-1) :
    bonSuffTabl[i] = long_word

    i = long_word-1
    while (i>=0) :
    if suffTabl[i] == i+1 :
    for j in range (0, long_word-i-1) :
    if bonSuffTabl[j] == long_word :
    bonSuffTabl[j] = long_word-i-1
    i -= 1

    for i in range (0, long_word -2) :
    bonSuffTabl[long_word-1-suffTabl[i]] = long_word-i-1

    print "Table des Bon Suffixes", bonSuffTabl
    return bonSuffTabl

    def mauvais_caractere(word, alphabet) :
    mcTabl = {}
    pos = len(word)-1

    for i in range (len(word)) :
    mcTabl[word[i]] = pos
    pos -= 1

    if mcTabl[word[i]]==0 :
    mcTabl[word[i]] = len(word)


    print "Table des mauvais caractere", mcTabl
    return mcTabl

    def BoyerMoore(word, text, alphabet) :
    mcTabl = mauvais_caractere(word, alphabet)
    bonSuffTabl = bonSuffixe(word)

    occur=0
    j=0
    while (j<= len(text)-len(word)) :
    i = len(word)-1
    while (i>=0 and word[i] == text[i+j]) :
    i-=1
    if i<0 :
    print "Match at", j
    j += bonSuffTabl[0]
    occur+=1
    else :
    j += max(bonSuffTabl[i], mcTabl[text[i+j]]-len(word)+1+i)
    print "Il y a", occur, "occurence(s)."

    ReplyDelete