2155c92be66863c4634778bf522efe14

My second python script. :)

This one sorts a list of words in a particular order, which depends on the order of characters in the 'alphabet' string.

Any suggestions on how to improve / simplify it?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84

def sort(words):

    sortedList = []

    for i in range(len(words)):

        word = words[i]

        if not sortedList:

            sortedList.append(word)

        else:

            for j in range(len(sortedList)):

                sortedWord = sortedList[j]

                if isLeftWordBeforeRight(word, sortedWord):

                    sortedList.insert(j, word)

                    break

                elif j == (len(sortedList) - 1):

                    sortedList.append(word)

    return sortedList





def isLeftWordBeforeRight(left, right):

    isBefore = False

    if right.startswith(left):

        isBefore = True

    else:

        for n in range(len(left)):

            if n < len(right):

                comparison = compareChars(left[n], right[n])

                if comparison != 0:

                    isBefore = comparison < 0

                    break    

    return isBefore



def compareChars(left, right):

    return  alphabet.find(left) - alphabet.find(right)





alphabet = "zyxwvutsrqpomnlkjihgfedcba"



inputWords = ["england", "france", "spain", "italy", "greece", "portugal",

              "canada", "usa", "mexico", "peru", "cuba", "chile", "argentina",

              "zimbabwe", "uganda", "congo", "zambia", "namibia", "ghana"]



print sort(inputWords)

Refactorings

No refactoring yet !

848e7681373328946b4b7ccb3a537627

jaredgrubb

December 11, 2007, December 11, 2007 04:24, permalink

No rating. Login to rate!

You can use the sort() method of Python and change the comparison function. Then you should be able to run inputWords.sort(my_cmp) and it should do the same thing. I havent tested the following code, but I think it works... :)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def my_cmp(a, b):
  global alphabet

  # Check if either input is zero-length, which is considered "smallest"
  if 0 in (len(a),len(b)):
    return cmp(a,b)

  # Try to see if the first element is in the alphabet; get its index
  try:
    a_i = alphabet.index(a[0])
  except ValueError:
    a_i = None
  try:
    b_i = alphabet.index(b[0])
  except ValueError:
    b_i = None
    
  if None in (a_i, b_i):
    # If exactly one is None, then the non-None comes earlier
    if a_i!=None: return -1
    if b_i!=None: return 1
    # They're both None, so use standard compare; if equal, keep comparing
    return cmp(a[0],b[0]) if a[0]!=b[0] else my_cmp(a[1:],b[1:])

  return cmp(a_i,b_i) if a_i!=b_i else my_cmp(a[1:],b[1:])
    
2155c92be66863c4634778bf522efe14

armandino.myopenid.com

December 12, 2007, December 12, 2007 03:00, permalink

No rating. Login to rate!

Hi Jared,

There seems to be a problem with the last two return statements. I'm not that familiar with python syntax to fix it tho :)

An unrelated question.. what's the benefit of using alphabet.index(...) instead of the find method? The latter, I understand, doesn't throw an error.

848e7681373328946b4b7ccb3a537627

jaredgrubb

December 14, 2007, December 14, 2007 21:56, permalink

No rating. Login to rate!

As for the difference between .index() and .find().... index raises an exception on no match, whereas find returns -1. You could remove the try..except's and test against -1 instead of None if you want to use find instead of index. Choice of style.

The last two return statements are like the ternary operator in C. For example, instead of the last return statement, you could replace it with:

1
2
3
  if a_i!=b_i then:
    return cmp(a_i, b_i)
  return my_cmp(a[1:], b[1:])
848e7681373328946b4b7ccb3a537627

jaredgrubb

December 14, 2007, December 14, 2007 21:57, permalink

No rating. Login to rate!

(ugh, my ancient BASIC background just popped through! Yikes. Forget that "then" in there lol)

9c52ad00ba2f2602661c49f896733229

orip

January 18, 2008, January 18, 2008 01:12, permalink

No rating. Login to rate!

armandino - I liked how you subtracted the two indexes to get the cmp, cool!

How about this version? (append to your original code)

alphabet_cmp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
from itertools import izip
def alphabet_cmp(a, b):
    # Make sure no letters are missing. For performance in final code, comment out or remove with 'python -O'
    assert set(a).issubset( set(alphabet) ) and set(b).issubset( set(alphabet) )

    # find the first non-equal character
    for c_a, c_b in izip(a,b):
        if c_a != c_b:
            return alphabet.index(c_a) - alphabet.index(c_b)
    
    # one word is a prefix of another, sort by length
    return len(a) - len(b)

print sorted(inputWords, alphabet_cmp)
3b6399959bbfd36f5aba155ef0c6e133

Bjoern Brandenburg

May 24, 2008, May 24, 2008 01:59, permalink

No rating. Login to rate!

This can be done much more concise and without any global variables. Python offers higher-order functions and closures. Use them. :-)

1
2
3
4
5
6
7
8
9
10
from collections import defaultdict as dd

def list_cmp(order):
    "Return a list comparison function that sorts by order."
    idx  = dd(lambda: len(order), [(order[i], i) for i in xrange(len(order))])
    cmp  = lambda a, b, i=0: len(a) - len(b) if i ==  min(len(a), len(b)) else _cmp(a, b, i)
    _cmp = lambda a, b, i: cmp(a, b, i + 1) if idx[a[i]] == idx[b[i]] else idx[a[i]] - idx[b[i]]
    return cmp
        
inputWords.sort(cmp=list_cmp(alphabet))
3ca6bd7b50c8f00349e6586c3627de24

Tom Lynn

July 17, 2008, July 17, 2008 09:43, permalink

No rating. Login to rate!
1
2
3
4
5
6
7
alphabet = "zyxwvutsrqpomnlkjihgfedcba"

inputWords = ["england", "france", "spain", "italy", "greece", "portugal",
              "canada", "usa", "mexico", "peru", "cuba", "chile", "argentina",
              "zimbabwe", "uganda", "congo", "zambia", "namibia", "ghana"]

print sorted(inputWords, key=lambda word: [alphabet.index(c) for c in word])

Your refactoring





Format Copy from initial code

or Cancel