This is an exercise I was given at an interview: implement an .ini parser. It needs to be able to read even some pretty malformed .ini files.

Things to note:
1) ; (semicolon) is the start of a comment
2) string values need to be taken verbatim from the file
3) quotes can be either 'single' or "double"
4) test.ini is your test file of course ;)
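Notes 1 and 3 interact: a semicolon only starts a comment when it sits outside a quoted value. A minimal sketch of that rule, assuming a backslash escapes the next character (the helper name `strip_comment` is hypothetical and is not part of iniparser.py):

```python
def strip_comment(line):
    """Return `line` without a trailing ;comment, ignoring ; inside quotes."""
    open_quote = None   # the quote character we are currently inside, if any
    escaped = False     # True when the previous character was a backslash
    for i, char in enumerate(line):
        if escaped:
            # this character was escaped, so it cannot open or close anything
            escaped = False
        elif char == '\\':
            escaped = True
        elif open_quote:
            if char == open_quote:
                open_quote = None
        elif char in ('"', "'"):
            open_quote = char
        elif char == ';':
            # first semicolon outside quotes: the rest is a comment
            return line[:i].rstrip()
    return line.rstrip()


print(strip_comment("lastname = 'Vedder';that's another comment;"))
print(strip_comment("whatever='this ; is nasty'"))
```

The first call drops the trailing comment and the second leaves the line untouched, because its semicolon is inside the quotes.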

Can you make it smarter/more robust?

Thanks,
Lorenzo

######iniparser.py

import sys
import re

inifile = sys.argv[1]
quotes = ["'", '"']
d = {}

def get_quote_char(line):
    #return the first quote character in the line, or None if there is none
    for char in line:
        if char in quotes:
            return char
    return None

def getkey(line):
    #swallow everything up to the =
    return line[ : line.find('=') ].strip()

def getval(line):
    #swallow everything after the =
    line = line[ line.find('=') + 1 : ].strip()
    q = get_quote_char(line)
    if q is None:
        #unquoted value: take everything up to a trailing comment
        return line.split(';')[0].strip()
    startq = line.find(q)

    #start scanning the line from the character after the opening quote
    position = startq + 1
    while position < len(line):
        #stop at the first matching quote that is not backslash-escaped
        if line[position] == q and line[ position - 1 ] != '\\':
            return line[ startq + 1 : position ]
        position += 1

    #no closing quote: keep the rest of the line
    return line[ startq + 1 : ]

section_name = None
with open(inifile) as f:
    for line in f:
        line = line.strip()

        #skip comments and empty lines
        if line.startswith(';') or line == '':
            continue

        #store sections as dicts
        if line.startswith('['):
            #stop at the closing bracket so trailing comments are ignored
            section_name = line[ 1 : line.find(']') ].strip()
            d[section_name] = {}
        elif '=' in line and section_name is not None:
            d[section_name][getkey(line)] = getval(line)
        else:
            #no '=' on the line, or a key before the first section
            print('Skipping invalid line: %s' % line)

print(d)

#########test.ini

[foo]
greeting = 'hello'
;this is a comment
name = 'Eddie'

[bar]
lastname = 'Vedder';that's another comment;

[ malformed section  ]
city='Prague'
country="\'Czech Republic\'"
whatever='this ; is nasty'

     [bad]
dog='bau'
cat    =     'miao'
mouse = "squeak"

      [tabbed section]
dogname = '\Oliver'
catname = 'Barbara'

[one more]
appliance='lcd \'monitor\''
car = "Alfa \"Romeo\" - Giulietta";"foo"

Refactorings

jaredgrubb

November 11, 2007 20:50

I know this may not be the answer you're looking for, but if someone asked me that in an interview, I would say "Well, I can't give the complete code off the top of my head, but it would start with 'import ConfigParser', a built-in module for Python."
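For reference, a minimal sketch of the library route, assuming Python 3, where the module is spelled `configparser` (the `SAMPLE` text is a well-formed excerpt of test.ini, chosen for illustration). Note that the library keeps the quote characters as part of each value, so it does not by itself satisfy the quoting rules of the exercise:

```python
import configparser

# A well-formed excerpt of test.ini, used only for illustration
SAMPLE = """
[foo]
greeting = 'hello'
; this is a comment
name = 'Eddie'
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Convert to plain nested dicts, like the `d` built by iniparser.py
d = {section: dict(parser[section]) for section in parser.sections()}
print(d)
```

The values come back as `'hello'` and `'Eddie'` with their quotes still attached, so a quote-stripping step would still be needed on top.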

lbolognini

November 13, 2007 09:18

Hi Jared,

That solution wouldn't apply. It didn't even cross my mind to say something like "I'll use a library", because the point of the question, as I assumed, was to see how I would solve a problem that I was unlikely to have solved before (because of the availability of libraries).

Besides, I believe that my version, while not perfect, goes to some length to ensure that the ini file gets parsed no matter how badly formatted it is ;)

Thanks anyway,
L.

jaredgrubb

November 17, 2007 18:34

If what you're looking for is robustness, then I would recommend using regular expressions (which it looks like you thought of, given the 'import re'). You can trim this program down to a dozen lines that way. Maybe if I get ambitious and no one else beats me to it, I'll give it a shot soon.
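A rough sketch of what a regex for quoted values might look like, assuming a backslash escapes the following character and the value ends at the first unescaped quote matching the opening one (the names `PARAM` and `parse_param` are illustrative only):

```python
import re

# key = 'value' or key = "value"; a backslash escapes the next character,
# so \' and \" do not terminate the value. Anything after the closing
# quote (such as a trailing ;comment) is ignored.
PARAM = re.compile(r'''^\s*([^=\s][^=]*?)\s*=\s*(['"])((?:\\.|(?!\2)[^\\])*)\2''')

def parse_param(line):
    """Return (key, value) for a quoted parameter line, or None."""
    m = PARAM.match(line)
    if m is None:
        return None
    key, _quote, value = m.groups()
    return key, value


print(parse_param("whatever='this ; is nasty'"))
print(parse_param(r'''car = "Alfa \"Romeo\" - Giulietta";"foo"'''))
```

The value is taken verbatim, backslashes included, and lines without a quoted value (comments, section headers) simply return None.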

John

January 9, 2008 06:04

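import sys
import re

#raw strings keep the backslashes intact for the regex engine;
#the lazy groups leave trailing whitespace out of names and values
SECTION = re.compile(r'^\s*\[\s*([^\]]*?)\s*\]\s*$')
PARAM   = re.compile(r'^\s*(\w+)\s*=\s*(.*?)\s*$')
COMMENT = re.compile(r'^\s*;.*$')

d = {}
section = None
with open(sys.argv[1]) as f:
    for line in f:
        if COMMENT.match(line): continue
        m = SECTION.match(line)
        if m:
            section, = m.groups()
            d[section] = {}
            continue
        m = PARAM.match(line)
        #ignore parameters that appear before the first section
        if m and section is not None:
            key, val = m.groups()
            #val is kept as written, quotes and trailing comments included
            d[section][key] = val

for k, v in d.items():
    print('%s %s' % (k, v))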

lbolognini

January 9, 2008 09:20

Thanks John!
