Python - Check Order of Lines in File
How does one check the order of lines in a file?
Example file:
a b c d e f
b c d e f g
1 2 3 4 5 0
Requirements:
- All lines beginning with a must precede lines beginning with b.
- There is no limit on the number of lines beginning with a.
- Lines beginning with a may or may not be present.
- Lines containing integers must follow lines beginning with b.
- Numeric lines must have at least two integers followed by zero.
- Failure to meet the conditions must raise an error.
I initially tried a rather long-winded for loop, but that failed because I am unable to index lines beyond line[0]. I also do not know how to define the location of one line relative to the others. There is no limit on the length of these files, so memory may also be an issue.
Any suggestions very welcome! Simple and readable is welcome for this confused novice!
Thanks, Seafoid.
A straightforward iterative method. This defines a function to determine a linetype from 1 to 3. Then we iterate over the lines in the file. An unknown line type or a linetype less than any previous one will raise an exception.
def linetype(line):
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        parts = [int(x) for x in line.split()]
        if len(parts) >= 3 and parts[-1] == 0:
            return 3
    except ValueError:                    # a field was not an integer
        pass
    raise Exception("Unknown Line Type")

maxtype = 0

for line in open("filename", "r"):        # iterate over each line in the file
    line = line.strip()                   # strip any whitespace
    if line == "":                        # if we're left with a blank line
        continue                          # continue to the next iteration

    lt = linetype(line)                   # get the line type of the line
                                          # or raise an exception if unknown type
    if lt >= maxtype:                     # as long as our type is increasing
        maxtype = lt                      # note the current type
    else:                                 # otherwise line type decreased
        raise Exception("Out of Order")   # so raise exception

print("Validates")                        # if we made it here, we validated
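To try this approach without touching the disk, the same logic can be wrapped in a function that accepts any iterable of lines. This is only a sketch: the names line_type and validate_order are made up for illustration, and it assumes the same three line types as the answer above.

```python
def line_type(line):
    """Return 1 for 'a' lines, 2 for 'b' lines, 3 for valid numeric lines."""
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        parts = [int(x) for x in line.split()]
    except ValueError:
        raise ValueError("Unknown line type: %r" % line)
    # at least two integers followed by a final zero
    if len(parts) >= 3 and parts[-1] == 0:
        return 3
    raise ValueError("Unknown line type: %r" % line)


def validate_order(lines):
    """Raise ValueError if the line type ever decreases; return True otherwise."""
    max_type = 0
    for line in lines:
        line = line.strip()
        if not line:            # skip blank lines
            continue
        current = line_type(line)
        if current < max_type:  # an earlier type appeared after a later one
            raise ValueError("Out of order: %r" % line)
        max_type = current
    return True
```

Because a file object is itself an iterable of lines, passing open("filename") to validate_order would read one line at a time, so memory use should not grow with the length of the file.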
You can get all the lines into a list with lines = open(thefile).readlines() and then work on the list -- not maximally efficient but maximally simple, as you require.
Again simplest is to do multiple loops, one per condition (except 2, which is not a condition that can be violated, and 5 which isn't really a condition;-). "All lines beginning a, must precede lines beginning b" might be thought of as "the last line beginning with a, if any, must be before the first line beginning with b", so:
lastwitha = max([i for i, line in enumerate(lines)
                 if line.startswith('a')] or [-1])
firstwithb = next((i for i, line in enumerate(lines)
                   if line.startswith('b')), len(lines))
if lastwitha > firstwithb: raise ValueError("line beginning with a after a line beginning with b")
then similarly for "lines containing integers":
firstwithint = next((i for i, line in enumerate(lines)
                     if any(c in line for c in '0123456789')), len(lines))
if firstwithint < firstwithb: raise ValueError("line containing integers before the first line beginning with b")
This should really be plenty of hints for your homework -- can you now do by yourself the last remaining bit, condition 4?
Of course you can take different tacks from what I'm suggesting here (using next to get the number of the first line satisfying a condition -- this requires Python 2.6, btw -- and any and all to test whether any / all items in a sequence meet a condition) but I'm trying to match your request for maximum simplicity. If you find traditional for loops simpler than next, any and all, let us know and we'll show how to recode these uses of the higher abstraction forms into those lower-layer concepts!
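For comparison, here is roughly how the next-based searches might look with plain for loops. It is a sketch only: the helper names first_index and last_index are invented for illustration, and lines is assumed to be a list of strings such as readlines() returns.

```python
def first_index(lines, predicate, default):
    """Index of the first line satisfying predicate, or default if none does."""
    for i, line in enumerate(lines):
        if predicate(line):
            return i
    return default


def last_index(lines, predicate, default):
    """Index of the last line satisfying predicate, or default if none does."""
    result = default
    for i, line in enumerate(lines):
        if predicate(line):
            result = i
    return result


# Sample data standing in for open(thefile).readlines()
lines = ["a b c d e f\n", "b c d e f g\n", "1 2 3 4 5 0\n"]

last_with_a = last_index(lines, lambda l: l.startswith("a"), -1)
first_with_b = first_index(lines, lambda l: l.startswith("b"), len(lines))
if last_with_a > first_with_b:
    raise ValueError("a line beginning with 'a' follows a line beginning with 'b'")
```

The defaults (-1 and len(lines)) are chosen so that the comparison passes when either kind of line is absent, which matches the requirement that lines beginning with a are optional.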
You don't need to index the lines. For every line you can check/set some conditions, and if a condition is not met, raise an error. E.g. for rule 1: keep a variable was_b, initially set to False. In each iteration (besides the other checks/sets), check whether the line starts with "b"; if it does, set was_b = True. Another check would be: if the line starts with "a" and was_b is True, raise the error. Another check would be: if the line contains integers and was_b is False, raise the error, etc.
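A minimal sketch of that flag-based idea, assuming that a "line containing integers" means a line whose whitespace-separated fields are all digits. The function name check_lines and the error messages are hypothetical.

```python
def check_lines(lines):
    """Single pass over lines, remembering whether a 'b' line has been seen."""
    was_b = False
    for number, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if line.startswith("b"):
            was_b = True
        elif line.startswith("a"):
            if was_b:
                raise ValueError("line %d: 'a' line after a 'b' line" % number)
        elif all(f.isdigit() for f in fields):
            if not was_b:
                raise ValueError("line %d: numeric line before any 'b' line" % number)
            # at least two integers followed by a final zero
            if len(fields) < 3 or fields[-1] != "0":
                raise ValueError("line %d: malformed numeric line" % number)
        else:
            raise ValueError("line %d: unrecognised line" % number)
```

Only one flag and the current line are held at any time, so this works on a file object directly and does not need the whole file in memory.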
Restrictions on lines:
I. There must be no lines that begin with 'a' after we've encountered a line that begins with 'b'.
II. If we encountered a numeric line then a previous one must start with 'b' (or your 4-th condition allows another interpretation: each 'b' line must be followed by a numeric line).
Restriction on numeric line (as a regular expression): /\d+\s+\d+\s+0\s*$/
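To see what that pattern accepts, it can be tried against a few sample lines with Python's re module. The sample strings and the name numeric_tail are made up for illustration; note that re.search is used, so the pattern only constrains the end of the line.

```python
import re

# Two integers followed by a final zero, anchored at the end of the line
numeric_tail = re.compile(r'\d+\s+\d+\s+0\s*$')

samples = ["1 2 3 4 5 0\n", "7 8 0", "1 0\n", "1 2 3\n"]
results = [bool(numeric_tail.search(s)) for s in samples]
```

With these samples, the first two should match (each ends with two integers and a zero), while "1 0" has too few integers and "1 2 3" does not end in zero.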
#!/usr/bin/env python
import re

is_numeric = lambda line: re.match(r'^\s*\d+(?:\s|\d)*$', line)
valid_numeric = lambda line: re.search(r'(?:\d+\s+){2}0\s*$', line)

def error(msg):
    raise SyntaxError('%s at %s:%s: "%s"' % (msg, filename, i+1, line))

seen_b, last_is_b = False, False
with open(filename) as f:
    for i, line in enumerate(f):
        if not seen_b:
            seen_b = line.startswith('b')

        if seen_b and line.startswith('a'):
            error('failed I.')
        if not last_is_b and is_numeric(line):
            error('failed II.')
        if is_numeric(line) and not valid_numeric(line):
            error('not a valid numeric line')

        last_is_b = line.startswith('b')