
Regex for checking multiple lines

I'm trying out regex (import re) to extract the info I want from a log file.

UPDATE: Added the C:\WINDOWS\security folder permissions, which broke all of the sample code.

Say the format of the log is:

C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\system32:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Read & Execute
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Traverse Folder
            Read Attributes
            Read Permissions
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

And it repeats for a few other directories. How can I split them into paragraphs and then check for lines containing Special Permissions:?

Like this:

  1. Split the whole of string1 into parts, e.g. C:\ and C:\WINDOWS\system32.
  2. Look for each line that contains 'Special Permissions:'
  3. Display the whole line, e.g.: C:\: BUILTIN\Users Allowed: Special Permissions: \n\ Create Folders\n\ BUILTIN\Users Allowed: Special Permissions: \n\ Create Files\n\
  4. Repeat for next 'paragraph'

I was thinking of:

  1. Search the whole text file for r"(\w+:\\)(\w+\\?)*:" to get the path
  2. Use a string function or regex to get the rest of the output
  3. Remove all the other lines besides the ones with Special Permissions
  4. Display, and repeat from step 1

But I think it is not efficient.

Can anyone guide me on this? Thanks.
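
For reference, step 1 of the plan above can be tried in isolation. This is only a minimal sketch in Python 3 syntax, assuming the log is already held in a string. HEADER_RE, SAMPLE and find_headers are illustrative names, and the ^ anchor with re.MULTILINE is an addition to the pattern from the question so that it only matches at the start of a line.

```python
import re

# Pattern from the question, anchored to line starts (the anchor is an addition).
HEADER_RE = re.compile(r"^(\w+:\\)(\w+\\?)*:", re.MULTILINE)

# Shortened, illustrative sample of the log format.
SAMPLE = r"""C:\:
    BUILTIN\Users   Allowed:    Read & Execute

C:\WINDOWS\system32:
    BUILTIN\Power Users Allowed:    Modify
"""

def find_headers(text):
    """Return every path header found in text, without the trailing colon."""
    return [m.group(0)[:-1] for m in HEADER_RE.finditer(text)]

print(find_headers(SAMPLE))
```

Because \w does not match a space, a header such as C:\Program Files: would not be found by this pattern, so it may need widening for real logs.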


Example output:

C:\:
BUILTIN\Users   Allowed:    Special Permissions:
Create Folders
BUILTIN\Users   Allowed:    Special Permissions:
Create Files

C:\WINDOWS\system32:
BUILTIN\Power Users Allowed:    Special Permissions: 
Delete

C:\WINDOWS\security:
BUILTIN\Users   Allowed:    Special Permissions: 
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed:    Special Permissions: 
Traverse Folder
Read Attributes
Read Permissions

C:\WINDOWS\system32\config doesn't show up as there's no Special Permission in the lines.


The template I am using:

import re

text = ""

def main():
    f = open('DirectoryPermissions.xls', 'r')
    global text
    for line in f:
        text = text + line
    f.close()
    print text

def regex():
    global text
    <insert code here>

if __name__ == '__main__':
    main()
    regex()


# I would replace this with reading lines from a file,
# rather than splitting a big string containing the file.

section = None
inspecialperm = False
with open("testdata.txt") as w:
    for line in w:
        if not line.startswith("            "):
            inspecialperm = False

        if section is None:
            section = line

        elif not line.strip():  # blank line; lines read from a file keep their '\n'
            section = None

        elif 'Special Permissions' in line:
            if section:
                print section
                section = ""
            inspecialperm = True
            print line,

        elif inspecialperm:
            print line,
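
The same line-by-line idea can be written as a generator that groups the results per section instead of printing as it goes. This is a sketch in Python 3 syntax under a few assumptions: sections are separated by blank lines, the first line of a section is the path, and permission names are indented deeper than the "Special Permissions:" line they belong to. The name special_permissions and the sample text are illustrative.

```python
def special_permissions(lines):
    """Yield (path, entries) for each section that has Special Permissions.

    Each entry is a list: the stripped "Special Permissions:" line followed
    by the stripped permission names listed under it.
    """
    path, entries, indent = None, [], None
    for raw in lines:
        line = raw.rstrip()
        if not line:                      # a blank line ends the section
            if path and entries:
                yield path, entries
            path, entries, indent = None, [], None
            continue
        depth = len(line) - len(line.lstrip())
        if path is None:                  # first line of a section is the path
            path = line.strip()
        elif line.endswith("Special Permissions:"):
            entries.append([line.strip()])
            indent = depth                # deeper lines belong to this entry
        elif indent is not None and depth > indent:
            entries[-1].append(line.strip())
        else:
            indent = None                 # an ordinary permission line
    if path and entries:                  # last section may lack a trailing blank line
        yield path, entries


sample = r"""C:\:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions:
            Create Folders
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    (No auditing)
""".splitlines()

for path, entries in special_permissions(sample):
    print(path)
    for entry in entries:
        print("\n".join(entry))
    print()
```

An open file object can be passed in place of the list, since the function only iterates over lines. Sections without special permissions, such as config here, are skipped.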


You don't need the re module at all if you parse strings by "split & strip", which is more efficient:

for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    paragraph = paragraph.replace(': \n', ': ') # hack to have permissions in same line
    for line in paragraph.split('\n'):
        if 'Special Permissions: ' in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Path "%s" has special permission "%s"' % (path, permission)

Replace the print statement with whatever fits your needs.

EDIT: As pointed out in the comment, the previous solution doesn't work with the new input lines in the edited question, but here's how to fix it (still more efficiently than using regular expressions):

for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    owner = None
    for line in paragraph.split('\n'):
        if owner is not None and ':' not in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path)
        else:
            owner = line.split(' Allowed:', 1)[0].strip() if line.endswith('Special Permissions: ') else None
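
If the results are needed as data rather than printed text, the split-and-strip approach can be wrapped in a generator. The following is a sketch in Python 3 syntax, not a drop-in replacement: iter_special_permissions and SAMPLE are illustrative names, and skipping lines that start with "(" is an added assumption so that "(No auditing)" is not reported as a permission when it directly follows a special-permissions block.

```python
def iter_special_permissions(text):
    """Yield (path, owner, permission) for every special permission in text."""
    for paragraph in text.strip().split("\n\n"):
        lines = paragraph.splitlines()
        path = lines[0].strip().rstrip(":")
        owner = None
        for line in lines[1:]:
            stripped = line.strip()
            if stripped.endswith("Special Permissions:"):
                # Remember whose permissions the following lines describe.
                owner = stripped.split(" Allowed:", 1)[0].strip()
            elif owner is not None and ":" not in stripped and not stripped.startswith("("):
                yield path, owner, stripped
            else:
                owner = None


# Shortened, illustrative sample of the log format.
SAMPLE = r"""C:\WINDOWS\security:
    BUILTIN\Users   Allowed:    Special Permissions:
            Traverse Folder
            Read Attributes
    BUILTIN\Administrators  Allowed:    Full Control
    (No auditing)
"""

for path, owner, permission in iter_special_permissions(SAMPLE):
    print('Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path))
```

Splitting on '\n\n' assumes Unix-style line endings and exactly one blank line between sections.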


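Similar to milkypostman's solution, but printing in the format you want:

lines = string1.splitlines()
header = None  # path line of the current section, until it has been printed
for index, line in enumerate(lines):
    if line and not line[0].isspace():
        # An unindented line is a section header such as "C:\WINDOWS\system32:"
        header = line
    elif "Special Permissions" in line:
        if header is not None:
            print header
            header = None
        print line.strip()
        # Print every following line that is indented deeper than this one
        indent = len(line) - len(line.lstrip())
        offset = 1
        while index + offset < len(lines):
            following = lines[index + offset]
            if not following.strip() or len(following) - len(following.lstrip()) <= indent:
                break
            print following.strip()
            offset += 1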


Here is a solution using the re module and the findall method.

data = '''\
C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control 
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS\system32:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

C:\WINDOWS\system32\config:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Read & Execute
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)
'''

if __name__ == '__main__':
    import re

    # A regular expression to match a section "C:...."
    cre_par = re.compile(r'''
                ^C:.*?
                ^\s*$''', re.DOTALL | re.MULTILINE | re.VERBOSE)

    # A regular expression to match a "Special Permissions" line, and the
    # following line.
    cre_permissions = re.compile(r'''(^.*Special\ Permissions:\s*\n.*)\n''', 
                                re.MULTILINE | re.VERBOSE)

    # Create list of strings to output.
    out = []
    for t in cre_par.findall(data):
        # Search only the current section (t), not the whole of data
        out += [t[:t.find('\n')]] + cre_permissions.findall(t) + ['']

    # Join output list of strings together using end-of-line character
    print '\n'.join(out)

Here is the generated output:

C:\:
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files

C:\WINDOWS\system32:
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete

C:\WINDOWS\system32\config:

The C:\WINDOWS\system32\config header is still printed, because the loop emits the first line of every section whether or not it has special permissions.


Thanks to milkypostman, scoffey, and the rest I came up with the solution:

def regex():
    global text
    for paragraph in text.split('\n\n'):
        lines = paragraph.split('\n', 1)
        #personal modifier to choose certain output only
        if lines[0].startswith('C:\\:') or lines[0].startswith('C:\\WINDOWS\\system32:') or lines[0].startswith('C:\\WINDOWS\\security:'):
            print lines[0]
            iterables = re.finditer(r".*Special Permissions: \n(\s+[a-zA-Z ]+\n)*", lines[1])
            for items in iterables:
                #cosmetic fix
                parsedText = re.sub(r"\n$", "", items.group(0))
                parsedText = re.sub(r"^\s+", "", parsedText)
                parsedText = re.sub(r"\n\s+", "\n", parsedText)
                print parsedText
            print

I will still go through all of the posted code (especially scoffey's, as I never knew pure string manipulation was that powerful). Thanks for the insight!

Of course, this is probably not optimal, but it works for my case. If you have any suggestions, feel free to post them.


Output:

C:\Python27>openfile.py
C:\:
BUILTIN\Users   Allowed:        Special Permissions:
Create Folders
BUILTIN\Users   Allowed:        Special Permissions:
Create Files

C:\WINDOWS\security:
BUILTIN\Users   Allowed:        Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users     Allowed:        Special Permissions:
Traverse Folder
Read Attributes
Read Permissions

C:\WINDOWS\system32:
BUILTIN\Power Users     Allowed:        Special Permissions:
Delete
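
One possible tidy-up of the solution above is to compile a single pattern with named groups, so that the three cosmetic re.sub calls are no longer needed. This is only a sketch in Python 3 syntax: BLOCK_RE, format_section and SECTION are illustrative names, and it assumes, as the pattern in the solution does, that permission names consist of letters and spaces only.

```python
import re

BLOCK_RE = re.compile(
    r"^[ \t]*(?P<entry>\S.*Special Permissions:)[ \t]*\n"  # the owner line
    r"(?P<names>(?:[ \t]+[A-Za-z ]+\n)*)",                  # indented permission names
    re.MULTILINE,
)

def format_section(section):
    """Return the Special Permissions lines of one section, left-aligned."""
    out = []
    for m in BLOCK_RE.finditer(section):
        out.append(m.group("entry"))
        out.extend(name.strip() for name in m.group("names").splitlines())
    return out


# Body of one section, without its path line (illustrative sample).
SECTION = r"""    BUILTIN\Users   Allowed:    Special Permissions:
            Traverse Folder
            Read Attributes
    BUILTIN\Administrators  Allowed:    Full Control
    (No auditing)
"""

print("\n".join(format_section(SECTION)))
```

In the solution above, format_section(lines[1]) would take the place of the finditer loop, with the returned list joined by newlines before printing.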