How to extract a certain value from a collection of text files
Say, I have a collection of text files I need to process (e.g. search for a certain label and extract the value). What would be the general way to tackle the problem?
I also read this: "Retrieve Variable Values from Python" but it seems not applicable to some of the cases I face (like when a tab is used as the delimiter instead of : ).
I just want to know the most appropriate way to tackle the problem regardless of the language used.
Say I have something like:
Name: Backup Operators SID: S-1-5-32-551 Caption: COMMSVR21\Backup Operators Description: Backup Operators can override security restrictions for the sole purpose of backing up or restoring files Domain: COMMSVR21
COMMERCE/cabackup
COMMSVR21/sys5erv1c3
I want to be able to access/retrieve the values of Backup Operators and get COMMERCE/cabackup & COMMSVR21/sys5erv1c3 in return.
How would you do it?
What I thought of is to read the whole text file, run a regex search, and probably add some if/else statements. Is this effective? Or would it be better to parse the text file into some kind of array and retrieve the values from that? I'm not sure.
As another example, say:
GPO: xxx & yyy Servers
Policy: MaximumPasswordAge
Computer Setting: 45
How would you check the text file for Policy = MaximumPasswordAge and return the value 45?
Thanks!
p/s -- I might be doing this in Python (zero knowledge, so picking it up on the fly) or Java
pp/s -- I just realised that there's no spoiler tag. Hmm
--
E.g. of the logs: Log with Directory Permissions:
C:\:
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
\Everyone Allowed: Read & Execute
(No auditing)
C:\WINDOWS:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Modify
BUILTIN\Power Users Allowed: Special Permissions:
Delete
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
Another one with the following:
Audit Policy
------------
GPO: xxx & yyy Servers
Policy: AuditPolicyChange
Computer Setting: Success
GPO: xxx & yyy Servers
Policy: AuditPrivilegeUse
Computer Setting: Failure
GPO: xxx & yyy Servers
Policy: AuditDSAccess
Computer Setting: No Auditing
This is the tab delimited one:
User Name Full Name Description Account Type SID Domain PasswordIsChangeable PasswordExpires PasswordRequired AccountDisabled AccountLocked Last Login
53cuR1ty Built-in account for administering the computer/domain 512 S-1-5-21-2431866339-2595301809-2847141052-500 COMMSVR21 True False True False False 09/11/2010 7:14:27 PM
ASPNET ASP.NET Machine Account Account used for running the ASP.NET worker process (aspnet_wp.exe) 512
I always shove Python into people's faces ;)
I recommend looking at Regex: http://docs.python.org/howto/regex.html, as it might fit your needs. I won't do it for you (because I can't), but I know this will work if your files are colon-delimited key/value pairs separated by newline characters. Here's a quick start (which might work):
regex = r'(.*):( *)(.*)\n'
This matches three groups (hopefully): A group before the colon (group 1), the spaces (group 2, which can be thrown away), and the text between that and a new line (group 3).
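As a rough sketch of how a pattern like that could be applied to the GPO example from the question: the code below collects the key/value lines into one dictionary per record and then looks up a policy. The helper names `parse_records` and `find_setting` are made up for illustration, the pattern is a slightly stricter variant of the one above (the key stops at the first colon), and it assumes every record starts with a `GPO:` line.

```python
import re

# Key is everything before the first colon on a line; value is the rest.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
PAIR = re.compile(r'^[ \t]*([^:\n]+):[ \t]*(.*)$', re.MULTILINE)

def parse_records(text, first_key="GPO"):
    """Group key/value lines into dicts; a new dict starts at each first_key line.

    Assumption: each record begins with a line whose key is first_key.
    """
    records = []
    for key, value in PAIR.findall(text):
        key = key.strip()
        if key == first_key or not records:
            records.append({})
        records[-1][key] = value.strip()
    return records

def find_setting(records, policy):
    """Return the 'Computer Setting' of the first record with this Policy, else None."""
    for record in records:
        if record.get("Policy") == policy:
            return record.get("Computer Setting")
    return None

sample = """GPO: xxx & yyy Servers
Policy: MaximumPasswordAge
Computer Setting: 45
GPO: xxx & yyy Servers
Policy: AuditPolicyChange
Computer Setting: Success
"""

records = parse_records(sample)
print(find_setting(records, "MaximumPasswordAge"))  # prints 45
```

Lines without a colon (headings, dashed underlines) are simply skipped, so the values come back as strings and any conversion to numbers is left to the caller.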
Play with that (I don't want to have a regex aneurysm, so this is as far as I can help for now). Good luck!
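For the tab-delimited file mentioned in the question, a regex may not be needed at all; Python's standard `csv` module can split on tabs and key each row by the header line. This is only a sketch: the sample is a trimmed-down stand-in with three of the columns, the column boundaries are assumed, and `load_rows` and `lookup` are hypothetical helper names.

```python
import csv
import io

# Trimmed-down stand-in for the tab-delimited log (three columns only);
# where exactly the original columns split is an assumption.
sample = (
    "User Name\tDescription\tAccount Type\n"
    "53cuR1ty\tBuilt-in account for administering the computer/domain\t512\n"
    "ASPNET\tAccount used for running the ASP.NET worker process (aspnet_wp.exe)\t512\n"
)

def load_rows(handle):
    """Return one dict per data row, keyed by the names in the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))

def lookup(rows, key_column, key, value_column):
    """Return value_column of the first row whose key_column equals key, else None."""
    for row in rows:
        if row[key_column] == key:
            return row[value_column]
    return None

rows = load_rows(io.StringIO(sample))
print(lookup(rows, "User Name", "ASPNET", "Account Type"))  # prints 512
```

With a real file, the `io.StringIO(sample)` part would be replaced by something like `open(path, newline="")`, where `path` is whatever log file you are reading.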