What's the correct regexp pattern to match a VMS filename?
The documentation at http://h71000.www7.hp.com/doc/731final/documentation/pdf/ovms_731_file_app.pdf (section 5-1) says the filename should look like this:
node::device:[root.][directory-name]filename.type;version
Most of these components are optional (node, device, and version, for example), but I'm not sure exactly which ones, or how to write this correctly as a regexp, including the directory name. Some examples:
DISK1:[MYROOT.][MYDIR]FILE.DAT
DISK1:[MYDIR]FILE.DAT
[MYDIR]FILE.DAT
FILE.DAT;10
NODE::DISK5:[REMOTE.ACCESS]FILE.DAT
See the documentation and source for the VMS::Filespec Perl module.
According to Wikipedia, the full form is actually a bit more than that:
NODE"accountname password"::device:[directory.subdirectory]filename.type;ver
This one took a while, but here is an expression that should accept all valid variations, and place the components into capture groups.
(?:(?:(?:([^\s:\[\]]+)(?:"([^\s"]+) ([^\s"]+)")?::)?([^\s:\[\]]+):)?\[([^\s:\[\]]+)\])?([^\s:\[\]\.]+)(\.[^\s:\[\];]+)?(;\d+)?
Also, from what I can tell, your example of
DISK1:[MYROOT.][MYDIR]FILE.DAT
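is not a valid name. I believe only one pair of brackets is allowed, although the format quoted in the question does list a separate [root.] component, which this expression does not handle. I hope this helps!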
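To make the capture groups concrete, here is a minimal Python sketch that wraps the expression above unchanged. The names `VMS_RE`, `FIELDS`, and `parse_vms` are made up for this illustration, and using `fullmatch` (so the whole string must be a filename) is an assumption rather than something the expression itself requires.

```python
import re

# The expression from above, split over three lines for readability only.
VMS_RE = re.compile(
    r'(?:(?:(?:([^\s:\[\]]+)(?:"([^\s"]+) ([^\s"]+)")?::)?'
    r'([^\s:\[\]]+):)?\[([^\s:\[\]]+)\])?'
    r'([^\s:\[\]\.]+)(\.[^\s:\[\];]+)?(;\d+)?'
)

# Hypothetical labels for capture groups 1 to 8, in order.
FIELDS = ("node", "account", "password", "device",
          "directory", "filename", "type", "version")


def parse_vms(name):
    """Return a dict of components, or None if the whole string does not match."""
    m = VMS_RE.fullmatch(name)
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


if __name__ == "__main__":
    for name in ("FILE.DAT;10",
                 "NODE::DISK5:[REMOTE.ACCESS]FILE.DAT",
                 "DISK1:[MYROOT.][MYDIR]FILE.DAT"):
        print(name, "->", parse_vms(name))
```

Note that the type and version groups keep their leading "." and ";", and that the two-bracket example comes back as None, consistent with the remark above.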
You could probably come up with a single complicated regex for this, but your code will be much easier to read if you work from left to right, stripping off each section if it is present. The following Python code does just that:
import re

lines = ["DISK1:[MYROOT.][MYDIR]FILE.DAT", "DISK1:[MYDIR]FILE.DAT", "[MYDIR]FILE.DAT", "FILE.DAT;10", "NODE::DISK5:[REMOTE.ACCESS]FILE.DAT"]

# One pattern per component, each with a single capture group.
node_re = r"(\w+)::"
device_re = r"(\w+):"
root_re = r"\[(\w+)\.\]"
dir_re = r"\[([\w.]+)\]"  # dots allowed so that [REMOTE.ACCESS] matches
file_re = r"(\w+)\."
type_re = r"(\w+)"
version_re = r";(.*)"

re_dict = {"node": node_re, "device": device_re, "root": root_re, "directory": dir_re, "file": file_re, "type": type_re, "version": version_re}
order = ["node", "device", "root", "directory", "file", "type", "version"]

for line in lines:
    i = 0  # offset of the part of the line not yet consumed
    print(line)
    for item in order:
        # re.match anchors at the current offset, so a component is
        # stripped only if it is the next thing in the string.
        m = re.match(re_dict[item], line[i:])
        if m is not None:
            print(" " + item + ": " + m.group(1))
            i += len(m.group(0))
and the output is
DISK1:[MYROOT.][MYDIR]FILE.DAT
 device: DISK1
 root: MYROOT
 directory: MYDIR
 file: FILE
 type: DAT
DISK1:[MYDIR]FILE.DAT
 device: DISK1
 directory: MYDIR
 file: FILE
 type: DAT
[MYDIR]FILE.DAT
 directory: MYDIR
 file: FILE
 type: DAT
FILE.DAT;10
 file: FILE
 type: DAT
 version: 10
NODE::DISK5:[REMOTE.ACCESS]FILE.DAT
 node: NODE
 device: DISK5
 directory: REMOTE.ACCESS
 file: FILE
 type: DAT
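If a single pattern is still preferred, the same left-to-right order can be written as one verbose regex with named groups. This is only a sketch under simplifying assumptions: components are restricted to `\w` characters (real VMS names may also contain characters such as `$` and `-`), the quoted access-control string is not handled, and the names `VMS_NAME` and `split_vms` are invented for this example.

```python
import re

# Each optional component appears once, in the order given in the question.
VMS_NAME = re.compile(r"""
    (?:(?P<node>\w+)::)?            # optional node
    (?:(?P<device>\w+):)?           # optional device
    (?:\[(?P<root>\w+)\.\])?        # optional root, e.g. [MYROOT.]
    (?:\[(?P<directory>[\w.]+)\])?  # optional directory, dots separate subdirectories
    (?P<file>\w+)                   # file name
    (?:\.(?P<type>\w+))?            # optional type
    (?:;(?P<version>\d+))?          # optional version number
    """, re.VERBOSE)


def split_vms(name):
    """Return the components that are present, or None if the name does not match."""
    m = VMS_NAME.fullmatch(name)
    if m is None:
        return None
    return {key: value for key, value in m.groupdict().items() if value is not None}


if __name__ == "__main__":
    print(split_vms("DISK1:[MYROOT.][MYDIR]FILE.DAT"))
    print(split_vms("NODE::DISK5:[REMOTE.ACCESS]FILE.DAT"))
```

Because `fullmatch` is used, a string with trailing or unrecognized characters is rejected rather than partially parsed.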