How can I do this using a Python regex?
I am trying to extract the method definitions that comtypes generates for COM interfaces, using a regex. Some of them are empty, which causes even more problems for me.
Basically I have this:
IXMLSerializerAlt._methods_ = [
COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
( ['in'], BSTR, 'XML' ),
( ['in'], BSTR, 'TypeName' ),
( ['in'], BSTR, 'TypeNamespaceURI' ),
( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]
class EnvironmentManager(CoClass):
u'Singleton object that manages different environments (collections of configuration information).'
_reg_clsid_ = GUID('{8A626D49-5F5E-47D9-9463-0B802E9C4167}')
_idlflags_ = []
_typelib_path_ = typelib_path
_reg_typelib_ = ('{5E1F7BC3-67C5-4AEE-8EC6-C4B73AAC42ED}', 1, 0)
INumberFormat._methods_ = [
]
I want to extract both the IXMLSerializerAlt and INumberFormat method definitions, but I can't figure out a proper regex. E.g. for IXMLSerializerAlt I want to extract this:
IXMLSerializerAlt._methods_ = [
COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
( ['in'], BSTR, 'XML' ),
( ['in'], BSTR, 'TypeName' ),
( ['in'], BSTR, 'TypeNamespaceURI' ),
( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]
In my mind, this regex should work:
^\w+\._methods_\s=\s\[$
(^.+$)*
^]$
I'm checking my regexes using Kodos, but I cannot figure out a way to make this work.
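You're missing the newline characters between $ and ^, and may not be using the re.MULTILINE flag, which allows those anchors to match at the start and end of each line. The following (compiled with re.MULTILINE) would match; the repetition is lazy (*?) so that the match stops at the first line consisting only of ], even when no blank line separates the blocks:
\w+\._methods_\s=\s\[$(?:\n^.+$)*?\n^\]$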
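To see what the flag and the quantifier each contribute, here is a small sketch that runs the line-by-line pattern three ways: without re.MULTILINE, with a greedy repetition, and with a lazy one. SAMPLE is a shortened stand-in for the generated module (my assumption, not the real file), deliberately written without blank lines between the blocks.

```python
import re

# Shortened stand-in for the generated module; note there are no blank lines.
SAMPLE = """\
IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]
class EnvironmentManager(CoClass):
    _idlflags_ = []
INumberFormat._methods_ = [
]
"""

GREEDY = r'\w+\._methods_\s=\s\[$(?:\n^.+$)*\n^\]$'
LAZY = r'\w+\._methods_\s=\s\[$(?:\n^.+$)*?\n^\]$'

# Without re.MULTILINE, $ only matches at the very end of the string.
no_flag = re.findall(LAZY, SAMPLE)

# Greedy repetition keeps consuming non-empty lines past the first "]".
greedy_blocks = re.findall(GREEDY, SAMPLE, re.MULTILINE)

# Lazy repetition stops at the first line that is exactly "]".
lazy_blocks = re.findall(LAZY, SAMPLE, re.MULTILINE)

print(len(no_flag), len(greedy_blocks), len(lazy_blocks))
for block in lazy_blocks:
    print(block)
    print('---')
```

If the real file always has a blank line after each block, the greedy form probably behaves the same as the lazy one, since .+ cannot match an empty line.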
However, here's a slightly simplified regex that will also match your examples:
>>> s = '''...\nIXMLSerializerAlt._methods_ = [\n COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n ( ['in'], BSTR, 'XML' ),\n ( ['in'], BSTR, 'TypeName' ),\n ( ['in'], BSTR, 'TypeNamespaceURI' ),\n ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),\n]\n...'''
>>> import re
>>> re.findall(r'^\w+\._methods_\s=\s\[$.*?^\]$', s, re.DOTALL | re.MULTILINE)
["IXMLSerializerAlt._methods_ = [\n COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n ( ['in'], BSTR, 'XML' ),\n ( ['in'], BSTR, 'TypeName' ),\n ( ['in'], BSTR, 'TypeNamespaceURI' ),\n ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),\n]"]
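The ^\]$ at the end of that pattern is doing real work: the method list itself contains brackets, so a plain lazy match up to the first ] would cut the block short. A rough sketch of the difference, using a trimmed-down SAMPLE string that I made up from the question's data and that also includes the empty INumberFormat block:

```python
import re

# Trimmed-down input based on the question; the exact layout is an assumption.
SAMPLE = (
    "IXMLSerializerAlt._methods_ = [\n"
    "    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n"
    "              ( ['in'], BSTR, 'XML' )),\n"
    "]\n"
    "\n"
    "INumberFormat._methods_ = [\n"
    "]\n"
)

FLAGS = re.DOTALL | re.MULTILINE

# Closing bracket must sit alone on its line.
anchored = re.findall(r'^\w+\._methods_\s=\s\[$.*?^\]$', SAMPLE, FLAGS)

# No anchors on the closing bracket: stops at the first "]" it meets.
unanchored = re.findall(r'^\w+\._methods_\s=\s\[.*?\]', SAMPLE, FLAGS)

print(anchored[0])
print('---')
print(unanchored[0])
```

Both calls find two blocks here, but the unanchored version ends the first one right after the helpstring list, while the anchored version runs to the ] on its own line. The empty block comes out the same either way.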
import re
interface_definitions = '''
IXMLSerializerAlt._methods_ = [
COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
( ['in'], BSTR, 'XML' ),
( ['in'], BSTR, 'TypeName' ),
( ['in'], BSTR, 'TypeNamespaceURI' ),
( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]
class EnvironmentManager(CoClass):
u'Singleton object that manages different environments (collections of configuration information).'
_reg_clsid_ = GUID('{8A626D49-5F5E-47D9-9463-0B802E9C4167}')
_idlflags_ = []
_typelib_path_ = typelib_path
_reg_typelib_ = ('{5E1F7BC3-67C5-4AEE-8EC6-C4B73AAC42ED}', 1, 0)
INumberFormat._methods_ = [
]
'''
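# Group 1 is the interface name; group 2 is everything between the brackets.
RX_METHODS = re.compile(
    r'(\w+)\._methods_\s=\s\[('
    r'.*?'                # text before the first nested [...] pair, if any
    r'(?:\[.*?\].*?)*'    # any number of nested [...] pairs with trailing text
    r')\]',
    re.DOTALL)
for match in RX_METHODS.finditer(interface_definitions):
    print(match.groups())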
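If what you ultimately want is a lookup from interface name to its method list, with the blank ones handled too, something along these lines might do. It reuses the anchored DOTALL | MULTILINE pattern with named groups; RX_BLOCK, methods_by_interface and the short source string are hypothetical names and data for illustration, not anything comtypes provides.

```python
import re

# Hypothetical helper pattern: "name" is the interface, "body" the bracket contents.
RX_BLOCK = re.compile(
    r'^(?P<name>\w+)\._methods_\s=\s\[$(?P<body>.*?)^\]$',
    re.DOTALL | re.MULTILINE)


def methods_by_interface(source):
    """Map each interface name to the stripped text between its brackets."""
    return {m.group('name'): m.group('body').strip()
            for m in RX_BLOCK.finditer(source)}


# Made-up, shortened input in the same shape as the question's data.
source = (
    "IXMLSerializerAlt._methods_ = [\n"
    "    COMMETHOD([], HRESULT, 'LoadFromString',\n"
    "              ( ['in'], BSTR, 'XML' )),\n"
    "]\n"
    "INumberFormat._methods_ = [\n"
    "]\n"
)

result = methods_by_interface(source)
for name, body in result.items():
    print(name, '->', repr(body))
```

An interface with no methods simply maps to an empty string, so the blank definitions need no special casing.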