regex to extract names and values of attributes
I have the following possible strings that I need to turn into arrays so I can feed them into an HTML generator. I am not starting with HTML or XML. I am trying to create a shorthand that will let me populate my HTML objects more simply and quickly, with more readable code.
id='moo'
id = "foo" type= doo value ='do\"o'
on_click='monkeys("bobo")'
I need to pull out the attributes and their corresponding values. These attribute strings are not associated with an HTML or XML tag, and I would like to do this with one to three regular expressions. The rules are:
- The value may be enclosed in either single or double quotes.
- If the value is quoted, it may also contain whitespace, quotes of the other kind, or escaped quotes of the same kind as the enclosing quotes.
- There may or may not be whitespace between the attribute and =, and between = and the value.
The eventual results should look like:
array(1) { [id] => moo }
array(3) { [id] => foo [type] => doo [value] => do"o }
array(1) { [on_click] => monkeys("bobo") }
But if it turns out like this:
array(2) { [0] => id [1] => moo }
array(6) { [0] => id [1] => foo [2] => type [3] => doo [4] => value [5] => do"o }
array(2) { [0] => on_click [1] => monkeys("bobo") }
I can re-arrange it from there.
Some previous regexes I have tried to use and their issues:
/[\s]+/
- Returns attribute/value pairs only if there is no whitespace around the =.
/(?<==)(\".*\"|'.*'|.*)$/
- Returns the value, but including the enclosing quotes. It does ignore escaped quotes within the value, though.
/^[^=]*/
- Returns the attribute just fine, regardless of whitespace between the attribute and =.
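A minimal sketch of one matching regex that follows the three rules above, written in Python because the question shows no PHP code. Assumptions (not from the question): attribute names start with a letter or underscore, unquoted values such as type= doo run until the next whitespace, and a backslash escapes whatever character follows it. The names ATTR_RE, UNESCAPE_RE and parse_attrs are hypothetical.

```python
import re
from typing import Dict

# One pattern for the whole name=value pair (assumed grammar, see above):
#   - value may be double-quoted, single-quoted, or unquoted
#   - quoted bodies may hold the other quote kind, whitespace,
#     or backslash-escaped copies of the enclosing quote
#   - optional whitespace on either side of '='
ATTR_RE = re.compile(r"""
    ([A-Za-z_][\w:.-]*)           # group 1: attribute name
    \s*=\s*                       # '=' with optional surrounding whitespace
    (?:
        "((?:\\.|[^"\\])*)"       # group 2: double-quoted body
      | '((?:\\.|[^'\\])*)'       # group 3: single-quoted body
      | ([^\s"'=]+)               # group 4: unquoted value, ends at whitespace
    )
""", re.VERBOSE)

# Second regex: drop the backslash from any escaped character (\" -> ").
UNESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def parse_attrs(s: str) -> Dict[str, str]:
    """Return {name: value} for every name=value pair found in s."""
    result: Dict[str, str] = {}
    for m in ATTR_RE.finditer(s):
        # Exactly one of groups 2-4 took part in this match.
        raw = next(g for g in m.groups()[1:] if g is not None)
        result[m.group(1)] = UNESCAPE_RE.sub(r"\1", raw)
    return result
```

For example, parse_attrs(r"""id = "foo" type= doo value ='do\"o'""") yields the associative form from the question. Only two regular expressions are involved (one to match, one to unescape), which stays within the one-to-three budget. The pattern should carry over to PHP's preg_match_all() with the x modifier, but that port is an assumption, not something verified here.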
Is there any particular reason you want to use a regex specifically here? It seems like a token-based parser might work better for you, since you need to keep more state than is comfortable to do in a regex.
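A rough Python sketch of the token-based approach this comment suggests: one left-to-right scan that remembers which quote opened the current value. The function name tokenize_attrs is hypothetical, and the escaping rule (a backslash keeps the next character literally) is an assumption based on the question's do\"o example.

```python
from typing import List, Tuple


def tokenize_attrs(s: str) -> List[Tuple[str, str]]:
    """Scan s once, left to right, returning (name, value) pairs in order."""
    pairs: List[Tuple[str, str]] = []
    i, n = 0, len(s)

    def skip_ws() -> None:
        nonlocal i
        while i < n and s[i].isspace():
            i += 1

    while True:
        skip_ws()
        if i >= n:
            break
        # Name: everything up to whitespace or '='.
        start = i
        while i < n and not s[i].isspace() and s[i] != "=":
            i += 1
        name = s[start:i]
        skip_ws()
        if not name or i >= n or s[i] != "=":
            raise ValueError(f"expected name= at position {start}")
        i += 1  # consume '='
        skip_ws()
        if i < n and s[i] in "'\"":
            # Quoted value: the opening quote is the extra state
            # that is awkward to carry around in a regex.
            quote = s[i]
            i += 1
            chars: List[str] = []
            while i < n and s[i] != quote:
                if s[i] == "\\" and i + 1 < n:
                    i += 1  # skip the backslash, keep the escaped char
                chars.append(s[i])
                i += 1
            if i >= n:
                raise ValueError(f"unterminated {quote} value for {name!r}")
            i += 1  # consume closing quote
            value = "".join(chars)
        else:
            # Unquoted value: runs until the next whitespace.
            start = i
            while i < n and not s[i].isspace():
                i += 1
            value = s[start:i]
        pairs.append((name, value))
    return pairs
```

dict(tokenize_attrs(s)) gives the associative form from the question, while the list itself keeps the original order and any duplicate names. Unlike a regex, the scanner can also report exactly where malformed input (such as an unterminated quote) goes wrong.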
Tyson,
It appears that you have already done some parsing to remove the XML/HTML elements, and are now trying to process the remaining attributes. In general, regular expressions are not sufficient for parsing XML/HTML.
If you have access to the original XML/HTML, you should consider using a DOM-processing library or PHP extension to read it in and then iterate over the elements and their attributes.
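The suggestion above is PHP-specific. As a loose Python analogue, the standard library's html.parser (an event-driven parser, not a DOM) can handle quoting and whitespace around = if the bare attribute string is wrapped in a throwaway tag. The dummy tag x and the names _AttrCollector and attrs_via_html_parser are hypothetical. Two caveats: HTML has no backslash escapes, so value ='do\"o' comes back with the backslash intact, and HTMLParser lowercases attribute names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _AttrCollector(HTMLParser):
    """Records the attributes of every start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with quotes already stripped.
        self.attrs.extend(attrs)


def attrs_via_html_parser(s: str) -> List[Tuple[str, Optional[str]]]:
    # Wrap the bare attribute string in a dummy tag so the parser
    # treats it as the attribute list of a start tag.
    parser = _AttrCollector()
    parser.feed(f"<x {s}>")
    parser.close()
    return parser.attrs
```

This trades the question's backslash-escaping rule for standard HTML rules (entities like &quot; are decoded instead), which may or may not suit the shorthand.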
Here is an example reference:
- "Reading and writing the XML DOM with PHP" http://www.ibm.com/developerworks/library/os-xmldomphp/