How to find elements by 'id' field in SVG file using Python
Below is an excerpt from an .svg file (which is xml):
<text
xml:space="preserve"
style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
x="109.38555"
y="407.02847"
id="libcode-00"
sodipodi:linespacing="125%"
inkscape:label="#text4638"><tspan
sodipodi:role="line"
id="tspan4640"
x="109.38555"
y="407.02847">12345678</tspan></text>
I'm learning Python and have no clue how to find all such text elements that have an id field equal to libcode-XX, where XX is a number.
I've loaded this .svg file using minidom's parser and tried to find elements using getElementById. However, I'm getting a None result.
from xml.dom import minidom

svgTemplate = minidom.parse(svgFile)
print(svgTemplate)
print(svgTemplate.getElementById('libcode-00'))  # prints None
Following another SO question, I've tried calling setIdAttribute('id') on the svgTemplate object, with no luck.
Bottom line: please give a hint for a smart way to extract all of the text elements that have ids of the form libcode-XX. After that it should be no problem to get the tspan text and substitute it with generated content.
Sorry, I don't know my way around minidom. Also, I had to find the namespace declarations from a sample svg document so that your excerpt could load.
I personally use lxml.etree, and I'd recommend using XPath to address parts of your XML document. It's pretty powerful, and there's help here on SO if you're struggling.
There are lots of answers on SO about XPath and etree. I've written several.
from lxml import etree
data = """
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://web.resource.org/cc/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="50"
height="25"
id="svg2"
sodipodi:version="0.32"
inkscape:version="0.45.1"
version="1.0"
sodipodi:docbase="/home/tcooksey/Projects/qt-4.4/demos/embedded/embeddedsvgviewer/files"
sodipodi:docname="v-slider-handle.svg"
inkscape:output_extension="org.inkscape.output.svg.inkscape">
<text
xml:space="preserve"
style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
x="109.38555"
y="407.02847"
id="libcode-00"
sodipodi:linespacing="125%"
inkscape:label="#text4638"><tspan
sodipodi:role="line"
id="tspan4640"
x="109.38555"
y="407.02847">12345678</tspan></text>
</svg>
"""
nsmap = {
'sodipodi': 'http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd',
'cc': 'http://web.resource.org/cc/',
'svg': 'http://www.w3.org/2000/svg',
'dc': 'http://purl.org/dc/elements/1.1/',
'xlink': 'http://www.w3.org/1999/xlink',
'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'inkscape': 'http://www.inkscape.org/namespaces/inkscape'
}
data = etree.XML(data)
# All svg text elements
>>> data.xpath('//svg:text',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# All svg text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# TSPAN child elements of text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]/svg:tspan',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}tspan at b7cfc964>]
# All text elements with id starting with "libcode"
# (the XPath 1.0 function is starts-with(), with no fn: prefix)
>>> data.xpath('//svg:text[starts-with(@id,"libcode")]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfcc34>]
# Iterate text elements, access tspan child
>>> for elem in data.xpath('//svg:text[starts-with(@id,"libcode")]',namespaces=nsmap):
...     tp = elem.xpath('./svg:tspan',namespaces=nsmap)[0]
...     tp.text = "new text"
...
# etree.tostring() returns bytes, so open the output file in binary mode
with open("newfile.svg", "wb") as f:
    f.write(etree.tostring(data))
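If lxml isn't installed, roughly the same idea can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch under assumptions: the SAMPLE document, the replace_libcodes helper and the make_text callback are hypothetical names invented for illustration, not part of the original file. ElementTree's XPath subset has no starts-with(), so the id is checked in Python with a regular expression, which also enforces the "XX is a number" part of the question.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
NS = {"svg": SVG_NS}

# Hypothetical minimal document standing in for the real Inkscape file.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <text id="libcode-00"><tspan id="tspan4640">12345678</tspan></text>
  <text id="libcode-01"><tspan id="tspan4642">87654321</tspan></text>
  <text id="title"><tspan id="tspan4644">untouched</tspan></text>
</svg>"""

# "libcode-" followed by digits only, matched against the whole id.
LIBCODE_ID = re.compile(r"libcode-\d+")


def replace_libcodes(root, make_text):
    """Set the first tspan of every libcode-XX text element.

    make_text is a hypothetical callback mapping an id to the new text.
    Returns the list of ids that were changed, in document order.
    """
    changed = []
    for text_el in root.iterfind(".//svg:text", NS):
        el_id = text_el.get("id", "")
        if not LIBCODE_ID.fullmatch(el_id):
            continue
        tspan = text_el.find("svg:tspan", NS)
        if tspan is not None:
            tspan.text = make_text(el_id)
            changed.append(el_id)
    return changed


root = ET.fromstring(SAMPLE)
changed = replace_libcodes(root, lambda el_id: "code for " + el_id)

# Register SVG as the default namespace so the output is not written
# with generated ns0: prefixes.
ET.register_namespace("", SVG_NS)
svg_out = ET.tostring(root, encoding="unicode")
print(changed)
```

For a real file, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE). The other Inkscape namespaces would need register_namespace calls as well if their prefixes should survive the round trip.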
Does it work if you replace 'id' with 'xml:id'?
If minidom doesn't know svg it might treat the 'id' attribute as just any other attribute, instead of being of type ID. A conforming svg implementation would recognize the 'id' attribute in svg content as being of type ID, and an xml implementation that loads external DTDs should also recognize it correctly if the file is tagged appropriately. Loading external DTDs is optional in XML, so the proper way of fixing this would be to make the parser svg-aware.
Definition of 'id' in SVG 1.1 DTD: http://www.w3.org/TR/SVG11/svgdtd.html#DTD.1.4
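To illustrate the point about ID typing with minidom itself, here is a minimal sketch. It assumes a small hypothetical SAMPLE document rather than the real file. Since minidom does not know that 'id' is of type ID, the simplest workaround is to skip getElementById and filter elements by their plain 'id' attribute. Note also that setIdAttribute is a method of Element, not of Document, so it has to be called on each element rather than on the parsed document.

```python
import re
from xml.dom import minidom

# Hypothetical minimal document standing in for the real Inkscape file.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <text id="libcode-00"><tspan id="tspan4640">12345678</tspan></text>
  <text id="libcode-01"><tspan id="tspan4642">87654321</tspan></text>
  <text id="title"><tspan id="tspan4644">untouched</tspan></text>
</svg>"""

doc = minidom.parseString(SAMPLE)

# Without a DTD, 'id' is an ordinary attribute, so this lookup finds nothing.
before = doc.getElementById("libcode-00")

# Workaround 1: filter <text> elements by the plain attribute value.
pattern = re.compile(r"libcode-\d+")
matches = [
    el
    for el in doc.getElementsByTagName("text")
    if pattern.fullmatch(el.getAttribute("id"))
]

# Replace the character data of the first tspan in each match.
for el in matches:
    tspan = el.getElementsByTagName("tspan")[0]
    tspan.firstChild.data = "new text"

# Workaround 2: mark 'id' as ID-typed on each element, then look it up.
for el in doc.getElementsByTagName("text"):
    if el.hasAttribute("id"):
        el.setIdAttribute("id")
after = doc.getElementById("libcode-00")

print(before, [el.getAttribute("id") for el in matches], after)
```

The first workaround is usually enough for pattern matching, because getElementById only handles exact ids anyway.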
Adding a little to MattH's great example: when you use XPath and you know the namespaces, you can do things like
pub_name = data.xpath('//dc:publisher/cc:Agent/dc:title',
namespaces=nsmap)[0].text
This gives direct access to the text of the element you want.
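One caveat with indexing the result using [0] is that it raises IndexError when nothing matches. A hedged sketch of a more forgiving lookup, using the standard library's ElementTree instead of lxml: the SAMPLE metadata block and the publisher name in it are made up for illustration, and findtext returns a default instead of raising.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, same idea as nsmap in the lxml example.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://web.resource.org/cc/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Hypothetical metadata block; a real Inkscape file may be laid out differently.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:cc="http://web.resource.org/cc/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <metadata>
    <rdf:RDF>
      <cc:Work>
        <dc:publisher>
          <cc:Agent>
            <dc:title>Example Publisher</dc:title>
          </cc:Agent>
        </dc:publisher>
      </cc:Work>
    </rdf:RDF>
  </metadata>
</svg>"""

root = ET.fromstring(SAMPLE)

# findtext returns the text of the first match, or the default if none exists.
pub_name = root.findtext(
    ".//dc:publisher/cc:Agent/dc:title", default=None, namespaces=NS
)
missing = root.findtext(
    ".//dc:creator/cc:Agent/dc:title", default=None, namespaces=NS
)
print(pub_name, missing)
```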