xml - Insert/ignore a missing namespace in lxml
I have to parse malformed XML:
    >>> from lxml import etree
    >>> root = etree.fromstring(xml_string)
    XMLSyntaxError: Namespace prefix xlink for href on email is not defined, line 3, column 2446
The xlink prefix is indeed missing among the declarations.
Is there an easy, recommended way to tell lxml
to ignore missing namespaces, or to use a supplied one?
Right now, I manually modify xml_string
to inject the namespace before parsing, which works but is ugly and not general enough.
There is no way to tell lxml to insert a missing namespace declaration. One might imagine that
    etree.register_namespace("xlink", "http://www.w3.org/1999/xlink")
could help, but it has no effect on parsing.
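To see this for yourself, here is a small check. The sample document and the helper name parses_ok are made up for illustration; the only lxml calls used are etree.register_namespace, etree.fromstring and the etree.XMLSyntaxError exception.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical sample: the xlink prefix is used but never declared.
BROKEN = '<root><email xlink:href="mailto:someone@example.com"/></root>'


def parses_ok(xml_text):
    """Return True if lxml's default (strict) parser accepts xml_text."""
    try:
        etree.fromstring(xml_text)
        return True
    except etree.XMLSyntaxError:
        return False


# Registering the prefix does not change how the parser treats the input.
etree.register_namespace("xlink", XLINK_NS)
print(parses_ok(BROKEN))
```

As far as I can tell, the registration only influences which prefix lxml picks when serializing elements; the parser never consults it.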
Even if it is "ugly", I think you'll have to continue to inject the namespace before parsing the XML document (perhaps you can automate it if you haven't already).
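If you want to automate the injection, a minimal sketch could look like this. The helper name inject_namespace and the sample document are hypothetical, and the approach makes two assumptions: the XML is a str without an encoding declaration, and the first thing that looks like a start tag is the root element. It is a regex patch on the text, not a real XML rewriter, so treat it as a starting point only.

```python
import re

from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"


def inject_namespace(xml_text, prefix, uri):
    """Declare xmlns:prefix on the first start tag, unless already declared."""
    # Leave the document alone if the prefix is declared anywhere.
    if re.search(r"\bxmlns:%s\s*=" % re.escape(prefix), xml_text):
        return xml_text
    # "<" followed by a name character skips "<?xml ...?>", "<!--" and "<!DOCTYPE".
    match = re.search(r"<[A-Za-z_][\w.\-]*(?::[\w.\-]+)?", xml_text)
    if match is None:
        return xml_text
    declaration = ' xmlns:%s="%s"' % (prefix, uri)
    # Insert the declaration right after the element name.
    return xml_text[:match.end()] + declaration + xml_text[match.end():]


# Hypothetical sample with an undeclared xlink prefix.
broken = '<root><email xlink:href="mailto:someone@example.com"/></root>'
fixed = inject_namespace(broken, "xlink", XLINK_NS)
root = etree.fromstring(fixed)
print(root[0].get("{%s}href" % XLINK_NS))
```

Because the prefix ends up properly declared, the attribute is reachable under its full namespace name afterwards, which is what you lose with the recover approach.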
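It is possible to make lxml accept malformed input by using a parser object initialized with recover=True. Example:
    import lxml.etree as etree

    # The x prefix is used but never declared.
    xml_input = """\
    <root>
    <x:a>abc</x:a>
    </root>"""
    # recover=True makes the parser try to carry on past errors.
    parser = etree.XMLParser(recover=True)
    tree = etree.fromstring(xml_input, parser)
    # tostring() returns bytes, so decode before printing.
    print(etree.tostring(tree).decode())
Output:
    <root>
    <a>abc</a>
    </root>
Here the prefix is removed, and I don't think that is what you want. Namespaces are there for a reason; they can't just be tossed away.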