xml - Insert/ignore a missing namespace in lxml
I have to parse malformed XML:

    >>> from lxml import etree
    >>> root = etree.fromstring(xml_string)
    XMLSyntaxError: Namespace prefix xlink for href on email is not defined, line 3, column 2446

xlink is indeed missing among the declarations.

Is there an easy, recommended way to tell lxml to ignore missing namespaces, or to use a supplied one?

Right now, I manually modify xml_string to inject the namespace before parsing, which works but is ugly and not general enough.
There is no way to tell lxml to insert a missing namespace declaration. One might imagine that etree.register_namespace("xlink", "http://www.w3.org/1999/xlink") could help, but it has no effect on parsing.

Even if it is "ugly", I think you'll have to continue injecting the namespace before parsing the XML document (perhaps you can automate it if you haven't already).
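As a rough illustration of what automating the injection could look like, here is a minimal sketch that adds a missing declaration to the root element with plain string handling. The helper name inject_namespace is hypothetical (it is not part of lxml). The regular expressions assume the first start tag in the string is the root element and that the input is a str, so treat it as a starting point rather than a robust solution.

```python
import re


def inject_namespace(xml_string, prefix, uri):
    """Hypothetical helper: declare `prefix` on the root element if it is
    not declared anywhere in the document text."""
    # Leave the input untouched if an xmlns:<prefix>= declaration already exists.
    if re.search(r'\bxmlns:' + re.escape(prefix) + r'\s*=', xml_string):
        return xml_string
    # Find the first start tag. "<?xml ...?>", comments and DOCTYPE begin
    # with "<?" or "<!" and therefore do not match this pattern.
    match = re.search(r'<[A-Za-z_][^\s/>]*', xml_string)
    if match is None:
        return xml_string
    declaration = ' xmlns:%s="%s"' % (prefix, uri)
    # Insert the declaration right after the root element's name.
    return xml_string[:match.end()] + declaration + xml_string[match.end():]


broken = '<root><email xlink:href="mailto:a@example.com"/></root>'
fixed = inject_namespace(broken, "xlink", "http://www.w3.org/1999/xlink")
print(fixed)
```

The returned string can then be passed to etree.fromstring as in the question. One limitation of this sketch: if the prefix is declared only on some nested element and used outside that element's scope, the simple "already declared" check skips the injection.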
It is possible to make lxml accept malformed input by using a parser object initialized with recover=True. Example:

    from lxml import etree

    xml_input = """\
    <root>
      <x:a>abc</x:a>
    </root>"""

    # recover=True tells the parser to continue past errors,
    # such as the undeclared prefix "x"
    parser = etree.XMLParser(recover=True)
    tree = etree.fromstring(xml_input, parser)
    print(etree.tostring(tree).decode())

Output:

    <root>
      <a>abc</a>
    </root>

Here the prefix is removed, and I don't think that is what you want. Namespaces are there for a reason; they can't just be tossed away.