Python - I want to save my parsed HTML file into a TXT file
I've parsed a web page showing an article. I want to save the parsed data to a text file, but the Python shell shows an error like this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 107: ordinal not in range(128)
And here is part of my code:
    import urllib
    import xml.dom.minidom
    from bs4 import BeautifulSoup

    search_result = urllib.urlopen(url)
    f = search_result.read()

    # XML parsing
    parsedresult = xml.dom.minidom.parseString(f)
    linklist = parsedresult.getElementsByTagName('link')

    # extracting links
    extractedurl = linklist[3].firstChild.nodeValue  # pick one link
    page = urllib.urlopen(extractedurl).read()

    # making the HTML file
    g = open('yyyy.html', 'w')
    g.write(page)
    g.close()

    # reading the HTML file and parsing it into the pure text of the article
    g = open('yyyy.html', 'r')
    bs = BeautifulSoup(g, from_encoding="utf-8")
    g.close()
    article = bs.find(id="articlebody")
    content = article.get_text()

    # save to a text file
    h = open('yyyy.txt', 'w')
    h.write(content)
    h.close()
What should I add to make this work?
Try with:

    import codecs
    h = codecs.open('yyyy.txt', 'w', 'utf-8')

Or use Python 3.
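For context on why this helps: in Python 2, writing a unicode string to a file opened with plain open() encodes it implicitly as ASCII, and u'\u2019' (a right single quotation mark) has no ASCII representation. Opening the file with an explicit encoding avoids that implicit step. The sketch below is one possible way to write it so that it runs on both Python 2.6+ and Python 3, using io.open instead of codecs.open. The helper name save_text, the sample string, and the temporary output path are assumptions made for illustration and are not part of the original code.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def save_text(path, text, encoding="utf-8"):
    """Write a unicode string to `path`, encoding it explicitly.

    Hypothetical helper for illustration only.
    """
    # io.open accepts an encoding argument on Python 2.6+ and Python 3,
    # so the text is encoded as UTF-8 rather than implicitly as ASCII.
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Assumed stand-in for the text returned by article.get_text().
content = u"It\u2019s an article"

# Reproduce the failing conversion from the question in isolation.
try:
    content.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Write to a temporary directory instead of the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "yyyy.txt")
save_text(out_path, content)

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, "r", encoding="utf-8") as handle:
    round_trip = handle.read()
```

In the code from the question, this would correspond to replacing the last three lines with something like save_text('yyyy.txt', content). On Python 3, the built-in open() takes the same encoding argument, and passing it explicitly is safer than relying on the platform default.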