python - I want to save my parsed HTML file into a TXT file


I've parsed a web page showing an article. I want to save the parsed data to a text file, but the Python shell shows this error:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 107: ordinal not in range(128)
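The error means a character outside the ASCII range (U+2019, the typographic right single quotation mark often used as an apostrophe) had to be encoded with the `ascii` codec, which only covers code points 0-127. In Python 2, calling `write()` with a `unicode` object on a file opened with plain `open()` performs exactly that implicit ASCII encoding. The minimal sketch below reproduces the failure explicitly in Python 3, using a made-up sample string, and shows that UTF-8 handles the same character.

```python
# Sample string (hypothetical) containing U+2019, the typographic apostrophe
# that commonly appears in scraped article text.
text = "It\u2019s an article"

ascii_error = None
try:
    text.encode("ascii")  # ASCII only covers code points 0-127
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
    print("ascii failed:", ascii_error)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
print("utf-8 ok:", utf8_bytes)
```

U+2019 becomes the three bytes `E2 80 99` in UTF-8, which is why a UTF-8 output file can store the text while an ASCII one cannot.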

Here is the relevant part of my code:

    import urllib
    import xml.dom.minidom
    from bs4 import BeautifulSoup

    search_result = urllib.urlopen(url)
    f = search_result.read()

    # XML parsing
    parsedresult = xml.dom.minidom.parseString(f)
    linklist = parsedresult.getElementsByTagName('link')

    # extracting links
    extractedurl = linklist[3].firstChild.nodeValue  # pick one link
    page = urllib.urlopen(extractedurl).read()

    # making the HTML file
    g = open('yyyy.html', 'w')
    g.write(page)
    g.close()

    # reading the HTML file and parsing it into the pure text of the article
    g = open('yyyy.html', 'r')
    bs = BeautifulSoup(g, from_encoding="utf-8")
    g.close()
    article = bs.find(id="articlebody")
    content = article.get_text()  # a unicode object

    # save as a text file
    h = open('yyyy.txt', 'w')
    h.write(content)  # the UnicodeEncodeError is raised here
    h.close()
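The `linklist[3].firstChild.nodeValue` line picks the fourth `<link>` element in document order and reads its text node. A minimal Python 3 sketch of that step, run against a small hypothetical RSS feed (the feed content and URLs are invented for illustration, not taken from the question):

```python
import xml.dom.minidom

# Hypothetical feed standing in for the real search result.
feed = b"""<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <link>http://example.com/</link>
  <item><title>One</title><link>http://example.com/a1</link></item>
  <item><title>Two</title><link>http://example.com/a2</link></item>
  <item><title>Three</title><link>http://example.com/a3</link></item>
</channel></rss>"""

doc = xml.dom.minidom.parseString(feed)

# Every <link> element, in document order (the channel link comes first).
links = doc.getElementsByTagName("link")
urls = [node.firstChild.nodeValue for node in links]

# Index 3 is the fourth <link>, as in the question's code.
picked = links[3].firstChild.nodeValue
print(picked)
```

Because the channel's own `<link>` counts too, the index you need depends on the feed's layout.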

What should I add to make it work?

Try opening the output file with codecs.open and an explicit UTF-8 encoding:

    import codecs
    h = codecs.open('yyyy.txt', 'w', 'utf-8')
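A minimal sketch of that fix end to end, assuming `content` is the Unicode text returned by `get_text()` (a hypothetical sample string is used here). It writes into a temporary directory rather than `yyyy.txt` only so the example doesn't overwrite real files, then reads the raw bytes back to show what landed on disk.

```python
import codecs
import os
import tempfile

# Stand-in for article.get_text(); contains U+2019.
content = u"It\u2019s an article"

# Temporary location; in the question's script this is simply 'yyyy.txt'.
path = os.path.join(tempfile.mkdtemp(), "yyyy.txt")

# codecs.open returns a file-like object that encodes unicode text as UTF-8
# on write, instead of falling back to the ASCII codec.
with codecs.open(path, "w", "utf-8") as h:
    h.write(content)

# Read the raw bytes back to confirm what ended up on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

This runs on both Python 2 and Python 3, though in Python 3 the built-in `open()` with an `encoding` argument is the more usual choice.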

Or use Python 3, whose built-in open() accepts an encoding argument directly.
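A minimal Python 3 sketch of the final step, again with a hypothetical sample string in place of `article.get_text()` and a temporary path in place of `yyyy.txt`. Passing `encoding="utf-8"` explicitly matters because, without it, `open()` uses the locale's preferred encoding, which is not guaranteed to be UTF-8.

```python
import os
import tempfile

# Stand-in for article.get_text(); in Python 3 this is a str (Unicode) object.
content = "It\u2019s an article"

# Temporary location; the question's script writes to 'yyyy.txt'.
path = os.path.join(tempfile.mkdtemp(), "yyyy.txt")

# The built-in open() takes an encoding, so no codecs import is needed.
with open(path, "w", encoding="utf-8") as h:
    h.write(content)

# Reading back with the same encoding restores the original text.
with open(path, encoding="utf-8") as h:
    round_trip = h.read()
print(round_trip == content)  # True
```

If you keep the intermediate HTML file, note that `urllib.request.urlopen(...).read()` returns bytes in Python 3, so that file must be opened in `'wb'` mode.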

