python - I want to save my parsed HTML file into a TXT file

- July 15, 2013

I've parsed a web page showing an article. I want to save the parsed data to a text file, but the Python shell shows an error like this:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 107: ordinal not in range(128)

And here is part of my code:

    search_result = urllib.urlopen(url)
    f = search_result.read()

    # XML parsing
    parsedResult = xml.dom.minidom.parseString(f)
    linklist = parsedResult.getElementsByTagName('link')

    # extracting links
    extractedurl = linklist[3].firstChild.nodeValue  # pick one link
    page = urllib.urlopen(extractedurl).read()

    # making the HTML file
    g = open('yyyy.html', 'w')
    g.write(page)
    g.close()

    # reading the HTML file and parsing it into the pure text of the article
    g = open('yyyy.html', 'r')
    bs = BeautifulSoup(g, fromEncoding="utf-8")
    g.close()
    article = bs.find(id="articlebody")
    content = article.get_text()

    # save the text file
    h = open('yyyy.txt', 'w')
    h.write(content)
    h.close()

What should I add to make this work?

Try opening the output file with an explicit encoding:

    import codecs
    h = codecs.open('yyyy.txt', 'w', 'utf-8')

Or use Python 3, where strings are Unicode by default and open() accepts an encoding argument.
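As a rough illustration of why the explicit encoding helps, here is a minimal sketch that should run on both Python 2 and Python 3. The `content` string is a made-up stand-in for whatever `article.get_text()` returns, and the temporary path is only there so the example does not touch a real file. It uses `io.open`, which, like `codecs.open`, gives a file object that encodes Unicode text on write.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the Unicode text returned by article.get_text();
# it contains the same character (u'\u2019') that the traceback complains about.
content = u"It\u2019s the article body"

# Encoding to ASCII fails. Python 2 attempts this conversion implicitly when
# a unicode string is written to a file opened with plain open().
try:
    content.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Writing through a file object that encodes to UTF-8 avoids the error.
path = os.path.join(tempfile.mkdtemp(), "yyyy.txt")
with io.open(path, "w", encoding="utf-8") as h:
    h.write(content)

# Read the file back with the same encoding to confirm the text survived.
with io.open(path, "r", encoding="utf-8") as h:
    round_trip = h.read()
```

With the original code, the only change needed would presumably be the line that opens `yyyy.txt`; the rest of the script can stay as it is.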
