Reading a binary file backwards using Python
I am trying to read a file backwards (from the end to the beginning). My example is below. I am asking the community: is there a more elegant solution to this?
    import os, binascii

    chunk = 10  # read the file in blocks (use a bigger size in practice)
    src_file_path = 'd:\\src\\python\\test\\main.zip'
    src_file_size = os.path.getsize(src_file_path)
    src_file = open(src_file_path, 'rb')  # open in binary mode

    while src_file_size > 0:
        # read the file last byte first :)
        if src_file_size > chunk:
            src_file.seek(src_file_size - chunk)
            byte_list = src_file.read(chunk)
        else:
            src_file.seek(0)
            byte_list = src_file.read(src_file_size)
        s = binascii.hexlify(byte_list)  # convert b'\xfb' -> b'fb'
        byte_list = [(chr(s[i]) + chr(s[i+1])) for i in range(0, len(s), 2)]  # split, see note below
        print(byte_list[::-1])  # output the reversed list
        src_file_size = src_file_size - chunk

    src_file.close()  # close the file
UPD: I would like to know the opinion of experts: what should I, as a newbie in Python, pay attention to? Are there any potential flaws in this code?
Thanks in advance.
I'm using Python 3.3.1. Note: the "split" line is where I split the bytes!
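About the split step noted in the question: in Python 3, indexing or iterating over a bytes object yields integers, which is why chr() is needed after hexlify(). A minimal sketch comparing that step with formatting each byte directly (the sample bytes are made up for illustration):

```python
import binascii

data = b'\xfb\x01\x10'  # made-up sample bytes

# The split step from the question: hexlify, then pair up the hex digits.
# Indexing bytes gives ints in Python 3, hence the chr() calls.
s = binascii.hexlify(data)  # b'fb0110'
pairs = [chr(s[i]) + chr(s[i + 1]) for i in range(0, len(s), 2)]

# A more direct way: iterating bytes also yields ints, so format each one.
direct = ['{:02x}'.format(b) for b in data]

print(pairs)   # ['fb', '01', '10']
print(direct)  # ['fb', '01', '10']
```

Both lists are equal, so the intermediate hex string is not needed just to get two-digit byte values.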
I can see several things that could be improved in the code in the question. Firstly, a while loop is rarely used in Python, because there is usually a better way to express the same thing using a for loop or built-in functions.
I guess the code is purely for training purposes or similar. Otherwise, I would first ask what the real goal is (because once the problem is known, the better solution may differ from the first idea).
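If the real goal is simply to get a file's bytes last-first, a minimal sketch could wrap the backwards reading in a generator, so the caller never handles positions at all. The name iter_bytes_reversed and the 4096-byte default are illustrative assumptions; the size is found with seek(0, os.SEEK_END) and tell() instead of os.path.getsize(). The yield from syntax needs Python 3.3, which the question uses.

```python
import os
import tempfile

def iter_bytes_reversed(path, chunk_size=4096):
    """Yield the bytes of the file at path as ints, last byte first."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)  # jump to the end...
        size = f.tell()         # ...to learn the file size
        # 'end' walks backwards from the file size in chunk_size steps.
        for end in range(size, 0, -chunk_size):
            start = max(end - chunk_size, 0)  # the first block may be shorter
            f.seek(start)
            # Reverse each block so the caller sees the bytes last-first.
            yield from reversed(f.read(end - start))

# Usage with a throwaway file:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello, world')
result = bytes(iter_bytes_reversed(tmp.name, chunk_size=5))
os.remove(tmp.name)
print(result)  # b'dlrow ,olleh'
```

Because the loop runs over end offsets rather than start offsets, files smaller than chunk_size (and empty files) need no special case.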
The goal here is to get the positions for seek. You know the size, you know the chunk size, and you want to go backwards. Python has a built-in for exactly this purpose, named range. Usually it is called with a single argument; however, range(start, stop, step) is the full form. A range can be iterated in a for loop, or you can use its values to build a list (but you do not need the latter here). The positions for seek can be generated like this:
    chunk = 10
    sz = 235
    lst = list(range(sz - chunk, 0, -chunk))
    print(lst)
I.e., start at the sz - chunk position, stop at 0 (exclusive), and use a negative step to get the next generated value. Here list() iterates through the values and builds a list of them. But you can iterate directly through the generated values:
    for pos in range(sz - chunk, 0, -chunk):
        print('seek({}) , read({})'.format(pos, chunk))
    if pos > 0:
        print('seek({}) , read({})'.format(0, pos))
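The last generated position is always positive, because the stop value 0 itself is never produced. This way, the final if processes the first portion of the file, which is shorter than chunk unless the size is a multiple of chunk. Putting the above code together, it prints: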
    c:\tmp\_python\wikicsm\so16443185>py a.py
    [225, 215, 205, 195, 185, 175, 165, 155, 145, 135, 125, 115, 105, 95, 85, 75, 65, 55, 45, 35, 25, 15, 5]
    seek(225) , read(10)
    seek(215) , read(10)
    seek(205) , read(10)
    seek(195) , read(10)
    seek(185) , read(10)
    seek(175) , read(10)
    seek(165) , read(10)
    seek(155) , read(10)
    seek(145) , read(10)
    seek(135) , read(10)
    seek(125) , read(10)
    seek(115) , read(10)
    seek(105) , read(10)
    seek(95) , read(10)
    seek(85) , read(10)
    seek(75) , read(10)
    seek(65) , read(10)
    seek(55) , read(10)
    seek(45) , read(10)
    seek(35) , read(10)
    seek(25) , read(10)
    seek(15) , read(10)
    seek(5) , read(10)
    seek(0) , read(5)
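One caveat of the loop above: when the file is not larger than chunk, the range is empty, pos is never assigned, and the final if raises NameError. A sketch of a hypothetical helper, backward_chunks, that yields (pos, length) pairs instead and also covers small and empty files:

```python
def backward_chunks(size, chunk):
    """Yield (pos, length) pairs covering [0, size), last chunk first."""
    # Walk the end offset backwards; the first chunk of the file
    # (yielded last) may be shorter than 'chunk'.
    for end in range(size, 0, -chunk):
        start = max(end - chunk, 0)
        yield start, end - start

pairs = list(backward_chunks(235, 10))
print(len(pairs), pairs[0], pairs[-2], pairs[-1])  # 24 (225, 10) (5, 10) (0, 5)
print(list(backward_chunks(7, 10)))                # [(0, 7)]
print(list(backward_chunks(0, 10)))                # []
```

With it, each loop iteration is just f.seek(pos) followed by f.read(length), with no special case after the loop.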
I would replace the prints with calls to a function that takes the file object, pos, and the chunk size. Here is a faked body that produces the same prints:
    #!python3

    import os

    def processchunk(f, pos, chunk_size):
        print('faked f: seek({}) , read({})'.format(pos, chunk_size))

    fname = 'a.txt'
    sz = os.path.getsize(fname)  # not checking existence for simplicity
    chunk = 16

    with open(fname, 'rb') as f:
        for pos in range(sz - chunk, 0, -chunk):
            processchunk(f, pos, chunk)
        if pos > 0:
            processchunk(f, 0, pos)
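To try the same loop without creating a.txt, a sketch can substitute an in-memory io.BytesIO object, which supports the same seek() and read() calls and also works with with. Here processchunk returns the chunk instead of printing it (a hypothetical variation), and the 40 sample bytes are made up:

```python
import io

def processchunk(f, pos, chunk_size):
    # Hypothetical real body: read the chunk and return it instead of printing.
    f.seek(pos)
    return f.read(chunk_size)

data = bytes(range(40))  # 40 made-up bytes standing in for the contents of a.txt
sz = len(data)
chunk = 16
chunks = []

with io.BytesIO(data) as f:  # in-memory file object with seek() and read()
    for pos in range(sz - chunk, 0, -chunk):
        chunks.append(processchunk(f, pos, chunk))
    if pos > 0:
        chunks.append(processchunk(f, 0, pos))

print([len(c) for c in chunks])            # [16, 16, 8]
print(b''.join(reversed(chunks)) == data)  # True: the chunks came last-first
print(f.closed)                            # True: leaving the with block closed it
```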
The with construct is one to learn. (Warning: it has nothing in common with Pascal's with.) It closes the file object automatically after the block ends. Notice that the code below the with is more readable and does not need to change in the future. The processchunk function can then be developed further:
    import binascii

    def processchunk(f, pos, chunk_size):
        f.seek(pos)
        s = binascii.hexlify(f.read(chunk_size))
        print(s)
Or you can change it to produce a reversed hexdump (the full code was tested on my computer):
    #!python3

    import binascii
    import os

    def processchunk(f, pos, chunk_size):
        f.seek(pos)
        b = f.read(chunk_size)
        b1 = b[:8]  # first 8 bytes
        b2 = b[8:]  # the rest
        s1 = ' '.join('{:02x}'.format(x) for x in b1)
        s2 = ' '.join('{:02x}'.format(x) for x in b2)
        print('{:08x}:'.format(pos), s1, '|', s2)

    fname = 'a.txt'
    sz = os.path.getsize(fname)  # not checking existence for simplicity
    chunk = 16

    with open(fname, 'rb') as f:
        for pos in range(sz - chunk, 0, -chunk):
            processchunk(f, pos, chunk)
        if pos > 0:
            processchunk(f, 0, pos)
When a.txt is a copy of the last code, it produces:
    c:\tmp\_python\wikicsm\so16443185>py d.py
    00000274: 75 6e 6b 28 66 2c 20 30 | 2c 20 70 6f 73 29 0d 0a
    00000264: 20 20 20 20 20 20 20 70 | 72 6f 63 65 73 73 43 68
    00000254: 20 20 69 66 20 70 6f 73 | 20 3e 20 30 3a 0d 0a 20
    00000244: 6f 73 2c 20 63 68 75 6e | 6b 29 0d 0a 0d 0a 20 20
    00000234: 72 6f 63 65 73 73 43 68 | 75 6e 6b 28 66 2c 20 70
    00000224: 75 6e 6b 29 3a 0d 0a 20 | 20 20 20 20 20 20 20 70
    00000214: 20 2d 20 63 68 75 6e 6b | 2c 20 30 2c 20 2d 63 68
    00000204: 20 70 6f 73 20 69 6e 20 | 72 61 6e 67 65 28 73 7a
    000001f4: 61 73 20 66 3a 0d 0a 0d | 0a 20 20 20 20 66 6f 72
    000001e4: 65 6e 28 66 6e 61 6d 65 | 2c 20 27 72 62 27 29 20
    000001d4: 20 3d 20 31 36 0d 0a 0d | 0a 77 69 74 68 20 6f 70
    000001c4: 69 6d 70 6c 69 63 69 74 | 79 0d 0a 63 68 75 6e 6b
    000001b4: 20 65 78 69 73 74 65 6e | 63 65 20 66 6f 72 20 73
    000001a4: 20 20 23 20 6e 6f 74 20 | 63 68 65 63 6b 69 6e 67
    00000194: 65 74 73 69 7a 65 28 66 | 6e 61 6d 65 29 20 20 20
    00000184: 0d 0a 73 7a 20 3d 20 6f | 73 2e 70 61 74 68 2e 67
    00000174: 0a 66 6e 61 6d 65 20 3d | 20 27 61 2e 74 78 74 27
    00000164: 31 2c 20 27 7c 27 2c 20 | 73 32 29 0d 0a 0d 0a 0d
    00000154: 27 2e 66 6f 72 6d 61 74 | 28 70 6f 73 29 2c 20 73
    00000144: 20 20 70 72 69 6e 74 28 | 27 7b 3a 30 38 78 7d 3a
    00000134: 66 6f 72 20 78 20 69 6e | 20 62 32 29 0d 0a 20 20
    00000124: 30 32 78 7d 27 2e 66 6f | 72 6d 61 74 28 78 29 20
    00000114: 32 20 3d 20 27 20 27 2e | 6a 6f 69 6e 28 27 7b 3a
    00000104: 20 78 20 69 6e 20 62 31 | 29 0d 0a 20 20 20 20 73
    000000f4: 7d 27 2e 66 6f 72 6d 61 | 74 28 78 29 20 66 6f 72
    000000e4: 20 27 20 27 2e 6a 6f 69 | 6e 28 27 7b 3a 30 32 78
    000000d4: 65 20 72 65 73 74 0d 0a | 20 20 20 20 73 31 20 3d
    000000c4: 20 20 20 20 20 20 20 20 | 20 20 20 20 23 20 74 68
    000000b4: 62 32 20 3d 20 62 5b 38 | 3a 5d 20 20 20 20 20 20
    000000a4: 73 74 20 38 20 62 79 74 | 65 73 0d 0a 20 20 20 20
    00000094: 20 20 20 20 20 20 20 20 | 20 20 20 23 20 66 69 72
    00000084: 31 20 3d 20 62 5b 3a 38 | 5d 20 20 20 20 20 20 20
    00000074: 75 6e 6b 5f 73 69 7a 65 | 29 0d 0a 20 20 20 20 62
    00000064: 20 20 20 62 20 3d 20 66 | 2e 72 65 61 64 28 63 68
    00000054: 20 20 66 2e 73 65 65 6b | 28 70 6f 73 29 0d 0a 20
    00000044: 63 68 75 6e 6b 5f 73 69 | 7a 65 29 3a 0d 0a 20 20
    00000034: 73 73 43 68 75 6e 6b 28 | 66 2c 20 70 6f 73 2c 20
    00000024: 20 6f 73 0d 0a 0d 0a 64 | 65 66 20 70 72 6f 63 65
    00000014: 62 69 6e 61 73 63 69 69 | 0d 0a 69 6d 70 6f 72 74
    00000004: 74 68 6f 6e 33 0d 0a 0d | 0a 69 6d 70 6f 72 74 20
    00000000: 23 21 70 79 |
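To check that the dump really is the file read backwards, a row can be decoded back to bytes. A sketch using two rows copied from the output above; decode_row is a hypothetical helper, and bytes.fromhex() is the standard inverse of printing bytes as space-separated hex pairs:

```python
import binascii

# Two rows copied from the dump above (the last one and the one before it).
row_0000 = '23 21 70 79 |'
row_0004 = '74 68 6f 6e 33 0d 0a 0d | 0a 69 6d 70 6f 72 74 20'

def decode_row(row):
    # Drop the '|' separator; bytes.fromhex() skips the spaces between pairs.
    return bytes.fromhex(row.replace('|', ''))

head = decode_row(row_0000) + decode_row(row_0004)
print(head)  # b'#!python3\r\n\r\nimport '

# binascii.hexlify() goes the other way; unhexlify() undoes it.
hexed = binascii.hexlify(head[:4])
print(hexed)                      # b'23217079'
print(binascii.unhexlify(hexed))  # b'#!py'
```

Reading the rows bottom-up and decoding them in that order gives back the start of the original script, including its Windows line endings (0d 0a).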
For src_file_path = 'd:\\src\\python\\test\\main.zip', you can use forward slashes on Windows, as in src_file_path = 'd:/src/python/test/main.zip'. Or you can use a raw string: src_file_path = r'd:\src\python\test\main.zip'. Raw strings are mostly used when you need to avoid doubling backslashes, for example when writing regular expressions.
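A small sketch of how the three spellings relate. It uses pathlib.PureWindowsPath, which only exists from Python 3.4, newer than the 3.3.1 in the question, so treat that part as optional:

```python
from pathlib import PureWindowsPath  # Python 3.4+, newer than the 3.3.1 in the question

escaped = 'd:\\src\\python\\test\\main.zip'
raw = r'd:\src\python\test\main.zip'
slashes = 'd:/src/python/test/main.zip'

print(escaped == raw)      # True: the very same string, spelled two ways
print(escaped == slashes)  # False: the characters differ...
# ...but Windows accepts both separators, and pathlib normalizes them.
print(PureWindowsPath(escaped) == PureWindowsPath(slashes))  # True
print(PureWindowsPath(slashes))  # d:\src\python\test\main.zip
```

One caveat of raw strings: they cannot end with a single backslash, so r'd:\src\' is a syntax error.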