Reading a binary file backwards using Python
I am trying to read a file backwards (from the end to the beginning). My example is below. I would like to ask the community: is there a more elegant solution to this question?
    import os, binascii

    chunk = 10  # read file by blocks (big size)
    src_file_path = 'd:\\src\\python\\test\\main.zip'
    src_file_size = os.path.getsize(src_file_path)
    src_file = open(src_file_path, 'rb')  # open in binary mode
    while src_file_size > 0:  # read file from last byte to first :)
        if src_file_size > chunk:
            src_file.seek(src_file_size - chunk)
            byte_list = src_file.read(chunk)
        else:
            src_file.seek(0)
            byte_list = src_file.read(src_file_size)
        s = binascii.hexlify(byte_list)  # convert '\xfb' -> 'fb'
        byte_list = [(chr(s[i]) + chr(s[i+1])) for i in range(0, len(s), 2)]  # split, note below
        print(byte_list[::-1])  # output reverse list
        src_file_size = src_file_size - chunk
    src_file.close()  # close file

UPD: I would like to know the opinion of experts. What does a newbie in Python need to pay attention to? Is there any potential flaw in this code?
Thanks in advance.
I'm using Python 3.3.1. Note: how to split the bytes is described here!
I can see several things that could be improved in the code from the question. Firstly, the while loop is rarely used in Python because there is usually a better way to express the same thing using a for loop or built-in functions.
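As a rough illustration of that point, here is a minimal sketch of how the question's while loop might be restated with a for loop. The name read_chunks_backwards and the 25-byte temporary file are made up for this example and are not part of the original code; the sketch iterates over chunk end offsets, which is one possible choice among several.

```python
import os
import tempfile


def read_chunks_backwards(path, chunk_size):
    """Yield the file's content chunk by chunk, last chunk first.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        # Iterate over chunk *end* offsets: size, size - chunk_size, ...
        for end in range(size, 0, -chunk_size):
            start = max(end - chunk_size, 0)  # chunk at the file start may be short
            f.seek(start)
            yield f.read(end - start)


# Small demonstration on a throwaway 25-byte file (an assumption of this sketch).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(25)))
demo_chunks = list(read_chunks_backwards(tmp.name, 10))
os.remove(tmp.name)

print([len(c) for c in demo_chunks])  # [10, 10, 5]
```

Because the loop walks over end offsets, a file shorter than one chunk (or an empty file) needs no separate branch: the range is simply shorter or empty.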
I guess the code is purely for training purposes or so. Otherwise, I would first ask what the real goal is (because, knowing the problem, the better solution may be quite different from the first idea).
The goal here is to get the positions for seek. You know the size, you know the chunk size, and you want to go backwards. There is a built-in for that purpose in Python named range (a lazy sequence that is often loosely called a generator). Mostly a single argument is used; however, range(start, stop, step) is the full form. It can be iterated in a for loop, or you can use the values to build a list of them (but you often do not need the latter). The positions for seek can be generated like this:
    chunk = 10
    sz = 235
    lst = list(range(sz - chunk, 0, -chunk))
    print(lst)

I.e., you start at the sz - chunk position and stop at 0 (which itself is never generated), using a negative step to get the next generated value. Here list() iterates through the values and builds a list of them. But you can iterate directly through the generated values:
    for pos in range(sz - chunk, 0, -chunk):
        print('seek({}) , read({})'.format(pos, chunk))

    if pos > 0:
        print('seek({}) , read({})'.format(0, pos))

The last generated position is always positive, because the stop value 0 is excluded. This way, the last if processes the remaining portion at the start of the file, which may be shorter than chunk. Putting the above code together, it prints:
    c:\tmp\_python\wikicsm\so16443185>py a.py
    [225, 215, 205, 195, 185, 175, 165, 155, 145, 135, 125, 115, 105, 95, 85, 75, 65, 55, 45, 35, 25, 15, 5]
    seek(225) , read(10)
    seek(215) , read(10)
    seek(205) , read(10)
    seek(195) , read(10)
    seek(185) , read(10)
    seek(175) , read(10)
    seek(165) , read(10)
    seek(155) , read(10)
    seek(145) , read(10)
    seek(135) , read(10)
    seek(125) , read(10)
    seek(115) , read(10)
    seek(105) , read(10)
    seek(95) , read(10)
    seek(85) , read(10)
    seek(75) , read(10)
    seek(65) , read(10)
    seek(55) , read(10)
    seek(45) , read(10)
    seek(35) , read(10)
    seek(25) , read(10)
    seek(15) , read(10)
    seek(5) , read(10)
    seek(0) , read(5)

I would replace the prints by calling a function that takes the file object, the pos, and the chunk size. Here is a faked body that produces the same prints:
    #!python3
    import os

    def processChunk(f, pos, chunk_size):
        print('faked f: seek({}) , read({})'.format(pos, chunk_size))


    fname = 'a.txt'
    sz = os.path.getsize(fname)  # not checking existence for simplicity
    chunk = 16

    with open(fname, 'rb') as f:

        for pos in range(sz - chunk, 0, -chunk):
            processChunk(f, pos, chunk)

        if pos > 0:
            processChunk(f, 0, pos)

The with construct is another one to learn. (Warning: it is nothing similar to Pascal's with.) It closes the file object automatically after the block ends. Notice that the code below the with is more readable and need not be changed in the future. The processChunk can be developed further:
    import binascii  # needed for hexlify

    def processChunk(f, pos, chunk_size):
        f.seek(pos)
        s = binascii.hexlify(f.read(chunk_size))
        print(s)

Or you can change the result this way to get a reversed hexdump (the full code was tested on my computer):
    #!python3

    import binascii
    import os

    def processChunk(f, pos, chunk_size):
        f.seek(pos)
        b = f.read(chunk_size)
        b1 = b[:8]                  # first 8 bytes
        b2 = b[8:]                  # the rest
        s1 = ' '.join('{:02x}'.format(x) for x in b1)
        s2 = ' '.join('{:02x}'.format(x) for x in b2)
        print('{:08x}:'.format(pos), s1, '|', s2)


    fname = 'a.txt'
    sz = os.path.getsize(fname)     # not checking existence for simplicity
    chunk = 16

    with open(fname, 'rb') as f:

        for pos in range(sz - chunk, 0, -chunk):
            processChunk(f, pos, chunk)

        if pos > 0:
            processChunk(f, 0, pos)

When a.txt is a copy of the last code, it produces:
    c:\tmp\_python\wikicsm\so16443185>py d.py
    00000274: 75 6e 6b 28 66 2c 20 30 | 2c 20 70 6f 73 29 0d 0a
    00000264: 20 20 20 20 20 20 20 70 | 72 6f 63 65 73 73 43 68
    00000254: 20 20 69 66 20 70 6f 73 | 20 3e 20 30 3a 0d 0a 20
    00000244: 6f 73 2c 20 63 68 75 6e | 6b 29 0d 0a 0d 0a 20 20
    00000234: 72 6f 63 65 73 73 43 68 | 75 6e 6b 28 66 2c 20 70
    00000224: 75 6e 6b 29 3a 0d 0a 20 | 20 20 20 20 20 20 20 70
    00000214: 20 2d 20 63 68 75 6e 6b | 2c 20 30 2c 20 2d 63 68
    00000204: 20 70 6f 73 20 69 6e 20 | 72 61 6e 67 65 28 73 7a
    000001f4: 61 73 20 66 3a 0d 0a 0d | 0a 20 20 20 20 66 6f 72
    000001e4: 65 6e 28 66 6e 61 6d 65 | 2c 20 27 72 62 27 29 20
    000001d4: 20 3d 20 31 36 0d 0a 0d | 0a 77 69 74 68 20 6f 70
    000001c4: 69 6d 70 6c 69 63 69 74 | 79 0d 0a 63 68 75 6e 6b
    000001b4: 20 65 78 69 73 74 65 6e | 63 65 20 66 6f 72 20 73
    000001a4: 20 20 23 20 6e 6f 74 20 | 63 68 65 63 6b 69 6e 67
    00000194: 65 74 73 69 7a 65 28 66 | 6e 61 6d 65 29 20 20 20
    00000184: 0d 0a 73 7a 20 3d 20 6f | 73 2e 70 61 74 68 2e 67
    00000174: 0a 66 6e 61 6d 65 20 3d | 20 27 61 2e 74 78 74 27
    00000164: 31 2c 20 27 7c 27 2c 20 | 73 32 29 0d 0a 0d 0a 0d
    00000154: 27 2e 66 6f 72 6d 61 74 | 28 70 6f 73 29 2c 20 73
    00000144: 20 20 70 72 69 6e 74 28 | 27 7b 3a 30 38 78 7d 3a
    00000134: 66 6f 72 20 78 20 69 6e | 20 62 32 29 0d 0a 20 20
    00000124: 30 32 78 7d 27 2e 66 6f | 72 6d 61 74 28 78 29 20
    00000114: 32 20 3d 20 27 20 27 2e | 6a 6f 69 6e 28 27 7b 3a
    00000104: 20 78 20 69 6e 20 62 31 | 29 0d 0a 20 20 20 20 73
    000000f4: 7d 27 2e 66 6f 72 6d 61 | 74 28 78 29 20 66 6f 72
    000000e4: 20 27 20 27 2e 6a 6f 69 | 6e 28 27 7b 3a 30 32 78
    000000d4: 65 20 72 65 73 74 0d 0a | 20 20 20 20 73 31 20 3d
    000000c4: 20 20 20 20 20 20 20 20 | 20 20 20 20 23 20 74 68
    000000b4: 62 32 20 3d 20 62 5b 38 | 3a 5d 20 20 20 20 20 20
    000000a4: 73 74 20 38 20 62 79 74 | 65 73 0d 0a 20 20 20 20
    00000094: 20 20 20 20 20 20 20 20 | 20 20 20 23 20 66 69 72
    00000084: 31 20 3d 20 62 5b 3a 38 | 5d 20 20 20 20 20 20 20
    00000074: 75 6e 6b 5f 73 69 7a 65 | 29 0d 0a 20 20 20 20 62
    00000064: 20 20 20 62 20 3d 20 66 | 2e 72 65 61 64 28 63 68
    00000054: 20 20 66 2e 73 65 65 6b | 28 70 6f 73 29 0d 0a 20
    00000044: 63 68 75 6e 6b 5f 73 69 | 7a 65 29 3a 0d 0a 20 20
    00000034: 73 73 43 68 75 6e 6b 28 | 66 2c 20 70 6f 73 2c 20
    00000024: 20 6f 73 0d 0a 0d 0a 64 | 65 66 20 70 72 6f 63 65
    00000014: 62 69 6e 61 73 63 69 69 | 0d 0a 69 6d 70 6f 72 74
    00000004: 74 68 6f 6e 33 0d 0a 0d | 0a 69 6d 70 6f 72 74 20
    00000000: 23 21 70 79 |

For your src_file_path = 'd:\\src\\python\\test\\main.zip', you can also use forward slashes in Windows, like src_file_path = 'd:/src/python/test/main.zip'. Or you can use a raw string, like src_file_path = r'd:\src\python\test\main.zip'. The last form is used when you need to avoid doubling the backslashes, typically when writing regular expressions.
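The difference between these string spellings can be checked directly. The following is only a small sketch: the variable names are invented for the illustration, and the short 'd:\test' literal is an assumed example of what can go wrong when a backslash is neither doubled nor inside a raw string.

```python
# Three spellings of the path from the question (names are illustrative only).
doubled = 'd:\\src\\python\\test\\main.zip'  # escaped backslashes
raw = r'd:\src\python\test\main.zip'         # raw string, no escaping needed
forward = 'd:/src/python/test/main.zip'      # forward slashes

print(doubled == raw)                        # True: same characters
print(forward.replace('/', '\\') == raw)     # True: differs only in the separator

# Without doubling or the r prefix, '\t' silently becomes a TAB character.
broken = 'd:\test'
print(len(broken), len(r'd:\test'))          # 6 7
```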