Reading a binary file backwards using Python


I am trying to read a file backwards (from the end to the beginning). My example is below, and I would like to ask the community: is there a more elegant solution to this problem?

import os, binascii

chunk = 10 #read file by blocks (big size)
src_file_path = 'd:\\src\\python\\test\\main.zip'

src_file_size = os.path.getsize(src_file_path)
src_file = open(src_file_path, 'rb') #open in binary mode
while src_file_size > 0:
    #read file from last byte to first :)
    if src_file_size > chunk:
        src_file.seek(src_file_size - chunk)
        byte_list = src_file.read(chunk)
    else:
        src_file.seek(0)
        byte_list = src_file.read(src_file_size)
    s = binascii.hexlify(byte_list) #convert '\xfb' -> 'fb'
    byte_list = [(chr(s[i]) + chr(s[i+1])) for i in range(0, len(s), 2)] #split, note below
    print(byte_list[::-1]) #output reverse list
    src_file_size = src_file_size - chunk
src_file.close() #close file

UPD: I would like to know the opinion of experts: what should I pay attention to as a newbie in Python? Is there any potential flaw in this code?

Thanks in advance.

I'm using Python 3.3.1. Note: the approach to splitting the bytes was borrowed from another post.

I can see several things that could be improved in the code from the question. Firstly, the while loop is rarely used in Python because there is usually a better way to express the same thing with a for loop or with built-in functions.

I guess the code is purely for training purposes or so. Otherwise, I would first ask what the real goal is (because once the problem is known, a better solution may be quite different from the first idea).

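The goal here is to get the positions for seek. You know the size, you know the chunk size, and you want to go backwards. There is a built-in for this purpose in Python named range; in Python 3 it produces its values lazily, much like a generator. Usually a single argument is used; however, range(start, stop, step) is the full form. The result can be iterated in a for loop, or you can use the values to build a list of them (but you often do not need the latter). The positions for seek can be generated like this: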

chunk = 10
sz = 235

lst = list(range(sz - chunk, 0, -chunk))
print(lst)

I.e., you start at position sz - chunk and stop before 0 (the stop value itself is not generated), using the negative step to get each next value. Here list() iterates through all the values and builds a list of them. But you can iterate directly through the generated values:

for pos in range(sz - chunk, 0, -chunk):
    print('seek({}) , read({})'.format(pos, chunk))

if pos > 0:
    print('seek({}) , read({})'.format(0, pos))

The last generated position is always positive. This way, the final if processes the remaining portion at the beginning of the file, which may be shorter than chunk. Putting the above code together, it prints:

c:\tmp\_python\wikicsm\so16443185>py a.py
[225, 215, 205, 195, 185, 175, 165, 155, 145, 135, 125, 115, 105, 95, 85, 75, 65, 55, 45, 35, 25, 15, 5]
seek(225) , read(10)
seek(215) , read(10)
seek(205) , read(10)
seek(195) , read(10)
seek(185) , read(10)
seek(175) , read(10)
seek(165) , read(10)
seek(155) , read(10)
seek(145) , read(10)
seek(135) , read(10)
seek(125) , read(10)
seek(115) , read(10)
seek(105) , read(10)
seek(95) , read(10)
seek(85) , read(10)
seek(75) , read(10)
seek(65) , read(10)
seek(55) , read(10)
seek(45) , read(10)
seek(35) , read(10)
seek(25) , read(10)
seek(15) , read(10)
seek(5) , read(10)
seek(0) , read(5)
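One detail worth checking by hand: range never yields its stop value, so the loop never produces position 0 itself, and when the file is not longer than one chunk the loop body does not run at all (in the snippet above, pos would then be unassigned when the if is reached). Below is a minimal sketch of the same seek plan written as a plain function. The name seek_plan and the (position, length) tuples are illustrative assumptions, not part of the original code.

```python
def seek_plan(sz, chunk):
    """Return (position, length) pairs covering sz bytes, last chunk first."""
    plan = [(pos, chunk) for pos in range(sz - chunk, 0, -chunk)]
    # Whatever lies before the lowest position is the (possibly shorter) head.
    head = plan[-1][0] if plan else sz
    if head > 0:
        plan.append((0, head))
    return plan


print(seek_plan(235, 10)[:2])   # the two highest positions
print(seek_plan(235, 10)[-1])   # the remaining head of the file
print(seek_plan(20, 10))        # size is an exact multiple of chunk
print(seek_plan(5, 10))         # file shorter than one chunk
print(seek_plan(0, 10))         # empty file
```

The lengths in each plan add up to sz, which is a quick sanity check that no byte is skipped or read twice.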

I would replace the prints by a call to a function that takes the file object, pos, and the chunk size. Here is a faked body that produces the same kind of prints:

#!python3
import os

def processChunk(f, pos, chunk_size):
    print('faked f: seek({}) , read({})'.format(pos, chunk_size))


fname = 'a.txt'
sz = os.path.getsize(fname)     # not checking existence for simplicity
chunk = 16

with open(fname, 'rb') as f:
    for pos in range(sz - chunk, 0, -chunk):
        processChunk(f, pos, chunk)

    if pos > 0:
        processChunk(f, 0, pos)

The with construct is another one to learn. (Warning: it is nothing like Pascal's with.) It closes the file object automatically after the block ends. Notice that the code inside the with block is more readable and need not be changed in the future. The processChunk function can be developed further:

import binascii

def processChunk(f, pos, chunk_size):
    f.seek(pos)                                 # jump to the start of the chunk
    s = binascii.hexlify(f.read(chunk_size))    # bytes object of hex digits
    print(s)
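As a side note on the hex conversion: in Python 3, binascii.hexlify returns a bytes object, and iterating over bytes yields integers, which is why the code in the question needs chr() to rebuild the two-character pairs. Here is a small sketch of an alternative that formats each byte directly. The sample data and the helper name hex_pairs are made up for illustration.

```python
import binascii


def hex_pairs(data):
    """Return one two-character hex string per byte of data."""
    # Iterating over a bytes object yields ints in Python 3.
    return ['{:02x}'.format(b) for b in data]


sample = b'\xfb\x00PK'                 # arbitrary illustrative bytes
print(binascii.hexlify(sample))        # a bytes object of hex digits
print(hex_pairs(sample))               # list of per-byte strings
print(hex_pairs(sample)[::-1])         # same list, last byte first
```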

Alternatively, processChunk can be changed so that the result is a reversed hexdump (the full code was tested on my computer):

#!python3

import binascii
import os

def processChunk(f, pos, chunk_size):
    f.seek(pos)
    b = f.read(chunk_size)
    b1 = b[:8]                  # first 8 bytes
    b2 = b[8:]                  # the rest
    s1 = ' '.join('{:02x}'.format(x) for x in b1)
    s2 = ' '.join('{:02x}'.format(x) for x in b2)
    print('{:08x}:'.format(pos), s1, '|', s2)


fname = 'a.txt'
sz = os.path.getsize(fname)     # not checking existence for simplicity
chunk = 16

with open(fname, 'rb') as f:

    for pos in range(sz - chunk, 0, -chunk):
        processChunk(f, pos, chunk)

    if pos > 0:
        processChunk(f, 0, pos)

When a.txt is a copy of the last code, it produces:

c:\tmp\_python\wikicsm\so16443185>py d.py
00000274: 75 6e 6b 28 66 2c 20 30 | 2c 20 70 6f 73 29 0d 0a
00000264: 20 20 20 20 20 20 20 70 | 72 6f 63 65 73 73 43 68
00000254: 20 20 69 66 20 70 6f 73 | 20 3e 20 30 3a 0d 0a 20
00000244: 6f 73 2c 20 63 68 75 6e | 6b 29 0d 0a 0d 0a 20 20
00000234: 72 6f 63 65 73 73 43 68 | 75 6e 6b 28 66 2c 20 70
00000224: 75 6e 6b 29 3a 0d 0a 20 | 20 20 20 20 20 20 20 70
00000214: 20 2d 20 63 68 75 6e 6b | 2c 20 30 2c 20 2d 63 68
00000204: 20 70 6f 73 20 69 6e 20 | 72 61 6e 67 65 28 73 7a
000001f4: 61 73 20 66 3a 0d 0a 0d | 0a 20 20 20 20 66 6f 72
000001e4: 65 6e 28 66 6e 61 6d 65 | 2c 20 27 72 62 27 29 20
000001d4: 20 3d 20 31 36 0d 0a 0d | 0a 77 69 74 68 20 6f 70
000001c4: 69 6d 70 6c 69 63 69 74 | 79 0d 0a 63 68 75 6e 6b
000001b4: 20 65 78 69 73 74 65 6e | 63 65 20 66 6f 72 20 73
000001a4: 20 20 23 20 6e 6f 74 20 | 63 68 65 63 6b 69 6e 67
00000194: 65 74 73 69 7a 65 28 66 | 6e 61 6d 65 29 20 20 20
00000184: 0d 0a 73 7a 20 3d 20 6f | 73 2e 70 61 74 68 2e 67
00000174: 0a 66 6e 61 6d 65 20 3d | 20 27 61 2e 74 78 74 27
00000164: 31 2c 20 27 7c 27 2c 20 | 73 32 29 0d 0a 0d 0a 0d
00000154: 27 2e 66 6f 72 6d 61 74 | 28 70 6f 73 29 2c 20 73
00000144: 20 20 70 72 69 6e 74 28 | 27 7b 3a 30 38 78 7d 3a
00000134: 66 6f 72 20 78 20 69 6e | 20 62 32 29 0d 0a 20 20
00000124: 30 32 78 7d 27 2e 66 6f | 72 6d 61 74 28 78 29 20
00000114: 32 20 3d 20 27 20 27 2e | 6a 6f 69 6e 28 27 7b 3a
00000104: 20 78 20 69 6e 20 62 31 | 29 0d 0a 20 20 20 20 73
000000f4: 7d 27 2e 66 6f 72 6d 61 | 74 28 78 29 20 66 6f 72
000000e4: 20 27 20 27 2e 6a 6f 69 | 6e 28 27 7b 3a 30 32 78
000000d4: 65 20 72 65 73 74 0d 0a | 20 20 20 20 73 31 20 3d
000000c4: 20 20 20 20 20 20 20 20 | 20 20 20 20 23 20 74 68
000000b4: 62 32 20 3d 20 62 5b 38 | 3a 5d 20 20 20 20 20 20
000000a4: 73 74 20 38 20 62 79 74 | 65 73 0d 0a 20 20 20 20
00000094: 20 20 20 20 20 20 20 20 | 20 20 20 23 20 66 69 72
00000084: 31 20 3d 20 62 5b 3a 38 | 5d 20 20 20 20 20 20 20
00000074: 75 6e 6b 5f 73 69 7a 65 | 29 0d 0a 20 20 20 20 62
00000064: 20 20 20 62 20 3d 20 66 | 2e 72 65 61 64 28 63 68
00000054: 20 20 66 2e 73 65 65 6b | 28 70 6f 73 29 0d 0a 20
00000044: 63 68 75 6e 6b 5f 73 69 | 7a 65 29 3a 0d 0a 20 20
00000034: 73 73 43 68 75 6e 6b 28 | 66 2c 20 70 6f 73 2c 20
00000024: 20 6f 73 0d 0a 0d 0a 64 | 65 66 20 70 72 6f 63 65
00000014: 62 69 6e 61 73 63 69 69 | 0d 0a 69 6d 70 6f 72 74
00000004: 74 68 6f 6e 33 0d 0a 0d | 0a 69 6d 70 6f 72 74 20
00000000: 23 21 70 79 |
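The same idea can also be packaged as a generator, so that the caller simply loops over the chunks. This is a sketch under assumptions rather than the code from the answer: the name read_chunks_backwards is hypothetical, the file size is taken from the return value of seek(0, os.SEEK_END), and a file shorter than one chunk is handled by clamping the start position at 0.

```python
import os
import tempfile


def read_chunks_backwards(f, chunk_size):
    """Yield the content of binary file f in chunks, from the end to the start."""
    size = f.seek(0, os.SEEK_END)             # seek returns the new position
    for end in range(size, 0, -chunk_size):
        start = max(0, end - chunk_size)      # the chunk at offset 0 may be shorter
        f.seek(start)
        yield f.read(end - start)


# Demonstration on a throwaway file with known content.
with tempfile.TemporaryFile() as f:
    f.write(bytes(range(37)))
    chunks = list(read_chunks_backwards(f, 16))

print([len(c) for c in chunks])
# Reversing each chunk and concatenating gives the whole file byte-reversed.
print(b''.join(c[::-1] for c in chunks) == bytes(range(37))[::-1])
```

Here the range runs over chunk end positions instead of start positions, which removes the need for a separate if after the loop.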

As for src_file_path = 'd:\\src\\python\\test\\main.zip', you can use forward slashes, as in src_file_path = 'd:/src/python/test/main.zip', also on Windows. Or you can use a raw string: src_file_path = r'd:\src\python\test\main.zip'. The raw form is used when you need to avoid doubling backslashes, most often when writing regular expressions.
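The three spellings can be compared directly. The sketch below uses the standard ntpath module (the Windows flavour of os.path, importable on any platform) only to show that the forward-slash form normalizes to the same path; it does not touch the file system.

```python
import ntpath

escaped = 'd:\\src\\python\\test\\main.zip'   # doubled backslashes
raw = r'd:\src\python\test\main.zip'          # raw string, no doubling needed
forward = 'd:/src/python/test/main.zip'       # forward slashes

print(escaped == raw)                         # both literals hold the same characters
print(ntpath.normpath(forward) == raw)        # normalization turns / into backslashes
```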

