File-based fuzzing and PyDbg
So, the main steps are as follows (as you may recall from the above-mentioned post):
- Hook the call to function CreateFile() to know if it is the right file
- Hook the call to function ReadFile() (or fgets(), for example), if it corresponds to a file handle that we are interested in.
- Set a memory breakpoint at the buffer where ReadFile has written the file data.
- On a memory access violation, calculate the offset into the buffer, which “may” correspond to an offset in the file.
    import re

    from pydbg import *
    from pydbg.defines import *     # DBG_CONTINUE and friends
    import utils                    # PaiMei utils package (provides hook_container)

    dbg = pydbg()                   # instance of the pydbg class, used in main
    hooks = utils.hook_container()  # container used to add/delete hooks
    openedFiles = {}                # file handle -> file name, for the files we care about

The callback function that may be used for the CreateFile hook:

    def CreateFileReturn(dbg, argu, ret):
        print("Exiting CreateFile")
        # argu[0] points to the file name; read up to 100 bytes of it
        dataMem = dbg.read_process_memory(argu[0], 100)
        fileName = dbg.get_ascii_string(dataMem)
        if fileName is not False:
            # only track the file types we are interested in
            if re.search(r"\.(mp3|pdf)", fileName):
                print("created file: %s" % fileName)
                openedFiles[ret] = fileName  # ret is the handle returned by CreateFile
                print("return val: %s" % hex(ret))
        return DBG_CONTINUE

We are mainly checking the 1st argument of CreateFile (argu[0]) because it is the name of the file that is being opened. CreateFile returns a handle to the file, and we add that handle to the dictionary openedFiles. From this, we know which ReadFile calls to monitor! Remember to remove the entry from openedFiles once the handle has been closed (how? Well, hook CloseHandle()).
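The CloseHandle() clean-up hinted at above might look roughly like the following. This is only a sketch under a few assumptions: the callback name CloseHandleEntry is made up here, an entry-hook callback is assumed to receive (dbg, args) with args[0] being the handle passed to CloseHandle, and DBG_CONTINUE is defined locally as a stand-in for the constant from pydbg.defines so that the snippet runs on its own.

```python
# Stand-in for pydbg.defines.DBG_CONTINUE so this sketch runs without pydbg.
DBG_CONTINUE = 0x00010002

# handle -> file name, filled in by the CreateFile hook
openedFiles = {}

def CloseHandleEntry(dbg, args):
    # Hypothetical entry hook for CloseHandle(); args[0] is assumed to be the handle.
    handle = args[0]
    # Forget the handle if we were tracking it; ignore all other handles.
    fileName = openedFiles.pop(handle, None)
    if fileName is not None:
        print("closed file: %s (handle 0x%x)" % (fileName, handle))
    return DBG_CONTINUE
```

Since Windows can reuse handle values, dropping stale entries should keep a later, unrelated ReadFile from being mistaken for a read of our file.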
Now we are ready to hook the ReadFile call. The 2nd argument to ReadFile is the buffer into which the data is copied. Therefore, on return, we want to set a memory breakpoint at the address of that buffer.

    def ReadFileReturn(dbg, argu, ret):
        # argu[0] is the file handle; only react to handles recorded by the CreateFile hook
        if argu[0] in openedFiles:
            print("setting mem BP from 0x%08x to 0x%08x" % (argu[1], argu[1] + argu[2]))
            dbg.bp_set_mem(argu[1], argu[2], description="", handler=buffer_access_handler)
        return DBG_CONTINUE

In the above code, argu[1] is the address of the buffer and argu[2] is the length of the buffer. It should be noted that memory breakpoints are set on the boundaries of the page in which the required buffer is located. A memory access violation is triggered if any address belonging to the buffer is accessed.
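To make the page-boundary remark concrete, here is a small stand-alone sketch (plain Python, no PyDbg involved) that works out which pages a buffer touches and whether a given address falls inside the buffer or merely on one of those pages. The 4 KB page size, the helper names and the sample addresses are assumptions made for illustration.

```python
PAGE_SIZE = 0x1000  # assumed page size (4 KB, typical on x86 Windows)

def pages_for_buffer(buf_start, buf_len):
    """Return the base addresses of all pages touched by a non-empty buffer."""
    first = buf_start & ~(PAGE_SIZE - 1)
    last = (buf_start + buf_len - 1) & ~(PAGE_SIZE - 1)
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))

def classify_access(buf_start, buf_len, address):
    """Tell a hit inside the buffer apart from a hit elsewhere on a covered page."""
    if buf_start <= address < buf_start + buf_len:
        return "buffer"
    if (address & ~(PAGE_SIZE - 1)) in pages_for_buffer(buf_start, buf_len):
        return "same page"
    return "unrelated"

# A 0x300-byte buffer that straddles two pages (made-up addresses).
print([hex(p) for p in pages_for_buffer(0x00A71F00, 0x300)])
print(classify_access(0x00A71F00, 0x300, 0x00A72010))
print(classify_access(0x00A71F00, 0x300, 0x00A71010))
```

An access classified as "same page" is presumably the case that a bp_is_ours_mem() check in the access handler is meant to filter out.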
The layout of the page and the buffer inside it looks like this:

    <page#1>
    …
    …
    <buffer-start>
    …
    <address accessed>
    …
    <buffer-end>
    …
    <page#1 end>

From the above diagram, it takes only trivial arithmetic to calculate the offset of the content that is accessed:

    OFFSET = <address accessed> - <buffer-start>

How do we know the <address accessed>? Well, it is also very simple, provided we peep into the pydbg class code. The pydbg class has several attributes that are not exposed in the API documentation. Two that are very pertinent to our problem are:
self.memory_breakpoint_hit and self.violation_address. The 1st one is the address of the breakpoint that got hit, i.e. the address of the buffer, and the 2nd is the exact address that caused the memory access violation, i.e. <address accessed>. Based on the above formula, we can now calculate the offset in the file. As I mentioned earlier, we assume that the whole file is copied into the buffer; if this holds, the offset corresponds to the position of the file content that was read or written. Now we can just fuzz the data around this offset to see if we get lucky! The access violation handler used in the ReadFile hook function (i.e. buffer_access_handler) may have the following code:
    def buffer_access_handler(dbg):
        # ignore hits that do not belong to one of our memory breakpoints
        if not dbg.bp_is_ours_mem(dbg.violation_address):
            print("not belonging to mem BP")
            return DBG_CONTINUE
        print("buffer accessed at bp 0x%08x\n" % dbg.memory_breakpoint_hit)
        # check if it is a read or write access violation
        if dbg.write_violation:
            print("write violation from %08x on %08x of mem bp" % (dbg.exception_address, dbg.violation_address))
        else:
            print("read violation from %08x on %08x of mem bp" % (dbg.exception_address, dbg.violation_address))
        # disassemble the instruction that touched the buffer and report the offset
        inst = dbg.disasm(dbg.context.Eip)
        print("## 0x%08x\t%s offset: 0x%08x ##" % (dbg.context.Eip, inst, dbg.violation_address - dbg.memory_breakpoint_hit))
        return DBG_CONTINUE

So, PyDbg has many (hidden) interesting features to make things easier! I shall be posting a more detailed post with complete code which does more than what is explained here. Till then, happy PyDbging :)
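To make the “fuzz around this offset” idea concrete, here is a minimal stand-alone sketch that randomises the bytes in a small window around a reported offset and returns a mutated copy of the file data. The function name, the window size, the sample data and the random byte-replacement strategy are all assumptions for illustration; the post does not prescribe a particular mutation strategy.

```python
import random

def fuzz_around(data, offset, window=4, seed=None):
    """Return a copy of data with the bytes in [offset-window, offset+window] randomised."""
    rng = random.Random(seed)
    mutated = bytearray(data)
    # Clamp the window to the bounds of the data.
    start = max(0, offset - window)
    end = min(len(mutated), offset + window + 1)
    for i in range(start, end):
        mutated[i] = rng.randrange(256)
    return bytes(mutated)

# Made-up file content; offset 10 stands for a value reported by the access handler.
original = b"ID3\x03\x00\x00\x00\x00\x0f\x76TIT2" + b"A" * 32
sample = fuzz_around(original, offset=10, window=2, seed=1)
print(len(sample) == len(original))
```

In practice the mutated bytes would be written out to a new file and fed to the target under the debugger. Whether the offset is meaningful at all depends on the assumption that the whole file was copied into a single buffer.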