There are plenty of reasons to transfer data from fortran to python. First of all python has the number of perfect tools for visualization such as matplotlib, MayaVi and others. Sometimes it’s just easier to process your data there.
And the question is how to transfer a 1 Gb matrix of doubles to python for example. Of course parsing text files is not an option for such big files. After googling a little bit I have found that fortran has stupid binary format that inserts additional data into your flat binary file. That makes parsing it a bit trickier. You can read more about it here. Luckily at least for x86 and x64 ifort compiler users there is another option.
If you choose an option form = 'binary' in fortran it will give you the normal c-like binary file output.
Here is the code for reading such files from fortranopen(UNIT = 88, file = 'array1.bin', status='unknown', FORM = 'BINARY') do k = 1, nz do j = 1, ny write(88) (test_array(i,j,k), i=1, nx) end do end do close(88)
Here is the python reading and writing functions using two different modulesopen(UNIT = 77, file = 'arr.bin', status='unknown', FORM = 'BINARY') do k = 1, nz do j = 1, ny read(77) (test_arr(i,j,k), i = 1, nx) end do end do close(77)
As you can check second approach is faster. I have got this timings for 0.8Gb binary file:import struct import numpy as np import string import sys def readarray(fname, nx, ny, nz, datatype = 'float'): if datatype.lower() == 'float': dtype_str = 'f' dtype_int = 4 elif datatype.lower() == 'double': dtype_str = 'd' dtype_int = 8 else: print "Unknown datatype" sys.exit() data = np.zeros((nx,ny,nz), np.float) fmt = str(nx) + dtype_str with open(fname, 'rb') as f: for k in xrange(nz): for j in xrange(ny): bytes = f.read(nx*dtype_int) data[:,j,k] = struct.unpack(fmt, bytes) f.close() return data def readarray2(fname, nx, ny, nz, datatype = 'float'): if datatype.lower() == 'float': dtype_str = np.float32 dtype_int = 4 elif datatype.lower() == 'double': dtype_str = np.float64 dtype_int = 8 else: print "Unknown datatype" sys.exit() data = np.zeros((nx,ny,nz), np.float) with open(fname, 'rb') as f: for k in xrange(nz): for j in xrange(ny): bytes = f.read(nx*dtype_int) data[:,j,k] = np.fromstring(bytes,dtype = dtype_str, count=nx) f.close() return data def writearray(fname, data): if data.itemsize == 4: datatype = 'f' elif data.itemsize == 8: datatype = 'd' else: print "Unknown datatype" sys.exit() with open(fname, 'wb') as f: for k in xrange(data.shape[2]): for j in xrange(data.shape[1]): s = struct.pack(datatype*len(data[:,j,k]), *data[:,j,k]) f.write(s) f.close() def writearray2(fname, data): with open(fname, 'wb') as f: for k in xrange(data.shape[2]): for j in xrange(data.shape[1]): data[:,j,k].tofile(f) f.close()
Using struct it takes to read
20489 function calls in 80.889 CPU seconds
Using numpy array it takes to read20489 function calls in 44.707 CPU seconds
Using struct it takes to write30726 function calls in 101.300 CPU seconds
Using numpy array it takes to write10246 function calls in 18.771 CPU seconds
Would appreciate any commetns on this.