Python, memory error, csv file too large

March 15, 2011

This question already has an answer here:

Reading a huge .csv file (5 answers)

I have a problem with a Python module that cannot handle importing a big data file (the file targets.csv weighs 1 GB).

The error happens when this line is executed:

targets = [(name, float(x), float(y), float(z), float(bg))
           for name, x, y, z, bg in csv.reader(open('targets.csv'))]

Traceback:

Traceback (most recent call last):
  File "c:\users\gary\documents\epson studies\colors_text_d65.py", line 41, in <module>
    for name, x, y, z, bg in csv.reader(open('targets.csv'))]
MemoryError

I am wondering if there is a way to open the file targets.csv line by line. I am also wondering whether that would slow down the process.

This module is already pretty slow...

Thanks!

import geometry
import csv
import numpy as np
import random
import cv2

s = 0

img = cv2.imread("map.tif", -1)
height, width = img.shape

pixx = height * width
iterr = float(pixx / 1000)
accomplished = 0
temp = 0

ppm = file("epson gamut.ppm", 'w')

# ppm file header
ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")

all_colors = [(name, float(x), float(y), float(z))
              for name, x, y, z in csv.reader(open('xyzcolorlist_d65.csv'))]

# background is marked as support
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if len(support_i) > 0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

tg, hull_i = geometry.tetgen_of_hull([(x, y, z) for name, x, y, z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

# this list comprehension is the line that raises MemoryError
targets = [(name, float(x), float(y), float(z), float(bg))
           for name, x, y, z, bg in csv.reader(open('targets.csv'))]

for target in targets:

    name, x, y, z, bg = target
    target_point = support + (np.array([x, y, z]) - support) / (1 - bg)
    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    if tet_i == None:
        # not in gamut
        #print str("out")
        ppm.write(str("255 255 255") + "\n")
        print "out"

        temp += 1

        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0

        continue

    else:

        a = bcoords[0]
        b = bcoords[1]
        c = bcoords[2]
        d = bcoords[3]

        r = random.uniform(0, 1)

        names = [colors[i][0] for i in tg.tets[tet_i]]

        if r <= a:
            s = names[0]

        elif r <= a + b:
            s = names[1]

        elif r <= a + b + c:
            s = names[2]

        else:
            s = names[3]

        ppm.write(str(s) + "\n")

        temp += 1

        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0

print "done"
ppm.close()

csv.reader() reads lines one at a time. However, you're collecting all of the lines into a list first. You should process the lines one at a time. One approach is to switch to a generator, for example:

targets = ((name, float(x), float(y), float(z), float(bg))
           for name, x, y, z, bg in csv.reader(open('targets.csv')))

(Switching the square brackets to parens should change targets from a list comprehension to a generator.)
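The same idea can also be written as a generator function, which makes it easier to close the file when the loop finishes. This is only a minimal sketch: the name iter_targets is hypothetical (it is not in the question's code), and it is written in Python 3 syntax, whereas the question's script is Python 2.

```python
import csv


def iter_targets(path):
    """Yield one (name, x, y, z, bg) tuple per CSV row, converting the numbers to float."""
    # newline='' is what the csv module documentation recommends for files given to csv.reader
    with open(path, newline='') as f:
        for name, x, y, z, bg in csv.reader(f):
            # only the current row is held in memory, not the whole file
            yield (name, float(x), float(y), float(z), float(bg))
```

The main loop would then start with "for name, x, y, z, bg in iter_targets('targets.csv'):" and the rest of the body stays as it is. Note that a generator can be consumed only once, so anything that needs a second pass over the targets has to call the function again.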
