Python: MemoryError when the CSV file is too large
This question already has an answer here:
- Reading a huge .csv file (5 answers)
I have a problem: my Python module cannot handle importing a big data file (the file targets.csv weighs about 1 GB).
The error happens when this line is executed:
targets = [(name, float(x), float(y), float(z), float(bg)) for name, x, y, z, bg in csv.reader(open('targets.csv'))]
Traceback:
Traceback (most recent call last):
  File "c:\users\gary\documents\epson studies\colors_text_d65.py", line 41, in <module>
    for name, x, y, z, bg in csv.reader(open('targets.csv'))]
MemoryError
I am wondering if there is a way to open the file targets.csv line by line? I am also wondering whether this would slow down the process?
This module is already pretty slow...
Thanks!
import geometry
import csv
import numpy as np
import random
import cv2

s = 0
img = cv2.imread("map.tif", -1)
height, width = img.shape
pixx = height * width
iterr = float(pixx / 1000)
accomplished = 0
temp = 0

ppm = file("epson gamut.ppm", 'w')
ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")  # ppm file header

all_colors = [(name, float(x), float(y), float(z)) for name, x, y, z in csv.reader(open('xyzcolorlist_d65.csv'))]

# background is marked as support
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if len(support_i) > 0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

tg, hull_i = geometry.tetgen_of_hull([(x, y, z) for name, x, y, z in all_colors])
colors = [all_colors[i] for i in hull_i]
print ("thrown out: " + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

targets = [(name, float(x), float(y), float(z), float(bg)) for name, x, y, z, bg in csv.reader(open('targets.csv'))]

for target in targets:
    name, x, y, z, bg = target
    target_point = support + (np.array([x, y, z]) - support) / (1 - bg)
    tet_i, bcoords = geometry.containing_tet(tg, target_point)
    if tet_i is None:
        #print str("out")
        ppm.write(str("255 255 255") + "\n")
        print "out"
        temp += 1
        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0
        continue  # not in gamut
    else:
        a = bcoords[0]
        b = bcoords[1]
        c = bcoords[2]
        d = bcoords[3]
        r = random.uniform(0, 1)
        names = [colors[i][0] for i in tg.tets[tet_i]]
        if r <= a:
            s = names[0]
        elif r <= a + b:
            s = names[1]
        elif r <= a + b + c:
            s = names[2]
        else:
            s = names[3]
        ppm.write(str(s) + "\n")
        temp += 1
        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0

print "done"
ppm.close()
csv.reader() reads the lines one at a time. However, you're collecting all of the lines into a list first. You should process the lines one at a time. One approach is to switch to a generator, for example:
targets = ((name, float(x), float(y), float(z), float(bg)) for name, x, y, z, bg in csv.reader(open('targets.csv')))
(Switching the square brackets to parens should change targets from a list comprehension to a generator expression, so rows are produced only as the loop asks for them.)
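To illustrate the same idea in a form that can be run without the 1 GB file, here is a minimal sketch. It is written for Python 3 (the question's code is Python 2), the helper name iter_targets is hypothetical, and the two sample rows are made up as a stand-in for targets.csv:

```python
import csv
import io


def iter_targets(lines):
    """Yield (name, x, y, z, bg) tuples one row at a time.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    The name iter_targets is hypothetical, used only for this sketch.
    """
    for name, x, y, z, bg in csv.reader(lines):
        yield name, float(x), float(y), float(z), float(bg)


# Made-up stand-in for targets.csv so the sketch runs on its own.
sample = io.StringIO("10 20 30,0.1,0.2,0.3,0.5\n40 50 60,0.4,0.5,0.6,0.25\n")

rows_seen = []
for name, x, y, z, bg in iter_targets(sample):
    # The generator hands over one parsed row per iteration;
    # it never builds a list of all rows.
    rows_seen.append((name, x + y + z, bg))

print(rows_seen)
```

With the real file you would pass an open file object instead, for example inside `with open('targets.csv', newline='') as f:`, and loop over `iter_targets(f)` there. Note that a generator can be iterated only once, so anything that needs a second pass over the rows has to reopen the file.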