python - Pandas error tokenizing data when a field in the CSV file contains a quotation mark
I'm using pandas.read_csv to read a tab-delimited file and am running into this error: Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398
After some searching, it seems the offending entry is: "– ,쳌 \\ ?Œ ø ,d -l ,ú ,‚ zo
Removing the quotation mark seems to solve things. I've got a lot of large files with a lot of strange characters in them, so this will no doubt repeat itself. Do I need to remove single quotation marks ahead of time, or is there a way around this?
There is a quoting argument to read_csv:
quoting : int or csv.QUOTE_* instance, default None
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3). Default (None) results in QUOTE_MINIMAL behavior.
These are described in the csv docs.
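To see why a lone quotation mark throws off the field count, here is a small sketch using only the standard-library csv module. The three-column sample is made up for illustration and is not taken from the file in the question. It also shows that the numeric codes in the docstring correspond to the csv constants.

```python
import csv
import io

# Hypothetical tab-delimited sample: the first data row contains a stray,
# unclosed double quote at the start of its second field.
sample = 'a\tb\tc\n1\t"x\t3\n4\t5\t6\n'

# The integer codes quoted in the docstring are the values of these constants.
quote_codes = {
    "QUOTE_MINIMAL": csv.QUOTE_MINIMAL,
    "QUOTE_ALL": csv.QUOTE_ALL,
    "QUOTE_NONNUMERIC": csv.QUOTE_NONNUMERIC,
    "QUOTE_NONE": csv.QUOTE_NONE,
}

# Default quoting (QUOTE_MINIMAL): the stray quote opens a quoted field, so the
# tabs and newlines that follow are absorbed into that field instead of
# separating fields and rows.
default_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# QUOTE_NONE: the quote character is treated as ordinary data, so every tab
# still separates fields and every newline still ends a row.
literal_rows = list(
    csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(quote_codes)
print(len(default_rows), "rows with default quoting")
print(len(literal_rows), "rows with QUOTE_NONE")
```

With default quoting the rows after the stray quote get merged, which is the same kind of mismatch behind the "expected N fields, saw M" message.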
Try setting quoting=3 (i.e. QUOTE_NONE).
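A minimal sketch of that suggestion, assuming pandas is installed. The in-memory sample is hypothetical and stands in for the real file; for a file on disk you would pass its path instead of the StringIO object.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-delimited sample with a stray, unclosed double quote
# in the first data row.
sample = 'a\tb\tc\n1\t"x\t3\n4\t5\t6\n'

# quoting=csv.QUOTE_NONE is the same as quoting=3: quotation marks are kept
# as literal characters and no longer start a quoted field.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)

print(df)
```

One thing to keep in mind: with QUOTE_NONE, fields that really were quoted in the file keep their quotation marks as part of the value, so you may need to strip them afterwards.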