python - Pandas error tokenizing data when a field in a CSV file contains a quotation mark

- August 15, 2014

I'm using pandas.read_csv to read a tab-delimited file and am running into this error: Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398

After searching, it seems the offending entry is: "– ,쳌 \\ ?Œ ø ,d -l ,ú ,‚ zo

Removing the quotation mark seems to solve things. I've got a lot of large files with a lot of strange characters in them, so this will no doubt repeat itself. Do I need to remove single quotation marks ahead of time, or is there a way around this?

There is a quoting argument to read_csv:

quoting : int or csv.QUOTE_* instance, default None
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
    QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
    Default (None) results in QUOTE_MINIMAL behavior.

These are described in the csv module docs.

Try setting quoting=3 (i.e. csv.QUOTE_NONE), so that quotation marks are read as ordinary characters rather than as field delimiters.
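As a rough sketch of what that looks like, the snippet below reads a small in-memory tab-delimited sample with a stray quotation mark in one field. The sample data and the column names (id, text, value) are made up for illustration and are not from the original file. With the default quoting, an unmatched quote can make the parser treat the following tabs and newlines as part of one quoted field, which is the kind of thing that throws off the field count.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the second data row has an unmatched double quote.
raw = (
    "id\ttext\tvalue\n"
    "1\tplain\t10\n"
    '2\t"stray quote\t20\n'
    "3\tanother\t30\n"
)

# QUOTE_NONE (3) tells the parser not to give quote characters any
# special meaning, so the stray quote stays in the field as literal text.
df = pd.read_csv(io.StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)

print(df)
```

For a real file, pass the path in place of the StringIO object. Note that with QUOTE_NONE any fields that really are wrapped in quotes keep their quote characters, so this fits best when the file does not use quoting at all.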
