python - Pandas error tokenizing data when field in CSV file contains quotation mark


I'm using pandas.read_csv to read a tab-delimited file and I'm running into this error: Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398

After searching, it seems the offending entry is: "– ,쳌 \\ ?Œ  ø ,d -l ,ú ,‚ zo

Removing the quotation mark seems to solve things. I've got a lot of large files with a lot of strange characters in them, so this will no doubt repeat itself. Do I need to remove the single quotation marks ahead of time, or is there a way around this?

There is a quoting argument to read_csv:

quoting : int or csv.QUOTE_* instance, default None
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
    QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
    Default (None) results in QUOTE_MINIMAL behavior.

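These are described in the csv docs.

Try setting quoting=3 (i.e. QUOTE_NONE), so that quotation marks are read as ordinary characters instead of as the start of a quoted field.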
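
A minimal sketch of that suggestion, assuming a small made-up tab-delimited sample in place of the real file (the column names and values here are hypothetical). One field starts with an unmatched double quote, similar to the offending entry in the question:

```python
import csv
import io

import pandas as pd

# Hypothetical sample data: the second data row has a field that begins
# with a double quote that is never closed.
data = 'a\tb\tc\n1\t2\t3\n4\t"x\t6\n7\t8\t9\n'

# csv.QUOTE_NONE (which equals 3) tells the parser to treat quote
# characters as literal text rather than as field delimiters.
df = pd.read_csv(io.StringIO(data), sep="\t", quoting=csv.QUOTE_NONE)

print(df)
```

With a real file you would pass the path instead of the StringIO object. Note that with QUOTE_NONE any quotes that were meant to wrap fields are kept in the values too, so this probably only fits files that do not rely on quoting.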

