Version 1.6, April 11, 2018
Eddie Rucker <erucker@bmc.edu>
The CSV library provides an interface for reading and writing comma separated value files. The module is very loosely based on Python’s CSV module (http://docs.python.org/lib/module-csv.html).
Get the latest source from https://bitbucket.org/purelang/pure-lang/downloads/pure-csv-1.6.tar.gz.
Run make to compile the module and make install (as root) to install it in the Pure library directory. This requires GNU make. The make install step is only necessary for system-wide installation.
The make utility tries to guess your Pure installation directory and platform-specific setup. If it gets this wrong, you can set some variables manually. In particular, make install prefix=/usr sets the installation prefix, and make PIC=-fPIC or some similar flag might be needed for compilation on 64 bit systems. Please see the Makefile for details.
Data records are represented as vectors or lists of any Pure values. Values are converted as necessary and written as a group of strings, integers, or doubles separated by a delimiter. Three predefined dialects are provided; DEFAULT (record terminator= \n ), RFC4180 (record terminator= \r\n ), and Excel. Procedures are provided to create other CSV dialects. See (http://www.ietf.org/rfc/rfc4180.txt) for more details about the RFC4180 standard.
creates a dialect from a record of dialect option pairs. The dialect object is freed automatically when exiting the pure script. The list of possible options and option values are presented below.
The following example illustrates the construction of a dialect for reading tab delimited files without quoted strings.
Example
> using csv;
> using namespace csv;
> let d = dialect {delimiter=>"\t", quote_flag=>STRING};
>
exactly as above except allows for list output or header options when reading.
Examples
> using csv;
> using namespace csv;
> let d = dialect {delimiter=>"\t"};
> let f = open ("junk.csv", "w", d);
> putr f {"hello",123,"",3+:4,world};
()
> close f;
()
> let f = open ("junk.csv", "r", d);
> getr f;
{"hello","123","","3+:4","world"}
>
Suppose our file “test.csv” is as presented below.
ir$ more test.csv
NAME,TEST1,TEST2
"HOPE, BOB",90,95
"JONES, SALLY",88,72
"RED, FEEFEE",45,52
Notice how the LIST option affects the return of getr and how the HEADER option may be used to index records.
> using csv;
> using namespace csv;
> let d = dialect {quote_flag=>MINIMAL};
> let f = open ("test.csv", "r", d, [LIST,HEADER]);
> let r = getr f;
> r!0;
"HOPE, BOB"
> let k = header f;
> k;
{"NAME"=>0,"TEST1"=>1,"TEST2"=>2}
> r!(k!"NAME");
"HOPE, BOB"
> r!!(k!!["NAME","TEST1"]);
["HOPE, BOB",90]
>
When modifying CSV files that will be imported into Microsoft Excel, fields with significant leading 0s should be written using a "=""0...""" formatting scheme. This same technique will work for preserving leading space too. Again, this quirk should only be necessary for files to be imported into MS Excel.
The first example shows how to write and read a default CSV file.
> using csv;
> using namespace csv;
> let f = open ("testing.csv", "w");
> fputr f [{"bob",3.9,"",-2},{"fred",-11.8,"",0},{"mary",2.3,"$",11}];
()
> close f;
()
> let f = open "testing.csv";
> fgetr f;
[{"bob","3.9","","-2"},{"fred","-11.8","","0"},{"mary","2.3","$","11"}]
> close f;
>
The second example illustrates how to write and read a CSV file using automatic conversions.
> using csv;
> using namespace csv;
> let d = dialect {quote_flag=>MINIMAL};
> let f = open ("test.csv", "w", d);
> putr f {"I","",-4,1.2,2%4,like};
()
> putr f {"playing","the",0,-0.2,1+:4,drums};
()
> close f;
()
> let f = open ("test.csv", "r", d);
> fgetr f;
[{"I","",-4,1.2,"2%4","like"},{"playing","the",0,-0.2,"1+:4","drums"}]
> close f;
()
>
Records containing quotes, delimiters, and line breaks are also properly handled.
> using csv;
> using namespace csv;
> let d = dialect {quote_flag=>STRING};
> let f = open ("test.csv", "w", d);
> fputr f [{"this\nis\n",1},{"a \"test\"",2}];
()
> close f;
()
> let f = open ("test.csv", "r", d);
> fgetr f;
[{"this\nis\n",1},{"a \"test\"",2}]
> close f;
()
>
Consider the following hand written CSV file. According to RFC4180, this is not a valid CSV file. However, by using the space_around_quoted_field, the file can still be read.
erucker:$ more test.csv
"this", "is", "not", "valid"
> using csv;
> using namespace csv;
> let f = open "test.csv";
> getr f;
csv::error "parse error at line 1"
> let d = dialect {space_around_quoted_field=>BOTH};
> let f = open ("test.csv", "r", d);
> getr f;
{"this","is","not","valid"}
>
The trim_space flag should be used with caution. A field with space in front of a number should be interpreted as a string, but consider the following file.
erucker:$ more test.csv
" this ", 45 ,23, hello
Now observe the differences for the two dialects below.
> using csv;
> using namespace csv;
> let d = dialect {trim_space=>BOTH};
> let f = open ("test.csv","r",d);
> getr f;
{"this","45","23","hello"}
> let d = dialect {trim_space=>BOTH, quote_flag=>MINIMAL};
> let f = open ("test.csv", "r", d);
> getr f;
{"this",45,23,"hello"}
>
The trim_space flag also affects writing.
> using csv;
> using namespace csv;
> let d = dialect {trim_space=>BOTH};
> let f = open ("test.csv", "w", d);
> putr f {" this "," 45 "};
()
> close f;
()
> quit
erucker:$ more test.csv
"this","45"
For the last example a tab delimiter is used, automatic conversions is on, and records are represented as lists. Files are automatically closed when the script is finished.
> using csv;
> using namespace csv;
> let d = dialect {quote_flag=>MINIMAL, delimiter=>"\t"};
> let f = open ("test.csv", "w", d, [LIST]);
> fputr f [["a","b",-4.5,""],["c","d",2.3,"-"]];
()
> close f;
()
> let f = open ("test.csv", "r", d, [LIST]);
> fgetr f;
[["a","b",-4.5,""],["c","d",2.3,"-"]]
> quit