Wading through the CSV format
I have reached the nadir of software development; I wrote my own CSV parser. Initially, I only wanted to learn how to use Lex and Yacc, but, as I got into the task, I found that I had use for a CSV parsing library. And, so, my descent began.
My first stop was to read up on the CSV file format. The usual standards documents were not as helpful as I had hoped, as there was no formalized CSV file format. The best I could find was the 2005 RFC 4180 ("Common Format and MIME Type for Comma-Separated Values (CSV) Files"), which stated up front that it did "not specify an Internet standard of any kind", and that it only "documents the format that seems to be followed by most implementations".
At least I had a starting point, with rules that I could implement in Lex and Yacc. It took several iterations, but I finally came up with a grammar that looked like it could properly parse most CSV files. To that grammar, I added some logic that builds an in-memory representation of an input CSV file, and a minimal read-only API that permits programmatic queries against the data.
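To give a feel for the quoting rules that RFC 4180 documents (a field may be wrapped in double quotes, a quoted field may contain commas and line breaks, and a doubled quote inside a quoted field stands for one literal quote), here is a minimal hand-written sketch in C. It is not the Lex and Yacc grammar from the library; splitRecord() and MAX_FIELD_LEN are hypothetical names invented for this illustration, and the fixed field length is an arbitrary simplification.

```c
#include <stddef.h>

#define MAX_FIELD_LEN 64    /* arbitrary limit chosen for this sketch */

/* Hypothetical helper (not part of libcsvtool): split one CSV record into
** fields using the RFC 4180 quoting rules. Over-long fields are truncated.
** Returns the number of fields stored, or 0 if a quoted field is never
** closed (or if maxFields is 0).
*/
size_t splitRecord(const char *record, char fields[][MAX_FIELD_LEN], size_t maxFields)
{
    size_t count = 0;
    const char *p = record;

    while (count < maxFields)
    {
        char *out = fields[count];
        size_t len = 0;

        if (*p == '"')                      /* escaped (quoted) field */
        {
            ++p;
            for (;;)
            {
                if (*p == '\0') return 0;   /* unterminated quoted field */
                if (*p == '"')
                {
                    if (p[1] == '"') ++p;   /* "" means one literal quote */
                    else { ++p; break; }    /* closing quote */
                }
                if (len < MAX_FIELD_LEN - 1) out[len++] = *p;
                ++p;
            }
        }
        else                                /* non-escaped field */
        {
            while (*p != '\0' && *p != ',' && *p != '\r' && *p != '\n')
            {
                if (len < MAX_FIELD_LEN - 1) out[len++] = *p;
                ++p;
            }
        }
        out[len] = '\0';
        ++count;

        if (*p != ',') break;               /* end of record */
        ++p;                                /* skip the separator */
    }
    return count;
}
```

Even this toy version needs a small state machine for the quoted case, which is part of why a generated lexer and parser is an attractive way to handle the format.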
What you get here is the source code for v1.1.5 of my libcsvtool. It includes the lexer and parser that implement the API, the man page that explains how to use the API, and some ancillary documentation.
The API provides five functions:
- csvOpen() to load the contents of the named CSV into memory,
- csvClose() to unload and discard the previously loaded CSV data,
- csvRowCount() to provide a count of the number of rows of data loaded from the CSV file,
- csvCellCount() to provide a count of the number of cells of data in a specific row, and
- csvCellData() to provide access to the data at a specific row and cell.
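One way to picture the read-only, row-and-cell view that these functions expose is a table of rows in which each row carries its own cell count. The sketch below is a toy stand-in, not the library's actual data structure: ToyRow, ToyTable, and the toy*() accessors are hypothetical names, and returning NULL or 0 for an out-of-range index is an assumption made for the illustration.

```c
#include <stddef.h>

/* Toy stand-in for a loaded CSV file (hypothetical, not libcsvtool's
** internals). Rows may have different numbers of cells.
*/
typedef struct {
    size_t       cellCount;     /* number of cells in this row */
    const char **cells;         /* cell text, one pointer per cell */
} ToyRow;

typedef struct {
    size_t        rowCount;     /* number of rows in the table */
    const ToyRow *rows;
} ToyTable;

/* Number of rows in the table; 0 for a NULL table. */
size_t toyRowCount(const ToyTable *t)
{
    return t ? t->rowCount : 0;
}

/* Number of cells in one row; 0 if the row index is out of range. */
size_t toyCellCount(const ToyTable *t, size_t row)
{
    return (t && row < t->rowCount) ? t->rows[row].cellCount : 0;
}

/* Text of one cell; NULL if either index is out of range (an assumption). */
const char *toyCellData(const ToyTable *t, size_t row, size_t cell)
{
    if (!t || row >= t->rowCount || cell >= t->rows[row].cellCount)
        return NULL;
    return t->rows[row].cells[cell];
}
```

Asking for the cell count per row, rather than once per table, means a caller never has to assume that every record in the file has the same number of fields.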
The documentation includes an example program (in the libcsvtool(3) manpage):
    /* csvinfo - example usage of libcsvtool
    ** Usage: csvinfo file.csv
    **    or  csvinfo            (reads stdin)
    */

    #include <stdio.h>
    #include <stdlib.h>
    #include <csvtool.h>

    int main(int argc, char *argv[])
    {
      char *fname = NULL;   /* implies stdin */
      void *handle;

      if (argc == 2) fname = argv[1];

      /* load the CSV; a NULL handle means nothing was loaded */
      if ((handle = csvOpen(fname)) != NULL)
      {
        size_t rowCount = csvRowCount(handle);

        /* size_t values are printed with %zu */
        printf("Table contains %zu rows\n",rowCount);
        for (size_t row = 0; row < rowCount; ++row)
        {
          size_t cellCount = csvCellCount(handle,row);
          printf("\n  Row %zu contains %zu cells\n",row+1,cellCount);
          for (size_t cell = 0; cell < cellCount; ++cell)
          {
            char *datum = csvCellData(handle,row,cell);
            printf("    Cell %4zu contains ",cell+1);
            if (datum)
              printf("\"%s\"\n",datum);
            else
              puts("NULL");
          }
        }
        csvClose(handle);
      }

      return 0;
    }
If you want programmatic read access to your CSV data, give it a try.