A nifty command-line tool for splitting up a CSV file based on column values.
It has many features, but I used it for its very efficient file-splitting function.
The file_split command splits a CSV input stream into a number of files based on the values of specified fields in the CSV input stream. All the CSV records with the same values for those fields will be placed in the same file. By default, the created files are numbered, but you can also generate files based on the contents of the fields used to perform the split. Unlike most other CSVfix commands, this command does not write anything to standard output, or to any file specified by the -o flag.
Note that any existing files will be overwritten by this command, without warning. Use the -fd flag to choose the directory for the output files, and the -fp and -fx flags to name them.
-f fields (Required) : Comma-separated list of field indexes on which to base the split.
-fd dir : Specifies the directory in which to place the results of the split. Defaults to the current directory.
-fp prefix : Specifies the prefix to use when constructing file names. Default is file_
-fx ext : Specifies the extension to use when constructing file names. The default is csv
-ufn : Use the contents of the field(s) specified by the -f flag to generate file names. No check is made that the fields contain valid filename components, and the command will fail if they do not.
The following example splits the cities.csv file based on the second field, which contains the country code.
csvfix file_split -f 2 data/cities.csv
This produces the following files, each of which contains the cities for a particular country:
With the same data, the following example:
csvfix file_split -f 2 -ufn data/cities.csv
uses the country code values to generate the file names, producing:
Here, file_DE.csv will contain German cities, file_FR.csv French cities, and so on.
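For anyone who wants to see the idea without installing CSVfix, here is a rough Python sketch of the same kind of split. It is not how CSVfix itself is implemented. The function name split_csv and its parameters are made up for illustration, chosen to mirror the -f, -fd, -fp, -fx and -ufn flags, and it handles only a single field. The numbering scheme for the default file names (counting up from 1 in order of first appearance) is an assumption, so the numbered names may not match what CSVfix actually produces.

```python
import csv
import os


def split_csv(in_path, field, out_dir=".", prefix="file_", ext="csv",
              use_field_names=False):
    """Write each row of in_path to a file chosen by the value of one field.

    field is a 1-based column index, like the -f flag described above.
    Returns a dict mapping each field value to the path written for it.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # field value -> open output file
    writers = {}  # field value -> csv writer for that file
    paths = {}    # field value -> output path
    try:
        with open(in_path, newline="") as src:
            for row in csv.reader(src):
                if len(row) < field:
                    # This sketch simply skips rows that lack the split field.
                    continue
                key = row[field - 1]
                if key not in writers:
                    # Assumption: numbered files count up from 1.
                    name = key if use_field_names else str(len(writers) + 1)
                    path = os.path.join(out_dir, f"{prefix}{name}.{ext}")
                    # Opening with "w" silently overwrites an existing file.
                    handles[key] = open(path, "w", newline="")
                    writers[key] = csv.writer(handles[key])
                    paths[key] = path
                writers[key].writerow(row)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

Calling split_csv("data/cities.csv", 2, use_field_names=True) would then write one file per country code into the current directory. Like the -ufn flag, the sketch makes no check that the field values are valid file name components.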