I'm given multiple CSV files, each of which can be hundreds of megabytes or more. They all have the same header row at the start of the file and CRLF at the end of each line. Each file may or may not have a CRLF at the end of the file. The goal is to:
- Join the list of files.
- Keep the header from the first file.
- Output them to a new file.
- These files may have thousands of columns and millions of rows.
- The files must be processed in the order given, as the order of rows is significant.
Given the size of the files, this needs to be as fast and memory-efficient as possible.
If the headers are the same, you can open a write stream, then go through the input files in order, opening a read stream for each one and copying its data. The first file is copied in its entirety; for each subsequent file, the first line is skipped.
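A minimal Python sketch of that approach, assuming every file's header is exactly its first line (no quoted fields containing line breaks) and that all headers are identical. The function name `concat_csv` and the 1 MiB `buffer_size` are illustrative choices, not part of the original. Files are opened in binary mode so no decoding or newline translation happens, and data is copied in fixed-size chunks so memory use stays constant regardless of file size. Because a file may or may not end with CRLF, the sketch also tracks whether the last byte written was a line terminator and writes a CRLF before the next file's rows if it was not; otherwise the last row of one file would run into the first data row of the next.

```python
def concat_csv(inputs, output, buffer_size=1 << 20):
    """Concatenate CSV files in the given order, keeping only the first file's header.

    Assumes each file's header is exactly its first line (no quoted line breaks)
    and that all headers are identical. Memory use is bounded by buffer_size
    plus the length of one header line.
    """
    ends_with_newline = True  # nothing written yet, so no separator is needed
    with open(output, "wb") as out:
        for index, path in enumerate(inputs):
            with open(path, "rb") as src:
                if index > 0:
                    src.readline()  # skip the duplicate header, including its CRLF
                chunk = src.read(buffer_size)
                # If the previous file lacked a trailing CRLF, add one before
                # this file's rows so two records are not joined on one line.
                if chunk and not ends_with_newline:
                    out.write(b"\r\n")
                while chunk:
                    out.write(chunk)
                    # Only the final chunk of the final non-empty file matters here.
                    ends_with_newline = chunk.endswith(b"\n")
                    chunk = src.read(buffer_size)
```

Usage, with hypothetical file names: `concat_csv(["part1.csv", "part2.csv", "part3.csv"], "combined.csv")`. Order is preserved because the inputs are read strictly in list order.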
That approach is the fastest, as long as you are 100% sure the columns align and that the first line is the only one that needs skipping.
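One cheap way to gain that certainty before copying is to read only the first line of each file and compare it byte-for-byte with the first file's header. This is a sketch with hypothetical helper names `read_header` and `headers_match`; since only one line per file is read, it costs almost nothing next to the copy itself.

```python
def read_header(path):
    """Return the first line of a file as bytes, without its line terminator."""
    with open(path, "rb") as f:
        return f.readline().rstrip(b"\r\n")


def headers_match(paths):
    """Return True if every file's header is byte-identical to the first file's."""
    if not paths:
        return True
    expected = read_header(paths[0])
    # Exact byte comparison also catches reordered or renamed columns.
    return all(read_header(p) == expected for p in paths[1:])
```

The comparison is deliberately strict: differences in whitespace, quoting, or a leading byte-order mark all count as mismatches, which errs on the side of refusing to merge rather than silently misaligning columns.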
This kind of thing is quite straightforward on a Unix-style command line, by the way.