Split Text File Based on Size
Quick script for today. I needed to come up with a way to split a text file, specifically a SQL script with one insert statement per line up into smaller files if the base file exceeded a certain size. The requester of this functionality wanted the file to be split into 10,000 line chunks if it exceeded 5000KB.
This is a fairly simple task for PowerShell. First, I start by setting up some variables with the requirements:
$sourcepath = 'C:\Temp' $sourcefile = 'srcfilename.sql' $destfile = 'dstfilename' $destpath = 'C:\Temp\dst' $maxsize = 5000KB $numlines = 10000
Next, I clean up the destination and get information about the source file to determine whether to split it:
remove-item "$destpath\*" -ErrorAction Stop $file = get-item "$sourcepath\$sourcefile" -ErrorAction Stop
I set the ErrorAction to Stop because if either of these tasks fail, I don’t want the script to continue. Next comes the meat of the script, where the source file is read into the script in chunks of the value set for $numlines, then written to as many subfiles as needed:
if($file.length -gt $maxsize) { $count = 1 get-content "$sourcepath\$sourcefile" -ReadCount $numlines | foreach-object { $destfilename = "{0}{1}.{2}" -f ($destfile, $count, "sql") [system.io.file]::WriteAllLines("$destpath\$destfilename", $_) $count++ } }
There are a couple of items worth explaining.
- Get-Content -ReadCount pushes the specified number of the lines through the pipeline at a time. If this value wasn’t specified, only one line at a time would be feed to the ForEach-Object.
- “{0}{1}.{2}” -f ($destfile, $count, “sql”) uses the format operator in PowerShell to generate the destination filename. It basically uses the base filename declared previously in the script, appends it with an auto-incremented number, and a specified extension (.sql in my case).
- I used the .NET method [system.io.file]::WriteAllLines() instead of a more PowerShell-y method for writing the files. Originally I was using something like this:
$_ | out-file -Encoding ascii "$destpath\$destfilename"
However, in my testing I found that method to be pretty slow with larger files.
The file I am using to test is ~170MB and a half million or so lines. The .NET method can break this down into 50+ 10,000 line files in about 2 seconds, where the Out-File method takes more like 30 seconds.
Here is the full script, with some rudimentary error checking: ScriptCenter
It should be pretty simple to convert this into a function, if you are into that kind of thing.