Split Text File Based on Size

Quick script for today.  I needed to come up with a way to split a text file, specifically a SQL script with one insert statement per line up into smaller files if the base file exceeded a certain size.  The requester of this functionality wanted the file to be split into 10,000 line chunks if it exceeded 5000KB.

This is a fairly simple task for PowerShell.  First, I start by setting up some variables with the requirements:

$sourcepath = 'C:\Temp'
$sourcefile = 'srcfilename.sql'
$destfile   = 'dstfilename'
$destpath   = 'C:\Temp\dst'
$maxsize    = 5000KB
$numlines   = 10000

Next, I clean up the destination and get information about the source file to determine whether to split it:

remove-item "$destpath\*" -ErrorAction Stop
$file = get-item "$sourcepath\$sourcefile" -ErrorAction Stop

I set the ErrorAction to Stop because if either of these tasks fail, I don’t want the script to continue.  Next comes the meat of the script, where the source file is read into the script in chunks of the value set for $numlines, then written to as many subfiles as needed:

if($file.length -gt $maxsize) {
    $count = 1
    get-content "$sourcepath\$sourcefile" -ReadCount $numlines | 
    foreach-object { 
        $destfilename = "{0}{1}.{2}" -f ($destfile, $count, "sql")
        [system.io.file]::WriteAllLines("$destpath\$destfilename", $_)
        $count++
    }
}

There are a couple of items worth explaining.

  1. Get-Content -ReadCount  pushes the specified number of the lines through the pipeline at a time.  If this value wasn’t specified, only one line at a time would be feed to the ForEach-Object.
  2. “{0}{1}.{2}” -f ($destfile, $count, “sql”)  uses the format operator in PowerShell to generate the destination filename.  It basically uses the base filename declared previously in the script, appends it with an auto-incremented number, and a specified extension (.sql in my case).
  3. I used the .NET method [system.io.file]::WriteAllLines() instead of a more PowerShell-y method for writing the files.  Originally I was using something like this:
$_ | out-file -Encoding ascii "$destpath\$destfilename"

However, in my testing I found that method to be pretty slow with larger files.

The file I am using to test is ~170MB and a half million or so lines.  The .NET method can break this down into 50+ 10,000 line files in about 2 seconds, where the Out-File method takes more like 30 seconds.

Here is the full script, with some rudimentary error checking: ScriptCenter

It should be pretty simple to convert this into a function, if you are into that kind of thing.

Leave a Reply

Your email address will not be published. Required fields are marked *