Archive

Posts Tagged ‘csv’

Mashing CSVs around using PowerShell

February 15, 2012 2 comments

Since I spend most of my day in the console, PowerShell also serves as my ‘Excel’. So, continuing my recent trend of PowerShell related posts, let’s dig into a quick and easy way to parse up CSV files (or most any type of file) by creating objects!

We, of course, need a few rows of example data. Let’s use this pseudo student roster.

Example data:

Student,Code,Product,IUID,TSSOC,Date
123456,e11234,Reading,jsmith,0:18,1/4/2012
123456,e11234,Reading,jsmith,1:04,1/4/2012
123456,e11234,Reading,jsmith,0:27,1/5/2012
123456,e11234,Reading,jsmith,0:19,1/7/2012
123456,e11235,Math,jsmith,0:14,1/7/2012

Now, for reporting, I want my ‘Minutes’ to be a calculation of the TSSOC column (hours:minutes). Easy, we have PowerShell–it can split AND multiple!

The code:

Begin by creating an empty array to hold our output, importing our data into the ‘pipe’, and opening up an iteration (for each) function. The final $out is our return value–calling our array so we can see our results.

$out = @()
import-csv data_example.csv |
   % {

   }
$out

Next, let’s add in our logic to split out the hours and minutes. We have full access to the .NET string methods in PowerShell, which includes .Split(). .Split() returns an array, so since we have HH:MM, our first number is our hours and our second number is our minutes. Hours then need to be multiplied by 60 to return the “minutes per hour.”

You’ll also notice the [int] casting–this ensures we can properly multiply–give it a whirl without and you’ll get 60 0’s or 1’s back (it multiples the string).

$out = @()
import-csv data_example.csv |
   % {
	$hours = [int]$_.TSSOC.Split(':')[0] * 60
	$minutes = [int]$_.TSSOC.Split(':')[1]
   }
$out

The next step is to create a new object to contain our return values. We can use the new PowerShell v2.0 syntax to create a quick hashtable of our properties and values. Once we have our item, add it to our $out array.

$out = @()
import-csv data_example.csv |
   % {
	$hours = [int]$_.TSSOC.Split(':')[0] * 60
	$minutes = [int]$_.TSSOC.Split(':')[1]
        $item = new-object PSObject -Property @{
			Date=$_.Date;
			Minutes=($hours + $minutes);
			UserId=$_.IUID;
			StudentId=$_.Student;
			Code=$_.Code;
			Course=$_.Product
		}
	$out = $out + $item
   }

With that, we’re done, we can pipe it to an orderby for a bit of sorting, grouping, table formatting, or even export it BACK out as another CSV.

$out = @()
import-csv data_example.csv |
   % {
	$hours = [int]$_.TSSOC.Split(':')[0] * 60
	$minutes = [int]$_.TSSOC.Split(':')[1]
        $item = new-object PSObject -Property @{
			Date=$_.Date;
			Minutes=($hours + $minutes);
			UserId=$_.IUID;
			StudentId=$_.Student;
			Code=$_.Code;
			Course=$_.Product
		}
	$out = $out + $item
   } | sortby Date, Code
$out | ft -a

Quick and dirty CSV manipulation–all without opening anything but the command prompt!

UPDATE: Matt has an excellent point in the comments below. PowerShell isn’t the ‘golden hammer’ for every task and finding the right tool for the job. We’re a mixed environment (Windows, Solaris, RHEL, Ubuntu), so PowerShell only applies to our Windows boxes. However, as a .net developer, I spend 80-90% of my time on those Windows boxes. So let’s say it’s a silver hammer. 🙂

Now, the code in this post looks pretty long–and hopping back and forth between notepad, the CLI, and your CSV is tiresome. I bounce back and forth between the CLI and notepad2 with the ‘ed’ and ‘ex’ functions (these commands are ‘borrowed’ from Oracle PL/SQL). More information here.

So how would I type this if my boss ran into my cube with a CSV and needed a count of Minutes?

$out=@();Import-Csv data_example.csv | % { $out += (new-object psobject -prop @{ Date=$_.Date;Minutes=[int]$_.TSSOC.Split(':')[1]+([int]$_.TSSOC.Split(':')[0]*60);UserId=$_.IUID;StudentId=$_.Student;Code=$_.Code;Course=$_.Product }) }; $out | ft -a

Now, that’s quicker to type, but a LOT harder to explain. 😉 I’m sure this can be simplified down–any suggestions? If you could do automatic/implied property names, that’d REALLY cut it down.

Advertisements