Home > .net 4.0, c#, General Development, Microsoft, PowerShell, Windows 7, Windows Server > Mashing CSVs around using PowerShell

Mashing CSVs around using PowerShell

February 15, 2012

Since I spend most of my day in the console, PowerShell also serves as my ‘Excel’. So, continuing my recent trend of PowerShell related posts, let’s dig into a quick and easy way to parse up CSV files (or most any type of file) by creating objects!

We, of course, need a few rows of example data. Let’s use this pseudo student roster.

Example data:

Student,Code,Product,IUID,TSSOC,Date
123456,e11234,Reading,jsmith,0:18,1/4/2012
123456,e11234,Reading,jsmith,1:04,1/4/2012
123456,e11234,Reading,jsmith,0:27,1/5/2012
123456,e11234,Reading,jsmith,0:19,1/7/2012
123456,e11235,Math,jsmith,0:14,1/7/2012

Now, for reporting, I want my ‘Minutes’ to be a calculation of the TSSOC column (hours:minutes). Easy, we have PowerShell–it can split AND multiple!

The code:

Begin by creating an empty array to hold our output, importing our data into the ‘pipe’, and opening up an iteration (for each) function. The final $out is our return value–calling our array so we can see our results.

$out = @()
import-csv data_example.csv |
   % {

   }
$out

Next, let’s add in our logic to split out the hours and minutes. We have full access to the .NET string methods in PowerShell, which includes .Split(). .Split() returns an array, so since we have HH:MM, our first number is our hours and our second number is our minutes. Hours then need to be multiplied by 60 to return the “minutes per hour.”

You’ll also notice the [int] casting–this ensures we can properly multiply–give it a whirl without and you’ll get 60 0’s or 1’s back (it multiples the string).

$out = @()
import-csv data_example.csv |
   % {
	$hours = [int]$_.TSSOC.Split(':')[0] * 60
	$minutes = [int]$_.TSSOC.Split(':')[1]
   }
$out

The next step is to create a new object to contain our return values. We can use the new PowerShell v2.0 syntax to create a quick hashtable of our properties and values. Once we have our item, add it to our $out array.

$out = @()
import-csv data_example.csv |
   % {
	$hours = [int]$_.TSSOC.Split(':')[0] * 60
	$minutes = [int]$_.TSSOC.Split(':')[1]
        $item = new-object PSObject -Property @{
			Date=$_.Date;
			Minutes=($hours + $minutes);
			UserId=$_.IUID;
			StudentId=$_.Student;
			Code=$_.Code;
			Course=$_.Product
		}
	$out = $out + $item
   }

With that, we’re done, we can pipe it to an orderby for a bit of sorting, grouping, table formatting, or even export it BACK out as another CSV.

$out = @()
import-csv data_example.csv |
   % {
	$hours = [int]$_.TSSOC.Split(':')[0] * 60
	$minutes = [int]$_.TSSOC.Split(':')[1]
        $item = new-object PSObject -Property @{
			Date=$_.Date;
			Minutes=($hours + $minutes);
			UserId=$_.IUID;
			StudentId=$_.Student;
			Code=$_.Code;
			Course=$_.Product
		}
	$out = $out + $item
   } | sortby Date, Code
$out | ft -a

Quick and dirty CSV manipulation–all without opening anything but the command prompt!

UPDATE: Matt has an excellent point in the comments below. PowerShell isn’t the ‘golden hammer’ for every task and finding the right tool for the job. We’re a mixed environment (Windows, Solaris, RHEL, Ubuntu), so PowerShell only applies to our Windows boxes. However, as a .net developer, I spend 80-90% of my time on those Windows boxes. So let’s say it’s a silver hammer.๐Ÿ™‚

Now, the code in this post looks pretty long–and hopping back and forth between notepad, the CLI, and your CSV is tiresome. I bounce back and forth between the CLI and notepad2 with the ‘ed’ and ‘ex’ functions (these commands are ‘borrowed’ from Oracle PL/SQL). More information here.

So how would I type this if my boss ran into my cube with a CSV and needed a count of Minutes?

$out=@();Import-Csv data_example.csv | % { $out += (new-object psobject -prop @{ Date=$_.Date;Minutes=[int]$_.TSSOC.Split(':')[1]+([int]$_.TSSOC.Split(':')[0]*60);UserId=$_.IUID;StudentId=$_.Student;Code=$_.Code;Course=$_.Product }) }; $out | ft -a

Now, that’s quicker to type, but a LOT harder to explain.๐Ÿ˜‰ I’m sure this can be simplified down–any suggestions? If you could do automatic/implied property names, that’d REALLY cut it down.

  1. February 15, 2012 at 8:58 am

    >Since I spend most of my day in the console, PowerShell also serves as my โ€˜Excelโ€™

    Console??? It looks like you spend most of your day in a text editor. Its not like a one line bash statement, an awk followed by a sort, and the knowledge you can use across any CLI app, not just these powershell wrappers that output next-gen “objects”. Trust me you are waisting your time becoming a ninja in powershell when better technology that been out for decades exists (await ah.. but object piping is better than text.. ^^ looks like it)… Nothing against powershell, I blogged about importing/exporting into ravendb using powershell and made a a whole CI platform out it around psake. Explore, ignore the bash fud, and then compare.

    • February 15, 2012 at 9:06 am

      Quite true (I do spend a lot of time in notepad2), however, what shows as multi-line in an example for blogging is usually typed out single line.

      For me, and in the instance that spawned creating this example, it was far quicker to mash the data around than pulling it into Excel, adding formulas to substring the columns around, etc.

      I keep a RavenDB instance running on my machine, I could have dumped them in there and got some snazzy JSON out using LINQ, but that seemed to require a bit more setup for something disposable such as this.

      Also, I have created PowerShell versions of ‘ed’ and ‘ex’ (if you’re familiar with Oracle’s PL/SQL commands) that allows me to bounce back and forth between the CLI and notepad instantly–which helps out on these longer commands.

      What other tools would you recommend for a task like this?

  1. No trackbacks yet.
Comments are closed.
%d bloggers like this: