I'VE GOT THE BYTE ON MY SIDE

57005 or alive

A handy PowerShell filter for converting plain text to objects

May 15, 2015 powershell

Working on the command line with PowerShell, much of the time I have the luxury of dealing directly with rich .NET objects. If I need to sort, filter, or otherwise process cmdlet output, I have easy access to typed properties and methods right at the prompt.

Often, though, I’ll need to wrangle plain text, perhaps from a log file or the output of an executable. In these cases an intermediate step is required to extract the typed information (timestamps, substrings, numerical fields, etc.) from the plain strings.

This comes up often enough that I whipped up a handy PowerShell filter, ‘ro’ (for ‘regex object’), to make it easy:

# converts text to objects via regex,
#  with properties corresponding to capture groups
filter ro
{
    param($pattern)
    
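    # $_ is the current line of piped input; lines that don't match are dropped
    if($_ -match $pattern)
    {
        $result = @{}
        # key 0 (the full match) is falsy, so this keeps only the capture groups
        $matches.Keys |?{ $_ } |%{
            $raw = $matches[$_]
            $asInt = 0
            $asFloat = 0.0
            $asDate = [datetime]::Now
            # try int, then double, then datetime; fall back to the raw string
            if([int]::TryParse($raw, [ref] $asInt)){ $result[$_] = $asInt }
            elseif([double]::TryParse($raw, [ref] $asFloat)){ $result[$_] = $asFloat }
            elseif([datetime]::TryParse($raw, [ref] $asDate)){ $result[$_] = $asDate }
            else{ $result[$_] = $raw }
        }
        [pscustomobject]$result
    }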
}

This takes each line of text piped to it and attempts to match it against the provided regex pattern. If it matches, an object is created with properties corresponding to the capture groups of the match (ignoring group 0, which represents the full match). Lines that don't match are silently dropped. It will even check whether each capture group can be parsed as an int, double, or timestamp, in which case that strongly-typed value is used instead of the flat string value.
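To make the mechanics concrete, here's a minimal sketch of the two building blocks the filter leans on: the automatic `$matches` hashtable that `-match` fills in, and the `TryParse` pattern. The sample line, the pattern, and the group names (`level`, `count`, `what`) are made up for illustration.

```powershell
# Made-up input; the line format and group names are illustrative only.
$line    = 'WARN 3 retries'
$pattern = '(?<level>\w+) (?<count>\d+) (?<what>\w+)'

# -match returns $true/$false and fills the automatic $matches hashtable.
$matched = $line -match $pattern

# Key 0 is the full match; named groups appear under their names.
# Where-Object { $_ } drops key 0 because the integer 0 is falsy.
$keys = @($matches.Keys | Where-Object { $_ })

# Captures are always strings until something parses them.
$raw = $matches['count']

# TryParse reports success as a boolean and writes the value through [ref].
$asInt = 0
$parsedAsInt = [int]::TryParse($raw, [ref] $asInt)

# '3' would also parse as a double, which is why the int check has to come first.
$asDouble = 0.0
$parsedAsDouble = [double]::TryParse($raw, [ref] $asDouble)

"{0} capture groups; count is {1} ({2})" -f $keys.Count, $asInt, $asInt.GetType().Name
```

Note that `[double]::TryParse` and `[datetime]::TryParse` use the current culture, so the same text can end up typed differently on machines with different regional settings.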

I get a lot of mileage out of this guy.  As an example, when producing the Visual F# and Roslyn contributor mugs, we wanted to etch the username and SHA of each contributor’s first commit.

Using just the basic git tools and ‘ro’, this info can be produced in a one-liner.


In this case I got to pick convenient delimiters, so I could be very sloppy with my regex! 🙂
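A rough sketch of what such a pipeline could look like, not necessarily the exact command used for the mugs. The `|` delimiter, the `user` and `sha` group names, the `git log` format string in the comment, and the sample lines are all assumptions for illustration. The filter is repeated so the snippet runs on its own.

```powershell
# 'ro' as defined earlier in the post.
filter ro
{
    param($pattern)

    if($_ -match $pattern)
    {
        $result = @{}
        $matches.Keys |?{ $_ } |%{
            $raw = $matches[$_]
            $asInt = 0
            $asFloat = 0.0
            $asDate = [datetime]::Now
            if([int]::TryParse($raw, [ref] $asInt)){ $result[$_] = $asInt }
            elseif([double]::TryParse($raw, [ref] $asFloat)){ $result[$_] = $asFloat }
            elseif([datetime]::TryParse($raw, [ref] $asDate)){ $result[$_] = $asDate }
            else{ $result[$_] = $raw }
        }
        [pscustomobject]$result
    }
}

# Hypothetical stand-in for:  git log --reverse --format='%an|%h'
# (oldest commit first, one "author|short-sha" line per commit)
$lines = @(
    'alice|a1b2c3d',
    'bob|e4f5a6b',
    'alice|c7d8e9f'
)

# Keep only the first line seen for each user, i.e. their earliest commit.
$seen = @{}
$first = @(
    $lines |
        ro '(?<user>.*)\|(?<sha>.*)' |
        Where-Object {
            $isNew = -not $seen.ContainsKey($_.user)
            $seen[$_.user] = $true
            $isNew
        }
)

$first | Format-Table user, sha
```

One thing to watch for with real data: because ‘ro’ eagerly converts anything that parses as a number, an abbreviated SHA made up only of digits would come back as an int rather than a string.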