Tuesday, May 25, 2010

Parsing CSV Escaped with Speech Marks

I just know this isn’t the right way to do this, but what the heck, it seems to work for me.

The simple match is "([\w,]*)",+|"([\w,]*)"$, but I needed to account for all the other commonly occurring characters such as %, & and *. (I would’ve thought I could just use .* but I couldn’t get it to work.)
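A likely reason .* misbehaves is that * is greedy and . also matches the quote character, so a single match can run on to the last closing quote on the line. A negated character class such as [^"]* (anything except a quote) would avoid listing every punctuation character by hand. The sketch below is an assumption rather than the pattern used in this post: fieldPattern and fields are made-up names, and it does not handle doubled quotes inside a field.

```fsharp
open System.Text.RegularExpressions

// Hypothetical alternative pattern: a quote, anything that is not a quote, a quote.
// In a verbatim string (@"...") a literal quote is written as "".
let fieldPattern = @"""([^""]*)"""

// Return the captured contents of every quoted field in the line.
let fields (line: string) =
    [ for m in Regex.Matches(line, fieldPattern) -> m.Groups.[1].Value ]

// Greedy .* for comparison: it runs from the first quote to the last one.
let greedy = Regex.Match("\"a,b\",\"c\"", "\"(.*)\"").Groups.[1].Value

printfn "%A" (fields "\"a,b\",\"\",\"c d\"")
printfn "%s" greedy
```

Here fields picks out three values (a,b, an empty string, and c d), while the greedy capture comes back as one string spanning both fields.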

I’m using the F# active pattern approach to divvy up the matches, returning the Match objects as a list rather than breaking out the captures inside the active pattern. This was more useful to me because I was using the matches later on in the code.
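For comparison, here is a rough sketch of the other option, where the active pattern breaks out the captures itself and hands back plain strings. RegexCaptures is a hypothetical name, not something used elsewhere in this post, and the sketch assumes you only want the groups that actually took part in each match.

```fsharp
open System.Text.RegularExpressions

// Hypothetical variant: yields captured strings instead of Match objects.
let (|RegexCaptures|_|) (pattern: string) (input: string) =
    let ms = Regex.Matches(input, pattern)
    if ms.Count = 0 then None
    else
        let captures =
            [ for m in ms do
                // Skip group 0 (the whole match); keep groups that succeeded.
                for g in Seq.skip 1 (Seq.cast<Group> m.Groups) do
                    if g.Success then yield g.Value ]
        Some captures

match "\"a\",\"b\"" with
| RegexCaptures "\"(\\w*)\"" values -> printfn "%A" values
| _ -> printfn "no match"
```

The trade-off is that the caller loses access to things like Match.Index and Match.Length, which is why returning the Match objects suited me better.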

open System.Text.RegularExpressions

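// Partial active pattern: succeeds with the list of Match objects when the regex matches at least once.
let (|ActiveRegex|_|) regex str =
    let ms = Regex(regex).Matches(str)
    if ms.Count > 0
    then Some ([ for m in ms -> m ])
    else None

// Return all matches of regex re in s, or an empty list when there are none.
let matches s re =
    match s with
    | ActiveRegex re results -> results
    | _ -> []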

let testLine = "\"31\",\"a 1\",\"b-2\",\"c+3\",\",.;~!@#$%^&*()\/?><,.|{}[]_+-\",\"\",\"14/05/2010 12:12:20 a.m.\",\"1: 2; 3. 4? 5[ 6] 7& 8*\",\"a,b\""

let pattern = "\"([\w\s:;~!@#$%\^&\*_<>,\.\\\/\|\[\]\{\}\(\)\-\+\?]*)\",+|\"([\w\s:;~!@#$%\^&\*_<>,\.\\\/\|\[\]\{\}\(\)\-\+\?]*)\"$"

matches testLine pattern

// Print the capture groups of every match, skipping group 0 (the whole match).
let printMatches s p =
    for m in matches s p do
        seq { for g in m.Groups -> g }
        |> Seq.skip 1
        |> Seq.iter (fun x -> printfn "%A" x)

printMatches testLine pattern
  
