regex - How to remove all except the first 3 and last of a specific character with sed
I've looked all over the place and can't find an answer. I've used sed before, so I'm familiar with the syntax, but this one has me stumped.
I want to remove all except the first 3 instances and the last instance of a specific character. Here is a specific example:
input.csv:
"first", "some text "quote" blaw blaw", 1
"second", "some more text "another quote" blaw blaw", 3
I want to remove all the quotes (") on each line except the first 3 and the last 1, so it looks like this:
output.csv:
"first", "some text quote blaw blaw", 1
"second", "some more text another quote blaw blaw", 3
Any pointers? Thanks.
$ sed -r ':a; s/([^"]*"[^"]*"[^"]*")([^"]*)"([^"]*")/\1\2\3/; ta' input.csv
"first", "some text quote blaw blaw", 1
"second", "some more text another quote blaw blaw", 3
How it works
The code works by looking for the first 5 quotes on a line and removing the fourth. This process is repeated, by looping, until there are only 4 quotes left on the line.
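To see the effect of a single pass, here is a small sketch that runs the substitution without the loop. The helper name strip_fourth_quote and the variables are made up for illustration, and it uses -E rather than -r on the assumption that your sed accepts it (recent GNU sed and BSD sed both do).

```shell
# Sample line from the question: 6 quotes, so two too many.
line='"first", "some text "quote" blaw blaw", 1'

# Hypothetical helper: one substitution, no loop.
# Each call removes only the fourth quote, and only if a fifth quote exists.
strip_fourth_quote() {
    printf '%s\n' "$1" | sed -E 's/([^"]*"[^"]*"[^"]*")([^"]*)"([^"]*")/\1\2\3/'
}

pass1=$(strip_fourth_quote "$line")
pass2=$(strip_fourth_quote "$pass1")
pass3=$(strip_fourth_quote "$pass2")

printf '%s\n' "$pass1"   # "first", "some text quote" blaw blaw", 1
printf '%s\n' "$pass2"   # "first", "some text quote blaw blaw", 1
printf '%s\n' "$pass3"   # unchanged: only 4 quotes left, so nothing matches
```

Each pass drops exactly one quote, which is why the real command needs the label and the ta loop to finish the job in one go.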
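:a
This defines a label, a.
s/([^"]*"[^"]*"[^"]*")([^"]*)"([^"]*")/\1\2\3/
This looks for the first 3 quotes, along with any text before and between them, and saves them in group 1. It then looks for the next run of non-quote characters and saves it in group 2. Next it looks for the following double quote (the fourth). Finally it looks for any non-quote characters followed by the fifth quote and saves them in group 3. It replaces the whole match with the 3 groups, omitting the fourth quote.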
Let's break that down more explicitly:
([^"]*"[^"]*"[^"]*")
This looks for the first 3 quotes and any text before and between them. It is saved in group 1.
([^"]*)
This looks for the next run of non-quote characters. It is saved in group 2.
"
This matches the fourth quote on the line.
([^"]*")
This matches the next run of non-quote characters followed by the fifth quote on the line. It is saved in group 3.
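If it helps to see what each group actually captures, one rough way is to wrap the groups in braces in the replacement instead of just concatenating them. The braces and the variable names here are only for illustration and are not part of the real command; -E is assumed to be supported by your sed.

```shell
# Sample line from the question.
line='"first", "some text "quote" blaw blaw", 1'

# Wrap each captured group in braces so its boundaries are visible.
# The fourth quote sits between group 2 and group 3 and is in no group,
# so it does not appear in the output.
marked=$(printf '%s\n' "$line" |
    sed -E 's/([^"]*"[^"]*"[^"]*")([^"]*)"([^"]*")/{\1}{\2}{\3}/')

printf '%s\n' "$marked"   # {"first", "}{some text }{quote"} blaw blaw", 1
```

Everything after the fifth quote is outside the match, so sed leaves it untouched.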
The replacement text
\1\2\3
has the effect of removing the fourth quote of the 5 quotes found.
ta
If a substitution was made, this loops back to the label a. If not, we are done with this line.
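As a quick sanity check on the loop's stopping condition, the sketch below runs the full command on the question's two sample lines and then counts the quotes left on each line with awk. The variable names are invented for the example, and the data is fed through a pipe rather than read from input.csv so the snippet stands on its own.

```shell
# The two sample records from the question, one per line.
input='"first", "some text "quote" blaw blaw", 1
"second", "some more text "another quote" blaw blaw", 3'

# Full command: label, substitution, and conditional jump back to the label.
cleaned=$(printf '%s\n' "$input" |
    sed -E -e ':a' -e 's/([^"]*"[^"]*"[^"]*")([^"]*)"([^"]*")/\1\2\3/' -e 'ta')

# Splitting on the quote character gives (number of quotes + 1) fields per line.
quote_counts=$(printf '%s\n' "$cleaned" | awk -F'"' '{ print NF - 1 }')

printf '%s\n' "$cleaned"
printf '%s\n' "$quote_counts"   # one count per line; each should be 4
```

A line that starts with 4 or fewer quotes never matches the pattern, so it passes through unchanged.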
BSD or Mac OS X
Try:
sed -E -e ':a' -e 's/([^"]*"[^"]*"[^"]*")([^"]*)"([^"]*")/\1\2\3/' -e 'ta' input.csv