java - How to parse a custom log file in Scala to extract some key-value pairs using patterns
I am building a Spark Streaming app that takes in logs coming out of a server. A log line looks like this:
2015-06-18t13:53:46.606-0400 customlog v4 info: source="abcd" type="type1" <xml xml here attr1='value1' attr2='value2' > </xml> <some more xml></> time ="232"
I am trying to follow the sample app written by Databricks.
I am kind of stuck at the pattern in ApacheAccessLog.scala. My log is a custom log, and it has key="value" pairs in a typical log line.
I don't quite understand what the pattern means or how to change it to suit my app. I need to do an aggregation on the times, based on the source and type keys in the log.
The case class expects a variety of things, such as an IP address, that your log doesn't have. Therefore you need to modify the case class definition to include only the fields you want.
Just to illustrate, let's define the case class like so:
// `type` is a reserved word in Scala, so the field name is escaped with backticks
case class ApacheAccessLog(source: String, `type`: String, time: Long)
Then you can replace the regex with one that finds those fields. You can play with the regex on regex101; here is one I've prepared to start with, producing a regex like this:
source="(.*?)" type="(.*?)" .* time ="(.*?)"
capturing three groups of characters in m. You can then fix the instantiation to use these groups:
ApacheAccessLog(m.group(1), m.group(2), m.group(3).toLong)
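Putting the pieces together, a minimal sketch of the parsing step might look like the following. The object name `LogParser` and the `Option` return type are illustrative choices, not part of the Databricks sample, and the sketch uses Scala's `Regex.findFirstMatchIn` rather than a Java `Matcher`. The pattern is deliberately left unanchored so that the timestamp prefix is skipped.

```scala
import scala.util.Try
import scala.util.matching.Regex

// `type` is a reserved word in Scala, hence the backticks.
case class ApacheAccessLog(source: String, `type`: String, time: Long)

// Hypothetical helper object; the name is an assumption for this sketch.
object LogParser {
  // Same regex as above, with the quotes escaped for a Scala string literal.
  val Pattern: Regex = "source=\"(.*?)\" type=\"(.*?)\" .* time =\"(.*?)\"".r

  // Returns None when the line does not match or the time is not a number.
  def parseLogLine(line: String): Option[ApacheAccessLog] =
    Pattern.findFirstMatchIn(line).flatMap { m =>
      Try(ApacheAccessLog(m.group(1), m.group(2), m.group(3).toLong)).toOption
    }
}
```

For the sample line in the question, `LogParser.parseLogLine` should return a record with source "abcd", type "type1" and time 232. Returning `Option` lets a streaming job drop malformed lines with `flatMap` instead of failing on them.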
HTH.
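As for the aggregation on times mentioned in the question, here is a rough sketch over plain Scala collections, assuming the lines have already been parsed into records. The names `TimeAggregation` and `totalTimeByKey` are made up for illustration, and summing is only one possible aggregation.

```scala
// `type` is a reserved word in Scala, hence the backticks.
case class ApacheAccessLog(source: String, `type`: String, time: Long)

// Hypothetical helper object; the name is an assumption for this sketch.
object TimeAggregation {
  // Sums the time field for each distinct (source, type) pair.
  def totalTimeByKey(logs: Seq[ApacheAccessLog]): Map[(String, String), Long] =
    logs
      .groupBy(log => (log.source, log.`type`))
      .map { case (key, group) => key -> group.map(_.time).sum }
}
```

In Spark the same shape would typically be expressed by mapping each record to a `((source, type), time)` pair and calling `reduceByKey(_ + _)` on the resulting pair RDD or DStream.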