regex - Pig XMLLoader error when loading tag with colon
I've been using Pig with XMLLoader to load XML files, and I've been practising on the book example. However, the XML file I need to process has colons in its tag names. When I run the script, it says that the tag cannot be processed because of the ':' (the exact log is at the end).
This is the file I have. I modified it on purpose to reproduce the ':' case. bookt.xml:
<catalog>
  <bc:book id="1">
    <title>hadoop defnitive guide</title>
    <author>tom white</author>
    <country>us</country>
    <company>cloudera</company>
    <price>24.90</price>
    <year>2012</year>
  </bc:book>
  <book id="2">
    <title>programming pig</title>
    <author>alan gates</author>
    <country>usa</country>
    <company>horton works</company>
    <price>30.90</price>
    <year>2013</year>
  </book>
</catalog>
Now book.pig (note: I tried both regex and XPath, which is why both appear; the error is still there):
register piggybank.jar;
define XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- load every <bc:book> element as one chararray
a = load 'bookt' using org.apache.pig.piggybank.storage.XMLLoader('bc:book') as (x:chararray);
dump a;

-- regex attempt (commented out)
-- b = foreach a generate flatten(REGEX_EXTRACT_ALL(x, '<bc:book>\\s*<title>(.*)</title>\\s*<author>(.*)</author>\\s*<country>(.*)</country>\\s*<company>(.*)</company>\\s*<price>(.*)</price>\\s*<year>(.*)</year>\\s*</bc:book>'));

-- XPath attempt
b = foreach a generate XPath(x, 'bc:book/author'), XPath(x, 'bc:book/price');
describe b;
This is the error:
ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.RuntimeException: java.lang.RuntimeException: XML tag identifier 'bc:book' does not match the regular expression /[a-zA-Z\_][0-9a-zA-Z\-_]+/
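To see why the identifier is rejected, the pattern from the log can be tried against a few tag names. This is only a sketch: it uses Python for illustration (XMLLoader itself is Java), the helper name `is_accepted` is hypothetical, and it assumes the loader requires the whole identifier to match, which is consistent with the error above but not confirmed from the piggybank source.

```python
import re

# Pattern copied from the error message. Requiring a full match is an
# assumption about how XMLLoader applies it.
TAG_PATTERN = re.compile(r"[a-zA-Z\_][0-9a-zA-Z\-_]+")


def is_accepted(identifier: str) -> bool:
    """Return True if the whole identifier matches the pattern (hypothetical helper)."""
    return TAG_PATTERN.fullmatch(identifier) is not None


for name in ("book", "bc:book", "bc_book", "bcbook"):
    print(name, is_accepted(name))
```

The character classes contain letters, digits, '-' and '_' only, so under this assumption any identifier containing ':' fails regardless of how it is quoted or concatenated in the script.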
My question is: what should I pass to XMLLoader(String identifier) so that it accepts tags containing ':'? (I cannot modify piggybank.jar. I tried replacing ':' with its XML character code, and I tried XMLLoader('sth'+'sth')...)
One solution, though not a neat one, is to load the file with PigStorage, replace ':' with '', and then load the result with XMLLoader.
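The replace step of that workaround can also be done before the file reaches Pig. The sketch below is a swapped-in variant, not the PigStorage approach itself: it uses Python, the names `PREFIXED_TAG` and `strip_tag_colons` are hypothetical, and the sample title is made up to show that colons in text content are left alone. It only rewrites the colon in opening and closing tag names, so prefixed attributes and namespace declarations are not handled.

```python
import re

# Start of an opening or closing tag with a prefix, e.g. "<bc:" or "</bc:".
PREFIXED_TAG = re.compile(r"(</?)([A-Za-z_][\w\-]*):")


def strip_tag_colons(xml_text: str) -> str:
    """Drop the ':' from prefixed tag names, e.g. <bc:book> becomes <bcbook>."""
    return PREFIXED_TAG.sub(r"\1\2", xml_text)


# Hypothetical sample; the title text contains a colon on purpose.
sample = '<catalog> <bc:book id="1"> <title>pig: a guide</title> </bc:book> </catalog>'
print(strip_tag_colons(sample))
```

After this step the element would be loaded with XMLLoader('bcbook'), assuming the cleaned file is what Pig reads.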