regex - Pig XMLLoader error when loading tag with colon


I've been using Pig and XMLLoader to load XML files, and I've been practising on the book example. However, the XML file I need to process has colons in its tag names. When I run my script, it says that the tag cannot be processed because of the ':' (exact log at the end).

This is the file I have. I modified it for the purpose of the ':' case. bookt.xml:

<catalog>
  <bc:book id="1">
    <title>hadoop defnitive guide</title>
    <author>tom white</author>
    <country>us</country>
    <company>cloudera</company>
    <price>24.90</price>
    <year>2012</year>
  </bc:book>
  <book id="2">
    <title>programming pig</title>
    <author>alan gates</author>
    <country>usa</country>
    <company>horton works</company>
    <price>30.90</price>
    <year>2013</year>
  </book>
</catalog>

Now book.pig (note: I tried both regex and XPath, which is why both appear; the error is still there):

register piggybank.jar
define XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- load every <bc:book> element as a single chararray
a = load 'bookt' using org.apache.pig.piggybank.storage.XMLLoader('bc:book') as (x:chararray);
dump a;

-- first attempt (regex), left commented out:
--b = foreach a generate flatten(REGEX_EXTRACT_ALL(x,'<bc:book>\\s*<title>(.*)</title>\\s*<author>(.*)</author>\\s*<country>(.*)</country>\\s*<company>(.*)</company>\\s*<price>(.*)</price>\\s*<year>(.*)</year>\\s*</bc:book>'));

-- second attempt (XPath):
b = foreach a generate XPath(x, 'bc:book/author'), XPath(x, 'bc:book/price');
describe b;

This is the error:

ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.RuntimeException: java.lang.RuntimeException: XML tag identifier 'bc:book' does not match the regular expression /[a-zA-Z\_][0-9a-zA-Z\-_]+/
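
To see why the identifier is rejected, the pattern from the error message can be tried outside Pig. The following is only a sketch in Python, not the loader's own code: it assumes the loader requires the whole identifier to match the pattern, and the helper name `is_accepted` is made up for illustration.

```python
import re

# Pattern copied from the error message. Neither character class contains ':'.
# Assumption: the loader requires the *entire* identifier to match.
TAG_PATTERN = re.compile(r"[a-zA-Z\_][0-9a-zA-Z\-_]+")


def is_accepted(tag: str) -> bool:
    """Return True if the whole tag name matches the pattern (hypothetical helper)."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("book", "bc:book", "bc_book"):
        print(tag, is_accepted(tag))
```

Under that assumption, any identifier containing ':' fails the check, however the string is built or escaped before it reaches the loader.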

My question is: what should I put in XMLLoader(String identifier) so that I can load tags containing ':'? (I cannot modify piggybank.jar. I tried writing the ':' as an XML special code, and I tried using XMLLoader('sth'+'sth')...)

One, not very neat, solution is to load the file with PigStorage, replace ':' with '', and then load the result with XMLLoader.
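
The same idea can also be applied as a preprocessing step before the file reaches Pig. This is a rough sketch in Python rather than Pig, under several assumptions: the function name `flatten_prefixed_tags` is invented here, the regex only rewrites prefixed element names (not prefixed attributes such as `xmlns:bc`), and it ignores CDATA sections and comments.

```python
import re

# Matches the start of an opening or closing tag whose name has a prefix,
# e.g. "<bc:book" or "</bc:book". Text such as "a: b" is left untouched
# because it is not directly preceded by "<" or "</".
PREFIXED_TAG = re.compile(r"(</?)([A-Za-z_][\w.\-]*):([A-Za-z_][\w.\-]*)")


def flatten_prefixed_tags(xml_text: str, replacement: str = "") -> str:
    """Replace the ':' in prefixed tag names with `replacement` (hypothetical helper)."""
    return PREFIXED_TAG.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{replacement}{m.group(3)}",
        xml_text,
    )


if __name__ == "__main__":
    sample = '<bc:book id="1"><title>a: b</title></bc:book>'
    print(flatten_prefixed_tags(sample))
    print(flatten_prefixed_tags(sample, "_"))
```

With the default replacement the tag becomes `bcbook`, so the script would then use XMLLoader('bcbook') and paths such as 'bcbook/author'. Passing "_" instead keeps the prefix visible as `bc_book`, which the pattern in the error message also allows.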

