sql - Best way to store relational data in HDFS
I've been reading a lot about Hadoop lately, and I can understand the general concept of it, but there is still (at least) one piece of the puzzle I can't get my head around: what is the best way to store relational data in HDFS?
First of all, I know Hadoop does not exist to replace a conventional SQL database serving an application. The problem I'm facing is how to use Hadoop to aggregate data from multiple systems into HDFS, so that I can cross-reference data from those systems and produce new sets of data to be used by reporting tools, etc.
So, should I import the table data as one file per table, or should I import the results of queries that join the tables?
For example:
SQL tables:
person: personid, name, birthday, sex
company: companyid, name, address
personcompany: personid, companyid
Should I import the 3 tables, or should I import the result of a query that returns which person works for which company?
Please share your thoughts with me!
Typically, to build a data warehouse in Hadoop, you have to ingest all the tables. In your example, you need to have all 3 tables in HDFS and then run the ETL/aggregation on top of them. For example, joiners_weekly can have an ETL step that runs:
select * from personcompany pc join person p on pc.personid=p.personid join company c on pc.companyid=c.companyid
This way the report can be generated from Hadoop. Hope this helps.
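To make the "ingest the raw tables, then join in an ETL step" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a local stand-in for a SQL-on-Hadoop engine, which is an assumption made only so the example runs anywhere. The sample rows and the derived table name person_works_at are made up for illustration.

```python
import sqlite3

# Local stand-in for the warehouse: an in-memory SQLite database
# (assumption: on a real cluster these would be tables over HDFS files).
conn = sqlite3.connect(":memory:")

# Step 1: ingest the three source tables as-is, one dataset per table.
conn.executescript("""
    CREATE TABLE person (personid INTEGER PRIMARY KEY, name TEXT, birthday TEXT, sex TEXT);
    CREATE TABLE company (companyid INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE personcompany (personid INTEGER, companyid INTEGER);
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Alice", "1980-01-01", "F"), (2, "Bob", "1975-06-15", "M")],
)
conn.executemany("INSERT INTO company VALUES (?, ?, ?)", [(10, "Acme", "1 Main St")])
conn.executemany("INSERT INTO personcompany VALUES (?, ?)", [(1, 10), (2, 10)])

# Step 2: the ETL step builds a derived table by joining the raw tables.
# person_works_at is a hypothetical name for the derived dataset.
conn.execute("""
    CREATE TABLE person_works_at AS
    SELECT p.personid, p.name AS person_name, c.companyid, c.name AS company_name
    FROM personcompany pc
    JOIN person p ON pc.personid = p.personid
    JOIN company c ON pc.companyid = c.companyid
""")

# Step 3: reporting tools read the derived table; the raw tables stay available.
rows = conn.execute(
    "SELECT person_name, company_name FROM person_works_at ORDER BY personid"
).fetchall()
print(rows)  # [('Alice', 'Acme'), ('Bob', 'Acme')]
```

Keeping the raw tables next to the derived one means a different report later can join them another way without going back to the source systems.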