sql - Best way to store relational data in HDFS


I've been reading a lot about Hadoop lately and I understand the general concept, but there is still (at least) one piece of the puzzle I can't get my head around: what is the best way to store relational data in HDFS?

First of all, I know that Hadoop does not exist to replace a conventional SQL database that serves an application. The problem I'm facing is that I want to use Hadoop to aggregate data from multiple systems into HDFS, so that I can cross-reference data from those systems and produce new data sets to be used by reporting tools, etc.

So, should I import the table data as-is, one table per file, or should I import the results of queries that join the tables?

For example:

SQL tables:

person: personid, name, birthday, sex

company: companyid, name, address

personcompany: personid, companyid

Should I import the 3 tables, or should I import the result of a query that returns which person works at which company?

Please share your thoughts with me!

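Typically, to build a data warehouse in Hadoop, you have to ingest all the tables. In your example, you need to have all 3 tables in HDFS and then run the ETL/aggregation on top of them. For example, a joiners_weekly data set can be produced by an ETL step like this: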

select * from personcompany pc join person p on pc.personid=p.personid join company c on pc.companyid=c.companyid;
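As a rough illustration of the "ingest every table, join afterwards" flow, here is a minimal Python sketch. It is not Hadoop code: a temporary local directory stands in for HDFS, and an in-memory SQLite database stands in for whatever SQL layer (Hive, for instance) would run the ETL query. The sample rows and the helper names (`export_tables`, `load_tables`, `build_report`, `run_demo`) are hypothetical and exist only for this example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical sample rows; column names follow the schema in the question.
TABLES = {
    "person": (["personid", "name", "birthday", "sex"],
               [(1, "Alice", "1980-01-15", "F"), (2, "Bob", "1975-06-30", "M")]),
    "company": (["companyid", "name", "address"],
                [(10, "Acme", "1 Main St")]),
    "personcompany": (["personid", "companyid"],
                      [(1, 10), (2, 10)]),
}


def export_tables(directory):
    """Write each source table to its own CSV file (one table, one file)."""
    for table, (columns, rows) in TABLES.items():
        with open(Path(directory) / f"{table}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(rows)


def load_tables(directory, conn):
    """Load every CSV file into a table named after the file."""
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            columns = next(reader)  # header row holds the column names
            conn.execute(f"CREATE TABLE {path.stem} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(
                f"INSERT INTO {path.stem} VALUES ({placeholders})", reader
            )


def build_report(conn):
    """Derive the joined data set after ingestion, mirroring the query above."""
    query = """
        SELECT p.name, c.name
        FROM personcompany pc
        JOIN person p ON pc.personid = p.personid
        JOIN company c ON pc.companyid = c.companyid
        ORDER BY p.name
    """
    return conn.execute(query).fetchall()


def run_demo():
    """Export the raw tables, ingest them unchanged, then run the join."""
    with tempfile.TemporaryDirectory() as staging:  # stand-in for an HDFS path
        export_tables(staging)
        conn = sqlite3.connect(":memory:")  # stand-in for the SQL engine
        try:
            load_tables(staging, conn)
            return build_report(conn)
        finally:
            conn.close()


if __name__ == "__main__":
    for person_name, company_name in run_demo():
        print(person_name, "works at", company_name)
```

Because the three tables are kept in their raw form, other joins or aggregations can be derived from them later without going back to the source systems.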

This way, the report can be generated from Hadoop. Hope this helps.

