python - Spark DataFrame TimestampType - how to get Year, Month, Day values from field?


I have a Spark DataFrame whose top five rows, from take(5), are as follows:

[Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=1, value=638.55),
 Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=2, value=638.55),
 Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=3, value=638.55),
 Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=4, value=638.55),
 Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=5, value=638.55)]

Its schema is defined as:

elevdf.printSchema()
root
 |-- date: timestamp (nullable = true)
 |-- hour: long (nullable = true)
 |-- value: double (nullable = true)

How can I get the year, month, and day values from the 'date' field?

You can use a simple map, as with any other RDD:

import datetime
from pyspark.sql import Row

elevdf = sqlContext.createDataFrame(sc.parallelize([
        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=1, value=638.55),
        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=2, value=638.55),
        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=3, value=638.55),
        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=4, value=638.55),
        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=5, value=638.55)]))

# .rdd exposes the underlying RDD of Rows (DataFrame.map was removed in Spark 2.0);
# each field is read by name from the Row.
(elevdf
 .rdd
 .map(lambda row: (row.date.year, row.date.month, row.date.day))
 .collect())

The result is:

[(1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1)] 
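The per-row extraction relies only on the standard `year`, `month`, and `day` attributes of `datetime.datetime`, so it can be checked without a cluster. Below is a minimal sketch under that assumption; `Reading` is a hypothetical namedtuple standing in for Spark's `Row`, and `year_month_day` is an illustrative name that does not come from the original answer.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row so the sketch runs without Spark.
Reading = namedtuple("Reading", ["date", "hour", "value"])


def year_month_day(row):
    """Return (year, month, day) taken from the row's `date` field."""
    return (row.date.year, row.date.month, row.date.day)


# Same five sample rows as in the question.
rows = [Reading(datetime.datetime(1984, 1, 1, 0, 0), h, 638.55)
        for h in range(1, 6)]

parts = [year_month_day(r) for r in rows]
print(parts)
```

Because a Spark `Row` also exposes its fields as attributes, a named function like this should be usable as the argument to an RDD `map` in place of a lambda.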

By the way, datetime.datetime already stores the hour, so keeping it in a separate column seems like a waste of memory.
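To illustrate the point about the hour, here is a small plain-Python sketch that folds a separate hour value into the timestamp itself. It assumes the `hour` column counts whole hours elapsed since midnight, which the question does not state; `with_hour` is a hypothetical helper name.

```python
import datetime


def with_hour(date, hour):
    """Return `date` shifted forward by `hour` hours.

    Assumption: `hour` means hours elapsed since the start of `date`.
    """
    return date + datetime.timedelta(hours=hour)


stamp = with_hour(datetime.datetime(1984, 1, 1, 0, 0), 5)
print(stamp)       # 1984-01-01 05:00:00
print(stamp.hour)  # 5
```

Using `timedelta` rather than `date.replace(hour=...)` means a value of 24 rolls over to the next day instead of raising an error, which matters if the data numbers its hours 1 to 24.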

Since Spark 1.5 you can use a number of date processing functions:

import datetime
from pyspark.sql.functions import year, month, dayofmonth

elevdf = sc.parallelize([
    (datetime.datetime(1984, 1, 1, 0, 0), 1, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 2, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 3, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 4, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 5, 638.55)
]).toDF(["date", "hour", "value"])

# Each function takes a column name and returns a new integer column.
elevdf.select(
    year("date").alias('year'),
    month("date").alias('month'),
    dayofmonth("date").alias('day')
).show()
# +----+-----+---+
# |year|month|day|
# +----+-----+---+
# |1984|    1|  1|
# |1984|    1|  1|
# |1984|    1|  1|
# |1984|    1|  1|
# |1984|    1|  1|
# +----+-----+---+
