I'm trying to save a table in Spark 1.6 using PySpark. All of the table's columns are saved as text, and I'm wondering if I can change this:
m3product = sc.textFile('s3://path/product.txt')
product = m3product.map(lambda x: x.split("\t"))
product = sqlContext.createDataFrame(product, ['productid', 'marketid', 'productname', 'prod'])
product.saveAsTable("product", mode='overwrite')
Is there a way, in the last 2 commands, to automatically recognize that productid and marketid are numeric? I have a lot of files with a lot of fields to upload, so ideally it would be automatic.
Is there a way, in the last 2 commands, to automatically recognize that productid and marketid are numeric?

If you pass int or float (depending on what you need), PySpark will convert the data types for you.
In your case, changing the lambda function in
product = m3product.map(lambda x: x.split("\t"))
product = sqlContext.createDataFrame(product, ['productid', 'marketid', 'productname', 'prod'])
to
from pyspark.sql.types import Row

def split_product_line(line):
    fields = line.split('\t')
    return Row(
        productid=int(fields[0]),
        marketid=int(fields[1]),
        ...
    )

product = m3product.map(split_product_line).toDF()
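To confirm the types were picked up (a quick sanity check, not part of the original answer):

product.printSchema()
# productid and marketid should now appear as long instead of string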
You'll find it easier to control the data types and possibly add error/exception checks.
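For instance, here is a hedged sketch of what such a check could look like; the None-filtering strategy and the split_product_line_safe / is_parsed names are assumptions for illustration, with the field names borrowed from the question's column list:

from pyspark.sql.types import Row

def split_product_line_safe(line):
    fields = line.split('\t')
    try:
        return Row(
            productid=int(fields[0]),
            marketid=int(fields[1]),
            productname=fields[2],
            prod=fields[3],
        )
    except (ValueError, IndexError):
        # Malformed line: wrong field count or non-numeric id (assumed handling)
        return None

def is_parsed(row):
    return row is not None

# Drop the lines that failed to parse before building the DataFrame
product = m3product.map(split_product_line_safe).filter(is_parsed).toDF()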
I try to prohibit lambda functions whenever possible :)