apache spark - Can createDataFrame or saveAsTable automatically encode column types in PySpark 1.6?


I am trying to save a table in Spark 1.6 using PySpark. All of the table's columns are saved as text, and I'm wondering if I can change this:

m3product = sc.textFile('s3://path/product.txt')
product = m3product.map(lambda x: x.split("\t"))
product = sqlContext.createDataFrame(product, ['productid', 'marketid', 'productname', 'prod'])
product.saveAsTable("product", mode='overwrite')

Is there a way for the last 2 commands to automatically recognize that productid and marketid are numeric? I have a lot of files with a lot of fields to upload, so ideally this would be automatic.

Is there a way for the last 2 commands to automatically recognize that productid and marketid are numeric?

If you pass int or float (depending on what you need), PySpark will convert the data types for you.
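For example, a minimal sketch with made-up sample data (sc and sqlContext are the usual Spark 1.6 contexts):

# Convert the id fields to int before createDataFrame so PySpark
# infers numeric column types instead of strings.
rdd = sc.parallelize(["1\t10\twidget\ta", "2\t20\tgadget\tb"])
parsed = rdd.map(lambda x: x.split("\t")).map(
    lambda f: (int(f[0]), int(f[1]), f[2], f[3]))
df = sqlContext.createDataFrame(parsed, ['productid', 'marketid', 'productname', 'prod'])
df.printSchema()
# root
#  |-- productid: long (nullable = true)
#  |-- marketid: long (nullable = true)
#  |-- productname: string (nullable = true)
#  |-- prod: string (nullable = true)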

In your case, changing the lambda function in

product = m3product.map(lambda x: x.split("\t"))
product = sqlContext.createDataFrame(product, ['productid', 'marketid', 'productname', 'prod'])

to

from pyspark.sql.types import Row

def split_product_line(line):
    fields = line.split('\t')
    # Cast the numeric fields; remaining fields follow the question's column list.
    return Row(
        productid=int(fields[0]),
        marketid=int(fields[1]),
        productname=fields[2],
        prod=fields[3]
    )

product = m3product.map(split_product_line).toDF()
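After the toDF() call, product.printSchema() should report productid and marketid as long columns instead of string.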

You will find it easier to control the data types and possibly add error/exception checks.
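For instance, a sketch of one way to add such checks (the parse_or_none helper name and the choice to drop malformed rows are just an illustration, not the only option):

from pyspark.sql.types import Row

def parse_or_none(line):
    fields = line.split('\t')
    try:
        return Row(
            productid=int(fields[0]),
            marketid=int(fields[1]),
            productname=fields[2],
            prod=fields[3]
        )
    except (ValueError, IndexError):
        # Malformed line (wrong field count or non-numeric id): skip it.
        return None

# filter(bool) drops the None rows before building the DataFrame.
product = m3product.map(parse_or_none).filter(bool).toDF()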

Try to avoid lambda functions where possible :)

