AWS Data Pipeline - How to set a global pipeline variable from ShellCommandActivity


I am trying to augment a pipeline (which migrates data from RDS to Redshift) so that it only selects rows whose id is greater than the maximum id that already exists in Redshift. I have a Python script that calculates this value and returns it as output. I want to take that output and save it as a variable max_id that I can later reference in the RDS selection query. For example, the RDS selection section looks like this:

{   "database": {     "ref": "rds_mysql"   },   "scheduletype": "timeseries",   "name": "srcrdstable",   "id": "srcrdstable",   "type": "sqldatanode",   "table": "#{myrdstablename}",   "selectquery": "select * #{table} #{myrdstablelastmodifiedcol} > '#{max_id}'" }, 

I want to add a section before this that executes a bash script, retrieves the id field, and saves it as the variable max_id so it can be referenced in the code above. So far I have:

{  "mycomment": "retrieves maximum id given table in redshift",   "id": "shellcommandactivity_max_id",   "workergroup": "wg-12345",   "type": "shellcommandactivity",   "command": "starting_point=$(/usr/bin/python /home/user/aws-taskrunner-docker/get_id.py --schema=schema_name --table=users --database=master)" }, 

How can I adjust the above so that max_id is set to the value of starting_point? Thanks.

Unfortunately, I don't think there is a way to set a pipeline parameter during pipeline execution. Here are a couple of options that may help you.

First, if your data table has a column with the modification date, you can use the pipeline template for an incremental copy of RDS MySQL to Redshift. If you're not using MySQL, you may still be able to modify the template to your needs.
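As a rough illustration of that approach (a sketch of the idea, not the template verbatim), the source data node filters on the last-modified column by the scheduled interval instead of a runtime variable. The parameter names mirror your snippet, and the exact format expression may differ in the real template:

{
  "database": {
    "ref": "rds_mysql"
  },
  "scheduleType": "timeseries",
  "name": "SrcRDSTable",
  "id": "SrcRDSTable",
  "type": "SqlDataNode",
  "table": "#{myRDSTableName}",
  "selectQuery": "select * from #{table} where #{myRDSTableLastModifiedCol} >= '#{format(@scheduledStartTime, 'YYYY-MM-dd HH:mm:ss')}' and #{myRDSTableLastModifiedCol} <= '#{format(@scheduledEndTime, 'YYYY-MM-dd HH:mm:ss')}'"
},

Because the boundaries come from @scheduledStartTime and @scheduledEndTime, each run only picks up rows modified during its schedule window, so there is no need to compute max_id at runtime.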

Alternatively, instead of using a SqlDataNode, you could create a ShellCommandActivity that uses Python to connect to the RDS database and export the relevant record set to S3. You would then import the records from S3 into Redshift using a RedshiftCopyActivity.
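Here is a minimal sketch of how those pieces could be wired together, assuming a hypothetical export script export_rds.py and placeholder names for the S3 path; DestRedshiftTable would be a RedshiftDataNode defined elsewhere in the pipeline:

{
  "myComment": "Runs a Python script that queries RDS and writes the result set as CSV",
  "id": "ExportRDSActivity",
  "type": "ShellCommandActivity",
  "workerGroup": "wg-12345",
  "stage": "true",
  "output": { "ref": "S3StagingData" },
  "command": "/usr/bin/python /home/user/export_rds.py --out ${OUTPUT1_STAGING_DIR}/export.csv"
},
{
  "id": "S3StagingData",
  "type": "S3DataNode",
  "directoryPath": "s3://my-bucket/staging/#{format(@scheduledStartTime, 'YYYY-MM-dd')}"
},
{
  "id": "LoadToRedshift",
  "type": "RedshiftCopyActivity",
  "workerGroup": "wg-12345",
  "input": { "ref": "S3StagingData" },
  "output": { "ref": "DestRedshiftTable" },
  "insertMode": "APPEND",
  "dependsOn": { "ref": "ExportRDSActivity" }
},

With "stage": "true", whatever the script writes to ${OUTPUT1_STAGING_DIR} is copied into the S3DataNode once the activity finishes, and the RedshiftCopyActivity then loads those files into the target table.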

