Introduction
This blog is targeted at Linux users who want to use the Linux shell to work with a Microsoft HDInsight cluster: uploading data and script files and submitting Pig Latin jobs, without going through any other interface. A few pieces of information must be gathered from the Azure management portal in order to run the scripts below successfully.
Software requirements
To work with Azure from Linux you need to install Node.js. First, make sure its build dependencies are available:
$ sudo apt-get install g++ curl libssl-dev apache2-utils
$ sudo apt-get install git-core
Then download and build Node.js from source:
$ git clone git://github.com/ry/node.git
$ cd node
$ ./configure
$ make
$ sudo make install
For more details, see http://howtonode.org/how-to-install-nodejs
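Before moving on, it is worth confirming the build actually landed on your PATH. A quick sketch of that check (the version number printed will of course differ per system):

```shell
# Sanity check after 'make install': confirm the node binary is on the PATH
# (version numbers will differ on your system).
if command -v node >/dev/null 2>&1; then
    NODE_STATUS="node found: $(node --version)"
else
    NODE_STATUS="node not found - check that /usr/local/bin is on your PATH"
fi
echo "$NODE_STATUS"
```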
After installing Node.js, install the azure-cli package for working with your Azure account:
$ sudo npm install azure-cli -g
Then use the following command to download your Azure account credentials and connect the CLI to your subscription:
$ azure account download
Working with WASB from Linux shell
Now that azure-cli is installed on Linux, you can start working with Azure blob storage. Before uploading or downloading anything, set these two environment variables in the shell as follows:
$ export AZURE_STORAGE_ACCOUNT='<StorageAccountName>'
$ export AZURE_STORAGE_ACCESS_KEY='<StorageAccessKey>'
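Concretely, the two exports look like this; the account name and key below are hypothetical placeholders, so substitute your own values from the portal:

```shell
# Hypothetical values for illustration -- substitute your real account name
# and the access key copied from the portal.
export AZURE_STORAGE_ACCOUNT='mystorageaccount'
export AZURE_STORAGE_ACCESS_KEY='<paste-primary-access-key-here>'

# The azure-cli storage commands read these two variables automatically,
# so no extra flags are needed on each call.
echo "Using storage account: $AZURE_STORAGE_ACCOUNT"
```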
The access key can be found by logging in to the Azure management portal > Storage > choose your storage account and click on it, then click Manage Access Keys in the lower strip.
Now you can easily work with the storage account you have just configured. Let's look at some examples.
i. Upload files to blob storage
$ azure storage blob upload [File] [Container] [blob]
[File]: the name of the local file on your system
[Container]: the name of the container in the storage account you want to upload to
[Blob]: the name of the blob, i.e. the name the file will have once uploaded
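Putting those three arguments together, a minimal sketch of an upload (all the names here are hypothetical placeholders, and the final call assumes azure-cli plus the two `AZURE_STORAGE_*` variables from the previous section):

```shell
# Illustrative upload: all names below are hypothetical placeholders.
FILE='data/input.txt'        # local file to upload
CONTAINER='mycontainer'      # existing container in the storage account
BLOB='input/input.txt'       # blob name (path) it will have once uploaded

# Assemble the command first so it can be inspected before running.
CMD="azure storage blob upload $FILE $CONTAINER $BLOB"
echo "$CMD"
# Uncomment to actually run it:
# $CMD
```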
ii. Download file from blob storage
$ azure storage blob download [Container] [Blob] [File]
iii. List all available blobs in the storage account
$ azure storage blob list
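As a companion sketch, downloading the same hypothetical blob back to a local file follows the reversed argument order shown above:

```shell
# Companion sketch: download the blob back to a local copy.
# Container/blob names match the hypothetical upload example.
CONTAINER='mycontainer'
BLOB='input/input.txt'
LOCALCOPY='downloaded.txt'
DL_CMD="azure storage blob download $CONTAINER $BLOB $LOCALCOPY"
echo "$DL_CMD"
# azure storage blob list    # prints every blob the credentials can see
```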
More details can be found at the following URL
Submitting Pig Latin jobs from Linux shell
To submit Pig Latin jobs from the Linux shell I used cURL, which is available on most Linux distributions. cURL calls the WebHCat REST APIs that work with Pig and Hive on the Hadoop cluster. The full documentation of the WebHCat REST APIs can be found here: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_dataintegration/content/ch_using_hcatalog_1.html
1. To check that your connectivity is OK and the WebHCat server is up, use the following command:
$ curl -i 'https://[clustername].azurehdinsight.net/templeton/v1/status' -u [username]:[password]
[clustername]: name of the provisioned cluster
Note that because HDInsight uses SSL for accessing the Templeton (WebHCat) REST APIs, you'll need to submit the username and password of the cluster.
You should receive an HTTP 200 response with a message that the server is running.
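The same call with the moving parts split into variables, which makes it easier to reuse for the later requests; `mycluster` and `admin` are placeholders for your own cluster name and credentials:

```shell
# Sketch of the status check; 'mycluster' and 'admin' are placeholders.
CLUSTERNAME='mycluster'
USERNAME='admin'
STATUS_URL="https://${CLUSTERNAME}.azurehdinsight.net/templeton/v1/status"
echo "$STATUS_URL"
# Real call (curl prompts for the password when it is omitted after -u):
# curl -i "$STATUS_URL" -u "$USERNAME"
```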
2. To submit a Pig job, first upload your Pig script file to blob storage, then use the following command if your Pig script is in the default storage account set when provisioning the cluster:
$ curl -d file=wasb:///filename -u [username]:[password] 'https://[clustername].azurehdinsight.net/templeton/v1/pig' -d user.name=admin
The user.name parameter sets the username that the Pig script's MapReduce jobs will run under.
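A fuller sketch of the submission, again with hypothetical names (cluster, script path, and user are all placeholders you would replace):

```shell
# Hypothetical submission; 'mycluster' and the script path are placeholders.
CLUSTERNAME='mycluster'
SCRIPT='wasb:///scripts/wordcount.pig'   # already uploaded to default storage
PIG_URL="https://${CLUSTERNAME}.azurehdinsight.net/templeton/v1/pig"
SUBMIT_CMD="curl -d file=$SCRIPT -d user.name=admin -u admin $PIG_URL"
# Inspect before running; the real call will prompt for the cluster password.
echo "$SUBMIT_CMD"
```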
3. The following command submits a Pig job whose script is located in a different blob storage account:
$ curl -d file=wasb://<containername>@<storageaccountname>.blob.core.windows.net/run.pig -u [username]:[password] 'https://[clustername].azurehdinsight.net/templeton/v1/pig' -d user.name=admin
4. After submitting, you'll receive a job ID at the prompt; you can use this ID to check the status of your script:
$ curl -u admin:password -s 'https://[clustername].azurehdinsight.net/templeton/v1/queue/[jobid]?user.name=[username]'
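The status call returns JSON, so it helps to pull out just the job state. A minimal sketch without extra tools, shown against a canned sample response (the job ID and state here are made up for illustration):

```shell
# WebHCat reports job progress as JSON; extract the state field with sed.
# The sample response below is fabricated for illustration only.
RESPONSE='{"status":{"jobId":"job_201403241234_0001","state":"SUCCEEDED"}}'
STATE=$(echo "$RESPONSE" | sed -n 's/.*"state":"\([A-Z]*\)".*/\1/p')
echo "Job state: $STATE"
```

In practice you would pipe the output of the curl status request into the same sed expression instead of the canned string.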
This summarizes all the activities needed to work with HDInsight from the Linux shell alone.