This page is intended as a list of useful commands to refer back to when using Azure CLI. Feel free to add your own by forking the repo, or creating an issue. Some things are Azure CLI specific but some are just helpful Bash things.
Microsoft help pages
These have a lot of info and are where I figured out most things, but there is so much there that it can be a little overwhelming.
Useful commands
List pools
az batch pool list
returns all the info for every pool. You can use --query to simplify this. The query below returns the id and information on the numbers of nodes for all pools. I have also changed the --output from the default json to table since it is more compact. --query and --output are global parameters available for most commands.
az batch pool list \
--query "[].{name:id, vmSize:vmSize, curNodes: currentDedicatedNodes,tarNodes: targetDedicatedN
odes, allocState:allocationState}" \
--output table
#> Name VmSize CurNodes TarNodes AllocState
#> --------------- -------------- ---------- ---------- ------------
#> amartin_pool4 standard_d8_v3 1 1 steady
#> amartin_pool5 standard_d8_v3 1 1 steady
#> amartin_pool6 standard_d8_v3 1 1 steady
#> sendicott_roads standard_e2_v3 1 0 resizing
To break down the query: the first [] means we are going to get values from each element of an array. The part in {} chooses which properties to return and what their display names should be. These are JMESPath queries and they can be very powerful. See this help page for more details.
For example, we can also add a filter to the query:
az batch pool list \
--query "[?id == 'sendicott_roads'].{name:id, vmSize:vmSize, curNodes: currentDedicatedNodes,tarNodes: targetDedicatedNodes, allocState:allocationState}" \
--output table
#> Name VmSize CurNodes TarNodes AllocState
#> --------------- -------------- ---------- ---------- ------------
#> sendicott_roads standard_e2_v3 1 0 resizing
az batch pool list \
--query "[?contains(id, 'amartin')].{name:id, vmSize:vmSize, curNodes:currentDedicatedNodes,tarNodes: targetDedicatedNodes, allocState:allocationState}" \
--output table
#> Name VmSize CurNodes TarNodes AllocState
#> ------------- -------------- ---------- ---------- ------------
#> amartin_pool4 standard_d8_v3 1 1 steady
#> amartin_pool5 standard_d8_v3 1 1 steady
#> amartin_pool6 standard_d8_v3 1 1 steady
Get subnet id for use in JSON pool files
We can also save the output of a query to a variable and use it in future commands, as in the sketch below.
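The original command isn't shown here; a minimal sketch, where the resource group, vnet, and subnet names (my-rg, my-vnet, my-subnet) are placeholders, not real values:
# save the full subnet resource id in a variable
subnetid=$(az network vnet subnet show --resource-group my-rg \
  --vnet-name my-vnet --name my-subnet --query id --output tsv)
# $subnetid can then be substituted into a pool JSON, e.g. with sed as shown below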
Get sastoken and sasurl
# 7 days seems to be the max that the cli will allow
end=`date -u -d "7 days" '+%Y-%m-%dT%H:%MZ'`
sastoken=`az storage container generate-sas --account-name ecdcwls --expiry $end --name sendicott --permissions racwdli -o tsv --auth-mode login --as-user`
sasurl=https://ecdcwls.blob.core.windows.net/sendicott/?$sastoken
Copy folder to storage
If you copy individual files they will end up in the root of the storage container. If you copy a folder using --recursive, the folder structure will be reflected in the storage container. If you copy the contents of the folder using a wildcard (path/to/folder/*) it will copy each file to the root of the container.
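A minimal sketch of the folder case, where the local scripts folder is an assumed example:
# upload a local folder, keeping its structure, to the sendicott container
az storage copy -s scripts \
  -d "https://ecdcwls.blob.core.windows.net/sendicott?$sastoken" --recursive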
Copy/download files from storage
You can download files using a pattern. The command below will download all files in the sendicott container that end in ".R" and save them in a new folder called scripts2:
az storage copy -s https://ecdcwls.blob.core.windows.net/sendicott/*?$sastoken \
-d "analyses/scripts2" --include-pattern "*.R"
Or, by adding the folder name after the container name and before ?$sastoken: using --recursive will copy the folder with its contents:
az storage copy -s https://ecdcwls.blob.core.windows.net/sendicott/scripts?$sastoken \
-d "analyses/scripts3" --recursive
But using the folder name followed by * will just copy each file to the destination without recreating the folder.
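A minimal sketch of that case (the scripts4 destination folder is a made-up name):
# copies the files inside scripts, but not the scripts folder itself
az storage copy -s https://ecdcwls.blob.core.windows.net/sendicott/scripts/*?$sastoken \
  -d "analyses/scripts4"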
Delete files on storage
Remove files matching pattern:
az storage remove -c sendicott --include-pattern "*.rds" --account-name ecdcwls --sas-token $sastoken --recursive
Delete one folder
az storage remove -c sendicott -n scripts --account-name ecdcwls \
--sas-token $sastoken --recursive
Delete everything in container
You can also delete the whole contents of a container at once, as in the sketch below. Note that you can exclude files matching a pattern with e.g. --exclude-pattern "*.gpkg".
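The command itself isn't shown above; a minimal sketch, reusing the container and credentials from the examples above:
# delete all files in the sendicott container
az storage remove -c sendicott --account-name ecdcwls \
  --sas-token $sastoken --recursive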
Create pool, job, task
poolName="test_pool_json_cli"
jobName="sendicott_job_test"
az batch pool create --json-file cloud_config/template_pool_cli.json
az batch job create --pool-id $poolName --id $jobName
az batch task create --json-file cloud_config/template_task_hello.json --job-id $jobName
Or create many tasks with a loop. This assumes you have created many different task JSONs with numbered file names.
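A minimal sketch, assuming the numbered task JSONs created by the sed loop at the bottom of this page (task_roads_1.json through task_roads_27.json):
# create one task per numbered JSON file
for rowi in {1..27}
do
  az batch task create --json-file analysis/cloud/task_jsons/task_roads_$rowi.json \
    --job-id $jobName
done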
Reactivate task
This will re-run a task that failed. It is useful if you fixed one of the scripts and re-uploaded it, or if the error was a stochastic thing and you want to try again. It won't re-run successful tasks.
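The command itself isn't shown above; a minimal sketch, reusing the sampleTask id that appears in the monitoring example below:
# re-run a failed task (completed tasks are left alone)
az batch task reactivate --job-id $jobName --task-id sampleTask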
Monitoring tasks
# details for a single task filtered by query
az batch task show --job-id $jobName --task-id sampleTask \
--query "{state: state, executionInfo: executionInfo}" --output yaml
# Summary of state for all tasks in job
az batch job task-counts show --job-id $jobName
# Get other info from multiple tasks
az batch task list --job-id $jobName \
--query "{tasks: [].[id, executionInfo.{st:startTime, end:endTime}][]}" --output yaml
Manage nodes in pool
When running many tasks in a pool I usually start one node, check that it is working and then resize to the desired number of nodes.
# set target dedicated nodes
az batch pool resize --pool-id $poolName --target-dedicated-nodes 3
# check pool node counts
az batch pool list \
--query "[?id=='"$poolName"'].{curNodes: currentDedicatedNodes,tarNodes: targetDedicatedNodes, allocState:allocationState}" \
--output yaml
# check node state
az batch node list --pool-id $poolName --query "{nodes: [].[id, state][]}" --output json
Enable autoscaling
Once a pool is running well and I am confident that results will be saved as expected, I enable autoscaling so that we will not be charged after the tasks are finished running. This means the nodes are deleted after their tasks complete, so you won't be able to look at logs or files on them afterwards. One confusing aspect of autoscaling is that it doesn't happen immediately: the interval set below is the shortest available and means the formula will run every 5 minutes. So generally, if it seems like it is not working, wait 5 minutes and check again.
The formula below is hard to read; there is documentation available here and I will try to explain the logic of this example. First, I have set max VMs to 25 since our account has a max of 27; you may want to set this lower if others are using the account at the same time. Next I find out how many tasks are currently waiting ("active tasks") and how many are currently running. Unfortunately this is not a simple number because these values are not exactly accurate in real time: they are recorded every 30 seconds and sometimes that process fails, so a value can be missing. Therefore the recommendation is to use a sampling of values; in this case I have chosen the last 2 minutes, which returns a vector of the values recorded over that window. I take the max of each sample and add the running and active tasks together to get the total number of tasks that I want to run.
Then I determine the number of nodes that are currently available. I use the max of current and target nodes so that the formula will not think there are 0 nodes if the pool is still starting up, or if the target was 0 but new tasks have been added. Next I determine the total number of tasks that can run at a time (called cores below). Then I calculate the number of additional nodes needed, given the total number of tasks minus the existing number of cores and the number of task slots per node. Then we set the target dedicated nodes, making sure it is between 0 and 25. Finally, I set the node deallocation option so that all the tasks running on a node must finish before it is shut down.
# you can prompt autoscaling of pool by changing time interval
az batch pool autoscale enable --pool-id $poolName --auto-scale-evaluation-interval "PT5M" \
--auto-scale-formula 'maxNumberofVMs = 25;
$atasks = $ActiveTasks.GetSample(TimeInterval_Minute * 2, 1);
$rtasks = $RunningTasks.GetSample(TimeInterval_Minute * 2, 1);
$tasks = max($atasks) + max($rtasks);
$nodes = max($CurrentDedicatedNodes, $TargetDedicatedNodes);
$cores = $nodes*$TaskSlotsPerNode;
$extraVMs = ceil((($tasks - $cores) + 0) / $TaskSlotsPerNode);
$targetVMs = ($nodes + $extraVMs);
$TargetDedicatedNodes = max(0, min($targetVMs, maxNumberofVMs));
$NodeDeallocationOption = taskcompletion;'
Although autoscaling only runs every 5 minutes, you can check what the result would be by evaluating the autoscaling formula. This only works after autoscaling has been enabled.
# This will show what the formula would return if autoscaling ran now.
az batch pool autoscale evaluate --pool-id $poolName \
--auto-scale-formula 'maxNumberofVMs = 25;
$atasks = $ActiveTasks.GetSample(TimeInterval_Minute * 2, 1);
$rtasks = $RunningTasks.GetSample(TimeInterval_Minute * 2, 1);
$tasks = max($atasks) + max($rtasks);
$nodes = max($CurrentDedicatedNodes, $TargetDedicatedNodes);
$cores = $nodes*$TaskSlotsPerNode;
$extraVMs = ceil((($tasks - $cores) + 0) / $TaskSlotsPerNode);
$targetVMs = ($nodes + $extraVMs);
$TargetDedicatedNodes = max(0, min($targetVMs, maxNumberofVMs));
$NodeDeallocationOption = taskcompletion;'
If autoscaling is not working as desired, or you want to change the formula, you can disable autoscaling and manually set the number of nodes needed, as in the sketch below. But make sure to turn autoscaling back on once all the tasks have started, to ensure the pool will not keep running after all tasks are complete.
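A minimal sketch of that round trip (the node count here is just an example):
# pause autoscaling and set the node count manually
az batch pool autoscale disable --pool-id $poolName
az batch pool resize --pool-id $poolName --target-dedicated-nodes 3
# once tasks are underway, re-enable autoscaling with the formula above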
Get GitHub PAT token
There is probably another way to set and retrieve a GitHub PAT within bash, but this was the simplest way I found to get the token so I could supply it to a script programmatically and avoid saving it in files that might accidentally get shared.
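The snippet itself isn't shown here; one possible sketch, assuming you have the GitHub CLI (gh) installed and have already logged in with gh auth login:
# print the token for the current gh session and keep it in a variable
# (avoid echoing this into logs or committing it to a file)
pat=$(gh auth token)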
Use sed to replace placeholder text in files
sed is a command line tool that lets you replace a placeholder in a text file with a value generated by your script, for example using the sastoken created above:
sed 's,<sastoken>,'${sastoken//&/\\&}',g' cloud_config/template_task_with_file_in_out.json > cloud_config/task_to_use.json
This is important to avoid hard coding secret passwords or tokens into your saved files. And you should remove the "to_use" file that contains the password when you are done (rm cloud_config/task_to_use.json). You can also add *to_use* to your .gitignore file to make sure these files don't accidentally get added to GitHub.
The syntax of sed is a bit tricky and is complicated by the special characters in the sastoken. In the command above, the s indicates we want to use sed's substitution command, and the comma (,) is a delimiter that shows the start of the string to be replaced. The //&/\\& part of the expansion ${sastoken//&/\\&} just adds a \ before all the & in the sastoken so they don't get interpreted as special characters by sed. The third comma shows the end of the replacement text, and the g indicates that every instance of the pattern should be replaced. Finally, the first path is the template file containing the placeholder and the second is the path of the file to create. / is the default delimiter used in sed, but it doesn't work if there are slashes in the replacement text, so I have used commas instead. See the sed documentation for more details and examples.
You can also use sed with a loop to create many different versions of a file:
for rowi in {1..27}
do
sed 's,<SASURL>,'${sasurl//&/\\&}',g' analysis/cloud/task_roads.json \
| sed 's,<row>,'$rowi',g' > analysis/cloud/task_jsons/task_roads_$rowi.json
done
The above first replaces <SASURL> in the file and then uses the pipe | to pass the result on to another sed command that replaces <row> with the loop index rowi and saves it to a file with the row number appended. This can be helpful for creating many different versions of a task.