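While using AzCopy recently, I noticed it behaved very differently from before, and only then realized it had gone through a major revision: the command-line parameters are all different from the old ones. The changes are substantial, and for someone who has used the old version, the first impression of the new one is really not great. Along the way I stepped on plenty of landmines and also found plenty of strengths, so I felt it was worth writing up my notes.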
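Installation
On Windows I always install through Chocolatey:
- Old version (v8)
Installing the old AzCopy does not register it in the PATH environment variable, so even after installation you have to run it via C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe, or else set up the PATH environment variable yourself.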
choco install azcopy -y
To view the usage help in the old version, use the following command:
azcopy /?
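- New version (v10)
Only today did I start wondering why, after installing with choco install azcopy, not even the basic PATH environment variable had been set up for me. It turned out that AzCopy had released a new version (v10). Installing the new AzCopy v10 through the azcopy10 package sets up the default executable path automatically.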
choco install azcopy10 -y
To view the usage help in the new version, use the following command:
azcopy --help
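Using the new AzCopy v10
In the Taiwanese spirit of never reading the manual and simply swinging at whatever ball comes, after installing azcopy10 I of course went straight to azcopy login --help to look up how to log in. The most maddening part of the new AzCopy v10 is that even "logging in" has a barrier to entry. The login process is extremely complicated: besides the many authorization options, even the simplest option basically does not work smoothly, and when an error occurs there is no clear guidance.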
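- Log in with an Azure AD account
The simplest way to log in with AzCopy is to use an Azure AD account: just run azcopy login. Yet no matter what I tried, I could not log in successfully!
I then read the document Authorize access to blobs with AzCopy and Azure Active Directory (Azure AD), which says that if you cannot log in with azcopy login, you should use azcopy login --tenant-id=<tenant-id> instead. That trick did let me log in successfully!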
- List the files in a specific container in Blob Storage
Although I could log in, running azcopy list https://xxxxx.blob.core.windows.net/site gave me the following error message:
INFO: Authenticating to source using Azure AD
failed to traverse container: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/azure-storage-blob-go@v0.10.1-0.20210407023846-16cf969ec1c3/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=AuthorizationPermissionMismatch) =====
Description=This request is not authorized to perform this operation using this permission.
RequestId:76f314d5-c01e-002f-064b-9763e0000000
Time:2021-08-22T11:49:43.1775028Z, Details:
Code: AuthorizationPermissionMismatch
GET https://xxxxx.blob.core.windows.net/site?comp=list&delimiter=%2F&include=metadata&restype=container&timeout=901
Authorization: REDACTED
User-Agent: [AzCopy/10.11.0 Azure-Storage/0.13 (go1.15; Windows_NT)]
X-Ms-Client-Request-Id: [6de2cb93-6ca7-4adc-450f-2ec3b65c058d]
X-Ms-Version: [2019-12-12]
--------------------------------------------------------------------------------
RESPONSE Status: 403 This request is not authorized to perform this operation using this permission.
Content-Length: [279]
Content-Type: [application/xml]
Date: [Sun, 22 Aug 2021 11:49:42 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Client-Request-Id: [6de2cb93-6ca7-4adc-450f-2ec3b65c058d]
X-Ms-Error-Code: [AuthorizationPermissionMismatch]
X-Ms-Request-Id: [76f314d5-c01e-002f-064b-9763e0000000]
X-Ms-Version: [2019-12-12]
This really is a rather incomprehensible message! 🙄
I ran azcopy login --help again, and the first paragraph of its output contains a key point:
To be authorized to your Azure Storage account, you must assign the Storage Blob Data Contributor role to your user account in the context of either the Storage account, parent resource group or parent subscription.
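So being an Owner or Service administrator is not enough; you also have to grant the Storage Blob Data Contributor role on the Azure Storage account. However, after I granted this role to myself, it still did not work. It turns out you have to log in once more with azcopy login --tenant-id=<tenant-id> to obtain the full access authorization! 🔥
The official documentation says I need at least the Storage Blob Data Contributor or Storage Blob Data Owner role to upload files, but it never told me to log in again! The reason is that after an RBAC assignment is updated, a new access token has to be obtained before the new permissions apply, so you must log in again to get that new access token.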
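The role assignment and the fresh login described above could be scripted roughly as follows. This is a minimal sketch rather than an official procedure: it assumes the Azure CLI (az) is installed and already signed in, and the function names and every argument (subscription ID, resource group, account name, assignee, tenant ID) are hypothetical placeholders of mine, not values from this post.

```shell
#!/bin/sh
# Build the resource ID of a storage account, used as the RBAC scope.
# Arguments (all hypothetical placeholders): <subscription-id> <resource-group> <account-name>
storage_account_scope() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s' \
        "$1" "$2" "$3"
}

# Grant the data-plane role, then log in again so AzCopy obtains a new access token.
# Arguments: <assignee> <scope> <tenant-id>
grant_and_relogin() {
    az role assignment create \
        --assignee "$1" \
        --role 'Storage Blob Data Contributor' \
        --scope "$2" || return 1
    azcopy logout
    azcopy login --tenant-id="$3"
}

# Example call (not executed here; every value is a placeholder):
# grant_and_relogin 'user@example.com' \
#     "$(storage_account_scope '<subscription-id>' '<resource-group>' 'xxxxx')" \
#     '<tenant-id>'
```

Scoping the assignment to the single storage account is the narrowest of the three scopes the help text mentions; the parent resource group or subscription would also work.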
- Upload a folder to a specific container in Blob Storage
To upload every file in the current folder, including all subfolders and their files, use the following command:
azcopy cp * https://xxxxx.blob.core.windows.net/site/ --recursive
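Note: when using azcopy cp, never use . to refer to the current folder, because that also copies the name of the current folder. For example, if you run the following command from the C:\site folder, the files end up under https://xxxxx.blob.core.windows.net/site/site/, which is probably not what you want.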
azcopy cp . https://xxxxx.blob.core.windows.net/site/ --recursive
If you mix / and \ in a path, you get a panic error and AzCopy exits with code 2:
C:\> azcopy sync D:\a\r1\a/QA-CI/drop/site "https://xxxxx.blob.core.windows.net/site"
panic: inconsistent path separators. Some are forward, some are back. This is not supported.
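One way to avoid this panic is to normalize the separators before the path reaches AzCopy. The helper below is a minimal sketch for a bash or sh step (for example, in a pipeline); the function name to_backslashes is my own invention, and it should only be applied to local Windows paths, never to the destination URL.

```shell
#!/bin/sh
# Convert every forward slash in a local Windows path to a backslash,
# so the path uses one separator style consistently.
to_backslashes() {
    printf '%s\n' "$1" | tr '/' '\\'
}

# The mixed path from the failing command above:
to_backslashes 'D:\a\r1\a/QA-CI/drop/site'
# prints D:\a\r1\a\QA-CI\drop\site
```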
- Delete all files in a specific container in Blob Storage
azcopy rm https://xxxxx.blob.core.windows.net/site/* --recursive
- Create a blob container
When using Static website hosting in Azure Storage, you need to create a blob container named $web, and azcopy make is a convenient way to do that.
In Command Prompt, run it like this:
azcopy make "https://xxxxx.blob.core.windows.net/$web"
In PowerShell, you must wrap the URL in single quotes; otherwise $web is read as a variable named web:
azcopy make 'https://xxxxx.blob.core.windows.net/$web'
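The same pitfall exists in bash or sh (for example, in a Linux-based pipeline step), where double quotes also expand $web. The snippet below only illustrates the quoting difference; the value assigned to web is a made-up stand-in that makes the expansion visible, and no azcopy command is actually run.

```shell
#!/bin/sh
# Hypothetical variable that happens to be named "web".
web='oops'

# Double quotes: the shell substitutes $web before azcopy would ever see the URL.
double_quoted="https://xxxxx.blob.core.windows.net/$web"

# Single quotes: the text is passed through literally, which is what azcopy make needs.
single_quoted='https://xxxxx.blob.core.windows.net/$web'

printf '%s\n' "$double_quoted" "$single_quoted"
# prints:
#   https://xxxxx.blob.core.windows.net/oops
#   https://xxxxx.blob.core.windows.net/$web
```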
To turn on the Static website hosting in Azure Storage feature itself, you also need to run the following commands:
# Enable static website hosting
az storage blob service-properties update --auth-mode login \
--account-name 'xxxxx' \
--static-website \
--404-document 'index.html' \
--index-document 'index.html'
# Get the publicly accessible primary endpoint (URL)
az storage account show --name 'xxxxx' --query 'primaryEndpoints.web' -o tsv
- Synchronize two folders
AzCopy v10 currently supports the following four combinations of source and destination:
- Local <-> Azure Blob / Azure File (either SAS or OAuth authentication can be used)
- Azure Blob <-> Azure Blob (Source must include a SAS or is publicly accessible; either SAS or OAuth authentication can be used for destination)
- Azure File <-> Azure File (Source must include a SAS or is publicly accessible; SAS authentication should be used for destination)
- Azure Blob <-> Azure File
Sync from a local folder to a blob container
azcopy sync . https://xxxxx.blob.core.windows.net/site/
Sync from a local folder to a blob container, and also delete remote files that have been deleted locally
azcopy sync . https://xxxxx.blob.core.windows.net/site/ --delete-destination=true
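However, keep the following in mind when using sync:
- It works on folders by default, so --recursive is enabled by default
- Sync decides mainly by comparing the last-modified times of the files at the source and the destination. If the destination file is newer, the file is not transferred, which can greatly reduce the time a sync takes.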
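The comparison rule in the last note can be pictured with a small local experiment. The function below is a sketch of that rule only, not AzCopy's actual implementation: needs_copy is a name I made up, both sides are ordinary local files, and it relies on the -nt test, which bash and dash support even though it is not strictly POSIX.

```shell
#!/bin/sh
# Return 0 (true) when the source should be transferred:
# either the destination is missing, or the source was modified more recently.
needs_copy() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ]; then
        return 0
    fi
    [ "$src" -nt "$dst" ]
}

# Example (hypothetical file names):
# if needs_copy ./index.html /mnt/mirror/index.html; then echo 'would copy'; fi
```

Under this rule a destination file that is newer than, or the same age as, the source is skipped, which is where the time savings come from.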
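- Get the list of available environment variables
AzCopy v10 has many environment variables that change its behavior. For the full list and descriptions, see the official document AzCopy v10 configuration settings (Azure Storage).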
azcopy env
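- Manage AzCopy jobs
When AzCopy runs, it turns the planned copy work into so-called jobs. If you hit any error, the job is interrupted, but it remembers how far it got, and the next time you run the command with the same parameters it automatically picks up the unfinished job and keeps trying to copy the files that have not been copied yet. Job management therefore has several use cases.
List all AzCopy jobs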
azcopy jobs list
Show the details of a specific AzCopy job
azcopy jobs show [jobID]
Resume a specific AzCopy job
azcopy jobs resume [jobID]
Remove a specific AzCopy job
azcopy jobs remove [jobID]
Clean up all AzCopy job records
azcopy jobs clean
- Log out of AzCopy (clear the credential cache)
azcopy logout
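- Generate AzCopy documentation in Markdown format
You can specify a directory in which to save the complete AzCopy documentation (Markdown format), which is handy for looking up fairly comprehensive AzCopy usage!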
azcopy doc --output-location ./doc
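Notes on using SAS
SAS (shared access signature) is the only way to access Azure Storage without logging in. A Blob SAS token may contain the % character, however. That is fine when you type the command at the Command Prompt, but not inside a batch file (*.bat): % is a special character there, and writing it directly in a batch file produces the following error at run time: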
AuthenticationErrorDetail: Signature size is invalid
You have to replace every % in the Blob URL (Blob SAS token) with %% for it to run correctly!
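If the batch file is generated by a script, the doubling can be automated. The following is a minimal sketch assuming a bash or sh step with sed available; the function name escape_percent_for_bat and the sample URL (including its fake signature) are hypothetical.

```shell
#!/bin/sh
# Double every % so that cmd.exe does not treat it as the start of a
# variable reference when the URL is written into a .bat file.
escape_percent_for_bat() {
    printf '%s\n' "$1" | sed 's/%/%%/g'
}

# Hypothetical SAS URL; the signature is made up.
escape_percent_for_bat 'https://xxxxx.blob.core.windows.net/site?sv=2019-12-12&sig=abc%2Bdef%3D'
# prints https://xxxxx.blob.core.windows.net/site?sv=2019-12-12&sig=abc%%2Bdef%%3D
```

Only the copy written into the batch file should be escaped; a URL typed directly at the Command Prompt keeps its single % characters.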
Additional notes on Azure Pipelines
The Hosted Agent provided by Azure Pipelines ships two different versions of AzCopy:
- Old version: AzCopy v3.1.0
"%AGENT_HOMEDIRECTORY%\externals\azcopy\azcopy.exe" /?
D:\a\r1\a>"C:\agents\2.190.0\externals\azcopy\azcopy.exe" /?
------------------------------------------------------------------------------
AzCopy 3.1.0 Copyright (c) 2014 Microsoft Corp. All Rights Reserved.
------------------------------------------------------------------------------
AzCopy </Source:> </Dest:> [/SourceKey:] [/DestKey:] [/V:] [/Z:] [/@:] [/Y]
[/SourceSAS:] [/DestSAS:] [/SourceType:] [/DestType:] [/S] [/Pattern:]
[/CheckMD5] [/L] [/MT] [/XN] [/XO] [/A] [/IA] [/XA] [/NC:] [/BlobType:]
[/Delimiter:] [/Snapshot] [/SyncCopy] [/SetContentType]
/Source:<source> Specifies the source data from which to copy.
The source can be a file system directory, a
blob container or a blob virtual directory.
/Dest:<destination> Specifies the destination to copy to. The
destination can be a file system directory,
a blob container or a blob virtual directory.
/SourceKey:<storage-key> Specifies the storage account key for the
source resource.
/DestKey:<storage-key> Specifies the storage account key for the
destination resource.
/V:[verbose-log-file] Outputs verbose status messages into a log
file.
By default, the verbose log file is named
AzCopyVerbose.log in
%LocalAppData%\Microsoft\Azure\AzCopy. If you
specify an existing file location for this
option, the verbose log will be appended to
that file.
/Z:[journal-file-folder] Specifies a journal file folder for resuming an
operation.
AzCopy always supports resuming if an
operation has been interrupted.
If this option is not specified, or it is
specified without a folder path, then AzCopy
will create the journal file in the default
location, which is
%LocalAppData%\Microsoft\Azure\AzCopy.
Each time you issue a command to AzCopy, it
checks whether a journal file exists in the
default folder, or whether it exists in a
folder that you specified via this option. If
the journal file does not exist in either
place, AzCopy treats the operation as new and
generates a new journal file.
If the journal file does exist, AzCopy will
check whether the command line that you input
matches the command line in the journal file.
If the two command lines match, AzCopy resumes
the incomplete operation. If they do not match,
you will be prompted to either overwrite the
journal file to start a new operation, or to
cancel the current operation.
The journal file is deleted upon successful
completion of the operation.
Note that resuming an operation from a journal
file created by a previous version of AzCopy
is not supported.
/@:<parameter-file> Specifies a file that contains parameters.
AzCopy processes the parameters in the file
just as if they had been specified on the
command line.
In a response file, you can either specify
multiple parameters on a single line, or
specify each parameter on its own line. Note
that an individual parameter cannot span
multiple lines.
Response files can include comments lines that
begin with the # symbol.
You can specify multiple response files.
However, note that AzCopy does not support
nested response files.
/Y Suppresses all AzCopy confirmation prompts.
/SourceSAS:<SAS-Token> Specifies a Shared Access Signature with READ
and LIST permissions for the source (if
applicable). Surround the SAS with double
quotes, as it may contains special command-line
characters.
If the source resource is a blob container,
and neither a key nor a SAS is provided, then
the blob container will be read via anonymous
access.
/DestSAS:<SAS-Token> Specifies a Shared Access Signature (SAS) with
READ and WRITE permissions for the
destination (if applicable).
Surround the SAS with double quotes, as it may
contains special command-line characters.
If the destination resource is a blob
container, you can either specify this option
followed by the SAS token, or you can specify
the SAS as part of the destination blob
container, without this option.
If the source and destination are both blobs,
then the destination blob must reside within
the same storage account as the source blob.
/SourceType:<blob> Specifies that the source resource is a blob
available in the local development environment,
running in the storage emulator.
/DestType:<blob> Specifies that the destination resource is a
blob available in the local development
environment, running in the storage emulator.
/S Specifies recursive mode for copy operations.
In recursive mode, AzCopy will copy all blobs
that match the specified file pattern,
including those in subfolders.
/Pattern:<file-pattern> Specifies a file pattern that indicates which
files to copy.
The behavior of the /Pattern parameter is
determined by the location of the source data,
and the presence of the recursive mode option.
Recursive mode is specified via option /S.
If the specified source is a directory in
the file system, then standard wildcards are
in effect, and the file pattern provided is
matched against files within the directory.
If option /S is specified, then AzCopy also
matches the specified pattern against all
files in any subfolders beneath the directory.
If the specified source is a blob container or
virtual directory, then wildcards are not
applied. If option /S is specified, then AzCopy
interprets the specified file pattern as a blob
prefix. If option /S is not specified, then
AzCopy matches the file pattern against exact
blob names.
The default file pattern used when no file
pattern is specified is *.* for a file system
location or an empty prefix for an Azure
Storage location.
Specifying multiple file patterns is not
supported.
/CheckMD5 Calculates an MD5 hash for downloaded data and
verifies that the MD5 hash stored in the blob
or file's Content-MD5 property matches the
calculated hash. The MD5 check is turned off by
default, so you must specify this option to
perform the MD5 check when downloading data.
Note that Azure Storage doesn't guarantee that
the MD5 hash stored for the blob is
up-to-date. It is client's responsibility to
update the MD5 whenever the blob is
modified.
AzCopy always sets the Content-MD5 property for
an Azure blob after uploading it to the
service.
/L Specifies a listing operation only; no data is
copied.
/MT Sets the downloaded file's last-modified time
to be the same as the source blob's.
/XN Excludes a newer source resource. The resource
will not be copied if the source is newer than
destination.
/XO Excludes an older source resource. The resource
will not be copied if the source resource is
older than destination.
/A Uploads only files that have the Archive
attribute set.
/IA:[RASHCNETOI] Uploads only files that have any of the
specified attributes set.
Available attributes include:
R Read-only files
A Files ready for archiving
S System files
H Hidden files
C Compressed file
N Normal files
E Encrypted files
T Temporary files
O Offline files
I Not content indexed Files
/XA:[RASHCNETOI] Excludes files from upload that have any of the
specified attributes set.
Available attributes include:
R Read-only files
A Files ready for archiving
S System files
H Hidden files
C Compressed file
N Normal files
E Encrypted files
T Temporary files
O Offline files
I Not content indexed Files
/NC:<number-of-concurrent> Specifies the number of concurrent operations.
AzCopy by default starts a certain number of
concurrent operations to increase the data
transfer throughput.
Note that large number of concurrent operations
in a low-bandwidth environment may overwhelm
the network connection and prevent the
operations from fully completing. Throttle
concurrent operations based on actual available
network bandwidth.
The upper limit for concurrent operations is
512.
/BlobType:<page | block> Specifies whether the destination blob is a
block blob or a page blob.
If the destination is a blob and this option
is not specified, then by default AzCopy will
create a block blob.
/Delimiter:<delimiter> Indicates the delimiter character used to
delimit virtual directories in a blob name.
By default, AzCopy uses / as the delimiter
character. However, AzCopy supports using any
common character (such as @, #, or %) as a
delimiter. If you need to include one of these
special characters on the command line, enclose
it with double quotes.
This option is only applicable for downloading
blobs.
/Snapshot Indicates whether to transfer snapshots. This
option is only valid when the source is a blob.
The transferred blob snapshots are renamed in
this format: [blob-name] (snapshot-time)
[extension].
By default, snapshots are not copied.
/SyncCopy Indicates whether to synchronously copy blobs
among two Azure Storage end points.
AzCopy by default uses server-side asynchronous
copy. Specify this option to download the blobs
from the service to local memory and then
upload them to the service.
/SetContentType:[content-
type] Specifies the content type of the destination
blobs.
AzCopy by default uses
"application/octet-stream" as the content type
for the destination blobs. If option
/SetContentType is specified without a value
for "content-type", then AzCopy will set each
blob's content type according to its file
extension. To set same content type for all the
blobs, you must explicitly specify a value for
"content-type".
##
## Samples ##
##
#1 - Download a blob from Blob storage to the file system, for example,
download 'https://myaccount.blob.core.windows.net/mycontainer/abc.txt'
to 'D:\test\'
AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer/
/Dest:D:\test\ /SourceKey:key /Pattern:"abc.txt"
#2 - Copy a blob within a storage count
AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer1/
/Dest:https://myaccount.blob.core.windows.net/mycontainer2/
/SourceKey:key /DestKey:key /Pattern:"abc.txt"
#3 - Upload files and subfolders in a directory to a container, recursively
AzCopy /Source:D:\test\
/Dest:https://myaccount.blob.core.windows.net/mycontainer/
/DestKey:key /S
#4 - Upload files matching the specified file pattern to a container,
recursively.
AzCopy /Source:D:\test\
/Dest:https://myaccount.blob.core.windows.net/mycontainer/ /DestKey:key
/Pattern:*ab* /S
#5 - Download blobs with the specified prefix to the file system, recursively
AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer/
/Dest:D:\test\ /SourceKey:key /Pattern:"a" /S
------------------------------------------------------------------------------
Learn more about AzCopy at
http://aka.ms/azcopy.
------------------------------------------------------------------------------
- New version: AzCopy v10.11.0 (available on the default PATH)
C:\ProgramData\Chocolatey\bin\azcopy.exe --help
AzCopy 10.11.0
Project URL: github.com/Azure/azure-storage-azcopy
AzCopy is a command line tool that moves data into and out of Azure Storage.
To report issues or to learn more about the tool, go to github.com/Azure/
The general format of the commands is: 'azcopy [command] [arguments] --[flag-name]=
Usage:
azcopy [command]
Available Commands:
bench Performs a performance benchmark
copy Copies source data to a destination location
doc Generates documentation for the tool in Markdown format
env Shows the environment variables that you can use to configure the behavior
help Help about any command
jobs Sub-commands related to managing jobs
list List the entities in a given resource
login Log in to Azure Active Directory (AD) to access Azure Storage resources.
logout Log out to terminate access to Azure Storage resources.
make Create a container or file share.
remove Delete blobs or files from an Azure storage account
sync Replicate source to the destination location
Flags:
--cap-mbps float Caps the transfer rate, in megabits per ughput might vary slightly from the cap. If this option is set to zero, or it is omitted,
-h, --help help for azcopy
--output-type string Format of the command's output. The choices ult value is 'text'. (default "text")
--trusted-microsoft-suffixes string Specifies additional domain suffixes where tokens may be sent. The default is '*.core.windows.net;*.core.chinacloudapi.cn;*.core.api.net;*.storage.azure.net'. Any listed here are added to the default. For security, you re domains here. Separate multiple entries with semi-colons.
-v, --version version for azcopy
Use "azcopy [command] --help" for more information about a command.
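Afterword
I did notice that AzCopy v10 runs far more efficiently than before, is more capable, and has a very convenient sync feature. On top of that, the CLI commands are much simpler to use than they used to be, which earns it big points.
The old AzCopy could use a storage account's access key directly to access files in a blob, with no extra authorization needed. The new version no longer works that way: the only way to get access without extra authorization is through a SAS (shared access signature). Every other access method, such as user identity, managed identity (either system-wide managed identity or user-assigned managed identity), and service principal (with either a client secret or a certificate), requires an explicit role assignment before it can access data; even Owner permission is not enough. This is probably my only real complaint! The barrier is too high for Azure newcomers, but I also think this is the more secure design.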
Related links
- AzCopy v10
- AzCopy v8
- AzCopy v7