Rclone


What is Rclone?

Rclone, sometimes called "rsync for the cloud", is a tool written in Go for transferring data between a computer and cloud-hosted storage. Rclone can connect to many cloud storage providers, including Amazon Cloud Drive, Amazon S3, Backblaze B2, Dropbox, Google Cloud Storage, Google Drive, OpenStack Swift, and Microsoft OneDrive. On the CRC front ends, Rclone can be used to move data between your Google Drive (or other cloud-hosted storage) and your AFS or /scratch space.


Setup and Configuration

Rclone is not installed system-wide at the CRC, so you must install it into your own AFS space in order to use it.

  • First, download the Linux AMD64 precompiled binary from Rclone's download page. You can download it either on your local machine or directly on a CRC front end into your AFS space.
    • If you downloaded it to your local machine, move the file to a CRC front end. On Linux use the scp command; on Mac/Windows use Cyberduck or WinSCP.
    • If you are on a front end, use the wget command: wget https://downloads.rclone.org/rclone-vX.Y-linux-amd64.zip where X.Y is the current version number.
    • Once the file is on a front end, unzip it with the unzip command: unzip rclone-[version #]-linux-amd64.zip This will extract the archive into a new directory inside your current directory.
  • Move into the newly created directory. There should be an executable, a few README files, and a manual page named rclone.1. To view the man page for rclone, type man ./rclone.1 You must be in the rclone directory for this to work.
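
The download and unzip steps above can be sketched as a short shell snippet. This is only a sketch: it assumes the archive naming pattern given above, the version string v1.42 is borrowed from the example path later on this page and should be replaced with the current release, and the function names rclone_zip_name and rclone_fetch are hypothetical helpers, not part of Rclone.

```shell
#!/bin/sh
# Hypothetical helper: build the archive name for a given version string.
rclone_zip_name() {
    printf 'rclone-%s-linux-amd64.zip\n' "$1"
}

# Hypothetical helper: download and unpack the archive in the current directory.
# Assumes the URL pattern shown above; check Rclone's download page if it fails.
rclone_fetch() {
    archive=$(rclone_zip_name "$1")
    wget "https://downloads.rclone.org/$archive" && unzip "$archive"
}

# Show the archive name that would be requested (no download happens here).
rclone_zip_name v1.42
```

To actually download and unpack, you would call rclone_fetch with the version you want, for example rclone_fetch v1.42, from the directory in your AFS space where Rclone should live.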


Configuration

Now that Rclone is in your AFS space, it must be configured to work with your cloud storage. For the following instructions, Rclone will be configured to work with Google Drive. The other forms of cloud storage may vary slightly in terms of configuration, but the general process is still the same.

  • To start the configuration of Rclone, type:
    ./rclone config

    This will bring up a prompt which provides a few different options.
  • As this is the first time using Rclone, type N into the prompt to create a new 'remote'. In Rclone, a remote is the term for remote storage, so in this case, the remote will be a Google Drive object.
[12:56 @crcfe01 ~/upload_testing]
$ rclone config
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> 
  • After typing N and pressing Enter, the prompt will ask you for a name. A good name to enter is gdrv, so you can easily remember that this is your Google Drive. Once you have entered the name and pressed Enter, the prompt will ask which type of cloud storage remote is being created. In the list shown here, Google Drive is number 11, so type 11 and press Enter. The numbering can differ between Rclone versions, so check the list you are shown.
name> gdrv
Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Alias for a existing remote
   \ "alias"
 2 / Amazon Drive
   \ "amazon cloud drive"
 3 / Amazon S3 Compliant Storage Providers (AWS, Ceph, Dreamhost, IBM COS, Minio)
   \ "s3"
 4 / Backblaze B2
   \ "b2"
 5 / Box
   \ "box"
 6 / Cache a remote
   \ "cache"
 7 / Dropbox
   \ "dropbox"
 8 / Encrypt/Decrypt a remote
   \ "crypt"
 9 / FTP Connection
   \ "ftp"
10 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
11 / Google Drive      <---
   \ "drive"
  • Next, the prompt will ask you for a Google Applications Id and Secret. These fields can be left blank: press Enter for each without typing anything.
  • The prompt then asks for the scope of its use within Google Drive. To upload and download files, type 1.
Choose a number from below, or type in your own value
 1 / Full access all files, excluding Application Data Folder.
   \ "drive"
 2 / Read-only access to file metadata and file contents.
   \ "drive.readonly"
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ "drive.file"
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ "drive.appfolder"
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ "drive.metadata.readonly"
scope> 1
  • You will now be asked for the ID of the root folder. Leave it blank and press Enter. Also leave the Service Account Credentials blank and press Enter.
  • The prompt will then ask if you would like to use auto config. Since the CRC front ends are typically 'headless' through a normal ssh connection, type N.
    • The prompt will then display a web address. Right click on this web address and select Copy Link, then paste the address into a web browser.
    • When asked, be sure to select the Google Account whose Drive you wish to use. Most likely this should be your ND account.
    • Log in to your ND account using your NetID and password, then click Allow to give Rclone access to your Google Drive.
  • Google will then provide a code that needs to be copied and pasted into the Rclone prompt running on the front end. To paste the code into your terminal, press Ctrl, Shift and V at the same time.
  • When asked if you would like to set this as a team drive, type N.
  • Now you will be asked to verify the new remote, press Y, and you can then press Q to exit the configuration. Rclone is now ready to use your Google Drive.

Further information can be found on the Rclone Google Drive page.
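
Once configuration is finished, you may want to confirm that the remote was saved. The sketch below relies on the rclone listremotes command, which prints one configured remote per line with a trailing colon. The RCLONE variable and the has_remote function are hypothetical names introduced here for illustration; set RCLONE to ./rclone if the binary is not on your PATH.

```shell
#!/bin/sh
# Assumed variable: path to the rclone binary (defaults to "rclone" on PATH).
RCLONE=${RCLONE:-rclone}

# Hypothetical helper: succeed if the named remote appears in the output of
# "rclone listremotes", which lists remotes as lines such as "gdrv:".
has_remote() {
    "$RCLONE" listremotes | grep -qxF "$1:"
}

# Example: has_remote gdrv && echo "gdrv is configured"
```

If the check fails for the name you chose, running the config command again will show which remotes exist.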

Using Rclone anywhere

To avoid having to stay within the Rclone directory, you can add Rclone to your $PATH environment variable. To do this, go to your home directory (cd ~) and append a line to your .cshrc file. If you are using Bash, edit your .bashrc file instead. To find which shell you are using, type echo $0. Do not change anything else in this file; add only the one line shown below, exactly as written apart from your own details. Open .cshrc or .bashrc, depending on which shell you are using, with your favorite text editor, go to the very bottom of the file, and add the matching line below, filling in your own information:

Cshell / Tcshell

setenv PATH $PATH\:/afs/crc.nd.edu/user/[your first letter of your netid]/[your netid]/[Path to rclone]/rclone-[version #]-linux-amd64

Bash

export PATH=$PATH:/afs/crc.nd.edu/user/[first letter of your netid]/[your netid]/[Path to rclone]/rclone-[version #]-linux-amd64

For example:

export PATH=$PATH:/afs/crc.nd.edu/user/c/ckankel/rclone/rclone-v1.42-linux-amd64
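
For Bash users, an optional variant of the line above is sketched here. It appends the directory only if it is not already in PATH, which keeps PATH from growing each time .bashrc is re-read. The function name path_append is a hypothetical helper, and the directory is the example path from this page; substitute your own.

```shell
# Hypothetical helper for ~/.bashrc: append a directory to PATH only if it is
# not already listed there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;      # otherwise add it to the end
    esac
    export PATH
}

# Example directory taken from this page; replace it with your own location.
path_append /afs/crc.nd.edu/user/c/ckankel/rclone/rclone-v1.42-linux-amd64
```

The plain one-line export shown above remains the simplest option; this form only adds protection against duplicate entries.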

Rclone Commands

There are a few commands that will be used frequently which will be described here. To see more commands, read the manual page by typing 'man ./rclone.1' while inside the Rclone directory.


Uploading Files from CRC to Google Drive

  • To upload files from the CRC to Google Drive, use the copy command in the following format, assuming your remote is called "gdrv":
rclone copy file1.txt gdrv:
  • This will place the specified file into the top level of your Google Drive account. The copy command treats the destination as a directory, so the file keeps its own name. If the source is a directory, its contents are copied into the destination.
  • If you wish to specify a directory for the files to be put into once inside Google Drive, type the desired directory after the name of your Google Drive remote, which was created during configuration.
rclone copy file.txt gdrv:Desired_Directory

Accelerating Upload

  • Uploading a file or directory with the default settings can be very slow. Rclone accepts several options that can substantially increase upload speed.
rclone --transfers=10 --checkers=10 --drive-chunk-size=16384k copy /foo/bar/source [Name of Remote]:Destination_Directory
    • --transfers sets the number of file transfers that run in parallel. This number can be oversubscribed without harming the results. Upload speeds are best with larger files and with higher numbers of transfers and checkers.
      • If you have 64 files to upload, set --transfers=32
    • --checkers sets the number of file checkers that run in parallel. The checkers compare the files to be uploaded against what is already in your Google Drive, so files that are already there are not uploaded again.
    • --drive-chunk-size sets the size of the chunks in which files are uploaded; these chunks are held in memory (RAM) during the upload. The rclone documentation notes that a larger chunk size gives faster uploads. However, memory usage rises quickly with the chunk size, so use caution. A good value is 16384k, which is 16 MB. The default is 8 MB, so this doubles the chunk size, which can noticeably improve throughput for large files.
  • See the examples below for more information.
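
To get a feel for the memory cost of these options, the sketch below estimates the upload buffer size. It assumes one chunk of --drive-chunk-size is buffered in memory for each parallel transfer, so the estimate is simply transfers multiplied by chunk size. The function name upload_buffer_mib is a hypothetical helper, and the result is a rough figure, not a measurement.

```shell
# Hypothetical helper: rough upload buffer memory in MiB, assuming one chunk
# (given in KiB, as in --drive-chunk-size=16384k) is buffered per transfer.
upload_buffer_mib() {
    transfers=$1
    chunk_kib=$2
    echo $(( transfers * chunk_kib / 1024 ))
}

upload_buffer_mib 10 16384   # flags from the command above: prints 160
upload_buffer_mib 32 16384   # 64 files with --transfers=32: prints 512
```

If the estimate is larger than the memory you can reasonably use on a shared front end, lowering either value brings it down proportionally.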


Downloading Files from Google Drive

  • Downloading data from Google Drive (or any other cloud service) works the same way as uploading, with the source and the destination swapped. To download data, use the following command:
rclone copy [Name of Remote]:Source_Directory foo/bar/Local_Destination_Directory
  • Unlike uploading, downloading does not need extra flags for performance; they give no significant improvement over the default download speed.
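
After a large download, it can be worth confirming that the local copy matches the remote. The sketch below pairs the copy command with rclone check, which compares the files in the source and destination and reports differences. The RCLONE variable and the download_and_check function are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Assumed variable: path to the rclone binary (defaults to "rclone" on PATH).
RCLONE=${RCLONE:-rclone}

# Hypothetical helper: copy a remote directory to a local one, then compare
# the two. The check only runs if the copy finished without error.
download_and_check() {
    "$RCLONE" copy "$1" "$2" && "$RCLONE" check "$1" "$2"
}

# Example: download_and_check gdrv:Upload_testing ~/upload_testing
```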

Other Useful Rclone Commands

  • It may be useful to view the contents of your Google Drive before uploading or downloading files. To do so without having to use a browser, use the following commands:
rclone lsd [Name of Remote]:(Optional Directory Name)
    • This will show the directory listing for your Google Drive's 'home' directory. To view a specific directory listing you can append the directory name after the colon ( [Name of Remote]:Directory )
    • Notice the colon after the Remote, this is necessary for the proper execution of the command, whether or not you are specifying a specific directory.
    • This is helpful to quickly see the names of your directories before using the next command.
rclone ls [Name of Remote]:(Optional Directory)
    • This command will list the entire contents of the Google Drive recursively. NOTE: This may take a while to complete, as it will search every directory and list every file in your Google Drive. It may be faster to first find the directory names using rclone lsd, and then use a directory name to view only its contents.
    • rclone ls [Name of Remote]:My_Uploads -- This will show all contents of the My_Uploads directory and sub-directories.
rclone [command] --dry-run [Name of Remote]:
    • This flag will allow you to test the command without having any consequences if there is a mistake. It is a good testing tool to view what would happen, without it actually happening.
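
One way to make dry runs a habit is a small wrapper that always adds the flag. The sketch below places --dry-run before the command name, in the same position as the flags in the upload command earlier on this page. The RCLONE variable and the rclone_preview function are hypothetical names, not part of Rclone.

```shell
#!/bin/sh
# Assumed variable: path to the rclone binary (defaults to "rclone" on PATH).
RCLONE=${RCLONE:-rclone}

# Hypothetical helper: run any rclone command with --dry-run added, so it
# reports what would happen without transferring or changing anything.
rclone_preview() {
    "$RCLONE" --dry-run "$@"
}

# Example: rclone_preview copy ~/upload_testing gdrv:Upload_testing
```

Once the preview output looks right, the same arguments can be passed to rclone directly to perform the real transfer.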


Examples

  • The following are examples of using some of the Rclone commands:
  • An example of downloading files from Google Drive to the CRC:
[12:29 @crcfe01 ~/upload_testing]
$ rclone copy gdrv:Upload_testing . --verbose
2018/08/07 12:29:21 INFO  : Local file system at /afs/crc.nd.edu/user/user/upload_testing: Waiting for checks to finish
2018/08/07 12:29:21 INFO  : Local file system at /afs/crc.nd.edu/user/user/upload_testing: Waiting for transfers to finish
2018/08/07 12:30:17 INFO  : 
Transferred:   3.527 GBytes (57.723 MBytes/s)
Errors:                 0
Checks:                 0
Transferred:            0
Elapsed time:      1m2.5s
Transferring:
                                    test.img: 84% /3.000G, 48.737M/s, 9s
                                    debian.img: 100% /1.000G, 3.790M/s, 0s 


  • An example of uploading files from the CRC to Google Drive, using the optimized approach:
[12:37 @crcfe01 ~/upload_testing]
$ rclone --transfers=5 --checkers=5 --drive-chunk-size=16384k --verbose copy ~/upload_testing/ gdrv:Upload_testing
2018/08/07 12:39:30 INFO  : Google drive root 'Upload_testing': Waiting for checks to finish
2018/08/07 12:39:30 INFO  : Google drive root 'Upload_testing': Waiting for transfers to finish
2018/08/07 12:40:07 INFO  : debian.img: Copied (new)
2018/08/07 12:40:30 INFO  : 
Transferred:   1.750 GBytes (29.694 MBytes/s)
Errors:                 0
Checks:                 0
Transferred:            1
Elapsed time:      1m0.3s
Transferring:
                                    test.img: 24% /3.000G, 7.519M/s, 5m6s
                       [ output clipped ]



  • An example of viewing the contents of a directory within Google Drive while on a front end machine:
[12:42 @crcfe01 ~/upload_testing]
$ rclone ls gdrv:Upload_testing
1073741856 debian.img
3221225505 test.img