ramp v1.0.0


System & Software Requirements

System Requirements

Basic Requirements

  • We recommend configuring ramp on a server running a recent version of Linux.
    • All Linux operating systems are very similar, but the names of standard tools sometimes vary (for example, to install software on Ubuntu, use ‘apt install’; to install software on Red Hat, use ‘yum install’).
  • You will need a computer with access to at least one NVIDIA CUDA-enabled GPU. Cloud-based compute options are also widely available, but ramp has been designed to work well with a local, relatively small machine learning server.
    • ‘Machine learning laptops’ are not recommended for ramp – two of us have tried them. The GPUs are too small, the computers overheat and quit before training is finished, and no one is happy.
    • We recommend running ramp on a server with at least two GPUs if possible.
  • You will need to have sufficient permissions to install software on the machine (which, in the Linux world, means having ‘sudo’ or administrative permissions – more about that shortly).
  • We will support ramp installations and testing on other systems as required by our users. However, Linux is the de facto data science platform on which tools such as Tensorflow are developed. While platforms such as Windows can run Tensorflow, community support (in the form of advice and bug fixes) is simply more easily available on Linux.
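As a quick sanity check, you can see whether an NVIDIA GPU and its driver are visible from the command line. The nvidia-smi utility ships with the NVIDIA driver (driver installation is covered later in this document), so this sketch simply reports whichever situation it finds:

```shell
# Check whether an NVIDIA GPU and its driver are visible.
# nvidia-smi is installed along with the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
else
    echo "nvidia-smi not found: NVIDIA driver is not installed (or there is no GPU)"
fi
```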

What We Used

The ramp model was developed and tested on a Lambda Labs server with the following specifications:

Getting Comfortable with the Linux Command Line

Note: If you are familiar with the Linux command line, feel free to skip this section.

Before starting to use ramp, users will benefit from getting some practice working with the Linux operating system. For many people, this will be their first experience with Linux, which is different from the Windows or MacOS desktop. However, the learning curve is smaller than it appears; you can do most or all of what you need to do with only a handful of commands.

Things that take some getting used to include:

  • Doing most of what you need to do on the terminal command line
    • Installing software from the command line on the Linux platform (instructions here for Ubuntu)
    • File and folder management on the command line — creating, moving, copying, deleting, searching, etc.
    • Running command line programs (such as scripts and services) that don’t have a GUI
  • Everything on a Linux system is a file, including folders and software programs
    • Understanding read, write, and execute permissions on files
    • Understanding how individual and group ownership of files and folders works, and how it interacts with permissions
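As a taste of that handful of commands, the following self-contained sketch exercises the basic file-management and permission operations listed above, in a throwaway directory:

```shell
# Basic file management and permissions, in a scratch directory.
workdir=$(mktemp -d)            # create a throwaway directory
cd "$workdir"

mkdir notes                     # create a folder
echo "hello ramp" > notes/a.txt # create a file
cp notes/a.txt notes/b.txt      # copy it
mv notes/b.txt notes/c.txt      # rename (move) it
ls -l notes                     # list files with permissions and ownership

chmod u+x notes/a.txt           # add execute permission for the owner
chmod go-r notes/a.txt          # remove read permission for group and others
ls -l notes/a.txt               # permissions column now reads -rwx------

grep -r "ramp" notes            # search file contents
cd / && rm -r "$workdir"        # clean up
```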

If you don’t have access to a Linux computer:

If you don’t (yet) have access to a Linux computer, there are ways to get familiar with the Linux operating system on other operating systems.

  • If you have a Windows machine, it may ship with Windows Subsystem for Linux (WSL). Try typing ‘WSL’ in the search bar to see if a Linux command line application pops up. If it doesn’t, you can install it from here.
  • If this doesn’t work on your Windows system (it didn’t work for me!), I recommend installing VirtualBox on your PC, and running an Ubuntu virtual machine. I ran Linux on my Windows PC for years using VirtualBox. One caveat is that even if your Windows PC has an NVIDIA-enabled GPU, the Ubuntu virtual machine won’t be able to access it, so it is not a solution for running the full ramp project. Instructions for the VirtualBox setup are here.
  • If you have a Mac machine, then you already have a Unix terminal ready to practice and learn on! MacOS is based on a Unix operating system (Linux is another Unix-like system). To get started, open up your Mac’s terminal app and try a tutorial such as this one.

Basic Software & Tools


Text Editor

You will want a text editor, preferably one that feels comfortable and familiar to you. I do not recommend taking up a ‘Unix-classic’ text editor such as emacs or vim, which you will probably hate. Instead, I suggest that you install, and get used to using, Visual Studio Code, Microsoft’s all-purpose text editor. It is available on all platforms.

  • VSCode supports editing plain text files, markdown files, shell scripts, python code files, json files, and many other files used in ramp processing.
  • You can run Jupyter notebooks within it (Jupyter notebooks are the principal user interface to ramp tools). You can connect to the ramp Docker container with VSCode and run everything you need for ramp from inside it. You can develop and debug new or existing code with it. VSCode runs on Windows and Mac as well as Linux. It is entirely free, and I’ve been delighted with my experience using it on Linux and Windows.

Jupyter Notebooks

The ramp project uses Jupyter notebooks for most of its tasks. A Jupyter notebook is an extremely pleasant way to interact with python and document your work; it runs in your browser and intersperses blocks of documentation with blocks of Python code. It is the de facto standard for most data science work, and it comes pre-installed in the ramp Docker image. To get a taste of what it is like to work in a Jupyter notebook, have a look at this interactive Tensorflow tutorial for image segmentation (brought to you by Google Colab).

Web Browser

You will need to use a browser! Firefox is a good choice on Linux and comes pre-installed on most Linux desktops.

Note: Running Jupyter notebooks in a browser does not require an internet connection; the notebook server runs locally, and the browser connects to it on localhost.

Quantum GIS

You will need to install and run Quantum GIS for visualizing and editing geospatial rasters and vectors. Although QGIS and the ESRI Arc suite of tools have much shared functionality, and many people are more familiar with ESRI tools, the ramp project uses some tools that have been developed as extensions to Quantum GIS, which is free and open source. There are QGIS installations for every platform, so you needn’t wait for a Linux environment to start getting familiar with it.

Anaconda (Optional)

Although Anaconda (a virtual environment/package management system for Python) is not an essential tool for running ramp, it is familiar to many users who run it on other platforms, and it can be extremely useful. I have used it to create a partial ramp environment in which I can run some of the ramp tools (such as the QGIS plugin tools) without needing the full ramp docker container.
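As a sketch of what such a partial environment might look like (the environment name and package list below are illustrative assumptions, not the official ramp dependency specification):

```shell
# Hypothetical partial environment -- the name "ramp-partial" and the
# package list are illustrative, not the official ramp dependency list.
conda create -n ramp-partial python=3.8 -y
conda activate ramp-partial
conda install -c conda-forge gdal rasterio geopandas jupyter -y
```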

Setting up the Ramp Environment

The Codebase - Overview

Things you may want to do with this codebase include:

  1. Running the scripts, including production scripts and data preparation tools.
  2. Working with the Jupyter notebooks.
  3. Working with the extensive ramp open source labeled dataset.
  4. Training the models with your own data.
  5. Modifying the underlying code for your own purposes.


The ramp codebase uses many Python libraries that are in standard use, some specialized libraries for image and geospatial processing, and the Tensorflow library for training and running deep neural networks. It can be very difficult to create a computational environment in which all these libraries are installed and play nicely with each other.

For this reason, we are also providing instructions to build a Docker image (based on a GPU-enabled Tensorflow 2.8 Docker image with Jupyter notebook) that includes all of ramp’s libraries and dependencies. All five of the above tasks can be performed from a Docker container based on this image.

For the last three tasks, we recommend using VSCode, Microsoft’s open-source code editor. It easily attaches to the running ramp Docker container and can run Jupyter notebooks, including the ones used to train the ramp models.

Github Repo Project Structure

Note that the ramp project currently contains a fork of the Solaris project, which has not been under active development. Some bugfixes and modifications are in this fork, and some more extensive modifications of Solaris code have been moved into the ramp library.

├── colab
│   ├── README.md
│   ├── jupyter_lab_on_colab.ipynb
│   └── train_ramp_model_on_colab.ipynb
├── data
├── docker
│   └── pipped-requirements.txt
├── Dockerfile
├── Dockerfile.dev
├── docs
│   ├── How_I_set_up_my_training_data.md
│   ├── how_to_debug_ramp_in_vscode.md
│   ├── How_to_run_production_and_evaluation.md
│   ├── images
│   ├── list-of-ramp-scripts.md
│   └── using_the_ramp_training_configuration_file.md
├── experiments
│   ├── dhaka_nw
│   ├── ghana
│   ├── gimmosss
│   └── himmosss
├── notebooks
│   ├── augmentation_demo.ipynb
│   ├── Data_generator_demo.ipynb
│   ├── Duplicate_image_check.ipynb
│   ├── images
│   ├── Independent_labelers_comparison_test.ipynb
│   ├── sample-data
│   ├── Train_ramp_model.ipynb
│   ├── Truncated_signed_distance_transform_example.ipynb
│   └── View_predictions.ipynb
├── ramp
│   ├── __init__.py
│   ├── data_mgmt
│   │   ├── chip_label_pairs.py
│   │   ├── clr_callback.py
│   │   ├── data_generator.py
│   │   ├── display_data.py
│   │   └── __init__.py
│   ├── models
│   │   ├── effunet_1.py
│   │   ├── __init__.py
│   │   └── model_1_chollet_unet.py
│   ├── training
│   │   ├── augmentation_constructors.py
│   │   ├── callback_constructors.py
│   │   ├── __init__.py
│   │   ├── loss_constructors.py
│   │   ├── metric_constructors.py
│   │   ├── model_constructors.py
│   │   └── optimizer_constructors.py
│   └── utils
│       ├── chip_utils.py
│       ├── eval_utils.py
│       ├── file_utils.py
│       ├── geo_utils.py
│       ├── imgproc_utils.py
│       ├── img_utils.py
│       ├── __init__.py
│       ├── label_utils.py
│       ├── log_fields.py
│       ├── lrfinder.py
│       ├── mask_to_vec_utils.py
│       ├── misc_ramp_utils.py
│       ├── model_utils.py
│       ├── multimask_utils.py
│       ├── ramp_exceptions.py
│       └── sdt_mask_utils.py
├── README.md
├── scripts
│   ├── add_area_to_labels.py
│   ├── binary_masks_from_polygons.py
│   ├── calculate_accuracy_iou.py
│   ├── find_learningrate.py
│   ├── get_chip_statistics.py
│   ├── get_dataset_loss_statistics.py
│   ├── get_labels_from_masks.py
│   ├── get_model_predictions.py
│   ├── make_train_val_split_lists.py
│   ├── move_chips_from_csv.py
│   ├── multi_masks_from_polygons.py
│   ├── polygonize_masks.py
│   ├── polygonize_multimasks.py
│   ├── remove_slivers.py
│   ├── sdt_masks_from_polygons.py
│   ├── tile_datasets.py
│   └── train_ramp.py
├── setup.py
├── shell-scripts
│   ├── create_aggregate_trainingset.bash
│   ├── create_masks_for_datasets.bash
│   ├── create_test_split_for_datasets.bash
│   ├── create_trainval_split_for_datasets.bash
│   ├── get_iou_metrics_for_datasets.bash
│   ├── get_iou_metrics_for_models.bash
│   ├── nvidia-check.sh
│   ├── run_production_on_datasets.bash
│   ├── run_production_on_single_dataset.bash
│   ├── write_predicted_masks_for_datasets.bash
│   └── write_truth_labels_for_datasets.bash
└── solaris

Environment Setup - Overview

How to get the ramp environment running on Google Colab

Instructions for getting started with ramp on Colab can be found here, as well as in the colab/README.md file in the codebase.

Note that things will run very slowly and painfully in the free tier of Google Colab. If you will be running often on Google Colab, I recommend upgrading to Colab Pro. If you will be using Google Colab as your compute platform for running large ramp training jobs, I recommend considering Colab Pro+.

How to get the ramp environment running on a local server running Ubuntu 20.04 with GPU support

  1. You will need to run Ubuntu 20.04 Linux on a machine with at least one CUDA-enabled NVIDIA GPU. You will absolutely need to have sudo (root user) powers on it.
  2. Install the currently recommended NVIDIA driver: instructions here. (Happily, you do not need to install the CUDA libraries yourself, as you would if you weren’t using Docker.)
  3. Install docker CE, and the NVIDIA Container Toolkit (instructions here).
  4. Create the ‘docker’ group and add yourself to it, so you can run docker without using sudo (instructions here).
  5. Using docker, build the ramp base image, rampbase, as instructed below:
					# run from the ramp-code directory
					docker build --tag rampbase .

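The docker group setup in step 4 typically looks like the following (these commands mirror Docker’s official post-installation steps for Linux; log out and back in afterwards so the new group membership takes effect):

```shell
# Create the docker group (it may already exist) and add your user to it.
sudo groupadd docker
sudo usermod -aG docker $USER
# After logging out and back in, verify that docker runs without sudo:
docker run hello-world
```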

You will have to rebuild rampbase after any change you make to the ramp module codebase (under ramp-code/ramp), so that the change is installed in the container. Changes to scripts or notebooks, by contrast, do not require a rebuild.

To build the image, run docker build from the ramp-code directory, as shown above; Docker reads its instructions from the Dockerfile in that directory and creates a Docker image named rampbase on your local machine.

  6. Start a docker container based on rampbase, and run a bash shell in it, as follows:
					docker run -it --rm --gpus=all -v /home/carolyn/ramp-staging:/tf/ramp-staging -v /home/carolyn/ramp-data:/tf/ramp-data  -p 8888:8888 rampbase bash


If you wish to run a script, do so in this bash shell, using the default python interpreter, which has all the components of ramp installed.

Note that there is a Jupyter notebook server installed in a ramp container, and the -p 8888:8888 portion of the ‘docker run’ command enables port forwarding so that you can run Jupyter notebook in a browser on your host machine.

If you wish to run a Jupyter notebook in your browser or in Jupyterlab, start your docker container using the same command without ‘bash’ at the end, as shown below. You will be given a link to the running Jupyter notebook server in the command output.

					docker run -it --rm --gpus=all -v /home/carolyn/ramp-staging:/tf/ramp-staging -v /home/carolyn/ramp-data:/tf/ramp-data  -p 8888:8888 rampbase


If you wish to run a bash shell in the Jupyter notebook container, so that you can run scripts as well as the Jupyter notebook, you can connect a bash shell to the same container using the following commands.

First, run:

					docker ps

This will give a listing of all the docker containers running on your machine, similar to the output of the Unix ps command:

					CONTAINER ID   IMAGE     COMMAND   CREATED       STATUS       PORTS                                       NAMES
					209755699cea   rampdev   "bash"    3 hours ago   Up 3 hours   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp   condescending_cerf

You can use either the container id or the container name to connect to it with a bash shell, using the following command:

					docker exec -it condescending_cerf bash

This will give you a bash shell in the same container that is running your Jupyter notebook.

Instructions on how to debug ramp code and run Jupyter notebooks in VSCode on your desktop are given in the Debugging Ramp section.

A note on running ramp as yourself, vs. as the root user

Note that by default, Docker runs containers as the root user. If you want to use VSCode to attach to the container, you will need to run the container as the root user, because VSCode needs root permission to install its server in the container.

This means that any files you create during the Docker session will have root user ownership. This is undesirable from a security standpoint, and it is a hassle when you later need to change or delete the files you created on the local machine. (To fix the ownership of such files, run the following Linux command: find . -user root | xargs sudo chown your-username.)

If you are just going to interact with the bash shell (say, to run production code or a script), I recommend running the container as yourself rather than as the root user. To do that, add the --user 1000:1000 switch as shown below.

					# run from anywhere as yourself (as the non-root user)
					docker run -it --rm --gpus=all --user 1000:1000 -v /home/carolyn/ramp-staging:/tf/ramp-staging -v /home/carolyn/ramp-data:/tf/ramp-data -p 8888:8888 rampbase
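The 1000:1000 in the --user switch is a numeric uid:gid pair; 1000 is typically the first user account created on an Ubuntu system, but you can confirm your own values with:

```shell
# Print the current account's numeric user and group ids,
# formatted as the argument to docker's --user switch.
echo "--user $(id -u):$(id -g)"
```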