Projects
Table of Contents
- Extreme Weather Prediction
- High Spatio-Temporal CyGNSS Soil Moisture Using Machine Learning
- Interpretable Machine Learning for Satellite Based Remote Sensing
- YouTop200: A Most-Watched Video Object Segmentation Dataset
- Semi-Supervised Learning Based 3D Instance Segmentation of Scutoid
Extreme Weather Prediction
Abstract
In this study, we use the analogue method and Convolutional Neural Networks (CNNs) to assess the potential predictability of extreme precipitation occurrence based on Large-Scale Meteorological Patterns (LSMPs) for the winter (DJF) of Pacific Coast California (PCCA) and the summer (JJA) of the Midwestern United States (MWST). We evaluate LSMPs constructed from a large set of variables at multiple atmospheric levels and quantify the prediction skill with a variety of complementary performance measures. Our results suggest that LSMPs provide useful predictability of daily extreme precipitation occurrence and its interannual variability over both regions. In the 14-year (2006-2019) independent forecast, Gilbert Skill Scores (GSS) in PCCA range from 0.06 to 0.32 across 24 CNN schemes and from 0.16 to 0.26 across 4 analogue schemes, compared with 0.1 to 0.24 and 0.1 to 0.14, respectively, in MWST. Overall, the CNN is more powerful than the analogue method at extracting the features associated with extreme precipitation from the LSMPs: several single-variate CNN schemes in PCCA, and more than half of the CNN schemes in MWST, achieve more skillful prediction than the best multi-variate analogue scheme. Nevertheless, both methods show that the Integrated Vapor Transport (IVT, or its zonal and meridional components) enables higher skill than other atmospheric variables over both regions. Warm-season extreme precipitation in MWST presents a forecast challenge, with overall lower prediction skill than in PCCA, which is attributed to the weak synoptic-scale forcing in summer.
Publication
Xiang Gao and Shray Mathur. “Predictability of U.S. Regional Extreme Precipitation Occurrence Based on Large-Scale Meteorological Patterns (LSMPs).” Journal of Climate, 2021 [Paper]
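For readers unfamiliar with the Gilbert Skill Score quoted in the abstract above, here is a minimal sketch of its standard contingency-table definition (also known as the Equitable Threat Score). The function name and the toy event series are illustrative assumptions, not code or data from the paper.

```python
def gilbert_skill_score(observed, forecast):
    """Gilbert Skill Score for two equal-length sequences of event flags.

    Each element is truthy when an extreme-precipitation event was
    observed (or forecast) on that day. Illustrative helper only.
    """
    if len(observed) != len(forecast) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    hits = sum(1 for o, f in zip(observed, forecast) if o and f)
    misses = sum(1 for o, f in zip(observed, forecast) if o and not f)
    false_alarms = sum(1 for o, f in zip(observed, forecast) if not o and f)
    total = len(observed)

    # Number of hits expected from a random forecast with the same event counts.
    hits_random = (hits + misses) * (hits + false_alarms) / total

    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return 0.0  # score is undefined here; 0.0 (no skill) is our convention
    return (hits - hits_random) / denominator


# Toy 10-day series (made-up values): 2 hits, 1 miss, 1 false alarm.
observed = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
forecast = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
score = gilbert_skill_score(observed, forecast)
```

A score of 1 indicates a perfect forecast and 0 indicates no skill beyond chance, which is why values such as 0.06 and 0.32 in the abstract can be compared directly across schemes.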
High Spatio-Temporal CyGNSS Soil Moisture Using Machine Learning
Abstract
This dissertation presents a machine learning based soil moisture retrieval method for NASA’s Cyclone Global Navigation Satellite System (CYGNSS) mission. The CYGNSS observations are compared to the Soil Moisture Active Passive (SMAP), in-situ Texas Soil Observation Network (TxSON), and NASA’s CYGNSS L3 soil moisture (SM) measurements for the full year of 2019. An initial grid-wise sensitivity analysis of CYGNSS reflectivity (Pr,eff) to SM is conducted at a 9 × 9 km² grid resolution over the 36 × 36 km² TxSON region to assess the spatio-temporal relationships between Pr,eff and SM. Variability among grid cells and seasonal shifts in correlations motivated the inclusion of land physical parameters and CYGNSS observation geometry in the analysis. Specifically, we include the Specular Point (SP) incidence angle (θ), elevation, clay fraction, Normalized Difference Vegetation Index (NDVI), Depth to Restrictive Layer (DepRes), and surface roughness. The individual effects of these variables on Pr,eff are assessed through correlation and regression analysis. Finally, an Artificial Neural Network (ANN) model is trained on different combinations of input features to obtain SM estimates at 9 × 9 km² and 3 × 3 km² grid resolutions. The model structure is tuned for each combination, and 5-fold cross-validation is used to train the models. SM predictions with a root mean squared error (RMSE) of 0.0409 (0.0497) cm³/cm³ and a Pearson correlation coefficient (R) of 0.7024 (0.6794) are reported at 9 × 9 (3 × 3) km² grid resolution for the months of January, April and July, and with an RMSE of 0.034 and R of 0.763 for the full year of 2019.
[Thesis]
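The two evaluation measures reported in the abstract above, RMSE and the Pearson correlation coefficient, can be sketched as follows. This is a plain-Python illustration of the standard definitions; the function names and the sample values are assumptions and do not come from the thesis.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Made-up soil moisture values in cm3/cm3, for illustration only.
predicted_sm = [0.10, 0.20, 0.30]
reference_sm = [0.20, 0.40, 0.60]
error = rmse(predicted_sm, reference_sm)
correlation = pearson_r(predicted_sm, reference_sm)
```

The toy values show why both measures are reported together: the predictions are perfectly correlated with the reference yet still carry a sizeable RMSE, because correlation ignores bias and scale.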
Interpretable Machine Learning for Satellite Based Remote Sensing
Abstract
The use of machine learning methods for data analytics in satellite based remote sensing has grown tremendously in recent years. However, the increase in performance due to the use of increasingly sophisticated machine learning models has negatively impacted the transparency of the data pipeline, as the machine learning models often act as black-box predictors. In this paper, we employ methods from the field of interpretable machine learning and explainable AI to address this issue, and present a transparent data pipeline, where machine learning classifications are presented together with explanations of the underlying reasons for the individual classifications. We demonstrate this on crop type classification over 4 seasons from 2017-2020, based on field-level zonal statistics from Sentinel-1 tiles. The data analytics pipeline emphasizes visualizing and investigating the temporal evolution of the individual crop types, which is then used in conjunction with a variety of machine learning models to provide crop type classification strongly coupled to the temporal evolution and the underlying agronomical understanding. This couples the machine learning models directly to the temporal dynamics of the individual crop types. We focus on linear and non-linear models, from a logistic regression classifier to a neural network, and differentiate between inherently interpretable models, such as logistic regression classifiers, and black-box models with no inherent interpretability, such as neural networks. Throughout the paper, we provide insights into potential pitfalls regarding interpretability, such as issues with correlated input features and subjective interpretations of the classification explanations.
Finally, we investigate the current state of the art in interpretable non-linear machine learning models, such as explainable boosting machines, and provide future perspectives with regard to increasingly complex machine learning models and significantly larger data repositories, combined with an increased use of machine learning models by regulatory agencies in the EU, and the interpretability needs that such uses entail.
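To illustrate what "inherently interpretable" means for the logistic regression classifier mentioned in the abstract above, the sketch below decomposes a single prediction into additive per-feature contributions to the log-odds. The feature names, weights, and values are hypothetical placeholders, not quantities from the paper.

```python
import math


def explain_logistic(weights, bias, features):
    """Return (probability, contributions) for one binary logistic prediction.

    Each contribution is weight * feature value, i.e. that feature's
    additive share of the log-odds. Illustrative helper only.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    log_odds = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


# Hypothetical model for "field is crop type A"; all numbers are made up.
weights = {"vv_mean_may": 2.0, "vh_mean_july": -1.0}
bias = 0.5
field = {"vv_mean_may": 1.0, "vh_mean_july": 3.0}

probability, contributions = explain_logistic(weights, bias, field)
```

Here the explanation can be read directly off the model: one feature pushes the log-odds up by 2.0 and the other pulls it down by 3.0. As the abstract cautions, such attributions become harder to trust when input features are correlated, since the weights are then no longer uniquely attributable to individual features.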
YouTop200: A Most-Watched Video Object Segmentation Dataset
Abstract
We collected and annotated a new dataset, YouTop200, of 200 most-watched YouTube videos in their full length, spanning ten genres. Our effort exceeds previous attempts in dataset size, scene and object variability, and narrative structure complexity. YouTop200 has 431K annotated instance masks, double the number in previous datasets. We build a semi-automatic system to efficiently annotate high-quality masks for main characters (human, animal, or animated) whose position, pose, and appearance can change significantly across edited shots. Furthermore, we design a simple long-term cross-shot tracking module (LCT) to enhance existing methods and provide stronger baselines. Finally, we show the limitations of current methods with an analysis of results on our YouTop200 dataset to motivate future research.
Semi-Supervised Learning Based 3D Instance Segmentation of Scutoid
Cells in bent epithelia can undergo intercalations along the apico-basal axis. This phenomenon forces cells to have different neighbours in their basal and apical surfaces. As a consequence, epithelial cells adopt a novel shape termed a “scutoid”. The task is to use computer vision models to automate 3D instance segmentation of electron microscopy (EM) scutoid volumes. Machine learning applications in biomedical imaging are frequently limited by the lack of quality labeled data. In this project, we explore self-training, a form of semi-supervised learning that leverages unlabeled volumes to reduce the labeling burden and improve model performance. We use a pre-trained Residual Symmetric 3D U-Net to generate pseudo-labels for additional unlabeled volumes. The model is then retrained using both labeled and pseudo-labeled volumes to improve the adapted Rand index on a set of test volumes. The watershed segmentation algorithm is used to convert the binary foreground probability maps, instance contours, and signed distance transform into instance masks.
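The self-training procedure described above (pseudo-label the unlabeled data, then retrain on labeled plus pseudo-labeled data) can be sketched with a deliberately tiny stand-in model. The nearest-centroid classifier on scalar values below is an assumption made purely for illustration; it replaces the Residual Symmetric 3D U-Net, and all names and numbers are hypothetical.

```python
def fit_centroids(samples):
    """Fit a toy model: the mean value per label from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def self_train(labeled, unlabeled):
    """One round of self-training with the toy centroid model."""
    # Step 1: start from a model fitted on labeled data only
    # (this stands in for the pre-trained network).
    initial_model = fit_centroids(labeled)
    # Step 2: generate pseudo-labels for the unlabeled samples.
    pseudo_labeled = [(value, predict(initial_model, value)) for value in unlabeled]
    # Step 3: retrain on labeled and pseudo-labeled samples together.
    retrained_model = fit_centroids(labeled + pseudo_labeled)
    return retrained_model, pseudo_labeled


# Made-up scalar "intensities" with two classes.
labeled = [(0.0, "background"), (10.0, "cell")]
unlabeled = [1.0, 2.0, 9.0, 8.0]
model, pseudo = self_train(labeled, unlabeled)
```

In practice, self-training pipelines often keep only high-confidence pseudo-labels before retraining; that filtering step is omitted here, and whether the project uses it is not stated in the description above.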