CSCI 6527 – Final project – Current Image based Kaggle competition
About the Competition:
The Fine-Grained Visual Categorization (FGVC) workshop hosted a research prediction challenge with the objective of predicting six key plant characteristics, including leaf area and plant height, from plant images and additional ecosystem data.
Problem Statement:
The primary objective of this competition is to apply deep-learning regression models, such as Convolutional Neural Networks (CNNs) like ConvNeXt or Transformer-based architectures, to predict plant characteristics from images. Although only moderate accuracy is anticipated, the overarching goal is to explore the capabilities of this approach and gain insights into global ecosystem changes.
Insights into the Problem Domain: Significance and Importance
The investigation into plant traits reveals essential characteristics that shed light on their operational dynamics and interactions within their ecological milieu. An example of this is the plant canopy height, which serves as an indicator of a plant’s capacity to outcompete neighboring plants for sunlight by casting shadows. Additionally, robust leaves, characterized by a high leaf mass per leaf area, indicate plants that are adept at thriving in harsh conditions such as strong winds or droughts.
Nevertheless, it’s crucial to note that environmental circumstances are in a constant state of flux. Anticipated global changes, especially those related to climate change, are projected to have a profound impact on ecosystem functionality. This impact encompasses a range of processes, including plants adjusting their traits to suit new conditions and potential alterations in the distribution of plant species, ultimately resulting in modifications to the distribution of plant traits.
Transformative Influence of Machine Learning and Computer Vision in the Field:
Integrating computer vision (CV) and machine learning (ML) methods into plant science offers substantial advantages for understanding plant traits and their responses to shifting environmental conditions. The integration is beneficial in five key areas:
- Data Acquisition and Analysis: CV techniques facilitate extensive data collection on plant traits through image analysis, such as using drones for canopy imaging. ML algorithms then extract quantitative traits like leaf area and texture from these images.
- Trait Prediction and Monitoring: ML models, particularly neural networks, predict trait evolution under varying environmental factors, aiding ongoing plant health monitoring and adaptation studies.
- Identification of Adaptation Strategies: ML models analyze historical data to identify successful adaptation strategies, guiding breeding programs and conservation efforts.
- Ecosystem Modeling: ML techniques integrate into ecosystem models, simulating how changing plant traits impact ecosystem functionality, aiding in informed management decisions.
- Early Warning Systems: ML models develop early warning systems by detecting changes in plant traits signaling ecological shifts or disturbances, enabling timely intervention strategies.
This integration enhances our understanding of ecosystem dynamics, facilitates adaptive management strategies, and contributes to biodiversity conservation amidst global challenges like climate change.
About the Dataset:
The dataset comprises two distinct sets of image folders designed for training and testing purposes, supplemented by corresponding .csv files housing plant trait data. Additionally, a sample submission.csv file serves as a template for competition entry formats.
- Training Folder:
  - Contains 55,489 plant images.
  - Each image is associated with ancillary data and mean trait values found in the train.csv file.
- Test Folder:
  - Consists of 6,545 plant images.
  - Each image is paired with relevant ancillary data in the test.csv file.
The .csv files feature the following columns:
- id: Unique image identifier and prefix of image name.
- WORLDCLIM_BIO[*]: Ancillary climate variables aiding trait prediction.
- SOIL_[*]: Ancillary soil variables aiding trait prediction.
- MODIS_[*]/VOD_[*]: Ancillary multitemporal satellite variables aiding trait prediction.
- X[*]_mean: Target values to be predicted, encompassing multiple traits such as X3112, X1080, etc.
- X[*]_sd: Standard deviation of traits found for each species.
The target variables include:
- X4: Stem specific density (SSD) or wood density
- X11: Leaf area per leaf dry mass
- X18: Plant height
- X26: Seed dry mass
- X50: Leaf nitrogen (N) content per leaf area
- X3112: Leaf area
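The column groups described above can be separated by their name prefixes and suffixes. The following is a minimal sketch, not the project's actual code; the example column names and the helper `split_columns` are hypothetical and only illustrate the naming pattern.

```python
# Hypothetical column names for illustration; the real train.csv has many more.
example_columns = [
    "id",
    "WORLDCLIM_BIO1",
    "SOIL_a",
    "MODIS_a",
    "VOD_a",
    "X4_mean",
    "X4_sd",
]

ANCILLARY_PREFIXES = ("WORLDCLIM_BIO", "SOIL_", "MODIS_", "VOD_")


def split_columns(columns):
    """Group column names into ancillary features, targets, and trait SDs."""
    groups = {"ancillary": [], "targets": [], "trait_sd": []}
    for name in columns:
        if name.startswith(ANCILLARY_PREFIXES):
            groups["ancillary"].append(name)
        elif name.startswith("X") and name.endswith("_mean"):
            groups["targets"].append(name)
        elif name.startswith("X") and name.endswith("_sd"):
            groups["trait_sd"].append(name)
    return groups


groups = split_columns(example_columns)
```

The `id` column falls into none of the groups, since it is an identifier rather than a feature or a target.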
Data Visualization:
The bar graph presented herein illustrates the distribution of climate variables within the dataset.
The distribution of several soil variables in the dataset is depicted in the box plot below.
The image plot displayed includes sample train images labeled with their corresponding trait values, utilized during model training.
Models used:
In this study, two machine learning models were used for distinct tasks, because the dataset combines image and tabular data. The first model, EfficientNetB3, was used to extract features from the plant images and convert them into tabular form. The second, an XGBoost regressor, was then used to predict the six plant trait target variables from this tabular data.
The EfficientNetB3 model, trained on ImageNet-1k at a resolution of 300×300, is a mobile-friendly, purely convolutional model (ConvNet). The EfficientNet family introduced a compound scaling method that uniformly scales depth, width, and resolution through a single compound coefficient, which is simple yet highly effective. The XGBoost regressor, in turn, is a robust machine learning algorithm known for its efficiency and accuracy in regression tasks. As a member of the gradient boosting family, it performs particularly well on structured, tabular data. Combining the two models was therefore considered essential for addressing the problem statement effectively.
Approach:
The process begins by loading both the image data and the tabular data. Train and test data frames were then generated for each auxiliary data type, and the image paths for the train and test sets were appended to the respective auxiliary data as a new column.
Data Pre-Processing:
After loading the necessary datasets, the tabular data and image data underwent separate preprocessing steps. The tabular data comprises 176 columns, with 6 plant traits (‘X4_mean’, ‘X11_mean’, ‘X18_mean’, ‘X50_mean’, ‘X26_mean’, ‘X3112_mean’) as target variables and 170 columns as features. The first step in preprocessing involved feature selection, where the correlation scores of these variables were calculated and the variables with the lowest scores were dropped.
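As a rough illustration of correlation-based feature selection, the sketch below scores each feature by its mean absolute Pearson correlation with the targets and returns the lowest-scoring ones. The scoring rule, the function name `lowest_correlation_features`, and the synthetic data are assumptions made for this example; the report does not specify the exact procedure.

```python
import numpy as np


def lowest_correlation_features(features, targets, n_drop):
    """Return indices of the n_drop features least correlated with the targets.

    features: array of shape (n_samples, n_features)
    targets:  array of shape (n_samples, n_targets)
    Assumes no column is constant (a constant column has zero variance).
    """
    # Standardize columns so that a scaled dot product gives Pearson correlation.
    f = (features - features.mean(axis=0)) / features.std(axis=0)
    t = (targets - targets.mean(axis=0)) / targets.std(axis=0)
    corr = f.T @ t / len(features)       # shape (n_features, n_targets)
    score = np.abs(corr).mean(axis=1)    # one score per feature
    return np.argsort(score)[:n_drop]


# Synthetic data: column 0 drives the target, column 1 is unrelated noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 1))
noise = rng.normal(size=(500, 1))
features = np.hstack([signal, noise])
targets = 2.0 * signal + 0.1 * rng.normal(size=(500, 1))

drop_idx = lowest_correlation_features(features, targets, n_drop=1)
```

In this synthetic case the noise column is the one selected for removal.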
Following feature selection, outlier detection was performed on the target variables. Outliers were identified, and rows containing outlier values were entirely removed from the training dataset.
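One common way to implement this kind of row removal is to keep only rows whose target values all fall within chosen quantile bounds. The sketch below assumes that approach; the quantile thresholds and the helper name `inlier_mask` are illustrative and not taken from the project.

```python
import numpy as np


def inlier_mask(targets, lower_q=0.01, upper_q=0.99):
    """Boolean mask of rows whose targets all lie inside the quantile range.

    targets: array of shape (n_samples, n_targets)
    """
    lo = np.quantile(targets, lower_q, axis=0)
    hi = np.quantile(targets, upper_q, axis=0)
    return np.all((targets >= lo) & (targets <= hi), axis=1)


# Toy example with one target column and one obvious outlier.
targets = np.array([[1.0], [1.1], [0.9], [1.05], [1000.0]])
mask = inlier_mask(targets, lower_q=0.0, upper_q=0.8)
clean_targets = targets[mask]
```

The same mask would be applied to the feature rows and image paths so that all inputs stay aligned.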
For the image data, images were loaded from their respective paths and underwent the following preprocessing steps before being fed into the EfficientNet model:
- Resize: Images were resized to 300×300 pixels, the default input size of the EfficientNetB3 model. Resizing ensures uniform dimensions across all images in the dataset, which is required by models that expect a fixed input size.
- Rescale: Rescaling adjusts the pixel values of the images, normalizing them to a common range such as [0, 1] or [-1, 1]. Normalization standardizes the data, making it easier for machine learning models to learn and converge efficiently.
- Data Augmentation: Data augmentation generates additional training samples by applying transformations such as rotation, scaling, flipping, cropping, and adding noise to existing images. The augmented dataset exposes the model to a broader range of features and variations, which improves generalization to unseen data and helps prevent overfitting.
A figure illustrating the transformations used during data augmentation is provided for reference.
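The three preprocessing steps can be sketched with plain NumPy as follows. This is a simplified stand-in for illustration only: it uses nearest-neighbour resizing and a single random horizontal flip as the augmentation, whereas the project's actual pipeline and its exact set of transformations may differ. All function names are hypothetical.

```python
import numpy as np

TARGET_SIZE = 300  # input resolution stated in the text


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]


def rescale(image):
    """Map uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0


def random_flip(image, rng):
    """Flip the image left-right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


# Random pixels stand in for a real plant photo.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
prepared = random_flip(rescale(resize_nearest(raw)), rng)
```

Augmentation is applied only to training images; test images would go through the resize and rescale steps alone.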
Model Selection:
The EfficientNet model’s weights, pre-trained on the ImageNet database, were loaded, and the pre-trained layers were frozen so that the base layers’ weights stayed fixed during training. Initially, three dense layers were created with ELU as the activation function. ELU, short for Exponential Linear Unit, is a widely used activation function in deep learning models, but it evaluates an exponential for negative inputs, which makes it more expensive to compute than simpler activation functions like ReLU.
Given this consideration, several simple activation functions, namely ReLU, Sigmoid, TanH, and Leaky ReLU, were evaluated. Ultimately, ReLU was chosen for its simplicity and computational efficiency, and because its gradient does not shrink for positive inputs, which mitigates the vanishing gradient problem that saturating functions such as Sigmoid and TanH exhibit for inputs far from zero.
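For reference, the two activation functions compared above can be written in a few lines of NumPy. This is only an illustration of their definitions (with the common default alpha = 1 assumed for ELU), not the framework implementation used in the project.

```python
import numpy as np


def relu(x):
    """ReLU: pass positive inputs through, clamp negative inputs to zero."""
    return np.maximum(x, 0.0)


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


x = np.array([-2.0, 0.0, 3.0])
relu_out = relu(x)
elu_out = elu(x)
```

ReLU needs only a comparison per element, while ELU additionally evaluates an exponential for every non-positive input.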
Following the definition of the dense layers, the features from the image data were concatenated in the output layer, utilizing Softmax as the activation function. The model architecture of the utilized EfficientNet model is depicted in the figure below.
The XGBRegressor model is set up with the ‘reg:squarederror’ objective, so it minimizes the squared error loss during training. It uses 200 estimators (individual decision trees) with a learning rate of 0.05, which scales the contribution of each tree. The max_depth parameter is set to 10, capping the depth of each decision tree within the ensemble. The snippet below shows the model selection of XGBoost regressor.
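A configuration matching the hyperparameters stated above might look like the following sketch. The dictionary name `XGB_PARAMS` and the helper `build_trait_regressor` are hypothetical; only the four parameter values come from the text, and all other settings are left at the library defaults.

```python
# Hyperparameters as stated in the text.
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "n_estimators": 200,
    "learning_rate": 0.05,
    "max_depth": 10,
}


def build_trait_regressor():
    """Create an XGBRegressor with the settings above (requires xgboost)."""
    # Imported lazily so the configuration can be inspected without xgboost.
    from xgboost import XGBRegressor

    return XGBRegressor(**XGB_PARAMS)
```

The returned model would then be fitted on the tabular features with `model.fit(X_train, y_train)`, following the usual scikit-learn style interface.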
Evaluation metrics:
The competition uses the R2 score as its primary evaluation metric. R2, or the coefficient of determination, is a widely used measure in regression tasks that quantifies how well a model fits the data: it gives the proportion of the variance in the dependent variable that is explained by the model. A score of 1 indicates a perfect fit, a score of 0 corresponds to always predicting the mean, and the score becomes negative when the model performs worse than that. One strength of R2 is its interpretability, since a higher score means the model captures more of the variability in the target variable. However, R2 is sensitive to outliers, which is why they were removed during preprocessing. R2 therefore serves as the evaluation metric for both models. For EfficientNet, R2 is used to define both the model loss and the metric, as depicted in the figure illustrating the Python function for calculating R2 loss and metric scores for the EfficientNet model.
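The R2 computation can be sketched in NumPy as one minus the ratio of the residual sum of squares to the total sum of squares, evaluated per trait. Averaging the per-trait scores into a single number is an assumption made here for illustration, and the function names are hypothetical; this is not the project's Keras loss function.

```python
import numpy as np


def r2_per_trait(y_true, y_pred):
    """R2 for each column of (n_samples, n_traits) arrays."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


def mean_r2(y_true, y_pred):
    """Average the per-trait R2 scores into a single number."""
    return float(np.mean(r2_per_trait(y_true, y_pred)))


# Toy values: two traits, three samples.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
perfect = mean_r2(y_true, y_true)
baseline = mean_r2(y_true, np.tile(y_true.mean(axis=0), (3, 1)))
```

A perfect prediction yields 1, while always predicting the per-trait mean yields 0, which is why a negative score means the model is worse than that baseline.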
Furthermore, callbacks such as Early Stopping, a Learning Rate Scheduler, and Model Checkpoints were employed during hyperparameter tuning. They stop training once validation performance stops improving, adjust the learning rate over the course of training, and preserve the best model weights, which supports stable convergence.
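The logic behind early stopping combined with checkpointing can be illustrated in plain Python. This is a framework-independent sketch under stated assumptions: the class `EarlyStopper`, the patience value, and the validation scores are all hypothetical and do not reproduce the project's callback configuration.

```python
class EarlyStopper:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.stale_epochs = 0

    def update(self, epoch, score):
        """Record a validation score; return True if training should stop."""
        if score > self.best_score:
            # A new best: this is where a checkpoint would be saved.
            self.best_score, self.best_epoch = score, epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Hypothetical validation R2 per epoch, for illustration only.
val_scores = [0.10, 0.18, 0.21, 0.20, 0.19, 0.20, 0.22]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, score in enumerate(val_scores):
    if stopper.update(epoch, score):
        stopped_at = epoch
        break
```

With a patience of 3, training halts after three consecutive epochs without improvement, and the weights from the best epoch are the ones retained.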
Results:
Following model training and optimization, the model’s performance was assessed through k-fold cross-validation on the training set. This technique evaluates a model’s ability to generalize by splitting the data into k subsets (folds). In each iteration, a different fold serves as the test set while the remaining folds are used for training, so the model is trained and evaluated k times. The outcomes of this cross-validation process are depicted in the figure below.
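The splitting scheme behind k-fold cross-validation can be sketched as follows. The number of folds (5), the shuffling seed, and the helper name `k_fold_indices` are assumptions for illustration, as the report does not state the value of k that was used.

```python
import numpy as np


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_splits)
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=10, n_splits=5))
```

In each iteration the model would be fitted on the rows in `train_idx` and scored on the rows in `val_idx`, and the k scores would then be summarized.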
The figure reveals notable variation in R2 scores across plant traits. Traits that are physically visible, such as X18 (plant height) and X3112 (leaf area), showed high R2 scores, indicating that the model predicts these visually discernible traits well. In contrast, traits that are not directly visible, such as X50 (leaf nitrogen content per leaf area), had the lowest R2 scores. This disparity suggests that the model captures visual features relevant to physically observable traits, likely because they are clearly represented in images. Traits that are not directly visible may depend on biochemical or environmental factors that images alone do not show, which possibly contributes to the lower performance observed for them.
Following the analysis of model performance, the test images and auxiliary data are loaded to predict the values of plant traits. The figure below illustrates four sample test images extracted from the test dataset.
Upon completing the model prediction on the test set, the resulting prediction values for the six plant traits were organized and stored in a new data frame. This data frame was then saved into a submission.csv file. The table provided below displays the initial rows of the submission data frame for reference.
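Writing the predictions to a submission file could look like the sketch below, which uses only the standard library. The header names (trait identifiers without the `_mean` suffix), the helper `write_submission`, and the example ids and values are assumptions for illustration; the expected header should be checked against the sample submission file provided by the competition.

```python
import csv
import io

# Assumed header: trait names without the "_mean" suffix.
TRAITS = ["X4", "X11", "X18", "X50", "X26", "X3112"]


def write_submission(ids, predictions, handle):
    """Write one row per test image: id followed by six predicted traits."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id"] + TRAITS)
    for image_id, row in zip(ids, predictions):
        writer.writerow([image_id] + list(row))


# Hypothetical ids and prediction values, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([101, 102], [[0.5] * 6, [0.25] * 6], buffer)
lines = buffer.getvalue().splitlines()
```

To produce the actual file, the buffer would be replaced with `open("submission.csv", "w", newline="")`.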
Submitting to the Competition:
The ‘submission.csv’ file was submitted to the competition, and the figure below presents the public score achieved for these submissions.
Although the score is modest, it surpassed the scores of previous versions. The competition organizers deem any non-negative score satisfactory for the leaderboard, which remains unpublished.
References:
Kaggle Competition: https://www.kaggle.com/competitions/planttraits2024/overview
Kaggle code template: https://www.kaggle.com/code/awsaf49/planttraits2024-kerascv-starter-notebook
Documentation referenced:
https://keras.io/api/applications/efficientnet_v2/
GitHub repo Link:
https://github.com/Kishan1082/EfficientNetV2–PlantTraits2024-prediction
The GitHub repo contains the full code implementation and the submission.csv file.