Over the past week, some of the key learnings and experiments from our master's project included:
- When presenting a research topic, it helps to accompany the presentation with visual aids. For example, we could have shown more images from our example dataset to convey the complexity of the topic. Many of the images are blurry or otherwise hard to decipher, which complicates the research problem and makes it necessary to consider additional pre-processing steps, such as using GANs to improve image quality. (More examples of the images are attached below.)
- For instance, in the last photo, one can see that the "Cl" symbol is blurred, and the "N" and "H" symbols lie so close to each other that it's hard to make out the line separating them.
- As we started working with the dataset, we realized that enlarging it could yield better results, for example by also including the different rotations and flips of each image. In a way, the same chemical formula can then appear in the dataset several times, displayed in different orientations. This increases the size and variety of the dataset without the need to look for external examples (at least for the initial end-to-end product). So far, we've been able to implement the following types of rotations and flips of the same image, using the OpenCV library:
However, as one can see, there's a problem with flipping characters such as "S" and "OH" in a way that preserves their orientation. (This is the next step of the problem, which I don't yet know how to solve, so I would appreciate any help with it!) I did hear mentions of the Google Image API, but I'm not sure how much it can help with this rule-based type of augmentation.
- I did try Pytesseract on individually recognized characters, but more often than not it failed to decipher the symbol correctly.
- Another key learning is that, apart from the numerical accuracy metric, i.e. the Levenshtein distance, it would be interesting to explore semantic evaluation, i.e. how close the results are to what chemists expect, without the need to go into the nitty-gritty numerical details. This once again ties in with the usefulness of our final product to customers.
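To make the numerical metric concrete, here is a small pure-Python sketch of the Levenshtein (edit) distance between a predicted string and the ground truth. The function name `levenshtein` and the example strings are only illustrative; I'm assuming here that predictions and labels are compared as plain text strings, which may not match the final representation we settle on.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the processed prefix of `a` and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Illustrative strings: the prediction confuses the letter "O" with the digit "0".
truth = "C2H5OH"
prediction = "C2H50H"
print(levenshtein(truth, prediction))  # 1
```

A single confused character gives a distance of 1 regardless of which character it is, so the metric alone doesn't say whether the mistake matters chemically. That is the gap a semantic evaluation would try to cover.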
Would really love to hear feedback and any useful comments about the augmentation issue!
Thanks