Bayesian Iterative Prediction and Lexical-based Interpretation for Disturbed Chinese Sentence Pair Matching
2024, International World Wide Web Conference (WWW), accepted for publication (selected for an oral presentation) [pdf]
Advantage Actor-Critic with Reasoner: Explaining the Agent’s Behavior from an Exploratory Perspective
2024, paper under review [arxiv]
Developed an Advantage Actor-Critic with Reasoner (A2C with Reasoner network) framework to interpret the behavior of agents in Atari games.
Active Learning to accelerate the training of an NER model on clinical data
2023, Internship research
Applied TF-IDF, Sentence-Transformers, and a proposed Iterative-BERT method to vectorize and represent medical records, followed by rescaling, dimensionality reduction, clustering, and error analysis in preparation for cluster-based diversity sampling to accelerate NER model training. This research was completed during an internship in the Department of Data Science & AI Solutions, Blue Cross Blue Shield, Chicago, under the supervision of Dr. Emre Hakguder and Pranav Suresh Magadi.
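Cluster-based diversity sampling picks annotation candidates that are spread across the clusters of the record space instead of all coming from one dense region. The following is a minimal stdlib-only sketch, assuming TF-IDF vectors and plain k-means with one representative record per cluster; the Sentence-Transformer and Iterative-BERT representations, the rescaling and dimensionality-reduction steps, and all function names here are hypothetical simplifications, not the internship pipeline.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors (term -> weight dicts) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1
        vec = {t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def to_dense(vectors):
    """Align sparse vectors on a shared, sorted vocabulary."""
    vocab = sorted({t for v in vectors for t in v})
    return [[v.get(t, 0.0) for t in vocab] for v in vectors]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with randomly chosen initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def diversity_sample(docs, k, seed=0):
    """Return one record index per non-empty cluster: the member closest to its centroid."""
    points = to_dense(tfidf_vectors(docs))
    labels, centroids = kmeans(points, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members, key=lambda i: sq_dist(points[i], centroids[c])))
    return sorted(picks)
```

Calling `diversity_sample(records, k=2)` on a list of note strings returns up to two record indices to send for annotation. Because initialization is random, the clusters can vary with `seed`; a fuller version would more likely cluster sentence embeddings with k-means++ seeding.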
MSQ-BioBERT: Ambiguity Resolution to Enhance BioBERT Medical Question-Answering
2023, International World Wide Web Conference (WWW), accepted for publication (selected for an oral presentation). [pdf]
Introduced a multiple synonymous questions BERT model (MSQ-BERT) for content-ambiguous question answering. The method uses question augmentation, word-frequency scores, and singular value decomposition (SVD).
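The entry does not say how word-frequency scores and SVD are combined. One plausible illustration, offered as an assumption and not as the MSQ-BioBERT method, is to build a word-count matrix over a question and its synonymous rewrites, find the leading right singular vector by power iteration, and rank each variant by how closely it aligns with that dominant shared direction. All names below are hypothetical.

```python
import math
from collections import Counter


def term_matrix(questions):
    """Rows = questions, columns = vocabulary terms, entries = raw word counts."""
    vocab = sorted({w for q in questions for w in q.lower().split()})
    rows = []
    for q in questions:
        counts = Counter(q.lower().split())
        rows.append([float(counts[w]) for w in vocab])
    return rows


def top_right_singular_vector(rows, iters=100):
    """Power iteration on A^T A converges to the leading right singular vector of A."""
    n_cols = len(rows[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        av = [sum(a * x for a, x in zip(row, v)) for row in rows]  # A v
        atav = [sum(row[j] * s for row, s in zip(rows, av)) for j in range(n_cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in atav)) or 1.0
        v = [x / norm for x in atav]
    return v


def rank_variants(questions):
    """Order question indices by cosine alignment with the dominant term direction."""
    rows = term_matrix(questions)
    v = top_right_singular_vector(rows)
    scores = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        scores.append(abs(sum(a * x for a, x in zip(row, v))) / norm)
    return sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)
```

In this sketch, a rewrite that shares no vocabulary with the other variants lands at the bottom of the ranking, which is one way an augmentation step could filter off-topic paraphrases.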
Identifying COVID-19 cases and extracting patient-reported symptoms from Reddit using NLP
2023, Nature Scientific Reports, accepted for publication [pdf]
Developed an NLP model built on two original algorithms (the Dual-corpus Expansion Algorithm and the Adaptive Rotation Clustering Algorithm) to extract COVID-19 symptoms from Reddit posts, and built a COVID-19 symptom corpus system.
"Ingenuity Cup" National Artificial Intelligence Innovation Application Competition
(China's largest national-level AI competition by participation, with 9,000+ teams and 16,300+ participants)
2023, 1st Prize Winner in Track Competition; 2nd Prize Winner in Final Presentation; Special Award (Top 50 High-Value Application Plans)
- Technological Innovation Track: Deep Learning Model Interpretability Competition
- Sponsors: Ministry of Industry and Information Technology of the People's Republic of China; Ministry of Science and Technology of the People's Republic of China; The People's Government of Shenzhen Municipality.
Devised an edit-distance-weighted fine-tuning method that places greater emphasis on semantics, significantly improving NLP models' semantic understanding. Proposed a Bayesian Iterative Prediction algorithm that greatly enhances the robustness and accuracy of NLP models' predictions. Incorporated Lexical Category Scores into LIME, an existing local interpretation method, substantially improving the rationality and faithfulness of token-level rationales. Introduced a Bi-criteria Denoising method that further strengthens these explanatory capabilities.
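The entry does not give the weighting formula for edit-distance-weighted fine-tuning. A minimal sketch, assuming sentence pairs are weighted by character-level Levenshtein distance so that near-identical pairs (where a small edit can flip the meaning) count more in a weighted cross-entropy loss; `pair_weight`, `alpha`, and the weighting rule itself are hypothetical.

```python
import math


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def pair_weight(s1, s2, alpha=1.0):
    """Hypothetical rule: weight = 1 + alpha * (1 - normalized edit distance)."""
    longest = max(len(s1), len(s2)) or 1
    similarity = 1.0 - levenshtein(s1, s2) / longest
    return 1.0 + alpha * similarity


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(label) over binary match / no-match predictions."""
    total = sum(-w * math.log(p if y == 1 else 1.0 - p)
                for p, y, w in zip(probs, labels, weights))
    return total / sum(weights)
```

For example, `pair_weight("我喜欢猫", "我不喜欢猫")` is about 1.8, because the one-character negation leaves the strings highly similar while changing their meaning. Working at the character level suits Chinese text, which has no whitespace tokenization.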
When Patients Recover from COVID-19: Data-driven Insights from Wearable Technologies
2022, Frontiers in Big Data, accepted for publication. [pdf]
Introduced a novel classification model with uncertainty quantification that identifies COVID-19 disease stages by combining a Long Short-Term Memory (LSTM) network and a Deep Neural Network (DNN) to exploit both temporal stream data and attribute stream data from wearable devices.
Classification of Twitter engagement using text and numeric features
2022, GWU Data Science and American Statistical Association's Datathon (individual runner-up)
Analyzed Twitter engagement using more than 1.1 million tweets. Introduced a BERT+DNN model that predicts tweet engagement from both text and numerical variables. Improved all predictive metrics over baseline models on a test set of more than 220,000 tweets.
Analysis of British tourists' consumption during the London Olympics
2021, China Online Datathon Competition (individual runner-up, rank 2/3771)
Analyzed the factors that influence consumer spending using Random Forest and Best Subsets Regression. Used an LSTM to explore how the impact of the Olympics differed across regions of the UK. Used SVD to identify fluctuations in the number of visitors to the UK during the London Olympics.
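Best Subsets Regression fits an ordinary least-squares model on every combination of candidate predictors and keeps the combination with the best selection criterion. A minimal stdlib-only sketch, assuming adjusted R² as the criterion (the entry does not say which was used) and a non-constant response; the function names and toy data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(X, y, cols):
    """Fit y ~ intercept + selected columns via normal equations; return the residual sum of squares."""
    rows = [[1.0] + [float(x[c]) for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bk * v for bk, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def best_subset(X, y, names):
    """Exhaustively score every non-empty predictor subset by adjusted R^2."""
    n = len(y)
    mean = sum(y) / n
    tss = sum((yi - mean) ** 2 for yi in y)
    best = (float("-inf"), ())
    for k in range(1, len(names) + 1):
        if n - k - 1 <= 0:
            break  # too few observations for subsets of this size
        for cols in combinations(range(len(names)), k):
            try:
                rss = ols_rss(X, y, cols)
            except ZeroDivisionError:
                continue  # exactly singular design, e.g. duplicated columns
            adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
            if adj_r2 > best[0]:
                best = (adj_r2, tuple(names[c] for c in cols))
    return best
```

The search visits 2^p - 1 subsets for p predictors, so it is practical only for a modest number of spending factors; beyond that, stepwise or penalized regression is the usual substitute.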
Knowledge Graph Based Platform of COVID-19 Drugs and Symptoms
2021, ASONAM, accepted for publication. [pdf]
Built a named-entity-recognition-based framework to accurately extract information from a large body of articles on clinical test results and to generate knowledge graphs efficiently; developed a question-answering system that answers medical questions about COVID-19-related symptoms using Wikipedia articles.
Forecast and volatility analysis of house prices by prefecture in Japan
2020, Data Development Competition (individual champion, rank 1/933)
Integrated a GeoJSON map dataset and visualized house price data across Japan. Predicted house prices with RNN and ARIMA models. Applied SVD to sequences of price maps, measuring price volatility with an approach adapted from video background extraction.
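The video-background-extraction idea can be sketched as a rank-1 low-rank approximation: stack each time step's price map as one column, treat the leading singular component as the stable "background" price pattern, and read volatility off what the background fails to explain. A minimal stdlib-only sketch, assuming each map is flattened to a per-prefecture price vector and volatility is the root-mean-square residual per region; the function names and the rank-1 choice are hypothetical.

```python
import math


def rank1_svd(m, iters=200):
    """Leading singular triplet (sigma, u, v) of matrix m via power iteration on m^T m."""
    rows, cols = len(m), len(m[0])
    v = [1.0] * cols
    for _ in range(iters):
        mv = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        mtmv = [sum(m[i][j] * mv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in mtmv)) or 1.0
        v = [x / norm for x in mtmv]
    mv = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in mv))
    u = [x / sigma for x in mv] if sigma else mv
    return sigma, u, v


def background_and_volatility(frames):
    """frames[t][r] = price in region r at time t. Returns (background, per-region volatility)."""
    m = [list(col) for col in zip(*frames)]  # regions x time, one column per frame
    sigma, u, v = rank1_svd(m)
    background = [[sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    volatility = [
        math.sqrt(sum((m[i][j] - background[i][j]) ** 2 for j in range(len(v))) / len(v))
        for i in range(len(u))
    ]
    return background, volatility
```

Regions whose prices move in lockstep with the national pattern are absorbed into the background and get low volatility, while a prefecture with an isolated price jump keeps a large residual, much as a moving object stands out against a static video background.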