Welcome to my website!

I am a Research Scientist at Meta. I completed my Ph.D. at the University of British Columbia, Canada, under the supervision of Dr. Muhammad Abdul-Mageed. My research focuses on natural language processing, deep learning, and social media mining. I have had the privilege of interning at Microsoft Research, Meta, Google, and Borealis AI. I was also a visiting researcher at MBZUAI in Abu Dhabi. I earned my M.Sc. in Data Science at Clarkson University, where I was advised by Dr. Boris Jukic.

Awards:

Anne and George Piternick Research Award (2024), UBC.
Ph.D. Travel Award of School of Information (2019, 2021, and 2023), UBC.
Distinguished Teaching (2022), UBC.
Tung Graduate Fellowship (2021-2022), UBC.
Affiliated Fellowship (2021-2022), UBC.
President’s Academic Excellence Initiative Ph.D. Award (2020-2023), UBC.
Evelyn Markwei Memorial Award (2020-2021), UBC.
International Tuition Award (2018-2023), UBC.
Graduate Scholarship of School of Information (2018-2023), UBC.
Graduate Scholarship (2016-2017), Clarkson University, USA.

Publications:

Zhang, Chiyu, Sun, Y., Wu, M., Chen, J., Lei, J., Abdul-Mageed, M., Jin, R., Liu, A., Zhu, J., Park, S., Yao, N., & Long, B. (2024). EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations. In Proceedings of the 18th ACM Conference on Recommender Systems (RecSys ’24). ACM.
Wang, R., Li, H., Wu, M., Wang, Y., Han, X., Zhang, Chiyu, & Baldwin, T. (2024). Demystifying Instruction Mixing for Fine-tuning Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop). ACL.
Abdul-Mageed, M., Keleg, A., Elmadany, A., Zhang, C., Hamed, I., Magdy, W., Bouamor, H., & Habash, N. (2024). NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Second Arabic Natural Language Processing Conference. ACL.
Zhang, Chiyu, Sun, Y., Chen, J., Lei, J., Abdul-Mageed, M., Wang, S., Jin, R., Park, S., Yao, N., & Long, B. (2024). SPAR: Personalized Content-Based Recommendation via Long Engagement Attention. arXiv preprint arXiv:2402.10555. (Under Review)
Zhang, Chiyu, Cai, H., Li, Y., Wu, Y., Hou, L., & Abdul-Mageed, M. (2024). Distilling Text Style Transfer With Self-Explanation From LLMs. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop). ACL.
Wu, M., Waheed, A., Zhang, C., Abdul-Mageed, M., & Aji, A. F. (2024). LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL). ACL. [GitHub]
Zhang, C., Doan, K. D., Liao, Q., & Abdul-Mageed, M. (2023). The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL.
Zhang, C., Abdul-Mageed, M., & Jawahar, G. (2023). Contrastive Learning of Sociopragmatic Meaning in Social Media. Findings of the Association for Computational Linguistics: ACL 2023. ACL.
Zhang, C., Yan, Q., Meng, L., & Sylvain, T. (2023). What Constitutes Good Contrastive Learning in Time-Series Forecasting? In the IJCAI Workshop on AI for Time Series Analysis (AI4TS).
Abdul-Mageed, M., Zhang, C., Elmadany, A., Bouamor, H., & Habash, N. (2022). NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Seventh Arabic Natural Language Processing Workshop, pages 85–97, Abu Dhabi. ACL. [Data]
Zhang, C., Abdul-Mageed, M., & Nagoudi, E. M. B. (2022). Decay No More: A Persistent Twitter Dataset for Learning Social Meaning. In Proceedings of the 1st Workshop on Novel Evaluation Approaches for Text Classification Systems on Social Media (NEATCLasS). AAAI Press. (Best Paper Award) [Data and Model]
Zhang, C., & Abdul-Mageed, M. (2022). Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning. In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, pages 141–156, Dublin, Ireland. ACL. [Code]
Laricheva, M., Zhang, C., Liu, Y., Chen, G., Tracey, T., Young, R., & Carenini, G. (2022). Automated Utterance Labeling of Conversations Using Natural Language Processing. In Proceedings of 15th International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation, Pittsburgh, USA.
Liu, Y., Laricheva, M., Zhang, C., Boutet, P., Chen, G., Tracey, T., Carenini, G., & Young, R. (2022). Transition to Adulthood for Young People with Intellectual or Developmental Disabilities: Emotion Detection and Topic Modeling. In Proceedings of 15th International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation, Pittsburgh, USA.
Abdul-Mageed, M., Zhang, C., Elmadany, A., Bouamor, H., & Habash, N. (2021). NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 244–259, Kyiv, Ukraine (Virtual). ACL. [Data]
Qiu, Y., Yang, X., Li, Z., Zhang, C., & Chen, S. (2021). Investigating the impacts of artificial intelligence technology on technological innovation from a patent perspective. Applied Mathematics and Nonlinear Sciences, 6(1), 129-140.
Abdul-Mageed, M., Zhang, C., Bouamor, H., & Habash, N. (2020, December). NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 97–110, Barcelona, Spain (Online). ACL. [Data]
Abdul-Mageed, M., Zhang, C., Elmadany, A., & Ungar, L. (2020). Toward micro-dialect identification in diaglossic and code-switched environments. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5855–5876, Online. [Data] [Model]
Elmadany, A., Zhang, C., Abdul-Mageed, M., & Hashemi, A. (2020, May). Leveraging Affective Bidirectional Transformers for Offensive Language Detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 102–108, Marseille, France. European Language Resource Association.
Abdul-Mageed, M., Zhang, C., & Hashemi, A. (2020, May). AraNet: A Deep Learning Toolkit for Arabic Social Media. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 16–23, Marseille, France. European Language Resource Association. [Software]
Abdul-Mageed, M., Zhang, C., Rajendran, A., Elmadany, A., Przystupa, M., & Ungar, L. (2019). Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media. arXiv preprint arXiv:1911.00637.
Abdul-Mageed, M., Zhang, C., Elmadany, A., Rajendran, A., & Ungar, L. (2019). DiaNet: BERT and Hierarchical Attention Multi-Task Learning of Fine-Grained Dialect. arXiv preprint arXiv:1910.14243.
Zhang, C., & Abdul-Mageed, M. (2019, December). BERT-Based Arabic Social Media Author Profiling. In Proceedings of 11th meeting of the Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019. CEUR-WS.org.
Zhang, C., & Abdul-Mageed, M. (2019, December). Multi-Task Bidirectional Transformer Representations for Irony Detection. In Proceedings of 11th meeting of the Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019. CEUR-WS.org.
Zhang, C., & Abdul-Mageed, M. (2019, August). No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 279–284, Florence, Italy. ACL. (Best System Paper) [Code]
Zhang, C., Rajendran, A., & Abdul-Mageed, M. (2019, June). UBC-NLP at SemEval-2019 Task 4: Hyperpartisan News Detection with Attention-Based Bi-LSTMs. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 775–781, Minneapolis, Minnesota, USA. ACL. [Code]
Rajendran, A., Zhang, C., & Abdul-Mageed, M. (2019, June). UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content with Enhanced Training Data. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 775–781, Minneapolis, Minnesota, USA. ACL.
Rajendran, A., Zhang, C., & Abdul-Mageed, M. (2019, January). Happy Together: Learning and Understanding Appraisal from Natural Language. In Proceedings of the 2nd Workshop on Affective Content Analysis (AffCon 2019), Honolulu, USA, January 27, 2019. CEUR-WS.org. (Best System Paper)
Qiu, Y.*, & Zhang, C.* (2018, September). Wrapper feature selection algorithm for the optimization of an indicator system of patent value assessment. IPPTA, 30(3), 300–308.
Qiu, Y.*, Zhang, C.*, & Shuixuan, C. (2017, March). Research of Patent-value Assessment Indicator System Based on Classification and Regression Tree Algorithm. Journal of Xiamen University (Natural Science), (2), 244–251.
Qiu, Y., & Zhang, C. (2016, August). Research of indicator system in customer churn prediction for telecom industry. In 2016 11th International Conference on Computer Science & Education (ICCSE) (pp. 123-130). IEEE.

Chiyu Zhang