Haihua Chen

Haihua Chen (陈海华)

Assistant Professor, Data Science,
Affiliated with Health Informatics,
Director, Intelligent Data Engineering & Analytics (IDEA) Lab
Departmental webpage
Department of Data Science
University of North Texas.
Office: E298A, Discovery Park
Address: 3940 North Elm, Suite E292, Denton, Texas 76203-5017

Email: haihua.chen[at]unt.edu

Google Scholar LinkedIn DBLP ResearchGate CV

Brief Bio

I am an Assistant Professor of Data Science, affiliated with Health Informatics, at the University of North Texas, where I direct the Intelligent Data Engineering and Analytics (IDEA) Lab. My research focuses on artificial intelligence, data-centric AI, and natural language processing, with an emphasis on evaluating and improving data quality for machine learning and large language models. I work across domains including health informatics, legal informatics, scientific communication, and computational social science. I have published more than 50 peer-reviewed articles in leading journals such as Information Processing & Management, Journal of Biomedical Informatics, Knowledge-Based Systems, and ACM Computing Surveys, and in top-tier conferences including EMNLP, WWW, and ICDM. My work has received more than 1,600 citations, and several of my papers are among the most cited in their respective venues. I have participated, as PI, Co-PI, or senior personnel, in more than 20 grant applications funded or reviewed by agencies including the NSF, NIH, and IMLS. I earned my Ph.D. in Information Science from the University of North Texas in 2022, advised by Prof. Jiangping Chen and Prof. Junhua Ding, and my M.S. in Information Science from Wuhan University in 2017 under the supervision of Prof. Wei Lu. I am actively engaged in the research community as an associate editor and editorial board member for multiple SSCI/SCIE journals, and I regularly serve in leadership roles for international conferences and workshops. I am also deeply committed to teaching and mentoring: I have received consistently excellent teaching evaluations and advise Ph.D. and master's students in data science, information science, and computer science.

Research Interests: Data Science, Health Informatics, Legal AI, Document Intelligence, Scientific Innovation, Generative AI, AI Applications.

I am recruiting prospective Ph.D. students with strong self-motivation and computational skills in Data Science, Computer Science, Information Science, and related fields, with full financial support!

Research Philosophy: Research is the art of transforming curiosity into innovative knowledge that enriches human understanding and society.

-- 2025

News

[Feb. 2026] I will serve as the PC Co-Chair for ACM/IEEE JCDL 2026.

[Jan. 2026] I will organize The Second International Workshop on Data Quality Aware, High-Performance, and Trustworthy AI Systems for Healthcare at IEEE/ACM CHASE 2026.

[Jan. 2026] One paper accepted by WWW 2026.

[Dec. 2025] I will serve as the PC Co-Chair for IEEE AITest 2026.

[Dec. 2025] I will attend the NSF CISE RE Workshop at FIU.

[Nov. 2025] Paper “Unveiling the Merits and Defects of LLMs in Automatic Review Generation for Scientific Papers” received the Best Poster Award at IEEE ICDM'25.

[Jul. 2025] Named IEEE ICALT 2025 Best Reviewer.

Funded Research Grants

External Grants:

[NSF] Title: REU Site: Making Generative Artificial Intelligence Responsible, Role: Senior Personnel, Duration: 2025-2028, Award Amount: $463,434.

[NSF] Title: HSI Implementation and Evaluation Project: Developing a High-Quality Academic Environment for Broadening Participation of Hispanic Students in Computing, Role: Co-PI, Duration: 2022-2025, Award Amount: $499,608.

Internal Grants:

[UNT] Title: Partner, Not Crutch: Designing a Metacognitive Nudge to Promote AI Co-Regulation, Role: Co-PI, Duration: 2026-2027, Award Amount: $5,000.

[UNT] Title: Embodied & Multimodal AI for Cross-Cultural Access to Digital Archives: A UNT–Osaka Seed Collaboration, Role: PI, Duration: 2026, Award Amount: $5,000.

[UNT COI] Title: Utilizing AI/ML to Enhance Personalized Health Information Services for Hispanic Populations during Disaster Recovery, Role: PI, Duration: 2024, Award Amount: $5,000.

[UNT COI] Title: Towards a Large-scale and High-quality Corpus for Legal Argument Mining, Role: PI, Duration: 2022, Award Amount: $9,975.

Selected Publications [*Corresponding author] [Google Scholar]

arXiv First:

	Beyond Human Annotation: Recent Advances in Data Generation Methods for Document Intelligence Dehao Ying, Fengchang Yu, Haihua Chen, Changjiang Jiang, Yurong Li, Wei Lu arXiv 2026 [arXiv]
	The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist Haoxuan Zhang, Ruochi Li, Yang Zhang, Ting Xiao, Jiangping Chen, Junhua Ding, Haihua Chen* arXiv 2025 [arXiv] [GitHub]
	Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis Wonduk Seo, Juhyeon Lee, Junseo Koh, Hyunjin An, Jian Park, Seunghyun Lee, Haihua Chen, Yi Bu arXiv 2025 [arXiv] [code]
	Large Language Models for Oral History Understanding with Text Classification and Sentiment Analysis Komala Subramanyam Cherukuri, Pranav Abishai Moses, Aisa Sakata, Jiangping Chen, Haihua Chen* arXiv 2025 [arXiv] [code]

Journal Articles:

	A Comprehensive Survey on Medical Concept Normalization: Datasets, Techniques, Applications, and Future Directions Haihua Chen, Yuhan Zhou, Ruochi Li, Aryan Murthy Illa, Ana Cleveland, Junhua Ding Journal of Biomedical Informatics* (JCR Q1) 2026 [SSRN] [GitHub]
	A Novel Multi-layer Task-centric and Data Quality Framework for Autonomous Driving Yuhan Zhou, Haihua Chen, Kewei Sha* IEEE Internet Computing (JCR Q1) 2026 [arXiv]
	A hybrid graph and LLM approach for measuring scientific novelty via knowledge recombination and propagation Zhongyi Wang, Zeren Wang, Guangzhao Zhang, Jiangping Chen, Markus Luczak-Roesch, Haihua Chen* Expert Systems with Applications (JCR Q1) 2025 [pdf] [code]
	IBID-CCT: A novel model for interdisciplinary breakthrough innovation detection based on the cusp catastrophe theory Zhongyi Wang, Na Wang, Haoxuan Zhang, Zeren Wang, Zhou Wang, Junhua Ding, Haihua Chen* Information Processing & Management (JCR Q1) 2025 [pdf] [code]
	Enhancing data quality in medical concept normalization through large language models Haihua Chen, Ruochi Li, Ana Cleveland, Junhua Ding Journal of Biomedical Informatics (JCR Q1) 2025 [pdf] [code]
	Exploring the influence of regulated learning processes on learners’ prestige in project-based learning Fengjiao Tu, Linjing Wu, Kinshuk, Junhua Ding, Haihua Chen* Education and Information Technologies (JCR Q1) 2024 [pdf]
	An effective framework for measuring the novelty of scientific articles through integrated topic modeling and cloud model Zhongyi Wang, Haoxuan Zhang, Jiangping Chen, Haihua Chen* Journal of Informetrics (JCR Q1) 2024 [pdf] [code]
	Content-based quality evaluation of scientific papers using coarse feature and knowledge entity network Zhongyi Wang, Haoxuan Zhang, Haihua Chen, Yunhe Feng, Junhua Ding Journal of King Saud University - Computer and Information Sciences* (JCR Q1) 2024 [pdf] [code]
	From Detection to Application: Recent Advances in Understanding Scientific Tables and Figures Jiani Huang, Haihua Chen, Fengchang Yu, Wei Lu ACM Computing Surveys (JCR Q1) 2024 [pdf]
	Identifying interdisciplinary topics and their evolution based on BERTopic Zhongyi Wang, Jing Chen, Jiangping Chen, Haihua Chen* Scientometrics (JCR Q1) 2024 [pdf] [code]
	ICAD-MI: Interdisciplinary concept association discovery from the perspective of metaphor interpretation Zhongyi Wang, Siyuan Peng, Jiangping Chen, Xian Zhang, Haihua Chen* Knowledge-Based Systems (JCR Q1) 2023 [pdf] [code]
	Detecting interdisciplinary semantic drift for knowledge organization based on normal cloud model Zhongyi Wang, Siyuan Peng, Jiangping Chen, Amoni G Kapasule, Haihua Chen* Journal of King Saud University - Computer and Information Sciences (JCR Q1) 2023 [pdf]
	A comparative study of automated legal text classification using random forests and deep learning Haihua Chen, Lei Wu, Jiangping Chen, Wei Lu, Junhua Ding Information Processing & Management (JCR Q1) 2022 [pdf] [code]
	Construction and evaluation of a high-quality corpus for legal intelligence using semiautomated approaches Haihua Chen, Lavinia F Pieptea, Junhua Ding IEEE Transactions on Reliability (JCR Q1) 2022 [pdf] [code]
	Constructing a high-quality dataset for automated creation of summaries of fundamental contributions of research articles Haihua Chen, Huyen Nguyen, Asmaa Alghamdi Scientometrics* (JCR Q1) 2022 [pdf] [data] [code]
	Measuring the innovation of method knowledge elements in scientific literature Zhongyi Wang, Keying Wang, Jiyue Liu, Jing Huang, Haihua Chen* Scientometrics (JCR Q1) 2022 [pdf] [code]
	A comparative evaluation of biomedical similar article recommendation Li Zhang, Wei Lu, Haihua Chen, Yong Huang, Qikai Cheng Journal of Biomedical Informatics (JCR Q1) 2022 [pdf]
	Data Evaluation and Enhancement for Quality Improvement of Machine Learning Haihua Chen, Jiangping Chen, Junhua Ding IEEE Transactions on Reliability (JCR Q1) 2022 [pdf]

Conference Papers:

	Modeling and Measuring Redundancy in Multisource Multimodal Data for Autonomous Driving Yuhan Zhou, Mehri Sattari, Haihua Chen, Kewei Sha The Fourth IEEE International Conference on Mobility: Operations, Services, and Technologies, MOST 2026 [pdf] [code]
	AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation Haoxuan Zhang, Ruochi Li, Zhenni Liang, Mehri Sattari, Phat Vo, Collin Qu, Ting Xiao, Junhua Ding, Yang Zhang, Haihua Chen The ACM Web Conference, WWW 2026 [pdf] [code]
	TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, Jin Zeng Findings of the Association for Computational Linguistics: EMNLP 2025 [pdf]
	ReviewGuard: Enhancing Deficient Peer Review Detection via LLM-Driven Data Augmentation Haoxuan Zhang, Ruochi Li, Sarthak Shrestha, Shree Harshini Mamidala, Revanth Putta, Arka Krishan Aggarwal, Ting Xiao, Junhua Ding, Haihua Chen The ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2025 [pdf] [code]
	Unveiling the Merits and Defects of LLMs in Automatic Review Generation for Scientific Papers Ruochi Li, Haoxuan Zhang, Edward Gehringer, Ting Xiao, Junhua Ding, Haihua Chen* IEEE International Conference on Data Mining, ICDM 2025 Best Poster Award for the Regular Paper Session. [pdf] [code]
	Fine-Grained, Accurate Data Generation and Multimodal Layout Analysis for Academic Papers Dehao Ying, Fengchang Yu, Haihua Chen, Wei Lu The ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2024 [pdf]
	DIG: Complex Layout Document Image Generation with Authentic-looking Text for Enhancing Layout Analysis Dehao Ying, Fengchang Yu, Haihua Chen, Wei Lu ACM International Conference on Multimedia, MM 2024 [pdf]
	A Survey on Data Quality Dimensions and Tools for Machine Learning Yuhan Zhou, Fengjiao Tu, Kewei Sha, Junhua Ding, Haihua Chen* IEEE International Conference on Artificial Intelligence Testing, AITest 2024 [pdf] [GitHub]
	Investigating Code Generation Performance of ChatGPT with Crowdsourcing Social Data Yunhe Feng, Sreecharan Vanam, Manasa Cherukupally, Weijian Zheng, Meikang Qiu, Haihua Chen IEEE 47th Annual Computers, Software, and Applications Conference, COMPSAC 2023 Best Track Paper Award. [pdf] [Prompt Dataset]
	Enhancing Text Classification Models with Generative AI-aided Data Augmentation Huanhuan Zhao, Haihua Chen, Hong-Jun Yoon IEEE International Conference on Artificial Intelligence Testing, AITest 2023 Best Student Paper Award. [pdf]
	Evaluating the Impact of Incentive/Non-incentive Reviews on Customer Decision-making Kate Kargozari, Junhua Ding, Haihua Chen* IEEE International Conference on Artificial Intelligence Testing, AITest 2023 Best Paper Award. [pdf]

Teaching

Summer 2026: DTSC 3020: Introduction to Computation with Python (Online)
Spring 2026: INFO 5731: Computational Methods for Information Systems (Face to face)
Spring 2026: HINF 5506: Applications of Artificial Intelligence in Health (Face to face)
Fall 2025: INFO 5731: Computational Methods for Information Systems (Face to face)
Fall 2025: DTSC 3020: Introduction to Computation with Python (Face to face)
Summer 2025: INFO 5810: Data Analysis and Knowledge Discovery (Online)
Spring 2025: INFO 5731: Computational Methods for Information Systems (Face to face)
Spring 2025: HINF 5506: Applications of Artificial Intelligence in Health (Face to face)

Previously Scheduled Teaching

Academic Services

Organizing Committee of Conferences:

Editor or Guest Editor or Editorial Board Member of Journals:

Co-chair of Workshops:

Reviewer of Journals (Selected):

PC member or Reviewer of Conferences:

Honors & Awards

2025. Rocking Star of Research and Innovation, Department of Data Science, University of North Texas.
2025. Best Poster Award, The 2025 IEEE International Conference on Data Mining (ICDM 2025), Washington DC, USA.
2025. Best Reviewer Award, The 2025 IEEE Technical Community on Learning Technology (TCLT), Changhua, Taiwan.
2024. Rising Star Runner-up, College of Information, University of North Texas.
2023. Best Student Paper Award, The 2023 IEEE International Conference on Artificial Intelligence Testing (AITest), Athens, Greece.
2023. Best Paper Award, The 2023 IEEE International Conference on Artificial Intelligence Testing (AITest), Athens, Greece.
2023. Best Paper Award of the SETA Track, The 2023 IEEE Computer Society Signature Conference on Computers, Software and Applications (COMPSAC), Torino, Italy.
2020. Best Paper Award, The IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), Macau, China.

Students

I have been fortunate to work with many gifted students:

Fengjiao Tu, Ph.D. Student in Information Science, UNT, Fall 2023 - Present

Laxmigayathri Challa, Ph.D. Student in Information Science, UNT, Fall 2023 - Present

Ruochi Li, Ph.D. Student in Computer Science, NCSU, Fall 2023 - Present

Yuhan Zhou, Ph.D. Student in Information Science, UNT, Fall 2024 - Present

Haoxuan Zhang, Ph.D. Student in Information Science, UNT, Fall 2024 - Present

Komala Subramanyam Cherukuri, Ph.D. Student in Information Science, UNT, Fall 2024 - Present

Mehri Sattari, Ph.D. Student in Information Science, UNT, Fall 2025 - Present

Huyen Nguyen, Ph.D. Student in Information Science, UNT, Fall 2020 - Spring 2025

Ngan Tran, Ph.D. Student in Information Science, UNT, Fall 2021 - Fall 2024

Ampana Jayaram, MS Student in Data Science, UNT, Fall 2025 - Present

Kanishk Sharma, MS Student in Data Science, UNT, Fall 2025 - Present

Sai Donepudi, Undergraduate Student in Data Science, UNT, Fall 2024 - Present

Julian Ondrey, Undergraduate Student in Computer Science, UNT, Fall 2025 - Present

Eneojo Unwuchola, Undergraduate Student in Computer Science, UNT, Fall 2025 - Present

Lizal Adhikari, Undergraduate Student in Data Science, UNT, Fall 2025 - Present

Suvrat Sharma Bhatta, Undergraduate Student in Data Science, UNT, Fall 2025 - Present
