Haihua Chen (陈海华)


Assistant Professor, Data Science,
Affiliated with Health Informatics,
Director, Intelligent Data Engineering & Analytics (IDEA) Lab
Departmental webpage
Department of Data Science
University of North Texas.
Office: E298A, Discovery Park
Address: 3940 North Elm, Suite E292, Denton, Texas 76203-5017

Email: haihua.chen[at]unt.edu

Google Scholar     LinkedIn     DBLP     ResearchGate     CV

Brief Bio
I am an Assistant Professor of Data Science, affiliated with Health Informatics, at the University of North Texas, where I direct the Intelligent Data Engineering and Analytics (IDEA) Lab. My research focuses on artificial intelligence, data-centric AI, and natural language processing, with an emphasis on evaluating and improving data quality for machine learning and large language models. I work across domains including health informatics, legal informatics, scientific communication, and computational social science. I have published more than 50 peer-reviewed articles in leading journals such as Information Processing & Management, Journal of Biomedical Informatics, Knowledge-Based Systems, and ACM Computing Surveys, and at top-tier conferences including EMNLP, WWW, and ICDM. My work has received more than 1,600 citations, and several of my papers are among the most cited in their respective venues. I have participated in more than 20 grant proposals funded or reviewed by agencies including the NSF, NIH, and IMLS, serving as PI, Co-PI, or senior personnel. I earned my Ph.D. in Information Science from the University of North Texas in 2022, advised by Prof. Jiangping Chen and Prof. Junhua Ding, and my M.S. in Information Science from Wuhan University in 2017 under the supervision of Prof. Wei Lu. I am actively engaged in the research community as an associate editor and editorial board member for multiple SSCI/SCIE journals, and I regularly serve in leadership roles for international conferences and workshops. I am also deeply committed to teaching and mentoring: I consistently receive excellent teaching evaluations and advise Ph.D. and master’s students in data science, information science, and computer science.

Research Interests: Data Science, Health Informatics, Legal AI, Document Intelligence, Scientific Innovation, Generative AI, AI Applications.

I am recruiting prospective Ph.D. students with strong self-motivation and computational skills in Data Science, Computer Science, Information Science, and related fields, with full financial support!

Research Philosophy: Research is the art of transforming curiosity into innovative knowledge that enriches human understanding and society.
-- 2025
News

[Jan. 2026] I will organize the Second International Workshop on Data Quality Aware, High-Performance, and Trustworthy AI Systems for Healthcare at IEEE/ACM CHASE 2026.

[Jan. 2026] One paper accepted by WWW 2026.

[Dec. 2025] I will serve as the PC Co-Chair for IEEE AITest 2026.

[Dec. 2025] I will attend the NSF CISE RE Workshop at FIU.

[Nov. 2025] Paper “Unveiling the Merits and Defects of LLMs in Automatic Review Generation for Scientific Papers” received the Best Poster Award at IEEE ICDM'25.

[Jul. 2025] Named IEEE ICAIT 2025 Best Reviewer.

Funded Research Grants

[NSF] Title: REU Site: Making Generative Artificial Intelligence Responsible, Role: Senior Personnel, Duration: 2025-2028, Award Amount: $463,434.

[NSF] Title: HSI Implementation and Evaluation Project: Developing a High-Quality Academic Environment for Broadening Participation of Hispanic Students in Computing, Role: Co-PI, Duration: 2022-2025, Award Amount: $499,608.

[UNT] Title: Partner, Not Crutch: Designing a Metacognitive Nudge to Promote AI Co-Regulation, Role: Co-PI, Duration: 2026-2027, Award Amount: $5,000.

[UNT] Title: Embodied & Multimodal AI for Cross-Cultural Access to Digital Archives: A UNT–Osaka Seed Collaboration, Role: PI, Duration: 2026, Award Amount: $5,000.

[UNT COI] Title: Utilizing AI/ML to Enhance Personalized Health Information Services for Hispanic Populations during Disaster Recovery, Role: PI, Duration: 2024, Award Amount: $5,000.

[UNT COI] Title: Towards a Large-scale and High-quality Corpus for Legal Argument Mining, Role: PI, Duration: 2022, Award Amount: $9,975.

Selected Publications [*Corresponding author] [Google Scholar]
Beyond Human Annotation: Recent Advances in Data Generation Methods for Document Intelligence

Dehao Ying, Fengchang Yu, Haihua Chen, Changjiang Jiang, Yurong Li, Wei Lu
arXiv 2026
[arXiv]

The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist

Haoxuan Zhang, Ruochi Li, Yang Zhang, Ting Xiao, Jiangping Chen, Junhua Ding, Haihua Chen*
arXiv 2025
[arXiv] [GitHub]

A Comprehensive Survey on Medical Concept Normalization: Datasets, Techniques, Applications, and Future Directions

Haihua Chen*, Yuhan Zhou, Ruochi Li, Aryan Murthy Illa, Ana Cleveland, Junhua Ding
SSRN 2025
[SSRN] [GitHub]

A Novel Multi-layer Task-centric and Data Quality Framework for Autonomous Driving

Yuhan Zhou, Haihua Chen*, Kewei Sha*
arXiv 2025
[arXiv]

Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis

Wonduk Seo, Juhyeon Lee, Junseo Koh, Hyunjin An, Jian Park, Seunghyun Lee, Haihua Chen*, Yi Bu*
arXiv 2025
[arXiv] [code]

Large Language Models for Oral History Understanding with Text Classification and Sentiment Analysis

Komala Subramanyam Cherukuri, Pranav Abishai Moses, Aisa Sakata, Jiangping Chen, Haihua Chen*
arXiv 2025
[arXiv] [code]

A hybrid graph and LLM approach for measuring scientific novelty via knowledge recombination and propagation

Zhongyi Wang, Zereg Wang, Guangzhao Zhang, Jiangping Chen, Markus Luczak-Roesch, Haihua Chen*
Expert Systems with Applications (JCR Q1) 2025
[pdf] [code]

IBID-CCT: A novel model for interdisciplinary breakthrough innovation detection based on the cusp catastrophe theory

Zhongyi Wang, Na Wang, Haoxuan Zhang, Zeren Wang, Zhou Wang, Junhua Ding, Haihua Chen*
Information Processing & Management (JCR Q1) 2025
[pdf] [code]

Enhancing data quality in medical concept normalization through large language models

Haihua Chen, Ruochi Li, Ana Cleveland, Junhua Ding
Journal of Biomedical Informatics (JCR Q1) 2025
[pdf] [code]

Exploring the influence of regulated learning processes on learners’ prestige in project-based learning

Fengjiao Tu, Linjing Wu, Kinshuk, Junhua Ding, Haihua Chen*
Education and Information Technologies (JCR Q1) 2024
[pdf]

An effective framework for measuring the novelty of scientific articles through integrated topic modeling and cloud model

Zhongyi Wang, Haoxuan Zhang, Jiangping Chen, Haihua Chen*
Journal of Informetrics (JCR Q1) 2024
[pdf] [code]

Content-based quality evaluation of scientific papers using coarse feature and knowledge entity network

Zhongyi Wang, Haoxuan Zhang, Haihua Chen*, Yunhe Feng, Junhua Ding
Journal of King Saud University - Computer and Information Sciences (JCR Q1) 2024
[pdf] [code]

From Detection to Application: Recent Advances in Understanding Scientific Tables and Figures

Jiani Huang, Haihua Chen, Fengchang Yu, Wei Lu
ACM Computing Surveys (JCR Q1) 2024
[pdf]

Identifying interdisciplinary topics and their evolution based on BERTopic

Zhongyi Wang, Jing Chen, Jiangping Chen, Haihua Chen*
Scientometrics (JCR Q1) 2024
[pdf] [code]

ICAD-MI: Interdisciplinary concept association discovery from the perspective of metaphor interpretation

Zhongyi Wang, Siyuan Peng, Jiangping Chen, Xian Zhang, Haihua Chen*
Knowledge-Based Systems (JCR Q1) 2023
[pdf] [code]

Detecting interdisciplinary semantic drift for knowledge organization based on normal cloud model

Zhongyi Wang, Siyuan Peng, Jiangping Chen, Amoni G Kapasule, Haihua Chen*
Journal of King Saud University - Computer and Information Sciences (JCR Q1) 2023
[pdf]

A comparative study of automated legal text classification using random forests and deep learning

Haihua Chen, Lei Wu, Jiangping Chen, Wei Lu, Junhua Ding
Information Processing & Management (JCR Q1) 2022
[pdf] [code]

Construction and evaluation of a high-quality corpus for legal intelligence using semiautomated approaches

Haihua Chen, Lavinia F Pieptea, Junhua Ding
IEEE Transactions on Reliability (JCR Q1) 2022
[pdf] [code]

Constructing a high-quality dataset for automated creation of summaries of fundamental contributions of research articles

Haihua Chen*, Huyen Nguyen, Asmaa Alghamdi
Scientometrics (JCR Q1) 2022
[pdf] [data] [code]

Measuring the innovation of method knowledge elements in scientific literature

Zhongyi Wang, Keying Wang, Jiyue Liu, Jing Huang, Haihua Chen*
Scientometrics (JCR Q1) 2022
[pdf] [code]

A comparative evaluation of biomedical similar article recommendation

Li Zhang, Wei Lu, Haihua Chen, Yong Huang, Qikai Cheng
Journal of Biomedical Informatics (JCR Q1) 2022
[pdf]

Data Evaluation and Enhancement for Quality Improvement of Machine Learning

Haihua Chen, Jiangping Chen, Junhua Ding
IEEE Transactions on Reliability (JCR Q1) 2022
[pdf]

TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, Jin Zeng
Findings of the Association for Computational Linguistics: EMNLP 2025
[pdf]

ReviewGuard: Enhancing Deficient Peer Review Detection via LLM-Driven Data Augmentation

Haoxuan Zhang, Ruochi Li, Sarthak Shrestha, Shree Harshini Mamidala, Revanth Putta, Arka Krishan Aggarwal, Ting Xiao*, Junhua Ding, Haihua Chen*
The ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2025
[pdf] [code]

Unveiling the Merits and Defects of LLMs in Automatic Review Generation for Scientific Papers

Ruochi Li, Haoxuan Zhang, Edward Gehringer, Ting Xiao, Junhua Ding, Haihua Chen*
IEEE International Conference on Data Mining, ICDM 2025

Best Poster Award for the Regular Paper Session.

[pdf] [code]

Fine-Grained, Accurate Data Generation and Multimodal Layout Analysis for Academic Papers

Dehao Ying, Fengchang Yu, Haihua Chen, Wei Lu
The ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2024
[pdf]

DIG: Complex Layout Document Image Generation with Authentic-looking Text for Enhancing Layout Analysis

Dehao Ying, Fengchang Yu, Haihua Chen, Wei Lu
ACM International Conference on Multimedia, MM 2024
[pdf]

A Survey on Data Quality Dimensions and Tools for Machine Learning

Yuhan Zhou, Fengjiao Tu, Kewei Sha, Junhua Ding, Haihua Chen*
IEEE International Conference on Artificial Intelligence Testing, AITest 2024
[pdf] [GitHub]

Investigating Code Generation Performance of ChatGPT with Crowdsourcing Social Data

Yunhe Feng, Sreecharan Vanam, Manasa Cherukupally, Weijian Zheng, Meikang Qiu, Haihua Chen
IEEE 47th Annual Computers, Software, and Applications Conference, COMPSAC 2023

Best Track Paper Award.

[pdf] [Prompt Dataset]

Enhancing Text Classification Models with Generative AI-aided Data Augmentation

Huanhuan Zhao, Haihua Chen, Hong-Jun Yoon
IEEE International Conference on Artificial Intelligence Testing, AITest 2023

Best Student Paper Award.

[pdf]

Evaluating the Impact of Incentive/Non-incentive Reviews on Customer Decision-making

Kate Kargozari, Junhua Ding, Haihua Chen*
IEEE International Conference on Artificial Intelligence Testing, AITest 2023

Best Paper Award.

[pdf]



© 2025 Dr. Haihua Chen. Thanks to Xu Ma and Dr. Deqing Sun for the template. [Updated: Jan/2026]