Abstract: Reliable effort estimation is of paramount importance to software planning and management, especially in industry, which requires effective and on-time delivery. Although various estimation approaches have been proposed (e.g., planning poker and analogy), they may be manual and/or subjective, which makes them difficult to apply to other projects. In recent years, deep learning approaches for effort estimation that rely on learning either expert features or semantic features have been extensively studied and found to be promising. Semantic features and expert features describe software tasks from different perspectives; however, the best combination of these two kinds of features for enhancing effort estimation has not been explored in the literature. Additionally, few studies discuss which expert features are useful for estimating effort in industry. To this end, we investigate 13 potential expert features that can be used to estimate effort by interviewing 26 enterprise employees. Based on that, we propose a novel model, called Fine-SE, that leverages semantic features and expert features for effort estimation. To validate our model, a series of evaluations is conducted on more than 30,000 software tasks from 17 industrial projects of a global ICT enterprise and four open-source software (OSS) projects. The evaluation results indicate that Fine-SE provides higher performance than the baselines on the evaluation measures (i.e., mean absolute error, mean magnitude of relative error, and performance indicator), particularly in industrial projects with large numbers of software tasks, which implies a significant improvement in effort estimation. In comparison with expert estimation, Fine-SE improves the performance of evaluation measures by 32.0%-45.2% in within-project estimation. In comparison with the state-of-the-art models, Deep-SE and GPT2SP, it also achieves an improvement of 8.9%-91.4% in industrial projects. The experimental results reveal the value of integrating expert features with semantic features in effort estimation.
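
The abstract names its evaluation measures without giving formulas. The following is a minimal sketch of how such measures are commonly computed in the effort estimation literature, not the authors' implementation. It assumes that "performance indicator" refers to the usual PRED(l) measure, the fraction of tasks whose relative error is at most l. The function names, the `level` parameter, and the sample values are hypothetical.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned task by task.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average of |actual - predicted|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error: average of |actual - predicted| / actual.

    Assumes every actual effort value is strictly positive.
    """
    _check(actual, predicted)
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.5) -> float:
    """PRED(level): share of tasks whose relative error is at most `level`.

    The default threshold is an illustrative assumption, not a value
    taken from the abstract.
    """
    _check(actual, predicted)
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


# Hypothetical effort values (e.g., person-hours) for three tasks.
actual_effort = [4.0, 8.0, 10.0]
estimated_effort = [5.0, 6.0, 10.0]

print(mae(actual_effort, estimated_effort))
print(mmre(actual_effort, estimated_effort))
print(pred(actual_effort, estimated_effort, level=0.25))
```

Lower MAE and MMRE indicate better estimates, while a higher PRED value indicates that more tasks fall within the chosen error tolerance.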

Assessing Project-Level Fine-Tuning of ML4SE Models

Best Practices for Machine Learning Systems: An Industrial Framework for Analysis and Optimization

A Validation and Quality Assessment Method with Metamorphic Relations for Unsupervised Machine Learning Software

"Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint

Detecting Refactoring Commits in Machine Learning Python Projects: A Machine Learning-Based Approach

Fine-Tuning Language Models Using Formal Methods Feedback

Fine-tuning Large Language Models for Automated Diagnostic Screening Summaries

Rethinking Code Refinement: Learning to Judge Code Efficiency

Assessing Fine-Tuning Efficacy in LLMs: A Case Study with Learning Guidance Chatbots

Fine-Tuning and Prompt Engineering for Large Language Models-based Code Review Automation

Demystifying the Impact of Open-Source Machine Learning Libraries on Software Analytics

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

QualEval: Qualitative Evaluation for Model Improvement

Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches

Fine-SE: Integrating Semantic Features and Expert Features for Software Effort Estimation

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses

UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data