News

Sep13 '18: RAISE lab bids farewell to Dr. Junjie Wang:

RAISE lab bids farewell to our distinguished visitor from the Chinese Academy of Sciences, Dr. Junjie Wang.

Dr. Wang's farewell dinner

Aug27 '18: Paper "A Method for Finding Trends in Software Research" is accepted by TSE:

George et al.’s paper “A Method for Finding Trends in Software Research” is accepted by TSE.

Aug27 '18: Paper "Finding Faster Configurations using FLASH" is accepted by TSE:

Vivek et al.’s paper “Finding Faster Configurations using FLASH” is accepted by TSE.

Aug26 '18: Two papers were accepted by SWAN'18:

Huy et al.’s paper “Is One Hyperparameter Optimizer Enough?” and Akond et al.’s paper “Characterizing The Influence of Continuous Integration: Empirical Results from 250+ Open Source and Proprietary Projects” were accepted by SWAN'18.

Aug24 '18: New award from LAS -- "How Safe is this Conclusion?":

Abstract: Nothing is permanent, except change. Conclusions made yesterday may now have been overcome by new events. Ideas drift in time, or in space, thus making those conclusions unsafe for future use. Addressing conclusion safety (by addressing temporal shift and context shift) is an urgent and pressing task. In the AI-aware 21st century, more and more of society’s interactions are being mediated by models learned from data. Hence, assuring conclusion safety would have application in very many other organizations.

Aug23 '18: Research students return for summer internships with great results:

Zhe at Google

Several RAISE lab members received internships at companies such as Google, Facebook, Microsoft, IBM, and Phasechange.ai.

We hold reading-group sessions where they share their experiences. See the schedule of upcoming events here.

Jul26 '18: The Leverage Project helps LexisNexis reduce costs in cloud computing and automated testing:

LexisNexis awarded an AI4SE grant for the Leverage Project. The project has two goals:

  • Cloud Computing Optimization: improve usage and reduce the cost of using cloud compute environments (AWS + Azure).
  • Test Case Prioritization: empirically assess the relative merits of different test case prioritization schemes for LexisNexis.

Jun28 '18: <ELSEVIER Article> How does your research influence legislation? Text mining may reveal the answer:

Collaboration between North Carolina State University, Elsevier and LexisNexis Risk Solutions shows link between research and policy.

See full story here!

Jun11 '18: Paper Applications of Psychological Science for Actionable Analytics accepted by FSE'18:

Di Chen’s paper titled Applications of Psychological Science for Actionable Analytics has been accepted by ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2018.

May9 '18: Congrats Wei Fu for passing PhD Final Defense exam!:

Wei's defense exam

Please join us in congratulating Wei Fu for unconditionally passing his PhD Defense Exam! Wei is the first PhD graduate from RAISE lab.

He will join landing.ai in California after graduation.

Apr20 '18: Congrats Jianfeng for passing PhD oral prelim exam!:

jianfeng oral exam

Please join us in congratulating Jianfeng Chen for unconditionally passing his oral prelim exam!

Check out his slides here.

Apr19 '18: Paper RIOT, a Novel Stochastic Method for Rapidly Configuring Cloud Based Workflows accepted at IEEE CLOUD 2018:

Jianfeng’s paper titled RIOT, a Novel Stochastic Method for Rapidly Configuring Cloud Based Workflows has been accepted by IEEE International Conference on Cloud Computing 2018.

Mar2 '18: Two papers were accepted by MSR 2018:

Majumder’s paper titled “500 Times Faster Than Deep Learning: A Case Study Exploring Faster Methods for Text Mining StackOverflow” has been accepted at Mining Software Repositories (MSR).

Vivek et al.’s paper titled Data-Driven Search-based Software Engineering has been accepted for Mining Software Repositories (MSR).

Feb23 '18: Paper Bellwethers - A Baseline Method For Transfer Learning accepted by TSE:

Rahul Krishna’s paper titled “Bellwethers - A Baseline Method For Transfer Learning” has been accepted for publication at TSE.

Feb23 '18: Paper What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering) accepted by IST:

Amritanshu Agrawal’s paper titled “What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)” has been accepted for publication at IST.

Jan23 '18: Paper What is the Connection Between Issues, Bugs, and Enhancements? accepted at ICSE-SEIP'18:

Rahul Krishna’s paper titled “What is the Connection Between Issues, Bugs, and Enhancements?” has been accepted for publication at ICSE-SEIP’18.

Jan23 '18: Paper We Don't Need Another Hero? The Impact of "Heroes" on Software Development accepted at ICSE-SEIP'18:

Amritanshu Agrawal’s paper titled “We Don’t Need Another Hero? The Impact of “Heroes” on Software Development” has been accepted for publication at ICSE-SEIP’18.

Jan22 '18: Paper On The Merits of Continuous Integration for Proprietary Projects accepted at ICSE-SEIP'18:

Akond Rahman’s paper titled “Continuous Integration: The Silver Bullet? On The Merits of Continuous Integration for Proprietary Projects” has been accepted for publication at ICSE-SEIP’18.

Jan14 '18: Paper A deep learning model for estimating story points is accepted by TSE:

Morakot et al.’s article “A deep learning model for estimating story points” is accepted by TSE (preprint).

Jan10 '18: RAISE reading group continues in new semester:

RAISE reading group continued in the new semester on Jan 10, 2018! Jianfeng Chen presented the paper “FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage”. Dozens of graduate students from the Department of Computer Science and Computer Engineering attended and joined the discussion.

reading group

In the RAISE reading group, we explore synergies between AI and software engineering. To that end, we conduct a weekly interest group for software engineers and developers at NC State. At the meeting, we read trending papers every week on AI + SE. To get upcoming schedules for our reading group, please visit this page. The reading group is open to the public and free pizza will be provided.

Jan9 '18: TSE article “Sampling” as a Baseline Optimizer for Search-based Software Engineering available online now:

Jianfeng et al.’s latest TSE article “Sampling” as a Baseline Optimizer for Search-based Software Engineering is available online now (preprint).

Jan9 '18: 3 out of 20 most cited recent TSE papers are from RAISE lab and more from PROMISE repo!:

According to metrics from Google Scholar, three out of the top 20 most cited recent TSE (IEEE Transactions on Software Engineering) papers are by RAISE lab people, plus their talented co-authors: Ekrem Kocagüneli, Ayse Bener, Jacky Keung, Andrew Butcher, David Cox, Andrian Marcus, Lucas Layman, Forrest Shull, Burak Turhan, Thomas Zimmermann. Also, 6 papers in the list are based on data curated by the RAISE lab in the PROMISE repo.

Dec25 '17: Dr. Menzies was selected as TOSEM 2015-2016 distinguished referee:

Dr. Menzies is now one of TOSEM's 2015-2016 distinguished referees. More.

Dec22 '17: Dr. Menzies was invited to SSBSE'18 and ESEM'18 PC:

Dr. Menzies was invited to the SSBSE’18 and ESEM’18 program committees.

Dec13 '17: Paper Is “Better Data” Better Than “Better Data Miners”? is accepted in ICSE:

Amritanshu’s paper Is “Better Data” Better Than “Better Data Miners”? is accepted by ICSE’18.

Dec11 '17: Paper Finding Better Active Learners for Faster Literature Reviews accepted in EMSE:

Zhe’s paper “Finding Better Active Learners for Faster Literature Reviews” is accepted by EMSE.

Dec11 '17: Dr. Menzies attended and gave keynote speech in Data-driven Search-based SE'17:

Dr. Menzies attended Data-driven Search-based SE'17 and gave a keynote speech.

Nov9 '17: Dr. Menzies presented at IBM TEC conference:

Dr. Menzies gave an invited talk at IBM TEC conference. Slides can be found here.

Sep11 '17: Dr. Menzies is elected to the steering committee of the Symposium on Search-Based Software Engineering:

Dr. Menzies attended SSBSE’17 and was elected to the steering committee of the Symposium on Search-Based Software Engineering.

Sep3 '17: Multiple awards at HPCC Summit:

George Mathews won third prize at the 2nd annual HPCC Systems technical poster presentation competition. Along with George, Vivek Nair won the community award for innovative use of HPCC Systems. Dr. Menzies was part of a panel discussing the Talent Gap in Data Science.

HPCC Panel

HPCC Prizes

Aug22 '17: Grant How to Make a Magician selected by LAS:

LAS’18 grant selected: ‘How to Make a Magician’. Link

Aug22 '17: Grant Analytics Science with (In,Ab,De)duction selected by LAS:

LAS’18 grant selected: “Analytics Science with (In,Ab,De)duction”. Link

Aug21 '17: Wei presents at FSE17:

Wei Fu presents his (not one but) two papers at FSE’17.

Aug21 '17: Vivek presents at FSE17:

Vivek presents his paper at FSE’17.

Aug21 '17: George presents at RE17:

George Mathews presents his paper at RE’17.

Aug15 '17: Moved to new Lab:

Yippee!! We just moved into a new lab. Our new home is located at Engineering Building 2 – 3240.


Jul26 '17: Paper Faster Discovery of Faster System Configurations with Spectral Learning accepted at AUSE:

Vivek Nair’s paper titled “Faster Discovery of Faster System Configurations with Spectral Learning” has been accepted for publication at AUSE.

Jul25 '17: Paper Beyond Evolutionary Algorithms for Search-based Software Engineering accepted at IST:

Jianfeng Chen’s paper titled “Beyond Evolutionary Algorithms for Search-based Software Engineering” has been accepted for publication at IST.

Jul21 '17: Dr. Menzies is a keynote speaker at SWAN-2017:

Dr. Menzies presented a guest lecture at Monash University. His talk was titled ‘Software Analytics’. The video of the talk can be found here.

Jun13 '17: Paper Heterogeneous Defect Prediction accepted at TSE:

Wei Fu and Dr. Tim Menzies’ paper titled “Heterogeneous Defect Prediction” has been accepted for publication at TSE.

Jun3 '17: Paper Are Delayed Issues Harder to Resolve? accepted at FSE'17:

Dr. Menzies’ journal-first paper titled “Are Delayed Issues Harder to Resolve?” has been accepted for presentation at FSE’17.

Jun3 '17: Paper Easy over Hard - A Case Study on Deep Learning accepted at FSE'17:

Wei Fu’s paper titled “Easy over Hard: A Case Study on Deep Learning” has been accepted for publication at FSE’17. The author version can also be found here.

Jun3 '17: Paper Revisiting Unsupervised Learning for Defect Prediction accepted at FSE'17:

Wei Fu’s paper titled “Revisiting Unsupervised Learning for Defect Prediction” has been accepted for publication at FSE’17. The author version can also be found here.

Jun2 '17: Paper Using Bad Learners to find Good Configurations accepted at FSE'17:

Vivek Nair’s paper titled “Using Bad Learners to find Good Configurations” has been accepted for publication at FSE’17. The author version can also be found here.

Jun1 '17: Paper accepted at RE'17:

George Mathew’s paper titled “Shorter Reasoning About Larger Requirements Models” has been accepted for publication at RE’17. The author version can also be found here.

Jun1 '17: Paper Faster Discovery of Faster System Configurations with Spectral Learning accepted at AUSE:

Vivek Nair’s paper titled “Faster Discovery of Faster System Configurations with Spectral Learning” has been accepted for publication at the ASE Journal (AUSE). The author version can also be found here.

Apr1 '17: Written Prelims:

Congrats to Jianfeng for successfully passing his Written Qualifiers (now he is a Ph.D. candidate) and to Sushma for successfully defending her master’s thesis. Upon graduation in May 2017, Sushma will join IBM as a software engineer.

Mar24 '17: Paper accepted at IST:

Rahul Krishna’s paper titled “Less is More: Minimizing Code Reorganization using XTREE” has been accepted for publication at IST. The Author version can also be found here.

Mar18 '17: Reading Group:

We recently formed a reading group to read and discuss recent innovations in Software Engineering. Please find our (WIP) schedule here.

Mar1 '17: MSR Foundational Contribution Award:

Congrats to Dr. Menzies for winning the inaugural Mining Software Repositories Foundational Contribution Award. For more information, visit this page.

Feb4 '17: Summer Internships 2017:

Our students have been offered internships for Summer’17. Congrats to all the students for receiving internship positions.

Jack Chen: Google (Mountain View)

George Mathews: LexisNexis (Atlanta)

Rahul Krishna: LexisNexis (Raleigh)

Zhe Yu: LexisNexis (Raleigh)

Amritanshu: IBM (RTP)

Vivek Nair: LexisNexis (Atlanta)

Feb3 '17: Bad Learners for configuration optimisation:

Vivek Nair’s paper titled “Using Bad Learners to find Good Configurations” has been released as a technical report.

Feb2 '17: Unsupervised Learning for Defect Prediction:

Wei Fu’s paper titled “Revisiting Unsupervised Learning for Defect Prediction” has been released as a technical report.

Feb2 '17: Easy over Hard:

Wei Fu’s paper titled “Easy over Hard: A Case Study on Deep Learning” has been released as a technical report.

Feb1 '17: Qualitative Analysis using Crowd:

Jack Chen’s paper titled “Replicating and Scaling up Qualitative Analysis using Crowdsourcing: A Github-based Case Study” has been released as a technical report.

Jan14 '17: Journal Paper accepted:

Dr. Menzies’ paper titled “TMAP: Discovering Relevant API Methods through Text Mining of API Documentation” has been accepted for publication at Journal of Software - Special Issue, SCAM 2015. Author version can also be found here.

Jan13 '17: How to read less:

Zhe Yu’s paper titled “How to Read Less: Better Machine Assisted Reading Methods for Systematic Literature Reviews” has been released as a technical report.

Jan12 '17: Impacts of Bad ESP:

George Mathew’s paper titled “Impacts of Bad ESP (Early Size Predictions) on Software Effort Estimation” has been released as a technical report.

Nov14 '16: Paper accepted at EMSE:

Dr. Menzies’ paper titled “Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle” has been accepted for publication at EMSE. Author version can also be found here.

Nov13 '16: Dr. Menzies is a keynote speaker at SWAN-2017:

Dr. Menzies will be the keynote speaker at the 2nd International Workshop on Software Analytics (SWAN 2016). His talk is titled “More or Less: seeking simpler software analytics”. The slides of the talk can be found here.

Oct11 '16: Dr. Menzies to serve as co-chair at SSBSE-2017:

Dr. Menzies will serve as co-program chair for the Symposium on Search-Based Software Engineering 2017. Find the SSBSE’17 flyer here.

Oct10 '16: Among the top-3 papers at SSBSE-2016:

Vivek, Dr. Menzies, and Jianfeng’s paper titled “An (Accidental) Exploration of Alternatives to Evolutionary Algorithms for SBSE” at the Symposium on Search-Based Software Engineering (SSBSE) was adjudged to be among the top 3 of 48 submissions. Slides can be viewed here.

<img align=left src="/img/timm_ssbse_16.jpg" height=265 width=400>

Oct9 '16: Dr. Menzies presents at SSBSE-2016:

Dr. Menzies presented his paper titled “An (Accidental) Exploration of Alternatives to Evolutionary Algorithms for SBSE” at the Symposium on Search-Based Software Engineering (SSBSE). Slides can be viewed here.

<img align=left src="/img/timm_ssbse_16.jpg" height=265 width=400>

Sep29 '16: Paper accepted at ESE:

George and Dr. Menzies’ paper titled “Negative Results for Software Effort Estimation” has been accepted for publication at ESE.

Sep20 '16: Dr. Menzies is a Guest Speaker:

Dr. Menzies has been invited to speak at “Big Software on the Run”, winter school, Netherlands on October 27, 2016.

Sep12 '16: Rahul Krishna submits his paper to IST:

Rahul submitted his paper titled ‘Recommendations for Intelligent Code Reorganization’ to Journal of Information and Software Technology.

Sep8 '16: Wei Fu submits his paper to IST:

Wei Fu submitted his paper titled ‘Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?’ to Journal of Information and Software Technology.

Sep5 '16: Rahul Krishna presents at ASE-2016:

Rahul Krishna presented his paper titled “Too Much Automation? The Bellwether Effect and Its Implications for Transfer Learning” at the International Conference on Automated Software Engineering (ASE 2016). Slides can be viewed here.

<img align=left src="/img/rahul_ase_2016.jpg" height=265 width=400>

Aug28 '16: Three Papers submitted to ICSE'17:

The last day for submission to the International Conference on Software Engineering (ICSE) 2017 was Aug 28, 2016. This year we have three very interesting papers, submitted by Amrit, George, and Dr. Menzies.

The papers are:

Now we wait with our fingers crossed!

Aug27 '16: Jianfeng Chen submits his paper to TSE:

Jianfeng Chen submitted his paper titled ‘Is “Sampling” better than “Evolution” for Search-based Software Engineering?’ to Transactions of Software Engineering.

Aug27 '16: Reading Party for ICSE'17:

Reading party to critique the work of Amrit, George, and others. Great papers, good food, and lots of caffeine.

<img align=left src="/img/icse_reading_party.jpg" height=270 width=480>

Aug18 '16: Foundation of Software Science:

Dr. Menzies is teaching a new course, “Foundation of Software Science”. This subject will explore methods for designing data collection experiments; collecting that data; exploring that data; and then presenting that data in such a way as to support business-level decision making for software projects.

Aug15 '16: Funding from LexisNexis:

Thanks to LexisNexis for sponsoring our BigSE work with a grant (total award: $60K).

Aug10 '16: Funding from NSA:

Thanks to NSA for sponsoring our privatized data sharing research (Privatized data sharing: Practical? Useful?) with a grant (total award: $85K).

Jul18 '16: Rahul Krishna's paper accepted to ASE:

Rahul Krishna’s paper titled “Too Much Automation? The Bellwether Effect and Its Implications for Transfer Learning” is accepted to the 31st IEEE/ACM International Conference on Automated Software Engineering 2016. This was joint work with Dr. Lucas Layman of the Fraunhofer Center for Experimental Software Engineering. Here is a link to his paper.

Jun29 '16: REU Camp:

RAISE hosted two undergraduate students (Abdulrahim Sheikhnureldin and Matthew J. Martin) over summer ’16, where they were involved in projects titled ‘The Effect of Code Dependencies on Software Project Quality’ and ‘Enhanced Issue Prediction Using Contextual Features’, respectively.

Jun10 '16: Vivek Nair's paper accepted to SSBSE:

Vivek Nair’s paper titled “An (Accidental) Exploration of Alternatives to Evolutionary Algorithms for SBSE” is accepted to the Symposium on Search-Based Software Engineering - 2016.

Jun1 '16: Congrats to RAISE Members:

Congrats to 5 members of RAISE for securing internships from LexisNexis and ABB.

May5 '16: Rahul Krishna's paper accepted to BIG DSE:

Rahul Krishna’s paper titled “The “BigSE” Project: Lessons Learned from Validating Industrial Text Mining” is accepted to the BIG Data Software Engineering Workshop, 2016. This was joint work with Manuel Dominguez and David Wolf of LexisNexis, Raleigh. Here is a link to his paper.

Apr29 '16: Wei Fu's paper accepted to IST journal:

Wei Fu’s paper titled “Tuning for software analytics: Is it really necessary?” is accepted to the Journal of Information and Software Technology. This was a joint work with Dr. Xipeng Shen. Here is a link to his paper.

Feb1 '16: The BigSE Project:

Mr. Krishna submits his paper titled “The “BigSE” Project: Lessons Learned from Validating Industrial Text Mining” to BIGDSE. This is joint work with Mr. Yu, Mr. Agrawal, Dr. Menzies, Mr. Manuel Dominguez and Mr. David Wolf.

Title: The BigSE Project: Lessons Learned from Validating Industrial Text Mining

Abstract:

As businesses become increasingly reliant on big data analytics, it becomes increasingly important to test the choices made within the data miners. This paper reports lessons learned from the BigSE Lab, an industrial/university collaboration that augments industrial activity with low-cost testing of data miners (by graduate students). BigSE is an experiment in academic/industrial collaboration. Funded by a gift from LexisNexis, BigSE has no specific deliverables. Rather, it is fueled by a research question: “what can industry and academia learn from each other?”. Based on open source data and tools, the output of this work is (a) more exposure by commercial engineers to state-of-the-art methods and (b) more exposure by students to industrial text mining methods (plus research papers that comment on those methods and how to improve them). The results so far are encouraging. Students at BigSE Lab have found numerous “standard” choices for text mining that could be replaced by simpler and less resource-intensive methods. Further, that work also found additional text mining choices that could significantly improve the performance of industrial data miners.

Nov24 '15: Dr. Menzies talk AT CREST Open Workshop:

Dr. Menzies is one of the speakers at The 44th CREST Open Workshop - Predictive Modelling for Software Engineering. The talk is titled “Predicting What Follows Predictive Modeling”. Slides can be viewed here.

Oct20 '15: Relax! Most stats yields the same results:

Dr. Menzies submits his paper titled “On the Value of Negative Results in Software Analytics” to Empirical Software Engineering. This is a joint work with Dr. Ekrem Kocaguneli.

Title: On the Value of Negative Results in Software Analytics

Abstract: When certifying some new technique in software analytics, some ranking procedure is applied to check if the new model is in fact any better than the old. These procedures include t-tests and other more recently adopted operators such as Scott-Knott. We offer here the negative result that at least one supposedly “better” ranking procedure, recently published in IEEE Transactions on Software Engineering, is in fact functionally equivalent (i.e. gives the same result) to some much simpler and older procedures. This negative result is useful, for several reasons. Firstly, functional equivalence can prune research dead-ends before researchers waste scarce resources on tasks with little practical impact. Secondly, by recognizing needless elaborations, negative results like functional equivalence can inform the simplification of the toolkits and syllabi used by practising or student data scientists. Thirdly, each time a new ranking procedure is released into the research community, old results must be revisited. By slowing the release of new procedures, negative results of functional equivalence let us be more confident about old results, for longer. Fourthly, the particular negative result presented in this paper explains two previously inexplicable results; specifically:

(1) prior results documenting conclusion instability;

(2) the strangely similar performance of different evaluation rigs found in previous publications.

Oct20 '15: Older methods just as good or better than anything else:

Dr. Menzies submits his paper titled “Negative Results for Software Effort Estimation” to Empirical Software Engineering. This is a joint work with Dr. Ye Yang, Mr. George Mathew, Dr. Barry Boehm and Dr. Jairus Hihn.

Title: Negative Results for Software Effort Estimation

Abstract:

Context: More than half the literature on software effort estimation (SEE) focuses on comparisons of new estimation methods. Surprisingly, there are no studies comparing the latest state-of-the-art methods with decades-old approaches. Objective: To check if new SEE methods generated better estimates than older methods.

Method: Firstly, collect effort estimation methods ranging from “classical” COCOMO (parametric estimation over a pre-determined set of attributes) to “modern” (reasoning via analogy using spectral-based clustering plus instance and feature selection). Secondly, catalog the list of objections that led to the development of post-COCOMO estimation methods. Thirdly, characterize each of those objections as a comparison between newer and older estimation methods. Fourthly, using four COCOMO-style data sets (from 1991, 2000, 2005, 2010), run those comparison experiments. Fifthly, compare the performance of the different estimators using a Scott-Knott procedure with (i) the A12 effect size to rule out “small” differences and (ii) a 99%-confidence bootstrap procedure to check for statistically different groupings of treatments. Sixthly, repeat the above for some non-COCOMO data sets.

Results: For the non-COCOMO data sets, our newer estimation methods performed better than older methods. However, the major negative result of this paper is that for the COCOMO data sets, nothing we used did any better than Boehm’s original procedure.

Conclusions: In some projects, it is not possible to collect effort data in the COCOMO format recommended by Boehm. For those projects, we recommend using newer effort estimation methods. However, when COCOMO-style attributes are available, we strongly recommend using that data since the experiments of this paper show that, at least for effort estimation, how data is collected is more important than what learner is applied to that data.
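
For readers unfamiliar with the A12 statistic mentioned in the Method above, here is a minimal sketch of the Vargha-Delaney A12 effect size. This is the textbook definition, not code from the paper:

```python
def a12(xs, ys):
    """Vargha-Delaney A12: probability that a value drawn from xs is
    larger than one drawn from ys (ties count as half).
    Values near 0.5 indicate a negligible ("small") difference."""
    wins = sum(1.0 for x in xs for y in ys if x > y)
    ties = sum(0.5 for x in xs for y in ys if x == y)
    return (wins + ties) / (len(xs) * len(ys))

print(a12([9, 8, 7], [1, 2, 3]))  # 1.0: every x beats every y
```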

Oct6 '15: Dr. Menzies talk AT University of Notre Dame:

Dr. Menzies is to talk to the computer science students at the University of Notre Dame. The talk is titled “The Future and Promise of Software Engineering Research”. See the posting for the talk here. Slides can be viewed here.

Sep29 '15: Dr. Menzies delivers a talk at HPCC summit 2015:

Dr. Menzies delivers a talk titled “Big Data: the weakest link” at HPCC summit 2015. He was also part of a panel discussion on “Grooming Data Scientists for Today and for Tomorrow”.

Congratulations to Dr. Menzies for winning an award for his outstanding contribution to the HPCC community.

<img align=left src="/img/DrM_hpcc_talk.png"> <img align=left src="/img/DrM_hpcc_panel.png">

Dr. Menzies says “I want a scientist. I want someone who actually doubts their own conclusions vigorously.”

Sep27 '15: Welcome Dr. Dam:

We are very happy to host fellow researcher Dr. Dam, from down under (Australia). Dr. Hoa Khanh Dam is a Senior Lecturer at the School of Computing and Information Technology, University of Wollongong, Australia. The lab is excited to learn from his experience with requirements engineering and effort estimation in Agile settings.

<img align=left src="/img/Dr.Hoa_Dam.png">

Aug28 '15: Mr. Rahul Krishna submits his paper to ICSE'16:

Mr. Rahul Krishna submits his paper titled “How to Learn Useful Changes to Software Projects (to Reduce Runtimes and Software Defects)” to ICSE 2016. This is a joint work with Dr. Xipeng Shen, Andrian Marcus, Naveen Lekkalapudi and Lucas Layman. For more see notes.

Title: How to Learn Useful Changes to Software Projects (to Reduce Runtimes and Software Defects)

Abstract: Business users now demand more insightful analytics; specifically, tools that generate “plans”: specific suggestions on what to change in order to improve the predicted values. This paper proposes XTREE, a planner for software projects. XTREE receives tables of data with independent features and a corresponding weighted class which indicates the quality (“bad” or “better”) of each row in the table. Plans are edits to the rows which ensure the changed row is more likely to be of a “better” quality. XTREE learns those plans by building a decision tree across the data, then reporting the differences in the branches from some current branch to another desired branch. Using data from 11 software projects, XTREE can find better plans compared to three alternate methods. Those plans have led to improvements with a median size of (56%, 28%) and largest size of (60%, 77%) in (defect counts, runtimes), respectively.

Aug28 '15: Mr. Wei Fu submits his paper to ICSE'16:

Wei Fu submits his paper titled “Tuning for Software Analytics: is it Really Necessary?” to ICSE 2016. This is a joint work with Dr. Xipeng Shen. For more see notes.

Aug28 '15: Dr. Tim Menzies submits his paper to ICSE'16:

Dr. Menzies submits his paper titled “Live and Let Die? (Delayed Issues not Harder to Resolve)” to ICSE 2016. This is joint work with Dr. William R. Nichols, Dr. Forrest Shull and Dr. Lucas Layman.

Title: Live and Let Die? (Delayed Issues not Harder to Resolve)

Abstract: Many practitioners and academics believe in a delayed issue effect (DIE); i.e. as issues linger longer in a system, they become exponentially harder to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues, sooner. This paper tests for the delayed issue effect in 171 software projects conducted around the world in the period from 2006–2014. To the best of our knowledge, this is the largest study yet published on this effect. We found no evidence for the delayed issue effect; i.e. the time to resolve issues in a later phase was not consistently more than when issues were resolved soon after their introduction. This result begs the question: how many other long-held beliefs in software engineering are unsupported by current data?

<img align=left src="/img/Dr.M_ICSE2016.png">

Aug10 '15: Laws of trusted data sharing:

A repeated, and somewhat pessimistic, conclusion is that the more we privatize data, the more we lose the signal in that data. That is, the safer the data (for sharing), the worse it becomes (for making conclusions).

Recent results have addressed this issue. Former RAISE member (now working on her post-doc) Fayola Peters presented her novel privacy algorithm called LACE2. In recent work with Dr. Tim Menzies, presented at the International Conference on Software Engineering, Dr. Peters applies instance-based learning methods to learn how much (and how little) we can mutate data without changing the conclusions we might learn from that data. Based on that work, she offers three laws of trusted data mining.

To explain our three laws, we must first introduce the concept of “corners” in a data set. Many researchers in machine learning offer the same conclusion: when learning models from data, it is not necessary to share all rows and columns within tables of data. It turns out that most of the signal in a data set’s tables can be represented by small “corners” of the data; i.e. just a few columns and just a few rows. While the exact numbers vary from data set to data set:

  • The usual instance selection result is that rows of data contain redundancies; i.e. repeated instances of a similar example. Hence, M1 rows can be approximated using M2=M1/5 (or fewer) rows by (for example) clustering the data, then replacing each cluster with the median point within that cluster.
  • The usual feature selection result is that most of the signal in N1 columns comes from N2=sqrt(N1) columns or fewer (and the remaining data is either noisy or closely correlated to the data in the selected N2 columns). A small sketch of both steps follows this list.
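
As a rough sketch of how such a “corner” might be extracted (illustrative only, not the code from the cited papers; the M1/5 and sqrt(N1) ratios are just the rules of thumb quoted above, and the variance-based column filter stands in for the feature selectors actually used):

```python
import numpy as np
from sklearn.cluster import KMeans

def corner(data, row_ratio=5):
    """Shrink a (rows x cols) numeric table to its "corner":
    roughly rows/row_ratio representative rows and sqrt(cols) columns."""
    m1, n1 = data.shape
    m2, n2 = max(1, m1 // row_ratio), max(1, int(np.sqrt(n1)))

    # Instance selection: cluster the rows, keep the median point of each cluster.
    labels = KMeans(n_clusters=m2, n_init=10, random_state=1).fit_predict(data)
    rows = np.array([np.median(data[labels == k], axis=0) for k in range(m2)])

    # Feature selection: keep the n2 columns with the largest spread
    # (a crude stand-in for the feature selectors used in the papers).
    cols = np.argsort(-data.std(axis=0))[:n2]
    return rows[:, cols]

# A 1000 x 100 table shrinks to roughly 200 x 10 cells, i.e. about 2% of the data.
print(corner(np.random.rand(1000, 100)).shape)
```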

We say that the “corner” of a data set is just the small number of rows and columns found via instance and feature selection. Using the corners, we can state our first law of cost-effective trusted data sharing:

First Law: don’t share everything; just share the corners.

This law is interesting since, in the usual case, this “corner” is very small compared to the original data set. For example, consider a table of data with 1000 rows and 100 columns. Note that this data has 1000*100 cells. If this data set compresses according to the instance/feature selection ratios listed above, then the “corners” of this data would hold just 200 rows and 10 columns; i.e. 2000 cells in all, which is 2% of the original data. That is, if we just shared the “corners” of this data, then most of the data is never revealed to an outside party (in this case, we’d share 2% and hide 98%).

Of course, if this “corner” contains the essence of the data, then it is important to apply a second law of cost-effective trusted data sharing:

Second Law: anonymize the data in the “corners”.

Our research has suggested an interesting way to implement this second law. Our experience with this approach [1,2,3] is that as we move from all the data into the “corners”, the distance increases between:

  • an example of some class A
  • and the nearest example of some class B.

Halfway between these examples is the class boundary where the classification might flip from A to B. Note that, to privatize data, we could mutate example A anywhere up to that class boundary without changing the conclusions of a nearest-neighbor algorithm working on this dataset. Accordingly, we offer the third law of cost-effective trusted data sharing:

Third Law: while anonymizing, never mutate examples across the class boundary.
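
A minimal sketch of how that third law might be applied (illustrative only; this is not LACE2 or the code from the cited papers): perturb each row by a random vector whose length is strictly less than half the distance to its nearest unlike-class neighbor, so a nearest-neighbor learner keeps the mutated row on the same side of the class boundary.

```python
import numpy as np

def mutate_within_boundary(X, y, rng=np.random.default_rng(1)):
    """Third-law mutation sketch: nudge each row, but never move it as far
    as halfway to its nearest neighbor of a different class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    out = X.copy()
    for i, (row, label) in enumerate(zip(X, y)):
        others = X[y != label]
        if len(others) == 0:              # only one class: no boundary to respect
            continue
        gap = np.linalg.norm(others - row, axis=1).min()
        direction = rng.normal(size=row.shape)
        direction /= np.linalg.norm(direction) + 1e-12
        out[i] = row + direction * rng.uniform(0, 0.5) * gap   # < half the gap
    return out
```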

Note that, using this third law, our privatization methods achieved better results than standard anonymization algorithms such as k-anonymity, at least for data taken from software engineering projects.

In their recent paper, Dr. Peters and Dr. Menzies simulated a consortium of 20+ data owners, each with a separate data set. A “pass the parcel” system was implemented in which each data owner incrementally added their data to a parcel of shared data, but only the parts of their data that were somehow outstandingly different from the data already in the parcel. To define “different”, various instance-based reasoning operators were employed, such that when some data was said to be “different”, that comparison was based on the most informative attributes. In all, the shared parcel held just 5% of the data owned by all members of the consortium, yet when we built software quality predictors from this 5%, those predictors performed better than predictors built from all that data.
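
A toy version of that “pass the parcel” idea might look like the sketch below. This assumes “different” simply means “far from everything already in the parcel”; the actual paper uses more careful instance-based operators over the most informative attributes.

```python
import numpy as np

def pass_the_parcel(owners, threshold=0.5):
    """Each owner, in turn, adds only those rows that are at least `threshold`
    away (Euclidean) from every row already in the shared parcel."""
    parcel = []
    for data in owners:                                  # one (rows x cols) array per owner
        for row in np.asarray(data, dtype=float):
            if not parcel:
                parcel.append(row)
            elif min(np.linalg.norm(row - p) for p in parcel) >= threshold:
                parcel.append(row)                       # an "outstandingly different" row
    return np.array(parcel)
```

With a suitably strict threshold, the parcel ends up holding only a small fraction of the combined data, in the spirit of the 5% figure quoted above.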

The most significant aspect of that work was that, after applying our three laws of data sharing, we built predictive models from a privatized version of the shared data, and the shared privatized data generated better predictions than the raw data.

So, not only is privatization necessary, it can actually boost the value of the data.

Jun9 '15: LexisNexis to fund AI lab:

For more, see briefing notes

Feb26 '15: HPC Cluster Access:

We now have HPC accounts, which gives us access to 1000+ 8-core machines! For more see tutorial.

Oct28 '14: News of GALE:

DTLZ, d=20, o=2,4,6,8

DTLZ, o=2, d=20,40,80,160

Oct24 '14: Crazed idea.... Keys West:

Been reading on MOEA/D (see also PADE). It’s kind of a meta-learner. It builds islands then runs a standard learner on each island. E.g. MOEA/D-DE would run differential evolution on various islands.

There are some standard methods for making the islands but I was thinking, why not just use linear-time binary-split FastMap?
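
For concreteness, one linear-time binary split in the FastMap style might look like the sketch below. This is the standard FastMap pivot-and-project heuristic, not code from MOEA/D, PADE, or GALE; recursing on each half yields the islands.

```python
import numpy as np

def fastmap_split(data, rng=np.random.default_rng(1)):
    """Pick two far-apart poles, project every row onto the line between them
    (cosine rule), then split the rows at the median projection."""
    data = np.asarray(data, dtype=float)
    dist = lambda a, b: np.linalg.norm(a - b)
    anyone = data[rng.integers(len(data))]
    east = max(data, key=lambda r: dist(r, anyone))   # farthest from a random row
    west = max(data, key=lambda r: dist(r, east))     # farthest from east
    c = dist(east, west) + 1e-12
    x = np.array([(dist(r, west)**2 + c**2 - dist(r, east)**2) / (2 * c) for r in data])
    cut = np.median(x)
    return data[x <= cut], data[x > cut]
```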

Then, build recommendations for jumping from current to better, as follows (sketches of two of these ingredients follow this list). For each current island I1:

  • Find “better” islands, where “better” means that for at least one objective, the Cliff’s delta effect size (using the thresholds proposed at the top of p14 of here; pword=user=guest) says they are truly different and the medians are skewed in a “better” way.
  • For each such island I2, build a contrast learning task as follows, where class1 = I1, class2 = I2, and class3 = every other island.
  • Discretize all numerics by minimizing the entropy of class1, class2, class3.
  • Sort the ranges by BORE, where best = class2 and rest = class1 (for notes on BORE, see section 4.2 of this paper).
  • Let the value of the first i items of that sort be the percentage of the class1, class2, class3 instances that have those i ranges and contain class2 (the target class).
  • Return the smallest i ranges where i+1 has less value.
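
Hedged sketches of two of the ingredients above, Cliff’s delta and a BORE-style best/rest score. These are textbook formulations, not lifted from Keys2 or the cited papers:

```python
import numpy as np

def cliffs_delta(xs, ys):
    """Cliff's delta effect size: P(x > y) - P(x < y) over all pairs.
    Small |delta| (e.g. below ~0.147) is usually read as "not truly different"."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    gt = sum(int((x > ys).sum()) for x in xs)
    lt = sum(int((x < ys).sum()) for x in xs)
    return (gt - lt) / (len(xs) * len(ys))

def bore_rank(in_best, in_rest):
    """BORE-style score for one discretized range: b*b/(b+r), where b and r are
    the frequencies of the range in the best (class2) and rest (class1) rows."""
    b, r = float(np.mean(in_best)), float(np.mean(in_rest))
    return b * b / (b + r + 1e-12)

# A range seen in 80% of class2 rows but only 20% of class1 rows scores highly:
print(bore_rank([1, 1, 1, 1, 0], [0, 0, 0, 0, 1]))
```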

If this is data mining (where no new data can be generated) then stop. Call what you have “islands, first generation”. Else:

  • For each island with a contrast set, collect new instances by interpolating instances in that island, then applying the contrast set.
  • Repeat till new improvements are only epsilon better than the last. This generates “islands, generation last”.
  • Run the above current-to-better algorithm using a combination of the first and last generations.

Note some shortcuts:

  • Instead of discretizing for each new pair of current, better islands, discretize ONCE across all the islands. Probably would work just fine.
  • Once the data is discretized, build a reverse index from the ranges back to the candidates they select for, which would make testing the value computation very fast.
  • When looking for better, be simpler.
  • Active learning: on the way down with FastMap, prune dull islands. Also, when testing if one island is better than another, only pick some items at random in each island (say, the small m examples nearest the FastMap poles of each island).

But why is it called Keys West? An algorithm that builds bridges between islands? That extends an older algorithm of mine called Keys2? Well, see if you can figure that out.