Mayur Naik: Building Better Software

By Lida Tunesi

In many industries and academic fields, more people are finding it helpful to know some basic programming, but not all of them want to get full degrees in computer science. Meanwhile, as modern software ecosystems grow more and more complex, even those who do spend their undergraduate years studying computer science cannot learn everything. As software becomes more integrated into our daily lives, these gaps have growing implications.

“A few decades ago software malfunction typically meant you just needed to reboot your computer,” says Mayur Naik, associate professor in the Department of Computer and Information Science. “Today, the impact is much more dramatic. For example, fatalities with self-driving cars, or cyber criminals that can take advantage of software defects to steal private information.”

Naik’s work aims to stop these sorts of problems before they take shape.

“The overarching goal of my research is to help programmers and developers build higher-quality software,” Naik says, “and I’m most interested in using machine learning and artificial intelligence to do so.”

Through his research, Naik hopes both to help people better their own software skills and to create artificial-intelligence-based tools that can improve their software for them.

Better often means safer. “Cybersecurity has become one of the main ways to improve quality, not just performance or energy efficiency,” Naik says.

In some cases, defects and potential points of attack can arise when software is overly complex. If a programmer doesn’t understand the entire codebase, he or she is bound to miss some weak spots. One of Naik’s projects, called ASPIRE, aims to address this issue by creating a special type of compiler (the software that transforms code from one programming language into another) that helps simplify programs.

“Traditionally, compiler transformations have targeted performance: How can I make a piece of software run faster?” Naik says. “But we want to explore transformations for reducing complexity of software, to make it harder for cyber attackers to exploit weaknesses.”

Compilers were one of Naik’s original interests in computer science.

“I’ve always been interested in bridging the gap between humans and computers, and compilers are an excellent example of that,” says Naik. “Programmers like to communicate what they want to do in very high-level languages, and on the other hand machines are used to dealing with the 1’s and 0’s of binary or assembly languages. If anything this gap has grown wider and wider over the years, introducing new challenges.”

Naik is also bridging this gap via artificial intelligence tools. The ASPIRE compiler uses ideas from AI, as does another of Naik’s research projects — an AI assistant for software developers.

“We envision the system as analyzing code as the programmer writes it, in real time,” says Naik. “For example, it could tell them if they are about to create a well-known kind of security vulnerability, then point out how others ran into the same problem and fixed it in the past.”

The appeal of this type of artificial intelligence, for Naik, is not to take over a programmer’s job, but to add value to what the programmer is already doing.

“It is well accepted that given the way we train programmers today, using the same textbooks and same principles, they will go on to make the same mistakes. So it’s very tantalizing to have some technology that will prevent you from making these same mistakes that programmers have been making for generations,” says Naik.

Naik is also finding ways to use artificial intelligence to educate the next generation of programmers.

“I’m involved in Penn Engineering’s online education initiatives,” Naik says, “and a lot of the technology that I use in my research can not only help professional developers but also students who are trying to learn how to program. If I were to teach the courses I teach in small classrooms at Penn to an online class of hundreds or thousands of students, I would need intelligent tutoring systems, intelligent auto-graders, even intelligent exam grading strategies. If a student is not quite getting the solution to a question right, how do you start communicating with them? How can you create an algorithm that gives the best feedback to this student?”

Naik sees digital classrooms not just as a chance to teach more students, but as a chance to give those students a better learning experience. “One problem this could address is that there are often not enough human resources in a classroom. Also, you can imagine an algorithm having a very holistic view both of a particular student, after seeing how the student performs over time, and in terms of seeing how well many students perform on a particular problem. I am very excited about online education.”

With the goal of helping professionals, students, and the general public better their software, Naik’s group has also created a tutorial website called RightingCode.org. The site, which is open for anyone to use, teaches software analysis skills such as debugging, testing, and assessing security. “Right now the website is a series of lectures and labs,” says Naik, “and we are hoping to further evolve it and integrate it with Penn’s online initiatives as well.”

Before coming to Penn, a large part of Naik’s work was aimed at setting the stage for the research he does now. As an assistant professor at Georgia Tech, he worked on ways to create a body of software with labeled security vulnerabilities, which now serves as the data his new AI tools ‘learn’ from.

One of the main draws to Penn was the chance to interact with other branches of academia. “I view computer science not only as the epicenter of engineering today, but other disciplines as well,” says Naik. “Software directly impacts all parts of society, and at Penn I saw an opportunity for my research to have a direct effect on these other fields. The University offers a holistic environment, with its schools of medicine, business, and law.”

For instance, software is becoming more and more integrated into medicine, creating new worries for patients.

“There is a sense that medicine will be the next frontier for cyberattacks,” says Naik. “As an example, attackers can take control of devices embedded in humans, such as pacemakers.” These are the sorts of concerns Naik addresses as part of Penn’s PRECISE center, which works closely with groups in the Perelman School of Medicine. “The work I’ve been doing can be directly applied to improve the quality of these types of cyber-physical, embedded software,” he says.

“Also,” he adds, “the programming languages research group here is globally recognized. I was excited to be a part of that.” Programming languages was Naik’s original field of research, and he has now combined that with machine learning. “There are very few groups out there doing what I do, because one has to understand both of these fields. My group faces the formidable task of having an impact on both communities,” says Naik. “In the short term, we want to use machine learning techniques to change the way programmers build and debug their software, but in the longer term we also want to have an impact on the machine learning community; to develop a foundation that integrates these two fields.”