While AI tools promise life-changing enhancements in everything from our entertainment to our medicine, this black box of unknowns has raised legitimate concerns on a global scale. But for every instance where current AI tools could steer us wrong, experts also agree there is an exponentially larger opportunity for them to help us. In October 2023, President Biden released an executive order that established new standards for AI safety and security, but questions remain. How do we move this technology forward responsibly?
Penn Engineers are leveraging their collaborative strengths to tackle this challenge. In the summer of 2022, the ASSET (AI-enabled Systems: Safety, Explainability and Trustworthiness) Center was born. Under the direction of Rajeev Alur, Zisman Family Professor in Computer and Information Science, it has already become home to over 40 faculty and 50 students from diverse backgrounds and departments.
“We are working on the technological challenges of making AI models explainable, safe, fair, ethical and unbiased,” says Alur. “The solutions to these challenges lie in our expertise across core machine learning, statistics, natural language processing, and formal methods for building reliable software systems. We need to collaborate with one another and industry professionals to help us apply our research to the real-world challenges in current AI systems.”
To facilitate this vision, the Center held its first annual symposium on trustworthy AI technology for health care applications on October 6, 2023. The event was attended by over 100 participants from research in artificial intelligence, biomedical informatics and machine learning, along with clinical practitioners, sparking new research questions that would otherwise have gone unasked.
“In my Ph.D., I worked on creating trustworthy AI systems but only for theoretical benchmarks,” says Eric Wong, Assistant Professor in Computer and Information Science. “Now, I work on real-world problems with doctors and cosmologists to understand why AI makes certain surgical recommendations in the operating room and how AI can teach us more about the dark matter in our universe, topics I would never investigate in the silo of my own research.”
Wong’s recent work focuses on improving the safety of current AI chatbots such as ChatGPT, which gained 100 million users within two months of its release.
“My lab is looking at the ways in which people are able to jailbreak the safety features of ChatGPT,” says Wong. “People have found ways to manipulate their input requests to get the chatbot to tell them how to build an explosive or provide other information just as harmful.”
To address these hackable safety features, Wong and colleagues created an algorithm that, when added to ChatGPT, reduces the success rate of certain types of jailbreaks from 80% to less than 1%.
“Essentially, we created a defense mechanism that can wrap around any language model to insert tiny character-level typos into the input request to render these jailbreaks ineffective,” says Wong. “I am excited to dive into the next steps of this work and I also realize the gravity of the consequences if we don’t do it right.”
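The idea of wrapping a model with random typos can be illustrated with a minimal Python sketch. It is not the lab's implementation: `model` and `is_refusal` are hypothetical callables supplied by the caller, the 10% typo rate is arbitrary, and the majority vote over several perturbed copies is an added assumption the article does not describe.

```python
import random
import string
from collections import Counter
from typing import Callable


def add_typos(prompt: str, rate: float, rng: random.Random) -> str:
    """Overwrite roughly `rate` of the characters with random letters."""
    chars = list(prompt)
    n_swaps = int(len(chars) * rate)
    for i in rng.sample(range(len(chars)), n_swaps):
        chars[i] = rng.choice(string.ascii_letters)
    return "".join(chars)


def defended(
    model: Callable[[str], str],        # hypothetical: prompt -> reply
    is_refusal: Callable[[str], bool],  # hypothetical: did the model decline?
    copies: int = 5,                    # use an odd number to avoid tied votes
    rate: float = 0.1,
    seed: int = 0,
) -> Callable[[str], str]:
    """Wrap `model` so every request is answered from perturbed copies."""
    rng = random.Random(seed)

    def wrapped(prompt: str) -> str:
        replies = [model(add_typos(prompt, rate, rng)) for _ in range(copies)]
        # Assumption: side with the majority (refuse vs. answer) and
        # return the first reply that agrees with it.
        votes = Counter(is_refusal(reply) for reply in replies)
        majority = votes.most_common(1)[0][0]
        return next(r for r in replies if is_refusal(r) == majority)

    return wrapped
```

The intuition behind the sketch is that a jailbreak string tends to be brittle, so a few altered characters can break it, while an ordinary request usually stays understandable.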
Another issue that weakens certain AI systems’ safety, explainability and trustworthiness is their lack of certainty. ChatGPT can “hallucinate,” meaning it makes mistakes or provides false information, when asked questions it doesn’t know the answer to. This limitation can facilitate the spread of false information and contribute to poor decision making.
“Our recent work shows how transformers, the models powering Large Language Models, or LLMs, learn to reason via ‘shortcuts’ instead of the inductive style of reasoning that humans do,” says Surbhi Goel, Magerman Term Assistant Professor in Computer and Information Science. “Though these shortcuts are computationally faster to evaluate and work well during training, they leave room for hallucinations in real-world applications. With students at Penn, I am currently working on improving the robustness of the LLM’s internal reasoning process by modifying the underlying architecture and training process to decrease the frequency of these hallucinations.”
“Knowing how sure a chatbot is about the answer it is providing would help us decide how to use the information,” says Osbert Bastani, Assistant Professor in Computer and Information Science. “We are creating a component that can be added to the chatbot’s API to provide this information as a calibrated probability as well as list all of the plausible answers if there is more than one.”
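One common way to turn a model's confidence scores into a list of plausible answers is split conformal prediction, sketched below in Python. This is a swapped-in, textbook technique used for illustration, not necessarily the component Bastani describes. The function names, the 10% default error rate, and the assumption that the chatbot exposes a probability for each candidate answer are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def conformal_threshold(true_answer_probs: Sequence[float], alpha: float = 0.1) -> float:
    """Pick a score cutoff from held-out questions whose answers are known.

    `true_answer_probs[i]` is the probability the model gave to the correct
    answer of calibration question i. The returned cutoff targets answer
    sets that contain the truth about (1 - alpha) of the time.
    """
    scores = sorted(1.0 - p for p in true_answer_probs)  # higher = more surprising
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank with finite-sample correction
    if k > n:
        return 1.0  # too little calibration data: keep every answer
    return scores[k - 1]


def plausible_answers(answer_probs: Dict[str, float], threshold: float) -> List[str]:
    """Return every candidate whose score clears the cutoff, most likely first."""
    kept = [a for a, p in answer_probs.items() if 1.0 - p <= threshold]
    return sorted(kept, key=lambda a: -answer_probs[a])
```

When the model is confident, the returned list shrinks toward a single answer. When it is unsure, the list grows, which is itself a signal to treat the response with caution.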
In addition to making certainty in chatbot-based decisions transparent, Bastani is working on closing the gap in health care inequity by training AI models on diverse datasets.
“Biased outputs are just as dangerous as those that are uncertain,” says Bastani. “I’m currently working on training models on data from patients of African ancestry, an underserved demographic with higher rates of glaucoma. This will help us understand where we can and cannot generalize when using AI tools to serve specific groups of people.”
Alongside faculty, students at the ASSET Center are also addressing real-world problems in their own research, from machine learning to robotics.
“I am working on programming a robot in a manner similar to how I would teach another person, through goal specifications and visual maneuvers, without it misinterpreting my intentions,” says Christopher Watson, a doctoral student in Computer and Information Science. “To program the robot successfully, I need to bring the human into the training loop to allow the robot to ask for extra supervision when needed. The only robot I’d want to bring into the home and workplace to assist humans is one powered by safe, explainable and trustworthy AI systems.”
Students are also benefiting from the collaborative aspect of the center.
“The ASSET center provides a great environment to bring people from diverse backgrounds together,” says Ziyang Li, a doctoral student in Computer and Information Science. “I enjoy having opportunities to apply our AI technology to areas like health, robotics and education, guided by experts in these respective domains. These collaborations not only ground my research on programming and deep-learning in real-life cases, but also broaden my vision for improving the safety of future AI systems.”
To further students’ success, the center is offering a new industry affiliates mentorship program for graduate and undergraduate students to learn from industry professionals. ASSET will also be moving into Amy Gutmann Hall, Penn Engineering’s new data science building, in the summer of 2024.
“We are excited to be growing quickly in different disciplines and in new physical spaces,” says Alur. “And we need to do so as we step into a revolution, not just in AI and machine learning, but for the entire field of computer science. We didn’t expect this AI boom, but now that it is here, we are recognizing its current limitations and are working as quickly as we can to keep these tools safe. Not having all of the solutions just yet makes it an important time to be in this research.”