研究計畫

SOftware SECURITY LAbORATORY

DEPT. of MIS, National Chengchi University

Research Projects

[MOST 113-115] Automatic Testing on Neural Network Models

Automatic testing plays a critical role in realizing the ultimate goal of Trustworthy AI: artificial intelligence that is lawful and technically robust. To realize automatic testing on neural network models, we continue our previous study on Python concolic testing and propose quantization simplification and dynamic concolication to equip PyCT with ability to analyze complex network models such as Transformer. We also propose several new applications on PyCT including fairness checking, model poison attack synthesis, and constrain-based jailbreaking on LLMs.

This is the NSTC granted project:113-2221-E-004-010-MY2 (PI: Fang Yu, NT 1852,000, 2024/8-2026/7). [TCSE 2024]

[MOST 112-113] Software Verification and Security on AI Models

In the past decade, deep neural networks have reached incredible success in a wide spectrum of domains and attracted tremendous attention both in academia and industry. It is of the essence to develop safe AI techniques that assess if the model is stable and free of vulnerabilities. In this project, we propose a novel influence guided concolic testing framework that combines dynamic symbolic execution, decision logic extraction, and abstraction/refinement techniques to achieve scalable adversarial example generation and robustness checking on deep neural networks. Specifically, the proposed approach integrates python concolic testing engine, influence inference engine and network abstraction and refinement engine to provide a new mechanism to realize software verification and security on deep neural networks. The concolic testing engine is used to conduct concolic testing on a given input and a given network. The influence engine infers the critical pixels and concolic decision graph to guide the concolic testing process. The abstraction and refinement engines abstract and refine network structures and advance the concolic testing running under the CEGAR framework. We expect to develop the very first Python concolic tester that is able to analyze neural network models, explore both program and network computation branches, and generate adversarial examples and test cases that achieve high test adequacy. Facilitating dependability of AI applications that are ubiquitous may influence every aspect of society including commerce, entertainment, communication, healthcare and security. Hence any improvement in the dependability of these systems and applications will translate to improved dependability of information technology in the aspect of society.

This is the NSTC granted project:112-2221-E-004-002- (PI: Fang Yu, NT 690,000, 2023/8-2024/7). [TAAI 2023, BIOKDD@KDD 2024, AST@ICSE 2024, ICHMS 2024, JISE 2024]

[MOST 109-112] Python Program Characterization and Automated Exploit Generation with Static Symbolic Execution and Runtime Verification

Python language has been widely adopted to develop modern applications such as web applications, data analytics, machine learning, and robotics due to its readability and accessibility of rich open-source libraries. While it becomes one of the most popular language, a systematic approach to analyze behaviors of Python programs is of the essence for software security. An exploit of a python application can only be observed under executions on specific inputs. Malware can also contain hidden behavior which is only activated when properly triggered. It requires program properties hold under any usage scenarios for software and security applications. Static analysis on formally modeling programs with symbolic and abstraction models provides a sound approach to analyze all potential behaviors of programs, but it may raise false alarms and requires runtime analysis for validation. Dynamic analysis on running and analyzing real executions provides a complete approach to witness violations, but it requires effective input generation to trigger critical executions and cover sufficient program behaviors. A hybrid paradigm that combines static analysis and dynamic analysis provides an elegant solution. In this project, we propose a hybrid analysis framework that combines static symbolic execution and dynamic instrumentation techniques for charactering, triggering, and observing critical behaviors of Python programs. The idea is to statically reason programs to generate constraints and inputs to explore critical paths and dynamically trigger and verify runtime executions. For program reasoning, we investigate stack-based symbolic simulation with type reference on Python opcode to explore and construct constraints on potential execution paths, addressing the challenges on dynamic type inference. For input synthesis, we model complex built-in operations with modern string and numeric constraint solvers to synthesize inputs that can trigger the executions. For Python profiling, we develop a new dynamic instrumentation interpreter to dump layered call sequences with their parameters and return values, and propose both explicit and abstract representations for behavior characterization and runtime verification against security properties. We target our applications on 1) Python malware analysis with NCHC collections, 2) CVE Python vulnerability analysis for automated exploit generation, and 3) adversarial machine learning on Python implementation from the perspective of program executions.

This is the MOST granted project:109-2221-E-004-008-MY3 (PI: Fang Yu, NT 285,4000, 2020/8-2023/7). [APLAS2021, IJHCI 2022, DeepTest@ICSE 2023, AsiaCCS 2023]

[MOST 107-109] Static Analysis for IOT Software Verification and Information Security

The Internet of Things (IoT) that builds an increasingly connected world of devices and human beings brings tremendous new business and service opportunities. While life becomes more convenient and digitized, security stands the major concern in the evolution. The more data and information we share and store, the greater the risk of theft and manipulation. This risk rises as we allow networked computer systems to control safety-critical systems such as self-driving cars or life-guard control systems. The IoT ecosystem encompasses multiple layers ranging from physical layers of sensors, computation and communication, and devices to application layers where services are provided with collected information. We expect software to be main attack surfaces. Software security is hence of essential importance. This project focuses on software security, investigating static analysis techniques for discovering software defects in IoT applications. We target four attack surfaces of IoT eco systems: device applications, mobile applications, web applications and block-chain cloud applications, detecting side channel vulnerabilities, API call vulnerabilities, SQL/XSS vulnerabilities, and contract vulnerabilities. Attacks that compromise any of these vulnerabilities may corrupt system security and break data confidentiality and integrity in the IoT eco system. Side channel attacks by observing device applications reveal program secrets such as password without exhaustive input enumeration. API call attacks that take external inputs to make dynamic API calls as backdoors for users to alter application behaviors at run time via external inputs. SQL Injections and XSS attacks both have occupied in a decade the top three Web application vulnerabilities of the OWASP top 10 list, allowing the attackers to invoke the application with a malicious input that is part of an SQL command or scripts that the application executes. This permits the attacker to damage or get unauthorized access to data stored in a database or cookies stored by the browser. Contract vulnerabilities like stack overflows may result in incomplete executions of smart contracts that lead to unexpected loss or a financial breach of block chain applications.

This is the MOST granted project:107-2221-E-004 -002 -MY2 (PI: Fang Yu, NT 147,4000, 2018/8-2020/7). [ASE 2018, ESEC/FSE 2018, SBC@ASIACCS 2020, PACIS 2020]

[MOST 106-107] Binflow: Static Flow and String Analysis of iOS Mobile Applications

Millions of mobile apps have been published in Apple's AppStore with more than 15 billion downloads by iOS devices. In order to protect iOS users from malicious apps, Apple has strict policies which are used to eliminate apps before they can be published in the AppStore. We propose flow and string analysis techniques for iOS executables for statically checking security development policies, particularly for those that are related to dynamically loaded classes. In order to check that an app conforms to such a policy, it is necessary to determine the possible string values for the class name parameters of the functions that dynamically load classes. The first step of our approach is to construct the assembly for iOS executables using existing tools. We then extract flow information from the assembly code and construct control flow graphs (CFGs) of functions. We identify functions that dynamically load classes, and, for each parameter that corresponds to a dynamically loaded class, we construct a dependency graph that shows the set of values that flow to that parameter. Finally, we conduct string analysis on these dependency graphs to determine all potential string values that these parameters can take, which identifies the set of dynamically loaded classes. Taking the intersection of these values with patterns that characterize Apple's app security development policies, we are able to detect potential policy violations.

This is the MOST granted project:106-2221-E-004 -002 - (PI: Fang Yu, NT 578,000, 2017/8-2018/7). [poster, ICSE 2018]

[MOST 103-106] VISO: Securing Open Source Cloud Platforms via Virtualization Introspection and Artificial Intelligence

Cloud systems have rapidly replaced personal computers and become the dominant platform nowadays. Security threats could be one of the major stunning blocks on this evolution road. While system vendors and cloud tenants are benefit from sharing resources in the cloud environment, security breaches may cause worse damages of the cloud ecosystem than personal computers. We propose to investigate a virtualization introspection system on Openstack for securing open-source cloud platforms. By using artificial intelligence techniques and automata theory to analyze behaviors of virtual machines, we aim at a formal and systematic approach to enhance cloud security, achieving mandatory, transparency and precision on defense mechanisms.

This is the MOST granted project:103-2221-E-004 -006 -MY3 (PI: Fang Yu, NT 1881,000, 2014/8-2017/7). [HICSS 2014, IJCNN 2014, IJCNN 2015, CloudCom 2015, IEEE Cloud 2016, IEEE BigData 2017, IEEE DataCom2017, IEEE BigData 2018, IEEE DataCom 2018, IEEE CloudCom 2018]

[MOST 102-103] AppBeach: Revealing Behaviors of Mobile Applications via Static Binary Analysis

We investigate static binary analysis techniques on iOS executable with the aim of revealing suspicious behaviors that are embedded (but may not be shown to users) in mobile applications. The main idea is counting (or resolving) explicit (implicit) sensitive function calls in executable. We decrypt online apps and use IDA Pro as a disassembler to reverse engineering the binary, and as a script runner to extract embedded information in executable. Particularly, we use underlying system function calls to characterize application behaviors. We also build a pattern library of user-sensitive functions (such as access private information, connect to ftp cites, change external web page contents), and by matching these patterns against resolved system call functions, we are able to reveal behaviors of iOS apps, and further identify suspicious ones. We develop the tool AppBeach (App Behavior Checker) and reveal behaviors of hundreds of apps that are directly downloaded from the Apple app store.

This is the NSC granted project: 102-2221-E-004 -002 - (PI: Fang Yu, NT 560,000, 2013/8-2014/7). [MS 2013, IJCNN 2013, CCISA 2015, ICWS 2016]

This project is also partially funded by TWISC.

[MOST 99-102] Patcher: Vulnerability Detection and Patch Synthesis with Symbolic String Analysis

We investigate symbolic string verification techniques with the aim of developing a formal approach for automatic detection and removal of string related vulnerabilities in Web applications. The main topics include: sanitization synthesis, abstraction, and scalable service development.

This is the NSC granted project (PI: Fang Yu, NT1600,000 , 2010/11- 2013/7). [ISSTA 2016, CAV 2016, HICSS 2014, FMSD 2014, IJFCS 2011, ICSE 2011, SPIN 2011]

[NCCU 99-101] ICT-enabled Services: Service Quality and Service Innovation

The purpose of this project is to facilitate a high-standard service quality of cloud-based innovation services. One of our recent results addresses formal quantitative analysis on a distributed service framework. Its queuing models for the cloud-based streaming services lead to closed form expressions for service quality of system and the corresponding computer resource required. A simulation procedure is also proposed to catch the dynamics of the presented cloud-based streaming services, estimating operation characteristics such as lag time and interruptions that a customer may experience under different environment settings. Another direction is to develop innovative services and mobile applications. We are particularly interested in location based services and 3D immersion.

This work is partially funded by Service Innovation and E-commerce of NCCU Top University Project. (PI: Fang Yu, NT289,000 , 2013/1- 2013/12). [ICANN 2019, CC 2018, SCC 2016, IS 2014, SCC 2013, IS 2012]

[NTU 99-101] Logics and Verification of Concurrent Systems

We investigate testing, simulation and verification techniques for symbolic consistency checking of multi-threaded programs. We also investigate strategy logics on games. This is joint work with the verification lab@NTU EECS under the supervision of Prof. Farn Wang. Specifically, we provide a new symbolic simulator that combines constraint solving with guided simulation strategies to find witnesses of program inconsistency. Another research direction joint with Prof. Chao is the theoretical study on Petri nets that have been widely used to model flexible manufacturing systems. we present new algorithms for synthesizing effective control policies and estimating reachable states without conducting expensive reachability analysis on a sub class of Petri nets.

[CONCUR 2011, ACM SIGPLAN Notices (LCTES 2012), IETCTA 2013, CASE 2013, ISIE 2012]

Industrial Projects

[APF 113] AI and LLM Personal Credit Evaluation and Prediction

This industry project is funded by 亞太普惠金融.

(PI: Fang Yu, NT860,000, 2024/6-2025/5).

[Deloitte 109] Security Consultant on Health Data Transmission

This industry project is funded by Deloitte Taiwan.

(PI: Fang Yu, NT200,000, 2020/5-2020/11).

[MOE 109-110] Insurance and Investment Applications with Modifiable Smart Contracts Year II

This cross-discipline project is funded by Ministry of Education. Collaborated with Prof. Sun-Wen Hsiao and Prof. Jin-Lung Peng, we aim to develop next generation insurance and investment applications under the block chain and smart contract systems.

(PI: Fang Yu, NT600,000, 2020/2-2021/1).

[MOE 108-109] Insurance and Investment Applications with Modifiable Smart Contracts Year I

(PI: Fang Yu, NT560,000, 2019/3-2020/1).

[III 107] Static Stack Overflow Detection and Gas Estimation on Smart Contracts

This research project is funded by III. (PI: Fang Yu, NT600,000, 2018/3-2018/11).

[WSL 106] Adaptive Social Network: The practice of 9EMBA.com

This social network research project, collaborated with Prof. Yang and Ribo Tech, is funded by SayLing Wen Cultural and Education Foundation. (PI: Fang Yu, NT 2866,000, 2016/3-2017/2). [IEEE EDGE 2017]

[Wieson Tech 104-106] LightFarm: Intelligent Green IOT Application Development

The lightfarm project is collaborated and funded by Wieson Tech. (PI: Fang Yu, NT 1666,000, 2015/6-2017/6). [IEEE ICIOT 2017]

[TSMC 99-100] Text Mining and Cloud Computing

We investigate automatic web data crawling and knowledge extraction techniques under the private/public clouds. This is a joint TSMC granted project with Prof. Eugene Huang and Prof. Wenqing Liu, NCCU MIS. (PI: Eugene Huang, Co-PIs: Wenqing Liou and Fang Yu, 2010/10-2011/9).

Prior to NCCU

1. STRANGER
Supervisor: Prof. Tevfik Bultan
Coworker: Prof. Oscar H. Ibarra, Marco Cova, Muath Alkhalaf

String analysis is a static analysis technique that determines the string values that a variable can hold at specific points in a program. We develop an automata-based string analyzer, called Stranger (STRing AutomatoN GEneratoR), for real-world web applications. Stranger takes advantage on symbolic representation of automata, which allows the practical handling of automata on very large alphabets. Stranger also incorporates various novel approximation techniques and targets on proving the correctness of secured web applications. We later extend this approach to systems having both string and integer variables. By exchanging length information between string and arithmetic automata, we improve precision of both string analysis and size analysis. We incorporate backward analysis to generate vulnerability signatures to characterize malicious inputs. We also investigate the essence of string systems, and characterize the decidability of reachability analysis on various string systems. In one of our recent work, we use multi-track automata to generate effective patches for vulnerable applications. Stranger is implemented on top of Pixy and Mona projects. Stranger and benchmarks can be downloaded from here. [SPIN 2008, TACAS 2009, ASE 2009, TACAS 2010, CIAA 2010, IJFCS 2011]

2. VeriBPEL
Supervisor: Dr. Chao Wang

We propose a novel method for modular verification of BPEL service composition. We first use symbolic fixpoint computation to derive relations of the incoming and outgoing messages for well-defined web services based on reachability analysis. The relations are collected and serve as a repository of summarizations of individual web services. We then compose the summarization of external invocations, resulting in scalable verification of web service composition. We realize our ideas in VeriBPEL. The technical features include (1) a novel symbolic encoding for modeling the concurrency semantics of systems having both internal multi-threading and external message passing, and (2) a method for summarizing concurrently running processes, along with a modular verification framework that utilizes these summaries for scalable verification. [SIGSOFT/FSE2008]

3. Size Analysis on Object Constraint Language
Supervisor: Prof. Tevfik Bultan
Coworker: Erik Petterson

We aim to verify specifications of object oriented systems based on infinite state model checking techniques. We target on object constraint language and propose size abstraction and size analysis. We conducted a case study on the OCL specifications of Java Card APIs. The experiments indicate our abstraction is precise enough to verify/falsify target systems, while coarse enough to perform complex model checking techniques efficiently. [ESEC/FSE2007]

4. Characterizing Spiking Neural P Systems
Supervisor: Prof. Oscar H. Ibarra
Coworker: Sara Woodworth

Neurons are arguably one of the most interesting cell-types in the human body. A large number of neurons working in a cooperative manner are able to perform tasks that are not yet matched by the tools we can build with our current technology. Spiking Neuron P-systems (SNPs) incorporate ideas from spiking neurons into membrane computing. In this project, we give characterizations of sets definable by partially blind multicounter machines in terms of k-output SNPs operating in a sequential mode. [UC2006, NC2008]

5. SATRepair
Supervisor: Prof. Der-Tsai Lee and Prof. Sy-Yen Kuo
Coworker: Yaw-Wen Huang, Chung-Hung Tsai, and Hung-Yaw Lin

Fabricating large memory and processor arrays is subject to physical failures resulting in yield degradation. The strategy of incorporating spare rows and columns to obtain reasonable production yields was first proposed in the 1970s, and continues to serve as an important role in recent VLSI developments. Since the spare allocation problem (SAP) is NP-complete but requires solving during fabrication, an efficient exact spare allocation algorithm has great value. In this project, we proposed a novel Boolean encoding to reduce exact SAP to the satisfiablity problem, so that we can leverage the capability of modern SAT solvers. [DFT2005]

6. xBMC
Supervisor: Dr. Bow-Yaw Wang

We propose a new SAT-based model checking algorithm to solve the reachability problem of real-time systems. In our algorithm, the behavior of region automata is encoded as Boolean formulas, and hence any SAT solver can be used to explore the region graph efficiently. Although our SAT-based algorithm performs better than other algorithms in flaw detection, it is less effective inproving properties. To overcome the problem, we incorporate a complete inductive method in our algorithm to improve the performance when the property is satisfied. We implement both algorithms in a tool called xBMC and report experimental results. The experiments show that the combination of efficient encoding and inductive methods offers an effective and practical method for the analysis of timing behavior. [FORMATS/FTRTFT2004, ATVA2004, IJFCS2006]

7. WebSARRI
Supervisor: Prof. Der-Tsai Lee and Prof. Sy-Yen Kuo
Coworker: Yaw-Wen Huang, Chung-Hung Tsai, and Christine Hang

WebSSARI stands for Web application Security via Static Analysis and Runtime Inspection. Viewing Web application vulnerabilities as a secure information flow problem, we created a lattice-based static analysis algorithm derived from type systems and typestate. During the analysis, sections of code considered vulnerable are instrumented with runtime guards, thus securing Web applications in the absence of user intervention. Soundness, i.e no false negative, is achieved by applying formal methods including typestate checking and bounded model checking. I worked on designing and developing core checkers including: (1) A light-weight Type Checker, and (2) A heavy-weight Bounded Model Checker. [WWW2004][DSN2004]

Here is a related link. Parts of the techniques have been commercialized by Armorize Technologies, Inc..

8. RED 4.0
Supervisor: Prof. Farn Wang
Coworker: Geng-Dian Huang

I was involved in designing and developing RED 4.0 on top of RED, the full Timed-CTL symbolic model checker initiated by Prof. Farn Wang in 1996. RED 4.0 enhanced RED in the following directions: (1) Symbolic Simulation, (2) Numerical Coverage Estimation, (3) Greatest Fixpoint Computation and (4) Symmetric Analysis with Pointer Data Structure. [RTCSA2003][CIAA2003, FORTE2003, TSE2004, TSE2006]

9. TimeC
Supervisor: Prof. Farn Wang

TimeC is a C-like language that supports basic C statements, timed statements, e.g., duration and switch-event, and Open Verification Library(OVL) assertions. I developed an optimized translator that automatically translates programs written in TimeC into timed automata and TCTL formula in the input format of RED. Integrating the translator as the front end of RED, we can automatically verify real-time systems specified in TimeC. [RTCSA2003, JEC2004]