Courses

Software Security Laboratory

Dept. of MIS, National Chengchi University

 

Data Structures (Fall 2023)


Instructor: 郁方 (Yu, Fang) Office: 261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 313/311 

TAs: 蔣其叡 Ray Chiang 111356024@nccu.edu.tw 陳卉縈 Mia Chen 112356043@nccu.edu.tw

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


Course Topics

In this advanced programming course, we cover fundamental algorithms and data structures with the aim of offering students solid technical training in Java programming. Students come to understand and use data structures effectively by studying their concepts and the related Java programs. Lectures provide the ideas and method descriptions, while labs provide hands-on application development. Students also get the chance to learn how to develop Java applications using Eclipse and the Java class library.

By the end of this course, students should understand common data structures and algorithms, and be able to apply that understanding to investigate new data abstractions, as well as to use existing library components to develop mid-size applications. Students should also become better programmers who feel comfortable programming in Java.
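For a taste of what existing library components buy you, the sketch below uses two java.util classes that implement structures covered later in the course; the class and variable names are illustrative, not taken from the lecture materials.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class LibraryDemo {
    public static void main(String[] args) {
        // ArrayDeque used as a stack: push and pop both run in O(1).
        Deque<String> stack = new ArrayDeque<>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop());        // prints "second" (last in, first out)

        // HashMap: expected O(1) insertion and lookup.
        Map<String, Integer> wordCount = new HashMap<>();
        for (String w : new String[] {"java", "data", "java"}) {
            wordCount.merge(w, 1, Integer::sum); // add 1, inserting 1 if the key is absent
        }
        System.out.println(wordCount.get("java")); // prints 2
    }
}
```

Reaching for these library classes first, and only then studying how they work internally, is the workflow the labs practice.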

Both basic and advanced data structures, along with their related algorithms, will be discussed during the semester. The (tentative) topics include:

  1. A brief review of Java programming, object-oriented design, and analysis of algorithms

  2. Basic data structures: queues, stacks, linked lists, sequences, vectors, trees, heaps, and priority queues

  3. Advanced data structures: dictionaries, hash tables, maps, skip lists, search/balanced/splay trees, directed/weighted graphs

  4. Fundamental algorithms: divide and conquer, merge/quick/bucket/radix sort, set partition, dynamic programming, greedy method, breadth-first search, depth-first search

  5. Advanced topics: topological sorting, pattern matching, text compression, task scheduling, transitive closures, strongly connected components, shortest paths, minimum spanning trees
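As a preview of topic 2, a singly linked list can back a stack with O(1) push and pop. Below is a minimal sketch in plain Java; the class name and layout are ours, not the textbook's.

```java
import java.util.NoSuchElementException;

/** A minimal generic stack backed by a singly linked list; push and pop run in O(1). */
public class LinkedStack<E> {
    private static class Node<E> {
        final E value;
        final Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private Node<E> top;   // head of the list is the top of the stack
    private int size;

    public void push(E e) { top = new Node<>(e, top); size++; }

    public E pop() {
        if (top == null) throw new NoSuchElementException("pop from empty stack");
        E v = top.value;
        top = top.next;
        size--;
        return v;
    }

    public boolean isEmpty() { return top == null; }
    public int size() { return size; }

    public static void main(String[] args) {
        LinkedStack<Integer> s = new LinkedStack<>();
        s.push(1);
        s.push(2);
        s.push(3);
        System.out.println(s.pop());  // prints 3 (LIFO order)
        System.out.println(s.size()); // prints 2
    }
}
```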

Course Requirements:

  1. Lectures (30%):
    - One midterm exam (closed book) at the end of November
    - A makeup exam (optional) at the end of the semester

  2. Labs (40%): [lab github]
    - Weekly programming assignments
    - Program prescreen in early November

  3. Term Project (30%): [project github] [team list]
    - Project demo in early January
    - Includes a project proposal (20%), demo (40%), and implementation (40%)
    - Teams of 3-5 students; project details will be announced in early October

  4. Please note that all slides and exams are written in English for both the Chinese and English sessions. Students are strongly encouraged to write their proposals and present their projects in English.


Textbook

  1. Data Structures and Algorithms in Java, 6th Edition, by Michael T. Goodrich, Roberto Tamassia, and Michael H. Goldwasser, John Wiley & Sons, Inc.

  2. Official Website: Wiley

  3. PDF Handouts [zip]

  4. Distributor: 新月圖書公司/東華書局, 3F, No. 143, Sec. 1, Chongqing S. Rd., Taipei. Tel: 02-23317856


Announcements

  1. [Article] Java is one of the most popular programming languages. [Tiobe] [Career]

  2. [News] The first lab is scheduled on Sep. 18 @ the MIS PC Classroom

  3. [Article] Web search data can be a key tool!

  4. [News] Eclipse has been installed in the PC and Mac Classrooms

  5. [Lab Exam] Programming PreScreen is scheduled on Nov. 2 @ the MIS PC Classroom

  6. [Proposal] Project proposal (due on Nov. 16) should include the following sections:

    1. Introduction /Your topic and motivation

    2. Search tricks /Your score formulation

    3. System design /Class diagrams [proposal sample]

    4. Schedule /How and when to accomplish stages

    5. Challenges /Techniques that you need but may have a hard time learning on your own


Lectures/Schedules (Subject to change)

September – Get ready to program!

  1. 9/14: Opening: A brief overview of Java and Eclipse [Lec0.pdf] [Lec1.pdf] [Lab 1]
    - Text Book (TB) Chapter 1

  2. 9/21: Introduction: Object-oriented design and abstract data type [Lec2.pdf]
    - TB Chapter 2

  3. 9/28: Text/Pattern matching and Class project announcements [Lec3.pdf]
    - TB Chapter 12
    - Project: Intelligent Searching-LetsBeatGoogle!


October – Introduce basic data structures and their implementations

  1. 10/5: Linked List [Lec4.pdf]
    - TB Chapter 3 and Chapter 7

  2. 10/12: Queues and Stacks [Lec5.pdf]
    - TB Chapter 6

  3. 10/19: Trees [Lec6.pdf]
    - TB Chapter 8

  4. 10/26: Binary Trees and Heaps [Lec7.pdf]
    - TB Chapter 9


November – Introduce fundamental algorithms and their analyses

  1. 11/2: Prescreen on Java Programming @ MIS PC Classroom

  2. 11/9: Big-O, Divide and Conquer [Lec8.pdf]
    - TB Chapters 4, 5

  3. 11/16: Merge/Quick Sort (Recurrence Equations) [Lec9.pdf] (Proposal Due)
    - TB Chapters 5, 13

  4. 11/23: Dynamic Programming [Lec10.pdf]
    - TB Chapter 12
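To give a flavor of the 11/23 topic, here is a classic dynamic-programming warm-up (our own illustrative example, not from the lecture slides): computing Fibonacci numbers bottom-up so that each subproblem is solved exactly once.

```java
public class FibonacciDP {
    /** Bottom-up dynamic programming: keep only the two previous values, O(n) time, O(1) space. */
    public static long fib(int n) {
        if (n < 2) return n;
        long prev = 0, curr = 1;          // fib(0) and fib(1)
        for (int i = 2; i <= n; i++) {
            long next = prev + curr;      // each value is built from the stored subproblems
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        System.out.println(fib(10));  // prints 55
    }
}
```

The same idea, storing answers to subproblems instead of recomputing them, underlies the larger DP examples in TB Chapter 12.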


December – Step on advanced data structures

  1. 11/30: Midterm Exam (9:10am-12:00pm / 1:10-4:00pm)
    - Lectures 1-10; TB Chapters 1-9, 12-13

  2. 12/7: Midterm review

  3. 12/14: Binary Search Trees [Lec11.pdf]
    - TB Chapter 10

  4. 12/21: Invited Talk: AI from Berkeley
    - Maps and Hash Tables [Lec12.pdf] and Dictionaries and Skip Lists [Lec13.pdf] (optional)
    - TB Chapter 11

  5. 12/28: Graphs I [Lec14.pdf] and Graphs II [Lec15.pdf] (optional)
    - TB Chapter 14


January – Talk and Demo

  1. 1/4: Project Demo (schedule) @313/311 College of Commerce

  2. 1/8: Makeup Exam (if needed)

  3. 1/11: Project Code and Final Report Upload


Links that might be useful

  1. Introduction to programming in Java [open course by MIT]

  2. Java Applet Tutorial [oracle]

  3. Eclipse download [official website]

  4. An eclipse tutorial [eclipse]


================================================================

Data Structures (Fall 2022)


Instructor: 郁方 (Yu, Fang) Office:261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 311 

TAs: 林苡晴 110356019@nccu.edu.tw  蔣其叡 111356024@nccu.edu.tw

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom



================================================================

Advanced Information System Development (Spring 2022)


Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Co-instructor: Prof. 蔡瑞煌

Contact: yuf@nccu.edu.tw

Meeting: Wednesday 1:10-4:00pm@260304, College of Commerce Building


Course Objective

The course objective is to develop an in-depth understanding of deep learning, particularly convolutional neural networks. We will go through the lectures and videos of the CS231n course offered by Prof. Fei-Fei Li at Stanford University. Students are required to watch the lecture videos and go through the slides for online discussion.

Syllabus

CS231n@Stanford [2021/19 slides] [2017 lectures]

******

Background knowledge:

Lecture 1: Introduction [slides][fei_slides][video]

Lecture 2: Image Classification [slides][video]

Lecture 3: Loss Functions and Optimization [slides][video]

Lecture 4: Neural Networks and Back Propagation [slides][video][quiz1]

******

Build Your Neural Networks:

Lecture 5: Convolutional Neural Networks [slides][video] [quiz2] [4/20: Attack Team 1]

Lecture 6: Training Neural Networks I [slides][video] [quiz3]

Lecture 7: Training Neural Networks II [slides][video] [assignment2@cs231n]

Lecture 8: Deep Learning Software [slides][video]

******

Neural Network Models:

Lecture 9: CNN Architectures [slides][video] [quiz4]

Lecture 10: Recurrent Neural Networks [slides][video] [4/27: Attack Team 2]

Lecture 11: Generative Models [slides][video] [5/18: Defense Team]

Lecture 12: Detection and Segmentation [slides][video] [assignment3@cs231n] [5/25: Fang]

Lecture 13: Visualizing and Understanding [slides][video]

Lecture 14: Deep Reinforcement Learning [slides][video] [6/2: Fang]

Guest Lecture: Adversarial Machine Learning [slides][video] [3/30: Fang]

******

More advanced topics (2020/2021):

Human Centered AI [slides]

Video learning [slides]

3D learning [slides]

AI Fairness Accountability [slides]

Neural Radiance Fields [slides]

Scene Graphs [slides]

Self-supervised Learning [slides]


******

Attention and Transformers [slides] [5/11]

-Attention is All You Need [Original Transformers Paper]

-Attention? Attention [Blog by Lilian Weng]

-The Illustrated Transformer [Blog by Jay Alammar]

-ViT: Transformers for Image Recognition [Paper] [Blog] [Video]

-DETR: End-to-End Object Detection with Transformers [Paper] [Blog] [Video]


******

About Neural Network Security Project: [6/8, 15: Project Demo]

-A Survey of Safety and Trustworthiness of Deep Neural Networks: Verification, Testing, Adversarial Attack and Defense, and Interpretability [paper]

-AI Failures [ieee spectrum] [4/13: Fang]

-The great AI reckoning [ieee spectrum]


Adversarial Tools:

-Torchattack: a PyTorch library that provides adversarial attacks to generate adversarial examples [github] [4/13: Kuo]


Adversarial and defense papers: The Attack Team:

-AdvDoor: adversarial backdoor attack of deep learning system [issta21][github] [4/13: Kuo]

-Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks [sp19][slides]

-BadNets: Evaluating Backdooring Attacks on Deep Neural Networks [paper]

-RNN-Test: Towards Adversarial Testing for Recurrent Neural Network Systems [paper] [5/4: Huang]

-testRNN: Coverage Guided Testing for Recurrent Neural Networks  [tod20][github]

-BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. [mlsec17][github]


Program Analysis Papers: The Defense Team:

-Concolic Testing for Deep Neural Networks [paper]

-DeepConcolic: testing and debugging deep neural networks [icse19][github]



******

The AI Trend: [Google Research: Themes from 2021 and Beyond]

· Trend 1: More Capable, General-Purpose ML Models 

· Trend 2: Continued Efficiency Improvements for ML 

· Trend 3: ML Is Becoming More Personally and Communally Beneficial

· Trend 4: Growing Benefits of ML in Science, Health and Sustainability 

· Trend 5: Deeper and Broader Understanding of ML


******

Grading Policy

  1. Participation (50%): Attendance, Presentation, Discussion, and Assignments

  2. System Demo (50%): Completeness, Functionality, Scalability

  3. Late HW policy: -1% per day after the due date


================================================================

Data Structures (Fall 2021)


Instructor: 郁方 (Yu, Fang) Office:261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 313 

TAs: 陳宜莉 109356049@nccu.edu.tw  林苡晴 110356019@nccu.edu.tw

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom



================================================================

Advanced Information System Development (Spring 2021)


Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Contact: yuf@nccu.edu.tw

Meeting: Monday 6:10-9:00pm@260205, College of Commerce Building


================================================================

Data Structures (Fall 2020)


Instructor: 郁方 (Yu, Fang) Office:261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 313 

TAs:  詹語昕 107356036@nccu.edu.tw, 陳宜莉 109356049@nccu.edu.tw

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom



================================================================

Advanced Information System Development (Spring 2020)

Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Contact: yuf@nccu.edu.tw

Meeting: Monday 6:10-9:00pm@260311, College of Commerce Building


================================================================


Data Structures (Fall 2019)

Instructor: 郁方 (Yu, Fang) Office:261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 313 

TAs:  詹之愷 107356036@nccu.edu.tw, 陳怡君 108356016@nccu.edu.tw

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


================================================================

Advanced Information System Development (Spring 2019)

Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Contact: yuf@nccu.edu.tw


Meeting: Monday 6:10-9:00pm@260311, College of Commerce Building


Course Objective

Gartner believes that by 2022, 40% of new application development will involve AI co-developers. This trend is a good callout. Gartner identifies AI services, platforms, frameworks, and infrastructure as enablers of applications across domains. We believe that AI will augment autonomous vehicles, analytics, and application development, among scores of other activities, and that platforms will emerge to accelerate development and deployment. [Top Tech Trends in 2019]

The course objective is in-depth paper and case study of modern artificial intelligence techniques and applications. The first part of this course covers selected blogs from OpenAI research [OpenAI research]. These blogs summarize the most recent AI projects and advances. Students form teams to lead the discussion of each project's achievements and potential extensions. The second part of this course is a rigorous study of research on generative models. A tremendous amount of information is out there and, to a large extent, easily accessible; the tricky part is to develop models and algorithms that can analyze and understand this treasure trove of data. Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented as a system of two neural networks contesting with each other in a zero-sum game framework: one network generates candidates and the other evaluates them. Generative models are one of the most promising approaches toward the goal of enabling computers to understand the world. Finally, to address cyber security issues, the last part of this course covers string analysis techniques to enhance software reliability and develop static program analysis skills.
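The zero-sum game described above is commonly written as the minimax objective from Goodfellow et al. (paper [3] in the GAN references below), in which the discriminator D and generator G optimize:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to tell real samples from generated ones, while G is trained to make D's job as hard as possible; at the unique equilibrium, G reproduces the data distribution and D outputs 1/2 everywhere.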

Students are expected to 1) read research papers and investigate OpenAI projects, and 2) develop a generative model application for prediction and decision making.

II.1 Book Reference (Team A: 宋昆原, 王少昕, 陳子軒, 彭旻浩)

Tevfik Bultan, Fang Yu, Muath Alkhalaf, and Abdulbaki Aydin. "String Analysis for Software Verification and Security." Publisher: Springer International Publishing. eBook ISBN 978-3-319-68670-7. DOI 10.1007/978-3-319-68670-7. Hardcover ISBN 978-3-319-68668-4. [Springer] [Amazon]


II.2 OpenAI Reference (Team B: 陳高欽, 詹宗霖, 林郁豪)

OpenAI Research and Systems


II.3 GAN Paper References (Team C: 詹之愷, 許甄珉, 郭金尼, 李博逸)


[1] J Xie*, R Gao*, Z Zheng, SC Zhu, and YN Wu (2019) Learning dynamic generator model by alternating back-propagation through time. AAAI-19: 33rd AAAI Conference on Artificial Intelligence. pdf project page

Abstract: This paper studies the dynamic generator model for spatial-temporal processes such as dynamic textures and action sequences in video data. In this model, each time frame of the video sequence is generated by a generator model, which is a non-linear transformation of a latent state vector, where the non-linear transformation is parametrized by a top-down neural network. The sequence of latent state vectors follows a non-linear auto-regressive model, where the state vector of the next frame is a non-linear transformation of the state vector of the current frame as well as an independent noise vector that provides randomness in the transition. The non-linear transformation of this transition model can be parametrized by a feedforward neural network. We show that this model can be learned by an alternating back-propagation through time algorithm that iteratively samples the noise vectors and updates the parameters in the transition model and the generator model. We show that our training method can learn realistic models for dynamic textures and action patterns.


[2] J Xie, Y Lu, R Gao, SC Zhu, and YN Wu (2019) Cooperative learning of descriptor and generator networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). pdf slides project page

Abstract: This paper studies the cooperative training of two generative models for image modeling and synthesis. Both models are parametrized by convolutional neural networks (ConvNets). The first model is a deep energy-based model, whose energy function is defined by a bottom-up ConvNet, which maps the observed image to the energy. We call it the descriptor network. The second model is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed image. The maximum likelihood learning algorithms of both models involve MCMC sampling such as Langevin dynamics. We observe that the two learning algorithms can be seamlessly interwoven into a cooperative learning algorithm that can train both models simultaneously. Specifically, within each iteration of the cooperative learning algorithm, the generator model generates initial synthesized examples to initialize a finite-step MCMC that samples and trains the energy-based descriptor model. After that, the generator model learns from how the MCMC changes its synthesized examples. That is, the descriptor model teaches the generator model by MCMC, so that the generator model accumulates the MCMC transitions and reproduces them by direct ancestral sampling. We call this scheme MCMC teaching. We show that the cooperative algorithm can learn highly realistic generative models


[3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680).

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.


[4] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.


[5] Dai, J., Lu, Y., & Wu, Y. N. (2014). Generative modeling of convolutional neural networks. arXiv preprint arXiv:1412.6296.

Abstract: The convolutional neural networks (CNNs) have proven to be a powerful tool for discriminative learning. Recently researchers have also started to show interest in the generative aspects of CNNs in order to gain a deeper understanding of what they have learned and how to further improve them. This paper investigates generative modeling of CNNs. The main contributions include: (1) We construct a generative model for the CNN in the form of exponential tilting of a reference distribution. (2) We propose a generative gradient for pre-training CNNs by a non-parametric importance sampling scheme, which is fundamentally different from the commonly used discriminative gradient, and yet has the same computational architecture and cost as the latter. (3) We propose a generative visualization method for the CNNs by sampling from an explicit parametric image distribution. The proposed visualization method can directly draw synthetic samples for any given node in a trained CNN by the Hamiltonian Monte Carlo (HMC) algorithm, without resorting to any extra hold-out images. Experiments on the challenging ImageNet benchmark show that the proposed generative gradient pre-training consistently helps improve the performances of CNNs, and the proposed generative visualization method generates meaningful and varied samples of synthetic images from a large-scale deep CNN.


[6] YN Wu, R Gao, T Han, and SC Zhu (2019) A tale of three probabilistic families: discriminative, descriptive and generative models. Quarterly of Applied Mathematics. pdf.

Abstract: The pattern theory of Grenander is a mathematical framework where patterns are represented by probability models on random variables of algebraic structures. In this paper, we review three families of probability models, namely, the discriminative models, the descriptive models, and the generative models. A discriminative model is in the form of a classifier. It specifies the conditional probability of the class label given the input signal. A descriptive model specifies the probability distribution of the signal, based on an energy function defined on the signal. A generative model assumes that the signal is generated by some latent variables via a transformation. We shall review these models within a common framework and explore their connections. We shall also review the recent developments that take advantage of the high approximation capacities of deep neural networks.


[7] Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. (2016, June). A theory of generative convnet. In International Conference on Machine Learning (pp. 2635-2644).

Abstract: We show that a generative random field model, which we call generative ConvNet, can be derived from the commonly used discriminative ConvNet, by assuming a ConvNet for multi-category classification and assuming one of the category is a base category generated by a reference distribution. If we further assume that the non-linearity in the ConvNet is Rectified Linear Unit (ReLU) and the reference distribution is Gaussian white noise, then we obtain a generative ConvNet model that is unique among energy-based models: The model is piecewise Gaussian, and the means of the Gaussian pieces are defined by an auto-encoder, where the filters in the bottom-up encoding become the basis functions in the top-down decoding, and the binary activation variables detected by the filters in the bottom-up convolution process become the coefficients of the basis functions in the top-down deconvolution process. The Langevin dynamics for sampling the generative ConvNet is driven by the reconstruction error of this auto-encoder. The contrastive divergence learning of the generative ConvNet reconstructs the training images by the auto-encoder. The maximum likelihood learning algorithm can synthesize realistic natural image patterns.


[8] Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. N. (2016). Cooperative training of descriptor and generator networks. arXiv preprint arXiv:1609.09408.

Abstract: This paper studies the cooperative training of two probabilistic models of signals such as images. Both models are parametrized by convolutional neural networks (ConvNets). The first network is a descriptor network, which is an exponential family model or an energy-based model, whose feature statistics or energy function are defined by a bottom-up ConvNet, which maps the observed signal to the feature statistics. The second network is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed signal. The maximum likelihood training algorithms of both the descriptor net and the generator net are in the form of alternating back-propagation, and both algorithms involve Langevin sampling. We observe that the two training algorithms can cooperate with each other by jumpstarting each other's Langevin sampling, and they can be naturally and seamlessly interwoven into a CoopNets algorithm that can train both nets simultaneously.


[9] Han, T., Lu, Y., Zhu, S. C., & Wu, Y. N. (2017). Alternating Back-Propagation for Generator Network. In AAAI (Vol. 3, p. 13).

Abstract: This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a nonlinear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation, and they share most of their code in common. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.



[10] Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In Advances in Neural Information Processing Systems (pp. 4565-4573).

Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.


[11] Luc, P., Couprie, C., Chintala, S., & Verbeek, J. (2016). Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408.

Adversarial training has been shown to produce state of the art results for generative image modeling. In this paper we propose an adversarial training approach to train semantic segmentation models. We train a convolutional semantic segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network. The motivation for our approach is that it can detect and correct higher-order inconsistencies between ground truth segmentation maps and the ones produced by the segmentation net. Our experiments show that our adversarial training approach leads to improved accuracy on the Stanford Background and PASCAL VOC 2012 datasets.


[12] Liu, M. Y., & Tuzel, O. (2016). Coupled generative adversarial networks. In Advances in neural information processing systems (pp. 469-477).

Abstract: We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint distribution solution over a product of marginal distributions one. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. We also demonstrate its applications to domain adaptation and image transformation.


[13] Yu, L., Zhang, W., Wang, J., & Yu, Y. (2017). SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In AAAI 2017. https://arxiv.org/abs/1609.05473

Abstract: As a new way of training generative models, Generative Adversarial Net (GAN) that uses a discriminative model to guide the training of the generative model has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is for generating sequences of discrete tokens. A major reason lies in that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence, it is non-trivial to balance its current score and the future one once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve the problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.


[14] Springenberg, J. T. (2015). Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390.

Abstract: In this paper we present a method for learning a discriminative classifier from unlabeled or partially labeled data. Our approach is based on an objective function that trades-off mutual information between observed examples and their predicted categorical class distribution, against robustness of the classifier to an adversarial generative model. The resulting algorithm can either be interpreted as a natural generalization of the generative adversarial networks (GAN) framework or as an extension of the regularized information maximization (RIM) framework to robust classification against an optimal adversary. We empirically evaluate our method – which we dub categorical generative adversarial networks (or CatGAN) – on synthetic data as well as on challenging image classification tasks, demonstrating the robustness of the learned classifiers. We further qualitatively assess the fidelity of samples generated by the adversarial generator that is learned alongside the discriminative classifier, and identify links between the CatGAN objective and discriminative clustering algorithms (such as RIM).


[15] Dai, Z., Yang, Z., Yang, F., Cohen, W. W., & Salakhutdinov, R. R. (2017). Good semi-supervised learning that requires a bad GAN. In Advances in Neural Information Processing Systems (pp. 6513-6523).

Abstract: Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.


[16] Zhu, J. Y., Zhang, R., Pathak, D., Darrell, T., Efros, A. A., Wang, O., & Shechtman, E. (2017). Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems (pp. 465-476).

Abstract: Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.


[17] Odena, A., Olah, C., & Shlens, J. (2016). Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585.

Abstract: In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128 × 128 resolution image samples exhibiting global coherence. We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models. These analyses demonstrate that high resolution samples provide class information not present in low resolution samples. Across 1000 ImageNet classes, 128 × 128 samples are more than twice as discriminable as artificially resized 32 × 32 samples. In addition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data.


[18] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017.

Abstract: Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G:X→Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F:Y→X and introduce a cycle consistency loss to push F(G(X))≈X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
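The cycle-consistency loss described above can be sketched numerically. The snippet below is a minimal NumPy illustration, not the paper's implementation: two toy linear maps stand in for the generators G and F, and the L1 cycle loss ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 is computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two generators: G maps domain X -> Y, F maps Y -> X.
# In the paper these are deep networks; here they are linear maps so the
# cycle property can be checked exactly.
A = rng.normal(size=(4, 4))
G = lambda x: x @ A
F = lambda y: y @ np.linalg.inv(A)  # F inverts G, so the cycle loss is ~0

def cycle_consistency_loss(x, y):
    """L1 cycle loss: ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 (per element)."""
    forward = np.abs(F(G(x)) - x).mean()
    backward = np.abs(G(F(y)) - y).mean()
    return forward + backward

x = rng.normal(size=(8, 4))  # a batch from domain X
y = rng.normal(size=(8, 4))  # a batch from domain Y
print(cycle_consistency_loss(x, y))  # close to 0, since F(G(x)) == x
```

When F fails to invert G, the loss grows, which is exactly the training pressure that rules out many-to-one mappings.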


Paper Study:

-Presenter: Prepare slides to present details of the paper

-Summary: Write a 1-2 page summary covering the 3Ws, pros/cons, and arguments to support your claims

-Q&A: Prepare 3-5 questions and potential answers to lead the discussion after presentation


Schedule (Subject to Change)

  1. -2/25 OpenAI introduction and paper bidding (Fang). TensorFlow [get started] and GAN implementation (DCGAN)

  2. -3/4 Book introduction: Web Application Security (Fang); paper and project selection (Team B and Team C)

  3. -3/11 Better Language Models and Their Implications [better-language-models] and How AI Training Scales [science-of-ai]

  4. -3/18 Learning Dexterity [learning-dexterity]

  5. -3/25 Improving Language Understanding with Unsupervised Learning [language-unsupervised]

  6. -4/1 Retro Contest [retro-contest]

  7. -4/8 Block-Sparse GPU Kernels [block-sparse-gpu-kernels]

  8. -4/15 Competitive Self-Play [competitive-self-play]

  9. -4/22 Robots that Learn [robots-that-learn]

  10. -4/29 Unsupervised Sentiment Neuron [unsupervised-sentiment-neuron]

  11. -5/6-6/10 Neural Networks (by Prof. Ray-Huan Tsaih)

  12. -6/17 System Demo (Team A, B, C)

  13. -6/24 Final Report Due (Team A, B, C)


Grading Policy

  1. -Participation (30%) and HWs (40%): Attendance, Presentation, Discussion and Paper Study

  2. -System Demo (30%): Completeness, Functionality, Scalability

  3. -Late HW policy: -1% per day after the due date



================================================================

Data Structures (Fall 2018)

Instructor: 郁方 (Yu, Fang) Office:261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10am-12:00pm (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 313 

TAs:  詹之愷 107356010@nccu.edu.tw, 彭旻浩107356036@nccu.edu.tw

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


================================================================

Seminar on Machine Learning Techniques: Dueling Neural Networks (Spring 2018)

Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Contact: yuf@nccu.edu.tw


Meeting: Wednesday 9:10am-12:00pm @ 260312, College of Commerce Building


Course Objective:

“To develop algorithms and techniques that endow computers with an understanding of our world.”

The course objective is in-depth paper study and discussion of modern machine learning techniques. In particular, we will focus on dueling neural networks (one of MIT Technology Review's 10 Breakthrough Technologies of 2018) and their applications. Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented as a system of two neural networks contesting with each other in a zero-sum game framework. One network generates candidates and the other evaluates them. Generative models are one of the most promising approaches towards the goal of endowing computers with an understanding of the world.

Students are expected to 1) read two to three research papers per week and lead several paper discussions during the semester, and 2) investigate one OpenAI GAN project and its implementation. Students are also expected to become familiar with TensorFlow.

I. OpenAI on Generative Models

A tremendous amount of information is out there and to a large extent easily accessible — either in the physical world of atoms or the digital world of bits. The only tricky part is to develop models and algorithms that can analyze and understand this treasure trove of data. Generative models are one of the most promising approaches towards this goal. [continue reading]

Some OpenAI projects:

  1. Generative Adversarial Networks (GANs) and Improving GANs (code).

  2. Variational Autoencoders (VAEs) and Improving VAEs (code).

  3. InfoGAN (code).

  4. Deep Reinforcement Learning via Bayesian Neural Networks (code).

  5. Generative Adversarial Imitation Learning (code).


II. Paper Study


[1] Li, W., Gauci, M., & Groß, R. (2013, July). A coevolutionary approach to learn animal behavior through controlled interaction. In Proceedings of the 15th annual conference on Genetic and evolutionary computation (pp. 223-230). ACM.

Abstract: This paper proposes a method that allows a machine to infer the behavior of an animal in a fully automatic way. In principle, the machine does not need any prior information about the behavior. It is able to modify the environmental conditions and observe the animal; therefore it can learn about the animal through controlled interaction. Using a competitive coevolutionary approach, the machine concurrently evolves animats, that is, models to approximate the animal, as well as classifiers to discriminate between animal and animat. We present a proof-of-concept study conducted in computer simulation that shows the feasibility of the approach. Moreover, we show that the machine learns significantly better through interaction with the animal than through passive observation. We discuss the merits and limitations of the approach and outline potential future directions.


[2] Li, W., Gauci, M., & Groß, R. (2016). Turing learning: a metric-free approach to inferring behavior and its application to swarms. Swarm Intelligence, 10(3), 211-243.

Abstract: We propose Turing Learning, a novel system identification method for inferring the behavior of natural or artificial systems. Turing Learning simultaneously optimizes two populations of computer programs, one representing models of the behavior of the system under investigation, and the other representing classifiers. By observing the behavior of the system as well as the behaviors produced by the models, two sets of data samples are obtained. The classifiers are rewarded for discriminating between these two sets, that is, for correctly categorizing data samples as either genuine or counterfeit. Conversely, the models are rewarded for 'tricking' the classifiers into categorizing their data samples as genuine. Unlike other methods for system identification, Turing Learning does not require predefined metrics to quantify the difference between the system and its models. We present two case studies with swarms of simulated robots and prove that the underlying behaviors cannot be inferred by a metric-based system identification method. By contrast, Turing Learning infers the behaviors with high accuracy. It also produces a useful by-product - the classifiers - that can be used to detect abnormal behavior in the swarm. Moreover, we show that Turing Learning also successfully infers the behavior of physical robot swarms. The results show that collective behaviors can be directly inferred from motion trajectories of individuals in the swarm, which may have significant implications for the study of animal collectives. Furthermore, Turing Learning could prove useful whenever a behavior is not easily characterizable using metrics, making it suitable for a wide range of applications.


[3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S. & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
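As a quick sanity check on the minimax game described above, the value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))] can be estimated from discriminator outputs alone. A minimal NumPy sketch (illustrative, not the paper's code):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    given discriminator probabilities on real and generated samples."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# At the unique optimum the discriminator outputs 1/2 everywhere, so the
# value is log(1/2) + log(1/2) = -log 4.
d_opt = np.full(1000, 0.5)
print(gan_value(d_opt, d_opt))  # -1.3862943611198906 == -log(4)
```

The -log 4 equilibrium value is the paper's well-known characterization of the optimum, which the snippet reproduces numerically.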


[4] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.


[5] Dai, J., Lu, Y., & Wu, Y. N. (2014). Generative modeling of convolutional neural networks. arXiv preprint arXiv:1412.6296.

Abstract: The convolutional neural networks (CNNs) have proven to be a powerful tool for discriminative learning. Recently researchers have also started to show interest in the generative aspects of CNNs in order to gain a deeper understanding of what they have learned and how to further improve them. This paper investigates generative modeling of CNNs. The main contributions include: (1) We construct a generative model for the CNN in the form of exponential tilting of a reference distribution. (2) We propose a generative gradient for pre-training CNNs by a non-parametric importance sampling scheme, which is fundamentally different from the commonly used discriminative gradient, and yet has the same computational architecture and cost as the latter. (3) We propose a generative visualization method for the CNNs by sampling from an explicit parametric image distribution. The proposed visualization method can directly draw synthetic samples for any given node in a trained CNN by the Hamiltonian Monte Carlo (HMC) algorithm, without resorting to any extra hold-out images. Experiments on the challenging ImageNet benchmark show that the proposed generative gradient pre-training consistently helps improve the performances of CNNs, and the proposed generative visualization method generates meaningful and varied samples of synthetic images from a large-scale deep CNN.


[6] Xie, J., Zhu, S. C., & Wu, Y. N. (2017, May). Synthesizing dynamic patterns by spatial-temporal generative convnet. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7093-7101).

Abstract: Video sequences contain rich dynamic patterns, such as dynamic texture patterns that exhibit stationarity in the temporal domain, and action patterns that are non-stationary in either spatial or temporal domain. We show that a spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-temporal filters to capture spatial-temporal patterns of different scales. The model can be learned from the training video sequences by an “analysis by synthesis” learning algorithm that iterates the following two steps. Step 1 synthesizes video sequences from the currently learned model. Step 2 then updates the model parameters based on the difference between the synthesized video sequences and the observed training sequences. We show that the learning algorithm can synthesize realistic dynamic patterns.


[7] Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. (2016, June). A theory of generative convnet. In International Conference on Machine Learning (pp. 2635-2644).

Abstract: We show that a generative random field model, which we call generative ConvNet, can be derived from the commonly used discriminative ConvNet, by assuming a ConvNet for multi-category classification and assuming one of the categories is a base category generated by a reference distribution. If we further assume that the non-linearity in the ConvNet is Rectified Linear Unit (ReLU) and the reference distribution is Gaussian white noise, then we obtain a generative ConvNet model that is unique among energy-based models: The model is piecewise Gaussian, and the means of the Gaussian pieces are defined by an auto-encoder, where the filters in the bottom-up encoding become the basis functions in the top-down decoding, and the binary activation variables detected by the filters in the bottom-up convolution process become the coefficients of the basis functions in the top-down deconvolution process. The Langevin dynamics for sampling the generative ConvNet is driven by the reconstruction error of this auto-encoder. The contrastive divergence learning of the generative ConvNet reconstructs the training images by the auto-encoder. The maximum likelihood learning algorithm can synthesize realistic natural image patterns.


[8] Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. N. (2016). Cooperative training of descriptor and generator networks. arXiv preprint arXiv:1609.09408.

Abstract: This paper studies the cooperative training of two probabilistic models of signals such as images. Both models are parametrized by convolutional neural networks (ConvNets). The first network is a descriptor network, which is an exponential family model or an energy-based model, whose feature statistics or energy function are defined by a bottom-up ConvNet, which maps the observed signal to the feature statistics. The second network is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed signal. The maximum likelihood training algorithms of both the descriptor net and the generator net are in the form of alternating back-propagation, and both algorithms involve Langevin sampling. We observe that the two training algorithms can cooperate with each other by jumpstarting each other's Langevin sampling, and they can be naturally and seamlessly interwoven into a CoopNets algorithm that can train both nets simultaneously.


[9] Han, T., Lu, Y., Zhu, S. C., & Wu, Y. N. (2017). Alternating Back-Propagation for Generator Network. In AAAI (Vol. 3, p. 13).

Abstract: This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a nonlinear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation, and they share most of their code in common. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.
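The two alternating steps above can be illustrated on a toy linear generator x ≈ Wz, with plain gradient descent standing in for Langevin dynamics (a deliberate simplification of the paper's algorithm; all sizes and learning rates below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = rng.normal(size=(6, 2))
Z_true = rng.normal(size=(2, 40))
X = W_true @ Z_true                       # observed signals (exactly rank 2)

W = rng.normal(size=(6, 2))               # generator parameters
Z = rng.normal(size=(2, 40))              # latent factors, one column per signal
lr = 0.05
for _ in range(2000):
    R = W @ Z - X                         # reconstruction error
    Z -= lr * (W.T @ R)                   # step 1: inferential back-propagation
    R = W @ Z - X
    W -= lr * (R @ Z.T) / X.shape[1]      # step 2: learning back-propagation

print(np.abs(W @ Z - X).mean())  # small: factors and parameters co-adapt
```

Both steps descend the same reconstruction objective, which is why the paper notes the two back-propagations share most of their computation.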


[10] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In Advances in Neural Information Processing Systems (pp. 2234-2242).

Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
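One of the techniques this paper introduces, feature matching, replaces the usual generator objective with matching the mean discriminator features of real and generated batches. A minimal NumPy sketch of just the loss (feature extraction is omitted; the arrays stand in for discriminator activations):

```python
import numpy as np

def feature_matching_loss(f_real, f_fake):
    """Feature matching objective for the generator:
    || E[f(x)] - E[f(G(z))] ||^2 over a batch, where f(.) denotes
    activations of an intermediate discriminator layer."""
    diff = f_real.mean(axis=0) - f_fake.mean(axis=0)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
f_real = rng.normal(size=(64, 16))   # features of a real batch
f_fake = rng.normal(size=(64, 16))   # features of a generated batch
print(feature_matching_loss(f_real, f_real))  # 0.0: identical statistics
print(feature_matching_loss(f_real, f_fake))  # > 0: mismatched statistics
```

Matching first-moment statistics rather than directly fooling the discriminator is what stabilizes the generator updates in the semi-supervised setting.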


[11] Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems (pp. 4743-4751).

Abstract: The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
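A single IAF transformation can be sketched in a few lines. The strictly lower-triangular weight matrices below are illustrative stand-ins for the paper's autoregressive neural networks; they only exist to give mu and sigma the autoregressive dependence that makes the Jacobian triangular.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 5

# Strictly lower-triangular weights make (mu_t, sigma_t) depend only on
# eps_1..eps_{t-1}: the autoregressive structure IAF relies on.
W_mu = np.tril(rng.normal(size=(D, D)), k=-1)
W_s = np.tril(rng.normal(size=(D, D)), k=-1)

def iaf_step(eps):
    """One inverse autoregressive flow transform z = sigma * eps + mu.
    Since mu and sigma are autoregressive in eps, the Jacobian dz/deps is
    triangular and log|det| reduces to sum(log sigma): cheap and parallel."""
    mu = W_mu @ eps
    sigma = np.exp(0.1 * (W_s @ eps))   # strictly positive scales
    z = sigma * eps + mu
    log_det = float(np.log(sigma).sum())
    return z, log_det

eps = rng.normal(size=D)
z, log_det = iaf_step(eps)
```

The parallel forward computation is the reason IAF scales to high-dimensional latent spaces, in contrast to flows whose sampling direction is sequential.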


[12] Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2172-2180).

Abstract: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound of the mutual information objective that can be optimized efficiently. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing supervised methods.
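The quantity InfoGAN maximizes, the mutual information between a latent code c and the generated sample, can be illustrated exactly on discrete toy distributions (the paper itself optimizes a variational lower bound instead; the example below is only a definition check):

```python
import numpy as np

def mutual_information(p_joint):
    """Exact I(C; X) = sum_{c,x} p(c,x) * log( p(c,x) / (p(c) p(x)) )
    for a discrete joint distribution given as a 2-D array."""
    p_c = p_joint.sum(axis=1, keepdims=True)   # marginal over codes
    p_x = p_joint.sum(axis=0, keepdims=True)   # marginal over samples
    mask = p_joint > 0
    ratio = p_joint[mask] / (p_c @ p_x)[mask]
    return float((p_joint[mask] * np.log(ratio)).sum())

# A code that fully determines the sample carries I = H(C) = log 2 nats;
# an independent code carries no information at all.
determined = np.array([[0.5, 0.0],
                       [0.0, 0.5]])
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])
print(mutual_information(determined))   # 0.6931... == log(2)
print(mutual_information(independent))  # 0.0
```

Pushing the code-sample joint distribution from the second regime toward the first is what forces the latent code to pick up interpretable factors.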


[13] Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In Advances in Neural Information Processing Systems (pp. 4565-4573).

Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.


[14] Schawinski, K., Zhang, C., Zhang, H., Fowler, L., & Santhanam, G. K. (2017). Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society: Letters, 467(1), L110-L114.

Abstract: Observations of astrophysical objects such as galaxies are limited by various sources of random and systematic noise from the sky background, the optical system of the telescope and the detector used to record the data. Conventional deconvolution techniques are limited in their ability to recover features in imaging data by the Shannon–Nyquist sampling theorem. Here, we train a generative adversarial network (GAN) on a sample of 4550 images of nearby galaxies at 0.01 < z < 0.02 from the Sloan Digital Sky Survey and conduct 10× cross-validation to evaluate the results. We present a method using a GAN trained on galaxy images that can recover features from artificially degraded images with worse seeing and higher noise than the original with a performance that far exceeds simple deconvolution. The ability to better recover detailed features such as galaxy morphology from low signal to noise and low angular resolution imaging data significantly increases our ability to study existing data sets of astrophysical objects as well as future observations with observatories such as the Large Synoptic Survey Telescope (LSST) and the Hubble and James Webb space telescopes.


[15] Luc, P., Couprie, C., Chintala, S., & Verbeek, J. (2016). Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408.

Abstract: Adversarial training has been shown to produce state of the art results for generative image modeling. In this paper we propose an adversarial training approach to train semantic segmentation models. We train a convolutional semantic segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network. The motivation for our approach is that it can detect and correct higher-order inconsistencies between ground truth segmentation maps and the ones produced by the segmentation net. Our experiments show that our adversarial training approach leads to improved accuracy on the Stanford Background and PASCAL VOC 2012 datasets.


[16] Liu, M. Y., & Tuzel, O. (2016). Coupled generative adversarial networks. In Advances in neural information processing systems (pp. 469-477).

Abstract: We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint distribution solution over a product of marginal distributions one. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. We also demonstrate its applications to domain adaptation and image transformation.


[17] Denton, E. L., Chintala, S., & Fergus, R. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in neural information processing systems (pp. 1486-1494).

Abstract: In this paper we introduce a generative parametric model capable of producing high quality samples of natural images. Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach [11]. Samples drawn from our model are of significantly higher quality than alternate approaches. In a quantitative assessment by human evaluators, our CIFAR10 samples were mistaken for real images around 40% of the time, compared to 10% for samples drawn from a GAN baseline model. We also show samples from models trained on the higher resolution images of the LSUN scene dataset.


[18] Yu, L., Zhang, W., Wang, J., & Yu, Y. (2017). SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In AAAI 2017. https://arxiv.org/abs/1609.05473

Abstract: As a new way of training generative models, the Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of the generative model, has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence it is non-trivial to balance its current score and the future one once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve the problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
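The Monte Carlo rollout step described above can be sketched with toy components. The uniform policy and the token-counting "discriminator" below are illustrative placeholders, not the paper's models; the point is how a partial sequence receives a reward signal from a discriminator that only scores complete sequences.

```python
import random

def rollout_reward(prefix, policy, discriminator, seq_len, n_rollouts=200, seed=0):
    """SeqGAN-style reward for a partial sequence: complete the prefix with
    Monte Carlo rollouts under the current policy, score each finished
    sequence with the discriminator, and average the scores."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < seq_len:
            seq.append(policy(seq, rng))
        total += discriminator(seq)
    return total / n_rollouts

# Toy setup: binary tokens, a uniform rollout policy, and a "discriminator"
# that scores a sequence by its fraction of 1s.
policy = lambda seq, rng: rng.choice([0, 1])
discriminator = lambda seq: sum(seq) / len(seq)

# A prefix of 1s earns a higher expected reward than a prefix of 0s, so the
# policy-gradient signal favors emitting 1s early.
print(rollout_reward([1, 1], policy, discriminator, seq_len=4))
print(rollout_reward([0, 0], policy, discriminator, seq_len=4))
```

Averaging over rollouts is what lets the intermediate state-action steps receive credit before the sequence is finished.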


[19] Springenberg, J. T. (2015). Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390.

Abstract: In this paper we present a method for learning a discriminative classifier from unlabeled or partially labeled data. Our approach is based on an objective function that trades-off mutual information between observed examples and their predicted categorical class distribution, against robustness of the classifier to an adversarial generative model. The resulting algorithm can either be interpreted as a natural generalization of the generative adversarial networks (GAN) framework or as an extension of the regularized information maximization (RIM) framework to robust classification against an optimal adversary. We empirically evaluate our method – which we dub categorical generative adversarial networks (or CatGAN) – on synthetic data as well as on challenging image classification tasks, demonstrating the robustness of the learned classifiers. We further qualitatively assess the fidelity of samples generated by the adversarial generator that is learned alongside the discriminative classifier, and identify links between the CatGAN objective and discriminative clustering algorithms (such as RIM).


[20] Dai, Z., Yang, Z., Yang, F., Cohen, W. W., & Salakhutdinov, R. R. (2017). Good semi-supervised learning that requires a bad GAN. In Advances in Neural Information Processing Systems (pp. 6513-6523).

Abstract: Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.


[21] Arora, S., Ge, R., Liang, Y., Ma, T., & Zhang, Y. (2017). Generalization and equilibrium in generative adversarial nets (gans). arXiv preprint arXiv:1703.00573.

We show that training of generative adversarial networks (GANs) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics. However, generalization does occur for a weaker metric called neural net distance. It is also shown that an approximate pure equilibrium exists in the discriminator/generator game for a special class of generators with natural training objectives when generator capacity and training set sizes are moderate. This existence of equilibrium inspires the MIX+GAN protocol, which can be combined with any existing GAN training, and is empirically shown to improve some of them.


[22] Zhu, J. Y., Zhang, R., Pathak, D., Darrell, T., Efros, A. A., Wang, O., & Shechtman, E. (2017). Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems (pp. 465-476).

Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.


[23] Odena, A., Olah, C., & Shlens, J. (2016). Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585.

In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128 × 128 resolution image samples exhibiting global coherence. We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models. These analyses demonstrate that high resolution samples provide class information not present in low resolution samples. Across 1000 ImageNet classes, 128 × 128 samples are more than twice as discriminable as artificially resized 32 × 32 samples. In addition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data.


[24] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017. arXiv:1703.10593 [cs.CV]

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G:X→Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F:Y→X and introduce a cycle consistency loss to push F(G(X))≈X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
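
The cycle-consistency idea above (F(G(X)) ≈ X) can be sketched in a few lines of Python; the mappings below are illustrative stand-ins for the paper's learned generators, not its actual implementation:

```python
# Toy illustration of the cycle-consistency loss from CycleGAN [24].
# G and F here are plain functions standing in for the two learned
# generators; in the paper both are neural networks trained jointly
# with adversarial losses.

def cycle_consistency_loss(G, F, xs, ys):
    """L1 cycle loss: mean |F(G(x)) - x| plus mean |G(F(y)) - y|."""
    forward = sum(abs(F(G(x)) - x) for x in xs) / len(xs)
    backward = sum(abs(G(F(y)) - y) for y in ys) / len(ys)
    return forward + backward

# With exact inverses the loss is zero; with a mismatched pair it grows.
G = lambda x: 2.0 * x   # "translate" X -> Y
F = lambda y: y / 2.0   # inverse mapping Y -> X

print(cycle_consistency_loss(G, F, [1.0, 2.0], [4.0, 6.0]))  # 0.0
```

The loss penalizes any pair (G, F) that is not mutually inverse, which is what discourages the many-to-one collapse described in the abstract.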


Paper Study:

- Presenter: Prepare slides to present the details of the paper

- Summary: Write a 1-2 page summary covering the 3Ws, pros/cons, and arguments to support your claims

- Q&A: Prepare 3-5 questions and potential answers to lead the discussion after the presentation


Schedule (Subject to Change)

- 2/28 Peace Memorial Day

- 3/7 Dueling Neural Networks: GAN introduction and paper bidding (Fang)

- 3/14 TensorFlow: TensorFlow [get started] and GAN implementation (DCGAN)

    HW1: 3W of 6 self-selected papers

- 3/21 GAN:

    HW2: 3W + Pros/Cons

    [3] Generative adversarial nets. (何, 蕭, 存) (Presentation, Summary, Q&A)

    [20] Good semi-supervised learning that requires a bad gan. (余, 佳, 仲) (Presentation, Summary, Q&A)

- 3/28 Convolutional Neural Networks:

    HW3: 3W + Pros/Cons

    [1] A coevolutionary approach to learn animal behavior through controlled interaction. (曾, 心, 恩) (Presentation, Summary, Q&A)

    [5] Generative modeling of convolutional neural networks. (郭, 田, 佳) (Presentation, Summary, Q&A)

- 4/4 Spring Break

- 4/11 Turing Learning:

    HW4: 3W + Pros/Cons

    [4] Unsupervised representation learning with deep convolutional generative adversarial networks. (莊, 郭, 王)

    [2] Turing learning: a metric-free approach to inferring behavior and its application to swarms (翁, 恩, 韋)

- 4/18 Generative ConvNet:

    HW5: 3W + Pros/Cons

    [7] A theory of generative convnet. (田, 王, 何)

    [21] Generalization and equilibrium in generative adversarial nets (gans) (存, 青, 莊)

- 4/25 Cooperative Training:

    HW6: 3W + Pros/Cons

    [16] Coupled generative adversarial networks. (礫, 曾, 青)

    [8] Cooperative training of descriptor and generator networks. (王, 何, 蕭)

- 5/2 Advanced Learning:

    HW7: 3W + Pros/Cons

    [24] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (蕭, 莊, 郭)

    [18] SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. (韋, 翁, 田)

- 5/9 OpenAI GAN:

    HW8: 3W + Pros/Cons

    [10] Improved techniques for training gans. (佳, 礫, 高)

    [17] Deep generative image models using a Laplacian pyramid of adversarial networks. (心, 存, 翁)

- 5/16 OpenAI InfoGAN:

    HW9: 3W + Pros/Cons

    [12] Infogan: Interpretable representation learning by information maximizing generative adversarial nets. (仲, 高, 曾)

    [22] Toward multimodal image-to-image translation. (青, 韋, 余)

- 5/23 OpenAI Imitation:

    HW10: 3W + Pros/Cons

    [13] Generative adversarial imitation learning. (高, 余, 礫)

    [15] Semantic segmentation using adversarial networks. (恩, 仲, 心)

- 5/30 System Proposal

- 6/6 System Implementation I (Data Collection)

- 6/13 System Implementation II (TensorFlow)

- 6/20 System Demo

    Team A: 佳, 莊, 高, 曾, 余

    Team B: 登, 存, 田, 青, 何

    Team C: 仲, 韋, 礫, 郭

    Team D: 蕭, 翁, 王, 恩, 心

- 6/27 Final Report Due


Grading Policy

- Participation (30%) and HWs (40%): Attendance, Presentation, Discussion and Paper Study

- System Demo (30%): Completeness, Functionality, Scalability

- Late HW policy: -1% per day after the due date


================================================================

Information System Development: Smart contract and program analysis (Spring 2018)

Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Contact: yuf@nccu.edu.tw


Meeting: Wednesday 1:10-4:00pm@ 260312, College of Commerce Building


Course Objective

Blockchain has drawn a lot of attention in the past few years, and its applications are increasingly widespread in our lives. In short, a blockchain is a globally shared, transactional database. This means that everyone can read entries in the database just by participating in the network. To change the state of the database, you have to create a transaction that is accepted by all others, and while a transaction is being applied to the database, no other transaction can alter it. Smart contracts are programs that execute on blockchains, so ensuring their correctness is of the first priority. In this six-week seminar, we will go through 1) smart contract development with Solidity and 2) program analysis and verification techniques. Students are expected to develop smart contract applications and apply program analysis techniques to verify their properties.
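
The append-only, tamper-evident character of this shared database can be illustrated with a toy Python sketch (a hash-chained list of transaction batches; this is a teaching aid, not how Ethereum is actually implemented):

```python
import hashlib
import json

# Each block commits a batch of transactions and chains to its
# predecessor by hash, so altering accepted history breaks every
# later link.

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, transactions):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "txs": transactions})
    return chain

def verify(chain):
    """Every block must reference the hash of the block before it."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
print(verify(chain))                  # True
chain[0]["txs"][0]["amount"] = 500    # tamper with accepted history
print(verify(chain))                  # False
```

Real blockchains add consensus, signatures, and gas accounting on top of this basic hash-chaining idea.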

I. Smart contract development with Solidity

- Introduction to smart contracts: A contract in the sense of Solidity is a collection of code (its functions) and data (its state) that resides at a specific address on the Ethereum blockchain.

- Before you start to write Solidity: Solidity is a high-level language whose syntax is similar to that of JavaScript, and it is designed to compile to code for the Ethereum Virtual Machine. Here is the Installation. Here are some Examples.

- Write your own contracts with Solidity: Solidity in Depth. Solidity supports import statements that are very similar to those available in JavaScript. Contracts in Solidity are similar to classes in object-oriented languages. Each contract can contain declarations of State Variables, Functions, Function Modifiers, Events, Struct Types, and Enum Types. Furthermore, contracts can inherit from other contracts. Solidity is a statically typed language, which means that the type of each variable (state and local) needs to be specified at compile time. Solidity provides several elementary types which can be combined into complex types.
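
As a rough analogy for the contract-as-class idea (state variables plus functions that operate on them), here is a hypothetical Python sketch of a token-like object; real Solidity contracts add blockchain semantics (accounts, gas, transactions, events) that this analogy omits:

```python
# Hypothetical analogy only: a "contract" as a class with state and
# functions. The names SimpleToken/transfer are illustrative, not a
# real Solidity example.

class SimpleToken:
    def __init__(self, supply):
        self.balances = {"owner": supply}   # ~ a state variable (a mapping)

    def transfer(self, sender, receiver, amount):   # ~ a contract function
        if self.balances.get(sender, 0) < amount:
            return False                    # ~ a failed require(...) check
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        return True

t = SimpleToken(100)
print(t.transfer("owner", "alice", 30))   # True
print(t.balances["alice"])                # 30
```

In Solidity the same shape appears as a `contract` with a `mapping` state variable and a `transfer` function, with every call executed as a blockchain transaction.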


II. Program analysis and verification on smart contracts

-Book: String analysis for software verification and security

[Springer][Bookmetrix]

-Papers:

1. Making Smart Contracts Smarter. CCS 2016 [paper]

2. Online Detection of Effectively Callback Free Objects with Applications to Smart Contracts. POPL 2018 [paper]

3. Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL. POPL/CPP 2018 [paper]


Schedule (Subject to Change)

- 3/7 Introduction (Fang) and Smart Contract Introduction (Steve)

- 3/14 Smart Contract [Solidity]

- 3/21 Symbolic Execution and Static Analysis [Book]

- 3/28 Smart Contract Verification [Papers]


Grading Policy

- Participation/HWs (30%+40%): Attendance, Presentation, Discussion and Paper Study

- System Implementation (30%): Completeness, Functionality, Scalability

- Late HW policy: -1% per day after the due date


================================================================

Data Structures (Fall 2017)

Instructor: 郁方 (Yu, Fang) Office:261113 (Commerce Building, 11F) Ext: 81113

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 1:10-4:00pm (English Session) 

Lecture location: Commerce Building 207/311 (Chinese/English Session)

TAs: Leo Fang and Tina Tien

Lab times: Monday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


Announcements

  1. [Makeup Exam] We will have the makeup exam on Jan 11 3:00-5:00pm.

  2. [Programming Test] We will have a programming test on Nov. 6 in the lab.

  3. [Article] School children to be taught how to write software. [read more]

  4. [News] The first lab is scheduled on Monday Sep. 18th.

  5. [Article] Web Search Data can be a key tool!

  6. [Alarm] Find a team (3-5 students) and send the list (name and contact) to the TA before Oct. 1.

  7. [News] Eclipse has been installed in the PC and Mac Classrooms!


================================================================

Advanced Information System Development: Programming with Scala (Spring 2017)

Instructor: 郁方 (Yu, Fang) Office: 261113 (College of Commerce, 11F) Ext: 81113

Contact: yuf@nccu.edu.tw


Meeting: Monday 6:10-9:00pm@ 260311, College of Commerce Building


Course Objective

“Scala takes powerful features from object-oriented and functional languages, and combines them with a few novel ideas in a beautifully coherent whole.” by Neal Gafter

In this nine-week tutorial course, we will go through the Scala online courses together, learning and practicing programming in this modern functional, object-oriented, and inherently concurrent programming language.


The first thing that you probably would like to do is watch “Working Hard to Keep it Simple” by Martin Odersky [Youtube].


Now you probably cannot wait to try Scala. Here are the instructions to set up Scala by Martin Odersky. [ToolSetUp]


Other online resources can be found here.

- Video Lectures of “Functional Programming Principles in Scala” by Martin Odersky [Coursera]

- Scala School

- A tutorial of Scala [ScalaDoc]

- Scala Library API [ScalaApi]

- Scala by Example [ScalaByExample by M. Odersky]

- Scala Cookbook [ScalaCookbook by A. Alexander]


Reference Textbook

Programming in Scala, Martin Odersky, Lex Spoon, and Bill Venners, Artima press 2012. [pdf]



Schedule (Subject to Change)

- 2/20 Introduction and Setup (Fang)

- 3/6 Functions and Evaluations (Steve, Peter) [slides by M. Odersky]

- 3/13 Higher Order Functions (MongKang, YuanTing) [slides by M. Odersky]

- 3/20 Data and Abstraction (YenLing, RayFeng) [slides by M. Odersky]

- 3/27 Types and Pattern Matching (AiJue, ZhiShiang) [slides by M. Odersky]

- 4/10 Lists (PaoHsiuang, Li) [slides by M. Odersky]

- 4/17 Structural Induction (YiFang) [slides by M. Odersky]

- 4/24 Parallel Computation: Actor and Spark with Scala (ZhiHuei, ChiaRun)


Grading Policy


- Participation (50%): Attendance, Presentation, Discussion and Paper Study

- HWs/System Implementation (50%): Completeness, Functionality, Scalability

- Late HW policy: -1% per day after the due date



================================================================

Data Structures (Fall 2015)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 2:10-5:00pm (English Session) 

Lecture location: Commerce Building 207/311 (Chinese/English Session)

TAs: 陳晉杰 and 林君翰 john.lin0420 [at] gmail.com

Lab times: Friday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


Announcements

  1. [Demo] The project demo has been scheduled on Jan. 14. Please bring your final report and be ready at least 30 minutes early.

  2. Makeup exam has been scheduled on Jan. 14, 3:00-5:00pm at College of Commerce 311.

  3. Program testing is scheduled on Nov. 6, MIS 5F. You are required to write code from scratch on your own.


================================================================

Mobile Computation (Fall 2015)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf [at] nccu.edu.tw

Co-Instructor: 蔡瑞煌 (Tsaih, Rua-Huan) Office: 1036 (Commerce Building 10F) Ext: 81036

Contact: tsaih [at] mis.nccu.edu.tw

IBM consultants: Rebecca Chen, Pei-Yi Lin, Andy Wu and Lisa Chen


Meeting: Friday 9:10-12:00am @ the MIS Mac Classroom


Course Objective

This is a project-oriented course for senior students to experience how to realize an idea in practice with modern PaaS cloud services. The objective is to provide students with solid technical training on how to develop novel applications with mobile and cloud computation. In particular, we are interested in (but not limited to) applications of the Internet of Things (IoT), wearable devices, and crowdsourcing on mobile devices, such as distributing mobile bandwidth and computation power. The course features nine hands-on labs offered by senior IBM consultants with Bluemix, an advanced commercial PaaS service developed by IBM.


Students who participate in this course will have the chance to learn the most advanced techniques in mobile computation, as well as gain hands-on experience developing innovative applications with free access to the IBM Bluemix cloud platform.


Introduction


T1: Crowdsourcing with mobile devices

To be the most powerful computation provider without machines

- Crowd computing and human computation algorithms [a talk from MIT]

- Mobile edge computation [a white paper from ETSI]

- IOT and distributed mesh computation [a cloud summit talk from HP, and a related blog]


T2: An IOT application with green boxes

To enjoy a green life with an in-house fun farm

- The power of green: the farm of the future [NG], plant factory [status]

- Getting started with Arduino [intro]

- Building Arduino [github]


Teams:

Team 1 [ITI Lab]: 吳佳真, 林雋鈜, 陳昱銘, 連茂棋, 林子翔

Team 2 [不知所雲]: 許文凱, 李孟庭, 徐政哲, 余彥儒

Team 3 [綠.合栽]: 許峻基, 林均泰, 田韻杰, 王韋仁, 林君翰

Team 4 [T-Jieba]: 吳恆毅, 黃兆椿, 劉其峰, 鍾郁婕, 顏照銓, 張詠言  

Team 5 [Gnafuy]: 陳晉杰, 沈玟萱, 楊碩雕, 邱垂暉,  李佳倫, 褚宣凱

Team 6 [OnBoard]: 胡懷之, 厲菀之, 李明緯, 黃存宇, 林韋廷


Schedule (Subject to Change)

- Sep. 18: Introduction to IOT and crowdsourcing application development (NCCU)

- Sep. 25: Topic discussion: IOT and distributed mobile computation, and team 6-min pitches (NCCU)

    Topics: 拼圖旅行[1], 群眾斷詞[4], 老人行蹤[2], 分群運算[5], 機不可失[6], 智能植栽[3]

- Oct. 2: Bluemix intro (IBM) [about bluemix by Rebecca]

- Oct. 16: Bluemix IOT case study with simulator (IBM)

- Oct. 23: Proposal and skill discussion (NCCU)

- Oct. 30: Mobile Boilerplate Starter: Lab 1 on crowdsourcing (IBM)

- Nov. 6: Mobile App Development: Lab 2 on crowdsourcing (IBM)

- Nov. 13: Arduino: Lab 3 on IOT (IBM)

- Nov. 20: IOT Arduino application development practice with fun farm and crowdsourcing: distributed computation (NCCU)

- Nov. 27: Mid-project prototype demo (IBM)

- Dec. 4: Lab 4 on Geospatial analytics service (IBM)

- Dec. 11: Lab 5 on IOT: MQTT on mobile phone (IBM)

- Dec. 18: IOT communication (NCCU)

- Dec. 25: Crowdsourcing: app and mobile execution (NCCU)

- Jan. 8: System demo and discussion (IBM)

- Jan. 15: Final system due (NCCU)

- Jan. 25: Final report due


Grading Policy

- Participation (30%): Attendance, Presentation, Discussion

- Project/System Implementation (70%): Creativity, Completeness, Functionality


================================================================

Cloud Computation (Spring 2015)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw


Meeting: Tuesday 9:10-12:00am @ the MIS Mac Classroom


Course Objective

Design patterns are reusable, domain-independent solutions to recurring problems. In this course, we will discuss several design patterns for MapReduce programs that are commonly used in distributed cloud computation for large-scale, data-related problem solving. Learning these design patterns provides a new way of thinking with MapReduce and also lays a foundation for higher-level tools such as Pig and Hive.
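
As a reference point, the canonical word-count pattern can be simulated locally in plain Python (no Hadoop; the map, shuffle/sort, and reduce phases are spelled out explicitly as a sketch of the model, not as Hadoop's actual API):

```python
from itertools import groupby
from operator import itemgetter

# Local simulation of the MapReduce word-count pattern:
# map emits (word, 1) pairs, a sort stands in for the framework's
# shuffle, and reduce sums each key's group.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    pairs = sorted(pairs, key=itemgetter(0))   # shuffle/sort stand-in
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

counts = reduce_phase(map_phase(["the cat", "the dog the bird"]))
print(counts["the"])   # 3
```

In real Hadoop the same logic is split across a `Mapper` and a `Reducer` class, and the framework performs the grouping across machines.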


This course will cover several design pattern categories including:

- Summarization

- Filtering

- Data Organization

- Joins

- Metapatterns

- Input and output


For each design pattern, we will discuss its intention, structure, implementation, and applicability. Students will learn the fundamentals of distributed computation and gain intensive programming experience by developing Hadoop applications with these patterns.


Reference Textbook

T1: MapReduce Design Patterns

Building Effective Algorithms and Analytics for Hadoop and Other Systems [ebook]

By Donald Miner, Adam Shook

November 2012


T2: HBase Design Patterns

Design and implement successful patterns to develop scalable applications with HBase [ebook]

By Mark Kerzner, Sujee Maniyam

December 2014


Team

A: Steve, William, Roy, and David

B: JinJie, YuChieh, TingHung, ChiaYi

C: Wayne, JheWei, Hank, YaYun

D: JuiFeng, Henry, BoChun, ShenChien, John


Schedule (Subject to Change)

- 2/24 Introduction to software design patterns

- 3/3 MapReduce and Hadoop refresh and installation

    (1) Hadoop Installation [William’s note][William’s video] and Cloudera Installation [note] [William’s step-by-step video]

    (2) HW1: Eng-word counting: modify the word-counting example so that it counts words after removing non-English ([^A-Za-z]) characters. (Due on 3/10, Checked: A, B, C, D)

- 3/10 A: Summarization Patterns: Numerical (min, max, average, median, standard deviation), Inverted Index and Counting [Lec1]

    (1) HW2: Find the “median word” (the word whose appearance count is the median) in the Hunger Games book. (Due on 3/17, Checked: B, D, (A-1), (C-7))

- 3/17 B: Filtering Patterns: (Bloom) Filtering, Top Ten, Distinct [Lec2]

    (1) HW3: Find the “top ten” words (words whose appearance counts are in the top 10) in the Hunger Games book. (Due on 3/24, Checked: A, B, C, D)

- 3/24 C: Data Organization Patterns I: Hierarchy, Partition and Binning [Lec3]

    (1) HW4: Count hotkey words with a Bloom filter. [hotkeys] [assembly] (Due on 3/31, Checked: A, B, C, D)

- 3/31 D: Data Organization Patterns II: Sorting and Shuffling

    (1) HW5: Sort hotkeys by their counts. [assembly] (Due on 4/7)

    (2) Project topic discussion (Due on 4/7, Checked: B, D, (A-1), (C-20))

- 4/7, 4/14 A: Join Patterns: Reduce-Side Join, Replicated Join, Composite Join, Cartesian Product [Lec4]

    (1) HW6: Find the inner and anti join of hotkeys in the assembly of app categories. [hotkeys] [assembly] (Due on 4/14, Checked: A, B, D, (C-6))

    (2) Projects - A: Call sequence counting on apps. B: Ubike flow analysis. C: Music pattern extraction. D: App external connection discovery

- 4/21 B: Metapatterns: Job Chaining, Chain Folding and Job Merging [Lec5]

    (1) HW7: Compute the call variance of app categories and compare them with inner joins. [hotkeys] [assembly] (Due on 4/28, Checked: (A-6), B, D)

- 4/28 C: Input and Output Patterns I: Input and Output in Hadoop, and Data Generation [Lec6]

    (1) HW8: Compute the call range on variance with 95% confidence for app categories. (Due on 5/5, Checked: A, B, D)

- 5/5 D: Input and Output Patterns II: External Source Input and Output, Partition Pruning

- 5/12 IBM: Running Java programs on Bluemix [Labs by Tony Yang]

- 5/26 IBM: IoT App on Bluemix [Labs by Iris]

- 6/2 IBM: Deploying and using the Hadoop environment on Bluemix [Labs by Rebecca]

- 6/9 IBM: Mobile app development with Bluemix [Labs by PeiYi]

- 6/16 Project Discussion and Demonstration

    (1) 9:10-9:40 What apps do may not be what you think

    (2) 9:50-10:20 Bike Ubike

    (3) 10:30-11:00 Let’s listen to the music

    (4) 11:10-11:40 Where do apps connect?

- 6/23 Makeup demo if needed [Final Project Report due on June 23]
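
For the Filtering pattern used in HW4, a minimal Bloom filter can be sketched in pure Python (the size and hash scheme below are illustrative choices, not tuned values, and a real Hadoop job would carry the bit array to mappers via the distributed cache):

```python
import hashlib

# Minimal Bloom filter for the Filtering pattern (cf. HW4):
# membership tests may give false positives but never false
# negatives, so it works as a cheap pre-filter before an exact check.

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

hot = BloomFilter()
for key in ["objc_msgSend", "malloc"]:   # hypothetical hotkey names
    hot.add(key)
print(hot.might_contain("malloc"))       # True
print(hot.might_contain("printf"))       # almost certainly False
```

The false-positive rate is controlled by the bit-array size and the number of hash functions relative to the number of inserted keys.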


Grading Policy

- Participation (20%): Attendance, Presentation, Discussion

- HWs (40%): Design Pattern Implementation

- Project/System Implementation (40%): Creativity, Completeness, Functionality

- Late HW policy: -1% per day after the due date


Links that might be useful

  1. MapReduce Design Patterns by Barry Brumitt [slides]


================================================================

Data Structures (Fall 2014)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf [at] nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 2:10-5:00pm (English Session) 

Lecture location: Commerce Building 306 (Chinese/English Session)

TAs: 王韋仁 shadow25251[at] gmail.com, 林君翰 john.lin0420 [at] gmail.com

Lab times: Friday 12:10-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


================================================================

Seminar on Advanced Cloud Computing Techniques (Fall 2014)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw


Meeting: Friday 10:10am-12:00  @ Room 306, College of Commerce


Course Objective

This is a graduate level seminar course, focusing on the following research topics:

- VISO: Implementation of virtualization introspection techniques with OpenStack [Bitblaze]

- AppReco: Implementation of a mobile application recommendation system [CHABADA]

- BinFlow: Flow construction of iOS executables [MoCFI]

- MIA: Multimedia intelligent authentication [OpenPuff]

- CUC: Cloud-based unsupervised clustering [GHSOM]

- Hackers: Practicing and profiling hacking skills [OWL]


Projects: 連體 . 行善.  放下.


References:

- Control-flow restrictor: compiler-based CFI for iOS. ACSAC’13 [pdf]

- Jekyll on iOS: When Benign Apps Become Evil. Usenix Security 2013 [pdf]

- AirBag: Boosting Smartphone Resistance to Malware Infection. NDSS’14 [pdf]

- A cooperative botnet profiling and detection in virtualized environment. CNS’13 [pdf]

- BitBlaze: A New Approach to Computer Security via Binary Analysis. ICISS’08 [pdf]

- Collaborative verification of information flow for a high-assurance app store. UW’14 [pdf]

- On the feasibility of large-scale infections of iOS devices. Usenix’14 [pdf]


================================================================

Advanced Programming with Scala (Spring 2014)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw


Joint Instructor (類神經網路與雲端系統實作): 蔡瑞煌(Tsaih, Rua-Huan)

Contact: tsaih@mis.nccu.edu.tw


Joint Instructor (進階資訊系統研發): 劉文卿 (Liou, Wenqing)

Contact: w_liou@nccu.edu.tw

Meeting: Tuesday 6:10-9:00pm/Wednesday 9:10-12:00am @ the MIS Mac Classroom


Course Objective

The objective of this course is to develop scalable analysis tools with neural network techniques. Students are required to form a team, working together on one of the artificial intelligence topics with Scala.


- Multilayer Perceptron (MLP) [slides]

- Recurrent neural network (RNN) and Echo-state network (ESN) [pdf]

- Hopfield neural network (HNN) [pdf]

- Cellular neural network (CNN) [pdf]

- Self-organizing map (SOM) [slides] [pdf] [SOM and LVQ packages]

- Learning vector quantization (LVQ) [slides][knn]

- Support vector machine (SVM) [slides][libsvm]


Team Lists (類神經網路與雲端系統實作)

- A Team: Max Lin, Tsay-Zeng Lu, William Wang

- B Team: John Lin, Zhao-Bin Yeh, Debbie Lin

- C Team: Hank Lee, Yayuan Peng


Test Inputs (table, data)


Schedule (類神經網路與雲端系統實作:Subject to Change)

- 2/19 Scala: Introduction and Setup by Fang Yu

- 2/26 Scala: Functions and Evaluations (discussion led by A) [slides by M. Odersky]

- 3/5 Scala: Functions and Evaluations (A) / Higher Order Functions (B) [slides by M. Odersky]

- 3/12 Scala: Higher Order Functions (B)

- 3/19 Scala: Data and Abstraction (C) [slides by M. Odersky]

- 3/26 Scala: Types and Pattern Matching (A) [slides by M. Odersky]

- 4/2 Scala: Pattern Matching

- 4/9 Scala: Lists (B)

- 4/16 Scala: Lists and Maps (B) [slides by M. Odersky] [HW1 Due: Currying, Union, IntSetToList]

- 4/23 NN: Structural Induction [slides by M. Odersky], Self Organizing Map - Execution Phase

- 4/30 NN: Self Organizing Maps - Learning Phase [HW2 Due: Expression Simplification]

- 5/7 NN: Backward Propagation - Execution Phase

- 5/14 NN: Backward Propagation - Learning Phase [HW3 Due: Self Organizing Maps]

- 5/21 NN: Backward Propagation Implementation

- 5/28 NN: Collections (C) [slides by M. Odersky], Parallel Collections [overview][slides by A. Prokopec] [HW4 Due: Backward Propagation]

- 6/4 NN: Scala Actors [tutorial], Parallel Algorithms, Implementation and Discussion [HW5 Due: Parallel Algorithms on SOM and NN]

- 6/11 NN: Resistant Learning, Outlier Detection and Envelope (by Michelle Huang) [slides by Michelle]

- 6/18 NN: Final Project Demo - Cloud Neural Network [Project Due: Parallel SOM and NN]
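
The currying exercise in HW1 targets Scala, but the underlying idea translates directly; a Python sketch for reference (the function names here are illustrative, not the HW's required signatures):

```python
# Currying: a function of two arguments becomes a chain of
# one-argument functions, each call returning a new function.

def curry(f):
    """Turn a two-argument function into its curried form."""
    return lambda x: lambda y: f(x, y)

def add(x, y):
    return x + y

add3 = curry(add)(3)       # partially applied: waits for the second arg
print(add3(4))             # 7
print(curry(add)(10)(20))  # 30
```

In Scala the same shape is written natively as `def add(x: Int)(y: Int) = x + y`, with partial application `add(3) _`.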


Grading Policy

- Participation (50%): Attendance, Presentation, Discussion and Paper Study

- HWs/System Implementation (50%): Completeness, Functionality, Scalability

- Late HW policy: -1% per day after the due date



================================================================


Data Structures (Fall 2013)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Lecture times: Thursday 9:10-12:00am (Chinese Session) / Thursday 2:10-5:00pm (English Session) 

Lecture location: 逸仙樓 5F MIS PC/Mac Classroom(Chinese/English Session)

TAs: 賴銀聖 101356041@nccu.edu.tw, 林君翰 john.lin0420@gmail.com

Lab times: Friday 12:00-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom


================================================================


Advanced Software Security (Spring 2013)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Meeting: Tuesday 2:10-4:00pm @ the MIS Mac Classroom


Course Objective

Though clouds and apps have rapidly replaced conventional computing in our lives, they are known to be fraught with security risks. In this seminar course, we will discuss modern risks of cloud computing platforms and mobile applications, the techniques that can be used to attack clouds and apps, and the defense mechanisms that prevent systems or protect users from these attacks. In particular, we have developed two sets of security tools to address the security issues of clouds and apps, called VIS and AppBeach, respectively. Students get the chance to investigate and replay modern attacks in a closed environment and use these two tools to detect and prevent the attacks, with the aim of learning and polishing security knowledge and skills in practice. At the end of this course, students should understand the basic security concepts of cloud computing and mobile apps, and be familiar with techniques that can detect (embedded) attacks in apps and alleviate potential risks of cloud computing. Students will also gain experience in practical tool development.


Specifically, students who are interested in clouds will be divided into two teams: a black team that attacks the cloud infrastructure in a closed virtualization environment, and a white team that monitors and detects abnormal behaviors of VMs within the environment. We have developed a virtual introspection system (VIS) that is able to log system calls of VMs and take actions on VMs dynamically. We also incorporate VIS with advanced artificial intelligence techniques to derive effective detection rules. The objective of the black team is to replay traditional attacks using VMs to bypass the VIS system, while the objective of the white team is to enhance VIS to defend the kernel and VMs of the cloud system against these attacks. Students who get involved will become familiar with techniques of modern security exploits and attacks on Windows and Linux systems, KVM cloud systems, virtualization, migration, and basic operations on Linux systems.


For students who are interested in apps, we have developed a tool, AppBeach (App Behavior Checker), that is able to identify and count system calls from iOS executables using static analysis and reverse engineering techniques. Students will be divided into two teams, similar to the cloud teams. The goal of the black team is to investigate malicious behaviors of apps and embed them in an iOS app to act out malicious behaviors (e.g., access and transmit user location to an external server, or obtain grants to access Facebook tokens). The black team also needs to collect and classify online apps based on their behaviors, and check whether their executables have these behaviors embedded. The white team has two goals: (1) characterize the patterns of malicious behaviors, and (2) reveal malicious behaviors embedded in apps. Students who get involved will have the chance to explore a large number of apps, and will also become familiar with techniques of app development, behavior characterization, databases and Python scripts, and tools like the otool utility and IDA Pro.
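
The call-counting idea behind AppBeach can be illustrated with a toy Python sketch; the disassembly lines and symbol names below are hypothetical, and the real tool works on otool/IDA output rather than this simplified format:

```python
import re
from collections import Counter

# Toy stand-in for AppBeach's counting step: tally the targets of
# ARM branch-with-link ("bl") instructions in a disassembly dump.
# The input format and symbol names here are invented for illustration.

def count_calls(disassembly):
    pattern = re.compile(r"\bbl\s+_(\w+)")
    counts = Counter()
    for line in disassembly:
        m = pattern.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

dump = [
    "0x1000: bl _objc_msgSend",
    "0x1008: mov r0, r4",
    "0x100c: bl _CLLocationManager_startUpdatingLocation",
    "0x1014: bl _objc_msgSend",
]
print(count_calls(dump)["objc_msgSend"])   # 2
```

A behavior signature is then a vector of such counts, which can be compared against the patterns characterized by the white team.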


Course Requirement


Graduate students will be asked to lead a team to achieve the tasks. Undergraduate students will be asked to join a team to help with data collection and tool development.


  - Participation (20%)

  - Materials (80%): Creativity/Implementation/Presentation/Report

    1. Each black team is required to present and replay three rounds of system attacks (or app malicious behaviors), starting from the third week.

    2. Each white team is required to use/modify VIS to defend against the attacks (or AppBeach to characterize behaviors and reveal them in commercial apps).

    3. The final report should detail:

       (1) Cloud Black Team: attacks - what they are and how they are replayed in VMs

       (2) App Black Team: malicious behaviors - what they are, how they are implemented in an iOS app, and the set of online apps that may (or may not) include the behaviors

       (3) Cloud White Team: how attacks can be detected and dealt with using VIS

       (4) App White Team: what the patterns of malicious behaviors are and the analysis results of the behaviors of online apps

  - Catch the flag (extra bonus 20%)

Teams that have more undefended attacks (black) or defended attacks (white) catch the flag and get an extra 20% increase on the final grade.


Teams


Cloud Security: TA- Shawn Lee (Email to swlee at swlee.org to join the team)

  1. Cloud Black Team: Wun-Jhih Huang, Wei-Shoa Tang, Tzu-Chien Chang, Chi-Feng Liu

  2. Cloud White Team: Sheng-Wei Lee, Wei-Cheng Yang, Li-Ching Chiu, Pei-Shan Chiang, Patrick Ong


App Security: TA- Steven Tai (Email to 100356023 at nccu.edu.tw to join the team)

  1. App Black Team: John Lin, Bo-Wei Tzeng, Wei-Jen Wang, Kuo-Yang Lee, Wei-Ming Tsai

  2. App White Team: Ching-Yuan Shueh, Yin-Sheng Lai, Annie Lin, Yu-Hsiun Chi


Schedule


  1. 2/19 Introduction [Lec0]

  2. 2/26 AppBeach and VIS.

        - Mobile Apps take data without permission [bits]

        - Apple Loophole gives developers access to photo [bits]

        - Including Ads in Mobile Apps poses privacy, security risks [newsroom]

        - AppBeach [slides] [python code][instruction]

  3. 3/5 Introduction to VIS and cloud security

        - Top cloud security risks by [Gartner][ComputerWeekly]

        - Cloud security issues [Sans]

        - VIS [slides]

  4. 3/12 Cloud Attack Round I and replay with VMs (Cloud Black Team)

        - N. Elhage, Virtualization under attack: Breaking out of KVM. [paper][slides][video]

        - Exploiting a CVE vulnerability [Huang]

  5. 3/19 App Malicious Behavior Round I and related commercial apps (App Black Team)

        - N. Percoco and S. Schulte, This is really not the Droid you are looking for ... [slides][video]

        - Suspicious calculator

        - Steal facebook data [John]

  6. 3/26 Cloud defense (Cloud White Team)

        - Attacks and VIS analysis results

        - Introduction of GHSOM [ghsom]

  7. 4/2 App defense and analysis results (App White Team)

        - Malicious behavior patterns [Lec6]

        - App detection and classification [Lin]

  8. 4/9 Cloud Attack Round II and replay with VMs (Cloud Black Team)

        - Campus Security Project Proposal Competition [isecurity]

        - Social engineering attacks [Willy], DDOS [Huang]

  9. 4/16 App Malicious Behavior Round II and related commercial apps (App Black Team)

        - Distributed DOS [John]

        - Get Facebook token with SDK

  10. 4/23 Cloud defense with VIS (Cloud White Team)

        - Mean classification and detection [Liching]

  11. 4/30 Cloud defense with VIS (Cloud White Team), part II

        - Cloud computing [cacm]

        - The rise of big data [fafocus]

        - Android annual malware report [techcrunch] [mcafee]

        - Dutch DDOS [computer]

  12. 5/7 App defense and analysis results (App White Team)

        - Analysis result on real apps of Round I attacks

        - Patterns of Round II [Lei]

  13. 5/14 Cloud Attack Round III and replay with VMs (Cloud Black Team)

        - Two AppBeach papers have been accepted for publication (IEEE MS 2013 and IJCNN 2013).

  14. 5/21 Guest Talk: “Modern Techniques on iOS App Development” by Michael Pan

        - Pointers, Garbage collections, and Method Calls in Objective-C [slides]

  15. 5/28 App Malicious Behavior Round III and related commercial apps (App Black Team)

        - Embedding C functions [John]

  16. 6/4 Cloud defense (Cloud White Team)

        - Technical presentation: Quantitative analysis on Cloud-based Streaming Services [scc2013]

  17. 6/11 App defense and analysis results (App White Team)

        - WWDC 2013: 50 billion downloads, 74% revenue [video]

        - Private Method Detection [Lei]

  18. 6/18 App and Cloud Demo/Discussion/Summary. Final report and video due.

        - Video on app attacks and defenses

        - Video on cloud attacks and defenses

  19. 7/1 iSecurity Project Submission


The External Reading List (subject to change)


Cloud Security:

  1. -Books:

  2. C. Hoff, R. Mogull, and C. Balding, Hacking Exposed:  Virtualization and Cloud Computing: Secrets and Solutions. [amazon]

  3. -Technical papers:

  4. Y.-S. Wu, P.-K. Sun, C.-C. Huang, S.-J. Lu, S.-F. Lai, and Y.-Y. Chen. EagleEye: Towards Mandatory Security Monitoring in Virtualized Datacenter Environment. DSN 2013. [paper]

  5. B.D. Payne, M. Carbone, M. Sharif, W. Lee. Lares: An Architecture for Secure Active Monitoring Using Virtualization. [paper]

  6. J. Somorovsky, M. Heiderich, M. Jensen, J. Schwenk, N. Gruschka, L. Lo Iacono. All Your Clouds are Belong to us-Security Analysis of Cloud Management Interfaces. [paper]

  7. A. Seshadri, M. Luk, N. Qu, A. Perrig. SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes. [paper]

  8. P. Sharma, S. K. Sood, and S. Kaur, Security Issues in Cloud Computing. [paper]

  9. F. Lombardi and R. D. Pietro, CUDACS: Securing the Cloud with CUDA-Enabled Secure Virtualization. [paper]

  10. N. Padmanabhan and B. Edwin, An Architecture for Providing Security to Cloud Resources. [paper]

  11. T. Ormandy, An Empirical Study into the Security Exposure to Hosts of Hostile Virtualized Environments. [paper]

  12. F. Lombardi, R. D. Pietro, Secure Virtualization for Cloud Computing, Journal of Network and Computer Applications. [paper]

  13. D. Zissis and D. Lekkas, Addressing Cloud Computing Security Issues, Future Generation Computer Systems. [report]

  14. G. Anthes, Security in the Cloud. Comm. ACM. [paper]

  15. T. Ristenpart, E. Tromer, H. Shacham, S. Savage, Hey, You, Get Off of My Cloud: Exploring Information Leakages in Third-Party Compute Clouds. [paper]

  16. M. Egele, T. Scholte, E. Kirda, and C. Kruegel. A Survey on Automated Dynamic Malware-analysis Techniques and Tools. [paper]

  17. Z. Zhao, G.-J. Ahn, H. Hu. Automatic Extraction of Secrets from Malware. [paper]


App Security:

  1. -News and Report

  2. The Hottest IT Skills? Cybersecurity [networkworld]

  3. Android programmers shifting toward Web apps [cnet]

  4. Apps could be overtaking the Web [technolog]

  5. iOS App downloads from Apple store achieve 25 billions [applestore]

  6. -Sites/Books:

  7. iPhone Hacks [site][book by O’Reilly]

  8. Apple Entitlement [site]

  9. iOS security development in GitHub by [nst]

  10. Objective C helper script in GitHub by [J. Duart]

  11. -Technical papers

  12. T. Werthmann, R. Hund, L. Davi, A. Sadeghi, T. Holz. PSiOS: bring your own privacy & security to iOS devices. ASIA CCS 2013. [paper]

  13. L. Lu, Z. Li, Z. Wu, W. Lee, and G. Jiang. CHEX: Statically Vetting Android Apps for Component Hijacking Vulnerabilities. CCS 2012. [paper]

  14. L. Davi, A. Dmitrienko, M. Egele, T. Fischer, T. Holz, R. Hund, S. Nurnberger, A. Sadeghi. MoCFI: A Framework to Mitigate Control-flow Attacks on Smartphones. NDSS 2012. [paper]

  15. M. Szydlowski, M. Egele, C. Kruegel, and G. Vigna. Challenges for Dynamic Analysis of IOS Applications. iNetSec2011. [paper]

  16. M. Grace, Y. Zhou, Q. Zhang, S. Zou, and X. Jiang. RiskRanker: Scalable and Accurate Zero-day Android Malware Detection. MobiSys 2012. [paper]

  17. P. Pearce, A. P. Felt, G. Nunez, and D. Wagner. AdDroid: Privilege Separation for Applications and Advertisers in Android. ASIACSS2012. [paper]

  18. W. Zhou, Y. Zhou, X. Jiang, and P. Ning. Detecting repackaged Smartphone Applications in Third-Party Android Marketplaces. CODASPY2012. [paper]

  19. M. Grace, Y. Zhou, Z. Wang, and X. Jiang. Systematic Detection of Capability Leaks in Stock Android Smartphones. NDSS 2012. [paper]

  20. M. Grace, W. Zhou, X. Jiang, and A.-R. Sedeghi. Unsafe Exposure Analysis of Mobile In-App Advertisements. WiSEC 2012. [paper]

  21. M. Becher, F. C. Freiling, J. Hoffmann, T. Holz, S. Uellenbeck, C. Wolf. Mobile Security Catching Up? Revealing the Nuts and Bolts of the Security of Mobile Devices. [paper]

  22. Y. Zhou, Z. Wang, W. Zhou, X. Jiang. Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets. [paper]

  23. Y. Zhou, X. Zhang, X. Jiang, and V. W. Freeh. Taming Information-Stealing Smartphone Applications (on Android). [paper]

  24. W. Enck, D. Octeau, P. McDaniel, and S. Chaudhuri, A Study of Android Application Security. [slides][paper]

  25. D. Wetherall, D. Coffnes, B. Greensten, S. Han, P. Hornyack, J. Jung, S. Schechter, and X. Wang. Privacy Revelations for Web and Mobile Apps. [paper]

  26. M. Grace, W. Zhou, X. Jiang, A. Sadeghi, Unsafe Exposure Analysis of Mobile In-App Advertisements. [paper]

  27. Sans Institute, Mac Malware Analysis. [paper]

  28. A. P. Felt, M. Finifter, E. Chin, S. Hanna, D. Wagner, A Survey of Mobile Malware in the Wild. [paper][slides]

  29. M. Egele, C. Kruegel, E. Kirda, G. Vigna, PiOS: Detecting Privacy Leaks in iOS Applications. [paper] (Mac)

  30. P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, Vision: Automated Security Validation of Mobile Apps at App Markets [paper] (AppInspector-Android)

  31. A. P. Felt, E. Chin, S. Hanna, D. Song and D. Wagner, Android Permissions Demystified. CCS 2011. [paper] [slides]

  32. K. Rieck, P. Trinius, C. Willems, and T. Holz, Automatic Analysis of Malware Behavior using Machine Learning. [paper]

  33. I. Burguera, U. Zurutuza, S. Nadjm-Tehrani, Crowdroid: Behavior-Based Malware Detection System for Android. [paper]

  34. A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K. A. Yuksel, S. A. Camtepe, and S. Albayrak, Static Analysis of Executables for Collaborative Malware Detection on Android. [paper]


Sanitization and Pattern Matching:

  1. OWASP injection prevention routines. [escape special characters][validation]

  2. C.-H. Lin, C.-T. Huang, C.-P. Jiang, S.-C. Chang. “Optimization of Pattern Matching Circuits for Regular Expression on FPGA” [paper]

  3. B. Livshits and S. Chong. Towards Fully Automatic Placement of Security Sanitizers and Declassifiers. POPL 2013. [paper]

  4. M. Samuel, P. Saxena, and D. Song. Context-sensitive Auto-sanitization in Web Templating Languages Using Type Qualifiers. CCS 2011. [paper]

  5. P. Saxena, D. Molnar, and B. Livshits. ScriptGard: Automatic Context-sensitive Sanitization for Large-scale Legacy Web Applications. CCS 2011. [paper]

  6. P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and Precise Sanitizer Analysis with BEK. Usenix Security Symposium 2011. [paper]

  7. OWASP. java-html-sanitizer.

  8. J. Esparza and P. Ganty. Complexity of Pattern-based Verification for Multithreaded Programs. POPL 2011. [paper]

  9. N. Provos and P. Honeyman. Hide and Seek: An Introduction to Steganography. IEEE S&P 2003. [paper]

================================================================


Data Structures (Fall 2012)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Lecture times: Tuesday 9:10-12:00am (Chinese Session) / Tuesday 2:10-5:00pm (English Session) 

Lecture location: 商院 260301, Commerce Building 3F (Chinese Session) / 學思樓 040303, Learning and thinking Building 3F (English Session)

TAs: 薛慶源 (101356020@nccu.edu.tw) and 賴銀聖 (101356041@nccu.edu.tw)

Lab times: Friday 12:00-2:00pm

Lab location: 逸仙樓 5F MIS PC Classroom



================================================================


Advanced Innovative Information Technologies (Fall 2012)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

joint with Prof. Rua-Huang Tsaih and Prof. Daniel Yuh Chao


Contact: yuf@nccu.edu.tw
Lecture times: Monday 6:10-9:00pm 

Lecture location: 260307 , Commerce Building 3F.


Course Objective: Modern Techniques on String Analysis

In this graduate-level course, we aim to offer solid training in the theoretical study of advanced innovative information technologies. In the first six weeks, Prof. Yu will talk about his recent research on string analysis, showing how a sound theoretical approach can be applied to resolve a practical security problem. The materials cover basic techniques for analyzing string-manipulating programs, security vulnerabilities in Web applications, and the details of the algorithms that detect and patch modern vulnerabilities in Web applications with string analysis.


Lectures include (references are listed below):

- An automata-based approach for analyzing string manipulating programs using symbolic string analysis. The approach combines forward and backward symbolic reachability analyses, and features language-based replacement, fixpoint acceleration, and symbolic automata encoding [Yu et al. SPIN’08, ASE’09]

- An automata-based string analysis tool: Stranger can automatically detect, eliminate, and prove the absence of XSS, SQLCI, and MFE vulnerabilities (with respect to attack patterns) in PHP web applications [Yu et al. TACAS’10]

- A composite analysis technique that combines string analysis with size analysis showing how the precision of both analyses can be improved by using length automata [TACAS’09]

- A relational string verification technique using multi-track automata: We catch relations among string variables using multi-track automata, i.e., each track represents the values of one variable. This approach enables verification of properties that depend on relations among string variables [Yu et al. CIAA’10]

- An automatic approach for vulnerability signature generation and patch synthesis: We apply multi-track automata to generate relational vulnerability signatures with which we are able to synthesize effective patches for vulnerable Web applications. [Yu et al. ICSE’11]

- A string abstraction framework based on regular abstraction, alphabet abstraction and relation abstraction [Yu et al. SPIN’11]
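The forward image computations underlying the lectures above can be sketched in miniature. The toy below over-approximates each variable's reachable values with a finite set of strings (standing in for the automata the real analysis uses) and checks whether any reachable value matches an attack pattern. The sanitizer, the attack pattern, and the set abstraction are illustrative assumptions, not the actual Stranger implementation.

```python
import re

# Attack pattern: any reachable value containing an opening script tag.
ATTACK = re.compile(r"<script", re.IGNORECASE)

def post_concat(xs, ys):
    """Post-image of x . y: all concatenations of reachable values."""
    return {x + y for x in xs for y in ys}

def post_replace(xs, old, new):
    """Post-image of replace(x, old, new) over every reachable value."""
    return {x.replace(old, new) for x in xs}

# A small sample of values standing in for "anything the user typed".
user_input = {"hello", "<script>alert(1)</script>"}

# Program 1: sanitize by deleting '<', then embed the result into the page.
sanitized = post_replace(user_input, "<", "")
page = post_concat({"<html>"}, post_concat(sanitized, {"</html>"}))
print(any(ATTACK.search(v) for v in page))    # False: the sanitizer works

# Program 2: embed the raw input; the attack pattern becomes reachable.
unsafe = post_concat({"<html>"}, post_concat(user_input, {"</html>"}))
print(any(ATTACK.search(v) for v in unsafe))  # True: vulnerable
```

In the actual analysis, sets are replaced by automata so infinite languages can be represented, fixpoint acceleration (widening) makes loops converge, and the backward pass propagates the attack pattern against the flow to compute a vulnerability signature.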


References

  1. Fang Yu, Tevfik Bultan, Ben Hardekopf. String Abstractions for String Verification. SPIN’11.

  2. Fang Yu, Muath Alkhalaf, Tevfik Bultan. Patching Vulnerabilities with Sanitization Synthesis. ICSE’11.

  3. Fang Yu, Tevfik Bultan, Oscar H. Ibarra. Relational String Analysis Using Multi-track Automata. CIAA’10.

  4. Fang Yu, Muath Alkhalaf, Tevfik Bultan. Stranger: An Automata-based String Analysis Tool for PHP. TACAS’10.

  5. Fang Yu, Muath Alkhalaf, Tevfik Bultan. Generating Vulnerability Signatures for String Manipulating Programs Using Automata-based Forward and Backward Symbolic Analyses. ASE’09.

  6. Fang Yu, Tevfik Bultan, Oscar H. Ibarra. Symbolic String Verification: Combining String Analysis and Size Analysis. TACAS’09.

  7. Fang Yu, Tevfik Bultan, Marco Cova, Oscar H. Ibarra. Symbolic String Verification: An Automata-based Approach. SPIN’08.


Students are required to conduct a solid paper study, write a survey paper, and lead the discussion on one of the following topics. Selected papers are also listed below.


-Word correction techniques [WC]

薛慶源, 林承翰, 江佩珊, 陳妍樺

-String constraint solver and string analysis tools [ST]

董亦揚, 張瑋誠, 黃鼎鈞

-Malware detection and pattern recognition [MP]

邱莉晴, 陳毅, 陳一帆

-String analysis and streaming transducers [SA]

賴銀聖, 葉博凱, 呂蔡政

References

  1. [SA] M. Veanes, P. Hooimeijer, B. Livshits, D. Molnar, N. Bjorner: Symbolic finite state transducers: algorithms and applications. POPL’12. [paper]

  2. [SA] T. Tateishi, M. Pistoia, O. Tripp, Path-and Index-sensitive String Analysis Based on Monadic Second Order Logic. ISSTA’11 [paper]

  3. [SA] M. Samuel, P. Saxena, D. Song, Context-sensitive Auto-sanitization in Web Templating Languages Using Type Qualifiers. ACM CCS’11. [paper]

  4. [SA] R. Alur, P. Cerny, Streaming Transducers for Algorithmic Verification of Single-pass List-processing Programs. POPL’11 [paper]

  5. [SA] R. Alur, J. V. Deshmukh, Nondeterministic Streaming String Transducers. ICALP’11 [paper]

  6. [SA] N. Kobayashi, N. Tabuchi, and H. Unno. Higher-order multi- parameter tree transducers and recursion schemes for program verification. POPL’10. [paper]

  7. [ST] P. Saxena, D. Akhawe, S. Hanna, S. McCamant, F. Mao, and D. Song. A Symbolic Execution Framework for JavaScript. IEEE S&P’10. [paper]

         (Kaluza String Solver: http://aerie.cs.berkeley.edu/kaluza/)

  8. [ST] P. Saxena, D. Molnar, B. Livshits. SCRIPTGARD: Automatic Context-sensitive Sanitization for Large-scale Legacy Web Applications. ACM CCS’11. [paper]

  9. [ST] M. Veanes, P. de Halleux, and N. Tillmann. Rex: Symbolic Regular Expression Explorer. ICST’10. [paper]

  10. [ST] P. Hooimeijer, W. Weimer. StrSolve: Solving String Constraints Lazily. Auto. Soft. Engineering, 2012. [paper]

  11. [WC] Y.-S. Han, S.-K. Ko, K. Salomaa. Computing the Edit-distance between a Regular Language and a Context-free Language. DLT’12. [paper]

  12. [WC] M. Mohri. Edit-Distance of Weighted Automata: General Definitions and Algorithms. IJFCS. [paper]

  13. [MP] M. Christodorescu, S. Jha. Testing Malware Detectors. ACM SIGSOFT Software Engineering Notes 29 (2004) 34-44. [paper][slides]

  14. [MP] M. Christodorescu, S. Jha. Static Analysis of Executables to Detect Malicious Patterns. USENIX Security Symposium, 2003. [paper]

  15. [MP] M. Fredrikson, S. Jha, M. Christodorescu, R. Sailer, and X. Yan. Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors. IEEE S&P 2010. [paper]

  16. [MP] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov. Learning and Classification of Malware Behavior. DIMVA’08. [paper]

  17. [MP] I. Santos, F. Brezo, X. Ugarte-Pedrero, P. G. Bringas. Opcode Sequences as Representation of Executables for Data-mining-based Unknown Malware Detection. Elsevier Information Science, 2011. [paper]


Course Requirement

  1. Participation/Quiz 30%

  2. Paper Study/Implementation 70%


Partial (Tentative) Schedule

Lecture Slides [lecture]

9/17

  1. Introduction on String Analysis

  2. Web Application Vulnerabilities

  3. Paper study topic selection


9/24

  1. How to be a good graduate student?

  2. Sanitization/Patch of Web Application Vulnerabilities

  3. Paper study lists [ST][WC][MP][SA]


10/1

  1. Quiz

  2. String Replacement, Widening, and Symbolic Encoding

  3. Forward and Backward Reachability Analyses

  4. Pre/post Images of String Operations

  5. Paper Study Presentation: Word Corrections and their applications [WC]


10/8

  1. Vulnerability Signature Generation

  2. Sanitization Synthesis

  3. Relational Analysis and Multi-track Automata

  4. Paper Study Presentation: String analysis [ST]


10/15

  1. Composite Analysis (String analysis + Integer Analysis)

  2. Paper Study: Malicious pattern recognition [MP]


10/22

  1. String Abstractions

  2. Paper Study: Streaming transducers [SA]


================================================================


Security in the Cloud and Apps (Spring 2012)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Meeting: Thursday 10:10-12:00am @ the MIS Mac Classroom


Course Objective

Cloud and Apps have rapidly replaced conventional computing in recent years, and both are known to be fraught with security risks. In this course, we will discuss modern risks of cloud computing and apps, and techniques that address these risks. We will study several selected papers, as well as implement practical tools. We will also discuss how to visualize programs for a security tool. At the end of this course, students should understand the basic security concepts of cloud computing and mobile apps, and be familiar with techniques that can detect (embedded) attacks in apps and alleviate potential risks of cloud computing. Students will also gain experience in doing research via rigorous paper study, as well as in practical tool development.


We are particularly interested in (but are not limited to) the following techniques:

1. Characterize privacy/suspicious functions and their executable patterns

2. Analyze App executables

3. Attack KVM/VMware kernel

4. Monitor VMs in the cloud

5. Visualize programs


Course Material (subject to change)

Cloud Security:

  1. -CISE Researchers discuss “Security for Cloud Computing” [article]

... as we enter the mass-market for cloud computing services, the security and privacy of those services will become first-class features that ensure broad usability and deployment.

  1. -Top cloud security risks by [Gartner][ComputerWeekly]

  2. -An open cloud project [slides]

  3. -Books:

  4. C. Hoff, R. Mogull, and C. Balding, Hacking Exposed:  Virtualization and Cloud Computing: Secrets and Solutions. [amazon]

  5. -Technical papers:

  6. B.D. Payne, M. Carbone, M. Sharif, W. Lee. “Lares: An Architecture for Secure Active Monitoring Using Virtualization” [paper]

  7. J. Somorovsky, M. Heiderich, M. Jensen, J. Schwenk, N. Gruschka, L. Lo Iacono. “All Your Clouds are Belong to us-Security Analysis of Cloud Management Interfaces” [paper]

  8. A. Seshadri, M. Luk, N. Qu, A. Perrig. “SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes” [paper]

  9. P. Sharma, S. K. Sood, and S. Kaur, Security Issues in Cloud Computing. [paper]

  10. F. Lombardi and R. D. Pietro, CUDACS: Securing the Cloud with CUDA-Enabled Secure Virtualization. [paper]

  11. N. Padmanabhan and B. Edwin, An Architecture for Providing Security to Cloud Resources. [paper]

  12. T. Ormandy, An Empirical Study into the Security Exposure to Hosts of Hostile Virtualized Environments. [paper]

  13. F. Lombardi, R. D. Pietro, Secure Virtualization for Cloud Computing, Journal of Network and Computer Applications. [paper]

  14. D. Zissis and D. Lekkas, Addressing Cloud Computing Security Issues, Future Generation Computer Systems. [report]

  15. G. Anthes, Security in the Cloud. Comm. ACM. [paper]

  16. T. Ristenpart, E. Tromer, H. Shacham, S. Savage, Hey, You, Get Off of My Cloud: Exploring Information Leakages in Third-Party Compute Clouds. [paper]

  17. N. Elhage, Virtualization under attack: Breaking out of KVM. [paper][slides][video]


App Security:

  1. -News and Report

  2. Hottest IT Skills? Cybersecurity [networkworld]

  3. Android programmers shifting toward Web apps [cnet]

  4. Apps could be overtaking the Web [technolog]

  5. Including Ads in Mobile Apps poses privacy, security risks [newsroom]

  6. iOS App downloads from Apple store achieve 25 billions [applestore]

  7. Mobile Apps take data without permission [bits]

“ While Apple says it prohibits and rejects any app that collects or transmits users’ personal data without their permission, that has not stopped some of the most popular applications for the iPhone, iPad and iPod — like Yelp, Gowalla, Hipster and Foodspotting — from taking users’ contacts and transmitting it without their knowledge.”

  1. Apple Loophole gives developers access to photo [bits]

“After a user allows an application on an iPhone, iPad or iPod Touch to have access to location information, the app can copy the user’s entire photo library, without any further notification or warning, according to app developers.”

  1. -Sites/Books:

  2. iPhone Hacks [site][book by O’Reilly]

  3. -Technical papers

  4. M. Becher, F. C. Freiling, J. Hoffmann, T. Holz, S. Uellenbeck, C. Wolf. Mobile Security Catching Up? Revealing the Nuts and Bolts of the Security of Mobile Devices. [paper]

  5. Y. Zhou, Z. Wang, W. Zhou, X. Jiang. Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets. [paper]

  6. Y. Zhou, X. Zhang, X. Jiang, and V. W. Freeh. Taming Information-Stealing Smartphone Applications (on Android). [paper]

  7. W. Enck, D. Octeau, P. McDaniel, and S. Chaudhuri, A Study of Android Application Security. [slides][paper]

  8. D. Wetherall, D. Coffnes, B. Greensten, S. Han, P. Hornyack, J. Jung, S. Schechter, and X. Wang. Privacy Revelations for Web and Mobile Apps. [paper]

  9. M. Grace, W. Zhou, X. Jiang, A. Sadeghi, Unsafe Exposure Analysis of Mobile In-App Advertisements. [paper]

  10. Sans Institute, Mac Malware Analysis. [paper]

  11. A. P. Felt, M. Finifter, E. Chin, S. Hanna, D. Wagner, A Survey of Mobile Malware in the Wild. [paper][slides]

  12. M. Egele, C. Kruegel, E. Kirda, G. Vigna, PiOS: Detecting Privacy Leaks in iOS Applications. [paper] (Mac)

  13. P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, Vision: Automated Security Validation of Mobile Apps at App Markets [paper] (AppInspector-Android)

  14. A. P. Felt, E. Chin, S. Hanna, D. Song and D. Wagner, Android Permissions Demystified. CCS 2011. [paper] [slides]

  15. K. Rieck, P. Trinius, C. Willems, and T. Holz, Automatic Analysis of Malware Behavior using Machine Learning. [paper]

  16. I. Burguera, U. Zurutuza, S. Nadjm-Tehrani, Crowdroid: Behavior-Based Malware Detection System for Android. [paper]

  17. A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K. A. Yuksel, S. A. Camtepe, and S. Albayrak, Static Analysis of Executables for Collaborative Malware Detection on Android. [paper]


Visualizing Programs:

-Books

  1. Visualizing Data by Ben Fry, O’Reilly Media, Dec. 2007

  2. -Links

  3. 29 Sexy iPhone App Design [link]

  4. A Graph Visualization Software [Graphviz: dot]

  5. -Technical papers

  6. W. D. Pauw, E. Jensen, N. Mitchell, Visualizing the Execution of Java Programs. [paper]

  7. M. Kersten, G. C. Murphy, Using Task Context to Improve Programmer Productivity. [paper]

  8. A. Bragdon, S. P. Reiss, R. Zeleznik, S. Karumuri, W. Cheung, J. Kaplan, C. Coleman, F. Adeputra, and J. J. LaViola, Code Bubbles: Rethinking the User Interface Paradigm of Integrated Development Environments. [paper]

  9. M. J. Pacione, Software Visualization for Object-Oriented Program Comprehension, [paper] [full report]

  10. J. A. Jones, M. J. Harrold, J. Stasko, Visualization of Test Information to Assist Fault Localization. [paper] [slides]

  11. H. Ahmadi, J. Kong, User-centric Adaptation of Web Information for Small Screens. [paper]

  12. P. Gross, J. Yang, and C. Kelleher, Dinah: An Interface to Assist Non-Programmers with Selecting Program Code Causing Graphical Output. [paper]

  13. P. Gross and C. Kelleher, Non-programmers Identifying Functionality in Unfamiliar Code: Strategies and Barriers. [paper]

  14. T.-H.  Chang, T. Yeh, R. Miller, Associating the Visual Representation of User Interfaces with their Internal Structures and Metadata. [paper]

  15. P. Dragicevic, S. Huot, F. Chevalier, Animating from Markup Code to Rendered Documents and Vice Versa. [paper]

  16. T.  Karrer, J.-P. Kramer, J. Diehl, B. Hartmann, J. Borchers, Stacksplorer: Call Graph Navigation Helps Increasing Code Maintenance Efficiency. [paper]


Miscellaneous:

-News

  1. Dr. Chao-I Che’s visit [asc3D][visualsize][isvc paper]

  2. iSecurity Project Competition [link]

  3. Software Engineer is ranked No. 1 of the best ten jobs in 2012 [ws journal]


-Media Security

  1. J. K. Paruchuri and S.-C. S. Cheung. Joint Optimization of Data Hiding and Video Compression. [paper]

  2. A. K. Bhaumik, M. Choi, R. J. Robles, and M. O. Balitanas. Data Hiding in Video. [paper]

  3. X. Quan and H. Zhang, Data Hiding in MPEG Compressed Audio Using Wet Paper Codes. [acmdl]

  4. B.-Y. Lei, K.-T. Lo, J. Feng. Digital Watermarking Techniques for AVS Audio. [paper]

  5. M. Wu and Bede Liu. Multimedia Data Hiding. [book]

  6. N. Memon and P.W. Wong, Protecting Digital Media Contents. [paper]

  7. C.-S. Lu and H.-Y. Mark Liao, Multipurpose Watermarking for Image Authentication and Protection. [paper]

  8. J. J. Chae and B. S. Manjunath, Data Hiding in Video. [paper]

  9. M. Wu and B. Liu. Data Hiding in Image and Video: Part I-Fundamental Issues and Solutions. [paper]

  10. M. Wu and B. Liu. Data Hiding in Image and Video: Part II-Designs and Applications. [paper][summary][slides]

  11. D.  Mukherjee, J. J. Chae, S. K. Mitra. A Source and Channel-coding Framework for Vector-Based Data Hiding in Video. [paper]

  12. A. S. Abbass, E. A. Soleit, and S. A. Ghoniemy. Blind Video Data Hiding Using Integer Wavelet Transforms. [paper]

  13. E. T. Lin, A. M. Eskicioglu, R. L. Lagendijk, E. J. Delp. Advances in Digital Video Content Protection. [paper]

  14. More references by visionlab@ucsb [link]



Course Requirement


Graduate students will be asked to present and lead the discussion of research papers in their field.

We also expect the graduate students to lead a team to develop related systems/tools. Undergraduate students will be asked to join one team to help with tool development.


  - Participation (20%)

  - Paper Study (40%)

    1. Each team will present 6-8 papers.

    2. For each paper, the team needs to turn in a one-page review in English, including a summary, advantages, disadvantages, and a comparison against your own work.

  - System Development (40%)

    1. Each team needs to present its progress biweekly.

    2. Each team needs to turn in a proposal (in the middle of the semester) and a final report (at the end of the semester).


Teams


Cloud Security (C-Team)

  1. Sheng-Wei Lee (Email to swlee@swlee.org to join the team)

  2. You will be familiar with Cloud Architecture/Implementation, Linux, KVMs, VMware, Shell script, Python, System breaches, VM Monitoring 


App Security (A-Team)

  1. Steven Tai (Email to 100356023@nccu.edu.tw to join the team)

  2. You will be familiar with iOS, Android, Objective C, Java, Reverse Engineering, Binary Analysis, Bytecode Analysis, Hadoop/Distributed Computation


Program Visualization (V-Team)

  1. I-Yang Dong (Email to 100356021@nccu.edu.tw to join the team)

  2. You will be familiar with 3D Graphics, Unity, Java, App and Web Server Development, Program Analysis, XML, User Interface


Schedule


  1. 2/24 Course and Project Introduction

  2. 3/1  [Paper Study] “A Survey of Mobile Malware in the Wild” by A-team

  3. 3/8  [Paper Study] “Addressing Cloud Security Issues” and “Virtualization under Attack: Breaking out of KVM” by C-team

  4. 3/15 [Paper Study] “User-centric Adaptation of Web Information for Small Screens” and the Visualizing Data Book Chapter 8 “Networks and Graphs” by V-team [Project Discussion] “The Web Vulnerability Patcher”  by V-team 

  5. 3/22 [Paper Study] “PiOS: Detecting Privacy Leaks in iOS Applications” by A-team [Project Discussion] “AppBeach” by A-team

  6. 3/29 [Paper Study] “Secure Virtualization for Cloud Computing” and “An Architecture for Providing Security to Cloud Resources” [Project Discussion] Detecting Malicious Behaviors of VMs by C-team

  7. 4/5 [Happy Spring Break] [Proposal Due (extended to Apr. 15)]

  8. 4/12 [Paper Study] “Visualizing the Execution of Java Programs” and “Software Visualization for Object-Oriented Program Comprehension” [Project Discussion] 3D-rize XML graphics, and Patcher online (presented in English)

  9. 4/19 [Paper Study] “Crowdroid: Behavior-Based Malware Detection System for Android”  [Project Discussion] Identifying Arguments of Obj_MsgSend in assembly by A-team

  10. 4/26 [Paper Study] “Hey, You, Get Off of My Cloud: Exploring Information Leakages in Third-Party Compute Clouds” by C-team [Project Discussion] Auditing and path tracking of VMs by C-team

  11. 5/3 [Paper Study] “Graphviz” and “Dinah: An Interface to Assist Non-Programmers with Selecting Program Code Causing Graphical Output” by V-team [Project Discussion] Visualizing Vulnerabilities in 3D and Mobile Devices

  12. 5/10 [Paper Study] “Vision: Automated Security Validation of Mobile Apps at App Markets” and “Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets” by A-team [Project Discussion] Decoding and decryption by A-team

  13. 5/17 [Paper Study] “SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes”, “All Your Clouds Are Belong to Us: Security Analysis of Cloud Management Interfaces”, and “CUDACS: Securing the Cloud with CUDA-Enabled Secure Virtualization” by C-team [Project Discussion] VIS by C-team

  14. 5/24 [Paper Study] “Visualization of Test Information to Assist Fault Localization” and “Using Task Context to Improve Programmer Productivity” by V-team [Project Discussion] Stranger with the all new interface by V-team

  15. 5/31 [Break] Work on your projects.

  16. 6/7 [System Demo and Discussion] A-team (9:10-10:00), C-team (10:10-11:00), V-team (11:10-12:00), lunch and discussion (12:10-13:00)

  17. 6/14-21 [Final Report Due] Individual Study by V-, A-, C- team



================================================================


Data Structures (Fall 2011)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Lecture times: Thursday 9:10-12:00 (Session A) / Thursday 1:10-4:00 (Session B) 

Lecture location: 逸仙樓 (Yi-Shien Building) 5F, MIS PC Classroom

TAs: 廖文成, 99356013@nccu.edu.tw (Session A) and

         邱芃瑋, 99356027@nccu.edu.tw (Session B)

Lab times: Friday 12:00-1:00 and Monday 12:00-1:00

Lab location: 逸仙樓 (Yi-Shien Building) 5F, MIS PC Classroom


================================================================


Innovative Information Technologies (Fall 2011)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

joint with Prof. Jiang and Prof. Liou


Contact: yuf@nccu.edu.tw
Lecture times: Friday 1:10-4:00  

Lecture location: 260312 , Commerce Building 3F.


Course Objective

This is a learn-by-doing course, addressing cloud and mobile application development this semester. The lectures, offered by Prof. Jiang, Prof. Yu, and Prof. Liou, will cover the basic concepts of modern techniques for cloud computing, parallel computation, and mobile application development. We will also invite speakers from industry and academia to offer advanced lectures and share their experiences. At the end of this course, students should know the basic concepts and skills of modern cloud and app techniques and feel comfortable developing mobile/cloud applications.


Students will be asked to work on a mid-size team project, with each team assigned different evaluation criteria. The project involves developing cloud/mobile applications. The criteria cover server-side mechanisms, such as distributed database management, scalable/flexible cloud server architecture, and load balancing, as well as client-side concerns, such as a friendly user interface, content trajectory, privacy leaks, and security defenses. Each team will be required to present and refine their work in the final six weeks and consult with the advisors for details.


Course Requirement

  1. Participation and Paper Presentation: 30%

  2. System Development: 70%
    - National Palace Museum Channel (Zhe-Liang Yang, Chung-Shun Wu) [iPalace]

    - Word Warehouse (Eric Huang, Wen-Yang Tzeng) [Wordnet]

    - Security Analysis of Web Applications (I-Yang Dong, Sheng-Wei Lee) [Stranger]

    - Static Analysis of App Executables (Steven Tai, Shan-Hau Ho) [Binary Analysis Platform]


Partial (Tentative) Schedule

9/30, 10/7 Introduction

  1. Syllabus and Project Announcement [slides]

  2. A Tutorial of KVM and its Setup (by Sheng-Wei Lee) [slides]

  3. N. Chohan, C. Bunch, S. Pang, C. Krintz, N. Mostafa, S. Soman, and R. Wolski. AppScale: Scalable and Open App Engine Application Development and Deployment. In International Conference on Cloud Computing, Oct. 2009 [paper]

  4. C. Bunch, J. Kupferman, and C. Krintz, Active Cloud DB: A RESTful Software-as-a-Service for Language Agnostic Access to Distributed Datastores, International Conference on Cloud Computing (CloudComp), Oct. 2010 [paper]

  5. N. Chohan, C. Bunch, C. Krintz, and Y. Nomura, Database-Agnostic Transaction Support for Cloud Infrastructures. IEEE Cloud 2011: International Conference on Cloud Computing, July 2011 [paper]


11/11 Advanced Cloud Techniques and Basic iOS Development

  1. Writing MapReduce code (by Steven Tai) [install][map reduce]

  2. Stealing Private Information from iOS Apps (by Edward Lee) [slides] [sample code]

  3. More materials on iOS App development (by Michael Pan) [intro][mvc][compare]
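
The MapReduce item above can be illustrated with a minimal word-count sketch in plain Python; this simulates the map/shuffle/reduce phases in-process and uses no actual Hadoop API (the function names are illustrative only):

```python
from collections import defaultdict

def map_phase(doc):
    # Map: emit an intermediate (word, 1) pair for every word.
    return [(w.lower(), 1) for w in doc.split()]

def shuffle(pairs):
    # Shuffle: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cloud", "the app store", "cloud apps"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"], counts["cloud"])  # 2 2
```

In a real Hadoop job the shuffle is performed by the framework between the map and reduce tasks; only the two phase functions are user code.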


11/18 iOS Development (by Michael Pan) in the MIS Mac Classroom, 5F, Yi-Shien Building.

  1. iOS 5 New Features

  2. Michael’s blogs [storyboard][segue_view][segue_delegate]


12/23 Project and Literature Review

  1. Monitoring VMs in the cloud (by Sheng-Wei Lee) [slides][sample code]

  2. WordNet (by Eric) [slides]

  3. Literature Review (by Yang) [slides]


12/30 Project Review

  1. Detecting Suspicious Behaviors in iOS Apps (by Steven Tai) [slides]

  2. National Palace Museum in the Cloud (by Yang) [slides]

  3. Stranger online (by I-Yang) [slides]



Special Issues


Cloud Security

  1. D. Zissis and D. Lekkas, Addressing cloud computing security issues, Future Generation Computer Systems. [paper]

  2. G. Anthes, Security in the Cloud. Comm. ACM. [paper]

  3. T. Ristenpart, E. Tromer, H. Shacham, S. Savage, Hey, You, Get Off of My Cloud: Exploring Information Leakages in Third-Party Compute Clouds. [paper]

  4. S. Kamara, K. Lauter, Cryptographic Cloud Storage. [paper]

  5. N. Elhage, Virtualization under attack: Breaking out of KVM. [slides][video]


Balancing the Load of Video-based Services

  1. V. K. Adhikari, S. Jain, Z.-L. Zhang, YouTube Traffic Dynamics and Its Interplay with a Tier-1 ISP: An ISP Perspective. [paper]

  2. R. Krishnan, H. V. Madhyastha, S. Srinivasa, S. Jain, A. Krishnamurthy, T. Anderson, J. Gao, Moving Beyond End-to-End Path Information to Optimize CDN Performance. [paper]

  3. H. Yin, X. Liu, T. Zhan, V. Sekar, F. Qu, C. Lin, H. Zhang, B. Li, Design and Deployment of a Hybrid CDN-P2P System for Live Video Streaming: Experiences with LiveSky. [paper]


App Security

  1. M. Egele, C. Kruegel, E. Kirda, G. Vigna, PiOS: Detecting Privacy Leaks in iOS Applications. [paper]

  2. A. P. Felt, M. Finifter, E. Chin, S. Hanna, D. Wagner, A Survey of Mobile Malware in the Wild. [paper]

  3. P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, Automating Privacy Testing of Smartphone Applications. [paper] (AppInspector-Android)

  4. SANS Institute, Mac Malware Analysis. [paper]


String Analysis

  1. T. Tateishi, M. Pistoia, O. Tripp, Path- and Index-sensitive String Analysis Based on Monadic Second-Order Logic. ISSTA’11 [paper]

  2. R. Alur, P. Cerny, Streaming Transducers for Algorithmic Verification of Single-pass List-processing Programs. POPL’11 [paper]

  3. R. Alur, J. V. Deshmukh, Nondeterministic Streaming String Transducers. ICALP’11 [paper]



================================================================


Software Security (Spring 2011)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Lecture times: Tuesday 9:10-12:00  

Lecture location: MIS Mac Lab, 逸仙樓 (Yi-Shien Building) 5F


Course Objective

Software security attracts great attention as cyber/computer crime continues to increase. Over the last few years, the need for software security has grown rapidly: web sites have been defaced, credit card information has been stolen, publicly available hacking tools have become more sophisticated, and viruses and worms cause more damage than ever before.


This is an introductory course on software security. We will discuss modern web application (in)security issues, with an emphasis on how to secure programs with static source code analysis. Students will learn how to identify vulnerabilities in web applications, how to exploit those vulnerabilities, and how to prevent the exploits and remove the vulnerabilities. Students also get the chance to learn how to apply advanced (static analysis) techniques and tools to develop more secure and more reliable software. Beyond the technical material, students will also have the chance to polish their English presentation and writing skills by presenting selected security papers and book chapters and writing up the term paper.


At the end of this course, students shall have a clear view of web application (in)security and know some static analysis techniques. Students shall be familiar with common vulnerabilities and exploits, such as Cross-Site Scripting and SQL Injection in web applications. Students shall also know how to detect, prevent, and remove these software flaws in systems/applications via static analysis.
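
The SQL injection flaws discussed above come down to mixing untrusted input with query text. A minimal sketch in Python with an in-memory SQLite database (the course material targets PHP/Java web apps; the table and payload here are made up for illustration) contrasts a vulnerable query with a parameterized one:

```python
import sqlite3

def find_user_unsafe(db, name):
    # Vulnerable: attacker-controlled input is concatenated into the SQL text.
    return db.execute(
        "SELECT id FROM users WHERE name = '" + name + "'").fetchall()

def find_user_safe(db, name):
    # Parameterized query: the input is bound as data, never parsed as SQL.
    return db.execute(
        "SELECT id FROM users WHERE name = ?", (name,)).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"                    # classic injection payload
print(len(find_user_unsafe(db, payload)))  # 2: the tautology leaks every row
print(len(find_user_safe(db, payload)))    # 0: no user is literally named that
```

The same contrast carries over to PHP's mysqli prepared statements or Java's PreparedStatement.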


Course Requirement

  1. Participation: 10%

  2. Presentation: Paper 20% and Chapter 20%

  3. Term Paper: 50%
     - Select modern attack/security tools to attack/analyze (open source) web applications

     - Report the details with the methodology, discovered vulnerabilities, exploits, etc.

     - Teams:

        1. Anthony Cimo, Alexis Kirat, Kuan-Ming Chen, and I-Yang Dong

        2. Juliette Maxime Lessing, Hsing Huang, and Chen-Yi Yang

        3. Jorina van Malsen, Eric Huang, and Ruei-Chen Dai

        4. Adam Fremd, Vincent Liou, and Ruei-Jiun Liang


Text books

  1. “The Web Application Hacker's Handbook: Discovering and Exploiting Security Flaws” by Dafydd Stuttard and Marcus Pinto, Wiley Publishing, Inc., 2007

  2. “Secure Programming with Static Analysis” by Brian Chess and Jacob West, Addison-Wesley Professional, 2007


Selected Papers

  1. Davide Canali, Marco Cova, Christopher Kruegel, and Giovanni Vigna. “Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages.” In  Proc. of the World Wide Web Conference (WWW 2011)

  2. Fang Yu, Muath Alkhalaf, Tevfik Bultan. “Patching Vulnerabilities with Sanitization Synthesis.” In Proc. of the 33rd International Conference on Software Engineering (ICSE 2011)

  3. Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, Dawn Song. “A Symbolic Execution Framework for JavaScript.” In Proc. of the 31st IEEE Symposium on Security & Privacy (Oakland 2010)

  4. Prateek Saxena, Steve Hanna, Pongsin Poosankam, Dawn Song. “FLAX: Systematic Discovery of Client-side Validation Vulnerabilities in Rich Web Applications.” In Proc. of the 17th Network and Distributed System Security Symposium (NDSS 2010)

  5. Adam Barth, Adrienne Porter Felt, Prateek Saxena, Aaron Boodman. “Protecting Browsers from Extension Vulnerabilities.” In Proc. of the 17th Network and Distributed System Security Symposium (NDSS 2010)

  6. Marco Cova, Christopher Kruegel, and Giovanni Vigna. “Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code.” In  Proc. of the World Wide Web Conference (WWW 2010)

  7. V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. “Toward Automated Detection of Logic Vulnerabilities in Web Applications.” In Proc. of the USENIX Security Symposium Washington, 2010

  8. Gary Wassermann and Zhendong Su. “Static Detection of Cross-site Scripting Vulnerabilities.” In Proc. of the 30th International Conference on Software Engineering (ICSE 2008)

  9. Yichen Xie and Alex Aiken. “Static Detection of Security Vulnerabilities in Scripting Languages.” In Proc. of the 15th USENIX Security Symposium (USENIX 2006)


Tools/Projects

  1. WebScarab: Web Application Spider/HTTP(S) Connection Interceptor

  2. WebGoat: A safe environment for you to learn/practice vulnerability exploit skills

  3. Skipfish: A black-box testing tool for web applications

  4. Nmap: A security scanner with thousands of fingerprints for networking and hacking

  5. Metasploit: A project about exploitation techniques

  6. Burp Suite: An integrated platform for performing security testing of web applications

  7. Httprint: A web server fingerprint tool

  8. Stranger: STRing AutomatoN GEneratoR - An automata-based static string analysis tool for PHP

  9. JSA: Java String Analyzer - A grammar-based static string analysis tool for Java

  10. AppScan: A web application vulnerability scanner from IBM

  11. WebInspect: A web application security testing and assessment tool from HP

  12. PCL: A powerful password cracking library



Lectures/Schedules (Subject to change)

February - Welcome!

  1. 2/22: Opening: About this course [Lec0]
    - Refresh: “The Social Network”


March and April- Be familiar with Web application vulnerabilities!

  1. 3/1: Web Application (In)Security and Core Defense Mechanisms
    - Handbook Chapter 1 and Chapter 2 (by Fang) [Lec1]

  2. 3/8: Web Application Technologies/Mapping Web Applications
    - Handbook Chapter 3 and Chapter 4 (by Fang) [Lec2]

         - A brief introduction of security tools/projects

  3. 3/15: Bypassing Client-Side Control/SQL Injections

         - Handbook Chapter 5 (by Tony Cimo) [Slides]

         - Handbook Chapter 9 (by Fang) [Lec3]

  4. 3/22: Attacking Authentication/Command Injections

         - Handbook Chapter 6 (by Adam Fremd) [Slides]

         - Handbook Chapter 9 (by Fang) [Lec4]

         - Introduction to Stranger: Automatic Detection and Removal of Injection Flaws

           (by Fang) [Slides]

  5. 3/29: Attacking Session Management and Access Control [Lec5]

         - Handbook Chapter 7 (by Juliette) [Slides]

         - Handbook Chapter 8 (by Jorina) [Slides]

         - Introduction to Stranger: Automatic Detection and Removal of Injection Flaws (Continued,

           by Fang)

  6. 4/5: Spring Break

  7. 4/12: Attacking Application Logics and Automating Bespoke Attacks [Lec6]

         - Handbook Chapter 11 (by Eric Huang) [Slides]

         - Handbook Chapter 13 (by Ruei-Jiun Liang) [Slides]

          - Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages (by Tony) [paper][Slides]

  8. 4/19: Exploiting Path Traversal and A Web Application Hacker’s Toolkit [Lec7]

         - Handbook Chapter 10 (by Hsin Huang) [Slides]

         - Handbook Chapter 19 (by Kuan-Ming) [Slides]

         - Protecting Browsers from Extension Vulnerabilities (by Adam) [Slides]

         - Burp Suite: Tool Demonstration (by Hsin Huang) [Slides]

  9. 4/26: Exploiting Information Disclosure/Attacking Compiled Applications [Lec8]

         - Handbook Chapter 14 (by Vincent Liou) [Slides]

         - Handbook Chapter 15 (by Alex) [Slides]

         - FLAX: Systematic Discovery of Client-side Validation Vulnerabilities in Rich Web

           Applications (by Jorina) [Slides]

  10. 5/3: Attacking Application Architecture and Web Server [Lec9]

         - Project Proposal Due

         - Handbook Chapter 16 (by Ruei-Chen) [Slides]

         - Handbook Chapter 17 (by I-Yang) [Slides]

         - A Symbolic Execution Framework for JavaScript (by Ruei-Jiun) [Slides]


May - Static Analysis/Tools

  1. 5/10: Finding Vulnerabilities in Source Code [Lec10]

         - Handbook Chapter 18 (by Chen-Yi) [Slides]

         - Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code

           (by Kuan-Ming) [Slides]

  2. 5/17: Taint Analysis, String Analysis, and Stranger Tool Demonstration [Lec11]

         - Static Detection of Security Vulnerabilities in Scripting Languages (by Alex) [Slides]

         - Toward Automated Detection of Logic Vulnerabilities in Web Applications (by Eric Huang) [Slides]

          - Patching Vulnerabilities with Sanitization Synthesis (by Fang)  [Slides]

  3. 5/24: Invited Talk: Apple iOS application development (by Michael), 10:00-12:00.

  4. 5/31: Security Tool Demonstration [Lec12]

         - Nmap: tool presentation (by Steven)

         - Static Detection of Cross-site Scripting Vulnerabilities (by Juliette Lessing)

         - WebGoat: Tool Demonstration (by I-Yang)


June - Team Project/Presentation

  1. 6/7: Advanced Security/Static Analysis Tools / Project Discussion

        - Skipfish: Tool Demonstration (by Chen-Yi)

        - Stranger: Tool Demonstration (by Vincent Liou)

        - Face mesh and Password Cracking System (by Hsin Huang, Chen-Yi, Juliette)


  2. 6/14: Project Discussion

        - WebGoat Handbook (by I-Yang, Kuan-Ming, Tony, and Alex)

        - Path-sensitive String Analysis (by Ruei-Jiun, Vincent, Adam)

        - FTP Service Port Scan (by Eric, Steven, Jorina)


  3. 6/21: Project report/system due (Meet in my office 150409)



Links that might be useful

  1. Here you can find numerous open source applications [sourceforge.net]

  2. Know major software threats via Common Vulnerabilities and Exposures [cve]

  3. Follow the top security issues with The Open Web Application Security Project [OWASP]

  4. Security Course@UCSB by Prof. Giovanni Vigna [Web Application Vulnerabilities]



================================================================


Program Analysis Seminar (Spring 2011)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Lecture times: Monday 12:10-2:00  

Lecture location: 260102, 商館 (Commerce Building) 1F


Course Objective

We will cover basic concepts of program analysis, with an emphasis on static analysis and string analysis techniques. Students will learn by doing: each student will be asked to take on a mid-size project, e.g., writing a parser, or using an existing tool to parse a language used in server- or client-side web application development. We will also introduce the string analysis tool Stranger, which can detect and prevent severe vulnerabilities in web applications.

At the end of this course, students shall be able to understand the basic concepts of program analysis, discuss modern issues in program analysis, and gain experience developing mid-size projects.
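
The kind of static taint tracking that tools like Stranger automate can be conveyed with a toy dataflow pass over straight-line code; the assignment representation below is invented for illustration and is not Stranger's actual intermediate form:

```python
def taint_analysis(program, sources, sinks):
    """program: list of (lhs, rhs_vars) assignments in execution order.
    A variable is tainted if its current value derives from a source."""
    tainted = set(sources)
    for lhs, rhs_vars in program:
        if any(v in tainted for v in rhs_vars):
            tainted.add(lhs)          # taint propagates through the assignment
        else:
            tainted.discard(lhs)      # overwritten with untainted data
    return sorted(s for s in sinks if s in tainted)

prog = [
    ("name",  ["GET_name"]),  # name  = $_GET['name']       (untrusted source)
    ("query", ["name"]),      # query = "SELECT ..." . name (taint propagates)
    ("safe",  []),            # safe  = a sanitized constant (untainted)
]
print(taint_analysis(prog, sources={"GET_name"}, sinks={"query", "safe"}))
# ['query']  -- only the concatenated query reaches a sink tainted
```

A real string analysis refines this boolean taint lattice to automata over the possible string values, which is what lets an automata-based tool like Stranger check whether an attack pattern is actually reachable.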


Course Projects:

1. Path Sensitive String Analysis (Ruei-Jun)

   - Branch conditions and their representations [Slides]

   - Path-sensitive string analysis

2. Web Server Development of Stranger (Wei, Yu-Wen, Yi-Chun)

   - Client-side [Slides]

   - Server-side Functionalities [Slides]

   - Cloud Server [Slides]

3. Symbolic String Analysis Tool Enhancement (Yuan-Jie, Chia-Min, Po-Wei, Chi-Hau)

   - Abstraction [Slides]

   - Refine Constant Collection (Po-Wei, Chi-Hau)

   - C library and trace generation (Yuan-Jie, Chia-Min)

   - Experiments

4. Objective-C and Xcode Development (Chia-Yin, Ruo-Ting)

   - Introduction to iOS application development


Course Requirement

  1. Participation: 20%

  2. Project/Class Presentation: 40%

  3. Project Completeness: 40%



================================================================


Data Structures (Fall 2010)

Instructor: 郁方 (Yu, Fang) Office: 150409 (Health Center 4F) Ext: 77453

Contact: yuf@nccu.edu.tw
Lecture times: Thursday 9:00-12:00 (Session A) / Thursday 2:00-5:00 (Session B) 

Lecture location: 研究大樓 (Research Building) 250301

TAs: 廖文成, 99356013@nccu.edu.tw (Session A) and

         邱芃瑋, 99356027@nccu.edu.tw (Session B)

Lab times: Monday 12:00-1:00 (Session A) / Wednesday 12:00-1:00 (Session B) 

Lab location: 逸仙樓 (Yi-Shien Building) 5F, MIS PC Classroom