MSc Theses

# A Novel Approach to Evaluating Decision Tree Learning Algorithms

By Tianqi Xiao, August 2020. Full paper: xiao20200810.pdf

Decision tree learning algorithms performance evaluation is essential for building or selecting the optimal algorithm to solve classification problems. Generally, research studies use datasets provided by universities, reputable companies and organizations, or collected by the research teams themselves. However, there are two major problems with this approach. First, the number of datasets used in these studies are generally limited (usually less than 25 datasets are used), which suggests the results may be to a large degree dependent on a specific dataset. Secondly, many traditional metrics rely on cross-validation to measure the correctness of the classification. In this case, the evaluation process is done by estimating how well the inferred model classifies the data examples in the test set without knowing what the actual model is. We recognize these problems and propose a new approach that evaluates the performance of decision tree learning algorithms generically. The underlying idea of this approach is paths tracing and comparing. We assess various decision tree algorithms with our new framework and compare their performance. We also investigate the relation between the inferred trees and the properties of the training datasets.

# Parse trees, Interference, and Mixed Nodes for Context-Free Parallel Communicating Grammar Systems

By Mehrdad Naserdoust, April 2020. Full paper: naserdoust20200413.pdf

Parallel Communicating Grammar Systems (PCGS) were introduced for a better understanding of concurrent systems on a language-theoretic level. Some research has been done regarding relationship between PCGS and practical computing systems, though this has not received sustained attention. In another more practical attempt, parse trees for PCGS have been investigated. It is relatively straightforward to construct a parse tree for every PCGS derivation, but going the other way around (reconstructing a derivation for every PCGS parse tree) is not always possible. The possibility of doing this has been found to depend on the property of interference in a PCGS. We refine this result, showing that there is one extra property that needs to be taken into account. We show that even in the absence of interference there exist parse trees that cannot correspond to any derivation when strings containing nonterminals are communicated between PCGS components. This reduces the utility of PCGS parse trees even further than previously thought.

# Verification and TCTL Model Checking of Real-Time Systems on Timed Automata and Timed Kripke Structures, and Inductive Conversion Issues in Dense Time

By Negar Nourollahi, December 2019. Full paper: nourollahi20191215.pdf

The design and verification of concurrent and real-time systems are difficult problems. While model checking proved to be successful as an automatic and effective solution, real-time systems do not benefit directly from classical model checking since they feature infinite clock values. Instead, bisimulation and abstraction can be used to build an abstract finite state machine from the real-time model, which can in turn be model checked.

In the first part of this thesis we consider formal methods for the specification and automatic verification of finite-state real-time systems with dense time. Both automata and temporal logic are extended to allow them to model timing delays and to verify real-time requirements.

Time automata are chosen as the underlying semantic of real-time system because they are the standard modeling method in designing real-time systems, and their state reachability problem is also decidable. We demonstrate how the tools for analysis of untimed finite state systems can also be deployed in order to verify timed systems. While dense time in timed automata generates infinite state spaces, we show that time abstracted bisimulations are decidable for timed automata, and so strong time-abstract bisimulation can be used to reduce the infinite state-space of a given timed automata model to a finite quotient graph and finite transition system. Time-abstract bisimulation also preserves both the linear and branching time properties of the original model sufficiently for verification, while the exact time delays are abstracted away. The strongly non-zeno timed automata are also proposed as an extension of timed automata in order to address the deadlock and timelock issues in verifying timed systems. This solution is based on the finite quotient graph and comes from strong time-abstract bisimulation. We then show how TCTL model checking can be reduced to CTL model-checking using strong time-abstracting bisimulation.

In the second part of the thesis we demonstrate that the satisfaction of TCTL formulas under a natural semantics for both discrete-time and dense-time timed Kripke structures can be reduced to a model-checking problem in the point-wise semantics for timed Kripke structures. Discrete TCTL-preserving abstraction methods of timed Kripke structures, the so-called gcd-transformation and τ -transformation are introduced.

Some effort has been spent in the untimed domain to bride logical approaches (such as model checking) and algebraic approaches to formal methods (such as model-based testing). One of the necessary steps in this direction is a process of establishing an equivalence between the underlying semantic models in the two domains (Kripke structures and labeled transition systems, respectively. In particular, several inductive, algorithmic methods that generate a compact Kripke structure equivalent with given labeled transition systems were developed. We intended to extend this effort to the dense time domain, but we found instead that inductive conversion methods are infeasible in large scale, concurrent real-time systems with dense time domain. While both timed automata and timed Kripke structures can be used for verification in the dense time domain, we cannot feasibility convert timed automata to timed Kripke structures in any feasible way. The reasons include the undecidability of trace properties for timed automata and also an inherent state explosion problem for any inductive conversion algorithm.

# Two Problems Believed to Exhibit Superunitary Behaviour Turn out to Fall within the Church-Turing Thesis

By Abdollah Dadizadeh, January 2019. Full paper: dadizadeh20190118.pdf

The Turing machine has been throughout decades one of the widely accepted universal models of computation. Relatively recently however challenges of the universality of this model have been raised. Several problems that are at least apparently not solvable by Turing machines have been introduced. These problems belongs to different paradigms than the classical theory of computation. They are most often, but not always, associated to real time. We consider two such problems that apparently contradict the theory of universal computation model. The first problem is real time and was used to establish a strong hierarchy on the number of processors for real-time problems, thus contradicting the universality of the “uniprocessor” Turing machine. The second problem is not real time, but features constraints on the intermediate steps of the computation; it feature a sharp cut-off on the number of processors necessary to solve it, thus once mode contradicting the universality of the Turing machine. We find that instances of both these problems can be converted to CTL formulae such that the instance is solvable iff the respective CTL formula is satisfiable. Given that satisfiability for CTL is decidable, we thus show, in a however roundabout way, that these two problems do not violate the universality of the Turing machine. We believe that this technique can be applied to other problems contradicting the universality of the Turing machine, perhaps even all of them.

# Improved Balance in Multiplayer Online Battle Arena Games

By Chailong Huang, April 2018. Full paper: huang20180425.pdf

The Multiplayer Online Battle Arena (MOBA) game is a popular type for its competition between players. Due to the high complexity, balance is the most important factor to secure a fair competitive environment. The common way to achieve dynamic data balance is by constant updates. The traditional method of finding unbalanced factors in a MOBA game like DOTA2 is mostly based on professional tournaments, which is a small minority of all the games and not real-time. We develop in this thesis an evaluation system for the DOTA2 based on big data with clustering analysis, neural networks, and a small-scale data collection as a sample. We then provide an ideal matching system based on the Elo rating system and an evaluation system to encourage players to try more different heroes for a diversified game environment and more data supply, which makes for a virtuous circle within the evaluation system.

# The Equivalence of Computation Tree Logic and Failure Trace Testing under Multiple Conversion Algorithms

By Rui Zuo, April 2018. Full paper: zuo20180424.pdf

The two major systems of formal verifications are model checking and model-based testing. Model checking is based on some temporal logic such as Linear Temporal Logic (LTL) or Computation Tree Logic (CTL, the focus of this thesis) for specification and on Kripke structures as models for the systems under test. Model-based testing is an algebraic technique which is based on some operational semantics of processes (such as traces and failures) and its associated pre-orders. One of the most fine-grained pre-order is based on both failures and traces. A previous line of investigation showed that CTL and failure trace testing are equivalent. This equivalence was based in turn on a constructive equivalence between LTS and Kripke Structure; in order for this equivalence to work sets of states rather than individual states need to be considered in model checking. In this paper we will consider another, equally constructive equivalence based on another line of investigation of the matter. We find that this equivalence does not require any modification to the model checking algorithms and CTL and failure trace testing continue to be equivalent under it.

# On the Equivalence between Computation Tree Logic and Failure Trace Testing

By Sunita Singh, August 2017. Full paper: singh20170827.pdf

The two major systems of formal verifications are model checking and model-based testing. Model checking is based on some form of temporal logic for instance Linear Temporal Logic (LTL) or Computation Tree Logic (CTL, the focus of this thesis). Model-based testing is an algebraic technique which is based on some operational semantics of processes (such as traces and failures) and its associated preorders. The most fine-grained preorder is based on failure traces. A previous paper showed that CTL and failure trace testing are equivalent, in the sense that for any failure trace test there exist an equivalent CTL formula and other way around. However, the proof of the conversion from failure trace test to CTL is partially incorrect. We provide in this thesis a corrected proof. We also note that whenever a failure trace test contains cycles the CTL formula produced by the previous conversion algorithm is infinite in size. We develop a modified conversion algorithm which under certain, pretty general conditions produces finite formulae instead.

# Communicating Multi-Stack Visibly Pushdown Processes

By Davidson Madudu, March 2016. Full paper: madudu201603.pdf

Visibly Pushdown Languages (VPL) were proposed as a formalism to model and verify complex, concurrent and recursive computing systems. However, the lack of closure under shuffle for VPL makes it unsuitable for the specification and verification of such complex systems. Multi-stack Visibly Pushdown Languages (MVPL) appear to give a more realistic expression of concurrency and recursion in computational systems, but they turn out to have similar limitations. However, natural modifications of the definition of MVPL operations result in a formalism that becomes suitable for the specification and verification of complex computing systems.

With this result in mind, we introduce an MVPL-based process algebra called Communicating Multi-stack Visibly Pushdown Processes (CMVP). CMVP defines a superset of CSP by combining the interesting properties of finite-state algebras (such as CSP) with the context-free features of MVPL. Unlike any other process algebra, CMVP includes support for parallel composition but also for the general form of recursion. We present the syntax, operational semantics, trace semantics, trace specification, and trace verification of CMVP. In addition to the above, a CMVP trace observer can extract stack and module information from a trace; as a result one can specify and verify many software properties which cannot be specified in other existing process algebra. Such properties include the access control of a module, stack limits, concurrent stack properties, internal property of a module, pre- and post-conditions of a module, etc. CMVP lays the basis of algebraic conformance testing for infinite-state processes, such as application software.

# Profile-Based Access Control in Cloud Computing Environments with applications in Health Care Systems

By Umair Mukhtar Ahmed Naushahi, February 2016. Full paper: naushahi201602.pdf

Recently cloud computing has gained tremendous attention among researchers from both academia and industry. It has emerged as a promising technology for delivering on demand information technology services over the Internet while providing great flexibility in enhancing the hardware, software and network infrastructure of organizations. Consequently many institutions, organizations and individual users are migrating from a traditional computing environment to a cloud computing environment. Cloud computing has also been used in providing health care services such as information sharing, accessing various health care services, and utilizing resources across the board. While cloud computing offers numerous benefits, it also introduces some new and recurring challenges such as data computation & communication integrity, security and privacy. Due to its potential data security and privacy concerns, adopting a cloud based infrastructure for health care domain while meeting the privacy and security regulations recommended by HIPAA, PIPEDA and PHIPA is a big challenge. We study and analyze the security and privacy requirements of health care systems in a cloud computing environment and present a profile based access control system to improve the security and privacy of health care data while providing seamless service provisioning. We extend the concept of access control list by incorporating the Profile attribute and define rules for each profile to grant access to the system and its resources. In addition, when user authentication is done, profiles are mapped to the applications running on the cloud and eliminate the need of repetitive authentication requests for accessing each application. We also decompose the access control list in different parts. Such a decomposition helps reduce the management and administration cost, and allows updating or adding new rules without maintaining complex rule matrices. The profile authentication process can be offered as a service to support authentication and system availability across different cloud deployments. Simulation results show that our solution offers reduced data access time and improves service provisioning while reducing authentication requests. Considering the hierarchical organizational structure of health care systems, the profile based access control approach best fits its requirements.

# Computation Tree Logic Is Equivalent to Failure Trace Testing

By A. F. M. Nokib Uddin, July 2015. Full paper: uddin20150720.pdf

The two major systems of formal verification are model checking and algebraic techniques such as model-based testing. Model checking is based on some form of temporal logic such as linear temporal logic (LTL) or computation tree logic (CTL). CTL in particular is capable of expressing most interesting properties of processes such as liveness and safety. Algebraic techniques are based on some operational semantics of processes (such as traces and failures) and its associated preorders. The most fine-grained practical preorder is based on failure traces. The particular algebraic technique for formal verification based on failure traces is failure trace testing.

It was shown earlier that CTL and failure trace testing are equivalent; that is, for any failure trace test there exists a CTL formula equivalent to it, and the other way around. Both conversions are constructive and algorithmic. The proof of the conversion from failure trace tests to CTL formulae and implicitly the associated algorithm is however incorrect.

We now provide a correct proof for the existence of a conversion from failure trace tests to CTL formulae. We also offer intuitive support for our proof by providing worked examples related to the examples used earlier to support the conversion the other way around, thus going full circle not only with the conversion but also with our examples. The revised proof continues to be constructive and so the conversion continues to be algorithmic.

Our corrected proof completes an algorithmic solution that allows the free mix of logic and algebraic specifications for any system, thus offering increased flexibility and convenience in system specification for formal verification.

# The “ptrace” Solution to Stack Integrity Attacks in GNU/Linux Systems

By Erick Leon, May 2015. Full paper: leon20150504.pdf

Security in Information Technologies is an ever-changing field due to the fact that threats appear constantly with newer, more complex methods of attack as time goes on. Constant updates are therefore necessary on the systems which perform the operations, as well as the knowledge-pool of the personnel in charge of their security. With that in mind, it is also imperative to establish that while a team of human resources may be in charge of a system, they cannot realistically overview and monitor every single operation of it in order to detect anomalies or incidents that could jeopardize individual operations or in some cases, the entire system. As such, we argue that countermeasures against potential threats and attacks should be performed by the system itself while allowing it to remain useful. In addition, precise and concise notifications should be issued to the human resources in charge of such systems whenever threats and attacks are detected. That is the basis for our work: allow protection of a system while minimizing operational impact and issue proper notifications with precise data to the human resources.

The work outlined in this thesis addresses issues with cyber attack techniques that make use of certain logical address manipulations such as buffer overflows and code injection in GNU/Linux systems. The development of the solution explained in this document picks up where others left off, and so is mainly based on an earlier approach that was left at a proof-of-concept stage. We take this approach to a real-world scenario while attempting to remain true to the previous statements about the protection of systems and how feasible it can be.

Specifically, we develop a working, tested code that evaluates an executing process under GNU/Linux environments and whenever a call opcode has been detected, it stores and manages the corresponding stack pointer, which then gets validated once a ret opcode is found. This is performed during execution-time, which makes it a valuable tool to intercept buffer overflow attacks in real-time.

# Toward a Model Checker for Ambient Logic Using the Process Analysis Toolkit

By Yujie Sun, January 2015, QA 76.76 .V47 S86 2015 (Old Library). Full paper: sun201501.pdf

Model checking has emerged as an effective verification method in both software and hardware development. A dedicated model checker is thus a useful tool for most application domains. Ambient Calculus is a process calculus where processes may reside within a hierarchy of locations. Processes can then move through the location hierarchy and modify it. The Ambient Logic is a modal logic designed to specify properties of distributed and mobile computations programmed in the Ambient Calculus. It includes logical operators that can be used to make assertions about locations and their names.

This thesis is concerned on how to build a model checker for mobile ambient processes against specification described as formulas in the Ambient Logic. For a given program written in ambient calculus and its properties specified as an ambient logic formula, the ambient model checker will be used to determine automatically whether the process satisfies the specification (logic formula). We start the work toward such an ambient model checker by implementing the syntax and semantics of the Ambient Calculus as a PAT (Process Analysis Toolkit) module. We also implemented in this module a syntactic definition for the Ambient Logic. The actual implementation of the model checking algorithm is the subject of future work.

# Parallel Communicating Grammar Systems with Context-Free Components Are Really Turing Complete

By Mary Sarah Ruth Wilkin, November 2014, QA 267.3 .W55 2014 (Old Library). Full paper: wilkin201411.pdf

Parallel Communicating Grammar Systems (PCGS) were introduced as a language-theoretic treatment of concurrent systems. A PCGS extends the concept of a grammar to a structure that consists of several grammars working in parallel, communicating with each other, and so contributing to the generation of strings. PCGS are generally more powerful than a single grammar of the same type. PCGS with context-free components (CF-PCGS) in particular were shown to be Turing complete. However, this result only holds when a specific type of communication (which we call broadcast communication, as opposed to one-step communication) is used. We expand the original construction that showed Turing completeness so that broadcast communication is eliminated at the expense of introducing a significant number of additional, helper component grammars. We thus show that CF-PCGS with one-step communication are also Turing complete. We introduce in the process several techniques that may be usable in other constructions and may be capable of removing broadcast communication in general.

We also show how an earlier result proving that CF-PCGS only generate context-sensitive languages is incorrect. We discover that this proof relies of coverability trees for CF-PCGS, but that such coverability trees do not contain enough information to support the proof. We are also able to conclude that coverability trees are not really useful in any pursuit other than the one already considered in the paper that introduces them (namely, determining the decidability of certain decision problems over PCGS).

# An Approach to Stack Overflow Counter-Measures Using Kernel Properties

By Benjamin Teissier, November 2013, QA 76.9 .A25 T45 2013 (Old Library). Full paper: teissier201311.pdf

The computer security became recently a subject of general concern for the main public, for governments, and for private companies alike. Indeed, the latest news involving stuxnet, flame, and so many other viruses made clear to everybody that privilege escalation and security threats in general deserve a greater attention. The range of buffer overflow exploitations in particular is large, ranging from data leaks to taking over a complete computing system. These are all dangerous, and most of the time the possibilities of exploitation are only limited by the skills of the attacking hacker.

Our work contributes to this investigation by analyzing the hacking techniques for exploiting buffer overflows in order to understand the existing counter measures, and eventually find a more accurate way to prevent the exploitation of buffer overflows. Our working platform is the GNU/Linux family of operating systems. Our work is at the highest privilege levels and in the safest part of a GNU/Linux system, namely the kernel. We provide a system that allows the kernel to detect overflows and prevent their exploitation. Our system allows the kernel to inject at launch time some (minimal) code into the binary being run, and subsequently use this code to monitor the execution of that program with respect to its stack use, thus detecting stack overflows. The system stands alone in the sense that it does not need any hardware support; it also works on any program, no matter how that program was conceived or compiled. Beside the theoretical concepts we also present a proof-of-concept patch to the kernel supporting our idea. Overall we effectively show that guarding against buffer overflows at run time is not only possible but also feasible, and we also take the first steps toward implementing such a defense.

# A Distributed Architecture for Remote Service Discovery in Pervasive Computing

By Farzad Salehi, December 2011, QA 76.5915 .S25 2011 (Old Library). Full paper: salehi201112.pdf

The area of investigation of this dissertation is service discovery in pervasive com- puting. Service discovery is very important in realizing the concept of pervasive computing. In addition, service discovery protocols must be able to work in the het- erogeneous environment offered by pervasive computing. Remote service discovery in particular has not been properly achieved so far.

In an attempt to remedy this we propose a new architecture for enabling typical (local) service discovery mechanisms (without the ability of remote service discov- ery) to discover services remotely. Our architecture uses Universal Plug and Play (UPnP) as a prototype for normal (local) service discovery protocols, and Gnutella as a prototype for peer to peer distributed search protocols. We introduce a module called service mirror builder to the UPnP mechanism, and a remote communication protocol over a Gnutella network. As a consequence, UPnP networks become able to discover services in remote networks (that is, remote service discovery).

# A Testing Framework for Real-Time Specifications

By Chun Dai, December 2008, QA 76.54 .D34 2008 (Old Library). Full paper: dai200901.pdf.

This thesis proposes a new semantic model for reasoning about real-time system specifications featuring a combination of timed processes and formulas in linear-timed temporal logic with time constraints (TLTL). Based on a theory of timed omega-final states, the thesis presents a framework of timed testing and two refinement timed preorders similar to De Nicola and Hennessy's may and must testing. The paper also provides alternative characterizations for these relations to show that the new preorders are extensions of the traditional preorders and to lay the basis for a unified logical and algebraic approach to conformance testing of real-time systems. The thesis then establishes a tight connection between TLTL formula satisfaction and the timed must-preorder. More precisely, it is shown that a timed labeled transition system satisfies a TLTL formula if and only if it refines an appropriately defined timed process constructed from the formula. Consequently, we developed a timed must-preorder which allows for a uniform treatment of traditional notions of process refinement and model checking under time constraints. The implications of this novel theory are illustrated by means of a simple example system, in which some components are specified as transition systems and others as TLTL formulas. Key words: real time, real-time transition system, timed process, timed preorder, timed testing, conformance testing, model checking, LTL.

# Communicating Visibly pushdown Processes

By Md. Tawhid Bin Waez, December 2008, QA 76.9 .F67 B56 2008 (Old Library). Full paper: waez200812.pdf

Visibly pushdown languages are a subclass of context-free languages that is closed under all the useful operations, namely union, intersection, complementation, renaming, concatenation, prefix, and Kleene star. The existence of a concurrent, fully compositional process algebra based on such languages require that these languages be also closed under two more operations, namely shuffle, and hiding. We prove here both of these closure properties. We also give the semantics of visibly pushdown automata in terms of labelled transition systems. We then propose Communicating Visibly pushdown Processes (CVP), a fully compositional concurrent process algebra based on visibly pushdown automata. CVP is a superset of CSP, thus combining all the good properties of finite-state algebrae with context-free features. Unlike any other process algebra, CVP includes support for parallel composition but also for self-embedding recursion. We present the syntax, operational semantics, trace semantics, trace specification, and trace verification of CVP. A CVP trace observer can extract stack and module information from the trace; as a result one can specify and verify many software properties which cannot be specified in any other existing process algebra. Such properties include the access control of a module, stack limits, concurrent stack properties, internal property of a module, pre-/post-conditions of a module, etc. CVP lays the basis of algebraic conformance testing for infinite-state processes, such as application software.

# Model Checking is Refining-Computational Temporal Logic is Equivalent to Failure Trace Testing

By Zhyiu Zhang, December 2008, QA 76.76 .V47 Z53 2008 (Old Library). Full paper: zhangz200812.pdf

The two major systems of formal verification are model checking and algebraic model-based testing. Model checking is based on some form of temporal logic such as LTL or CTL. The most powerful and realistic logic being used is CTL, which is capable of expressing most interesting properties of processes such as liveness and safety properties. Model-based testing is based on some operational semantics of processes (such as traces, failures, or both) and their associated preoders. The most fine-grained preorder beside bisimulation (of theoretical importance only) is the one based on failure traces. We show that these two most powerful variants are equivalent. That is to say, we show that for any failure trace test, there exists a CTL formula equivalent to it (meaning that a system passes the test if and only if the system satisfies the formula and the other way around). When we specify the system, we can use temporal logic formulas such as CTL formulas to express the properties of it. We can also use algebraic method such as labelled transition system or finite state automaton to describe the system's desired behaviour. If parts of a larger system are specified by these two means at the same time, combining the result of doing model checking and the result of applying model-based testing won't be ideal, satisfactory and even correct. In this case, we convert one specification to the form of the other. We then do a formal verification for the whole system. The result will be more convincing and correct

# The Relationship Between Several Parallel Computational Models

By Yuanqiao Zhang, December 2008, QA 76.58 .Z53 2008 (Old Library). Full paper: zhangy200812.pdf

The Parallel Random Access Machine (PRAM for short) is the most convenient and widely used model of parallel computation. Other, more complex models have also been studied. Examples include the Broadcast with Selective Reduction (or BSR) and the Reconfigurable Multiple Bus Machine (or RMBM). While the PRAM and the BSR are shared memory models, the RMBM accomplishes the communication between processors using buses. In this thesis we identify surprising relationships between these models. We show that several variants are equivalent with each other in a strong sense, and we establish strict distinctions between the other variants. Some models are folklorically considered feasible and some others are not considered so. We find a delimitation that matches the folklore, but we also find important (and intriguing) equivalencies.