Talk: Sparse models for integrative analysis of fMRI and genetic data, 6/13

CSEE Talk

Sparse models for integrative analysis

of fMRI and genetic data

Dr. Yu-Ping Wang
Biomedical Engineering Department
Biostatistics & Bioinformatics Department
Tulane University

2pm Thursday, 13 June 2013, ITE 346

In the last few years, the combination of imaging and genetic approaches has become an emerging area, where multiple complementary data are utilized for systematic and comprehensive analysis of a patient. While imaging approaches such as functional MRI (fMRI) continue to be major diagnostic tools for extracting structural and functional patterns at the tissue and organ levels, genetic techniques such as SNPs, microarray gene expression and the emerging next generation sequencing (NGS) add new dimensions by revealing structural variations at the genomic level. The integration of these multiscale and multimodality approaches has been promising for complex disease diagnosis and prognosis. However, combining these data has been challenging because they differ in nature, format, organization and structure, and are produced by different genomic platforms at multiple scales; each type of data is currently still analyzed separately and the results are interpreted independently. Sparse data representation, or compressive sensing, a powerful approach recently developed in statistics and signal processing, provides a promising way to address these challenges facing multiscale genomic imaging informatics. In this talk, I will present our recent research on the development of sparse models such as sparse canonical correlation analysis (sCCA) and joint sparse representation of multi-modal data that can better capture the interrelations between these data. We show recent examples of using these models for integrative analysis of SNP and fMRI data to identify biomarkers, and use the joint information for the identification of schizophrenia.
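As a rough illustration of the sparse CCA idea mentioned above, the following Python sketch finds a pair of sparse weight vectors u and v (one per modality, e.g. SNP columns and fMRI voxel columns) by alternating soft-thresholded power iterations on the cross-covariance matrix, in the spirit of penalized matrix decomposition. It is not the speaker's method: it assumes each modality's within-set covariance can be treated as the identity (a common simplification for high-dimensional data), and the relative thresholds `frac_u` and `frac_v` are hypothetical tuning parameters chosen for illustration.

```python
import math

def center(cols):
    # subtract each column's mean (columns = variables, entries = subjects)
    return [[v - sum(c) / len(c) for v in c] for c in cols]

def soft_threshold(a, frac):
    # shrink entries toward zero; threshold is a fraction of the largest entry
    t = frac * max(abs(x) for x in a)
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in a]

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a] if n > 0 else a

def sparse_cca(x_cols, y_cols, frac_u=0.3, frac_v=0.3, iters=100):
    """Sketch of sparse CCA with identity within-set covariances (assumption)."""
    x_cols, y_cols = center(x_cols), center(y_cols)
    n = len(x_cols[0])
    # cross-covariance C[i][j] = cov(x_i, y_j)
    C = [[sum(a * b for a, b in zip(xc, yc)) / n for yc in y_cols] for xc in x_cols]
    p, q = len(x_cols), len(y_cols)
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        # u <- S(C v), then v <- S(C^T u), each renormalized to unit length
        u = normalize(soft_threshold([sum(C[i][j] * v[j] for j in range(q)) for i in range(p)], frac_u))
        v = normalize(soft_threshold([sum(C[i][j] * u[i] for i in range(p)) for j in range(q)], frac_v))
    return u, v
```

If only one column in each modality carries a shared latent signal, the nonzero entries of u and v should concentrate on those columns; raising `frac_u` or `frac_v` zeroes out more weights and yields sparser, more interpretable canonical vectors.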

Dr. Yu-Ping Wang received the BS degree in applied mathematics from Tianjin University, China, in 1990, and the MS degree in computational mathematics and the PhD degree in communications and electronic systems from Xi'an Jiaotong University, China, in 1993 and 1996, respectively. After graduating, he held visiting positions at the Center for Wavelets, Approximation and Information Processing of the National University of Singapore and at Washington University Medical School in St. Louis. From 2000 to 2003, he worked as a senior research engineer at Perceptive Scientific Instruments, Inc., and then at Advanced Digital Imaging Research, LLC, Houston, Texas. In the fall of 2003, he returned to academia as an assistant professor of computer science and electrical engineering at the University of Missouri-Kansas City. He is currently an Associate Professor of Biomedical Engineering and Biostatistics & Bioinformatics at Tulane University School of Science and Engineering & School of Public Health and Tropical Medicine. He is also a member of the Tulane Center of Bioinformatics and Genomics and the Tulane Cancer Center. His research interests lie in the interdisciplinary areas of biomedical imaging and bioinformatics, in which he has over 100 publications. He has served on numerous program committees and NSF/NIH review panels, and was a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society.

Tutorials by Center for Hybrid Multicore Productivity Research students, 1-5 Wed 6/12

UMBC's Center for Hybrid Multicore Productivity Research (CHMPR), an NSF Industry & University Cooperative Research Center, is holding its Industry Advisory Board meeting at UMBC 12-14 June. Students from UMBC and UCSD will present tutorials on a number of the technologies underlying ongoing CHMPR projects in a session from 1:00-5:00 on Wednesday June 12 in ITE 456. The tutorial session is free and open to the public.

  • 3-D Printing – Timothy Blattner (UMBC)
  • Semantic Table Information – Varish Mulwad (UMBC)
  • Social Media Elastic Search – Oleg Aulov (UMBC)
  • Machine Learning for Social Media – Han Dong (UMBC)
  • Virtual World Interactions – Erik Hill (UCSD)

PhD defense: On Prediction and Estimation for Datastreams Utilizing Sparsity and Structure, 6/6

Ph.D. Dissertation Defense

On Prediction and Estimation for Datastreams

Utilizing Sparsity and Structure

Shiming Yang

10:00am-12:00pm, 6 June 2013, ITE 325b, UMBC

With the unprecedentedly fast growth of data, we have better opportunities to understand our complex world, and at the same time face pervasive challenges in efficiently inferring the meaning behind these vast amounts of data. It is particularly important to explore the intrinsic structures in data to improve our understanding of the latent mechanisms that generate them. In modeling, structures are features used to characterize the underlying systems, such as the rank of a system, the number of clusters, the levels of hierarchy, and the order of spatio-temporal correlations in multiple measurements.

In this thesis, we present our research contributions on utilizing structure and sparsity in observed data to improve estimation and prediction of trajectories of system states for two systems: the highway traffic system and human physiological systems. Both systems exhibit features that are seen in many other applications.

For the traffic problem, it is useful to know near-term traffic conditions after events that have a noticeable impact on road traffic. Commonly used macroscopic models, which view road traffic as fluid flowing in pipes, suffer from various inaccuracies that could be mitigated by incorporating past observations to correct predictions. However, we often have limited observation and computing resources (e.g., probe vehicles, smartphones, bandwidth, sensors) to gather and process past observations. We describe a novel low-overhead strategy to adaptively select observation sites in real time by using the density of the mesh of the numerical solution of the underlying mathematical model to capture the variability of that solution. We show that our proposed strategy improves the numerical accuracy of near-term traffic forecasting with limited observation resources compared with uniform deployment of those resources. In addition to deploying limited observation resources, one is often concerned with detecting special traffic events. To this end, we propose a novel method to decompose traffic observations into a normal background and sparse events. Our method couples multiple traffic datastreams so that they share a certain sparse spatio-temporal structure.
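The two ideas in the paragraph above can be illustrated with a deliberately simplified Python sketch; neither function is the thesis's method. `select_sites` assumes an adaptive solver has already placed mesh nodes more densely where the solution varies, and places k observation sites at evenly spaced ranks of those nodes, so dense mesh regions receive more sites. `decompose` swaps in a simple stand-in for the coupled decomposition: a per-time-slot median across days as the "normal background" and soft-thresholded residuals as the sparse events, for a single datastream only. The function names and the `threshold` parameter are hypothetical.

```python
import math
import statistics

def select_sites(mesh_nodes, k):
    """Place k observation sites at evenly spaced ranks of the mesh nodes.
    Regions where the adaptive mesh is dense receive proportionally more sites."""
    nodes = sorted(mesh_nodes)
    m = len(nodes)
    if k >= m:
        return nodes
    return [nodes[round((i + 0.5) * m / k - 0.5)] for i in range(k)]

def soft(r, t):
    # soft-thresholding: residuals smaller than t become exactly zero
    return math.copysign(max(abs(r) - t, 0.0), r)

def decompose(days, threshold):
    """Split per-day series (e.g. speed per time slot) into a shared
    background and sparse per-day event residuals."""
    slots = len(days[0])
    background = [statistics.median([day[t] for day in days]) for t in range(slots)]
    events = [[soft(day[t] - background[t], threshold) for t in range(slots)]
              for day in days]
    return background, events
```

The median keeps a single incident from distorting the background, and the threshold controls how large a deviation must be before it is reported as an event.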

We also study the utility of sparsity and structure in physiological datastreams. Missing values hinder the use of many machine learning methods. We show how to incorporate ideas from compressive sensing into handling missing values in continuous intracranial pressure (ICP) datastreams from patients with traumatic brain injury. We evaluate the proposed method in experiments where randomly selected ICP values are marked as missing, and find that it gives estimates of the missing values that agree better with the true values than the k-nearest neighbor and expectation maximization imputation methods.
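To make the compressive-sensing idea concrete, here is a minimal Python sketch of imputation under a sparsity assumption; it is not the thesis's algorithm. It assumes the signal is approximately k-sparse in the orthonormal DCT-II basis (an assumption; the thesis may use a different basis or solver) and alternates between keeping the k largest DCT coefficients and restoring the observed samples, updating only the missing positions. `impute_sparse`, `k` and `iters` are hypothetical names.

```python
import math

def _scale(k, n):
    # normalization factors that make the DCT-II matrix orthonormal
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def idct(c):
    # inverse of the orthonormal DCT-II is its transpose (DCT-III)
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(n))
            for i in range(n)]

def impute_sparse(x, missing, k, iters=100):
    """Fill x[i] for i in `missing`, assuming x is k-sparse in the DCT domain."""
    missing = set(missing)
    est = [0.0 if i in missing else v for i, v in enumerate(x)]
    for _ in range(iters):
        coeffs = dct(est)
        # keep only the k largest-magnitude coefficients
        keep = set(sorted(range(len(coeffs)), key=lambda j: abs(coeffs[j]), reverse=True)[:k])
        approx = idct([c if j in keep else 0.0 for j, c in enumerate(coeffs)])
        # observed samples stay fixed; only missing ones are updated
        est = [approx[i] if i in missing else x[i] for i in range(len(x))]
    return est
```

Values at the missing positions are never read, so they can be `NaN` in the input. A real ICP stream would need a windowed version of this and a tuned sparsity level `k`.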

Moreover, predicting near-term intracranial pressure for traumatic brain injury patients is of great importance to clinicians. Traditional regression methods need an explicit parametric form of the model to fit. However, due to our limited knowledge of complex brain physiology, it is difficult to specify an accurate parametric model. To overcome this difficulty, our model uses Gaussian processes to quantify our prior beliefs on the smoothness of the regression model, and performs regression in an infinite-dimensional space. We show that the proposed Gaussian process regression model predicts ICP changes in clinically useful timeframes and may support future development of minimally invasive ICP monitoring systems, earlier intervention strategies, and better patient outcomes.
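A bare-bones version of Gaussian process regression, as described above, fits in a few lines of standard-library Python. This sketch assumes a squared-exponential (RBF) kernel to encode the smoothness prior and fixed hyperparameters (`length`, `var`, `noise`); the thesis's actual kernel, features and hyperparameter fitting are not specified here.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    # squared-exponential kernel: prior belief that the signal is smooth
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    # solve L x = b (L lower-triangular)
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    # solve L^T x = b
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-2, length=1.0, var=1.0):
    """Posterior mean and latent variance of a zero-mean GP at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length, var) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # alpha = (K + noise*I)^{-1} y
    means, variances = [], []
    for x in x_new:
        k_star = [rbf(x, xi, length, var) for xi in xs]
        v = forward(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(rbf(x, x, length, var) - sum(vi * vi for vi in v))
    return means, variances
```

The predictive variance grows back toward the prior variance `var` far from the training inputs, which is how a GP signals that a forecast horizon has outrun the data.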

Committee: Drs. K. Kalpakis (Chair), Alain Biem (IBM TJ Watson), Chein-I Chang, Colin MacKenzie, Dhananjay Phatak, Yaacov Yesha

MS Defense: Nimbus: Scalable, Distributed, In-Memory Data Storage, 6/6

MS Defense

Nimbus: Scalable, Distributed, In-Memory Data Storage

Adam Shook

1:30pm Thursday, 6 June 2013, 325b ITE, UMBC

The Apache Hadoop project provides a framework for reliable, scalable, distributed computing. The storage layer of Hadoop, called the Hadoop Distributed File System (HDFS), is an append-only distributed file system designed for commodity hardware. The append-only nature of the file system limits applications' ability to perform random reads and writes. This was addressed by Apache HBase and Apache Accumulo, which both allow quick random access to a highly scalable key/value store.

However, these projects still require data to be read from the local disk of the server, and therefore cannot deliver the I/O throughput that many applications require. This is a limitation for "hot" data sets that cannot be stored in the memory of one machine but do not need the scalability of HBase, i.e., data sets that can be sharded and stored in memory on dozens of machines. These data sets are often referenced by many applications and can be dozens of gigabytes in size.

Nimbus is a project designed for Hadoop to expose distributed in-memory data structures, backed by the reliability of HDFS. Nimbus's architecture and implementation are validated by a series of I/O benchmarks against HBase that demonstrate its performance advantage for high-throughput data fetch operations. The overall architecture and the design of each component are discussed with respect to Nimbus's design goals, along with relevant use cases and future work for the project.
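The sharding idea behind such a store (a data set too large for one machine's memory, split across many) can be sketched in a few lines. This hypothetical `ShardedMemoryStore` is not Nimbus's API; it simulates each server's memory with a Python dict and routes keys with a stable hash (`zlib.crc32`). Persistence to HDFS, replication and networking are deliberately left out.

```python
import zlib

class ShardedMemoryStore:
    """Hypothetical sketch: each dict stands in for one server's in-memory shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key):
        # a stable hash (unlike Python's salted hash()) so every client
        # routes the same key to the same shard across processes
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key, default=None):
        return self._shard(key).get(key, default)
```

Because routing is a pure function of the key, a read touches exactly one shard's memory, which is where the throughput advantage over disk-backed lookups would come from.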

Committee: Drs. Tim Finin (chair), Anupam Joshi and Konstantinos Kalpakis

PhD defense: Dingkai Guo, Mid-Infrared Photonic Integration, 6/4

Ph.D. Dissertation Defense

Mid-Infrared Photonic Integration

Dingkai Guo

10:00am Tuesday, 4 June 2013, TRC CASPR conference room

The mid-infrared (mid-IR) wavelength range is important for applications including medical and security imaging, environmental trace gas sensing and free-space communications. However, photonic integrated circuits (PICs) in the mid-IR range are still completely underdeveloped, which significantly slows the reduction of mid-IR system size, weight and coupling losses, and limits the development of highly functional, lower-cost mid-IR photonic modules. In this dissertation, a solution to mid-IR photonic integration is demonstrated using a compact, widely tunable mid-IR transmitter and a mid-IR amplifying photodetector that can be integrated with the mid-IR source.

This integrated, widely tunable mid-IR source is fabricated by incorporating a super-structure grating (SSG) into the mid-IR quantum cascade laser (QCL) waveguide. The emission wavelength of the fabricated SSG-DBR QCL can be well controlled by varying the injection currents to the two grating sections. The wavelength can be tuned from 4.58 μm to 4.77 μm (90 cm⁻¹) with a supermode spacing of 30 nm. This SSG-DBR QCL can be a compact replacement for the external-cavity QCLs used in current mid-IR sensors.
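The tuning figures quoted above can be cross-checked with a short worked calculation. Wavenumber in cm⁻¹ is 10⁴ divided by the wavelength in μm, so the 4.58 to 4.77 μm range spans roughly 87 cm⁻¹ (consistent with the rounded 90 cm⁻¹), and the 190 nm range covers a little over six 30 nm supermode spacings. Only the numbers from the abstract are used; the variable names are illustrative.

```python
def wavenumber_cm1(wavelength_um):
    # 1 cm = 1e4 um, so wavenumber [cm^-1] = 1e4 / wavelength [um]
    return 1e4 / wavelength_um

short_um, long_um = 4.58, 4.77      # tuning range quoted in the abstract
supermode_spacing_nm = 30.0

span_cm1 = wavenumber_cm1(short_um) - wavenumber_cm1(long_um)
span_nm = (long_um - short_um) * 1000.0
n_spacings = span_nm / supermode_spacing_nm
print(f"{span_cm1:.1f} cm^-1 over {span_nm:.0f} nm, ~{n_spacings:.1f} supermode spacings")
```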

Mid-IR amplification and detection can be achieved using the same material as the mid-IR source. This QCL amplifier has an adjustable bandwidth and a tunable gain peak, so it can function as a tunable mid-IR filter. By biasing the QCL just below its threshold, we demonstrated more than 11 dB of optical gain and over 28 dB of electrical gain at specified wavelengths. In the electrical gain measurements, the resonant amplifier also functioned as a detector. This indicates that intersubband-based gain materials are ideal candidates for mid-IR photonic integration.

Besides the optimized fabrication processes, a new characterization technique based on the electrical derivative of the QCL I-V curves is used to quickly determine the QCL threshold and leakage current and to explore carrier transport in the device. The leakage currents in different QCL waveguide structures are also studied and compared using this technique.
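The electrical-derivative idea can be sketched numerically: compute I·dV/dI from sampled I-V data with central differences and look for the sharp drop that appears where the device voltage stops growing with current at lasing threshold. This is a minimal sketch, not the dissertation's procedure; `synthetic_iv` is a hypothetical toy diode-like model (a logarithmic junction term that clamps at threshold plus a series resistance), not measured QCL data.

```python
import math

def electrical_derivative(currents, voltages):
    """I * dV/dI at interior samples, using central differences."""
    out = []
    for i in range(1, len(currents) - 1):
        dv = voltages[i + 1] - voltages[i - 1]
        di = currents[i + 1] - currents[i - 1]
        out.append(currents[i] * dv / di)
    return out

def estimate_threshold(currents, voltages):
    # threshold estimate: current where I*dV/dI drops most between samples
    d = electrical_derivative(currents, voltages)
    interior = currents[1:-1]
    drops = [d[i] - d[i + 1] for i in range(len(d) - 1)]
    k = max(range(len(drops)), key=drops.__getitem__)
    return interior[k + 1]

def synthetic_iv(currents, i_th, i0=0.01, a=0.1, r_series=1.0):
    # toy model (assumption): junction term stops growing above threshold,
    # leaving only the series-resistance term
    return [a * math.log(1 + min(i, i_th) / i0) + r_series * i for i in currents]
```

With real measurements, the derivative is noisy, so smoothing the I-V data before differentiating would usually be needed.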

Finally, we report optical quenching effects induced in mid-IR QCLs by telecom-wavelength light when the QCLs are operated well above their threshold. The quenching effect results from intersubband band bending and depends on the coupled near-IR intensity, its wavelength, and the QCL voltage bias. The quenching effects can be used not only for mid-IR QCL optical switching and modulation; they also reveal that mid-IR QCLs can function as “converters” that convert a telecom optical signal into a mid-IR optical signal at the near-IR fiber end.

A coherent mid-IR transceiver with both transmitting and receiving functions can be realized based on the integrated components introduced in this dissertation. This compact transceiver includes an integrated, widely tunable mid-IR source and a mid-IR filter, amplifier, and detector, all based on the same material system.

Committee: Drs. Fow-Sen Choa (Chair), Anthony Johnson, Terrance Worchesky (Physics), Li Yan, Gymama Slaughter

MS defense: A Multilayer Framework to Catch Data Exfiltration

MS Thesis Defense

A Multilayer Framework to Catch Data Exfiltration

Puneet Sharma

10:30am Wednesday, 5 June 2013, 325b ITE, UMBC

Data exfiltration is the unauthorized leakage of confidential data from a particular system. It is a specific form of intrusion that is particularly hard to catch because of its most common cause: an insider entity responsible for the leak. That entity could be a person employed in the organization or a malicious hardware component bought from an unreliable third party. Catching such intrusions can therefore be extremely difficult. We describe a framework comprising multiple parameters that are constantly monitored in a system. These parameters can cover the entire stack of the computer architecture, from the hardware up to the application layer. Malicious behavior is detected by different modules monitoring these parameters, and an aggregated attack alert is produced if multiple modules detect malicious activity within a short period of time. A more distributed and comprehensive monitoring framework should make designing an attack extremely difficult, since an attack must get past multiple detectors in the system without raising any alarms.
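The aggregation rule described above (raise an alert only when several distinct monitoring modules fire within a short period) can be sketched as a sliding time window. The class name, the `window_s` and `min_modules` parameters and the module labels are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class AlertAggregator:
    """Hypothetical sketch: alert when >= min_modules distinct modules
    report suspicious activity within window_s seconds."""

    def __init__(self, window_s, min_modules):
        self.window_s = window_s
        self.min_modules = min_modules
        self.events = deque()  # (timestamp, module) pairs, oldest first

    def report(self, timestamp, module):
        self.events.append((timestamp, module))
        # drop detections that have fallen out of the time window
        while self.events and timestamp - self.events[0][0] > self.window_s:
            self.events.popleft()
        modules = {m for _, m in self.events}
        return len(modules) >= self.min_modules
```

Counting distinct modules rather than raw detections means one noisy detector (say, a hardware monitor) cannot trigger an alert alone, while correlated evidence across layers can.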

Committee: Drs. Anupam Joshi (chair), Tim Finin, Chintan Patel

PhD proposal: Yu Wang, Solving the Physically-Based Modeling and Animation Problem with a Unified Solution

Ph.D. Dissertation Proposal

The Modeling Equation: Solving the Physically-Based

Modeling and Animation Problem with a Unified Solution

Yu Wang

11:00am Monday, June 3, 2013, VANGOGH Lab, ITE 352

Physically-based modeling, i.e., the ability to model sophisticated geometric shapes and objects in complex physical environments, is an important and popular research area in computer graphics, especially in animation and modeling. Rigid body dynamics studies how solid objects react to external forces without considering collisions (unconstrained), or the interaction between rigid bodies without inter-penetration (constrained). Deformable object modeling accounts for the effects of material properties, external forces, and environmental constraints on object deformation. Fluid simulation in computer graphics focuses heavily on efficient ways of solving and/or approximating the Navier-Stokes equations.

It is difficult to account for these behaviors from a mechanics point of view, but they have analogous rheological equations. Rheology studies the deformation and flow of matter by accounting for the movement of the particles that make up a material relative to one another. There are three different rheological properties: if applying definite forces to a material produces a definite deformation, and the deformation disappears when the forces are removed, the material is elastic; if the deformation remains permanently, the material is plastic; and if, under definite forces, the deformation keeps increasing without limit, the material flows.

I propose to create physically accurate material behaviors using a generalized formulation based on rheological theory, i.e., the kinematic and dynamic properties of rigid bodies, deformable objects, and fluid-like materials can be represented by the same formulation with different weights on their rheological properties.
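A toy one-dimensional model can illustrate how one formula with different parameter weights reproduces the three rheological behaviors described above; it is not the proposed formulation. The sketch assumes total strain is the sum of an elastic term (stress / `E`, recovered on unloading), a plastic term (stress above `yield_stress`, scaled by a `hardening` modulus, never recovered), and a viscous term (stress / `viscosity`, integrated over time). Setting a parameter to infinity switches its term off. All names and the stress schedule are hypothetical.

```python
import math

def simulate(stresses, dt, E=math.inf, yield_stress=math.inf,
             hardening=1.0, viscosity=math.inf):
    """Strain history of a toy 1D elastic + plastic + viscous model
    under a sequence of non-negative stresses (one per time step)."""
    plastic = 0.0
    viscous = 0.0
    history = []
    for s in stresses:
        # plastic strain only ratchets up: it is permanent once created
        plastic = max(plastic, max(0.0, s - yield_stress) / hardening)
        # viscous strain keeps growing while stress is applied (flow)
        viscous += s / viscosity * dt
        # elastic strain follows the current stress and vanishes at zero stress
        elastic = s / E
        history.append(elastic + plastic + viscous)
    return history
```

Under a constant load that is later removed, a purely elastic setting (`E=2.0`) returns to zero strain, a purely plastic one (`yield_stress=0.5`) keeps a fixed permanent strain, and a purely viscous one (`viscosity=1.0`) keeps deforming for as long as the load is applied.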

Committee: Drs. Marc Olano (Chair and Advisor), Matthias K. Gobbert (Mathematics and Statistics), Penny Rheingans, Lynn Sparling (Physics), Jian Chen

MS defense: Extracting cybersecurity related entities, terms and concepts from text

MS Thesis Defense

Extracting cybersecurity related entities,
terms and concepts from text

Ravendar Lal

10:30am Tuesday, 28 May 2013, ITE 325b, UMBC

Securing computers, data, cyber-physical systems and networks is a growing problem as society's dependence on them increases while they remain vulnerable to attacks by both criminals and rival nation states. Creating 'situationally aware' computer systems that defend against new "zero day" software vulnerabilities requires them to automatically integrate and use new security-related data from a wide variety of sources. One important source is information found in text from security bulletins, vulnerability databases, news reports, cybersecurity blogs and Internet chat rooms.

We describe an information extraction framework to extract cybersecurity-relevant entities, terms and concepts from text. We use a Conditional Random Field-based model trained on manually annotated data to identify and extract the relevant terms. These are then mapped to a previously developed OWL ontology and represented as RDF linked data. We evaluated the system's performance on test data from the National Vulnerability Database and on security bulletins from Microsoft and Adobe.
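The step from CRF output to linked data can be sketched as follows, assuming the tagger emits one BIO label per token (a common convention for CRF sequence labeling, assumed here). The sketch groups labeled tokens into entity spans and writes each span as RDF in N-Triples syntax. The namespace `http://example.org/cyber#` and the entity class names are hypothetical placeholders, not the ontology used in the thesis.

```python
NS = "http://example.org/cyber#"  # hypothetical namespace, not the real ontology
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

def bio_to_spans(tokens, labels):
    """Group tokens tagged B-X / I-X / O into (entity_type, text) spans."""
    spans, cur_type, cur_toks = [], None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != cur_type):
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = lab[2:], [tok]
        elif lab.startswith("I-"):
            cur_toks.append(tok)
        else:  # "O": outside any entity
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_toks)))
    return spans

def spans_to_ntriples(spans):
    lines = []
    for i, (etype, text) in enumerate(spans):
        subj = f"<{NS}entity{i}>"
        # escape backslashes and quotes for an N-Triples string literal
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{subj} {RDF_TYPE} <{NS}{etype}> .")
        lines.append(f'{subj} {RDFS_LABEL} "{escaped}" .')
    return lines
```

In a fuller pipeline, the entity URIs would come from the ontology mapping step rather than a running counter.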

Committee: Drs. Tim Finin (Advisor), Anupam Joshi, Tim Oates

PhD defense: Quantum Cascade Laser Arrays for Standoff Photoacoustic Chemical Detection, 5/17

Ph.D. Dissertation Defense

High Power Mid-infrared Quantum Cascade Laser Array
for Standoff Photoacoustic Chemical Detection

Xing Chen

1:00-3:00pm Friday, 17 May 2013, TRC CASPR Conference Room

Quantum cascade lasers (QCLs) are compact, portable, powerful semiconductor laser sources with emission wavelengths from the mid-infrared (mid-IR) to the terahertz (THz) regions of the electromagnetic spectrum. Mid-IR QCLs (i.e., wavelengths from 3 to 20 µm) are of great importance in a wide range of applications such as trace gas sensing, environmental monitoring, free-space communication and medical diagnosis. High power QCLs are particularly important for applications such as infrared countermeasures (IRCM) and standoff chemical detection. In such applications, system performance depends critically on the amount of power a QCL can produce. This dissertation includes two major studies: the first part covers the design, fabrication and characterization of high power mid-IR QCL arrays; the second part involves standoff chemical detection using QCLs as laser sources and the photoacoustic effect as the sensing technique.

In the first part of the dissertation, we design, fabricate and characterize multi-emitter QCL arrays consisting of multiple narrow laser stripes. Simulation results indicate that the proposed multi-emitter laser arrays have much better thermal performance than a broad-area laser device, while having the same thermal management ability as a single narrow-stripe device. We have successfully fabricated edge-emitting and surface-emitting QCL arrays with 5 and 16 emitters. Experimental results show that, with the same laser cavity length, a QCL array with 5 emitters produces over 3 times more power than a single-emitter laser device. A QCL array with 16 emitters generates about 4 W of peak output power at a wavelength of ~7.9 µm. We have also demonstrated single-mode emission from the surface-emitting QCL arrays.

The second part of the dissertation involves using high power mid-IR QCLs to perform standoff chemical detection based on photoacoustic sensing. The photoacoustic effect is a light-matter interaction in which acoustic waves are generated when a medium absorbs electromagnetic energy from light. It is known to be a sensitive spectroscopic technique for chemical sensing.

Standoff photoacoustic chemical detection at a distance of more than 41 feet is demonstrated using a quantum cascade laser operated at relatively low power (less than 40 mW). A simplified theoretical model is developed for the pulsed-laser photoacoustic effect in an open-air environment. The standoff photoacoustic signal can be calibrated as a function of parameters such as laser pulse energy, gas vapor concentration and detection distance, and the results agree well with the theoretical model. Standoff detection of solid-phase explosive chemicals has also been demonstrated using an ultra-sensitive microphone and an acoustic reflector. A detection distance of more than 8 feet is obtained for standoff photoacoustic sensing of explosives.
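To show what calibrating the signal against pulse energy, concentration and distance can look like, here is a generic scaling sketch; it is not the dissertation's model. It assumes the acoustic amplitude is proportional to pulse energy times the fraction absorbed by the vapor (Beer-Lambert over a path length) and falls off as 1/distance (spherical spreading). The instrument constant `k` is fitted from one reference measurement. All parameter names and default values are hypothetical.

```python
import math

def pa_amplitude(pulse_energy, concentration, distance, k=1.0,
                 absorptivity=1.0, path_length=1.0):
    """Generic photoacoustic scaling (assumption, not the thesis model)."""
    # Beer-Lambert: fraction of the pulse energy absorbed by the vapor
    absorbed = 1.0 - math.exp(-absorptivity * concentration * path_length)
    # spherical spreading: pressure amplitude falls off as 1/distance
    return k * pulse_energy * absorbed / distance

def calibrate_k(measured, pulse_energy, concentration, distance, **kw):
    # instrument constant from a single reference measurement
    return measured / pa_amplitude(pulse_energy, concentration, distance, k=1.0, **kw)
```

Under these assumptions, doubling the pulse energy doubles the signal, doubling the distance halves it, and at low concentrations the signal is nearly linear in concentration. A real open-air model would also need atmospheric acoustic attenuation and the detector's frequency response.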

Committee: Drs. Fow-Sen Choa (Chair), Brian Cullum, Yordan Kostov, Ryan Robucci, Chen-Chia Wang and Li Yan

PhD proposal: A Semantic Resolution Framework for Manufacturing Capability Data Integration

Ph.D. Dissertation Proposal

A Semantic Resolution Framework for
Manufacturing Capability Data Integration

10:30am Tuesday, May 14, 2013, ITE 346, UMBC

Yan Kang

Building flexible manufacturing supply chains requires interoperable and accurate manufacturing service capability (MSC) information from all supply chain participants. Today, MSC information, which is typically published either on the supplier’s web site or registered at an e-marketplace portal, has been shown to fall short of the interoperability and accuracy requirements. This issue can be addressed by annotating the MSC information using shared ontologies. However, ontology-based approaches face two main challenges: 1) the lack of an effective way to transform the large amount of complex MSC information hidden in manufacturers' web sites into a representation with shared semantics, and 2) difficulties in the adoption of ontology-based approaches by supply chain managers and users, because of their unfamiliarity with the syntax and semantics of formal ontology languages such as OWL and RDF and the lack of tools suitable for inexperienced users.

The objective of our research is to address the main challenges of ontology-based approaches by developing an innovative approach that can effectively extract a large volume of manufacturing capability instance data, accurately annotate these instance data with semantics, and integrate them under a formal manufacturing domain ontology. To achieve this objective, we propose a Semantic Resolution Framework that guides every step of the manufacturing capability data integration process and resolves semantic heterogeneity with minimal human supervision. The key innovations of this framework include 1) three assisting systems, a Triple Store Extractor, a Triple Store to Ontology Mapper and an Ontology-based Extensible Dynamic Form, that can efficiently and effectively perform the automatic processes of extracting, annotating and integrating manufacturing capability data; 2) a Semantic Resolution Knowledge Base (SR-KB) that is incrementally filled with, among other things, rules/patterns learned from errors; this SR-KB, together with an Upper Manufacturing Domain Ontology (UMO), provides knowledge for resolving semantic differences in the integration process; and 3) an evolution mechanism that enables the SR-KB to continuously improve itself and gradually reduce human involvement by learning from mistakes.
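The "learn from mistakes to reduce human involvement" loop described above can be illustrated with a very small sketch; it is far simpler than the proposed SR-KB. The hypothetical `SemanticResolutionKB` maps an extracted term to an ontology class by first checking rules learned from earlier human corrections, then checking exact (case-insensitive) class-label matches, and only then asking a human, whose answer is stored as a new rule. The class names `Milling` and `Turning` are illustrative placeholders, not classes from the UMO.

```python
class SemanticResolutionKB:
    """Hypothetical sketch: resolve terms to ontology classes, learning
    a rule from every human correction so the same term is not asked twice."""

    def __init__(self, ontology_classes):
        self.labels = {c.lower(): c for c in ontology_classes}
        self.rules = {}          # learned term -> class mappings
        self.human_queries = 0   # how often a human had to intervene

    def resolve(self, term, ask_human):
        key = term.strip().lower()
        if key in self.rules:        # 1) previously learned correction
            return self.rules[key]
        if key in self.labels:       # 2) direct label match in the ontology
            return self.labels[key]
        self.human_queries += 1      # 3) fall back to a human expert
        cls = ask_human(term)
        self.rules[key] = cls        # remember the answer as a new rule
        return cls
```

Tracking `human_queries` over time gives a simple measure of whether the knowledge base is actually reducing human involvement as it accumulates rules.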

Committee: Yun Peng (chair), Charles Nicholas, Tim Finin, Yaacov Yesha, Boonserm Kulvatunyou (NIST)
