Current Research Projects

Common Sense

The Common Sense team is developing mobile sensing platforms to support citizen science. Increasing numbers of mobile devices have the potential to become personal environmental sensors. While some types of sensors (e.g., geolocation, motion, sound, etc.) are already commonly present in consumer devices, other kinds of compact, low-power sensors (e.g., air quality) are not yet commonly included but offer the ability to collect additional data of individual and social interest.

To leverage this potential, we are developing sensor-equipped mobile devices that allow everyday citizens to collect environmental data. We are also collaborating with researchers at the University of California, Berkeley on the development of a novel sensor to measure particulate matter, which is a critical air pollutant.

Further, we are developing mobile and Internet-based software applications that allow people and communities to analyze and share the environmental data they collect, so that they can influence environmental regulations and policies. To make environmental sensing useful for practical action, one must do more than just “collect” and “present” data. While mobile sensing is an active research area, as yet little is known of how such systems might fit into the context of real-world environmental action. In order to inform future applications of mobile and pervasive technology, we have conducted design fieldwork on the social and organizational landscape of environmental action – government agencies, public health NGOs, atmospheric scientists, and so on.

We are leveraging our fieldwork in the design and development of a family of devices and software. First, we have developed a personal mobile device that collects data as people go about their daily lives. We are deploying this device and accompanying web software in West Oakland, in collaboration with a community action group. Second, we have developed a vehicular platform. We have collaborated with the City of San Francisco to put this system on the municipal fleet of street sweepers. The goal is to leverage mobile infrastructure to collect street-by-street readings as the vehicles move throughout the city. Our devices currently report GPS, carbon monoxide, ozone, nitrogen oxide, temperature, and humidity data.

Learn more about Common Sense at www.communitysensing.org
Project Team
  Intel Labs Berkeley: Paul Aoki, Alan Mainwaring, Allison Woodruff
Collaborators
  City of San Francisco: Office of the Mayor, Department of the Environment, Department of Public Works
  Isopod Design: Chris Myers
  Nokia Research: R.J. Honicky
  State University of Nizhny Novgorod: Nikolay Chistyakov, Max Sokolov
  UC Berkeley EECS: Maneesh Agrawala, John Canny, Frederick Doering, Prabal Dutta, Neil Kumar, Richard White, Wes Willett, Baladitya Yellapragada
  UC Berkeley Atmospheric Science Center: Ron Cohen, Paul Wooldridge
  West Oakland Environmental Indicators Project: Brian Beveridge

Confrontational Computing

Life is an endless encounter with disputed information: everything from misleading adverts and biased journalism all the way down to conversations in a bar. The Confrontational Computing project aims to understand how and why people argue on the web, and to develop tools that help people form and promote opinions.

How do people use the web to help them form beliefs about the world? How do people promote their own opinions to others online? What are the impacts of opinions expressed online, particularly when they are confrontational? Can we build tools that make it easier for a user to know when other people disagree with the opinion that they are reading? Can we build tools that help a user understand why others disagree with them?

Dispute Finder - Disputed Information on the Web

We have built Dispute Finder, a web browser extension that alerts a user when information they read online is disputed by a source that they might trust. Dispute Finder examines the text on every page the user reads and highlights claims and opinions that it believes conflict with information from other sources that the user might trust. If the user clicks on a highlighted phrase then Dispute Finder shows them a list of articles that put forward alternative points of view.

Dispute Finder builds a database of known disputed claims by crawling web sites that already maintain lists of disputed claims, and by allowing users to enter claims that they believe are disputed. Dispute Finder identifies snippets that make known disputed claims by running a simple textual entailment algorithm inside the browser extension.
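To make the matching step concrete, here is a minimal sketch of how page snippets might be checked against a list of known disputed claims. It is not Dispute Finder's algorithm: word-overlap coverage stands in for textual entailment, and the function names, stopword list, and 0.8 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical stopword list; a real system would use a richer one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "that", "and"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in tokens if w not in STOPWORDS}


def find_disputed(snippets, disputed_claims, threshold=0.8):
    """Return (snippet, claim) pairs where the snippet covers most of the claim.

    Coverage is the fraction of the claim's content words that also appear
    in the snippet. This is a crude stand-in for textual entailment.
    """
    matches = []
    for snippet in snippets:
        snippet_words = content_words(snippet)
        for claim in disputed_claims:
            claim_words = content_words(claim)
            if not claim_words:
                continue  # nothing to match against
            coverage = len(claim_words & snippet_words) / len(claim_words)
            if coverage >= threshold:
                matches.append((snippet, claim))
    return matches
```

A browser extension built along these lines would highlight each matched snippet and link it to the articles recorded for the corresponding claim. Word overlap ignores negation and paraphrase, so it would produce both false positives and misses that a real entailment method is meant to reduce.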

Disputed Information in Conversation

Much of the information we encounter in our lives comes from listening to other people talk. We are building an experimental device that listens to speech and alerts the user if they say or hear anything that is disputed, letting them know that there is another point of view they might want to consider.

Understanding the "how" and the "why" of Online Arguing

In addition to building new tools, this project is pursuing a social research agenda to better identify who is arguing online, why they argue, and how they go about arguing. Preliminary interviews with potential Dispute Finder users revealed a strong and nearly universal goal to "determine the bias" of regularly browsed news sources. This bias detection process generally involves comparing information between news sources and against personal experience, with a focus on which statements are in conflict and which ones are not. In terms of expressing one's (often controversial) opinions, we found that a slight majority of people avoid it. Their reasons are quite idiosyncratic, although they roughly fall into two general categories: "don't have time" or "don't want to get involved". Of the remaining people who do express their opinions online, we have found that they very rarely do so with the genuine hope of changing their readers' beliefs. Rather, they seem to be more motivated by the thrill of the conflict, or by the simple fact that expressing their opinions displays aspects of their own identity or skills. Come talk to us to learn more about what we have found and how we are using the findings to refine Dispute Finder and the other tools we are building.
A preview of Dispute Finder is available at disputefinder.cs.berkeley.edu
Project Team
  Intel Labs Berkeley: Rob Ennals
  Intel Labs Santa Clara: John Mark Agosta
  Intel Labs People and Practices Research: Tye Rattenbury, Tad Hirsch
Collaborators
  UC Berkeley: Beth Trushkowsky, Michael Armbrust, Wesley Willet, Dan Byler, Nick Kong, Joe Hellerstein, Christine Robson, Jesse Trutna, Nick Lanham, Armando Fox, Maneesh Agrawala

RouterBricks

In recent years, there has been significant concern within the computer science community over the inflexibility of the Internet’s infrastructure. Simply put – the original designers of the Internet never anticipated its astounding success. Consequently, traditional network design took a fairly narrow view of the functionality that network equipment (e.g., network routers or switches) must support. However, it has become increasingly clear that this traditional approach to building network equipment can no longer keep pace with the growing demands being placed on the Internet’s underlying infrastructure.

The RouterBricks project is exploring a simple, but radical solution: that networks be built from general-purpose computers, rather than the narrowly specialized equipment used today. If feasible, this could lead to a whole new approach to building networks – one that leverages the familiarity and flexibility of the PC ecosystem to reshape the world of network equipment and services. By changing how networks are architected, new models for networked applications can be brought to the data center, internet, and networking infrastructure world. In short, what the PC did for computing could be extended to network infrastructure and programming.

But this isn’t just about recreating current networks with new building blocks. A transition from specialized networks to general-purpose ‘open’ infrastructure will pave the way for new network-centric services and business models – just as the PC broke the stranglehold mainframes had on the fledgling computer market. For example: “server-only” datacenters where compute servers do double duty as switches, with a fluid boundary between application and network processing; isolated “virtual” networks that can be leased to (and customized for) different customers and applications (video, gaming), akin to how data-centers used virtualization to launch cloud-computing; ISPs and data centers that can differentiate by quickly reprogramming their networks, deploying new services or implementing new security mechanisms.

In the RouterBricks project we have designed a software router architecture that achieves scalability by parallelizing router functionality both across multiple servers and across multiple cores within a single server. We have built a fully programmable 4-server prototype RouterBricks router (or “RB4”, as we call it) using the familiar Click/Linux environment and only off-the-shelf, general-purpose Intel server hardware. We are currently developing a set of applications to fully exploit the potential of the RouterBricks architecture, including content delivery, power management in enterprises, new data-center architectures and so forth.

Learn more about RouterBricks at www.routerbricks.org
Project Team
  Intel Labs Berkeley: Kevin Fall, Gianluca Iannaccone, Maziar Manesh, Sylvia Ratnasamy
Collaborators
  Ecole Polytechnique Federale de Lausanne (EPFL): Katerina Argyraki, Mihai Dobrescu
  Lancaster University: Norbert Egi
  UC Los Angeles: Eddie Kohler

Disaster Response Communications

During a disaster, people need to communicate. First responders need to communicate in order to coordinate and provide relief, but individuals (“Joe the Citizen”) also have communications needs. Joe requires the ability to communicate, with a reasonable expectation of privacy and security, even when much of the regular communication infrastructure may be degraded. He needs to communicate for several reasons: to locate critical resources (food, water), to ascertain the location and condition of loved ones, to exchange important status information with officials, to coordinate activities, and to obtain tutorial information relating to survival techniques. The current primary infrastructures (cellular networks, Internet) do not satisfy these needs well during disasters.

While governments help to organize response equipment such as neighborhood caches, communication technology is usually left to the basics (e.g., radios that may be unfamiliar or poorly maintained). The goal of the disaster response communications (DRC) project is to enable citizens to continue using familiar Internet applications on their personal devices (e.g., smart phones, laptops) even when the network infrastructure is degraded or barely functioning. There are three big challenges we are addressing in order to achieve this goal.

The first challenge is to provide communications in the face of degraded infrastructure. We propose handling this using the recently emerging Delay/Disruption Tolerant Networking (DTN) technology. DTN is a network architecture capable of tolerating significant connection disruption. DTN-equipped devices make use of relays that provide a store-carry-forward (SCF) function. An SCF relay can forward messages from one communication device to another when a live network exists, can store messages until network connectivity is restored, and can also physically carry a message from one place to another (e.g., if mounted on or in a vehicle).
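The store-carry-forward behavior described above can be illustrated with a toy relay. This is a minimal sketch, not the DTN protocol or any DRC code: the `Relay` class, its method names, and the in-memory `delivered` list standing in for the next hop are assumptions made for illustration.

```python
from collections import deque


class Relay:
    """Toy store-carry-forward relay (hypothetical, for illustration only)."""

    def __init__(self):
        self.stored = deque()  # messages held while no link is available
        self.link_up = False   # whether a live network currently exists
        self.delivered = []    # stand-in for messages handed to the next hop

    def receive(self, message):
        """Forward immediately if a live link exists, otherwise store."""
        if self.link_up:
            self.delivered.append(message)
        else:
            self.stored.append(message)

    def set_link(self, up):
        """Record a connectivity change and flush stored messages on reconnect."""
        self.link_up = up
        while self.link_up and self.stored:
            # Oldest message first, so arrival order is preserved.
            self.delivered.append(self.stored.popleft())
```

The "carry" part is implicit here: the stored queue travels with the relay object, so a relay mounted on a vehicle would deliver its backlog wherever it next finds connectivity.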

In our vision, regular laptops can serve as relays, thereby augmenting relays provided by disaster services. In order to cope with the stress and anxiety they experience during a disaster, people tend to communicate a great deal by any means possible. This can lead to the creation of an enormous amount of redundant content, such as “I’m ok” messages sent to everyone a person knows. This natural behavior aggravates the problem of limited connectivity, and leads to congestion and inefficient use of shared communication, storage, and power resources. To address this second big challenge we are designing support for sophisticated prioritization mechanisms, ranging from content filtering and content compression to metadata management. Using their personal devices, individuals can contribute text, images, audio and video to disaster information services or to their local community.

However, relying on information collected from or distributed to individuals for critical decision making creates a third challenge: the supporting communication system needs to be trustworthy. In this context, security refers primarily to an ability to verify the origin and integrity of data, and to provide privacy and access control if requested. Common solutions to these issues require reconsideration in networks with intermittent communications. We are extending DTN with a security framework that supports data use controls. The approach allows the owner of data to specify the way data may be processed, stored, and combined in a format that is cryptographically bound to the data described (e.g., “disclose my location only to first responders”). How such controls may be relaxed or changed by appropriate authorities when lives or property are at risk is a new area of research relevant to communications during disasters and in other extraordinary circumstances.
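One way to picture a policy that is "cryptographically bound" to its data is a single authentication tag computed over both. The sketch below is an assumption-laden illustration, not the DRC security framework: it uses an HMAC with a shared key (a deployed system would more likely use public-key signatures), and the bundle layout, field names, and policy vocabulary are hypothetical.

```python
import hashlib
import hmac
import json


def _message(data: bytes, policy: dict) -> bytes:
    """Serialize policy and data into one unambiguous byte string."""
    policy_bytes = json.dumps(policy, sort_keys=True).encode()
    # Length-prefix the policy so the policy/data boundary cannot shift.
    return len(policy_bytes).to_bytes(4, "big") + policy_bytes + data


def bind_policy(key: bytes, data: bytes, policy: dict) -> dict:
    """Package data with its usage policy and a tag covering both."""
    tag = hmac.new(key, _message(data, policy), hashlib.sha256).hexdigest()
    return {"data": data, "policy": policy, "tag": tag}


def verify(key: bytes, bundle: dict) -> bool:
    """Check that neither the data nor its policy changed since binding."""
    expected = hmac.new(
        key, _message(bundle["data"], bundle["policy"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, bundle["tag"])
```

Because the tag covers the policy as well as the payload, a relay that rewrites "disclose only to first responders" into something broader invalidates the bundle. Note that this only detects tampering. Enforcing the policy at the receiver is a separate problem.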

DRC is designing and implementing the above communications technologies, and is also interacting with the disaster response community to gain deeper understanding of the “real world” problems that arise during emergencies and disasters.

Project Team
  Intel Labs Berkeley: Kevin Fall, Gianluca Iannaccone, Nina Taft
Collaborators
  UPMC Paris Universitas: Fernando Silveira, Anna Pietilainen
  UC Berkeley: Jayanthkumar Kannan, Megan Finn
  Nokia Research and ICSI: Pasi Sarolahti
  ESHARC - East Shore Amateur Radio Club: Jordan Hayes
  Trinity College, Dublin, Ireland: Alex McMahon, Stephen Farrell
  Jet Propulsion Laboratory, NASA: Scott Burleigh

Power Aware Perception

Within the next few years, your phone will know your social network. It will catalog your belongings, your physical activities, and your most exciting experiences. And it will help you remember them. It will also know your favorite places, your favorite things at these places, and how to get those things. It will teach you how to fix a broken car, and how to clean the espresso machine in the office kitchen. In short, your smartphone will soon move from being just your digital assistant to being your best friend and personal factotum.

The perception algorithms that make this future possible already exist today, but only as prototypes in universities and research labs. The algorithms are not yet accurate enough, and running them on phones will require either throttling them to impractical speeds or draining your phone’s battery extremely fast.

The PAPe project (pronounced "pa-pee") is about migrating these algorithms from research prototypes to compelling mobile applications on phones. We are designing new machine perception algorithms that consume less power, and developing a sophisticated cross-layer power management architecture. In addition, given that existing hardware falls short in supporting such goals, we are also building a research prototype smartphone that will help the rest of the research community.

At the heart of most mobile perception systems are a host of compute- (and power-) hungry machine learning algorithms. To reduce their power consumption, we are building machine learning algorithms that run thousands of times faster than existing ones. These include algorithms for supervised learning, automatic clustering, and numerical optimization.

Our power management system is a closed-loop control system that measures the speed, power consumption, and accuracy of running programs and tunes their parameters on the fly to find a sweet spot between speed, power and accuracy. This control system simplifies the way people write programs: instead of carefully tuning every aspect of their applications when they develop their code, programmers can concentrate on functionality and simply expose the parameters of their applications to our run-time environment; the PAPe control system will measure the power footprint of the application and adaptively tune these parameters on the fly.
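As a rough illustration of the closed-loop idea, the sketch below tunes one exposed integer parameter until measured power fits a budget. It is not the PAPe controller: the greedy one-step search, the `tune` function, and the `measure` callback signature are assumptions, and it presumes that a larger parameter value costs more power and gives better accuracy.

```python
def tune(measure, param, lo, hi, power_budget, steps=20):
    """Greedy feedback loop over one integer parameter (hypothetical sketch).

    measure(param) -> (power, accuracy) is supplied by the application.
    Assumes power and accuracy both grow with param.
    """
    for _ in range(steps):
        power, _accuracy = measure(param)
        if power > power_budget and param > lo:
            param -= 1  # over budget: back off one step
        elif param < hi and measure(param + 1)[0] <= power_budget:
            param += 1  # headroom left: spend it on accuracy
        else:
            break       # next step would exceed the budget, or a bound was hit
    return param
```

In this picture the programmer only exposes `param` and its bounds, and the run-time system owns the loop. A real controller would also have to cope with noisy measurements, several interacting parameters, and workloads that change over time.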

We are also building a mobile perception research platform that will let researchers experiment with these ideas on real hardware. This research platform isn't your typical smartphone: it has about 10x the computational capacity of your desktop; a variety of sensors help it see, hear, and be aware of its position and orientation; a built-in introspection system allows it to measure the power consumption of every piece of hardware inside itself; and unlike your phone, which sits timidly in your pocket, the supercharged smartphone is meant to be brandished, so it can see your world the way you see it.

Project Team
  Intel Labs Berkeley: Jaideep Chandrashekar, Ling Huang, Ali Rahimi
  Intel Labs Seattle: Ben Greenstein
Collaborators
  UC Berkeley: Michael Jordan, Ariel Kleiner, Daniel Ting, Steve Dawson-Haggerty, Andrew Krioukov
  University of Illinois Urbana Champaign: Rakesh Kumar, Joseph Sloan, David Kesler

Flexible and Secure Distributed Systems for Mobile and Cloud Applications

Our work in the Customized Secure Networked Systems project focuses on providing the tools in software, hardware, and theory to build secure, dependable systems that can adapt to change, without losing their properties. We target a variety of settings, including both back-end services and mobile applications.

Dependable systems are secure systems that can tolerate faults during operation and are easy to manage. Building dependable systems is tricky, partly because the designer must anticipate the unexpected, and partly because the bar is high: such systems are expected to remain secure in the face of adverse conditions, which means protecting the privacy of sensitive data they hold, ensuring critical operations can only be performed by those authorized to do so, and keeping service uninterrupted. Because of their complexity, dependable systems tend to be overspecialized: after you design, prove correct, build, and deploy a typical dependable system, you can't just go in and replace a component with a different one. If you do, you risk compromising all the desirable quality guarantees made by the designers, with adverse, unpredictable effects. On the other hand, deploying systems in practice requires not only doing the correct thing, but also doing it well given the available network bandwidth, CPUs, storage capacity, and other characteristics. For example, a system that is fast when the network bandwidth is ample may need to be changed significantly to be fast when the network bandwidth is constrained.

MOMMIE

Dependable replicated services (e.g., for high-assurance domains including banking, finance, defense, health) are notorious for their complexity and subtle deployment challenges. Our work focuses on defining a clear, simple, expressive, and intuitive language that algorithm designers can use to express their distributed algorithms precisely and correctly, without worrying about deployment details and optimizations. In parallel, we define a simple yet safe interface that deployment engineers (or even a mathematical optimizer) can use to plug in particular optimizations to match a given environment (network, CPUs, trust assumptions, etc.), without worrying about violating the correctness of the algorithm expressed by the designer. The system resulting from the combination of the two independently developed pieces—the algorithm and the deployment plan—can adapt to significantly more deployment scenarios than a monolithically designed system would, without burdening designers with undue complexity and without tying down deployment engineers to a few, "vetted" optimization options. Our prototype, MOMMIE (Middleware for Optimized Messaging in Insecure Environments) demonstrates the ideas and provides us with an experimentation platform for deeper optimizations and deeper algorithmic abstractions.

Secure Data Capsules

The second focus of our flexible security work is web services, such as on-line stores, image and video sharing sites, or brokerage services. Such services typically contain sensitive information about their customers, ranging from simple credit card numbers and mailing addresses to high-volume information such as DVD-watching preferences, detailed day-trading strategies, and health records. Because this information lies within service data centers "in the raw," mixed with complex service software, it is often abused. On one hand, misconfiguration or software bugs may disclose it accidentally. On the other hand, malicious insiders may exploit it for profit. Our work on Secure Data Capsules aims to wrap customers' sensitive information in a protected access interface that only discloses data in accordance with the interface properties and the customer's desires. For benign services, this provides an isolation buffer that protects their customers' data and their own reputation. For less established services, this allows customers to require and verify the existence of trusted hardware or other trusted infrastructure, before yielding their sensitive data. Depending on the needs of the particular service, the expected performance, and the level of dependability the customer requires, secure data capsules offer a variety of implementation choices for the same logical isolation between customer data and service code. We explore in particular physical isolation, software isolation via virtualization, and software isolation via trusted hardware enforcement. Our work will lead to greater dependability for web services, greater privacy for customers, and more choice in the right balance between cost and performance.

CloneCloud

The CloneCloud project takes the concept of flexibility to its extreme. It provides elastic execution for mobile applications by executing them on clouds of clones of the mobile device. CloneCloud specifically improves the performance of applications from resource-starved devices such as smartphones, by opportunistically off-loading them to available cloud resources in nearby datacenters. The idea is simple: clone the entire set of data and applications from the smartphone onto the cloud and selectively execute some operations on the clones, reintegrating the results back into the smartphone. One can have multiple clones for the same smartphone, clones pretending to be more powerful smartphones, etc. We can execute very expensive operations via cloud cloning such as image search, virus scanning, and data leak detection (a) without requiring application designers to explicitly plan for cloning, (b) without eating up the smartphone's battery power, and (c) with significant performance improvement. This same approach is broadly applicable to other weak devices such as tablets, netbooks, and mobile Internet devices.

*-scope

The *-scope project seeks whole-system understanding, to ensure that applications do what they think they do. This project answers the question: are the expected security properties provided by the running system? Those properties include data privacy, availability, and various performance guarantees. At a "micro" level, *-scope traces data at a fine granularity as they course through the different components of a distributed application (e.g., smartphone applications, cloud software, and enterprise networks). At a "macro" level, *-scope discovers application and data dependencies, which it mines for property violations. Our work will lead to better mobile device and cloud management and greater security and privacy for customers.
Learn more about CloneCloud at berkeley.intel-research.net/bgchun/clonecloud
Project Team
  Intel Labs Berkeley: Petros Maniatis, Byung-Gon Chun
Collaborators
  UC Berkeley: Jayanthkumar Kannan, Gunho Lee, Lucian Popa
  Brown University: Babi Papamanthou
  Rice University: Michael Dietz
  Princeton University: Sunghwan Ihm
  Intel Labs Seattle: Jaeyeon Jung

Eco-Sense Buildings

The primary objective is to increase energy efficiency and occupant comfort in (smart) office buildings. This is to be accomplished through a combination of holistic building management and innovative techniques that include new construction materials, natural ventilation, and advanced computer-based control of lighting, heating and cooling, office IT equipment, and shared facilities such as the cafeteria. We rely on a combination of pervasive ambient and people-activity sensing to proactively drive coordinated power behaviors of IT equipment in concert with systems that control heating, cooling, and lighting.

This work is performed in collaboration with Intel Labs Hillsboro and the Enjeu Energie Positive consortium in France that brings together key players in the eco-system for construction and operation of smart buildings. Consortium members include construction (Bouygues), building-management systems (Schneider Electric, Siemens), IT (Intel, Lexmark), lighting (Philips), office furniture (Steelcase), food preparation (Sodexo), alternative energy generation (Tenesol) and others.

Recent Research Projects

Yada

Future chips will scale by adding new cores rather than increasing frequency. To take advantage of this new reality, programmers need a much easier way of exploiting parallel processing than offered by the current dominant parallel processing paradigms: threads with shared memory or processes communicating by message-passing. In collaboration with the UC Berkeley Parallel Laboratory, we are designing and building Yada, a new language which aims to make parallel programming practical for regular programmers. In furtherance of this goal, Yada is designed to feel like existing sequential languages, and to simplify using high-performance parallel libraries and frameworks.

Yada aims to feel like existing sequential programming languages: programmers write code using the constructs they are familiar with (objects, loops, arrays, etc), but with explicit indications of parallelism: run this loop in parallel, run these two statements in parallel, etc. Yada guarantees that the parallel executions of these programs behave as if the loops and parallel statements were executed sequentially. For example, the radix sort example (facing page) can be understood as a sequential radix sort by considering that the forall loops are normal C for loops and by ignoring all the other Yada keywords (reduce, scan, barrier).

To ensure sequential-like behavior in a parallel execution, Yada programs must use special "sharing types" to declare data that is accessed in parallel in an "interesting" (i.e., not read-only) fashion. For instance, a Yada variable declared with a 'reduce(+)' annotation allows parallel increments. This annotation is used in the declaration of the buckets array in the radix sort example to allow the parallel increments (line 11) used to compute the histogram of the array being sorted. Similarly, the 'scan(+)' annotation allows increments and reads to be performed in parallel. This annotation is used in the declaration of the offsets array (line 5) to allow the parallel execution of the loop at line 18, which distributes elements from the input array x to the output array y, based on the earlier histogram results.
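For readers without the radix sort listing at hand, the following sequential Python sketch shows the two access patterns in question. It is not Yada code and does not reproduce the original listing, so the line numbers cited above do not apply to it. The array names x, y, buckets, and offsets follow the text, while the digit width and key size are assumptions.

```python
def radix_sort(x, bits=8, key_bits=32):
    """Sequential LSD radix sort for non-negative integers below 2**key_bits."""
    radix = 1 << bits
    mask = radix - 1
    for shift in range(0, key_bits, bits):
        # Histogram pass: every iteration only increments a bucket. This is
        # the pattern the text says a 'reduce(+)' declaration makes safe.
        buckets = [0] * radix
        for v in x:
            buckets[(v >> shift) & mask] += 1

        # Exclusive prefix sum: turn per-digit counts into start offsets.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += buckets[d]

        # Distribution pass: every iteration reads an offset and then
        # increments it, the pattern the text associates with 'scan(+)'.
        y = [0] * len(x)
        for v in x:
            d = (v >> shift) & mask
            y[offsets[d]] = v
            offsets[d] += 1
        x = y  # output of this pass is input to the next
    return x
```

Read sequentially, each pass is an ordinary stable counting sort on one digit. The point of the sharing types is that the histogram and distribution loops could run in parallel while still producing exactly this sequential result.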

As a result of their sequential-like behavior, Yada programs can be understood, tested and debugged like sequential programs. This makes parallel program development much easier than in the more common, non-deterministic (two executions with the same input may produce different results) threaded and message-passing parallel programming paradigms. The second key element in Yada's design is explicit support for using parallel libraries. Currently, multiple parallel libraries can readily be used only from sequential programs, greatly limiting their applicability. Fixing this problem is crucial to fast and cost-effective software development based on reusing existing libraries and frameworks.

An additional challenge is maintaining Yada's deterministic execution guarantee: if using a library in Yada reintroduces all the usual debugging and correctness problems common to parallel programming, then libraries will not help productivity. Thus, Yada will additionally enforce that programs using libraries remain deterministic, under reasonable assumptions about library behavior. We have built an initial prototype of Yada to help evaluate these ideas and refine our design. Our experience with this prototype on a collection of eight parallel algorithms and four applications shows that it is practical to express realistic algorithms and applications in a deterministic programming language, with few changes from a sequential implementation. Furthermore, our prototype already achieves speedups (see the speedup graph for four sorting algorithms on various input sizes on an eight-core machine) that are competitive with implementations in non-deterministic programming environments.

Project Team (Intel Labs Berkeley): David Gay, Mayur Naik
Collaborators (UC Berkeley): Joel Galenson, Kathy Yelick, Susan Graham, Paul Hilfinger

Intel Mash Maker

Mash Maker is a browser extension that understands the meaning of the pages that users browse and suggests ways that it can improve the current page so as to be more useful to the user. Mash Maker relies on users to teach it both how to understand web pages, and also how particular kinds of web page might be improved in interesting ways.

As the user views a web page, Mash Maker displays suggestions on its toolbar for ways it can make the page more useful. If the user clicks the button for a suggested improvement, Mash Maker applies it to the current page, potentially using other web sites and remote APIs, and potentially applying widgets that produce new visualizations or compute new data.

Mash Maker suggests improvements based on the meaning of the current page, the meaning of pages that the user has recently browsed, and the behavior of other users.


Learn more about Mash Maker at mashmaker.intel.com

Data Mining for Anomaly Detection

Network-based computing systems play an increasingly vital role in our society. However, they suffer from two major problems: they are frequent targets of cyber crime, and they are increasingly difficult to diagnose. Our research addresses aspects of both problems by applying data mining techniques to detect anomalous activity (whether malicious or benign). In particular, we focus on designing algorithms to protect endhosts and enterprises from botnets, as well as developing solutions for data center reliability.

Endhost and Enterprise Botnet Protection

The PROTEUS project aims to provide protection from botnets by tackling the problem from two vantage points: the end user and the centralized enterprise network control center. Some of our solutions are intended to live on laptops and desktops; others are targeted at helping IT departments manage network security more effectively within their enterprise. PROTEUS' guiding principle is to build rich, user-specific, location-dependent behavioral profiles. These are composed from a variety of data, including network traffic patterns, location context, Internet sites visited, and user presence indicators. Our behavior-based detectors can successfully uncover covert botnet communication (when a PC communicates with the attack command and control center) and can identify attack activity in progress. A major focus in PROTEUS is reducing the number of false alarms generated, which are the bane of the mitigation mechanisms prevalent today.

We have designed techniques that can rapidly distinguish malware that is truly new (never seen before) from malware that is a polymorphic variant of existing malware. Because IT departments can observe a few thousand new malware samples each day, this greatly helps human operators sort out which samples require manual inspection and which do not. A tool based on our malware classifier thus improves the productivity and effectiveness of the IT security operators serving an enterprise.

We also work on protecting enterprise-level mechanisms for DoS and scan detection from data poisoning. A key challenge in designing data-driven mechanisms is protecting against adversaries who can inject erroneous data into the measurement infrastructure, which leads to an incorrect or inaccurate estimate of normal behavior. If an algorithm learns the wrong model, the corresponding detector will behave poorly. To provide protection from data poisoning, we design algorithms that draw on methods from robust statistics to guard against such adversaries.
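
To illustrate why robust statistics help against poisoning, the sketch below compares a mean/standard-deviation alarm threshold with one based on the median and the median absolute deviation (MAD). The choice of estimators, the function names, and the traffic numbers are illustrative assumptions, not the project's actual algorithms or data.

```python
import statistics


def classical_threshold(samples, k=3.0):
    """Alarm threshold from mean and standard deviation (sensitive to outliers)."""
    return statistics.mean(samples) + k * statistics.pstdev(samples)


def robust_threshold(samples, k=3.0):
    """Alarm threshold from median and MAD (tolerates a minority of bad samples)."""
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    return med + k * 1.4826 * mad


# Hypothetical per-minute connection counts observed during training.
clean = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
# The adversary injects a few inflated measurements to widen the "normal" range.
poisoned = clean + [400, 420, 450]

attack_rate = 250  # later attack traffic the detector should flag
missed_by_classical = attack_rate < classical_threshold(poisoned)
caught_by_robust = attack_rate > robust_threshold(poisoned)
```

With these numbers, the three injected points push the classical threshold far above the attack rate, while the median-based threshold barely moves.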

Diagnosis in Data Centers

Today's large-scale Internet services run on large server clusters in datacenters and cloud computing environments. The scale and complexity of such systems make it very difficult to monitor, debug and maintain the services. However, modern computers have more and more computing cores, and multiple cores can be allocated to monitoring the system itself; moreover, cloud computing makes it easy to use massively parallel infrastructure to process large-scale data for delivering timely monitoring and diagnosis results. In this project, we take advantage of the abundant computing power to mine console logs, the natural tracing information included in almost every software system, for system monitoring, problem detection and diagnosis. Our novel approach for mining console logs integrates source code analysis with text mining to extract structured information from textual console logs. This makes it very easy and flexible for system operators to create a variety of (application-specific) features, so that powerful machine learning methods can be applied to perform high quality pattern mining and accurate problem detection for the system. Our research yielded the first automated log mining process that can not only detect a large portion of runtime anomalies, but also provide easy-to-understand explanations to system operators.
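
A minimal sketch of the general idea follows, assuming the message format strings have already been recovered from logging calls in the source code. The templates, function names, and sample log lines are all hypothetical. Each template becomes a regular expression, each log line is mapped to a message type plus its variable fields, and the per-type counts form a simple feature vector.

```python
import re
from collections import Counter

# Hypothetical format strings, standing in for what source code analysis
# of logging statements might recover.
TEMPLATES = [
    "Receiving block %s src: %s dest: %s",
    "Received block %s of size %d",
    "Deleting block %s",
]


def template_to_regex(template):
    """Turn a printf-style template into a regex that captures its variables."""
    pattern = ""
    for part in re.split(r"(%s|%d)", template):
        if part == "%s":
            pattern += r"(\S+)"
        elif part == "%d":
            pattern += r"(\d+)"
        else:
            pattern += re.escape(part)
    return re.compile("^" + pattern + "$")


MATCHERS = [template_to_regex(t) for t in TEMPLATES]


def parse_line(line):
    """Return (message type index, captured variables), or None if unmatched."""
    for idx, regex in enumerate(MATCHERS):
        match = regex.match(line)
        if match:
            return idx, match.groups()
    return None


def count_vector(lines):
    """Count how often each message type occurs: one feature per template."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return [counts[i] for i in range(len(TEMPLATES))]
```

Vectors like these, computed per time window or per identifier such as a block ID, are the kind of structured feature that a machine learning detector could then consume.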

Researchers on these projects include Nina Taft, Jaideep Chandrashekar, Ling Huang, Dina Papagiannaki and Anthony Joseph. We collaborate with the RADlab at UC Berkeley, as well as with Cornell University, CMU, U.C. Irvine and U.C. Davis.

Millimeter Wavelength Systems

This project is exploring both fundamental and applied research topics in the space of millimeter wavelength systems. There is increasing interest in these systems given the ever-increasing appetite for network bandwidth, the need for more energy efficiency, and the desire for better spatial reuse in a world with ever larger numbers of wireless devices.

The cost-effective generation, modulation and detection of terahertz energies from 500 GHz to 10 THz. Terahertz energies occupy an interesting space that bridges the traditional domains of radio electronics and optical systems, and there are both electrical and optical approaches to generating these signals. While the bandwidth of wireless systems operating at these frequencies will be very large, e.g., from tens to hundreds of Gbps, there are many other compelling applications for imaging and sensing systems. The realization of passive (blackbody) and active (illuminated) imaging systems, as well as micro-spectroscopy devices, are stepping stones in the development of component technologies leading to the eventual production of terahertz communication systems.

The application of electromagnetic metamaterials to antenna design and engineering. Metamaterials are artificial structures of growing interest, made readily available by advances in micro- and nano-fabrication, that exhibit interesting properties, such as a negative index of refraction, that do not normally occur in nature. These materials may be characterized both in material terms, such as permittivity and permeability, and in transmission line terms, such as inductance, capacitance, and impedance. Metamaterials have broad implications for antenna design, especially for guiding, reflecting, and refracting electromagnetic waves at very high frequencies.

The design of electrically switched or steerable antenna systems and conformal arrays with controllable radiation patterns. Wireless systems operating at 60 GHz and above require high-gain directional antennas to achieve sufficient link margins because of the strong atmospheric attenuation and the limited transmission power achievable with small, low-power consumer devices. Moreover, these antenna systems must be co-designed with an understanding of the enclosure and form factor of the end system. There are design challenges in both the antenna sub-system and its integration into the host device.
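
A rough link-budget sketch shows why antenna gain matters at these frequencies. It uses the standard free-space path loss formula, FSPL(dB) = 20 log10(4πdf/c). The function names, the 10 m distance, and the transmit power and gain values are assumptions chosen for illustration. Atmospheric absorption is not modeled and would enter through the optional extra_loss_db parameter.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss between isotropic antennas, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def received_power_dbm(tx_dbm, tx_gain_dbi, rx_gain_dbi,
                       distance_m, freq_hz, extra_loss_db=0.0):
    """Received power from a simple link budget (all terms in dB units)."""
    loss = free_space_path_loss_db(distance_m, freq_hz) + extra_loss_db
    return tx_dbm + tx_gain_dbi + rx_gain_dbi - loss


# Illustrative comparison at 10 m with a 10 dBm transmitter (assumed values).
loss_60ghz = free_space_path_loss_db(10.0, 60e9)
loss_2_4ghz = free_space_path_loss_db(10.0, 2.4e9)
omni_rx = received_power_dbm(10.0, 0.0, 0.0, 10.0, 60e9)
directional_rx = received_power_dbm(10.0, 15.0, 15.0, 10.0, 60e9)
```

At 10 m the free-space loss at 60 GHz comes to roughly 88 dB, about 28 dB more than at 2.4 GHz with the same antennas. In this model every dB of antenna gain at either end recovers one dB of that deficit.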

The exploration of higher-level protocols and algorithms for antenna discovery, alignment and tracking. The presence of high-gain antennas has broader implications for wireless systems. In the simple case, two devices may hear each other while using omni-directional antenna patterns and then iteratively refine their patterns toward increasingly higher gain and narrower beams. In the more general case, two devices may only hear each other when their narrowly focused beams happen to be aligned, in which case a more general discovery and alignment system is necessary. Both RF techniques and new algorithms may contribute to the new generation of "directional MACs" needed to realize the potential of millimeter wavelength WLANs.
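
The simple case above, where devices start omni-directional and narrow their beams step by step, can be sketched as a sector-halving search. This is an idealized illustration, not a protocol from the project. It assumes a two-dimensional model, a perfectly sharp beam whose gain is 360 divided by its width in degrees, and noiseless measurements. All names are hypothetical.

```python
def in_sector(bearing_deg, start_deg, width_deg):
    """True if the bearing lies inside the sector [start, start + width)."""
    return (bearing_deg - start_deg) % 360.0 < width_deg


def measure(peer_bearing_deg, start_deg, width_deg):
    """Idealized received gain: 360/width inside the beam, zero outside."""
    if in_sector(peer_bearing_deg, start_deg, width_deg):
        return 360.0 / width_deg
    return 0.0


def refine_beam(peer_bearing_deg, min_width_deg=11.25):
    """Start omni-directional, then repeatedly keep the stronger half-sector."""
    start, width = 0.0, 360.0
    history = [(start, width)]
    while width / 2.0 >= min_width_deg:
        half = width / 2.0
        left = measure(peer_bearing_deg, start, half)
        right = measure(peer_bearing_deg, start + half, half)
        if right > left:
            start = (start + half) % 360.0
        width = half
        history.append((start, width))
    return start, width, history
```

Each round doubles the idealized gain, so the search needs only a logarithmic number of measurements. The more general case, in which the devices cannot hear each other at all with wide beams, would need a search over beam pairs and is not covered by this sketch.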

Delay Tolerant Networking

The protocols of today's Internet can perform poorly when faced with operating environments characterized by very long delay paths, frequent network partitions, and severe power or memory constraints. Delay Tolerant Networking (DTN), which also covers the more recently named Disruption Tolerant Networking, is examining a new network architecture and application interface structured around optionally reliable asynchronous message forwarding, with limited expectations of end-to-end connectivity and node resources. DTN enables a range of applications to be used in environments with poor connectivity, from email and voicemail to offline search engine queries, electronic form filling, and "instant-enough messaging", at a reduced cost. DTN is the basis for the 'Orbital Internet,' named among Time Magazine's top 10 inventions of 2008.
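
The store-and-forward idea can be illustrated with a toy simulation in which nodes hold messages until a contact opportunity arises. This sketch does not implement the actual DTN protocols. The class, the function, the hand-everything-over forwarding rule, and the village/bus/city scenario are assumptions made for illustration.

```python
from collections import deque


class Node:
    """A node that stores messages until it meets someone to hand them to."""

    def __init__(self, name):
        self.name = name
        self.stored = deque()  # (destination, payload) pairs awaiting a contact
        self.delivered = []    # payloads addressed to this node

    def send(self, dest, payload):
        """Queue a message; no end-to-end path needs to exist right now."""
        self.stored.append((dest, payload))


def transfer(sender, receiver):
    """During a contact, the sender hands over everything it is holding."""
    while sender.stored:
        dest, payload = sender.stored.popleft()
        if dest == receiver.name:
            receiver.delivered.append(payload)
        else:
            receiver.stored.append((dest, payload))


# Hypothetical scenario: a bus carries data between a village and a city.
village, bus, city = Node("village"), Node("bus"), Node("city")
village.send("city", "form #1")
village.send("city", "form #2")
transfer(village, bus)  # the bus passes through the village
transfer(bus, city)     # hours later, the bus reaches the city
```

The village and the city are never connected at the same time, yet both messages arrive in order. Delivery is delayed rather than dropped.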
Learn more about DTN at www.dtnrg.org
© 2010 Intel Corporation